Jens O. Meiert | 1 Feb 2010 06:34

Re: Hi

> Error [html5]: "Internal encoding declaration windows-1251 disagrees
> with the actual encoding of the document (utf-8)."
> The actual encoding is windows-1251; it's in the server response header
> and in the meta tag.

Where does this error show up (could not reproduce it [1,2])?

What’s more, in the document itself it says “<meta
http-equiv="Content-Type" content="text/html; charset=utf-8">”.

[1] http://validator.w3.org/check?uri=http%3A%2F%2Fwww.magazun.com%2F&charset=%28detect+automatically%29&doctype=HTML5&group=0&user-agent=W3C_Validator%2F1.654
[2] http://validator.nu/?doc=http%3A%2F%2Fwww.magazun.com%2F
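To illustrate the kind of comparison behind this message, here is a minimal sketch that extracts the charset declared in a Content-Type header value and the charset declared in a meta tag, then reports whether the two agree. The function names are hypothetical and are not taken from either validator. It compares declarations only; it does not detect the actual byte encoding of a document.

```javascript
// Hypothetical helpers for illustration; not part of any validator.

// Pull the charset parameter out of a string such as
// "text/html; charset=windows-1251". Returns null when none is present.
function charsetFromContentType(value) {
  const match = /charset\s*=\s*["']?([^\s;"'>]+)/i.exec(value || "");
  return match ? match[1].toLowerCase() : null;
}

// Scan the <meta> tags in the markup and return the first declared charset.
// Covers both the http-equiv form and the shorter charset="..." form.
function charsetFromMeta(html) {
  const metas = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metas) {
    const found = charsetFromContentType(tag);
    if (found) return found;
  }
  return null;
}

// Compare the two declarations. When either one is missing there is
// nothing to disagree with, so `agree` is true.
function declaredCharsets(headerValue, html) {
  const header = charsetFromContentType(headerValue);
  const meta = charsetFromMeta(html);
  const agree = header === null || meta === null || header === meta;
  return { header, meta, agree };
}

const report = declaredCharsets(
  "text/html; charset=windows-1251",
  '<meta\nhttp-equiv="Content-Type" content="text/html; charset=utf-8">'
);
console.log(report);
```

With a windows-1251 header and the utf-8 meta tag quoted above, the two declarations disagree, which is the situation the reported message describes.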

-- 
Jens Meiert
http://meiert.com/en/

Leif Halvard Silli | 1 Feb 2010 07:19

HTML4 + <script><![CDATA[ </ENDTAG> ]]></script>

The validator doesn't consider the following code as valid HTML4 (HTML 
four):

<script type="text/javascript">//<![CDATA[
    document.write("<aa><bb></bb></aa>");
//]]></script>

At the same time, this is considered valid HTML 4 (four) (but invalid 
HTML5 (five)):

<p><![CDATA[
<aa><bb></bb></aa>
]]></p>

Both code examples should be considered valid HTML4 (four) syntax.

There are 3 reasons why this bug is important to fix:

(1) That the validator wrongly stamps the first example as invalid 
creates the impression that it is very difficult to embed javascript in 
a way that is valid both inside XHTML and inside HTML4. 

(2) In addition, it is also useful within HTML4! Because: the HTML4 
specification (as well as the validator) requires that end tags inside 
the <script> element are escaped - in order to be valid SGML. The HTML4 
spec gives the following as an example of _one_ way that one can 
escape the code so that the code is valid SGML both before and after 
script execution: "<\/b>". *However*, the <![CDATA[ ... ]]> syntax for 
marking up a section where escaping is not necessary is documented in 
the HTML4 specification as well. And is safe - and much simpler - to 
[...]

David Dorward | 1 Feb 2010 08:56

Re: HTML4 + <script><![CDATA[ </ENDTAG> ]]></script>


On 1 Feb 2010, at 06:19, Leif Halvard Silli wrote:

> The validator doesn't consider the following code as valid HTML4 (HTML 
> four):
> 
> <script type="text/javascript">//<![CDATA[
>   document.write("<aa><bb></bb></aa>");
> //]]></script>

Since <script> elements are defined as containing CDATA, I assume the <![CDATA marker is (supposed to be)
treated as character data and not markup. The </ of </bb> is then considered to be an end tag which fails to
match the opening <script> tag.

> At the same time, this is considered valid HTML 4 (four) (but invalid 
> HTML5 (five)):
> 
> <p><![CDATA[
> <aa><bb></bb></aa>
> ]]></p>

In HTML 4 the CDATA flag operates as expected (except in browsers, which don't generally support it).

Meanwhile, HTML5 has its own set of parsing rules that are distinct from those of SGML, so I'm not surprised
that this isn't allowed. 

> There are 3 reasons why this bug is important to fix:
> 
> (1) That the validator wrongly stamps the first example as invalid 
> creates the impression that it is very difficult to embed javascript in 
[...]

David Dorward | 1 Feb 2010 08:55

Re: Self closing <div> is valid, but shouldn't be


On 29 Jan 2010, at 00:02, Briere, William J wrote:

> If you enter html via direct input it states that a self closing div is valid, but it shouldn’t be.

I've just tested it. It reports it as invalid.

In XHTML, it reports it as valid, but that is correct (it isn't HTML compatible, but that isn't a matter of validity).

> Check out the w3c reference here: http://www.w3schools.com/tags/ref_byfunc.asp

Note that W3Schools is not affiliated with the W3C ... and is generally of low quality.

However, I cannot see any claim on that page that self-closing tag syntax is forbidden for div elements.


-- 
David Dorward

Leif Halvard Silli | 1 Feb 2010 11:03

Re: HTML4 + <script><![CDATA[ </ENDTAG> ]]></script>

David Dorward, Mon, 1 Feb 2010 07:56:15 +0000:
> On 1 Feb 2010, at 06:19, Leif Halvard Silli wrote:
> 
>> The validator doesn't consider the following code as valid HTML4 (HTML 
>> four):
>> 
>> <script type="text/javascript">//<![CDATA[
>>   document.write("<aa><bb></bb></aa>");
>> //]]></script>
> 
> Since <script> elements are defined as containing CDATA, I assume the 
> <![CDATA marker is (supposed to be) treated as character data and not 
> markup. The </ of </bb> is then considered to be an end tag which 
> fails to match the opening <script> tag.

So you say that placing "<![CDATA[" there is valid, but without 
effect on the escaping needs ... 

>> There are 3 reasons why this bug is important to fix:
>> 
>> (1) That the validator wrongly stamps the first example as invalid 
>> creates the impression that it is very difficult to embed javascript in 
>> a way that is valid both inside XHTML and inside HTML4. 
> 
> It is difficult. The HTML compatibility guidelines for XHTML 
> recommend using external scripts.

You meant: "It _is_ difficult", I presume. ;-)

But nevertheless: this means that the HTML4 parser compatibility 
guidelines are incomplete.

http://www.w3.org/TR/xhtml-media-types/
http://www.w3.org/TR/xhtml1/guidelines.html

Or, why don't the compatibility guidelines mention that, for 
embedding, one should use BOTH "\/" and "<![CDATA[ ...]]>" 
simultaneously? Instead they jump straight to saying that you should use 
external scripts?!

So: If you develop a script to be embedded freely both in HTML4 
documents as well as in XHTML documents, then you must escape both the 
HTML4 way and the XHTML1 way:

<script type="text/javascript"><![CDATA[
<abc><\/abc>
]]></script>
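The HTML4-side escaping can be sketched as a small helper that rewrites every "</" in a piece of script source as "<\/". The helper name is hypothetical. The rewrite is only meaning-preserving where "</" sits inside a string literal, where JavaScript reads "\/" as a plain "/"; it is not a general-purpose source transformer.

```javascript
// Hypothetical helper: replace each "</" with "<\/" so that the script text
// contains no sequence that an SGML parser could read as an end tag.
function escapeEndTags(source) {
  return source.replace(/<\//g, "<\\/");
}

const original = 'document.write("<aa><bb></bb></aa>");';
const escaped = escapeEndTags(original);
console.log(escaped);

// Inside a string literal the escaped form evaluates to the same value:
// "\/" is just "/" to the JavaScript engine.
const literal = '"<aa><bb></bb></aa>"';
const before = new Function("return " + literal)();
const after = new Function("return " + escapeEndTags(literal))();
console.log(before === after);
```

The escaped text can then be placed inside the CDATA wrapper shown above, so the same script body works under both HTML4 and XHTML parsing.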

And, in addition, you should also take care of the JavaScript 
interpreter: instead of recommending not to use HTML comments at all, 
as the guidelines do, I would recommend this:

<script type="text/javascript"><!---><![CDATA[
document.write('<abc>abc<\/abc>');
<!---->]]></script>

Because JavaScript interpreters require nothing more than that the 
line begins with "<!--" in order to accept that the first line is a 
comment. If the code also ends with a "-->", then it can as well be 
interpreted as a valid HTML comment.

>> (2) In addition, it is also useful within HTML4! Because: the HTML4 
>> specification (as well as the validator) requires that end tags inside 
>> the <script> element are escaped - in order to be valid SGML. The HTML4 
>> spec gives the following as an example of _one_ way that one can 
>> escape the code so that the code is valid SGML both before and after 
>> script execution: "<\/b>".
> 
> Yes

See above.

>> *However*, the <![CDATA[ ... ]]> syntax for 
>> marking up a section where escaping is not necessary is documented in 
>> the HTML4 specification as well.
> 
> But overruled, I believe, by: "Although the STYLE and SCRIPT elements 
> use CDATA for their data model, for these elements, CDATA must be 
> handled differently by user agents. Markup and entities must be 
> treated as raw text" 
> <http://www.w3.org/TR/html4/types.html#type-cdata>

OK, thanks David. Much appreciated. It would really be helpful if the 
HTML4 validator recommended escaping the "\/" instead of (only) telling 
us to use external script files, or instead of the cryptic message that 
the HTML4 validation service now gives us. In addition, there should be 
some *XHTML* compatibility guidelines for *HTML4*. ;-)

-- 
leif halvard silli

David Dorward | 1 Feb 2010 11:51

Re: HTML4 + <script><![CDATA[ </ENDTAG> ]]></script>


On 1 Feb 2010, at 10:03, Leif Halvard Silli wrote:
>> Since <script> elements are defined as containing CDATA, I assume the 
>> <![CDATA marker is (supposed to be) treated as character data and not 
>> markup. The </ of </bb> is then considered to be an end tag which 
>> fails to match the opening <script> tag.
> 
> So you say that placing "<![CDATA[" there is valid, but without 
> effect on the escaping needs ... 

As far as I can tell, it should be treated as any other character data. 

i.e. passed to the JS engine, where the prefixing // causes it to be handled as a single-line comment
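A small sketch of that behaviour outside a browser: the script body is compiled as plain text, including the CDATA marker lines, and the leading // turns each marker line into a comment. Here `new Function` and `return` stand in for the browser's script execution and `document.write`; that substitution is an assumption made so the result can be inspected.

```javascript
// The script body as the JS engine would receive it when the CDATA
// markers are passed through as ordinary character data.
const scriptBody = [
  "//<![CDATA[",
  'return "<aa><bb><\\/bb><\\/aa>";',
  "//]]>",
].join("\n");

// Compile the text as a function body. The first and last lines start
// with "//", so the engine skips them as single-line comments.
const run = new Function(scriptBody);
const output = run();
console.log(output);
```

The marker lines have no effect on the script itself; whether the end tags inside the string need escaping is decided by the HTML parser before the engine ever sees the text.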

> Or, why don't the compatibility guidelines mention that, for 
> embedding, one should use BOTH "\/" and "<![CDATA[ ...]]>" 
> simultaneously?

Because they are concerned with writing XHTML that is compatible with HTML parsers, not writing code that
is both valid XHTML and HTML (which is impossible once you start dealing with XHTML self-closing tag
syntax in the <head>).

> And, in addition, you should also take care of the JavaScript 
> interpreter: instead of recommending not to use HTML comments at all, 
> as the guidelines do, I would recommend this:
> 
> <script type="text/javascript"><!---><![CDATA[
> document.write('<abc>abc<\/abc>');
> <!---->]]></script>
> 
> Because JavaScript interpreters require nothing more than that the 
> line begins with "<!--" in order to accept that the first line is a 
> comment. If the code also ends with a "-->", then it can as well be 
> interpreted as a valid HTML comment.

... or you could just use a JS comment rather than depending on hacks designed to avoid having Netscape 2 and
friends render JS as text (which would break if the script was placed in an external file).

> OK, thanks David. Much appreciated. It would really be helpful if the 
> HTML4 validator recommended escaping the "\/" instead of (only) telling 
> us to use external script files.

It does.

•  Line 6, Column 30: end tag for element "DIV" which is not open
   document.write('<div></div>');

The Validator found an end tag for the above element, but that element is not currently open. This is often
caused by a leftover end tag from an element that was removed during editing, or by an implicitly closed
element (if you have an error related to an element being used where it is not allowed, this is almost
certainly the case). In the latter case this error will disappear as soon as you fix the original problem.

If this error occurred in a script section of your document, you should probably read this FAQ entry.

From said FAQ entry:

Authors should avoid using strings such as "</P>" in their embedded scripts. In JavaScript, authors may
use a backslash to prevent the string from being parsed as markup:

(Followed by an example)

-- 
David Dorward
http://dorward.me.uk

Leif Halvard Silli | 1 Feb 2010 17:42

Re: HTML4 + <script><![CDATA[ </ENDTAG> ]]></script>

David Dorward, Mon, 1 Feb 2010 10:51:16 +0000:
> On 1 Feb 2010, at 10:03, Leif Halvard Silli wrote:

>> Or, why don't the compatibility guidelines mention that, for 
>> embedding, one should use BOTH "\/" and "<![CDATA[ ...]]>" 
>> simultaneously?
> 
> Because they are concerned with writing XHTML that is compatible with 
> HTML parsers, not writing code that is both valid XHTML and HTML 
> (which is impossible once you start dealing with XHTML self-closing 
> tag syntax in the <head>).

Well, the right hand should care a little bit about what the left hand 
is doing.

Why does the HTML4 validator care about theoretical SGML while the 
XHTML guidelines care about real text/HTML parsers, only? They should 
try to live up to the same standard/reality.

(It is sometimes said that having to wrap the script in <![CDATA[ ]]> 
is a reason to use external scripts [e.g. you said so!]. This has 
also been used as an argument for saying that it is more difficult to 
embed javascript in XHTML than in HTML4. However, in reality, there is 
even more reason to place HTML4 scripts in external files, due to the 
issue with the \/ escaping.)
 
>> And, in addition, you should also take care of the JavaScript 
>> interpreter: instead of recommending not to use HTML comments at all, 
>> as the guidelines do, I would recommend this:
>> 
>> <script type="text/javascript"><!---><![CDATA[
>> document.write('<abc>abc<\/abc>');
>> <!---->]]></script>
>> 
>> Because JavaScript interpreters require nothing more than that the 
>> line begins with "<!--" in order to accept that the first line is a 
>> comment. If the code also ends with a "-->", then it can as well be 
>> interpreted as a valid HTML comment.
> 
> ... or you could just use a JS comment rather than depending on hacks 
> designed to avoid having Netscape 2 and friends render JS as text 
> (which would break if the script was placed in an external file).

I believe all user agents support this so-called hack, so I wonder why 
it isn't upgraded from being considered a hack. I suppose I will continue 
to use <!--->, as it may be compatible with a wider range of scripting 
languages. (After all, most scripting languages for the web have some 
knowledge of the HTML syntax.)
 
>> OK, thanks David. Much appreciated. It would really be helpful if the 
>> HTML4 validator recommended escaping the "\/" instead of (only) telling 
>> us to use external script files.
> 
> It does.

The validator? Yes. In cryptic words - as you documented below.

> •  Line 6, Column 30: end tag for element "DIV" which is not open
>    document.write('<div></div>');
  [ snip ] 
> If this error occurred in a script section of your document, you 
> should probably read this FAQ entry.

Right. There is just a single, cryptic sentence about the entire issue.

> From said FAQ entry:
> 
> Authors should avoid using strings such as "</P>" in their embedded 
> scripts. In JavaScript, authors may use a backslash to prevent the 
> string from being parsed as markup:
> 
> (Followed by an example)

You say that you quoted the FAQ. However, you did not quote the FAQ - 
you quoted the WDG page which the FAQ points to. The FAQ only says this:

]]
The validator complains about something in my JavaScript!
Most probably, you should read the script section of WDG's excellent 
Common HTML Validation Problems document.
[[

So the FAQ page is - or appears - so cocky that it cannot explain this 
in its own words. The FAQ has an unnecessary gotcha attitude. The W3 
validator did not need to point to the FAQ - it could have explained 
this issue itself - and it could have pointed to HTML4 directly - 
<http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2.1>. Or at least 
the FAQ could have done so. And the validator could also have mentioned 
a word about the discrepancy between user agents and SGML requirements.

Btw, the WDG page is wrong when it says that "in XHTML, authors must 
_also_ take care when using start tags within a script element".  
Because, as we have discussed in this thread: in XHTML, as long as you 
wrap the script in <![CDATA[ ]]>, you do not, in principle, _also_ 
have to use "\/" - except, of course, for compatibility with HTML4.
-- 
leif halvard silli

Yaroslav Samchuk | 3 Feb 2010 19:23

Re: Malformed JSON output format

There is another issue, which appears in both v0.8.{5,6} if you have the doctype option set to HTML5 – the "lastLine" JSON field sometimes isn't set:

          {
              "lastLine": ,
              
              "message": "The Content-Type was text/html. Using the HTML parser.",
              "messageid": "html5",
              "explanation": "...",
              "type": "info"
          },


I guess the same approach as the one used for the "lastColumn" field can be used:

--- json_output.tmpl.orig 2010-02-03 19:07:30.000000000 +0100
+++ json_output.tmpl 2010-02-03 19:07:54.000000000 +0100
@@ -10,7 +10,7 @@
     "messages": [
         <TMPL_LOOP NAME="file_errors">
           {
-              "lastLine": <TMPL_VAR NAME="line">,
+              <TMPL_IF NAME="line">"lastLine": <TMPL_VAR NAME="line">,</TMPL_IF>
               <TMPL_IF NAME="char">"lastColumn": <TMPL_VAR NAME="char">,</TMPL_IF>
               "message": <TMPL_VAR NAME="msg">,
               <TMPL_IF NAME="num">"messageid": "<TMPL_VAR NAME="num">",</TMPL_IF>
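The effect of the change can be illustrated from the consumer side with a minimal sketch. The record shape and function names below are hypothetical. JSON.stringify omits properties whose value is undefined, which plays roughly the same role as wrapping an optional field in <TMPL_IF>; one difference, assuming <TMPL_IF> follows Perl truthiness, is that the template would also skip a value of 0, whereas this sketch skips only undefined.

```javascript
// Hypothetical message record; `line` and `char` may be absent, as with
// document-level messages that have no source position.
function renderMessage({ line, char, msg, num, type }) {
  // Properties set to undefined are left out of the output entirely,
  // so no dangling "lastLine": , can be produced.
  return JSON.stringify({
    lastLine: line,
    lastColumn: char,
    message: msg,
    messageid: num,
    type,
  });
}

// Report whether a piece of text parses as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

// The shape from the report above: an empty value after "lastLine".
const broken = '{ "lastLine": , "message": "x", "type": "info" }';

const fixed = renderMessage({
  msg: "The Content-Type was text/html. Using the HTML parser.",
  num: "html5",
  type: "info",
});

console.log(isValidJson(broken));
console.log(isValidJson(fixed));
console.log(fixed);
```

A strict JSON parser rejects the first string outright, which is why a single missing line number makes the whole response unusable for a client.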

In our environment we're still using v0.8.5, since there is no official v0.8.6 release yet (or at least there is no validator-0_8_6-release tag). For the v0.8.5 branch I personally use this patch:

Index: share/templates/en_US/json_output.tmpl
===================================================================
RCS file: /sources/public/validator/share/templates/en_US/json_output.tmpl,v
retrieving revision 1.1
diff -b -r1.1 json_output.tmpl
8d7
14,18c13,14
<               <TMPL_IF NAME="err_type_err">"type": "error",</TMPL_IF>
<               <TMPL_IF NAME="err_type_warn">"type": "info",
<               "subtype": "warning"</TMPL_IF>
<               "lastLine": "<TMPL_VAR NAME="line">",
<               "lastColumn": <TMPL_VAR NAME="char">,
---
>               <TMPL_IF NAME="line">"lastLine": <TMPL_VAR NAME="line">,</TMPL_IF>
>               <TMPL_IF NAME="char">"lastColumn": <TMPL_VAR NAME="char">,</TMPL_IF>
20,22c16,20
<               "messageid": <TMPL_VAR NAME="num">,
<               "explanation": "<TMPL_VAR ESCAPE="JS" NAME="expl">",
<           }
---
>               <TMPL_IF NAME="num">"messageid": "<TMPL_VAR NAME="num">",</TMPL_IF>
>               <TMPL_IF NAME="expl">"explanation": "<TMPL_VAR ESCAPE="JS" NAME="expl">",</TMPL_IF>
>               "type": <TMPL_IF NAME="err_type_err">"error"<TMPL_ELSE>"info"<TMPL_IF NAME="err_type_warn">,
>               "subtype": "warning"</TMPL_IF></TMPL_IF>
>           }<TMPL_UNLESS NAME="__last__">,</TMPL_UNLESS>
24d21
<         
30d26

We discussed a similar patch with Ville Skyttä, and he pointed out that the ESCAPE="JS" part might be a flaw here. But I personally haven't seen any escaping issues so far.

On 28.01.2010, at 18:09, Ville Skyttä wrote:

> On Wednesday 27 January 2010, Yaroslav Samchuk wrote:
>> When requesting the JSON output, returned data seems to be malformed:
>
> That's right, it's a bug, AFAIK fixed in the upcoming 0.8.6 release.
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=7000


Ville Skyttä | 3 Feb 2010 20:22

Re: Malformed JSON output format

On Wednesday 03 February 2010, Yaroslav Samchuk wrote:
> There is another issue, which appears in both v0.8.{5,6} if you have
> doctype option set to HTML5 – "lastLine" JSON field isn't set sometimes:
[...]
>   I guess, the same approach as the one used for the "lastColumn"
> field can be used:

Applied in CVS, thanks.

