Sébastien LEVER | 1 Dec 15:50 2004

html parser

Hello,
I'm turning to you because I'm having trouble using HTMLParser. Let me explain:
my goal is to transform HTML source into Struts JSP source. For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<body bgcolor="#FFB747" text="#000000" background="images/orange.gif">
</body>
</html>

becomes:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html:html>
<body bgcolor="#FFB747" text="#000000" background="images/orange.gif">
</body>
</html:html>

I would also like to add new node types to the existing library.

When I come across an HTML node that could be replaced by a Struts node, I create my Struts node with the attributes taken from the HTML node and rebuild my tree.

When I do new Html(), for example, the tag returned by toHtml() is <HTML> and there is no closing tag (</html>).

I hope that is clear. Am I going about this the wrong way? How should I do it?

Thanks.

Best regards,

Sébastien LEVER

Derrick Oswald | 3 Dec 13:45 2004

Re: html parser

I hope this is what you need:

import org.htmlparser.nodes.TagNode;
import org.htmlparser.tags.CompositeTag;

public class HtmlPlus extends CompositeTag
{
    /**
     * The set of names handled by this tag.
     */
    private static final String[] mIds = new String[] {"html:html"};

    /**
     * Create a new html:html tag with a matching end tag,
     * so that toHtml() emits the closing </html:html>.
     */
    public HtmlPlus ()
    {
        TagNode end = new TagNode ();
        end.setTagName ("/html:html");
        setEndTag (end);
    }

    /**
     * Return the set of names handled by this tag.
     * @return The names to be matched that create tags of this type.
     */
    public String[] getIds ()
    {
        return (mIds);
    }
}

Derrick

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now. 
http://productguide.itmanagersjournal.com/
Sébastien LEVER | 6 Dec 11:05 2004

RE: Htmlparser-user digest, Vol 1 #444 - 1 msg

Thanks a lot for your reply,
but I have decided to test another HTML parser. I'll try your solution later.
Thanks again.

sebastien lever

oz.htmlparser | 6 Dec 23:37 2004

&amp;#8212; decoding not done properly.

Given an HTML page containing
&amp;#8212;

this is what I did:

            Lexer lexer = new Lexer("LHASA, Tibet &amp;#8212; Wearing blue");
            Parser parser = new Parser(lexer);

            TextExtractingVisitor visitor = new TextExtractingVisitor();

            parser.visitAllNodesWith(visitor);

            String p = visitor.getExtractedText().trim();
            logger.debug(p);

and what I got was:

  "LHASA, Tibet &#8212; Wearing blue"

According to
http://www.tntluoma.com/sidebars/codes/
I should have gotten an em dash replacement, like this:

  "LHASA, Tibet - Wearing blue"

Derrick Oswald | 7 Dec 13:37 2004

Re: &amp;#8212; decoding not done properly.


Is it supposed to perform recursive substitution? That is, is it 
supposed to convert the &amp; character reference on the first pass and 
then rescan to find the &#8212; character reference on a second pass?
The HTML specification (http://www.w3.org/TR/html4/charset.html#h-5.3) 
says nothing about rescanning the converted text.
Is the site that has this content reputable?
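
To make the single-pass point concrete, here is a minimal sketch in plain Java. It is not htmlparser's decoder: the class EntityDecodeDemo and the method decodeOnce are hypothetical names, and the sketch only understands &amp; and decimal numeric references. It shows that one pass over the reported input yields the text the poster saw, and that the dash (U+2014) only appears if the already-decoded text is scanned a second time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecodeDemo {
    // Matches "&amp;" or a decimal numeric character reference such as "&#8212;".
    private static final Pattern REF = Pattern.compile("&(amp|#(\\d+));");

    /** Decode each reference exactly once; the decoded text is not rescanned. */
    static String decodeOnce(String text) {
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(2) is non-null only for numeric references.
            String replacement = (m.group(2) != null)
                ? new String(Character.toChars(Integer.parseInt(m.group(2))))
                : "&";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "LHASA, Tibet &amp;#8212; Wearing blue";
        String firstPass = decodeOnce(input);       // "&amp;" becomes "&"
        String secondPass = decodeOnce(firstPass);  // "&#8212;" becomes U+2014
        System.out.println(firstPass);
        System.out.println(secondPass);
    }
}
```

If the page source really contains &amp;#8212;, the author has escaped the ampersand, so a conforming single pass is expected to produce the literal text &#8212;. Calling a decoder twice is a workaround for double-escaped input, not something the specification asks for.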

Roy Russo | 7 Dec 21:17 2004

transforming examples

Does anyone have any examples of transforming relative links? I seem to be 
running into trouble: my code spits out the text inside the <a> tags an 
extra time AND also transforms absolute URLs. I only want html parser to 
prepend to the relative URLs, not absolute URLs.

My code:
            // we must parse and fix relative links before serving
            Parser parser = Parser.createParser(sHTML, null);
            UrlModifyingVisitor visitor = null;
            try {
                visitor = new UrlModifyingVisitor(parser, "http://localhost:8080/nukes/files/");
                parser.visitAllNodesWith(visitor);
            } catch (ParserException pe) {
                pe.printStackTrace();
            }
            String result = visitor.getModifiedResult();
            writer.write(result);

This takes in this code fragment:

<b>this is an index page</b>
<br><br>
<a href="foobar/index.html">hello world</a>
<Br><Br>
<a href="http://www.google.com">Another link</a>
<br><br>
This is an image:<br>
<img src="images/jbosslogo.gif">
<br>

And gives me this:

<b>this is an index page</b>
<br><br>
hello world<a href="http://localhost:8080/nukes/files/foobar/index.html">hello world</a>
<Br><Br>
Another link<a href="http://localhost:8080/nukes/files/http://www.google.com">Another link</a>
<br><br>
This is an image:<br>
<img src="http://localhost:8080/nukes/files/images/jbosslogo.gif">
<br>

Neither the prefix on the absolute link to Google nor the duplicated link text is wanted.

Roy Russo
JBoss Portal Developer
JBoss, Inc.
404-467-8555 x223
roy <at> jboss.com

Derrick Oswald | 8 Dec 00:37 2004

Re: transforming examples


Look at the code in org/htmlparser/parserapplications/SiteCapturer.java 
for examples.
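
Independently of the htmlparser API, the "prefix only relative URLs" rule itself can be sketched with java.net.URI from the standard library. This is an illustration and not the code in SiteCapturer.java; the class LinkPrefixDemo and the method prefixIfRelative are hypothetical names. URI.resolve returns an absolute reference unchanged and resolves a relative one against the base.

```java
import java.net.URI;

public class LinkPrefixDemo {
    /**
     * Resolve href against base. An href that already has a scheme
     * (an absolute URL) is returned unchanged.
     */
    static String prefixIfRelative(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/nukes/files/";
        System.out.println(prefixIfRelative(base, "foobar/index.html"));
        System.out.println(prefixIfRelative(base, "http://www.google.com"));
        System.out.println(prefixIfRelative(base, "images/jbosslogo.gif"));
    }
}
```

A visitor that rewrites link and image tags could apply a check like this to each href or src value before setting it. Note that URI.create throws IllegalArgumentException on malformed input (for example, a URL containing a space), so real pages would need a fallback for such values.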

Roy Russo | 8 Dec 03:54 2004

Re: transforming examples

Thanks Derrick,

I actually followed the original post and your recursion pseudo-code in this 
post: http://sourceforge.net/forum/message.php?msg_id=2552492

Seems to be working alright now.

Roy Russo
JBoss Portal Developer
JBoss, Inc.

Rob Eger | 9 Dec 20:35 2004

NullPointerException when parsing an html file

I get a NullPointerException when parsing the <a> links from this file 
with both my code and using "parser temp.html A" (bin/parser).  I 
haven't attached the file (not sure it's allowed in this mailing list), 
but I can send it to you separately if you want, Derrick.

Sounds kind of similar to the problem in bug 1024045: StringBean crashes 
on an URL

stack trace from my code:

Exception in thread "main" java.lang.NullPointerException
         at org.htmlparser.visitors.TagFindingVisitor.visitTag(TagFindingVisitor.java:68)
         at org.htmlparser.nodes.TagNode.accept(TagNode.java:775)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:439)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.tags.CompositeTag.accept(CompositeTag.java:435)
         at org.htmlparser.Parser.visitAllNodesWith(Parser.java:752)
         at com.aptas.khepri.TagExtractor.findTags(TagExtractor.java:123)
         at com.aptas.khepri.Khepri.extract(Khepri.java:428)
         at com.aptas.khepri.Khepri.main(Khepri.java:97)

stack trace from bin/parser:

Exception in thread "main" java.lang.NullPointerException
         at org.htmlparser.filters.TagNameFilter.accept(TagNameFilter.java:64)
         at org.htmlparser.nodes.AbstractNode.collectInto(AbstractNode.java:160)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:398)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.tags.CompositeTag.collectInto(CompositeTag.java:396)
         at org.htmlparser.Parser.parse(Parser.java:565)
         at org.htmlparser.Parser.main(Parser.java:740)

Derrick Oswald | 13 Dec 13:45 2004

Re: a question about ScriptTag


There is a known problem with parsing script with quotes, newlines and
comments:

    1024045 StringBean crashes on an URL

that might be the same problem.

Derrick

biao liu wrote:

>To get the plain text, StringBean is a nice choice. But
>these days I find some pages for which StringBean doesn't
>give the right result.
>The raw code of the page is shown below (for
>simplicity, I have simplified the raw code):
><html>
><head>
><title>Finance adviser</title>
><meta http-equiv="Content-Type" content="text/html;
>charset=gb2312">
><script language=javascript>
>function chg(chgobj,chghtml)
>{
>try {
>obj=eval(chgobj);
>if (typeof(obj.length)=="number"){for
>(i=0;i<obj.length;i++)obj[i].innerHTML =chghtml}
>else {obj.innerHTML =chghtml}
>}
>catch(e){return false}
>finally{}
>}
>function changead()
>{
>adstr='<a
>href=http://web1.jrj.com.cn/Myhome/account/ShowMore.asp
>target=_blank
>class=tru><u>太阳每天都是新的,帐户每日都在增值!</u></a>'
>;
>adstr1='用基金经理、机构的视角观察股市......';
>adstr2='全新优良投资工具,透视股市的智能波段王' ;
>//chg("SpanAD1","(<a
>href=http://user.jrj.com.cn/jrjref/default.htm
>class=tru >"+adstr1+"</a>)")
>chg("SpanAD1","(<a
>href=http://www.jrj.com.cn/xfile/index.htm class=tru
>>"+adstr1+"</a>)")
>chg("SpanAD2","(<a
>href=http://web1.jrj.com.cn/Myhome/mystock/ShowMore.asp
>class=tbu>行情</a>,<a
>href=http://sms.jrj.com.cn/jrjsms/mysms/stock.asp
>class=tru>预警</a>,<a
>href=http://sms.jrj.com.cn/jrjsms/mysms/radar.asp
>class=tbu>轨迹</a>,<a
>href=http://user.jrj.com.cn/Custom/Default.asp
>class=tbu>资讯</a>)")
>chg("SpanAD3","("+adstr+")")
>chg("SpanAD4","(<a
>href=http://www.jrj.com.cn/ToLink.asp
>class=tru>"+adstr2+"</a>)")
>chg("SpanAD5","<table WIDTH=250 align=left
>CELLPADDING=2 CELLSPACING=2
>BGCOLOR=#f4f7fb><tr><td><OBJECT
>classid=clsid:D27CDB6E-AE6D-11cf-96B8-444553540000  
>codebase=http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=5,0,0,0
>WIDTH=250 HEIGHT=214 id=ShockwaveFlash1><PARAM
>NAME=movie
>VALUE=http://www.jrj.com.cn/action/dxjzfwhzh.swf><PARAM
>NAME=scale
>VALUE=exactfit></OBJECT></td></tr></table>")
>}
>function ShowStock()
>{
>adstr='为我的股票买份保险!'
>a=new Array()
>if (Stock!="")
>{
>a=Stock.split(",")
>document.write("<tr bgcolor=#edf0f5><td
>class=tb>相关股票:")
>for (i=0;i<a.length;i++)
>{
>document.write("<a
>href=http://share.jrj.com.cn/cominfo/default.asp?or_gpdm="+a[i]+"
>class=tl>"+a[i]+"</a> ")
>}
>document.write("  <a
>href=http://sms.jrj.com.cn/jrjsms/MySMS/Stock.asp
>target=_blank><u>"+adstr+"</u></a></td></tr>")
>}
>}
>function ShowRNews()
>{
>adstr='我的行情预警系统 不再被套的秘密武器'
>if (R.length>0)
>{
>document.write('<tr bgcolor=#ffc800><td><table
>width=100% cellspacing=0 cellpadding=0><tr><td
>class=f3>相关链接:</td><td align=right><A
>href=http://sms.jrj.com.cn/jrjsms/MySMS/Stock.asp
>target=_blank
>class=f3><u>'+adstr+'</u></A></td></tr></table></td></tr>')
>document.write("<tr bgcolor=#edf0f5><td>")
>for (i=0;i<(R.length+1)/3-1;i++)
>{
>newsdate=R[i*3+1]
>tmpstr='000000000000'+R[i*3+2]
>tmpstr=tmpstr.substr(tmpstr.length-12,12)
>document.write('<a
>href=http://news1.jrj.com.cn/news/'+newsdate.substr(0,10)+'/'+tmpstr+'.html
>class=tl>'+R[i*3]+'</a>')
>document.write(" <font
>class=c6>("+R[i*3+1].substr(0,R[i*3+1].length-3)+")</font><br>")
>}
>document.write("</td></tr>")
>document.write("<tr><td align=center
>bgcolor=#edf0f5><a href=http://www.jrj.com.cn/
>class=f3
>>股票_证券_基金_财经_尽在中国金融界</a></td></tr>")
>}
>}
>function SendSms()
>{
>frm.Msg.value = strNewsTitle.innerText;
>frm.Urltree.value = document.all.StrUrltree.innerText;
>if (frm.Msg.value!='')
>{
>window.open('about:blank','SMS','height=468,width=502');
>frm.submit();
>}
>}
>Stock=''
>R=new Array()
></script>
><base target=_blank>
></head>
><body leftmargin=0 topmargin=5 onload=changead()>
><table><tr><td><font>something should display
>here!</font></td></tr></table>
></body></html>
><script language=javascript>
>function QuoteClick()
>{var NewPath;
>d = new Date();
>s ="Cs" + d.getUTCHours() + d.getUTCMinutes() +
>d.getUTCSeconds() + d.getUTCMilliseconds();
>if (document.symbol_entry.symbol.value=='')
>document.symbol_entry.symbol.value='000001'
>lcSelect =
>document.symbol_entry.menu1.options[document.symbol_entry.menu1.selectedIndex].value;
>if (lcSelect=='10'){  
>window.open("http://quote.jrj.com.cn/htmdata/switch.asp?code="+escape(document.symbol_entry.symbol.value),"jrjminhello","height=250,width=270,status=no,toolbar=no,menubar=no,location=no,resizable=no");
>return false;}
>if (lcSelect=='11'){
>window.open("http://quote.jrj.com.cn/htmdata/gif.asp?code="+escape(document.symbol_entry.symbol.value),s,"height=290,width=434,status=no,toolbar=no,menubar=no,location=no,resizable=no","replace");
>return false;
>}
>if (lcSelect=='12'){
>window.open("http://quote.jrj.com.cn/htmdata/kline.asp?code="+escape(document.symbol_entry.symbol.value),s,"height=290,width=434,status=no,toolbar=no,menubar=no,location=no,resizable=no","replace");
>return false;
>}
>if (lcSelect=='13'){
>window.open("http://share.jrj.com.cn/cominfo/gsgg.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='14'){
>window.open("http://share.jrj.com.cn/cominfo/ggxw.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='15'){
>window.open("http://share.jrj.com.cn/cominfo/ggpl.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='16'){
>window.open("http://share.jrj.com.cn/cominfo/gsgk.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='17'){
>window.open("http://share.jrj.com.cn/cominfo/gbjg.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='18'){
>window.open("http://share.jrj.com.cn/cominfo/mgsy.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='19'){
>window.open("http://share.jrj.com.cn/cominfo/fhsp.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='20'){
>window.open("http://share.jrj.com.cn/cominfo/sdgd.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='21'){
>window.open("http://share.jrj.com.cn/cominfo/jbcwsj.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='22'){
>window.open("http://share.jrj.com.cn/cominfo/cwbl.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='23'){
>window.open("http://share.jrj.com.cn/cominfo/zcfzb.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='24'){
>window.open("http://share.jrj.com.cn/cominfo/glgs.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='25'){
>window.open("http://share.jrj.com.cn/cominfo/ggry.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='26'){
>window.open("http://share.jrj.com.cn/cominfo/zqbg.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='261'){
>window.open("http://share.jrj.com.cn/cominfo/jdbg.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='27'){
>window.open("http://share.jrj.com.cn/cominfo/ndbg.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='28'){
>window.open("http://share.jrj.com.cn/cominfo/gszc.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='29'){
>window.open("http://share.jrj.com.cn/cominfo/pgggs.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='30'){
>window.open("http://share.jrj.com.cn/cominfo/ssggs.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>if (lcSelect=='31'){
>window.open("http://share.jrj.com.cn/cominfo/jjcc.asp?or_gpdm="
>+ escape(document.symbol_entry.symbol.value));
>return false;
>}
>}
></script>
>
>The right result should be: Finance adviser something
>should display here!
>but the result of StringBean is just: Finance adviser.
>
>"something should display here!" is lost.
>When I then use a NodeVisitor to parse the page, I find that
>ScriptTag isn't detected correctly. The first <script>
>and the second </script> are treated as a pair, so
>the content between them disappears.
>
>I don't know why htmlparser can't parse this page
>correctly. I think maybe the tags (e.g. <td>) in the first
>pair of script tags confused htmlparser. How can I parse
>this page correctly?
>
>
>  
>
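
Until the script-parsing problem is fixed, one possible workaround is to remove script elements from the page text before handing it to the parser. The sketch below is plain Java and does not touch htmlparser; the class ScriptStripDemo and the method stripScripts are hypothetical names, and the sample string is a cut-down stand-in for the page quoted above. It assumes the script body never contains the literal text </script> and that the opening tag has no ">" inside an attribute value.

```java
import java.util.regex.Pattern;

public class ScriptStripDemo {
    // Non-greedy, case-insensitive, dot matches newlines:
    // from an opening <script ...> to the nearest </script>.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b[^>]*>.*?</script\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Return html with every script element removed. */
    static String stripScripts(String html) {
        return SCRIPT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Finance adviser</title>"
            + "<script language=javascript>document.write(\"<td class=tb>x</td>\")</script>"
            + "</head><body><font>something should display here!</font></body></html>"
            + "<script language=javascript>function QuoteClick(){}</script>";
        System.out.println(stripScripts(html));
    }
}
```

The cleaned string could then be given to the parser in place of the original page. This only sidesteps the problem for text extraction; it is no use when the script content itself is needed.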
