neethu joseph | 3 Jun 22:38 2008

Re: how to extract content from the html tag

Thanks, Derrick! I tried using the AndFilter but had no luck; it gives me a NullPointerException.
Here is the page that I'm trying to read: http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={ce1c851e-f6ee-4194-ad6d-c020f94be177}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1

On Thu, May 29, 2008 at 8:43 PM, Derrick Oswald <derrickoswald <at> rogers.com> wrote:
Applying new AndFilter (new TagNameFilter ("TD"), new HasSiblingFilter (new StringFilter ("Job ID", true))) would give you the <td class="FormContentFieldValue">524</td> tag, so you could call toPlainTextString() on it and convert the resulting string into an integer value if you want.
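
As a minimal sketch of that last step (turning the cell's plain text into a number), assuming the text has already been read from the matched TD. The class name JobIdText and the method parseJobId are hypothetical, and only the Java standard library is used:

```java
import java.util.OptionalInt;

public class JobIdText {
    // Hypothetical helper: converts the plain text of the matched cell
    // (for example "524") into an int, tolerating surrounding whitespace.
    public static OptionalInt parseJobId(String cellText) {
        if (cellText == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(cellText.trim()));
        } catch (NumberFormatException e) {
            // The cell did not contain a plain integer.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseJobId(" 524 "));
        System.out.println(parseJobId("n/a"));
    }
}
```

Returning an OptionalInt rather than throwing keeps the caller in charge of what to do when the page layout changes and the cell no longer holds a number.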

----- Original Message ----
From: neethu joseph <neethujo <at> gmail.com>
To: htmlparser user list <htmlparser-user <at> lists.sourceforge.net>
Sent: Thursday, May 29, 2008 1:07:26 AM
Subject: Re: [Htmlparser-user] how to extract content from the html tag

Thanks for your reply. Could you please explain this a little more?
Ultimately, I'm interested in the field value of the Job ID, i.e. 524.

On Wed, May 28, 2008 at 7:53 PM, Derrick Oswald <derrickoswald <at> rogers.com> wrote:

You should be able to construct a filter using the FilterBuilder application to look for the "Job ID" in the adjacent TD.
It will be something like:
  new AndFilter (new TagNameFilter ("TD"), new HasSiblingFilter (new StringFilter ("Job ID", true)))


----- Original Message ----
From: neethu joseph <neethujo <at> gmail.com>
To: htmlparser-user <at> lists.sourceforge.net
Sent: Wednesday, May 28, 2008 1:06:00 PM
Subject: [Htmlparser-user] how to extract content from the html tag

Hi, I'm new to HtmlParser. Could you please help me extract the Job ID from the table? I was trying to locate it as the 3rd element of the table, but the page is modified from day to day, so I need an alternative way to find the Job ID.


</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">City</td>


<td class="FormContentFieldValue">St. Louis</td>
</tr>

<tr class="FormContent">

<td class="FormContentFieldLabel">State/Province</td>

<td class="FormContentFieldValue">Missouri [MO]</td>

</tr>

<tr class="FormContent">
<td class="FormContentFieldLabel">Job Title</td>



<td class="FormContentFieldValue">Director, Graduate Studies in IS Management</td>

</tr>
<tr class="FormContent">

<td class="FormContentFieldLabel">Job ID</td>

<td class="FormContentFieldValue">524</td>

</tr>

<tr class="FormContent">
<td class="FormContentFieldLabel">Job Type</td>


<td class="FormContentFieldValue">Director</td>
</tr>


regards

NAT


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Htmlparser-user mailing list
Htmlparser-user <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/htmlparser-user



Derrick Oswald | 4 Jun 00:17 2008

Re: how to extract content from the html tag

I used the FilterBuilder application to quickly generate the filter you need:

import org.htmlparser.*;
import org.htmlparser.filters.*;
import org.htmlparser.beans.*;
import org.htmlparser.util.*;

public class JobId
{
    public static void main (String args[])
    {
        HasAttributeFilter filter0 = new HasAttributeFilter ();
        filter0.setAttributeName ("class");
        filter0.setAttributeValue ("FormContentFieldValue");
        StringFilter filter1 = new StringFilter ();
        filter1.setCaseSensitive (true);
        filter1.setLocale (new java.util.Locale ("en", "US", ""));
        filter1.setPattern ("Job ID");
        HasChildFilter filter2 = new HasChildFilter ();
        filter2.setRecursive (false);
        filter2.setChildFilter (filter1);
        HasSiblingFilter filter3 = new HasSiblingFilter ();
        filter3.setSiblingFilter (filter2);
        NodeFilter[] array0 = new NodeFilter[2];
        array0[0] = filter0;
        array0[1] = filter3;
        AndFilter filter4 = new AndFilter ();
        filter4.setPredicates (array0);
        NodeFilter[] array1 = new NodeFilter[1];
        array1[0] = filter4;
        FilterBean bean = new FilterBean ();
        bean.setFilters (array1);
        if (0 != args.length)
        {
            bean.setURL (args[0]);
            System.out.println (bean.getNodes ().toHtml ());
        }
        else
            System.out.println ("Usage: java -classpath .;htmlparser.jar;htmllexer.jar JobId <url>");
    }
}


-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Htmlparser-user mailing list
Htmlparser-user <at> lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
neethu joseph | 4 Jun 03:25 2008

Re: how to extract content from the html tag

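import org.htmlparser.Parser;
import org.htmlparser.filters.*;
import org.htmlparser.util.NodeList;
Parser joburlparser = new Parser("http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={5e86df59-eb37-4e01-864a-e7662b31e44b}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1");
// Select every node whose class attribute is "FormContentFieldValue".
NodeList jobidList = joburlparser.parse(new HasAttributeFilter("class", "FormContentFieldValue"));
// extractAllNodesThatMatch returns a new list, so keep the result.
jobidList = jobidList.extractAllNodesThatMatch(new TagNameFilter("TD"));
System.out.println(jobidList.toHtml());
NodeList jobid_child = jobidList.elementAt(3).getChildren();
System.out.println(jobid_child.toHtml());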

This gives me the Job ID, but I do not want to rely on elementAt(3).
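
One way to avoid the fixed index, sketched here with only the Java standard library: pair each label cell's text with the text of the value cell in the same row, then look the value up by its label. The class FieldLookup, the method toMap, and the two input lists are hypothetical; in practice the strings would come from the label and value TDs, in row order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldLookup {
    // Builds a label -> value map from two lists that are assumed to be
    // in the same row order (one label cell and one value cell per row).
    public static Map<String, String> toMap(List<String> labels, List<String> values) {
        if (labels.size() != values.size()) {
            throw new IllegalArgumentException("labels and values must have the same size");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            fields.put(labels.get(i).trim(), values.get(i).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample data copied from the table in the original question.
        List<String> labels = Arrays.asList(
                "City", "State/Province", "Job Title", "Job ID", "Job Type");
        List<String> values = Arrays.asList(
                "St. Louis", "Missouri [MO]",
                "Director, Graduate Studies in IS Management", "524", "Director");
        Map<String, String> fields = toMap(labels, values);
        // The lookup no longer depends on where the "Job ID" row sits.
        System.out.println(fields.get("Job ID"));
    }
}
```

The size check matters: if a row ever loses its label or value cell, the two lists drift out of step, and failing loudly is safer than returning the wrong field.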

Henry Tran | 4 Jun 14:43 2008

How to save <TD> value to unique variables from html tables

Hi All,
I have been successful in extracting almost all the table data using the following htmlparser statements in Java:
 
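import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.*;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.util.NodeList;

       try {
           Parser parser = new Parser ("http://www.abc.com/...");
           // Passing null as the filter returns every node on the page.
           NodeList nl = parser.parse(null);
           // Match <td> cells with class "even" or "odd", or cells with
           // colspan="6" that have a <strong> child.
           NodeFilter currenttabledatafilter =
                   new AndFilter (
                       new TagNameFilter ("td"),
                       new OrFilter (
                           new HasAttributeFilter("class","even"),
                           new OrFilter (
                               new HasAttributeFilter("class", "odd"),
                               new AndFilter (
                                   new HasAttributeFilter("colspan","6"),
                                   new HasChildFilter(new TagNameFilter ("Strong"))))));
           // The second argument (true) searches recursively through child nodes.
           NodeList a1 = nl.extractAllNodesThatMatch(currenttabledatafilter,true);
           int len = a1.size();
           for (int i=0; i<len; i+=1)
           {
               TagNode tag = (TagNode)a1.elementAt(i);
               System.out.println(tag.toPlainTextString());
//             System.out.println(tag.toHtml());
           }
       } catch(Exception pe) {
           pe.printStackTrace();
       }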
 
This works well for retrieving all of the table data. However, I would like to save the value of each <td> to a unique variable so that the values can be used in the program and ultimately saved to a database. I am therefore looking for a way to structure a program that assigns each value to a unique variable (or inserts it into the database, which I can do once the values are available) for however many HTML tables a web page contains. Each table has some distinct attributes, but the number of <td> elements varies from table to table. In other words, I am looking for something similar to looping through a text file, as follows:
 
While not end of line
(i) identify a new table based on its unique attributes.
(ii) assign the value/content of each <td> in the current table to a unique variable for instance.
(iii) repeat step (i) and (ii) for remaining tables.
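
The loop above could be organised around a map rather than individual variables. This is a minimal sketch, using only the Java standard library, that files each cell's text under a key identifying its table. The class TableCells, the method group, and the sample keys and values are all hypothetical; with htmlparser the key would be derived from whatever attributes distinguish each table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableCells {
    // Groups (tableKey, cellText) pairs by table, preserving document order.
    // Each pair is a two-element array: {tableKey, cellText}.
    public static Map<String, List<String>> group(List<String[]> cells) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        for (String[] cell : cells) {
            byTable.computeIfAbsent(cell[0], k -> new ArrayList<>()).add(cell[1]);
        }
        return byTable;
    }

    public static void main(String[] args) {
        List<String[]> cells = new ArrayList<>();
        cells.add(new String[] {"table-1", "first cell"});
        cells.add(new String[] {"table-1", "second cell"});
        cells.add(new String[] {"table-2", "only cell"});
        // Each table's values can now be read by key and position.
        for (Map.Entry<String, List<String>> entry : group(cells).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Because each table maps to a list, tables with different numbers of cells need no special handling.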
 
Thanks a lot,
Henry

Derrick Oswald | 4 Jun 14:56 2008

Re: How to save <TD> value to unique variables from html tables

You should just add the tags you want to a NodeList of your own.
Then later on process all the nodes in the list... filing them to a database for instance.

Henry Tran | 4 Jun 23:40 2008

Re: How to save <TD> value to unique variables from html tables

Hi Derrick,

Can you explain a little more, perhaps with a few lines of example, if it is not too much effort?
 
I thought I already had a NodeList, a1, but the challenge is to tell which <TD> comes from which table.
 
I am very new to using htmlparser and would appreciate a little guidance.
 
Thanks very much again,
 
Henry
Derrick Oswald | 5 Jun 01:08 2008

Re: How to save <TD> value to unique variables from html tables


Create a node list:
NodeList results = new NodeList ();


Then in your loop over each result, add the nodes to the list instead of printing them out:
           for (int i=0; i<len; i+=1)
           {
               TagNode tag = (TagNode)a1.elementAt(i);
               results.add (tag);
           }

Then when you've collected all the tables using whatever currenttabledatafilter values you have, all the tables will be in your results NodeList and you can iterate over them with the same type of loop that you have:
           int len = results.size();
           for (int i=0; i<len; i+=1)
           {
               TagNode tag = (TagNode)results.elementAt(i);
//             do what you want
           }
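
The same collect-first, process-later pattern can be sketched with only the Java standard library. Here Predicate<String> stands in for a NodeFilter and String for a tag; the class CollectThenProcess, the method collect, and the sample data are hypothetical and are not part of htmlparser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class CollectThenProcess {
    // Runs every filter over the items and accumulates all matches in one list.
    // An item that satisfies two filters is added twice.
    public static List<String> collect(List<String> items, List<Predicate<String>> filters) {
        List<String> results = new ArrayList<>();
        for (Predicate<String> filter : filters) {
            for (String item : items) {
                if (filter.test(item)) {
                    results.add(item);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("even:1", "odd:2", "header:3");
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(s -> s.startsWith("even"));
        filters.add(s -> s.startsWith("odd"));
        // Second phase: process everything that was collected.
        for (String item : collect(items, filters)) {
            System.out.println(item);
        }
    }
}
```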

Henry Tran | 8 Jun 02:45 2008

Re: How to save <TD> value to unique variables from html tables

Hi Derrick,

 

It appears that I have made one step forward but two steps back in terms of parsing some of these html tables.

 

I would like to read the following table:

 

    <table border="0" cellspacing="0" cellpadding="2" width="100%">  // HasGrandParent()...
        <tr>                                                         // HasParent()...
            <td class="propType">&nbsp;</td>                         // HasAttributeFilter()...
            <td class="propType"><b>Patient</b></td>
            <td class="propType"><b>Firstname</b></td>
            <td class="propType"><b>Surname</b></td>
            <td class="propType" align="right"><b>Date of Birth</b></td>
            <td class="propType">Sex</td>
        </tr>
    </table>

 

Below are the various table data filters I have tried, without success, in an attempt to pick out the table I want to read:


(a) new AndFilter ( new TagNameFilter ("td"),
       new AndFilter ( new HasAttributeFilter("class", "proType"),
          new HasAttributeFilter("class", "even")));


(b) new AndFilter ( new TagNameFilter ("td"),
         new AndFilter ( new HasAttributeFilter("class", "proType"),
             new AndFilter ( new HasAttributeFilter("class", "even"),
                 new AndFilter ( new HasParentFilter ( new TagNameFilter ("tr")),
                     new AndFilter ( new HasParentFilter ( new TagNameFilter ("table")),
                         new AndFilter ( new HasAttributeFilter ("border","0"),
                             ( new HasAttributeFilter("width", "100%"))))))));

 

(c) new AndFilter ( new TagNameFilter ("table"),
          new AndFilter ( new HasAttributeFilter("border","0"),
              new AndFilter ( new HasAttributeFilter("width", "100%"),
                  new AndFilter ( new HasChildFilter ( new TagNameFilter ("tr")),
                      new AndFilter ( new HasChildFilter ( new TagNameFilter ("td")),
                          new AndFilter ( new HasAttributeFilter("class", "proType"),
                              new HasAttributeFilter("class", "even")))))));

 

None of the above filters selects the table data I wanted. Where have I gone wrong?

 

(i) Btw, does htmlparser support HasGrandParent() and HasGrandChild() which would allow me to parse:

 

    <table border="0" cellspacing="0" cellpadding="2" width="100%">  // HasGrandParent()...
        <tr>                                                         // HasParent()...
            <td class="propType">&nbsp;</td>                         // HasAttributeFilter()...

 

(ii) I would also like to save the full content of the same web page to a file and then read it back, so that I can test various parsing approaches without needing a direct Internet connection to the site for every test. Is this possible? If so, any idea how this can be done?
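
For question (ii), a minimal sketch using only the Java standard library is shown here: download the page once, then read the cached copy back for later parsing tests. The class name PageCache and its method names are hypothetical, and UTF-8 is an assumption about the page encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PageCache {

    /** Copies everything readable from the stream into target, replacing any old copy. */
    static void save(InputStream in, Path target) {
        try {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Downloads the page once; this is the only step that needs a network connection. */
    static void download(String url, Path target) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            save(in, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the cached copy back as text (UTF-8 is assumed, not detected). */
    static String load(Path cached) {
        try {
            return new String(Files.readAllBytes(cached), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The saved file can then be given to the parser in place of the live URL. The Parser constructor that takes a resource string is generally described as accepting a local file name as well as a URL, but that is worth confirming against the htmlparser version in use.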

 

Many thanks again,

Jack



----- Original Message ----
From: Derrick Oswald <derrickoswald <at> rogers.com>
To: htmlparser user list <htmlparser-user <at> lists.sourceforge.net>
Sent: Thursday, 5 June, 2008 9:08:32 AM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables


Create a node list:
NodeList results = new NodeList ();


Then in your loop over each result, add the nodes to the list instead of printing them out:
           for (int i=0; i<len; i+=1)
           {
               TagNode tag = (TagNode)a1.elementAt(i);
               results.add (tag);   // NodeList.add (lowercase), not Add
           }

Then when you've collected all the tables using whatever currenttabledatafilter values you have, all the tables will be in your results NodeList and you can iterate over them with the same type of loop that you have:
           int len = results.size();
           for (int i=0; i<len; i+=1)
           {
               TagNode tag = (TagNode)results.elementAt(i);
//             do what you want
           }

----- Original Message ----
From: Henry Tran <htran_888 <at> yahoo.com.au>
To: htmlparser user list <htmlparser-user <at> lists.sourceforge.net>
Sent: Wednesday, June 4, 2008 5:40:29 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables

Hi Derrick,

Can you explain a little more, perhaps with a few lines of example, if it is not too much of an effort?
 
I thought I already had a NodeList a1, but the challenge is to tell which <TD> comes from which table.
 
I am very new to using htmlparser and would appreciate a little guidance.
 
Thanks very much again,
 
Henry
----- Original Message ----
From: Derrick Oswald <derrickoswald <at> rogers.com>
To: htmlparser user list <htmlparser-user <at> lists.sourceforge.net>
Sent: Wednesday, 4 June, 2008 10:56:07 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables

You should just add the tags you want to a NodeList of your own.
Then later on process all the nodes in the list... filing them to a database for instance.

----- Original Message ----
From: Henry Tran <htran_888 <at> yahoo.com.au>
To: Htmlparser-user <at> lists.sourceforge.net
Cc: htmlparser-user-request <at> lists.sourceforge.net
Sent: Wednesday, June 4, 2008 8:43:09 AM
Subject: [Htmlparser-user] How to save <TD> value to unique variables from html tables

Hi All,
I have been successful in extracting almost all the table data using the following htmlparser statements in Java:
 
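// Imports needed by the statements below:
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.HasChildFilter;
import org.htmlparser.filters.OrFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.util.NodeList;

       try {
           Parser parser = new Parser ("http://www.abc.com/...");
           NodeList nl = parser.parse(null);
           // Match <td> cells whose class is "even" or "odd",
           // or cells with colspan="6" that have a <strong> child.
           NodeFilter currenttabledatafilter =
                   new AndFilter (
                       new TagNameFilter ("td"),
                       new OrFilter (
                           new HasAttributeFilter("class","even"),
                           new OrFilter (
                               new HasAttributeFilter("class", "odd"),
                               new AndFilter (
                                   new HasAttributeFilter("colspan","6"),
                                   new HasChildFilter(new TagNameFilter ("Strong"))))));
           // true = search recursively through nested nodes
           NodeList a1 = nl.extractAllNodesThatMatch(currenttabledatafilter,true);
           int len = a1.size();
           for (int i=0; i<len; i+=1)
           {
               TagNode tag = (TagNode)a1.elementAt(i);
               System.out.println(tag.toPlainTextString());
//             System.out.println(tag.toHtml());
           }
       } catch(Exception pe) {
           pe.printStackTrace();
       }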
 

Henry Tran | 9 Jun 12:40 2008

Does htmlparser Support cellpadding?

Hi forum members,
 
I am having difficulty parsing the content of the table below, apparently because the HasAttributeFilter class does not recognise the "cellpadding" attribute:
 
        <table border="0" cellspacing="0" cellpadding="2" width="100%">
 
Here are the table data filters that I have tried without much luck:
 
(i) new AndFilter ( new TagNameFilter ("table"), new HasAttributeFilter("cellpadding","2"));
(ii) new AndFilter ( new TagNameFilter ("table"), new HasAttributeFilter("cellspacing","0"));
(iii) new AndFilter ( new TagNameFilter ("table"),
          new AndFilter ( new HasAttributeFilter("cellspacing","0"),
              new HasAttributeFilter("width","100%")));
(iv) new AndFilter ( new TagNameFilter ("table"),
          new AndFilter ( new HasAttributeFilter("cellspacing","0"),
              new AndFilter ( new HasAttributeFilter("cellpadding","2"),
                  new HasAttributeFilter("width","100%"))));
Table data filters (i) and (iv) did not pick up anything, while (ii) and (iii) worked but also included other tables that were not needed. Filter (iv) would be perfect if only it worked. I would therefore like to ask the following questions:
 
(a) Does HasAttributeFilter() support cellpadding?
(b) Is there a limit on how many attributes HasAttributeFilter() can pick up in a table?
(c) Can HasAttributeFilter() pick up attributes in nested tables? This table is nested inside another table.
(d) Does the search for the attributes follow a certain order? If so, the order of the HasAttributeFilter() calls may need to be altered to achieve the desired search.
 
Many thanks,
Henry

Derrick Oswald | 9 Jun 12:56 2008

Re: How to save <TD> value to unique variables from html tables

It looks like you've got two HasAttribute filters looking for two different values in the same "class" attribute.
How can a tag have a "class=proType" *and* a "class=even" at the same time?
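
The point about the "class" attribute can be illustrated with a small model that does not use htmlparser at all. It is a sketch only: hasAttribute is a hypothetical helper, and it assumes the attribute value is compared by exact string equality. Note also that the table in the question uses class="propType" while the filters test for "proType"; under exact comparison those are different strings.

```java
import java.util.Map;
import java.util.function.Predicate;

final class ClassFilterDemo {

    /** Models an attribute test: true when the tag has the named attribute with exactly this value. */
    static Predicate<Map<String, String>> hasAttribute(String name, String value) {
        return attrs -> value.equals(attrs.get(name));
    }

    public static void main(String[] args) {
        // Attributes of <td class="propType"> from the table in question.
        Map<String, String> td = Map.of("class", "propType");

        // AND of two different values for the same attribute can never hold.
        Predicate<Map<String, String>> both =
                hasAttribute("class", "proType").and(hasAttribute("class", "even"));
        // OR accepts a cell that carries either value.
        Predicate<Map<String, String>> either =
                hasAttribute("class", "propType").or(hasAttribute("class", "even"));

        System.out.println(both.test(td));   // false
        System.out.println(either.test(td)); // true
    }
}
```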

GrandParents and GrandChildren are handled with subfilters.
Here's an example for 'TABLE has a  grand child TD'.

new AndFilter (new TagNameFilter ("TABLE"), new HasChildFilter (new AndFilter (new TagNameFilter ("TR"), new HasChildFilter (new TagNameFilter ("TD")))))
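
How the nesting for 'TABLE has a grand child TD' composes can be modelled with plain predicates. This is a library-independent sketch under stated assumptions: Tag, named and hasChild are hypothetical stand-ins, not htmlparser classes, and hasChild looks at direct children only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class GrandChildDemo {

    /** Minimal stand-in for a parsed tag: a name plus its direct children. */
    static final class Tag {
        final String name;
        final List<Tag> children;

        Tag(String name, Tag... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Matches a tag by name, ignoring case. */
    static Predicate<Tag> named(String name) {
        return tag -> tag.name.equalsIgnoreCase(name);
    }

    /** True when at least one direct child satisfies the inner test. */
    static Predicate<Tag> hasChild(Predicate<Tag> inner) {
        return tag -> tag.children.stream().anyMatch(inner);
    }

    /** A TABLE with a TR child that itself has a TD child. */
    static Predicate<Tag> tableWithGrandChildTd() {
        return named("TABLE").and(hasChild(named("TR").and(hasChild(named("TD")))));
    }

    public static void main(String[] args) {
        Tag full = new Tag("table", new Tag("tr", new Tag("td")));
        Tag bare = new Tag("table", new Tag("tr"));
        System.out.println(tableWithGrandChildTd().test(full)); // true
        System.out.println(tableWithGrandChildTd().test(bare)); // false
    }
}
```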

You should probably play with the FilterBuilder application - it has a tutorial - to get the hang of it.


