Re: Cookie handling
2007-02-01 02:00:17 GMT
That should be handled OK.
But it doesn't look like what you want from that URL.
From: Gavin Gilmour <gavin <at> brokentrain.net>
To: htmlparser user list <htmlparser-user <at> lists.sourceforge.net>
Sent: Tuesday, January 30, 2007 8:49:25 AM
Subject: Re: [Htmlparser-user] Cookie handling
inspection:
sokar:~/junk% curl http://www3.interscience.wiley.com/cgi-bin/abstract/68504762/ABSTRACT\?CRETRY\=1\&SRETRY\=0
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a
href="http://www3.interscience.wiley.com/cookie_setting_error.html">here</a>.</p>
</body></html>
Seems to have already decided it's dud, weird.
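For what it's worth, a plain curl call keeps no cookies between requests, which is presumably why the site bounces straight to cookie_setting_error.html. Below is a minimal sketch of walking a redirect chain by hand with a cookie jar switched on, using only java.net from the JDK. It is not how htmlparser does it internally; the class and method names (RedirectTrace, isRedirect, resolve, trace) are made up for illustration.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RedirectTrace {

    // True for the HTTP status codes that carry a Location header to follow.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolve a (possibly relative) Location value against the current URL.
    // java.net.URL is used because it tolerates characters such as '<' and '>'.
    static String resolve(String base, String location) {
        try {
            return new URL(new URL(base), location).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Follow redirects one hop at a time, recording every URL visited.
    static List<String> trace(String start, int maxHops) throws IOException {
        // Install an in-memory cookie jar so cookies set on one hop
        // are sent back on the following hops.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        List<String> hops = new ArrayList<>();
        String current = start;
        for (int i = 0; i <= maxHops; i++) {
            hops.add(current);
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false); // we do the following ourselves
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (!isRedirect(status) || location == null) {
                break;
            }
            current = resolve(current, location);
        }
        return hops;
    }

    public static void main(String[] args) throws IOException {
        // Pass the starting URL on the command line.
        for (String hop : trace(args[0], 10)) {
            System.out.println(hop);
        }
    }
}
```

Printing each hop makes it easy to see at which point the chain diverges from what a browser does.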
After a bit of investigating, the full story is a bit worse than I thought and
involves multiple redirects. The first link I need comes back from this service:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10629107&retmode=ref&cmd=prlinks
- which just answers with a 302 redirect.
Fair enough, so:
---
sokar:~/junk% curl 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=10629107&retmode=ref&cmd=prlinks'
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a
href="http://dx.doi.org/10.1002/(SICI)1097-4644(1999)75:32+<84::AID-JCB11>3.0.CO;2-M">here</a>.</p>
</body></html>
---
That gives 'http://dx...', which is what the parser tries next, I'd imagine. So then:
sokar:~/junk% curl 'http://dx.doi.org/10.1002/(SICI)1097-4644(1999)75:32+<84::AID-JCB11>3.0.CO;2-M'
<HTML><HEAD><TITLE>Handle Redirect</TITLE></HEAD>
<BODY><A
HREF="http://doi.wiley.com/10.1002/%28SICI%291097-4644%281999%2975%3A32%2B%3C84%3A%3AAID-JCB11%3E3.0.CO%3B2-M">http://doi.wiley.com/10.1002/%28SICI%291097-4644%281999%2975%3A32%2B%3C84%3A%3AAID-JCB11%3E3.0.CO%3B2-M</A></BODY></HTML>
Looking at this URL, it seems to be the 'final one' which is leading to the
(desired) destination in a browser. (Does that output even look like something
the parser would handle though?)
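As a side note, the doi.wiley.com link appears to be the same DOI with its suffix percent-encoded. The sketch below reproduces that string with java.net.URLEncoder. How the DOI resolver actually builds the link is an assumption on my part, and the names DoiEncoding, encodeSuffix and wileyUrl are made up for illustration. URLEncoder does form encoding (a space would become '+'), which does not matter here because the DOI contains no spaces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DoiEncoding {

    // Percent-encode everything except letters, digits, '.', '-', '*' and '_'.
    static String encodeSuffix(String suffix) {
        try {
            return URLEncoder.encode(suffix, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Keep the "10.1002/" prefix as-is and encode only the part after the slash.
    static String wileyUrl(String doi) {
        int slash = doi.indexOf('/');
        return "http://doi.wiley.com/"
                + doi.substring(0, slash + 1)
                + encodeSuffix(doi.substring(slash + 1));
    }

    public static void main(String[] args) {
        String doi = "10.1002/(SICI)1097-4644(1999)75:32+<84::AID-JCB11>3.0.CO;2-M";
        System.out.println(wileyUrl(doi));
    }
}
```

Running it on the DOI from the dx.doi.org link prints the same URL that appears in the HREF above, so the two addresses differ only in escaping.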
What a mess :(
Gavin.
P.S. Sorry about the horribly formatted mail due to the unsightly urls
involved.