Subramanyam Mallela | 1 Aug 2003 21:51

Re: NON-event-driven libwww


Hi 
    I have a simple question. 
    I have many html files on disk and I need to parse them.
    How can I get libwww to parse these files?

    All the examples seem to download from the web first
    and then parse the content. I need to bypass all this downloading.

    Any help ?
    Thanks
    Manyam

Subramanyam Mallela | 4 Aug 2003 23:30

Parsing local html files


Hi
    How can I use the libwww HTML parser to
    parse local files on disk?
    I don't need the downloading or the rest of the
    code.
    Is there any example code for this?

    Thanks for any help
    Manyam

Tim Serong | 5 Aug 2003 01:47

RE: Parsing local html files


Hi,

The simplest thing to do is supply a file URL (something like
file:///foo or file:///c:/foo.txt on Windows) for the Request, rather
than an HTTP URL.  libwww should then read the file from disk.
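
A minimal sketch of building such a file URL from an absolute POSIX path, assuming nothing about libwww itself: path_to_file_url is a hypothetical helper, not a library function, and the set of characters left unencoded is a conservative choice. Windows drive-letter paths are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libwww): turn an absolute POSIX path
 * into a file: URL, percent-encoding every byte outside a conservative
 * safe set. Returns 0 on success, -1 if the path is not absolute or the
 * output buffer is too small. */
static int path_to_file_url(const char *path, char *out, size_t outlen)
{
    static const char safe[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~/";
    static const char hex[] = "0123456789ABCDEF";
    size_t n;

    assert(path != NULL && out != NULL);
    if (path[0] != '/' || outlen < sizeof "file://")
        return -1;
    memcpy(out, "file://", 7);
    n = 7;
    for (const unsigned char *p = (const unsigned char *)path; *p; ++p) {
        if (strchr(safe, *p)) {
            if (n + 1 >= outlen)          /* need room for char + NUL */
                return -1;
            out[n++] = (char)*p;
        } else {
            if (n + 3 >= outlen)          /* need room for %XX + NUL */
                return -1;
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 15];
        }
    }
    out[n] = '\0';
    return 0;
}
```

The resulting string could then be used wherever the request would otherwise be given an HTTP URL.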

Alternatively, you can hack up something like this (please excuse the C++
style):

    // Declare HTStream, so you can write to it directly
  typedef struct _HTStream
  {
    HTStreamClass * isa;
  } HTStream;

  ...

  HText_registerLinkCallback(myFoundLink);
    // register any other required callbacks here
  HTRequest * r = HTRequest_new();
    // this base URL will be used for resolving links
    // in the file being parsed
  HTRequest_setAnchor(r, HTAnchor_findAddress("http://baseurl/"));
  HTStream * parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);
  FILE * fp = fopen("the file", "rb");
  char buf[4096];
  while (!feof(fp))
  {
    size_t bytes = fread(buf, 1, 4096, fp);
  [...]
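
A note on the read loop: `while (!feof(fp))` only becomes false after a read has already hit end-of-file, so it can run one extra iteration on an empty or failed read. A minimal sketch of the usual alternative, looping on the return value of fread, is below. The chunk_sink type and both function names are hypothetical stand-ins for whatever consumes the data (a parser stream, for instance); they are not libwww APIs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sink type standing in for whatever consumes the data. */
typedef int (*chunk_sink)(void *ctx, const char *buf, size_t len);

/* Read fp to the end in fixed-size chunks and hand each chunk to sink.
 * Returns the total number of bytes delivered, or -1 on a read error or
 * if the sink reports failure (non-zero return). */
static long feed_file(FILE *fp, chunk_sink sink, void *ctx)
{
    char buf[4096];
    long total = 0;
    size_t got;

    assert(fp != NULL && sink != NULL);
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        if (sink(ctx, buf, got) != 0)
            return -1;
        total += (long)got;
    }
    return ferror(fp) ? -1 : total;
}

/* Example sink: only counts the bytes it is given. */
static int count_bytes(void *ctx, const char *buf, size_t len)
{
    (void)buf;
    *(size_t *)ctx += len;
    return 0;
}
```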

Jerry G. Chiuan | 6 Aug 2003 20:48

POST bytes of data and receive response from server

Hi All,
I am sending this email to see whether anyone can help me with a difficulty.
 
I would like to use libwww to POST bytes of data from my application to the server, receive the response from it as well, and save the response as bytes of data.
My data is stored in a data structure like this:
 
class buffer {
char a[2];
char b[1];
char *c = "Hello World";    // length is changeable
::
::
}
 
My problem is:
 
- Is it possible to POST separate pieces of data instead of using

HTAnchor_setDocument(src, data);        // char   *data

I mean that I don't want to assemble my buffer into a single run of consecutive bytes before calling setDocument() and setting it all at once, because that brings the overhead of an array copy and possible memory reallocation (the length of the data pointed to by "c" is dynamic, not fixed).

Instead, I would like to hand my data to libwww directly, a, b, and c, one by one, as we can with Java's DataOutputStream.write().

Or do you think this kind of overhead cannot be avoided anyway, because libwww would have to do the copy itself if the application does not and just passes the data piece by piece as I describe?
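
For comparison only, here is a minimal sketch of what "writing the pieces one by one without copying" looks like at the operating-system level, using POSIX writev(). This is not a libwww interface, and the thread does not say whether libwww offers an equivalent; struct message and send_message are hypothetical names modelled on the layout in the post.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical message layout modelled on the post: two fixed fields
 * and one variable-length field. */
struct message {
    char a[2];
    char b[1];
    const char *c;   /* NUL-terminated, length varies */
};

/* Write the three fields to fd with a single writev() call, without
 * first copying them into one contiguous buffer. Returns the number of
 * bytes written, or -1 on error. */
static ssize_t send_message(int fd, const struct message *m)
{
    struct iovec iov[3];

    assert(m != NULL && m->c != NULL);
    iov[0].iov_base = (void *)m->a;
    iov[0].iov_len  = sizeof m->a;
    iov[1].iov_base = (void *)m->b;
    iov[1].iov_len  = sizeof m->b;
    iov[2].iov_base = (void *)m->c;
    iov[2].iov_len  = strlen(m->c);
    return writev(fd, iov, 3);
}
```

The kernel gathers the three buffers itself, so the application never builds the combined array. Any library sitting between the application and the socket would need a streaming or callback-style body interface to preserve that property.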

 

Regds,

- Jerry

Paul Accosta | 6 Aug 2003 20:54

Re: POST bytes of data and receive response from server


JFYI

I was having similar problems with libwww and I couldn't find help here.
I switched over to libcurl and all my web coding problems have
disappeared. It is very easy to use and the mailing list is actively
monitored.

Pico G

"Jerry G. Chiuan" <jerry <at> oridus.com> wrote:

>Hi All,
> I try to send this email to see anyone can give me some helps for my
> difficulty
> 
> I would like to use libwww to POST bytes of data from my application to
> the server and receive response as well from there, save response as
> bytes of data.
> my data is stored in some kind of data structure like this:
> 
> class buffer {
> char a[2];
> char b[1];
> char *c = "Hello World";    // length is changeable
> ::
> ::
> }
> 
> my problem is:
> 
> - is it possible to POST seperated bytes of data instead of using 
> HTAnchor_setDocument(src, data);        // char   *data 
> 
> I mean I don't want to construct my buffer as single "consecutive bytes
> of data" before using setDocument( ) and set it at a time, cause it
> brings overhead of array copy and possible memory reallocation ( length
> of data pointed by "c" is dynamic, not fixed length ). 
> 
> Instead, I would like to directly dump my data to libwww, a, b, and
> c.....one by one, like what we can do by using JAVA's
> DataOutputStream.write( )
> 
> Or, you think this kind of overhead can't be avoid anyway. libwww also
> needs to do it if application doesn't do it and just directly dump data
> seperatedly as what I mention.
> 
> 
> 
> Regds,
> 
> - Jerry  
---------------------------------------

Renate Bahnemann | 11 Aug 2003 13:08

FTP problem: Get dir listing when expecting 404 Not Found

Hi

I am new to libwww but have managed to integrate libwww into our
application to retrieve small ASCII files transparently from either an
ftp or webserver.

However, I've hit a snag and would like to propose a small patch that
gets around the problem.

Problem occurs for the following:

ftp://<host>:<port>/<top-dir>/<invalid-dir>/<filename>
	returns an empty directory listing,
	I expected "404 Not Found" error.

By comparison, the following scenarios produce:

ftp://<host>:<port>/<top-dir>/<valid-dir>
	returns "404 Not Found"

ftp://<host>:<port>/<top-dir>/<valid-dir>/
	returns directory listing

ftp://<host>:<port>/<top-dir>/<invalid-dir>
	returns "404 Not Found"

I am wondering whether the current behaviour of the library conforms to
paragraph 3.2.2 of RFC 1738.

Here is a portion of the trace of the communication between the client
and the server:
IN is the client
OUT is the server
   ....	
   OUT 230 Guest login ok, access restrictions apply.
   IN  TYPE I
   OUT 200 Type set to I.
   IN  PORT 192,168,35,47,153,217
   OUT 200 PORT command successful.
   IN  RETR /pub/nonsuch/anyfile
   OUT 550 /pub/nonsuch/anyfile: No such file or directory.
   IN  CWD /
   OUT 250 CWD command successful.
   IN  CWD pub
   OUT 250 CWD command successful.
   IN  CWD nonsuch
   OUT 550 nonsuch: No such file or directory.
   IN  SYST
   OUT 215 UNIX Type: L8 Version: SUNOS
   IN  LIST anyfile
   OUT 150 Binary data connection for /bin/ls (192.168.35.47,39385) (0 bytes).
   OUT 226 Binary Transfer complete.

Stepping through the code I found the code that causes this to be in
HTFTP.c(v1.109):
   1568: 	    else if (!FTP_DIR(data) && !data->stream_error) {
   1569: 		FTPListType(data, ctrl->server);

If I delete the above lines, I get the results that I expect and am
wondering whether the changes that were applied to CVS versions 1.95 and
1.104 have not made this code redundant.

I am attaching a patch that shows my changes.
Patch tests OK with: 
	patch HTFTP.c HTFTP.patch

(My platform is Solaris 5.7.
Also the application defaults to FTP_DEFAULT_TRANSFER_MODE (i.e. 'I'))

Is my 'fix' valid or am I misunderstanding something and making a
mistake in my application?

Regards
Renate
*** HTFTP.c.orig	Fri Aug  8 19:35:58 2003
--- HTFTP.c	Fri Aug  8 19:29:57 2003
***************
*** 1563,1575 ****
  		ctrl->state = FTP_SUCCESS;
  	    else if (status == HT_OK)
  		ctrl->state = FTP_NEED_DCON;
! 	    else if (HTRequest_method(request) == METHOD_PUT)
  		ctrl->state = FTP_ERROR;
- 	    else if (!FTP_DIR(data) && !data->stream_error) {
- 		FTPListType(data, ctrl->server);
- 		ctrl->state = FTP_NEED_SERVER;         /* Try a dir instead? */
- 	    } else
- 		ctrl->state = FTP_ERROR;
  	    break;

  	  case FTP_NEED_SERVER:
--- 1563,1570 ----
  		ctrl->state = FTP_SUCCESS;
  	    else if (status == HT_OK)
  		ctrl->state = FTP_NEED_DCON;
! 	    else
  		ctrl->state = FTP_ERROR;
  	    break;

  	  case FTP_NEED_SERVER:
Richard Atterer | 12 Aug 2003 14:05

Re: FTP problem: Get dir listing when expecting 404 Not Found


Hi Renate,

On Mon, Aug 11, 2003 at 12:08:43PM +0100, Renate Bahnemann wrote:
> Problem occurs for the following:
> 
> ftp://<host>:<port>/<top-dir>/<invalid-dir>/<filename>
> 	returns an empty directory listing,
> 	I expected "404 Not Found" error.

I also noticed the same problem, but it hadn't yet hurt bad enough for me
to fix it. :)

I've tried your patch and can confirm that it works and appears to be 
correct - many thanks!

FYI, there are a few other problems with HTFTP:
- Doesn't support resuming downloads from a certain offset
  [me, 13 Feb, patch]
- Breaks with certain ftpd, e.g. OpenBSD ftpd [me, 13 Feb, patch]
- Incorrect timeout triggers if server is a bit slow to open the data 
  connection [fix discussed by Timothee Besset, 6 May]

HTTP:
- Various problems with POST requests, especially when 100 Continue'ing
  [discussed nicely /somewhere/ - uh, need to track that message down]
- Some problems with pipelining [me, 19 Feb, patch / me, 3 Mar, fixed in CVS]
- Some issues with HTTP cookies

Other:
- Various mem leaks [Tim Serong, 26 Mar; Torbjörn Carlsson, 13 May]
- Win32 minor fix [Calum, 21 May]
...and lots more. :-/

Jose, I have a very bad conscience: I was given CVS write access months ago 
and still haven't gotten around to committing things. Sorry about that - I 
still use libwww regularly, so I'm bound to become more active again sooner 
or later.

> I am wondering whether the current behaviour of the library conforms to
> paragraph 3.2.2 of RFC 1738.

RFC 1738 describes how to turn an FTP URL into FTP commands. If you have a 
look, this RFC was actually written in *1994* by the same Tim Berners-Lee 
who wrote the libwww FTP code in *1991*. :-) I agree that the code probably 
doesn't comply with the RFC.

OTOH, e.g. doing a direct "RETR /foo/bar/file" can be a lot faster than
CWDing to the foo/bar directory, especially with high-latency
connections...
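
The RFC 1738 mapping mentioned above can be sketched as follows: each directory segment of the URL path becomes the argument of one CWD command, and the final name is retrieved with RETR. This is a simplified illustration, not the HTFTP.c logic; ftp_path_to_commands is a hypothetical name, and percent-decoding, ";type=" handling and empty segments are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: expand the path part of an FTP URL (the text after
 * "ftp://host/") into the command sequence outlined in RFC 1738,
 * section 3.2.2: one CWD per directory segment, then RETR for the
 * final name. Writes CRLF-terminated commands into out.
 * Returns 0, or -1 if the path has no file name or out is too small. */
static int ftp_path_to_commands(const char *path, char *out, size_t outlen)
{
    size_t used = 0;
    const char *seg = path;
    const char *slash;
    int n;

    assert(path != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    while ((slash = strchr(seg, '/')) != NULL) {
        n = snprintf(out + used, outlen - used, "CWD %.*s\r\n",
                     (int)(slash - seg), seg);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
        seg = slash + 1;
    }
    if (*seg == '\0')
        return -1;                 /* path ends in '/': no file name */
    n = snprintf(out + used, outlen - used, "RETR %s\r\n", seg);
    if (n < 0 || (size_t)n >= outlen - used)
        return -1;
    return 0;
}
```

With this scheme a missing directory shows up as a failed CWD before any RETR or LIST is attempted, at the cost of one round trip per path segment, which is the latency trade-off noted above.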

> Is my 'fix' valid or am I misunderstanding something and making a mistake
> in my application?

I like your patch! :)

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  GnuPG key:
  | \/¯|  http://atterer.net  |  0x888354F7
  ¯ '` ¯

For posterity, without the patch I get the following log:

FTP Tx...... CWD asdf
Write Socket 10 bytes written to 5
download:       Alert 16 for ftp://localhost/asdf/foo obj 0x8231ae8
FTP Get Data now in state NEED_CWD
Read Socket. WOULD BLOCK fd 5
Host Event.. READ passed to `ftp://localhost/asdf/foo'
FTP Event... now in state FTP_NEED_DATA
FTP Get Data now in state NEED_CWD
Read Socket. 38 bytes read from socket 5
Host........ passing 38 bytes as consumed to 0x8239ee0
Host........ 0 bytes remaining 
FTP Rx...... `550 asdf: No such file or directory.'
Read Socket. Target returns 200
FTP Get Data now in state SUB_ERROR
Error....... Add  22    Severity: 1     Parameter: `Unspecified'        Where: `HTFTPGetData'
FTP Event... now in state FTP_NEED_SERVER
FTP Server.. now in state NEED_SYST
FTP Tx...... SYST
Write Socket 6 bytes written to 5
download:       Alert 16 for ftp://localhost/asdf/foo obj 0x8231ae8
FTP Server.. now in state NEED_SYST
Read Socket. WOULD BLOCK fd 5
Host Event.. READ passed to `ftp://localhost/asdf/foo'
FTP Event... now in state FTP_NEED_SERVER
FTP Server.. now in state NEED_SYST
Read Socket. 19 bytes read from socket 5
Host........ passing 19 bytes as consumed to 0x8239ee0
Host........ 0 bytes remaining 
FTP Rx...... `215 UNIX Type: L8'
Read Socket. Target returns 200
FTP Server.. now in state CHECK_SYST
FTP Server.. now in state SUB_SUCCESS
FTP Server.. Guessed type 4
FTP Event... now in state FTP_NEED_DATA
FTP Get Data now in state NEED_SELECT
FTP Get Data now in state NEED_REST
FTP Get Data now in state NEED_ACTION
FTP Tx...... LIST foo
Write Socket 10 bytes written to 5
download:       Alert 16 for ftp://localhost/asdf/foo obj 0x8231ae8
FTP Get Data now in state NEED_ACTION
Read Socket. WOULD BLOCK fd 5
Host Event.. READ passed to `ftp://localhost/asdf/foo'
FTP Event... now in state FTP_NEED_DATA
FTP Get Data now in state NEED_ACTION
Read Socket. 56 bytes read from socket 5
Host........ passing 56 bytes as consumed to 0x8239ee0
Host........ 0 bytes remaining 
FTP Rx...... `150 Opening BINARY mode data connection for '/bin/ls'.'
Read Socket. Target returns 200
FTP Get Data now in state NEED_ACCEPT
Accepted.... socket 7
Host connect Unlocking Host 0x81b62d0
FTP Get Data Passive data socket 7
FTP Get Data now in state NEED_STREAM
StreamStack. Constructing stream stack for text/html to */*
StreamStack. Source output
Error....... Add   2    Severity: 8     Parameter: `Unspecified'        Where: `HTDir_new'
HTDir_new... base is `foo/'
[...]

With the patch:

FTP Tx...... CWD asdf
Write Socket 10 bytes written to 5
download:       Alert 16 for ftp://localhost/asdf/foo obj 0x8231790
FTP Get Data now in state NEED_CWD
Read Socket. 38 bytes read from socket 5
Host........ passing 38 bytes as consumed to 0x82380b0
Host........ 0 bytes remaining 
FTP Rx...... `550 asdf: No such file or directory.'
Read Socket. Target returns 200
FTP Get Data now in state SUB_ERROR
Error....... Add  22    Severity: 1     Parameter: `Unspecified'        Where: `HTFTPGetData'
FTP Event... now in state FTP_ERROR
Channel..... Delete input stream 0x824f4d8 from channel 0x81c1020
Channel..... Delete input stream 0x824f4d8 from channel 0x81c1020
Net Object.. Delete 0x81cdbe8 and call AFTER filters
Host info... Remove 0x81cdbe8 from pipe
Host Object. closing socket 6

Tanmay Patwardhan | 15 Aug 2003 22:33

Libwww and Windows events


Hi,

I use libwww as an underlying http library, in a COM dll. When COM is used
in STA (single threaded apartments), there seems to be some issue with
libwww and COM win32 message queues.

If libwww is in the midst of processing a POST request and a second COM
request is made, the libwww event loop seems to crash.

In the HTEvtLst.c file, the GetMessage(..) while loop does the following:
while (!HTEndLoop  && (timepass = GetMessage(&msg,0,0,0)))
{
	TranslateMessage(&msg);
	DispatchMessage(&msg);
}

When the message for the second event is triggered, libwww doesn't seem to 
know how to handle it, and the first request loop exits abruptly. This 
causes the app to freeze up. Using a single-threaded apartment should not 
cause any multi-threading issues. Hence, this seems to be simply a case of 
libwww not being able to handle more than one message in the queue.

If anyone has an idea of how to get around this, I would really appreciate 
that.

Thanks and regards,
Tanmay

-----------------------
Tanmay Patwardhan
Applications Developer,
UBS Warburg,
Chicago, IL.
-----------------------


sbel | 16 Aug 2003 14:16

RE: Proxy authentication in libwww


Hi Francisco,

<skip>

>The problem is that the proxy is configured to ask for a user/password, but
>the application never ask it and returns with no data. The environment
>variable for the proxy is settled and I'm debugging the application and
>libwww try to use it.
>
>In http.c, after the connection is established with the proxy, in
>HTTPNextState() the status is 407 which is correct, but it seems that
>despite this there is no call to the callback settled in the main
>program for the user defined handler
>(HTAlert_add(PromptUsernameAndPassword, HT_A_USER_PW).

I tried to write the same code for Linux (Red Hat 9; gcc 3.2.2) and ran into the same problem.
In the main function I called:

HTProfile_newNoCacheClient(APP_NAME, APP_VERSION);

HTProxy_add( "http", MY_PROXY );
HTNoProxy_add( "localhost", NULL, 0 );

HTAlert_deleteOpcode( HT_A_USER_PW );
HTAlert_add( PromptUsernameAndPassword, HT_A_USER_PW );

HTAlert_setInteractive( YES );

....

After that, in main, I tried to connect to a host on the Internet (e.g. http://www.hotmail.com) and
always received (according to the traces) "407: Proxy authorization required". 

It looks like the application hangs in HTEventList_loop().
The callback set in main is never called.

When I try to connect to http://localhost (NoProxy), I get the user/password dialog for my local Apache
server, and that works normally.

So what's up? Why does this happen?
I see I'm not the first person to ask this question.
Has anybody found a solution to this problem already?

Thanks in advance!

Sergey.

sbel | 22 Aug 2003 15:15

Re: proxy authentication.


>I am behind a proxy and have been trying to do proxy authentication for
>a long time. but cant get the program access webpages. 
>I have tried doing HTProxy_add. But donot know how to do the
>authentication part.
>I have also tried setting the http_proxy environment variable, but the
>program does not ask for user-name, password for the proxy.

Hello!

I ran into the same problem when I tried to use proxy authentication in my program.

For my application I use Red Hat Linux 9 (kernel 2.4.20) and compile with gcc 3.2.2. The library was installed from w3c-libwww-5.4.0-4.i386.rpm.

In the main function I make the following calls:

...
HTProfile_newNoCacheClient(APP_NAME, APP_VERSION);
...
HTProxy_add( "http", MY_PROXY );
HTNoProxy_add( "localhost", NULL, 0 );

HTAlert_deleteOpcode( HT_A_USER_PW );
HTAlert_add( PromptUsernameAndPassword, HT_A_USER_PW );

HTAlert_setInteractive( YES );
...

Then I request the web resource.
After that:
HTEventList_loop(...);

End of main function.

So when I try to get a resource beyond the proxy (e.g. www.hotmail.com), the application, after receiving
"407: Proxy Authentication required", hangs in the call to HTEventList_loop(...). 

When I request "http://localhost" (no proxy), I receive an authentication request from my local Apache
server. After entering the authentication information I get the HTML page and the application ends.
In other words, without the proxy everything works fine.

So what's wrong?
Has anybody already run into this problem, and do you know what to do?
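
As a reference point when reading traces: after a 407, a client is expected to repeat the request with a Proxy-Authorization header. The sketch below builds that header for the Basic scheme only, and assumes the proxy actually offers Basic (it may ask for Digest or something else). It does not explain or fix the hang in libwww. base64_encode and proxy_basic_header are hypothetical helper names, not libwww functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Base64-encode len bytes of src into dst (NUL-terminated).
 * Returns 0, or -1 if dst is too small. */
static int base64_encode(const unsigned char *src, size_t len,
                         char *dst, size_t dstlen)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    if (dstlen < 4 * ((len + 2) / 3) + 1)
        return -1;
    for (i = 0; i + 2 < len; i += 3) {        /* full 3-byte groups */
        dst[o++] = tbl[src[i] >> 2];
        dst[o++] = tbl[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
        dst[o++] = tbl[((src[i + 1] & 0x0f) << 2) | (src[i + 2] >> 6)];
        dst[o++] = tbl[src[i + 2] & 0x3f];
    }
    if (i < len) {                            /* 1 or 2 bytes left over */
        dst[o++] = tbl[src[i] >> 2];
        if (i + 1 < len) {
            dst[o++] = tbl[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
            dst[o++] = tbl[(src[i + 1] & 0x0f) << 2];
        } else {
            dst[o++] = tbl[(src[i] & 0x03) << 4];
            dst[o++] = '=';
        }
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return 0;
}

/* Build "Proxy-Authorization: Basic <base64(user:password)>".
 * Returns 0 on success, -1 if any buffer is too small. */
static int proxy_basic_header(const char *user, const char *pass,
                              char *out, size_t outlen)
{
    char cred[256];
    char b64[512];
    int n;

    assert(user != NULL && pass != NULL && out != NULL);
    n = snprintf(cred, sizeof cred, "%s:%s", user, pass);
    if (n < 0 || (size_t)n >= sizeof cred)
        return -1;
    if (base64_encode((const unsigned char *)cred, (size_t)n,
                      b64, sizeof b64) != 0)
        return -1;
    n = snprintf(out, outlen, "Proxy-Authorization: Basic %s", b64);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

If the trace never shows a second request carrying a header of this kind, the credentials callback was most likely never reached, which matches the behaviour described above.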

Thanx in advance.
Sergey.

