OTR Comm | 5 Feb 2004 07:41

libwww webbot


Hello,

If you are interested, I had to add an additional include to HTRobot.c
so it would compile.  I kept getting compile errors:

<snip>
.../../modules/md5/.libs/libmd5.so -ldl -Wl,--rpath -Wl,/usr/local/lib
HTRobot.o: In function `calculate_linkRelations':
/usr/local/src/w3org/libwww/w3c-libwww-5.4.0_gafilterfish/Robot/src/HTRobot.c:146:
undefined reference to `HTSQLLog_addLinkRelationship'
/usr/local/src/w3org/libwww/w3c-libwww-5.4.0_gafilterfish/Robot/src/HTRobot.c:179:
undefined reference to `HTSQLLog_addLinkRelationship'
/usr/local/src/w3org/libwww/w3c-libwww-5.4.0_gafilterfish/Robot/src/HTRobot.c:207:
undefined reference to `HTSQLLog_addLinkRelationship'
HTRobot.o: In function `Robot_delete':
/usr/local/src/w3org/libwww/w3c-libwww-5.4.0_gafilterfish/Robot/src/HTRobot.c:808:
undefined reference to `HTSQLLog_close'
HTRobot.o: In function `redirection_handler':
/usr/local/src/w3org/libwww/w3c-libwww-5.4.0_gafilterfish/Robot/src/HTRobot.c:1011:
undefined reference to `HTSQLLog_addLinkRelationship'
HTRobot.o: In function `terminate_handler':
/usr/local/src/w3org/libwww/w3c-libwww-5.4.0_gafilterfish/Robot/src/HTRobot.c:1089:
undefined reference to `HTSQLLog_addEntry'
HTRobot.o: In function `RHText_foundAnchor':
/usr/local/src/w3org/libwww/w3c-libwww-5.4.0_gafilterfish/Robot/src/HTRobot.c:1334:
undefined reference to `HTSQLLog_addLinkRelationship'
/usr/local/src/w3org/libwww/w3c-libwww-5.4.0_gafilterfish/Robot/src/HTRobot.c:1409:
undefined reference to `HTSQLLog_addLinkRelationship'

Ceri Coburn | 9 Feb 2004 16:31

wwwlib parsing with own server/client implementation


Hi,

I would like to use the wwwlib in my application only for parsing.  I
have written my own server implementation for transport.  Is there a way
I can use the wwwlib to parse the HTTP header and HTML for a char*
within my application?

Thanks
Ceri


Tim Serong | 9 Feb 2004 23:49

RE: wwwlib parsing with own server/client implementation


Hi,

A very similar request came up several months ago, to use libwww for
parsing local files.  Below is what I suggested then, the second example
of which can be used to parse a char *.  I can't help with parsing
headers manually...  This will probably take some digging.

The simplest thing to do is supply a file URL (something like
file:///foo or file:///c:/foo.txt on Windows) for the Request, rather
than an HTTP URL.  libwww should then read the file from disk.

Alternately, you can hack up something like this (please excuse the C++
style):

  // Declare HTStream, so you can write to it directly
  typedef struct _HTStream
  {
    HTStreamClass * isa;
  } HTStream;

  ...

  HText_registerLinkCallback(myFoundLink);
  // register any other required callbacks here
  HTRequest * r = HTRequest_new();
  // this base URL will be used for resolving links
  // in the file being parsed
  HTRequest_setAnchor(r, HTAnchor_findAddress("http://baseurl/"));
  HTStream * parseStream = HTMLPresent(r, 0, WWW_HTML, WWW_PRESENT, 0);

Ceri Coburn | 10 Feb 2004 11:51

RE: wwwlib parsing with own server/client implementation


Hi,

Many thanks.  That's great.  I will try this.  I just need to find a way
to do the same with HTTP headers and then I will be really happy.  I
imagine writing my own parser for this would be a very tedious task.
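
For a sense of the work involved, here is a minimal sketch in plain C that
splits one "Name: value" header line held in a char *.  It does not use
libwww at all; the function name parse_header_line and its parameters are
made up for this illustration, and it ignores folded (continuation) lines
and the status line, which a real parser would have to handle.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libwww function.
 * Splits a single "Name: value" line into two caller-supplied buffers.
 * Returns 0 on success, -1 if there is no name before a colon or if
 * either buffer is too small. */
int parse_header_line(const char *line,
                      char *name, size_t name_cap,
                      char *value, size_t value_cap)
{
    const char *colon = strchr(line, ':');
    const char *v;
    size_t name_len, value_len;

    if (colon == NULL || colon == line)
        return -1;

    /* Copy the field name (everything before the colon). */
    name_len = (size_t)(colon - line);
    if (name_len >= name_cap)
        return -1;
    memcpy(name, line, name_len);
    name[name_len] = '\0';

    /* Skip blanks after the colon, then trim a trailing CR/LF or spaces. */
    v = colon + 1;
    while (*v == ' ' || *v == '\t')
        v++;
    value_len = strlen(v);
    while (value_len > 0 && isspace((unsigned char)v[value_len - 1]))
        value_len--;
    if (value_len >= value_cap)
        return -1;
    memcpy(value, v, value_len);
    value[value_len] = '\0';
    return 0;
}
```

Called once per line of the header block, this yields the field name and
the trimmed value; deciding what each field means is left to the caller.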

Thanks again
Ceri


Akker, D van den | 12 Feb 2004 10:15

is libwww thread safe?

Hi,

I've made a threaded C++ wrapper around libwww to fetch the content of a
web page, but I have run into a problem.  Each thread creates its own
instance of libwww, but as soon as I run two or more threads I get a
segmentation fault.

I've followed the "is libwww thread safe?" thread, but I didn't get a
clear answer to the question.  I see a lot of messages saying "run
configure with --enable-reentrant to build thread-safe applications, but
note that libwww isn't thread safe throughout the code as not all
functions are reentrant".  Is it thread safe?

Cu,
Daniël

Virginie LHUILLIER | 13 Feb 2004 16:40

Re: cvs login failure


hello
I have the same problem, have you any solution?
thanks
Virginie

Baum, Dietmar | 26 Feb 2004 14:34

www-lib and openssl


Hallo,

Does the wwwssl example handle client authentication, and if so, where
must the client certificate / private key be placed in the example?

thanks
dietmar


serious libwww timers bug


I think there is a serious bug in the implementation of internal timers in libwww.

I see cases where HTTimer_new() with valid parameter values causes endless 
recursion and a stack overflow.

Here is what I think is the root cause:

HTTimer_new() makes use of the HTGetTimeInMillis function, which is supposed to return 
the number of milliseconds since the Epoch (00:00 January 1, 1970, UTC)

in Library/src/HTInet.c

PUBLIC ms_t HTGetTimeInMillis (void)
{
#ifdef WWW_MSWINDOWS
    return GetTickCount();
#else /* WWW_MSWINDOWS */
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return(tp.tv_sec * 1000) + (tp.tv_usec / 1000);
#else
    return((ms_t) 0);
#endif
#endif /* !WWW_MSWINDOWS */
}

ms_t is declared as 

typedef unsigned long ms_t;

in Library/src/wwwsys.h

On my system unsigned long is 4 bytes or 32 bits.

Currently the number of milliseconds since the epoch 
will be something like: 1 078 179 223 000 

which simply doesn't fit into ms_t (unsigned long) - 4 bytes
max 4 294 967 295
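
To make the arithmetic concrete, the sketch below reproduces the
truncation with fixed-width types, so the result does not depend on the
platform's unsigned long.  The helper name truncate_to_32_bits is made up
for this illustration and is not part of libwww.

```c
#include <stdint.h>

/* Hypothetical helper, not a libwww function.
 * Keeps only the low 32 bits of a millisecond count, which is what
 * storing the value in a 32-bit unsigned long amounts to. */
uint32_t truncate_to_32_bits(uint64_t ms_since_epoch)
{
    return (uint32_t)(ms_since_epoch & 0xFFFFFFFFu);
}
```

For the value quoted above, 1 078 179 223 000 is 251 * 4 294 967 296 plus
142 431 704, so the stored result is 142 431 704.  A 32-bit millisecond
counter wraps every 2^32 ms, which is roughly 49.7 days.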

This causes major problems with timers (which sometimes end up 
in endless recursion) due to the arithmetic in 

HTTimer.c

    ms_t now = HTGetTimeInMillis();
    ms_t expires;
    HTTimer * pres;

    CHECKME(timer);
    expires = millis;
    if (relative)
        expires += now;
    else
        millis = expires-now;
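
As an illustration of how a wrapped sum can mislead later comparisons,
here is a small sketch.  It is not taken from HTTimer.c, and it is not a
claim about which comparison in libwww actually misfires; the type ms32_t
and both function names are assumptions made for this example.  It shows
that now + delay can wrap to a small number, and a common way to compare
two points on a wrapping clock through their signed difference.

```c
#include <stdint.h>

typedef uint32_t ms32_t;   /* stand-in for a 32-bit ms_t (assumption) */

/* Hypothetical helper: deadline on a clock that wraps at 2^32 ms. */
ms32_t ms32_deadline(ms32_t now, ms32_t delay)
{
    return (ms32_t)(now + delay);   /* wraps modulo 2^32 by design */
}

/* Hypothetical helper: nonzero once `now` has reached `expires`.
 * Looks at the signed 32-bit difference instead of comparing the raw
 * values, so it stays correct across a wrap as long as the two times
 * are less than 2^31 ms (about 24.8 days) apart.  Relies on the usual
 * two's complement conversion from uint32_t to int32_t. */
int ms32_has_expired(ms32_t now, ms32_t expires)
{
    return (int32_t)(uint32_t)(now - expires) >= 0;
}
```

With now = 4 294 967 000 and a delay of 600 000 ms (the millis value in
the trace below), the deadline wraps to 599 704.  A plain
"expires <= now" test would report that timer as already expired, while
the signed-difference test does not.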

I think the correct declaration for ms_t in Library/src/wwwsys.h
should have been

typedef unsigned long long  ms_t;
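
A sketch of what a 64-bit variant of the non-Windows branch might look
like follows.  The names ms64_t and get_time_in_millis64 are made up for
this illustration and are not libwww identifiers.  One detail worth
noting: widening the typedef alone may not be enough where tv_sec is a
32-bit type, because tp.tv_sec * 1000 would still be evaluated in the
narrower type, so the sketch converts to 64 bits before multiplying.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

typedef uint64_t ms64_t;   /* hypothetical 64-bit replacement for ms_t */

/* Hypothetical function, not part of libwww.
 * Returns milliseconds since the Epoch, or 0 if gettimeofday() fails. */
ms64_t get_time_in_millis64(void)
{
    struct timeval tp;

    if (gettimeofday(&tp, NULL) != 0)
        return 0;

    /* Widen first, then scale, so the product is computed in 64 bits. */
    return (ms64_t)tp.tv_sec * 1000u + (ms64_t)tp.tv_usec / 1000u;
}
```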

See stack trace below for described problems

#0  0xc00a6d20 in ltostr ()
#1  0xc00a6e28 in ltostr ()
#2  0xc00a7584 in malloc ()
#3  0xc00824c0 in calloc ()
#4  0xc0b6c37c in HTMemory_calloc (nobj=1, size=8) at HTMemory.c:88
#5  0xc0b6b364 in HTList_addList (me=0x40041088, newObject=0x40041098 "") at HTList.c:88
#6  0xc0bd010c in HTTimer_new (timer=0x40041098, cbf=0x4000ea42, param=0x0, millis=600000,
relative=1 '\001',
    repetitive=1 '\001') at HTTimer.c:251
#7  0xc0bcf75c in Timer_dispatch (cur=0x400410b8, last=0x40041088) at HTTimer.c:112
#8  0xc0bd0190 in HTTimer_new (timer=0x40041098, cbf=0x4000ea42, param=0x0, millis=600000,
relative=1 '\001',
    repetitive=1 '\001') at HTTimer.c:259
#9  0xc0bcf75c in Timer_dispatch (cur=0x400410b8, last=0x40041088) at HTTimer.c:112
#10 0xc0bd0190 in HTTimer_new (timer=0x40041098, cbf=0x4000ea42, param=0x0, millis=600000,
relative=1 '\001',
    repetitive=1 '\001') at HTTimer.c:259
#11 0xc0bcf75c in Timer_dispatch (cur=0x400410b8, last=0x40041088) at HTTimer.c:112
#12 0xc0bd0190 in HTTimer_new (timer=0x40041098, cbf=0x4000ea42, param=0x0, millis=600000,
relative=1 '\001',
    repetitive=1 '\001') at HTTimer.c:259
#13 0xc0bcf75c in Timer_dispatch (cur=0x400410b8, last=0x40041088) at HTTimer.c:112
#14 0xc0bd0190 in HTTimer_new (timer=0x40041098, cbf=0x4000ea42, param=0x0, millis=600000,
relative=1 '\001',
    repetitive=1 '\001') at HTTimer.c:259
#15 0xc0bcf75c in Timer_dispatch (cur=0x400410b8, last=0x40041088) at HTTimer.c:112
#16 0xc0bd0190 in HTTimer_new (timer=0x40041098, cbf=0x4000ea42, param=0x0, millis=600000,
relative=1 '\001',
    repetitive=1 '\001') at HTTimer.c:259
#17 0xc0bcf75c in Timer_dispatch (cur=0x400410b8, last=0x40041088) at HTTimer.c:112
#18 0xc0bd0190 in HTTimer_new (timer=0x40041098, cbf=0x4000ea42, param=0x0, millis=600000,
relative=1 '\001',
...
this goes on

I hope this will get fixed in CVS.

--MG

PS. Sorry if this is a repost. My first mail doesn't seem to have made it to the list.
