antoine sauray | 8 Feb 21:04

wget problem

hello,

First I wanted to say that wget is incredible software and I use it
very often. The problem is that I use it on Windows, and there is something I
can't do because of this: I can't use cookies!
I can't go back to Linux because I need to stay on Windows; I have a lot
of software I really need there. I don't know if you still maintain the Windows
version, but it would be cool if someone could add this function.

thank you

don't pay attention to my english cause i'm french

Peter Genczler | 1 Feb 14:36

-q option does not work

Hello,

I would like to use wget for Windows (Wget 1.11.4) only to refresh my dtdns
account, so I use the -q option, but wget still creates an output file :-(.

My Windows version: Win XP SP3, Hungarian.

Thanks,

Peter

Gijs van Tulder | 1 Feb 00:23

Fix: Large files in WARC

Hi,

Another small problem in the WARC section: wget crashes with a 
segmentation fault if you have WARC output enabled and try to download a 
file larger than 2GB. I think this is because of the size_t, ftell and 
fseek in warc.c.

The attached patch changes the references from size_t to off_t, ftell to 
ftello, fseek to fseeko. On my 64-bit system this seemed to fix the 
problem (but I'm not an expert in these matters, so maybe this doesn't 
hold for 32-bit systems).
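
A minimal Python sketch of the general failure mode, not a diagnosis of this 
particular crash: it assumes a platform where the value returned by ftell 
is stored in a signed 32-bit integer, and shows what happens to an offset 
once it passes 2GB. The helper name as_32bit_long is hypothetical.

```python
import ctypes


def as_32bit_long(offset):
    """Return the value a signed 32-bit integer would hold for `offset`.

    Assumption: this models a 32-bit `long`; ctypes integer types wrap
    silently instead of raising on overflow.
    """
    return ctypes.c_int32(offset).value


two_gb = 2 ** 31
just_under = as_32bit_long(two_gb - 1)  # still a valid, positive offset
just_over = as_32bit_long(two_gb)       # wraps around to a negative value

print(just_under, just_over)
```

A 64-bit off_t, as used by ftello and fseeko, would hold both values 
unchanged.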

Regards,

Gijs

Sven Herzberg | 31 Jan 18:08

Wget issues with delicious (SAN)

Hi,

I used to run a script to back up my bookmarks from delicious.com. However, for quite some time now, wget
has not been able to connect to the remote site; the error is:

> $ wget 'https://api.del.icio.us'
> --2012-01-31 12:58:52--  https://api.del.icio.us/
> Resolving api.del.icio.us... 184.72.40.0, 184.72.44.135, 50.18.156.75, ...
> Connecting to api.del.icio.us|184.72.40.0|:443... connected.
> ERROR: certificate common name `d-static.com' doesn't match requested host name `api.del.icio.us'.
> To connect to api.del.icio.us insecurely, use `--no-check-certificate'.
> Unable to establish SSL connection.

This looks like a trivial error, one might think. However, while the common name of the certificate indeed
doesn't match "api.del.icio.us", the certificate includes that name (among others) in its list of subject alternative names:

> DNS-Name: api.del.icio.us
> DNS-Name: www.delicious.com
> DNS-Name: d.me
> DNS-Name: delicious.com
> DNS-Name: d-static.com

Firefox, Chrome and Safari seem to be happy with this setup, but wget isn't. Is there a reason behind this, or is it
just awaiting a patch?
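
The check the browsers appear to perform can be sketched in a few lines of Python. This is an illustration only,
not wget's code: it assumes the rule from RFC 2818 that DNS subject alternative names, when present, are used
instead of the common name, and it leaves out wildcard handling. The function name hostname_matches is hypothetical.

```python
def hostname_matches(hostname, common_name, subject_alt_names):
    """Return True if `hostname` is covered by the certificate's names.

    Assumption: if the certificate lists DNS subject alternative names,
    only those are consulted; otherwise the common name is used.
    Wildcards are deliberately not handled in this sketch.
    """
    candidates = subject_alt_names if subject_alt_names else [common_name]
    wanted = hostname.lower().rstrip(".")
    return any(wanted == name.lower().rstrip(".") for name in candidates)


# Names taken from the certificate described above.
san = ["api.del.icio.us", "www.delicious.com", "d.me",
       "delicious.com", "d-static.com"]

cn_only = hostname_matches("api.del.icio.us", "d-static.com", [])
with_san = hostname_matches("api.del.icio.us", "d-static.com", san)

print(cn_only, with_san)
```

Comparing against the common name alone rejects the host, which matches the error shown; consulting the
alternative names accepts it.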

Kind regards,
  Sven

MS | 30 Jan 14:00

Download Options "--tries=number".

I have a query about the download options "--tries=number".

The name of the option "--tries" suggests the total number of tries, 
while the documentation says it refers to the number of retries. [This 
may be an English language 'anomaly', e.g. 4 versions of a document: 1 
draft and 3 re-drafts.]

I am using wget in a script and require it to make one single attempt to 
download a file, will I achieve that with "--tries=1" or does that 
actually mean - if the first attempt fails, retry once (in other words 
try twice then stop)?

If "--tries=1" does mean 'try twice then stop' and since "--tries=0" 
means infinite retrying is there any way to make wget try exactly once 
and then stop?

Also does "--timeout=seconds" apply to the whole operation, including 
any retries? If so it does not seem to work accurately with "GNU Wget 
1.12" but that may be because it only applies to each 'try'?

I realize that the way I am using wget is not exactly what it has been 
designed for and that 'curl' is actually more suitable. In fact my 
script uses curl by default. The script checks to see if curl and wget 
are installed, if curl is not, but wget is, then rather than output a 
message saying 'please install curl', it uses wget.

Sorry if this mail is longer than need be; any help would be 
appreciated. Many thanks.

Pawel Pabian | 29 Jan 09:47

Segmentation fault on broken response

Hello

I want to report a segmentation fault that happens on a broken response.
The response causing error is "HTTP/HTTP/1.0 200 OK\r\n\r\n".
Wget version "GNU Wget 1.12 built on darwin11.0.0.".
OS X 10.7.2.

The easiest way to reproduce it is to run the script below (which returns a broken response on purpose) and
fetch the URL it prints with wget:
perl -e 'use HTTP::Daemon; my $d = HTTP::Daemon->new; print $d->url, $/; while ( my $c = $d->accept ) {
$c->get_request; $c->send_response("HTTP/HTTP/1.0 200 OK\r\n\r\n"); $c->close;}'

If you need any additional info, you can find me on IRC in #perl6.

bbkr

Gijs van Tulder | 27 Jan 09:36

Two fixes: Memory leak with chunked responses / Chunked responses and WARC files

Hi,

Here are two small patches. I hope they will be useful.

First, a patch that fixes a memory leak in fd_read_body (src/retr.c) and 
skip_short_body (src/http.c) when it retrieves a response with 
"Transfer-Encoding: chunked". Both functions make calls to fd_read_line 
but never free the result.

Second, a patch to the fd_read_body function that changes the way 
chunked responses are saved in the WARC file. Until now, wget would 
write a de-chunked response to the WARC file, which is wrong: the WARC 
file is supposed to have an exact copy of the HTTP response, so it 
should also include the chunk headers.

The first patch fixes the memory leaks. The second patch changes 
fd_read_body to save the full, chunked response in the WARC file.
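
To make the difference between the two forms concrete, here is a minimal 
Python sketch of de-chunking. It is not taken from the patch; the function 
name dechunk is hypothetical, and trailers are ignored. The raw bytes are 
what an exact copy of the response body would contain, the decoded bytes 
are what a client hands to the user.

```python
def dechunk(raw: bytes) -> bytes:
    """Decode a body sent with "Transfer-Encoding: chunked".

    Sketch only: chunk extensions are stripped and trailers are ignored.
    """
    body = b""
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The chunk header is the chunk size in hexadecimal.
        size = int(raw[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:  # a zero-sized chunk terminates the body
            break
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return body


chunked = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoded = dechunk(chunked)

print(len(chunked), len(decoded))
```

The chunk headers ("4", "5", "0") appear only in the raw form, which is 
why storing the de-chunked body loses part of the original response.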

Regards,

Gijs

markk | 24 Jan 16:22

Feature request/suggestion: option to pre-allocate space for files

Hi,

This post is to suggest a new feature for wget: an option to pre-allocate
disk space for downloaded files. (Maybe have a --pre-allocate command-line
option?)

The ability to pre-allocate space for files would be useful for a couple
of reasons:

- By pre-allocating all space before downloading, the risk of exiting due
to a disk-full error is avoided. When downloading from a server which
doesn't support resuming downloads, an accidental disk full condition
means you have to re-download the whole file after freeing up some disk
space. That wastes a lot of time and network bandwidth.

- Disk fragmentation can be reduced. Downloading large files can take many
hours. While wget is downloading, much other disk activity can be caused
by other programs (web browser cache, email client etc.). The result is
the wget output file can end up unnecessarily fragmented. And likewise,
files written by other programs while wget is running end up more
fragmented.

On Linux, fallocate() and posix_fallocate() can be used to pre-allocate
space. The advantage of fallocate() is that, by using the
FALLOC_FL_KEEP_SIZE flag, space is allocated but the apparent file size is
unchanged. That means resuming with --continue works as normal.
posix_fallocate() on the other hand, sets the file length to its full
size, meaning that --continue won't work unless there were some way to
specify the byte offset that wget should continue from.
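
The posix_fallocate() behaviour described above can be observed with a
short Python sketch. It assumes a Unix system where os.posix_fallocate is
available; the helper name preallocate and the file name are hypothetical,
and the fallocate()/FALLOC_FL_KEEP_SIZE variant is not covered here.

```python
import os
import tempfile


def preallocate(path, size):
    """Reserve `size` bytes for `path` and return its apparent size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Allocates the byte range [0, size) and extends the file to match.
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "download.bin")
    apparent = preallocate(target, 1024 * 1024)

print(apparent)
```

Although nothing has been downloaded yet, the file already reports its
full length, which is why a size-based resume would see nothing left to
fetch.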

Henrik Holst | 24 Jan 10:50

Re: Fwd: Regarding wget to download webpage

I looked at your web site and it does not perform standard HTTP
authentication, so --user and --password cannot be used to log on to
that page.

You have to supply the username and password using --post-data there
as well. If you had followed my advice to use the Live HTTP Headers extension
with Firefox you would have seen exactly what to do. Please use that
tool and learn some HTTP basics, and you will soon work out how to
do what you want. Since we do not have access to that site
of yours (no username or password), we as a community will have a
hard time telling you exactly how to proceed, because we cannot test
things at our end.

Anyway, as I wrote, I tried a login attempt on the site with the
Live HTTP Headers extension active, and I could see that the
authentication should be performed like this:

wget --post-data "detour=https%3A%2F%2Fwww.collabnet.timeinc.net%2F&loginID=username&password=password&Login=Login" --save-cookies cookies.txt "https://www.collabnet.timeinc.net/servlets/TLogin"

Replace the "username" and "password" in the post data with your account
details.
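
If typing the percent-encoded string by hand is error-prone, it can be
generated. A minimal Python sketch, assuming the field names observed in
the command above; "username" and "password" remain placeholders:

```python
from urllib.parse import urlencode

# Field names as seen in the login request above; values are placeholders.
fields = {
    "detour": "https://www.collabnet.timeinc.net/",
    "loginID": "username",
    "password": "password",
    "Login": "Login",
}

# urlencode percent-encodes each value and joins the pairs with "&".
post_data = urlencode(fields)

print(post_data)
```

The resulting string is what goes inside the quotes after --post-data.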

However since I have no account on that site of yours, I do not know if
this really works and whether the detour=xxx thing is really needed and
whether you have to also add a Referer: header or not to the request.

So if I were you:
1. Install the Live HTTP Headers extension in Firefox

maurice van putten | 23 Jan 04:26

question


   Hi,

   Could you explain how to use wget to retrieve only the hyperlinks (up to a certain number, or down to a
   certain depth) generated by an Internet search engine (Google, Yahoo, Bing, AskJeeves,...)? That is, to
   use wget to retrieve these hyperlinks in a list, without downloading each.

      Looking forward to hearing from you,

   Best regards,

   Maurice van Putten

Ángel González | 20 Jan 18:07

Re: Regarding wget to download webpage

On 20/01/12 12:00, Bhargavi N wrote:
> Hello Everyone,
>   I am extremely thankful to all of you for the help regarding wget. 
> But still i am unable to get the right page downloaded.
>
> Requirement:
>  I need to run an ad hoc SQL query on a remote web page, i.e. collabnet. The 
> web page has a text area where I can enter the SQL query and then click submit.
>
> Once i submit sql runs and results are displayed in the page.
> This REMOTE page is a call to a servlet, i.e. 
> https://collabnet.net/servlets/adhocquery for example.
>
> I want to pass SQL query as form data through POST/GET method and get 
> the sql results page downloaded to my local directory on LINUX box. I 
> am running wget on LINUX host on commandline / shell script.
>
> I will invoke wget with the URL 
> https://collabnet.net/servlets/adhocquery and formdata ie; SQL as 
> --post-data "select * from emp". Finally i want to download the sql 
> query results page to my local directory.
>
> I tried all the options that you suggested to me, but I am still unable to get 
> the right page. Every time it displays the index page for 
> collabnet, which I do not need. I need to run the servlet page with 
> the form data passed to it.
What's the name of the field?
Supposing you entered that into a textbox called 'sqlquery', the 
post-data would look like:
--post-data "sqlquery=select%20*%20from%20emp".
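
A small Python sketch for producing that encoding, assuming the field really
is called 'sqlquery' (a supposition, as noted above):

```python
from urllib.parse import quote

query = "select * from emp"

# Encode spaces as %20; safe="*" keeps the asterisk literal, as in the
# hand-written example above.
post_data = "sqlquery=" + quote(query, safe="*")

print(post_data)
```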