Ariel T. Glenn | 1 Oct 2010 21:01

dataset1 maintenance Sat Oct 1 (dumps unavailable)

Folks,

The server that hosts the XML dumps will undergo maintenance (it is
being moved to another rack) on Saturday, Oct 2, starting at about
15:00 GMT.  We expect the server to be back up by 17:00 GMT.  During
that time XML dumps will be unavailable.

In other news the first run of the full en.wikipedia history in chunks
has completed.  The recompression to 7z has not been done, nor the
recompression into a single large bz2 file for people who prefer it.
However, for those interested, please have a look at the files:

http://dumps.wikimedia.org/enwiki/20100904/

Each file has its own mediawiki header and footer, each covering a range
of 2 million (sequential) page IDs, except for the last "chunk" which
covers rather more than it should.

As you can see, the chunk sizes are rather disparate.  The next such run
should split up more evenly with roughly the same number of revisions in
each chunk, and as such, they should all take nearly the same time to
complete.
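
For illustration only, here is a rough sketch of the two splitting
strategies: fixed ranges of 2 million page IDs versus ranges balanced by
revision count.  The function names and the sample revision counts are
invented for the example; this is not the actual dump code.

```python
def fixed_id_chunks(max_page_id, ids_per_chunk=2_000_000):
    """Split page IDs 1..max_page_id into consecutive fixed-width ranges."""
    return [(start, min(start + ids_per_chunk - 1, max_page_id))
            for start in range(1, max_page_id + 1, ids_per_chunk)]


def balanced_chunks(revs_per_page, n_chunks):
    """Split pages (in ID order) into ranges with roughly equal revision totals.

    revs_per_page is a list of (page_id, revision_count) sorted by page_id.
    This is a hypothetical helper, not part of any dump tooling.
    """
    total = sum(count for _, count in revs_per_page)
    chunks, start, running = [], None, 0
    for page_id, count in revs_per_page:
        if start is None:
            start = page_id
        running += count
        # Close the chunk once the cumulative total reaches the next equal share.
        share = total * (len(chunks) + 1) / n_chunks
        if len(chunks) < n_chunks - 1 and running >= share:
            chunks.append((start, page_id))
            start = None
    if start is not None:
        # Whatever is left over goes into the final chunk.
        chunks.append((start, revs_per_page[-1][0]))
    return chunks


if __name__ == "__main__":
    print(fixed_id_chunks(5_000_000))
    sample = [(1, 10), (2, 1), (3, 1), (4, 8), (5, 5), (6, 5)]
    print(balanced_chunks(sample, 3))
```

With fixed ID ranges, a chunk holding a few heavily edited pages can
carry far more revisions than its neighbours; balancing on the
cumulative revision count is one way to even that out.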

Ariel Glenn

Ariel T. Glenn | 2 Oct 2010 19:00

Re: dataset1 maintenance Sat Oct 1 (dumps unavailable)

The server that hosts XML dumps was moved this morning and all
maintenance completed.  The dumps for dewiki, arwiki, srwiki and
ptwikiquote were restarted from the beginning; everything else should be
running normally. (Except for enwiki, which is a special case.)

Ariel Glenn

emijrp | 4 Oct 2010 14:21

Re: dataset1 maintenance Sat Oct 1 (dumps unavailable)

So, will English Wikipedia dumps be created with this new method from now on?

2010/10/2 Ariel T. Glenn <ariel-AeOJrEpdGNeGglJvpFV4uA@public.gmane.org>
The server that hosts XML dumps was moved this morning and all
maintenance completed.  The dumps for dewiki, arwiki, srwiki and
ptwikiquote were restarted from the beginning; everything else should be
running normally. (Except for enwiki, which is a special case.)

Ariel Glenn



_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l <at> lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

emijrp | 4 Oct 2010 14:22

Re: Domas visits logs

Also, if this data[1] is public, could it be added to a section on download.wikimedia.org too? Thanks.

[1] http://download.wikimedia.org/fundraising/

2010/9/28 Ariel T. Glenn <ariel-AeOJrEpdGNeGglJvpFV4uA@public.gmane.org>
Yes, I need to get on this.  Thanks for reminding me.

Ariel

On Tue, 28-09-2010 at 16:25 +0200, emijrp wrote:
> Hi;
>
> Some weeks ago, I read that WMF had downloaded every hourly log from
> Domas's website. The Internet Archive only has the date range from
> December 2007 to September 2009. Domas's website now has only the
> last few months (from April 2010 to now). So the data from October
> 2009 to March 2010 is missing.
>
> Could a new section be enabled on download.wikimedia.org with a link to
> the directory where WMF saves a copy of these logs?
>
> Regards,
> emijrp



Erik Zachte | 11 Oct 2010 02:06

en dump stalled

The English Wikipedia dump has apparently stalled. The Yahoo dump step should take one or two days, but it has been running since October 2.

 

Erik Zachte

emijrp | 12 Oct 2010 15:46

Public dumps

Hi;

Do you know of any other projects that publish public dumps? I know of Wikimedia, Wikia and Citizendium. Any more? I'm working on a tool for analysing dumps, and I want to add support for all of them.

Thanks,
emijrp

Erik Moeller | 13 Oct 2010 00:18

Re: Public dumps

2010/10/12 emijrp <emijrp@...>:
> Do you know of any other projects that publish public dumps? I know of
> Wikimedia, Wikia and Citizendium. Any more? I'm working on a tool for
> analysing dumps, and I want to add support for all of them.

In my experience, most wikis don't make these available, or only do it
infrequently. You can find a one-off copy of the Wikitravel database
here (listed at the bottom of the talk page):

http://wikitravel.org/en/Wikitravel_talk:Database_dump

OmegaWiki makes a full, regular dump available (in MySQL format) here:

http://www.omegawiki.org/Development

Note that most of the actual data resides in non-standard tables added
by the OmegaWiki software.

I didn't find a dump for WikiHow, but they seem to be the kind of
project that would make one available if asked. Finally, I came across
this wiki which makes dumps available:

http://wiki.osdev.org/OSDevWiki:About
-- 
Erik Möller
Deputy Director, Wikimedia Foundation

Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate

Jamie Morken | 13 Oct 2010 00:32

Re: Public dumps


Hi,

There is the OpenStreetMap project, which uses the XML .osm format:

"A new version of planet.osm is released weekly (currently every Thursday morning). We have these, going back to the start of April 2006. The current size of a planet.osm file is over 171 Gigabytes.(reduced to 11GB with bzip2 compression) as of September 2010."

from: "http://wiki.openstreetmap.org/wiki/Planet.osm"

There is also WikiHow, http://en.wikipedia.org/wiki/WikiHow

They license their content under Creative Commons, yet they don't provide public dumps of the site.  I have asked them about this on their forum in the past, with no luck so far.

cheers,

Jamie




----- Original Message -----
From: emijrp <emijrp-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Tuesday, October 12, 2010 6:47 am
Subject: [Xmldatadumps-l] Public dumps
To: xmldatadumps-l-RusutVdil2icGmH+5r0DM0B+6BGkLq7r@public.gmane.org

> Hi;
>
> Do you know of any other projects that publish public dumps? I know of
> Wikimedia, Wikia and Citizendium. Any more? I'm working on a tool
> for analysing dumps, and I want to add support for all of them.
>
> Thanks,
> emijrp
>
Platonides | 19 Oct 2010 23:31

Re: Xmldatadumps-l Digest, Vol 9, Issue 1

jcms wrote:
> Greetings to all. What is
> dbzip2? Could it be explained more clearly, for translation, please? Thanks. Mr. Serguey,
> could you help with the translations, please? I have read everything, and it seems
> interesting. Dr. Juan Cesar Martinez

Dbzip2 is a program that compresses to the bzip2 format [1], but using
several processes [2].

1- http://es.wikipedia.org/wiki/Bzip2
2- http://www.mediawiki.org/wiki/Dbzip2
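
As a rough illustration of the general idea (this is not dbzip2's own
code or design), the sketch below compresses independent blocks in
separate processes and concatenates the resulting bzip2 streams.  The
function names, block size and worker count are assumptions chosen for
the example.

```python
import bz2
from concurrent.futures import ProcessPoolExecutor


def split_blocks(data, block_size):
    """Cut the input into independent fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def parallel_bzip2(data, block_size=900_000, workers=4):
    """Compress each block in a worker process and join the bzip2 streams."""
    blocks = split_blocks(data, block_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams come back in sequence.
        return b"".join(pool.map(bz2.compress, blocks))


if __name__ == "__main__":
    original = b"wikipedia dump line\n" * 200_000
    packed = parallel_bzip2(original)
    # Python's bz2.decompress accepts several concatenated streams.
    assert bz2.decompress(packed) == original
    print(len(original), "->", len(packed), "bytes")
```

The output is a sequence of complete bzip2 streams rather than one
stream, which is a simplification made here to keep the example short.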

> I have read absolutely everything. Wikipedia needs to be more involved and better linked.
> I repeat, Wikipedia needs to be linked more and to attract young volunteers, and must not
> be allowed to hit bottom. I very respectfully propose linking
> with Facebook and Twitter, the places most visited by young people, and also reviewing
> the press and blogs; bloggers should be obliged to carry
> Wikipedia announcements.

I would say that you are, at the very least, sending this message to the
wrong list. What you say makes little or no sense.

Finally, your computer's clock is 4 days, 5 months and 9 years
behind.

