Christine Bush | 21 May 2013 17:44

Inquiry re: XML data

Hi there,


    Is this list still active? Where does one find XML data dumps from Wikipedia?

Christine Bush

On Tuesday, May 21, 2013, wrote:
Send Xmldatadumps-l mailing list submissions to
        xmldatadumps-l-RusutVdil2iUmLTBS4g/ug@public.gmane.orgmedia.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
or, via email, send a message with subject or body 'help' to
        xmldatadumps-l-request-RusutVdil2icGmH+5r0DM0B+6BGkLq7r@public.gmane.org

You can reach the person managing the list at
        xmldatadumps-l-owner <at> lists.wikimedia.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Xmldatadumps-l digest..."
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l <at> lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
wp mirror | 17 May 2013 20:36

mwxml2sql

Dear Ariel,

I submitted the patches upstream.

0) Review

The changes may be found at <https://gerrit.wikimedia.org/r/64343/>.

1) Git

In the hope that others may find it useful, I am posting the sequence
of Git commands used, organized as a Makefile.

#-----------------------------------------------------------------------------+
# Makefile for submitting patches to the Wikimedia Foundation                 |
# Copyright (C) 2013 Dr. Kent L. Miller.  All rights reserved.                |
#                                                                             |
# This program is free software: you can redistribute it and/or modify        |
# it under the terms of the GNU General Public License as published by        |
# the Free Software Foundation, either version 3 of the License, or (at       |
# your option) any later version.                                             |
#                                                                             |
# This program is distributed in the hope that it will be useful, but         |
# WITHOUT ANY WARRANTY; without even the implied warranty of                  |
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU           |
# General Public License for more details.                                    |
#                                                                             |
# You should have received a copy of the GNU General Public License           |
# along with this program.  If not, see <http://www.gnu.org/licenses/>.       |
#-----------------------------------------------------------------------------+
GERRIT   = wpmirrordev@gerrit.wikimedia.org
PORT     = 29418
WMFLABS  = wpmirrordev@bastion.wmflabs.org

all: clone hooks pull checkout branch edit diff add commit rebase review

clone:
        # create directory `dumps' and initialize a repository in it
        # copy all commit objects and head references (from remote to local)
        # add `remote repository reference' named `origin' (saves typing)
        # add `remote heads' named `origin/[head-name]'
        # add `HEAD' to track `origin/master'
        git clone ssh://$(GERRIT):$(PORT)/operations/dumps

hooks:
        # get `commit-msg' hook, which adds a `Change-Id' line to each commit message
        scp -p -P $(PORT) $(GERRIT):hooks/commit-msg dumps/.git/hooks/.
        cd dumps; git review -s

pull:
        # list `remote heads'
        cd dumps; git branch -r
        # setup tracking branch `ariel'
        cd dumps; git branch --track ariel origin/ariel
        # add new commit objects (if any)
        # update `remote heads'
        cd dumps; git fetch origin
        # update `local heads' (`master' and `ariel') to `remote-heads'
        # merge `origin/HEAD' into `HEAD'
        cd dumps; git pull origin

checkout:
        # point `HEAD' to `ariel's commit object
        cd dumps; git checkout ariel
        cd dumps; git status

branch:
        # create head `wpmirrordev'
        # point `wpmirrordev' to `ariel's commit object
        cd dumps; git branch wpmirrordev ariel
        # point `HEAD' to `wpmirrordev's commit object
        cd dumps; git checkout wpmirrordev
        cd dumps; git status

edit:
        # apply patched files
        cp temp/* dumps/xmlfileutils/.

diff:
        # diff files (but not added files) against `HEAD'
        cd dumps; git diff
        # list changed files against `HEAD'
        cd dumps; git status

add:
        # stage the files to be committed
        cd dumps/xmlfileutils; git add mwxml2sql.c sql2txt.c sqlfilter.c
        cd dumps/xmlfileutils; git add Makefile
        # diff added files against `HEAD'
        cd dumps; git diff --cached
        # list changed files against `HEAD'
        cd dumps; git status

commit:
        # create `commit object'
        # point `HEAD' to the new `commit object'
        cd dumps; git commit -m "Fix for compatibility with help2man and Debian Policy"
        # list all commits from `HEAD' back to initial commit
        cd dumps; git log

rebase:
        # add new commit objects (if any)
        # update `remote head' `origin/ariel'
        # merge `origin/ariel' into `ariel'
        # point `ariel' to `origin/ariel's commit object
        cd dumps; git pull origin ariel
        # rebase `wpmirrordev' branch on updated `ariel' head
        cd dumps; git rebase ariel

review:
        # push changes to Gerrit
        cd dumps; git review -R ariel

#-----------------------------------------------------------------------------+

shell:
        ssh -A $(WMFLABS)

purge:
        rm -r dumps

clean:
        rm -f *~

Sincerely Yours,
Kent

wp mirror | 17 May 2013 08:18

mwxml2sql

Dear Ariel,

I am having some trouble submitting the patches.  On-line examples
that I have seen do not cover the case of submitting patches to a
branch.  Here is what I have tried so far:

0) clone, hooks, and review setup

(shell) git clone ssh://wpmirrordev@gerrit.wikimedia.org:29418/operations/dumps
(shell) scp -p -P 29418 wpmirrordev@gerrit.wikimedia.org:hooks/commit-msg ~/dumps/.git/hooks/.
(shell) cd dumps
(shell) git review -s
(shell) git status

1) pull the ariel branch

(shell) git pull origin master
(shell) git branch ariel
(shell) git pull origin ariel  # throws errors
(shell) git status # emits a long list of revisions

2) commit

(shell) git commit -a # needed to quell errors from the `pull'
(shell) git checkout ariel
(shell) git status

3) create branch for the patches

(shell) git branch wpmirrordev master
(shell) git checkout wpmirrordev
(shell) git status

4) apply patches

(shell) cp ../patched-files/* xmlfileutils/.
(shell) git diff
(shell) git status

5) commit the patches

(shell) cd xmlfileutils
(shell) git add mwxml2sql.c sql2txt.c sqlfilter.c Makefile
(shell) git diff --cached
(shell) git commit
(shell) git status

6) rebase

(shell) git pull origin master
(shell) git rebase master
(shell) cat .gitreview
[gerrit]
host=gerrit.wikimedia.org
port=29418
project=operations/dumps.git
<<<<<<< HEAD
defaultbranch=master
=======
defaultbranch=ariel
>>>>>>> 3b82bbea24f999f1a5af721d37ec0684615bc3ae
(shell) git checkout ariel  # need to fix error in .gitreview

7) submit for review

(shell) git review -R ariel
Enter passphrase for key '/home/wikimedia/.ssh/id_rsa':
Creating a git remote called "gerrit" that maps to:
        ssh://wpmirrordev@gerrit.wikimedia.org:29418/operations/dumps.git
Enter passphrase for key '/home/wikimedia/.ssh/id_rsa':
You have more than one commit that you are about to submit.
The outstanding commits are:

bf268a2 (HEAD, origin/master, origin/HEAD, gerrit/master, ariel) pep8 whitespaces fixing
6dff615 pep8: E302 expected 2 blank lines, found 1
671beaa .pep8 configuration file
1235eaa Merge "Add .gitreview file"
9a814c2 README is obsolete :(
67a61fa Add .gitreview file
47b6db7 add CC-BY_SA license for text, plus pointer to terms of use
689fa7c dump iwlinks table
e4bc572 Kill .cvsignore, svn ignore is doing the same
25f4a46 svn:eol-style native

Is this really what you meant to do?
Type 'yes' to confirm: yes
Enter passphrase for key '/home/wikimedia/.ssh/id_rsa':
X11 forwarding request failed on channel 0
remote: Processing changes: refs: 1, done
To ssh://wpmirrordev@gerrit.wikimedia.org:29418/operations/dumps.git
 ! [remote rejected] HEAD -> refs/publish/ariel/ariel (no new changes)
error: failed to push some refs to 'ssh://wpmirrordev@gerrit.wikimedia.org:29418/operations/dumps.git'
make: *** [review] Error 1
wikimedia <at> darkstar-7:~$ QDBusConnection: session D-Bus connection
created before QCoreApplication. Application may misbehave.
QDBusConnection: session D-Bus connection created before
QCoreApplication. Application may misbehave.
QSystemTrayIcon::setVisible: No Icon set
Connecting to deprecated signal
QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)

Any help is welcome.

Sincerely Yours,
Kent

Ariel T. Glenn | 16 May 2013 17:53

wikidata dumps woes

I'm seeing some issues with the history phase of the wikidata dumps
taking a huge amount of memory and causing the server they run on to
swap. I've shot the jobs and left fewer workers running on that host
for now; I'll investigate in depth tomorrow.

Ariel

Yannick Guigui | 16 May 2013 12:42

please about image

I am French. I am looking for a simple link to download all the Wikipedia images. I have already loaded all the XML dumps into my local MediaWiki, but I don't have the images. Would it also be possible to get reduced-size versions of the images? I think the full set takes up several gigabytes.

Thanks to all


--
guigui777
wp mirror | 7 May 2013 21:00

mwxml2sql

Dear Ariel,

0) INTRO

I am close to releasing WP-MIRROR 0.6.  It will exhibit reliability
and performance improvements in all areas of operation.

As a part of the development process, I have been testing `mwxml2sql'
with a view towards using it to replace `importDump.php' in WP-MIRROR
0.6.  These tests have worked out well.

There are however some issues that I should discuss with you.

1) Packaging

I distribute WP-MIRROR as a DEB package.  In order to use `mwxml2sql',
I would have to package your tools as a separate DEB package.  This I
have done.  However, in the process, I had to apply some patches; and
the question now arises as to how to submit them upstream.

2) Makefile

I have patched the `Makefile' that you distribute with `mwxml2sql'
because:  a) the `install' target must use `install' rather than `mv';
and b) it lacked a `deinstall' target.  Both changes are required by
Debian policy.
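
As a rough illustration of the two changes, the shell sketch below does by hand
what such targets would do: it stages a file with `install' and then removes it
again.  The stand-in script, the `DESTDIR' staging directory, and the `/usr'
prefix are assumptions chosen for the example; they are not the contents of the
actual patch.

```shell
#!/bin/sh
# Sketch only: mimics an `install' target and a matching removal target.
# DESTDIR, PREFIX, and the stand-in program are assumptions for illustration.
set -eu

WORK=$(mktemp -d)
DESTDIR="$WORK/stage"
PREFIX=/usr
BINDIR="$DESTDIR$PREFIX/bin"

# Stand-in for a compiled binary such as mwxml2sql.
printf '#!/bin/sh\necho stand-in\n' > "$WORK/mwxml2sql"

# install: create the directory, then copy with explicit permissions.
# Unlike `mv', this leaves the file in the build tree untouched.
install -d "$BINDIR"
install -m 755 "$WORK/mwxml2sql" "$BINDIR/mwxml2sql"
INSTALLED_OK=no
[ -x "$BINDIR/mwxml2sql" ] && [ -f "$WORK/mwxml2sql" ] && INSTALLED_OK=yes

# removal: delete exactly what the install step put in place.
rm -f "$BINDIR/mwxml2sql"
REMOVED_OK=no
[ ! -e "$BINDIR/mwxml2sql" ] && REMOVED_OK=yes

rm -rf "$WORK"
echo "installed=$INSTALLED_OK removed=$REMOVED_OK"
```

Staging under `DESTDIR' keeps the sketch from touching the real system, which
is also how package builds usually invoke an `install' target.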

3) Man pages

Man pages are also required by Debian policy.  To that end, I have
written man pages for `mwxml2sql', `sql2txt', and `sqlfilter'.
However, the better approach would be to patch those tools so that man
pages could automatically be generated using `help2man'.  The latter
approach has the benefit of eliminating duplication, and hence, helps
keep code and documentation in sync.
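
To make the idea concrete: `help2man' runs a program with `--help' and
`--version' and builds the man page from what it prints.  The sketch below uses
a hypothetical stand-in script, `demo-tool', whose name, version string, and
options are invented for the example and are not taken from `mwxml2sql'.

```shell
#!/bin/sh
# Sketch only: a stand-in tool with the `--help' / `--version' output that
# help2man reads.  The tool name, version string, and option list are
# hypothetical and are not taken from mwxml2sql.
set -eu

WORK=$(mktemp -d)
TOOL="$WORK/demo-tool"

cat > "$TOOL" <<'EOF'
#!/bin/sh
case "${1:-}" in
  --version)
    echo "demo-tool 0.1"
    ;;
  --help)
    echo "Usage: demo-tool [OPTION]... FILE"
    echo "Convert FILE (illustrative description only)."
    echo
    echo "  --help     display this help and exit"
    echo "  --version  output version information and exit"
    ;;
  *)
    echo "demo-tool: try --help" >&2
    exit 1
    ;;
esac
EOF
chmod 755 "$TOOL"

VERSION_LINE=$(sh "$TOOL" --version)
USAGE_LINE=$(sh "$TOOL" --help | head -n 1)

# Generate the man page only if help2man happens to be installed.
if command -v help2man >/dev/null 2>&1; then
    help2man --no-info --output="$WORK/demo-tool.1" "$TOOL" || true
fi

rm -rf "$WORK"
echo "$VERSION_LINE"
echo "$USAGE_LINE"
```

A tool that prints its usage and version in this shape needs no hand-written
man page, which is the duplication the patches would remove.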

4) Upstream

I would like to know:

a) if patches are welcome upstream; and, if so,
b) what procedures you prefer for submitting, reviewing, and applying
patches; and
c) whether you would prefer that I submit the man pages I wrote, or
submit patches to your utilities to make them compatible with
`help2man'.

Sincerely Yours,
Kent

yossi galanty | 3 May 2013 17:41

(no subject)



--
יוסי גלנטי
0502441015
galantyo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Giovanni Luca Ciampaglia | 2 May 2013 22:40

Pagecounts data missing (2009/09/21 - 2009/10/01)

Hi,

I noticed that some pagecounts data files are missing, namely the files in the
interval (20090921160000 - 20091001000000) (endpoints excluded).

See http://dumps.wikimedia.org/other/pagecounts-raw/2009/2009-09/

Does anybody know the reason why these data are missing?
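
For anyone who wants to check the gap, here is a small sketch that lists the
hourly timestamps strictly inside that interval.  It assumes the files are
hourly and named like `pagecounts-<stamp>.gz`; the exact naming is an
assumption and is not verified against the server.

```shell
#!/bin/sh
# Sketch only: list the hourly timestamps strictly between
# 20090921160000 and 20091001000000.  The file-name pattern mentioned in the
# comment below is an assumption about how the hourly files are named.
set -eu

missing_stamps() {
    day=21
    while [ "$day" -le 30 ]; do
        hour=0
        while [ "$hour" -le 23 ]; do
            # Skip 21 September up to and including 16:00 (lower end excluded).
            if [ "$day" -gt 21 ] || [ "$hour" -gt 16 ]; then
                # Assumed file name: pagecounts-<stamp>.gz
                printf '200909%02d-%02d0000\n' "$day" "$hour"
            fi
            hour=$((hour + 1))
        done
        day=$((day + 1))
    done
}

STAMPS=$(missing_stamps)
COUNT=$(printf '%s\n' "$STAMPS" | wc -l | tr -d ' ')
FIRST=$(printf '%s\n' "$STAMPS" | head -n 1)
LAST=$(printf '%s\n' "$STAMPS" | tail -n 1)
echo "$COUNT hourly slots, from $FIRST to $LAST"
```

The loop stops at 23:00 on 30 September because the upper endpoint,
midnight on 1 October, is excluded.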

Best,

--
Giovanni Luca Ciampaglia

Postdoctoral fellow
Center for Complex Networks and Systems Research
Indiana University

✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
☞ http://cnets.indiana.edu/
✉ gciampag <at> indiana.edu

Petr Onderka | 2 May 2013 15:40

Incremental XML dumps GSoC proposal

I realized I hadn't posted my proposal to the list yet (I added it to the official GSoC site a few days ago), so here it is:

http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps

In short, the project aims to create a new format for dumps (which allow users to download parts of the database of Wikimedia projects). The primary advantage of the new format is that creating a dump should take less time, because the previous dump can be reused.

Any comments or co-mentors (as far as I know, Ariel Glenn is currently the only potential mentor on this project) are welcome.

Petr Onderka
[[en:User:Svick]]
Yannick Guigui | 27 Apr 2013 12:42

Encoding problem with French dump

Hi everybody,

I'm French, so please excuse my English.

Please tell me if this is not the right place for my request for help.

I have a problem with my French copy of Wikipedia, imported from the XML dump: accented characters are displayed incorrectly.

When I installed MediaWiki, I chose InnoDB. This is my MySQL configuration:

Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 179
Server version: 5.5.8-log MySQL Community Server (GPL)
mysql> status
c:/wamp/bin/mysql/mysql5.5.8/bin/mysql.exe Ver 14.14 Distrib 5.5.8, for Win32 (x86)

Connection id:          179
Current database:
Current user:           root@localhost
SSL:                    Not in use
Using delimiter:        ;
Server version:         5.5.8-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             localhost via TCP/IP
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    cp850
Conn.  characterset:    cp850
TCP port:               3306
Uptime:                 3 hours 47 min 6 sec

Threads: 8  Questions: 35648  Slow queries: 3  Opens: 976  Flush tables: 1  Open tables: 50  Queries per second avg: 2.616

I'm using MWDumper. This is the batch file I use to run it:

set class=mwdumper.jar;driver_mysql.jar
set data="frwikis_fr.xml"
java -client -classpath %class% org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/my_wiki?user=root&password=" "--format=sql:1.5" %data% --default-character-set=utf8
pause

I don't know Java, but with this the transfer to the SQL database works. However, the accented characters are wrong when I retrieve articles. What can I do? Thanks a lot.
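
One thing that may be worth checking is the encoding of the database connection itself, since the status output reports latin1 and cp850 rather than UTF-8. The sketch below only builds an `--output` URL that carries `useUnicode` and `characterEncoding`, which are MySQL Connector/J connection properties. Whether MWDumper passes them through to the driver unchanged, and whether that resolves this case, are assumptions that have not been tested here.

```shell
#!/bin/sh
# Sketch only: build the --output URL with an explicit connection encoding.
# useUnicode and characterEncoding are MySQL Connector/J connection
# properties; whether they resolve this particular problem is an assumption
# that has not been tested here.
set -eu

DB_HOST=127.0.0.1
DB_NAME=my_wiki
DB_USER=root
DB_PASS=

OUTPUT_URL="mysql://$DB_HOST/$DB_NAME?user=$DB_USER&password=$DB_PASS&useUnicode=true&characterEncoding=UTF-8"

# The URL must stay quoted on the command line because of the `&' characters.
echo "--output=$OUTPUT_URL"
```

The database and table character sets would still need to be able to hold UTF-8 data for this to make a difference.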






--
guigui777
Wyatt Winters | 24 Apr 2013 02:18

GSoC 2013 - Incremental XML Dumps

Hello everyone!

My name is Wyatt, and I would like to present to you the first draft of my GSoC proposal, available here:
https://www.mediawiki.org/wiki/User:Wywin
and on the official Melange site. On Melange, should I strip out the MediaWiki syntax and convert the proposal to their formatting, or is it OK to leave it wikified?

I am not particularly familiar with mailing lists and their specific etiquette, so please correct me if I do anything too outrageous.

I look forward to your feedback, and hopefully working with you in the future, whether I am accepted for GSoC or not!

Wyatt Winters

