Rob Lanphier | 7 Jan 07:58 2011

Feature requirements for WMF's analytics infrastructure

Hi everyone,

I've only posted once before here, and didn't do much of an intro back
then, so let me do one now.  I'm the Program Manager for General
Engineering at Wikimedia Foundation, which is the slice of the WMF
Engineering organization that does infrastructure-related software
development.  One piece we're responsible for is the analytics
infrastructure.

We're in the process of planning our software development for
analytics for the coming months, so we've had a few conversations, and
Howie Fung and I spent some time planning and writing up our thoughts
on feature prioritization here:
http://strategy.wikimedia.org/wiki/Task_force/Analytics/Feature_prioritization

This is a really rough cut, and something we haven't fully discussed
within the Foundation, so don't take this as something that is coming
down from on high.  There are some things on the list that are well
underway, but many things are things we're just getting started on.

Barring any objections here, we'd like to use this mailing list as our
primary venue for discussing general prioritization of analytics
features.  We know we need a place that we can tell WMF employees to
subscribe if they're interested in this stuff, and nothing we're
discussing should be confidential.  Rather than starting a new mailing
list, we'd like to try using this list for a bit (in combination with
relevant talk pages on documents referenced here).  If it turns out
we're generating enough traffic to warrant splitting off or if this
list isn't working out for whatever reason, we'll figure out some
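alternate plan.

While we suspect that many of the details will be of specific interest
to Foundation employees (who are relying on much of this information
to perform their jobs effectively), we also know there is plenty of
general interest in this work.  Please feel free to share your
thoughts.

Thanks!
Rob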

Liam Wyatt | 10 Jan 05:35 2011

Re: Feature requirements for WMF's analytics infrastructure

Hi Rob,

For one, I'm super pleased that we're taking a holistic approach to
improving the analytics on WMF projects. I have been hoping that we
make it easier to extract x, y or z stats/metrics on an ad hoc basis,
but to actually get proper analytics built right in is a giant leap
beyond what I thought was possible.
And secondly, as far as I'm personally concerned, this research-l
mailing list would seem an appropriate place to host discussions about
the analytics project in the manner in which you described.

One question: as I understand it, one of the key priorities of this
analytics project is the installation of OpenWebAnalytics (which
AFAICT will be similar to GoogleAnalytics but open source and also
compliant with the WMF's stringent privacy policy). If so, will the
full array of anonymised analytics be visible to everyone live, or
will the results be released in a summarised format on a regular
basis? That is, will the public/wikimedians/press be able to see the
same thing that the WMF can see and at the same time?

Finally, if I may just throw in a little request to the "wishlist" -
one thing that GLAM partners would really like to be able to do is
easily produce for themselves a "report card" of their organisation's
relationship to Wikimedia over time. Currently, we make do with
producing ad hoc stats for them based mainly on Magnus' tools
(especially baGLAMa and GLAMorous) and other things like
linkypedia.inkdroid.org . It would be brilliant if a GLAM partner
could quickly and easily produce a *pretty* report that showed how
their images were being used (number of usages, number of views...),
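how our external links to their site were used (most popular referral
paths, total traffic, most linked-from categories...) and how articles
about things related to them are used (quality improvement over time,
combined pageviews for categories important to them...). Ideally, if
this could be generated as a report fit to show to senior management, I
suspect that we would have much greater success with enticing more
GLAMs to move towards free-culture. All "wishlist" stuff I know, but
I thought I might as well ask :-)

-Liam / witty lama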

Gerard Meijssen | 10 Jan 12:02 2011

Re: Feature requirements for WMF's analytics infrastructure

Hoi,
In order to celebrate 10 years and prepare for the next 10 years, there will be a hackathon in Amsterdam. We will concentrate on GLAM stuff. We hope that Erik Zachte will have time to come as well, though sadly that is doubtful. One of the things high on the list is to streamline some of Magnus' wonderful tools.
Thanks,
      GerardM

On 10 January 2011 05:35, Liam Wyatt <liamwyatt <at> gmail.com> wrote:
Hi Rob,

For one, I'm super pleased that we're taking a holistic approach to
improving the analytics on WMF projects. I have been hoping that we
make it easier to extract x, y or z stats/metrics on an ad hoc basis,
but to actually get proper analytics built right in is a giant leap
beyond what I thought was possible.
And secondly, as far as I'm personally concerned, this research-l
mailing list would seem an appropriate place to host discussions about
the analytics project in the manner in which you described.

One question: as I understand it, one of the key priorities of this
analytics project is the installation of OpenWebAnalytics (which
AFAICT will be similar to GoogleAnalytics but open source and also
compliant with the WMF's stringent privacy policy). If so, will the
full array of anonymised analytics be visible to everyone live, or
will the results be released in a summarised format on a regular
basis? That is, will the public/wikimedians/press be able to see the
same thing that the WMF can see and at the same time?

Finally, if I may just throw in a little request to the "wishlist" -
one thing that GLAM partners would really like to be able to do is
easily produce for themselves a "report card" of their organisation's
relationship to Wikimedia over time. Currently, we make do with
producing ad hoc stats for them based mainly on Magnus' tools
(especially baGLAMa and GLAMorous) and other things like
linkypedia.inkdroid.org . It would be brilliant if a GLAM partner
could quickly and easily produce a *pretty* report that showed how
their images were being used (number of usages, number of views...),
how our external links to their site were used (most popular referral
paths, total traffic, most linked-from categories...) and how articles
about things related to them are used (quality improvement over time,
combined pageviews for categories important to them...). Ideally, if
this could be generated as a report fit to show to senior management, I
suspect that we would have much greater success with enticing more
GLAMs to move towards free-culture. All "wishlist" stuff I know, but
I thought I might as well ask :-)

-Liam / witty lama

On 07/01/2011, Rob Lanphier <robla <at> wikimedia.org> wrote:
> Hi everyone,
>
> I've only posted once before here, and didn't do much of an intro back
> then, so let me do one now.  I'm the Program Manager for General
> Engineering at Wikimedia Foundation, which is the slice of the WMF
> Engineering organization that does infrastructure-related software
> development.  One piece we're responsible for is the analytics
> infrastructure.
>
> We're in the process of planning our software development for
> analytics for the coming months, so we've had a few conversations, and
> Howie Fung and I spent some time planning and writing up our thoughts
> on feature prioritization here:
> http://strategy.wikimedia.org/wiki/Task_force/Analytics/Feature_prioritization
>
> This is a really rough cut, and something we haven't fully discussed
> within the Foundation, so don't take this as something that is coming
> down from on high.  There are some things on the list that are well
> underway, but many things are things we're just getting started on.
>
> Barring any objections here, we'd like to use this mailing list as our
> primary venue for discussing general prioritization of analytics
> features.  We know we need a place that we can tell WMF employees to
> subscribe if they're interested in this stuff, and nothing we're
> discussing should be confidential.  Rather than starting a new mailing
> list, we'd like to try using this list for a bit (in combination with
> relevant talk pages on documents referenced here).  If it turns out
> we're generating enough traffic to warrant splitting off or if this
> list isn't working out for whatever reason, we'll figure out some
> alternate plan.
>
> While we suspect that many of the details will be of specific interest
> to Foundation employees (who are relying on much of this information
> to perform their jobs effectively), we also know there is plenty of
> general interest in this work.  Please feel free to share your
> thoughts.
>
> Thanks!
> Rob
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l <at> lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>


--
wittylama.com/blog
Peace, love & metadata

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l <at> lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Giovanni Luca Ciampaglia | 10 Jan 14:08 2011

ANN: notabilia.net: Visualizing deletion decisions on Wikipedia

Hello all,
It seems community discussions like those for AfDs or admin elections 
have received some attention in the literature recently.
Together with Dario Taraborelli and Moritz Stefaner we wanted to get a 
better perspective on them and tried to visualize them. Today we put
online a visualization of the longest AfD discussions on the
English Wikipedia.

Check it out: http://notabilia.net

Cheers,

-- 
Giovanni L. Ciampaglia
PhD Student
University of Lugano, MACS Lab
zh509 | 11 Jan 00:48 2011

Re: Feature requirements for WMF's analytics infrastructure

Hi, Rob,

It is really good that WMF is treating this project as non-confidential.
It offers the possibility of combining such results for further academic
use. Specifically, I am interested in the % new vs repeat and Minutes/Visit
(medium) metrics, which I thought were impossible to generate... I am
really a newbie on the technical side.

Zeyi

Rob Lanphier | 11 Jan 22:19 2011

Re: Feature requirements for WMF's analytics infrastructure

On Sun, Jan 9, 2011 at 8:35 PM, Liam Wyatt <liamwyatt <at> gmail.com> wrote:
> For one, I'm super pleased that we're taking a holistic approach to
> improving the analytics on WMF projects. I have been hoping that we
> make it easier to extract x, y or z stats/metrics on an ad hoc basis,
> but to actually get proper analytics built right in is a giant leap
> beyond what I thought was possible.
> And secondly, as far as I'm personally concerned, this research-l
> mailing list would seem an appropriate place to host discussions about
> the analytics project in the manner in which you described.

Excellent!  There's some wording on this page that caused me to be a
little timid about this:
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

"Internal Wikimedia matters, discussions of new projects and similar
threads should be kept off the list."

This is arguably an "internal Wikimedia matter", but I suspect that
wording was written long ago, and could use some tuning and
clarification.

> One question: as I understand it, one of the key priorities of this
> analytics project is the installation of OpenWebAnalytics (which
> AFAICT will be similar to GoogleAnalytics but open source and also
> compliant with the WMF's stringent privacy policy). If so, will the
> full array of anonymised analytics be visible to everyone live, or
> will the results be released in a summarised format on a regular
> basis? That is, will the public/wikimedians/press be able to see the
> same thing that the WMF can see and at the same time?

Not yet.  We've discussed how to make this possible, but I think
there's a lot of work left to do to make this a reality.  We'd need to
make sure of a couple of things:
1.  That the only thing we're providing is a fully sanitized view of the data
2.  That any user interface that we expose via a public web page goes
through a much more rigorous security review

For the first item, it's worth discussing on one of the OWA mailing lists:
http://wiki.openwebanalytics.com/index.php?title=Support

I'll also make Peter aware of this thread so he knows what's going on.

> Finally, if I may just throw in a little request to the "wishlist" -
> one thing that GLAM partners would really like to be able to do is
> easily produce for themselves a "report card" of their organisation's
> relationship to Wikimedia over time. Currently, we make do with
> producing ad hoc stats for them based mainly on Magnus' tools
> (especially baGLAMa and GLAMorous) and other things like
> linkypedia.inkdroid.org . It would be brilliant if a GLAM partner
> could quickly and easily produce a *pretty* report that showed how
> their images were being used (number of usages, number of views...),
> how our external links to their site were used (most popular referral
> paths, total traffic, most linked-from categories...) and how articles
> about things related to them are used (quality improvement over time,
> combined pageviews for categories important to them...). Ideally, if
> this could be generated as a report fit to show to senior management, I
> suspect that we would have much greater success with enticing more
> GLAMs to move towards free-culture. All "wishlist" stuff I know, but
> I thought I might as well ask :-)

By all means.  I'm wondering what the most sensible way is to organize
and vet all of the community wishlist issues.  What I'd like to do is
make sure we have a bulleted summary or a query somewhere that we can
march through during the meetings we have at WMF about priority
setting. If it's buried in an email thread, it's going to get lost.
Where do you think is the most sensible place to ask people to put
these requests that works for everyone?

Rob
Christoph LANGE | 12 Jan 00:49 2011

Call for Papers & Demos: Semantic Publication Workshop SePublica @ ESWC (May 29 or 30, Crete, Greece) – Deadline Feb 28

1st International Workshop on Semantic Publication (SePublica 2011)
http://sepublica.mywikipaper.org
at the 8th Extended Semantic Web Conference (ESWC 2011)
http://www.eswc2011.org
May 29th or 30th, Hersonissos, Crete, Greece
Keynote by Steve Pettifer, Manchester University, UK.
“Utopia Documents and The Semantic Biochemical Journal experiment”

SUBMISSION DEADLINE February 28

The MISSION of the SePublica workshop is to bring together researchers
and practitioners dealing with different aspects of Semantic
Technologies in the Publishing Industry. How is the Semantic Web
impacting the publishing industry? How is our experience of
publications changing because of Semantic Web technologies being
applied to the publishing industry?

The CHALLENGE of the Semantic Web is to allow the Web to move from a
dissemination platform to an interactive platform for networked
information. The Semantic Web promises to “fundamentally change our
experience of the Web”.

In spite of improvements in the distribution, accessibility and
retrieval of information, little has changed in the publishing
industry so far. The Web has succeeded as a dissemination platform for
scientific and non-scientific papers, news, and communication in
general; however, most of that information remains locked up in
discrete documents, which are poorly interconnected to one another and
to the Web.

The connectivity tissues provided by RDF technology and the Social Web
have barely made an impact on scientific communication or on ebook
publishing, whether on the format of publications or on repositories
and digital libraries. The worst problem is in accessing and reusing
the computable data which the literature represents and describes.

• Consider research publications: Data sets and code are essential
elements of data intensive research, but these are absent when the
research is recorded and preserved in perpetuity by way of a scholarly
journal article.
• Or consider news reports: Governments increasingly make public
sector information available on the Web, and reporters use it, but
news reports very rarely contain fine-grained links to such data
sources.

QUESTIONS AND TOPICS OF INTEREST

• What does a network of truly interconnected papers look like?
How could interoperability across documents be enabled?
• How could concept-centric social networks emerge?
• Are blogs and wikis new means for scholarly communication?
• What lessons can be learned from humanities and social science publishers
(i.e. going beyond scientific publishing towards scholarly publishing)?
• How could we move beyond the PDF?
How can we embed and link semantics in EPUB and other e-book formats?
• How are digital libraries related to semantic e-science?
What is the relationship between a paper and its digital library?
• How could we realize a paper with an API?
How could we have a paper as a database, as a knowledge base?
• How is the paper an interface, gateway, to the web of data?
How could such an interface be delivered in a contextual manner?
• How could RDF(a) and ontologies be used to represent the knowledge encoded
in scientific documents and in general-interest media publications?
• What ontologies do we need for representing structural elements in a
document?
• How can we capture the semantics of rhetorical structures in
scholarly communication, and of hypotheses and scientific evidence?

AUDIENCE

• researchers from diverse backgrounds such as argumentative
structures, scholarly communication, multi-modality in publications,
digital libraries, semantics in publications, and ontology
engineering.
• practitioners active in the publishing industry, repositories of
experimental information and document standards.

IMPORTANT DATES

Paper/Demo Submission Deadline: February 28, 23:59 Hawaii Time
Acceptance Notification: April 1
Camera Ready Version: April 15
SePublica Workshop: May 29 or May 30 (to be announced)

SUBMISSION AND PROCEEDINGS

Research papers are limited to 12 pages and position papers to 5
pages. For system descriptions, a 5 page paper should be
submitted. All papers and system descriptions should be formatted
according to the LNCS format:

http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0

We encourage the submission of semantic documents. LaTeX documents in
the LNCS format can, e.g., be annotated using SALT
(http://salt.semanticauthoring.org) or sTeX
(http://trac.kwarc.info/sTeX/). We also invite submissions in
XHTML+RDFa or in the format of YOUR semantic publishing tool.
However, to ensure a fair review procedure, authors must additionally
export them to PDF.  For submissions that are not in the LNCS PDF
format, 400 words count as one page. Submissions that exceed the page
limit will be rejected without review.

Depending on the number and quality of submissions, authors might
be invited to present their papers during a poster session.

Please submit your paper via EasyChair at
http://www.easychair.org/conferences/?conf=sepublica2011

The author list does not need to be anonymized, as we do not have a
double-blind review process in place.

Submissions will be peer reviewed by three independent
reviewers. Accepted papers have to be presented at the workshop
(requires registering for the ESWC conference and the workshop) and
will be included in the workshop proceedings that are published online
at CEUR-WS.

PROGRAM COMMITTEE

• Robert Stevens, Manchester University, UK
• Benjamin Good, Genomic Institute, Novartis, USA
• Michael Kohlhase, Jacobs University, Germany
• Oscar Corcho, Politecnica de Madrid, Spain
• Steve Pettifer, Manchester University, UK
• Jodi Schneider, DERI, NUI Galway, Ireland
• Sebastian Kruk, knowledgehives.com, Poland
• Henrik Eriksson, Linköping University, Sweden
• Dagobert Soergel, University of Maryland, USA
• Tim Clark, Harvard Medical School, USA
• Paolo Ciccarese, Harvard Medical School, USA

ORGANIZING COMMITTEE

• Alexander García Castro, University of Bremen, Germany
• Christoph Lange, Jacobs University Bremen, Germany
• Anita de Waard, Elsevier, USA/Netherlands
• Evan Sandhaus, New York Times, USA

QUESTIONS? → sepublica <at> googlegroups.com

-- 
Christoph Lange, Jacobs Univ. Bremen, http://kwarc.info/clange, Skype
duke4701
Semantic Publication workshop, May 29 or May 30, Hersonissos, Crete, Greece
Submission deadline February 28, http://SePublica.mywikipaper.org

phoebe ayers | 13 Jan 21:53 2011

Pew Research Report on Wikipedia

As you may all have seen, there is tons of media coverage coming out
around Wikipedia's 10th anniversary (Jan 15, 2011). In the midst of
this, the Pew Internet Research Center released a new report today:

"Wikipedia, past and present"
http://pewinternet.org/Reports/2011/Wikipedia.aspx

-- phoebe

-- 
* I use this address for lists; send personal messages to phoebe.ayers
<at> gmail.com *
Joseph Reagle | 13 Jan 22:06 2011

Re: Pew Research Report on Wikipedia

On Thursday, January 13, 2011, phoebe ayers wrote:
> "Wikipedia, past and present"
> http://pewinternet.org/Reports/2011/Wikipedia.aspx

Given how much Google juice WP has, I find it unintuitive that only
"53% of adult internet users" "use Wikipedia to look for information".
I thought this low number might be due to people thinking this means
they type the query into Wikipedia itself. Pew is always thorough, so
looking for the questions I see [1] and infer the question was:

> Thinking about your internet use overall... Please tell me if you
> ever use the internet to do any of the following things. Do you ever
> use the internet to [Look for information on Wikipedia] ? / Did you
> happen to do this yesterday, or not?

...?

[1]: http://pewinternet.org/Shared-Content/Data-Sets/2010/May-2010--Cell-Phones.aspx
Steven Walling | 13 Jan 22:16 2011

Re: Pew Research Report on Wikipedia

Just a reminder that Pew is exclusive to the U.S., so that's 53% of American adult internet users using Wikipedia.


Steven Walling


