Michael Fourman | 1 Jan 14:58

Re: Mendeley: beyond PDF, annotations in general, other thoughts (some OT)

One, as yet undocumented, iDEA lab project at Edinburgh is to generate topic indexes for browsing
relatively large collections (currently several thousand, planning for 10x - 100x that) of academic
papers.

(See http://homepages.inf.ed.ac.uk/mfourman/research/topics/uoe.xml for an early test example.
Best viewed with a WebKit browser [Safari, Chrome], but it also works with the latest Firefox [with some UI features missing].)

We're mining online PDF texts, and find that around one third of the PDFs that academics at Edinburgh
publish online don't easily yield text.

I have slightly different needs from someone wanting a text version for annotation (I just need a bag of
words). I'm resorting to OCR, using a combination of convert (ImageMagick), tesseract
(code.google.com/p/tesseract-ocr/), aspell, and a stemmer to produce the bag of words I need.
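
A minimal Python sketch of a pipeline along these lines, offered as an illustration rather than the actual iDEA lab code. It assumes the `convert` and `tesseract` command-line tools are installed and that tesseract accepts PNG input; the 300 dpi setting, the function names, the crude suffix-stripping stemmer, and the plain vocabulary set standing in for the aspell check are all assumptions made for the example.

```python
import re
import subprocess
from collections import Counter
from pathlib import Path


def ocr_pdf(pdf_path, work_dir):
    """Rasterise a PDF with ImageMagick, then OCR each page with tesseract."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    # 300 dpi is an assumed resolution, not a figure from the post.
    subprocess.run(
        ["convert", "-density", "300", str(pdf_path), str(work / "page-%03d.png")],
        check=True,
    )
    pages = []
    for image in sorted(work.glob("page-*.png")):
        base = image.with_suffix("")
        # tesseract writes its recognised text to <base>.txt
        subprocess.run(["tesseract", str(image), str(base)], check=True)
        pages.append(Path(f"{base}.txt").read_text(encoding="utf-8", errors="replace"))
    return "\n".join(pages)


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text, vocabulary=None):
    """Count stemmed tokens; if a vocabulary is given, drop unknown words first.

    The vocabulary filter plays the role of the spell-check step: OCR noise
    that is not a known word never reaches the bag.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return Counter(crude_stem(t) for t in tokens)
```

In use, something like `bag_of_words(ocr_pdf("paper.pdf", "ocr-tmp"), vocabulary=words)` would give the counts for one paper, where `words` is a set of known words loaded from whatever dictionary is to hand. Filtering before stemming means the dictionary only needs to contain ordinary inflected forms.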

The ocropus project, which also builds on tesseract, may be closer to what you want. (code.google.com/p/ocropus/)

VelOCRaptor (http://blog.velocraptor.com/) provides an OS X tool (not open, but based on ocropus) for
using OCR to add searchable text to PDFs.

It would be good to establish an open version of something similar, together with tools for manual
correction, and learning from manual corrections to improve automation. I plan to propose an MSc project
along these lines.

With best wishes for the New Year,

Michael

On 1 Jan 2010, at 12:00, okfn-discuss-request@... wrote:

> On Fri, Dec 4, 2009 at 9:44 AM, Philippe Aigrain

philippe.aigrain | 2 Jan 12:25

Re: Mendeley: beyond PDF, annotations in general, other thoughts (some OT)

Hi Luis,

Right now, you have two ways to make enquiries about the future of COMT /
co-ment:
- Creating a ticket on the community site at www.co-ment.org
or
- Writing to contact@... (which is just an alias, not an email list)

However, it is of course perfectly OK to ask here (OKFN is part of our
home in the universe).

So the answer to your question is:
- COMT supports markdown import today. You can use it either by installing
the alpha version (http://www.co-ment.org) or by using a demo workspace
that we have created for testers. We would be pleased to count you in, as
well as any other interested OKFN person. Just confirm to my email address
(in copy). I will separately copy the co-ment developers.
- Somewhere between 20 January and the end of January we will put in place a
new generation of co-ment Web services based on COMT.

Finally, please note that one key enabler for expanding co-ment takeup
would be an OpenDocument to markdown converter (meaning cleaning up all the
styles while keeping some basic structure). John MacFarlane (philosopher
and developer of pandoc) had made some preliminary efforts at this. A
M$Word to markdown converter (without going through OpenDocument) would
also be useful, but is clearly more stupidly complex work.

Best,

Philippe

Jonathan Gray | 4 Jan 18:44

Lingoes Project

I just came across the Lingoes Project which seems to claim that the
Open Knowledge Foundation endorses it in some way:

  http://www.lingoes.pk/

"Thanks to the Lingoes Project, the wait is over, finally! The Open
Knowledge Foundation is most happy to announce that there is now one
such desktop software available that can take care of all your
dictionary requirements, absolutely FREE!"

If it's the same OKF, I'm not quite sure where this claim comes from.
This is the first I've heard of the project. Does anyone else know
anything about it? Furthermore, the material doesn't look very open.
Seems to be under a non-commercial license:

  http://www.lingoes.pk/2009/12/license-agreement.html

Jonathan

---------- Forwarded message ----------

Your website says:

"The Open Knowledge Foundation is most happy to announce that there is
now one such desktop software available that can take care of all your
dictionary requirements, absolutely FREE!"

Could you please clarify in what way the Open Knowledge Foundation
endorses this product?


Jonathan Gray | 5 Jan 12:54

IRC meeting tonight (2010-01-05) from 1800 GMT on #okfn at irc.oftc.net

Hi all,

We're planning to meet up tonight on the #okfn IRC channel
(irc.oftc.net) at 1800 GMT.

You can connect using a web browser at: http://ur1.ca/4fh

If anyone's got anything they'd like to discuss, propose or work on,
please drop in!

Details of this and other (virtual) events are on the Google Calendar at:

  http://wiki.okfn.org/Events

-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://www.okfn.org

Jonathan Gray | 5 Jan 18:23

Fwd: Drumbeat is hiring. Any ideas? Can you forward?

From Mark Surman at Mozilla Foundation (who recently joined OKF's
Advisory Board)...

Drumbeat definitely looks interesting, and I look forward to watching it
as it develops!

Jonathan

---------- Forwarded message ----------
Hey Jonathan

We're looking to hire two 'project producers' for Drumbeat over
the next month or so -- people who can find and lead good projects,
and also coach others. One in Paris and one in Mtn View.

https://wiki.mozilla.org/Drumbeat/jobs/pm

These need to be top notch people who really get the open web *at the
normal internet user level* and know how to get people participating.

My guess is you might know good people for this. If so, could you
forward or put me directly in touch? Also, can you forward to lists
you're on?

Thanks .... MS

PS. Here is the post in text form:

** Mozilla looking for two amazing open web 'project producers'
Location: One position in Paris, France and another Mountain View, CA

Jonathan Gray | 7 Jan 12:10

JISC funding frozen

For anyone who hasn't seen this yet...

---------- Forwarded message ----------
From: Malcolm READ [7230] <M.Read@...>
Date: Tue, Jan 5, 2010 at 4:27 PM
Subject: Re: JISC Capital Programme: important announcement
To: JISC-ANNOUNCE@...

The recent grant letter from the Department for Business, Innovation
and Skills (BIS) to HEFCE means that HEFCE are having to give careful
consideration as to how funding reductions arising from the grant
letter will be passed on to institutions.

In the light of the grant letter, HEFCE have asked that JISC make no
further commitments of capital funds ahead of the HEFCE Board meeting
on 28 January at which decisions on funding allocations will be made.
With this in mind, JISC is ‘freezing’ all current capitally-funded
calls and ITTs, and will regrettably not be issuing any new calls or
ITTs, until the funding situation is clearer.

Malcolm Read

-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://www.okfn.org

Jonathan Gray | 8 Jan 13:15

CFP for 14th European Conference on Digital Libraries, Glasgow, September 2010

It would be great to submit something to this about Public Domain
Works, the Public Domain Calculators, Open Bibliographic Metadata,
etc.

Sara: is this of interest to you?

Jonathan

---------- Forwarded message ----------
14th European Conference on Digital Libraries

September 6-10, 2010

Glasgow, UK

http://www.ecdl2010.org

Call for Contributions

Overview

The European Conference on Digital Libraries (ECDL) is the leading
European scientific forum on digital libraries and associated
technical, practical, and social issues, bringing together
researchers, developers, content providers and users in the field.
ECDL 2010, the 14th conference in this series, will be organised by
the University of Glasgow. The proceedings will be published as a
volume of Springer’s Lecture Notes in Computer Science (LNCS) series.

Topics of interest include, but are not limited to:

Jonathan Gray | 10 Jan 02:53

Fwd: freetable.org first draft specification

Thought this might be of interest. Text file pasted below.

Jonathan

---------- Forwarded message ----------

Attached is the first draft specification for freetable.org.

I am sending it to you because you indicated via the freetable.org
website an interest in helping freetable as a developer.

Feedback on the spec is welcomed.

thanks,
gordon

[This is all very preliminary, alterations welcome.]

freetable.org specification
***************************

Version 0.1

We provide a hosted real time repository for shared data.

Organizational concerns
=======================

Organizational structure
------------------------

Jonathan Gray | 10 Jan 11:48

Re: Lingoes Project

Thanks for your email! I now understand the name - and glad to hear
you are starting an OKF in Pakistan. :-)

I wonder whether you'd consider putting the content under, e.g.,
either CC-BY or CC-BY-SA or using a license for data such as those at:

  http://www.opendatacommons.org/

Also might you be interested in translating the Open Knowledge
Definition (OKD) into Urdu or other Pakistani languages?

  http://opendefinition.org/

Best wishes,

Jonathan

On Tue, Jan 5, 2010 at 6:04 AM, Team @ Lingoes.pk <team@...> wrote:
> hi Jonathan
>>
>> If its the same OKF, I'm not quite sure where this claim comes from.
>> This is the first I've heard of the project. Does anyone else know
>> anything about it? Furthermore, the material doesn't look very open.
>> Seems to be under a non-commercial license:
>>
>>  http://www.lingoes.pk/2009/12/license-agreement.html
>
> no it's NOT the same OKF. we're a group from Pakistan focusing on promoting
> Open Source / Public Domain knowledge works to Pakistani users. WE DONT SELL
> ANYTHING. and We dont have any affiliation whatsoever with okfn.org. We

Jonathan Gray | 10 Jan 14:51

Weaving History and related work...

Hi all,

We've been doing a bit of work behind the scenes on Weaving History
(http://www.weavinghistory.org/) and suchlike - but I think we really
need to find someone to be our 'point person' in this area.

This might include being main contact for the WH project - and
associated work. Also it would be great to strengthen the network of
people working on this kind of thing - from Simile's timeline suite,
to people publishing time/geo tagged data, to visualisation experts,
to contacting relevant research institutions, such as The Virginia
Center for Digital History. Perhaps in the first instance, we could
start a working group and start having regular meetings to discuss
relevant developments and to try to keep things ticking!

In addition to doing technical work on the Weaving History software, I
think it would also be useful to better articulate the longer term
vision here. At the end of the day, we are not interested in a single
website, so much as the underlying open-source technology to visually
represent spatio-temporal information.

Some thoughts and questions:

  * Using bibliographic metadata to represent authors and works? This
would be an excellent way to flesh out the intellectual culture of a
certain place/period.
  * Time-based visualisations. E.g. having a time slider, which would
show events on the map appearing and disappearing.
  * Representing regions rather than just points. E.g. how national
borders have changed over time.