kragen | 3 Oct 2005 09:37

Reclaiming the Oxford English Dictionary for the public

The Oxford English Dictionary, generously supported by the Oxford
University Press, is one of the earliest instances of what are now
called "pro-am" or "commons-based peer production" projects.  From
1857 to 1928, thousands of readers collected examples of uses of words
their dictionaries didn't define; they mailed these examples on slips
of paper to a small number of editors, who undertook to collate them
into a dictionary.  From 1884 to 1928, these editors published their
work in fascicles, mostly in alphabetical order.
<http://en.wikipedia.org/wiki/Oxford_English_Dictionary --- Wikipedia
article "Oxford English Dictionary">

In recent years, with the advent of public access to the internet, it
has become apparent that commons-based peer production works best when
no single party can restrict the uses of the end product; more people
can use it, it can be put to more uses, poor coordinators can be
replaced, and contributors have assurance that they will be able to
use their own work.  <http://perens.com/Articles/Economic.html ---
"The Emerging Economic Paradigm of Open Source", by Bruce Perens;
http://www.benkler.org/CoasesPenguin.html --- "Coase's Penguin, or
Linux and the Nature of the Firm", by Yochai Benkler>

This form of commons-based peer production of information, in which
the end product can be studied, copied, modified, and used freely, is
often called "Open Source
development". <http://opensource.org/docs/definition.php --- "The Open
Source Definition, Version 1.9", promulgated by the Open Source
Initiative; http://www.catb.org/~esr/writings/cathedral-bazaar/ ---
"The Cathedral and the Bazaar", by Eric S. Raymond> It got this name
because it started with software whose source code was freely
available for all these purposes, also known as "free software"
(Continue reading)

kragen | 6 Oct 2005 09:37

optimizing SQL in web apps by waiting until the last minute

So I consulted briefly on a project that was using a semistructured
data store for everything.  Everything was a graph, and they were
having a hard time getting it to all run at a reasonable speed,
because every time they touched a node, they issued a SQL query to
pull all the edges related to it out of the database.  Since typical
pages showed a few important properties from a large number of objects
(think: first and last names of all of your friends), every page
actually copied a substantial fraction of the data out of the
database.

On the other hand, it seemed that if you composed a SQL query (some
horrendous multi-way self-join) that fetched only the needed data from
the database, MySQL's query optimizer would do a good enough job
optimizing the query that it would fetch the needed data in a small
fraction of a second.
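
A minimal sketch of the contrast, assuming a hypothetical three-column
"edges" table and using SQLite through Python's sqlite3 module in
place of MySQL.  The table layout, the sample data, and both function
names are made up for illustration; they are not the project's actual
schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("alice", "friend", "bob"),
    ("alice", "friend", "carol"),
    ("bob", "first_name", "Bob"),
    ("bob", "last_name", "Jones"),
    ("bob", "phone", "555-0100"),
    ("carol", "first_name", "Carol"),
    ("carol", "last_name", "Smith"),
    ("carol", "phone", "555-0101"),
])


def friends_names_per_node(person):
    """Touch each node separately, pulling all of its edges every time."""
    queries = 0

    def edges_of(node):
        nonlocal queries
        queries += 1
        return conn.execute(
            "SELECT predicate, object FROM edges WHERE subject = ?",
            (node,)).fetchall()

    names = []
    for predicate, friend in edges_of(person):
        if predicate != "friend":
            continue
        props = dict(edges_of(friend))  # also fetches the unused phone edge
        names.append((props["first_name"], props["last_name"]))
    return sorted(names), queries


def friends_names_joined(person):
    """Fetch only the needed properties with one multi-way self-join."""
    rows = conn.execute("""
        SELECT fn.object, ln.object
        FROM edges AS f
        JOIN edges AS fn
          ON fn.subject = f.object AND fn.predicate = 'first_name'
        JOIN edges AS ln
          ON ln.subject = f.object AND ln.predicate = 'last_name'
        WHERE f.subject = ? AND f.predicate = 'friend'
    """, (person,)).fetchall()
    return sorted(rows), 1


per_node_names, per_node_queries = friends_names_per_node("alice")
joined_names, joined_queries = friends_names_joined("alice")
print(per_node_names, per_node_queries)
print(joined_names, joined_queries)
```

Both functions return the same names; the per-node version issues one
query per node touched, so its query count grows with the number of
friends, while the joined version always issues one.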

I'd experienced this before on a database-backed web site project, and
the cause was that the fetch_friends() method (or get_all_rogue_aps()
or whatever) didn't have enough knowledge of the context it was being
called in to know what other information its caller would want out of
the database; but it had to fetch the results immediately so its
caller could use them.  Using MySQL merely as a triple store, as
Class::RDF does, just made the problem ten times worse.

On my way to the grocery store, I was thinking about this, and about
Erlang's nested "IO lists" of characters for constructing output, and
about HTML templates, and my own implementation of part of the
relational algebra in Python with lazy compilation to SQL ("prototype
SchemeQL/Roe-like thing in Python",
http://lists.canonical.org/pipermail/kragen-hacks/2004-April/000394.html).
(Continue reading)

kragen | 10 Oct 2005 09:37

Queer numbers: more aggressive purple numbers

"Purple numbers" are a way to let people link to every paragraph of
every web page you publish, by adding unobtrusive short serial numbers
to every newly created paragraph.  Chris Dent and Eugene Eric Kim
invented them (I think), and they've been implemented in IRC-logging
software, Wikis, and blogging software.  In PurpleWiki, you can even
transclude paragraphs from arbitrary Wiki pages by purple number, so
that edits to those other Wiki pages are immediately reflected in the
page that transcludes them.

But purple numbers only solve the problem for sites you control.  I'd
often like to link to, or transclude, some paragraph in a page I don't
control.  How can I do this, without the site owner providing any
stable naming scheme for paragraphs?

A few years ago, some folks [find reference!] came up with a "lexical
signature" scheme for persistent naming of web pages.  They picked the
five nonunique words from the page with the highest TF/IDF scores,
using the entire Web as the corpus of reference for these scores.
Then they could use a full-text search engine such as AltaVista or
Google to find the document if its author irresponsibly allowed its
URL to become invalid.  (Unique words were excluded because they were
usually misspellings.)

Their scheme appended the lexical signature as query terms to the URL,
as in "?lexical_signature=purple+wiki+transclude+lexical+idf".  The
idea was to modify web browsers to automatically indirect through a
search engine for these links.
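
A rough sketch of how such a signature might be computed, assuming a
tiny in-memory reference corpus instead of the entire Web.  The
function names, the tokenizer, and the exact TF/IDF weighting
(term count times log of inverse document frequency) are my
assumptions here, not necessarily what the original scheme used.

```python
import math
import re
from collections import Counter
from urllib.parse import urlencode


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page, corpus, size=5):
    """Return the `size` words of `page` with the highest TF/IDF scores.

    `corpus` is a list of other documents.  Words that appear in none of
    them are treated as unique to the page and excluded.
    """
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    scores = {}
    for word, tf in Counter(tokenize(page)).items():
        df = doc_freq[word]
        if df == 0:
            continue  # unique word: probably a misspelling
        scores[word] = tf * math.log(len(corpus) / df)
    # Highest score first; break ties alphabetically for repeatability.
    return sorted(scores, key=lambda w: (-scores[w], w))[:size]


def signed_url(url, words):
    """Append the signature as a lexical_signature query parameter."""
    return url + "?" + urlencode({"lexical_signature": " ".join(words)})


page = "purple numbers let wiki pages transclude purple paragraphs"
corpus = [
    "purple rain",
    "a wiki is a set of pages",
    "pages and pages of numbers",
    "transclude this",
]
signature = lexical_signature(page, corpus)
print(signed_url("http://example.com/page", signature))
```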

Clearly you could do the same thing for paragraphs with a fragment
identifier:
(Continue reading)

kragen | 13 Oct 2005 09:37

cost to install surveillance cameras in public places

Suppose you wanted to plant a hidden camera for some long period of
time and capture photos of all that went past.  You'd like to never
again have to enter the place where it's hidden, and only visit it
rarely; you'd like it to be small; and you'd like it to last a long
time.  For example, the book "The Social Life of Small Urban Spaces"
was based on a few years of research in this vein using Super 8
cameras for time-lapse photography.  It appears to me that this
equipment should now be incredibly cheap.

USB "webcams" that capture 100-kB 640x480 JPEGs are on the order of
$10.  I think 4-port USB hubs (again, on the order of $10) contain all
the hardware necessary to act as USB host controllers; one could
imagine integrating the USB hub hardware with a small single-board
computer with SD/MMC and Bluetooth interfaces, for a total cost on the
order of $50 plus up to 4 cameras and their USB cables, and an MMC
card ($50-$110).

This device would presently be limited in smallness only by the size
of its power supply, USB ports, and multi-chip integration, so it
could be concealed in many places.  It could probably run on 200mW
while active (for less than a second at a time) and <1mW when idle.

You could drop by periodically with an inconspicuous Bluetooth device,
such as a cellphone or laptop, to download the pictures (say, 4
cameras * 100kB/shot/camera * 4 shots / minute * 60 minutes/hour * 24
hours/day = 2.3GB/day; but a single camera at one shot per minute is
only 144MB/day).  Anyone snooping over Bluetooth at the time could tell
that a lot of data was being sent over Bluetooth (1megabit/sec? not
sure; but at that speed you'd have to spend about 18,400 seconds, some
five hours, in the vicinity to download 2.3GB, or about 1150 seconds
for 144MB).
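
The data-volume arithmetic can be checked with a few lines of Python.
This is only a sketch: the function names are made up, kB and GB are
taken as decimal units, and the 1 megabit/sec Bluetooth rate is the
guess from the text, not a measured figure.

```python
def bytes_per_day(cameras, bytes_per_shot, shots_per_minute):
    """Total bytes captured in 24 hours."""
    return cameras * bytes_per_shot * shots_per_minute * 60 * 24


def transfer_seconds(num_bytes, bits_per_second):
    """Time to move num_bytes over a link, ignoring protocol overhead."""
    return num_bytes * 8 / bits_per_second


BLUETOOTH_BPS = 1_000_000  # assumed: 1 megabit/sec

full = bytes_per_day(cameras=4, bytes_per_shot=100_000, shots_per_minute=4)
sparse = bytes_per_day(cameras=1, bytes_per_shot=100_000, shots_per_minute=1)

print(full / 1e9, "GB/day,", transfer_seconds(full, BLUETOOTH_BPS), "s")
print(sparse / 1e6, "MB/day,", transfer_seconds(sparse, BLUETOOTH_BPS), "s")
```

The 144MB/day figure corresponds to a single camera taking one shot
per minute; four cameras at that rate would produce four times as
much.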

(Continue reading)

kragen | 17 Oct 2005 09:37

audio outliners for hypertext navigation

A "tree view" is a way of exploring tree-structured data; at any given
time, a subset of the tree is displayed, and clicking on any node
either displays its children or hides all its descendants.  An editable tree
view, where you can easily change the contents of any node or copy or
move it elsewhere in the tree, is often called an "outliner".
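
As a minimal sketch of the display half of this (not the editing
half), assuming a hypothetical Node class with a per-node "expanded"
flag:

```python
class Node:
    """A tree-view node: a label, child nodes, and an expanded flag."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.expanded = False

    def toggle(self):
        """What a click does: show the children, or hide all descendants."""
        self.expanded = not self.expanded

    def render(self, depth=0):
        """Return the currently visible lines, indented by depth."""
        lines = ["  " * depth + self.label]
        if self.expanded:
            for child in self.children:
                lines.extend(child.render(depth + 1))
        return lines


a = Node("a", [Node("a1")])
root = Node("root", [a, Node("b")])
root.toggle()  # show root's children
a.toggle()     # show a's children
print("\n".join(root.render()))
```

Collapsing a node hides its whole subtree without forgetting which
descendants were expanded, so reopening it restores the earlier view.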

Some people are big fans of outliners as universal user interfaces,
notably Douglas Engelbart and Dave Winer.  They have the advantage
that the display at any given time can include data from many
different sources, but still remains essentially linear.

I wrote recently about the art of summarization or rubrication as an
aid to navigation and retrieval.  These days, when I want to find a
web page on some topic, I might perform a Google query; this yields
ten web page titles, which I generally read in sequence, and when I
find one that sounds promising, I read the extract.  If it sounds
sufficiently interesting, I might follow through to the page itself,
which may have its own table of contents summarizing different
sections of the document.

This sounds a lot like outlining, and you could imagine a web browser
that would display the whole process as an outline, dynamically
transcluding text beneath each link you click on.  This may or may not
be an improvement on the standard web browser UI; probably for some
tasks it would be, especially if the web pages in question were
designed for it.

Today I was watching Dick Hardt's Identity 2.0 presentation from OSCON
<http://www.identity20.com/media/OSCON2005/>.  In Larry Lessig style,
his slides changed more than once per second, and typically had only a
(Continue reading)

kragen | 20 Oct 2005 09:37

Current state of free-software OCR: not good


I downloaded Walter Farquhar Hook's 1842 Church Dictionary
<http://www.archive.org/details/ChurchDictionary> from the Internet
Archive and tried OCRing some text from it, using free software.  I
didn't have a lot of success, but success looks tantalizingly close.

I used DjView to extract the first page that has actual text on it.

gOCR
----

gOCR renders the first four lines of the sample book, as output by
DjView, more or less as follows:

    __E stronges_ __ecommendation o_ the _olIo_-
    @g _orb cons@ts @@ the statemen_ {_f @s be@gg
    _oR the __ost paRt, me,Rely a Comp@at@n; a__d tb@
    _eneraI ac___o_ledgment rendel_s @ unnecessarY

It actually reads:

    THE strongest recommendation of the follow-
    ing Work consists in the statement of its being,
    for the most part, merely a Compilation; and this
    general acknowledgment renders it unnecessary
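
One way to put a number on "tantalizingly close" is a character error
rate: the edit distance between the OCR output and the true text,
divided by the length of the true text.  This is a sketch of that
measure, not something gOCR itself reports; the function names are
made up, and only the first line of the sample is compared.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr, truth):
    """Edit distance as a fraction of the true text's length."""
    return levenshtein(ocr, truth) / len(truth)


ocr_line = "__E stronges_ __ecommendation o_ the _olIo_-"
true_line = "THE strongest recommendation of the follow-"
cer = character_error_rate(ocr_line, true_line)
print(round(cer, 2))
```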

A second try, using the command-line "gocr -C 
'- abcdefghijklmnopqrstuvwxyz,;ABCDEFGHIJKLMNOPQRSTUVWXYZ.'
ChurchDictionary0004.pbm" yielded, after 100 seconds of CPU time, the
following results:
(Continue reading)

kragen | 24 Oct 2005 09:37

passive laser sonar

Passive sonar systems have a practical difficulty: you must place many
microphones far apart and run wires to all of them (or have nearby
computers to transmit their signals over, e.g., radio).

An alternative is to use laser vibration-detection systems, such as
those used for remote snooping on speech.  These use Doppler shifts in
laser light reflected from an object (such as a window or wall) to
observe changes in the distance to an object, changes much smaller
than a light wavelength.

A set of such lasers could put many "virtual microphones" hundreds of
meters apart, with a consequent dramatic improvement in spatial
resolution, without having to distribute equipment over a large area.
I suspect that even a single laser beam scanning, say, a wall, could
improve spatial resolution significantly, analogously to synthetic
aperture radar.
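
To see why a wider spacing helps, here is a back-of-the-envelope
sketch for two "virtual microphones" and a distant source.  It assumes
a far-field plane wave and sound at 340 m/s; the function names and
the 0.1 ms timing error are illustrative assumptions, not
measurements.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed


def bearing_from_delay(delay_s, baseline_m):
    """Angle from broadside (radians) implied by an arrival-time difference.

    For a plane wave, delay = baseline * sin(bearing) / speed of sound.
    """
    return math.asin(SPEED_OF_SOUND * delay_s / baseline_m)


def bearing_uncertainty(timing_error_s, baseline_m, bearing_rad=0.0):
    """Approximate angular error (radians) caused by a small timing error."""
    return SPEED_OF_SOUND * timing_error_s / (baseline_m * math.cos(bearing_rad))


timing_error = 1e-4  # seconds, assumed
for baseline in (0.3, 300.0):
    error_deg = math.degrees(bearing_uncertainty(timing_error, baseline))
    print(baseline, "m baseline:", error_deg, "degrees")
```

With the same timing error, the angular error shrinks in proportion
to the baseline, which is the sense in which microphones hundreds of
meters apart improve spatial resolution.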

Some possible uses; maybe not all of these are plausible:
- scanning three-dimensional human body shapes by their sound
  reflections, for recognition or to find concealed weapons;
- scanning environments on the other side of a wall, by the way they
  shape sound wavefronts that impinge upon that wall;
- measuring macroscopic distances and shapes such as buildings, caves,
  or terrain, for instance for building planning or inspection;
- spatially localizing sound sources in order to filter out background
  noise, for example when recording music;
- identifying, for example, vehicles by sound (having filtered out
  background noise) and measuring their position and velocity;
- characterizing sound wavefronts some distance away in order to have
  time to precompute a canceling waveform for active cancellation;
(Continue reading)

kragen | 27 Oct 2005 09:37

feedback and automated fabrication

You could build a pretty simple hexapod with just six cables to move a
hanging flutterwumper in six degrees of freedom, and you could probably
control the lengths of the cables with great precision; but by itself
that doesn't give you much accuracy in the finished product.

Closed-loop control, though, could give you the accuracy you need.  If
you can measure the position of the flutterwumper to great precision,
you can correct the position until it works; similarly, to correct for
cutting-head loading errors, cutting-head wear, etc., if you can
measure the depth of cuts in the material, you can correct for
whatever inaccuracies arise.

Ultrasound seems like a plausible approach at first, but for
manufacturing metal parts, you need shape accuracy (thus measurement
accuracy) on the order of 25 microns.  At sea level, that's 75
nanoseconds of timing accuracy on the sound, which means you need a
13.3 MHz sound.  Air cannot carry sounds close to that frequency.

Radio seems like a plausible approach at first too, but 25 microns is
about 83 * 10^-15 seconds --- 83 femtoseconds.  There are no circuits
contemplated that can measure the round-trip time of radio pulses that
accurately.
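
The timing figures for sound and radio follow from dividing the
required accuracy by the propagation speed.  A quick check, assuming
340 m/s for sound in air (the figures in the text imply a slightly
lower speed) and counting one-way travel only:

```python
SPEED_OF_SOUND = 340.0          # m/s, assumed
SPEED_OF_LIGHT = 299_792_458.0  # m/s

ACCURACY = 25e-6  # 25 microns, in meters


def timing_resolution(distance_m, speed_m_s):
    """Time a wave needs to travel distance_m at speed_m_s."""
    return distance_m / speed_m_s


t_sound = timing_resolution(ACCURACY, SPEED_OF_SOUND)
t_radio = timing_resolution(ACCURACY, SPEED_OF_LIGHT)

print(t_sound * 1e9, "ns for sound, i.e.", 1 / t_sound / 1e6, "MHz")
print(t_radio * 1e15, "fs for radio")
```

At 340 m/s this gives roughly 74 nanoseconds and 13.6 MHz for sound,
and roughly 83 femtoseconds for radio.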

So light interferometry, of one form or another, is the only feedback
mechanism I can think of that might work.  Laser phase shifts can
measure the distance of movement of an object; a circularly polarized
laser can tell which direction, too.  Using this principle you can
measure the shape of a macroscopic continuous surface.  I still don't
quite know how to measure its position to within a few microns, but
maybe I don't need to.
(Continue reading)

kragen | 31 Oct 2005 09:37

the energy cost to evacuate Earth's human population

http://en.wikipedia.org/wiki/Escape_velocity says:
    On the surface of the Earth the escape velocity is about 11.2
    kilometres per second.

You have: 100 kg * (11.2 km/sec) * (11.2 km/sec) / 2
You want: kilowatt hours
        * 1742.2222
        / 0.00057397959

So 1700 kWh per (large) person, to lift them out of Earth's gravity
well (assuming perfect efficiency, as with a space elevator).
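
The same calculation as the units session above, as a small Python
sketch: kinetic energy at escape velocity, m * v^2 / 2, converted from
joules to kilowatt hours.  The function name is made up.

```python
JOULES_PER_KWH = 3.6e6


def escape_energy_kwh(mass_kg, escape_velocity_m_s=11_200.0):
    """Kinetic energy at escape velocity, in kilowatt hours."""
    joules = 0.5 * mass_kg * escape_velocity_m_s ** 2
    return joules / JOULES_PER_KWH


energy = escape_energy_kwh(100)
print(energy)  # about 1742 kWh for a 100 kg person
```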

http://www.ecoworld.org/energy/EcoWorld_Energy_Resid_KWH_Prices.cfm
lists average US residential electricity prices from 6.5 to 14.8 cents
per kWh, with an outlier at 33.3 in San Francisco during the
California energy crisis.  It also claims that the cost of the fuel
alone amounts to about 0.5 to 1 cent per kWh.

So if we have to pay 10 cents per kWh, lifting a person into space
should cost around $170 --- an energy cost that could in theory be
recovered if they came back down.  (At present this energy is mostly
dissipated thermally.)

Evacuating the entire human race to an extraterrestrial habitat
prepared to handle them should then have an energy cost around $1
trillion.  This is roughly 2% of annual world GDP ($55.9 trillion) at
PPP.  (See http://www.worldbank.org/data/databytopic/GDP_PPP.pdf for
details.)
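
A sketch of the cost arithmetic.  The world population of 6.5 billion
is my assumption (the text gives only the total); the other inputs are
the figures quoted above.

```python
KWH_PER_PERSON = 1742.2    # from the escape-velocity calculation
PRICE_PER_KWH = 0.10       # dollars
POPULATION = 6.5e9         # assumed world population
WORLD_GDP_PPP = 55.9e12    # dollars per year, as quoted above

cost_per_person = KWH_PER_PERSON * PRICE_PER_KWH
total_cost = cost_per_person * POPULATION
gdp_fraction = total_cost / WORLD_GDP_PPP

print(round(cost_per_person), "dollars per person")
print(total_cost / 1e12, "trillion dollars in total")
print(gdp_fraction * 100, "percent of annual world GDP")
```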

Current world energy usage is around 354 exajoules
(Continue reading)

