Brett Henderson | 1 Aug 2010 02:35

Re: [OSM-dev] New OSM binary fileformat implementation.

On Sun, Aug 1, 2010 at 2:26 AM, Frederik Ramm <frederik <at> remote.org> wrote:
> Scott, others,
>
> Scott Crosby wrote:
>> I would like to announce code implementing a binary OSM format that
>> supports the full semantics of the OSM XML.
>
> [...]
>
>> The changes to osmosis are just some new tasks to handle reading and
>> writing the binary format.
>
> [...]
>
> This was 3 months ago.
>
> What's the status of this project? Are people actively using it? Is it
> still being developed? Can the Osmosis tasks be used in the new Osmosis code
> architecture (see over on osmosis-dev) that Brett has introduced with 0.36?

I'm curious about this as well.  The main reason for me introducing the new project structure was to facilitate the integration of new features like this.  They're relatively easy to add (some Ant and Ivy foo required ...), and can be removed later on if they're not maintained or people lose interest in them.  If there's a demand for this binary format I'm happy to help integrate it as a new project into the existing codebase.

I believe the existing version of this binary OSM format is implemented as a fork in a Git repo, so I suspect it will take some effort to update it to run against 0.36.  The code hasn't changed a lot, but the build processes have.

Brett

_______________________________________________
dev mailing list
dev <at> openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
Erik Johansson | 1 Aug 2010 11:34

Re: [OSM-dev] New OSM binary fileformat implementation.

On Sun, Aug 1, 2010 at 2:35 AM, Brett Henderson <brett <at> bretth.com> wrote:
> On Sun, Aug 1, 2010 at 2:26 AM, Frederik Ramm <frederik <at> remote.org> wrote:
>>
>> Scott, others,
>>
>> Scott Crosby wrote:
>>>
>>> I would like to announce code implementing a binary OSM format that
>>> supports the full semantics of the OSM XML.
>>
>> [...]
>>
>>> The changes to osmosis are just some new tasks to handle reading and
>>> writing the binary format.
>>
>> [...]
>>
>> This was 3 months ago.
>>
>> What's the status of this project? Are people actively using it? Is it
>> still being developed? Can the Osmosis tasks be used in the new Osmosis code
>> architecture (see over on osmosis-dev) that Brett has introduced with 0.36?
>
> I'm curious about this as well.  The main reason for me introducing the new
> project structure was to facilitate the integration of new features like
> this.  They're relatively easy to add (some Ant and Ivy foo required ...),
[...]
> The code hasn't changed a lot, but the build processes have.

Well, that's one of the things Scott said he had no clue how to do.
From Scott's mail:

Scott Crosby:
> // TODO's

> Probably the most important TODO is packaging and fixing the build system.
> I have almost no experience with Ant and am unfamiliar with Java
> packaging practices, so I'd like to request help/advice on ant and suggestions on
> how to package the common parsing/serializing code so that it can be
> re-used across different programs.

Re: [OSM-dev] New OSM binary fileformat implementation.

Hi, is there any documentation of the binary format changes?
I have implemented a C++ reader using protobuf and would update it if
there is a new format spec.
mike
Brett Henderson | 1 Aug 2010 13:39

Re: [OSM-dev] New OSM binary fileformat implementation.

On Sun, Aug 1, 2010 at 7:34 PM, Erik Johansson <erjohan <at> gmail.com> wrote:
> On Sun, Aug 1, 2010 at 2:35 AM, Brett Henderson <brett <at> bretth.com> wrote:
>> On Sun, Aug 1, 2010 at 2:26 AM, Frederik Ramm <frederik <at> remote.org> wrote:
>>>
>>> Scott, others,
>>>
>>> Scott Crosby wrote:
>>>>
>>>> I would like to announce code implementing a binary OSM format that
>>>> supports the full semantics of the OSM XML.
>>>
>>> [...]
>>>
>>>> The changes to osmosis are just some new tasks to handle reading and
>>>> writing the binary format.
>>>
>>> [...]
>>>
>>> This was 3 months ago.
>>>
>>> What's the status of this project? Are people actively using it? Is it
>>> still being developed? Can the Osmosis tasks be used in the new Osmosis code
>>> architecture (see over on osmosis-dev) that Brett has introduced with 0.36?
>>
>> I'm curious about this as well.  The main reason for me introducing the new
>> project structure was to facilitate the integration of new features like
>> this.  They're relatively easy to add (some Ant and Ivy foo required ...),
> [...]
>> The code hasn't changed a lot, but the build processes have.
>
> Well, that's one of the things Scott said he had no clue how to do.
> From Scott's mail:
>
> Scott Crosby:
>> // TODO's
>>
>> Probably the most important TODO is packaging and fixing the build system.
>> I have almost no experience with Ant and am unfamiliar with Java
>> packaging practices, so I'd like to request help/advice on ant and suggestions on
>> how to package the common parsing/serializing code so that it can be
>> re-used across different programs.

I'll help incorporate this into the rest of Osmosis.  There are a few things to work through, though.
  • Is there a demand for the binary format in its current incantation?  I'm not keen to incorporate it if nobody will use it.
  • Can the code be managed in the main OSM Subversion repo instead of GIT?
  • Is any code reuse between Osmosis and other applications required?  If only the Osmosis tasks will be managed in the Osmosis project and a component with common functionality managed elsewhere then I need to know how the common component will be managed and published for consumption in Osmosis.
Brett

Andreas Kalsch | 1 Aug 2010 19:33

Re: [OSM-dev] New OSM binary fileformat implementation.

What about some metrics (performance, size)? Data is the same, whether binary or not. So binary really has to pay off significantly.

On 01.08.10 13:39, Brett Henderson wrote:
> On Sun, Aug 1, 2010 at 7:34 PM, Erik Johansson <erjohan <at> gmail.com> wrote:
>> On Sun, Aug 1, 2010 at 2:35 AM, Brett Henderson <brett <at> bretth.com> wrote:
>>> On Sun, Aug 1, 2010 at 2:26 AM, Frederik Ramm <frederik <at> remote.org> wrote:
>>>>
>>>> Scott, others,
>>>>
>>>> Scott Crosby wrote:
>>>>>
>>>>> I would like to announce code implementing a binary OSM format that
>>>>> supports the full semantics of the OSM XML.
>>>>
>>>> [...]
>>>>
>>>>> The changes to osmosis are just some new tasks to handle reading and
>>>>> writing the binary format.
>>>>
>>>> [...]
>>>>
>>>> This was 3 months ago.
>>>>
>>>> What's the status of this project? Are people actively using it? Is it
>>>> still being developed? Can the Osmosis tasks be used in the new Osmosis code
>>>> architecture (see over on osmosis-dev) that Brett has introduced with 0.36?
>>>
>>> I'm curious about this as well.  The main reason for me introducing the new
>>> project structure was to facilitate the integration of new features like
>>> this.  They're relatively easy to add (some Ant and Ivy foo required ...),
>> [...]
>>> The code hasn't changed a lot, but the build processes have.
>>
>> Well, that's one of the things Scott said he had no clue how to do.
>> From Scott's mail:
>>
>> Scott Crosby:
>>> // TODO's
>>>
>>> Probably the most important TODO is packaging and fixing the build system.
>>> I have almost no experience with Ant and am unfamiliar with Java
>>> packaging practices, so I'd like to request help/advice on ant and suggestions on
>>> how to package the common parsing/serializing code so that it can be
>>> re-used across different programs.
>
> I'll help incorporate this into the rest of Osmosis.  There are a few things to work through, though.
>   • Is there a demand for the binary format in its current incantation?  I'm not keen to incorporate it if nobody will use it.
>   • Can the code be managed in the main OSM Subversion repo instead of GIT?
>   • Is any code reuse between Osmosis and other applications required?  If only the Osmosis tasks will be managed in the Osmosis project and a component with common functionality managed elsewhere then I need to know how the common component will be managed and published for consumption in Osmosis.
> Brett

Ævar Arnfjörð Bjarmason | 1 Aug 2010 20:21

Re: [OSM-dev] New OSM binary fileformat implementation.

On Sun, Aug 1, 2010 at 17:33, Andreas Kalsch <andreaskalsch <at> gmx.de> wrote:
> What about some metrics (performance, size)? Data is the same, whether
> binary or not. So binary really has to pay off significantly.

What performance metrics would you like that haven't already been
covered earlier in this thread, and in the initial announcement?
Frederik Ramm | 1 Aug 2010 23:24

Re: [OSM-dev] New OSM binary fileformat implementation.

Hi,

Brett Henderson wrote:
> I'll help incorporate this into the rest of Osmosis.  There's a few 
> things to work through though.
> 
>     * Is there a demand for the binary format in its current
>       incantation?  I'm not keen to incorporate it if nobody will use it.

I run a nightly job at Geofabrik which currently operates on plain 
(uncompressed) OSM files and goes roughly like this (every step uses 
Osmosis):

* apply daily diff to planet file
* split planet file into continents
* split each continent into countries
* split some countries into smaller units
* split some smaller units into even smaller units
* bzip2 the lot

The whole job runs from about 22:00 at night to about 09:00 in the morning
(roughly 11 hours), even though I'm ignoring the US.
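
A minimal sketch of how the steps listed above chain together, assuming a hypothetical extract hierarchy and hypothetical file names (planet.osm, europe.poly, and so on). It only builds Osmosis command lines and does not run them. The task names used are the XML-based ones (--read-xml-change, --apply-change, --bounding-polygon, --write-xml) and should be checked against the Osmosis version in use. The final bzip2 step is left out.

```python
from collections import deque
from typing import Dict, List

# Hypothetical hierarchy: each entry is cut out of its parent extract,
# so smaller units never have to be read from the full planet file.
EXTRACTS: Dict[str, List[str]] = {
    "planet": ["europe"],
    "europe": ["germany"],
    "germany": ["bayern"],
}


def apply_diff_cmd(planet: str, diff: str, out: str) -> List[str]:
    """Command line that applies one daily diff to the planet file."""
    return [
        "osmosis",
        "--read-xml-change", f"file={diff}",
        "--read-xml", f"file={planet}",
        "--apply-change",
        "--write-xml", f"file={out}",
    ]


def split_cmd(parent: str, child: str) -> List[str]:
    """Command line that cuts `child` out of `parent` with a polygon file."""
    return [
        "osmosis",
        "--read-xml", f"file={parent}.osm",
        "--bounding-polygon", f"file={child}.poly",
        "--write-xml", f"file={child}.osm",
    ]


def plan(root: str = "planet") -> List[List[str]]:
    """Breadth-first walk: every extract is produced before it is split."""
    commands: List[List[str]] = []
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in EXTRACTS.get(parent, []):
            commands.append(split_cmd(parent, child))
            queue.append(child)
    return commands


if __name__ == "__main__":
    print(" ".join(apply_diff_cmd("planet.osm", "daily.osc", "planet-new.osm")))
    for cmd in plan():
        print(" ".join(cmd))
```

Every step in such a chain reads and rewrites a complete file, which is why the cost of parsing and serializing the format is paid over and over.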

A lot of time is spent just reading from, and writing to, disk and 
parsing XML. Running the whole thing with .gz files doesn't make a big 
difference - saves some disk i/o, adds some CPU time, doesn't change XML 
parsing overhead.

I wanted to test-drive the binary format as a replacement for raw .osm 
files in this setup, hoping that it would give me the i/o benefits of 
gzip compressed data but also slash XML parsing time. The numbers that 
have been posted seemed promising. I might even be able to skip the 
bzip2 step at the end if the binary format should become widely used, 
just placing binary files on the server; and use the saved time to 
re-introduce US extracts.

So here's one user who's definitely in for it - the reason I asked right 
now was that I was planning to have a go at it in the near future, and 
wanted to make sure that I'm not using an old version or going down a 
path that everyone else already discarded. - If there's "proper" 
integration with Osmosis around the corner then I'd wait for that.

The way I understood it, Scott was re-using some code he placed inside 
the Osmosis tree from within his "splitter" code. Also I could imagine 
that using this fancy Google library means you'll have some format 
description files which might be shared across all projects using that 
library, perhaps even including the C++ reader that jamesmikedupont has 
built, but I'm not sure.

I prefer SVN over git for the simple reason that I only have to "svn up" 
and everything is there but I'm sure it is going to be a matter of 
minutes before someone from Iceland points out that the same convenience 
can be had with git if one knows what they're doing ;)

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik <at> remote.org  ##  N49°00'09" E008°23'33"

Stefan de Konink | 2 Aug 2010 00:00

Re: [OSM-dev] New OSM binary fileformat implementation.

On Sun, 1 Aug 2010, Frederik Ramm wrote:

> A lot of time is spent just reading from, and writing to, disk and parsing 
> XML. Running the whole thing with .gz files doesn't make a big difference - 
> saves some disk i/o, adds some CPU time, doesn't change XML parsing overhead.

I'm sorry, but the parsing overhead from Java or libXML is basically a known 
slowness factor. MSXML, pre/post plane parsing or even custom readers are 
not slow, and are limited only by the disk.

So the binary format, per se, is only faster because:
  - smaller filesize = less io
  - encoding: no xml rewriting

Anything else is currently available using for example osmsucker.c, 
obviously not using an XML parser because all input is structured.

If the binary format can pack our doubles (lat/lon) and integers 
(version/ids) and makes strings available in UTF-8, that skips CPU and IO 
overhead, but it makes the data not human readable. I can totally live with 
that, and I hope the API protocol also gets protocol buffers.
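
To illustrate the size argument, here is a minimal sketch that packs one node's id, version, lat and lon into fixed-width binary fields and compares the result with the same node as an XML line. The field layout and the sample node are assumptions made up for this illustration. This is not the protocol-buffer layout of Scott's format.

```python
import struct

# Hypothetical layout: little-endian int64 id, int32 version,
# float64 lat, float64 lon, with no padding (8 + 4 + 8 + 8 bytes).
NODE_LAYOUT = struct.Struct("<qidd")

# The same (made-up) node as it would appear in an XML file.
XML_NODE = '<node id="123456789" version="3" lat="49.0025" lon="8.3925"/>\n'


def pack_node(node_id: int, version: int, lat: float, lon: float) -> bytes:
    """Serialize one node without any text formatting."""
    return NODE_LAYOUT.pack(node_id, version, lat, lon)


def unpack_node(data: bytes):
    """Read the fields back; no string scanning or number parsing needed."""
    return NODE_LAYOUT.unpack(data)


if __name__ == "__main__":
    packed = pack_node(123456789, 3, 49.0025, 8.3925)
    print(len(packed), "bytes packed vs", len(XML_NODE.encode("utf-8")), "bytes of XML")
    print(unpack_node(packed))
```

Reading the packed form is a fixed-offset copy, whereas the XML form has to be tokenized and each number converted from text, which is where the CPU saving comes from.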

Stefan
Frederik Ramm | 2 Aug 2010 00:07
Favicon

Re: [OSM-dev] New OSM binary fileformat implementation.

Hi,

Stefan de Konink wrote:
> I'm sorry, but the parsing overhead from Java or libXML is basically a known 
> slowness factor. 

You don't have to be sorry, you're talking to the person who has patched 
osm2pgsql to "parse" XML with strcmp:

http://trac.openstreetmap.org/browser/applications/utils/export/osm2pgsql/primitive_xml_parsing
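
A minimal sketch of the idea of "parsing" by plain string comparison, written in Python rather than C. It assumes one element per line, double-quoted attributes and no escaped characters in the attributes it reads. The function names are made up for this illustration, and this is not the osm2pgsql code linked above.

```python
from typing import Iterable, Iterator, Tuple


def attr(line: str, name: str) -> str:
    """Return the value of name="..." by searching for the literal key."""
    key = f' {name}="'
    start = line.index(key) + len(key)
    return line[start:line.index('"', start)]


def scan_nodes(lines: Iterable[str]) -> Iterator[Tuple[int, float, float]]:
    """Yield (id, lat, lon) for every line that starts with '<node '."""
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("<node "):
            yield (
                int(attr(stripped, "id")),
                float(attr(stripped, "lat")),
                float(attr(stripped, "lon")),
            )


if __name__ == "__main__":
    sample = [
        '  <node id="1" lat="49.5" lon="8.25"/>',
        '  <way id="2">',
    ]
    print(list(scan_nodes(sample)))
```

This only works because the input is machine-generated and rigidly structured. It would break on arbitrary well-formed XML.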

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik <at> remote.org  ##  N49°00'09" E008°23'33"
Anthony | 2 Aug 2010 00:10

Re: [OSM-dev] New OSM binary fileformat implementation.

On Sun, Aug 1, 2010 at 6:00 PM, Stefan de Konink <stefan <at> konink.de> wrote:
> If the binary format can pack our doubles (lat/lon)

lat/lon is stored as a double?  I always use an int (and
divide/multiply by 10000000).

http://wiki.openstreetmap.org/wiki/Database_schema

Yeah, OSM seems to be doing the same thing.
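
A minimal sketch of the integer representation described above: degrees are multiplied by 10000000 and rounded to the nearest integer, then divided by the same factor on the way back. The function names are made up for this illustration.

```python
SCALE = 10_000_000  # 7 decimal places of a degree


def to_fixed(degrees: float) -> int:
    """Convert degrees to a scaled integer."""
    return round(degrees * SCALE)


def to_degrees(fixed: int) -> float:
    """Convert a scaled integer back to degrees."""
    return fixed / SCALE


if __name__ == "__main__":
    lat = to_fixed(49.0025)
    lon = to_fixed(8.3925)
    print(lat, lon)
    print(to_degrees(lat), to_degrees(lon))
    # Even the extreme longitudes fit in a signed 32-bit integer.
    print(-2**31 <= to_fixed(-180.0) and to_fixed(180.0) <= 2**31 - 1)
```

With this scale the full range of -180 to 180 degrees fits in 32 bits, half the size of a double, and values compare exactly without floating-point rounding concerns.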
