Andrew Ziem | 18 Aug 04:44

libwps for Microsoft Works (.wps)


Hello,

I have started* a Microsoft Works (.wps) document importer.  Since 
libwpd has been incorporated into three word processors, I thought I'd 
emulate the libwpd API to make it easy to implement support for Works.  
Then, it was necessary or convenient to copy and paste libwpd's source 
code.  Now, I'm seeing there is a lot of copying, pasting, and renaming 
strings like "WP" to "WPS."  To avoid forking what has turned out to be 
an increasing amount of code, any thoughts on consolidating efforts?
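
A minimal sketch of what emulating the libwpd API could look like. The host application hands the importer an input stream plus its own listener, and the importer reports the document through callbacks. Every name here (InputStream, Listener, WPSDocument, Confidence, Result) is hypothetical and only loosely modelled on libwpd's WPXInputStream, WPXHLListenerImpl and WPDocument. The format check also assumes that the Works files in question are OLE2 compound files, which always begin with the same 8-byte signature.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// All names below are hypothetical stand-ins loosely modelled on libwpd's
// WPXInputStream, WPXHLListenerImpl and WPDocument; they are not libwpd's API.
class InputStream {
public:
    explicit InputStream(std::string data) : data_(std::move(data)) {}
    const std::string &bytes() const { return data_; }
private:
    std::string data_;
};

class Listener {
public:
    virtual ~Listener() = default;
    virtual void startDocument() = 0;
    virtual void insertText(const std::string &text) = 0;
    virtual void endDocument() = 0;
};

enum class Confidence { None, Likely };
enum class Result { Ok, UnknownFormat };

class WPSDocument {
public:
    // Assumption: the Works files of interest live in an OLE2 compound file,
    // which starts with this fixed 8-byte signature. This recognises the
    // container only; a real check would also look for a Works stream.
    static Confidence isFileFormatSupported(const InputStream &input) {
        static const unsigned char kOle2Magic[8] = {
            0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
        const std::string &b = input.bytes();
        if (b.size() < sizeof kOle2Magic)
            return Confidence::None;
        for (std::size_t i = 0; i < sizeof kOle2Magic; ++i)
            if (static_cast<unsigned char>(b[i]) != kOle2Magic[i])
                return Confidence::None;
        return Confidence::Likely;
    }

    // Same shape as a libwpd-style parse(): the host passes a stream and its
    // own listener, and the parser reports content through callbacks.
    static Result parse(const InputStream &input, Listener &listener) {
        if (isFileFormatSupported(input) == Confidence::None)
            return Result::UnknownFormat;
        listener.startDocument();
        // A real parser would locate the text stream inside the container and
        // call listener.insertText() for each run; omitted in this sketch.
        listener.endDocument();
        return Result::Ok;
    }
};
```

With this shape, a word processor that already runs libwpd's detect-then-parse sequence would need only a second entry point for Works, not a second document model.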

Andrew

* I am able to dump plain text from Works 4 and 7/8 formats.  Also, I 
have made some progress on page and character formats in Works 4, which is 
apparently the most popular of the Works formats.  But I don't have any 
source code worth showing.

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
Fridrich Strba | 18 Aug 08:18

Re: libwps for Microsoft Works (.wps)


Hello, Andrew,

Andrew Ziem wrote:
> I have started* a Microsoft Works (.wps) document importer.  Since 
> libwpd has been incorporated into three word processors, I thought I'd 
> emulate the libwpd API to make it easy to implement support for Works.  

Very wise approach :-) It is in line with Will's vision from this
e-mail
http://lists.ximian.com/pipermail/openoffice/2004-October/000556.html
and with an approach that I was trying to urge the OOo Google SoC
mentors to consider
http://sw.openoffice.org/servlets/ReadMsg?list=dev&msgNo=1241
Unfortunately, due to the low number of slots allocated to OOo, none of
the "Text importer" projects went through for SoC 2006, as you may know
:-( So, I am glad to see you in the "Text importer" universe again ;-)

> Then, it was necessary or convenient to copy and paste libwpd's source 
> code.  Now, I'm seeing there is a lot of copying, pasting, and renaming 
> strings like "WP" to "WPS."  To avoid forking what has turned out to be 
> an increasing amount of code, any thoughts on consolidating efforts?

Yes, copy&paste == evil. Andrew, a quick look at our API suggests the
following: do not copy anything. For the time being, just make libwps
depend on the classes from the libwpd public headers (the WPXProperty*
family of classes, WPXString, and the WPXHLListenerImpl (and eventually
WPXInputStream) interface class(es), ...). It is quite possible that, at
the next ABI break, one could extract the framework from libwpd-0.8.so
into a separate libwpd-framework-0.9.so, but as far as I am concerned,
my todo

Ariya Hidayat | 18 Aug 09:45

Re: libwps for Microsoft Works (.wps)

Hi Andrew and others,

Here is my 2 cents: "every file format is unique".

I suspect it would be quite difficult to merge two libraries that handle
two different file formats, unless the formats are very similar (e.g.
WP 5 and WP 6). Thus, you will need two libraries which, at some point,
will differ a lot, depending on the format that you want to handle.

So in your case, yes, copy-and-paste can't be avoided in the beginning.
But I believe (and I'd be glad to be proven wrong here) that you will
soon need to invent some data structures unique to WPS which have no
counterpart in WPD, or which would be difficult to handle if the two
were merged.

Years ago I tried to make "an interface which rules them all" for all
major word processor file formats. Of course, this attempt failed
miserably (too complicated and humanly unmanageable), and hence my
theory above.

I also have Works preinstalled here, so if you need some help, I'll try
to do my best. Have fun!

Regards,

Ariya


Andrew Ziem | 18 Aug 16:55

Re: libwps for Microsoft Works (.wps)


Fridrich Strba wrote:
> Andrew Ziem wrote:
>   
>> I have started* a Microsoft Works (.wps) document importer.  Since 
>> libwpd has been incorporated into three word processors, I thought I'd 
>> emulate the libwpd API to make it easy to implement support for Works.  
>>     
>
> Very wise approach :-) It is in line with Will's vision from this
> e-mail
> http://lists.ximian.com/pipermail/openoffice/2004-October/000556.html
> and with an approach that I was trying to urge the OOo Google SoC
> mentors to consider
> http://sw.openoffice.org/servlets/ReadMsg?list=dev&msgNo=1241
>   
Thanks for the links.
> Unfortunately, due to the low number of slots allocated to OOo, none of
> the "Text importer" projects went through for SoC 2006, as you may know
> :-( So, I am glad to see you in the "Text importer" universe again ;-)
>   
Yes, one of those rejected SoC 2006 applications was mine.  :)  I 
applied to do the Apple iWork filter, but I never heard much about why 
my application wasn't accepted.  I suppose I should have chosen a more 
"killer" project like the grammar checker.

>   
>> Then, it was necessary or convenient to copy and paste libwpd's source 
>> code.  Now, I'm seeing there is a lot of copying, pasting, and renaming 
>> strings like "WP" to "WPS."  To avoid forking what has turned out to be 

Andrew Ziem | 18 Aug 17:35

Re: libwps for Microsoft Works (.wps)


Ariya Hidayat wrote:
> Hi Andrew and others,
>
> Here is my 2 cents: "every file format is unique".
>
> I suspect it would be quite difficult to merge two libraries that handle
> two different file formats, unless the formats are very similar (e.g.
> WP 5 and WP 6). Thus, you will need two libraries which, at some point,
> will differ a lot, depending on the format that you want to handle.
>
> So in your case, yes, copy-and-paste can't be avoided in the beginning.
> But I believe (and I'd be glad to be proven wrong here) that you will
> soon need to invent some data structures unique to WPS which have no
> counterpart in WPD, or which would be difficult to handle if the two
> were merged.
>
> Years ago I tried to make "an interface which rules them all" for all
> major word processor file formats. Of course, this attempt failed
> miserably (too complicated and humanly unmanageable), and hence my
> theory above.
I suppose you are correct that it would become difficult as the libraries 
handle more details of each format.  However, my interest is limited to 
supporting basic features of the format.  The first phase includes:

    * Text import with basic formatting (bold, underline, etc.)
    * Page margins
    * Font size changes
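
As a sketch of how these first-phase features might be handed to a shared listener, the code below turns a character-format record and a set of page margins into name/value property lists. The CharFormat and PageMargins structs, the PropertyList alias and both functions are hypothetical (the real Works 4 layout isn't described in this thread), and the property names are an assumption borrowed from ODF (fo:font-weight, fo:font-size, fo:margin-left, and so on).

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory records for the first-phase features; the actual
// Works 4 on-disk layout is not described in this thread.
struct CharFormat {
    bool bold = false;
    bool italic = false;
    bool underline = false;
    double fontSizePt = 0.0; // 0 means "not set; inherit"
};

struct PageMargins {
    double left, right, top, bottom; // inches
};

// Stand-in for a libwpd-style property list: property name -> value.
using PropertyList = std::map<std::string, std::string>;

static std::string withUnit(double value, const char *unit) {
    std::ostringstream os;
    os << value << unit; // default stream formatting: 12.0 -> "12"
    return os.str();
}

// Assumption: ODF-style property names, as consumed by an OpenOffice.org
// writer; the names a real libwps would emit may differ.
PropertyList charProperties(const CharFormat &f) {
    PropertyList props;
    if (f.bold)
        props["fo:font-weight"] = "bold";
    if (f.italic)
        props["fo:font-style"] = "italic";
    if (f.underline)
        props["style:text-underline-style"] = "solid";
    if (f.fontSizePt > 0.0)
        props["fo:font-size"] = withUnit(f.fontSizePt, "pt");
    return props;
}

PropertyList pageProperties(const PageMargins &m) {
    return {
        {"fo:margin-left", withUnit(m.left, "in")},
        {"fo:margin-right", withUnit(m.right, "in")},
        {"fo:margin-top", withUnit(m.top, "in")},
        {"fo:margin-bottom", withUnit(m.bottom, "in")},
    };
}
```

Only attributes that are actually set end up in the character list, so a listener can treat a missing key as "inherit from the paragraph style".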


William Lachance | 18 Aug 21:41

Re: libwps for Microsoft Works (.wps)

On 8/18/06, Fridrich Strba <fridrich.strba@...> wrote:
> Andrew Ziem wrote:
> > I have started* a Microsoft Works (.wps) document importer.  Since
> > libwpd has been incorporated into three word processors, I thought I'd
> > emulate the libwpd API to make it easy to implement support for Works.
>
> Very wise approach :-) It is in line with Will's vision from this
> e-mail
> http://lists.ximian.com/pipermail/openoffice/2004-October/000556.html
> and with an approach that I was trying to urge the OOo Google SoC
> mentors to consider
> http://sw.openoffice.org/servlets/ReadMsg?list=dev&msgNo=1241
> Unfortunately, due to the low number of slots allocated to OOo, none of
> the "Text importer" projects went through for SoC 2006, as you may know
> :-( So, I am glad to see you in the "Text importer" universe again ;-)
>
> > Then, it was necessary or convenient to copy and paste libwpd's source
> > code.  Now, I'm seeing there is a lot of copying, pasting, and renaming
> > strings like "WP" to "WPS."  To avoid forking what has turned out to be
> > an increasing amount of code, any thoughts on consolidating efforts?
>
> Yes, copy&paste == evil. Andrew, a quick look at our API suggests the
> following: do not copy anything. For the time being, just make libwps
> depend on the classes from the libwpd public headers (the WPXProperty*
> family of classes, WPXString, and the WPXHLListenerImpl (and eventually
> WPXInputStream) interface class(es), ...). It is quite possible that, at
> the next ABI break, one could extract the framework from libwpd-0.8.so
> into a separate libwpd-framework-0.9.so, but as far as I am concerned,
> my todo list is still long enough, and the libwpd API is written in
> stone for at least one more year. I have some ideas for API changes for

William Lachance | 18 Aug 21:47

Re: libwps for Microsoft Works (.wps)

On 8/18/06, Ariya Hidayat <ariya.hidayat@...> wrote:
> Hi Andrew and others,
>
> Here is my 2 cents: "every file format is unique".
>
> I suspect it would be quite difficult to merge two libraries that handle
> two different file formats, unless the formats are very similar (e.g.
> WP 5 and WP 6). Thus, you will need two libraries which, at some point,
> will differ a lot, depending on the format that you want to handle.
>
> So in your case, yes, copy-and-paste can't be avoided in the beginning.
> But I believe (and I'd be glad to be proven wrong here) that you will
> soon need to invent some data structures unique to WPS which have no
> counterpart in WPD, or which would be difficult to handle if the two
> were merged.
>
> Years ago I tried to make "an interface which rules them all" for all
> major word processor file formats. Of course, this attempt failed
> miserably (too complicated and humanly unmanageable), and hence my
> theory above.

Certainly there are some things that wouldn't be useful to share. The
massive amount of WordPerfect byte-group code in libwpd probably isn't
going to be very useful in an MS Works importer, for example. :) I do
think that the WPXHLListenerImpl interface would be a very good thing
to share between projects, though. It just makes creating an
OpenOffice.org document so much easier.
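
To illustrate why sharing the listener interface pays off, here is a sketch in which one sink, written once against a shared interface, receives output from two different parsers. DocumentListener, TextCollector, emitParagraphs, parseToyWP and parseToyWPS are all hypothetical; the two "formats" are toy stand-ins that differ only in their paragraph separator, not the real WordPerfect or Works encodings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical shared listener in the spirit of WPXHLListenerImpl (the real
// interface has many more callbacks, with property lists as arguments).
class DocumentListener {
public:
    virtual ~DocumentListener() = default;
    virtual void openParagraph() = 0;
    virtual void insertText(const std::string &text) = 0;
    virtual void closeParagraph() = 0;
};

// A sink written once against the shared interface; here it just collects
// plain text, where a real one would build an OpenOffice.org document.
class TextCollector : public DocumentListener {
public:
    void openParagraph() override { current_.clear(); }
    void insertText(const std::string &text) override { current_ += text; }
    void closeParagraph() override { paragraphs_.push_back(current_); }
    const std::vector<std::string> &paragraphs() const { return paragraphs_; }
private:
    std::string current_;
    std::vector<std::string> paragraphs_;
};

// Reports each separator-delimited piece of raw as one paragraph.
static void emitParagraphs(const std::string &raw, char separator,
                           DocumentListener &out) {
    std::size_t start = 0;
    while (start <= raw.size()) {
        std::size_t end = raw.find(separator, start);
        if (end == std::string::npos)
            end = raw.size();
        out.openParagraph();
        out.insertText(raw.substr(start, end - start));
        out.closeParagraph();
        start = end + 1;
    }
}

// Two toy "formats" standing in for WordPerfect and Works; the separators are
// invented for illustration, not taken from either real file format.
void parseToyWP(const std::string &raw, DocumentListener &out) {
    emitParagraphs(raw, '\n', out);
}
void parseToyWPS(const std::string &raw, DocumentListener &out) {
    emitParagraphs(raw, '\r', out);
}
```

Format-specific decoding stays inside each parser, which fits Ariya's point that every format is unique, while the sink needs no copied-and-renamed "WPS" twin.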
-- 
William Lachance

Andrew Ziem | 18 Aug 21:58

Re: libwps for Microsoft Works (.wps)

William Lachance wrote:
> On 8/18/06, Fridrich Strba <fridrich.strba@...> wrote:
>> Andrew Ziem wrote:
>> > I have started* a Microsoft Works (.wps) document importer.  Since
>> > libwpd has been incorporated into three word processors, I thought I'd
>> > emulate the libwpd API to make it easy to implement support for Works.
>>
>> Very wise approach :-) It is in line with Will's vision from this
>> e-mail
>> http://lists.ximian.com/pipermail/openoffice/2004-October/000556.html
>> and with an approach that I was trying to urge the OOo Google SoC
>> mentors to consider
>> http://sw.openoffice.org/servlets/ReadMsg?list=dev&msgNo=1241
>> Unfortunately, due to the low number of slots allocated to OOo, none of
>> the "Text importer" projects went through for SoC 2006, as you may know
>> :-( So, I am glad to see you in the "Text importer" universe again ;-)
>>
>> > Then, it was necessary or convenient to copy and paste libwpd's source
>> > code.  Now, I'm seeing there is a lot of copying, pasting, and renaming
>> > strings like "WP" to "WPS."  To avoid forking what has turned out to be
>> > an increasing amount of code, any thoughts on consolidating efforts?
>>
>> Yes, copy&paste == evil. Andrew, a quick look at our API suggests the
>> following: do not copy anything. For the time being, just make libwps
>> depend on the classes from the libwpd public headers (the WPXProperty*
>> family of classes, WPXString, and the WPXHLListenerImpl (and eventually
>> WPXInputStream) interface class(es), ...). It is quite possible that, at
>> the next ABI break, one could extract the framework from libwpd-0.8.so
>> into a

Andrew Ziem | 19 Aug 06:57

Re: libwps for Microsoft Works (.wps)

Fridrich Strba wrote:
> Even the ugliest code that has some of the desired functionality is
> worth showing. IMHO, technical discussions around existing, though
> imperfect, code are really useful for one's growth ;-) And I know what
> I am saying when I speak about imperfect code from my own hacking
> experience :-)
>   
Then here's the proposed ugly code.  So far, not much to look at.  This 
code was just an experiment.

Andrew
Attachment (wps_test.cpp): text/x-c++src, 12 KiB

all: wps_test

wps_test: wps_test.cpp
	g++ -g -Wall -o wps_test wps_test.cpp `pkg-config --cflags --libs libgsf-1`
_______________________________________________
Libwpd-devel mailing list

Fridrich Strba | 19 Aug 15:08

Re: libwps for Microsoft Works (.wps)


Come on, Will, write the specification; maybe someone will find it
interesting to implement. If not, I will try to push it as a Google SoC
2007 project :-)

Cheers

Fridrich

>> Of course,
>> there is the problem of figuring out how to properly factor all this
>> into a separate library (libtextdocumentfilter?), which I don't really
>> have time to describe in detail right now. If people are really
>> interested in going in this direction, I could write a specification
>> on how I envision this working. Be warned: implementing the
>> specification would probably be a fair amount of work. :)
