Stefan Behnel | 1 Mar 2006 09:59

Updated parser API

Hi,

I updated the parser API according to the discussions (and the proposal of
Fredrik) that we had in November. It now uses an XMLParser class that simply
builds the libxml2 parse options in the constructor. I also added a global
function "set_default_parser" that globally sets the default parser (options),
or resets them if the supplied parser is None.

Although the internal implementation may change later on, I think it is better
to have this API in place *now* (i.e. for 0.9), so that we can simply add more
features (i.e. keyword arguments) later on without changing the API itself.

Since we already discussed this, I applied it directly to the trunk. Note,
however, that currently not all parse options are backed by test cases. I
added one that tests namespace stripping (in the new file test_parser.py), but
considering that most of the functionality is implemented entirely by
libxml2, I (lazily) thought it sufficient to test that the API works in general.

Stefan

Dethe Elza | 1 Mar 2006 16:09

Re: Call for Contribution: lxml 0.9

The main feature I'd like to see would be an easy installer that
includes lxml's dependencies, maybe using easy_install.  It's a
complex project and the installation is easy to get wrong.

Thanks for the work on this. Looking forward to 0.9.

--Dethe

"the city carries such a cargo of pathos and longing
  that daily life there vaccinates us against revelation"
  -- Pain Not Bread, The Rise and Fall of Human Breath

Gerald John M. Manipon | 1 Mar 2006 19:02

extracting namespace prefix map dict

Hi,

Is there a way to extract a namespace prefix map from
an etree _Element, i.e.:

xmlString=
<?xml version="1.0"?>
<sciflo xmlns="http://genesis.jpl.nasa.gov/sciflo"
         xmlns:sfl="http://genesis.jpl.nasa.gov/sciflo"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

        ...
</sciflo>

and get a dict:
nsmapDict={'_default': "http://genesis.jpl.nasa.gov/sciflo",
'sfl': "http://genesis.jpl.nasa.gov/sciflo",
'xsd': "http://www.w3.org/2001/XMLSchema",
'xsi': "http://www.w3.org/2001/XMLSchema-instance"}

Currently I'm parsing the xml into a minidom and extracting
this info.  Any help is greatly appreciated.

Thanks,

Gerald

Stefan Behnel | 1 Mar 2006 21:18

Re: extracting namespace prefix map dict


Gerald John M. Manipon wrote:
> Is there a way to extract a namespace prefix map from
> an etree _Element, i.e.:
> 
> xmlString=
> <?xml version="1.0"?>
> <sciflo xmlns="http://genesis.jpl.nasa.gov/sciflo"
>         xmlns:sfl="http://genesis.jpl.nasa.gov/sciflo"
>         xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
> 
>        ...
> </sciflo>
> 
> and get a dict:
> nsmapDict={'_default': "http://genesis.jpl.nasa.gov/sciflo",
> 'sfl': "http://genesis.jpl.nasa.gov/sciflo",
> 'xsd': "http://www.w3.org/2001/XMLSchema",
> 'xsi': "http://www.w3.org/2001/XMLSchema-instance"}
> 
> Currently I'm parsing the xml into a minidom and extracting
> this info.  Any help is greatly appreciated.

Hi,

There isn't an API for that currently. It could be made available, but it
actually doesn't fit very well with the intentions of the ElementTree API.
ElementTree is not very concerned with prefixes at all, since it uses James
Clark's tag notation ('{namespace}elementname').
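
A minimal sketch of one way to collect such a prefix map without going
through minidom. It uses the standard library's xml.etree.ElementTree rather
than lxml, which is an assumption made for illustration since no lxml API for
this is described in this thread. The helper name collect_nsmap, the
'_default' key (taken from the dict in the question) and the <flow/> child
standing in for the elided content are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Sample document from the question; <flow/> is a made-up placeholder child.
XML = b"""<?xml version="1.0"?>
<sciflo xmlns="http://genesis.jpl.nasa.gov/sciflo"
        xmlns:sfl="http://genesis.jpl.nasa.gov/sciflo"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <flow/>
</sciflo>
"""


def collect_nsmap(xml_bytes, default_key="_default"):
    """Return a {prefix: uri} dict built from the parser's start-ns events."""
    nsmap = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, uri) in events:
        # The default namespace is reported with an empty prefix, so it is
        # stored under a named key; the first declaration of a prefix wins.
        nsmap.setdefault(prefix or default_key, uri)
    return nsmap


nsmap = collect_nsmap(XML)

# Tags themselves carry no prefix, only the namespace in Clark notation.
root = ET.fromstring(XML)
print(root.tag)

# The collected map can be handed to a prefixed path lookup.
children = root.findall("sfl:flow", nsmap)
print(len(children))
```

Because tags are stored as '{namespace}elementname', the prefixes only matter
when a path expression is written with them, which is why the map has to be
gathered from the namespace declarations rather than from the elements.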

Stefan Behnel | 1 Mar 2006 21:36

Re: Better Installer (was: Call for Contribution: lxml 0.9)


Dethe Elza wrote:
> The main feature I'd like to see would be an easy installer that include
> lxml's dependencies, maybe using easy_install.  It's a complex project
> and the installation is easy to get wrong.

Hmm, I'm not quite sure what could be done better here. What you'd have to do
for 0.9 is:

* install libxml2 and libxslt (which lxml can't do for you)
* tar zxf lxml-0.9.tar.gz
* cd lxml-0.9
* run "python setup.py install" (or bdist_egg or whatever you run normally)

That doesn't sound very error prone to me...

But then, that's mainly how it works on Linux. You seem to be on Apple, so I
imagine what you're looking for is a readily installable Darwin port? Maybe
even without compilation?

I guess then we'd have to find someone who uses Mac OS X and can provide a port...

Stefan

Gerald John M. Manipon | 1 Mar 2006 22:07

Re: extracting namespace prefix map dict

I'm just using it to pass into the xpath() method.

Stefan Behnel wrote:
> Gerald John M. Manipon wrote:
> 
>>Is there a way to extract a namespace prefix map from
>>an etree _Element, i.e.:
>>
>>xmlString=
>><?xml version="1.0"?>
>><sciflo xmlns="http://genesis.jpl.nasa.gov/sciflo"
>>        xmlns:sfl="http://genesis.jpl.nasa.gov/sciflo"
>>        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>
>>       ...
>></sciflo>
>>
>>and get a dict:
>>nsmapDict={'_default': "http://genesis.jpl.nasa.gov/sciflo",
>>'sfl': "http://genesis.jpl.nasa.gov/sciflo",
>>'xsd': "http://www.w3.org/2001/XMLSchema",
>>'xsi': "http://www.w3.org/2001/XMLSchema-instance"}
>>
>>Currently I'm parsing the xml into a minidom and extracting
>>this info.  Any help is greatly appreciated.
> 
> 
> 
> Hi,

Dethe Elza | 2 Mar 2006 03:23

Re: Better Installer (was: Call for Contribution: lxml 0.9)

>> The main feature I'd like to see would be an easy installer that include
>> lxml's dependencies, maybe using easy_install.  It's a complex project
>> and the installation is easy to get wrong.
>
> Hmm, I'm not quite sure what could be done better here. What you'd have to do
> for 0.9 is:
>
> * install libxml2 and libxslt (which lxml can't do for you)

Why not?  Other Python extensions install their dependencies.

> * tar zxf lxml-0.9.tar.gz
> * cd lxml-0.9
> * run "python setup.py install" (or bdist_egg or whatever you run normally)
>
> That doesn't sound very error prone to me...
>
> But then, that's mainly how it works on Linux. You seem to be on Apple, so I
> imagine what you're looking for is a readily installable darwin port? Maybe
> even without compilation?

OS X is my main platform, but I actually encountered trouble
installing on Windows, where the steps were:


Stefan Behnel | 2 Mar 2006 07:48

Re: Better Installer under Windows


Dethe Elza wrote:
>>> The main feature I'd like to see would be an easy installer that include
>>> lxml's dependencies, maybe using easy_install.  It's a complex project
>>> and the installation is easy to get wrong.
>>
>> Hmm, I'm not quite sure what could be done better here. What you'd have to do
>> for 0.9 is:
>>
>> * install libxml2 and libxslt (which lxml can't do for you)
> 
> Why not?  Other python extensions install their dependencies.

It's not a problem as long as it's only about Python dependencies. EasyInstall
can do that. It's not a problem if it's only self-contained C extensions for
Python. EasyInstall can do that, too.

However, libxml2 and libxslt are written in plain C, with their own further
dependencies, and their installation very much depends on the operating system,
its version, the processor architecture, the availability of a C compiler, ...

It would be hard work for us to handle all of that in Python. And it's not up
to the developers of /lxml/ to provide better ways of installing its
dependencies under the various types of systems. If installing libxml2 is a
problem, it's a problem with libxml2, not lxml.

> OS X is my main platform, but I actually encountered trouble installing
> on Windows, where the steps were:
> 

Stefan Behnel | 2 Mar 2006 07:55

Re: extracting namespace prefix map dict


> Stefan Behnel wrote:
>> Gerald John M. Manipon wrote:
>>
>>> Is there a way to extract a namespace prefix map from
>>> an etree _Element, i.e.:
>>>
>>> xmlString=
>>> <?xml version="1.0"?>
>>> <sciflo xmlns="http://genesis.jpl.nasa.gov/sciflo"
>>>        xmlns:sfl="http://genesis.jpl.nasa.gov/sciflo"
>>>        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>>
>>>       ...
>>> </sciflo>
>>>
>>> and get a dict:
>>> nsmapDict={'_default': "http://genesis.jpl.nasa.gov/sciflo",
>>> 'sfl': "http://genesis.jpl.nasa.gov/sciflo",
>>> 'xsd': "http://www.w3.org/2001/XMLSchema",
>>> 'xsi': "http://www.w3.org/2001/XMLSchema-instance"}
>>>
>>> Currently I'm parsing the xml into a minidom and extracting
>>> this info.  Any help is greatly appreciated.
>>
>> there isn't an API for that currently. It could be made available, but it
>> actually doesn't fit very well with the intentions of the ElementTree API.
>> ElementTree is not very concerned with prefixes at all, since it

Stefan Behnel | 2 Mar 2006 13:55

Clean up of extension function implementation

Hi all,

I did a lot of cleaning up of my code regarding extension functions and I hope
it's now pretty close to 'ready for merging'.

You can look at doc/extensions.txt in the scoder2 branch for some examples.

One problem, however, remains: the first argument to extension functions,
which previously contained the current XPath evaluator. I absolutely cannot
see a reason for adding this argument to the call. The only usable thing in
the evaluator is the 'evaluate(path)' method, but I wouldn't even bet on it
being re-entrant, so I can only hope that no existing code actually uses it.

I did not want to break any legacy code by happily changing the argument order
of the call, so I just kept that argument in there and added a line in the
documentation stating that it should not be used (reserved for future
extensions :). The new implementation simply passes None.

It looks a bit ugly that way, but, well, we /may/ still succeed in finding a
use for it some time after lxml 1.0 ...

Have fun,
Stefan
