Stefan Behnel | 1 Jul 15:20

Re: lxml 1.3 coming up

Hi Holger,

jholg <at> gmx.de wrote:
> Find attached a patch that:
> 
> - changes the above to apply xsi:nil="true" for None value arguments

Ok.

> - lets DataElement() graciously handle ObjectifiedDataElement arguments,
> keeping their attributes intact, if not overridden by the DataElement()
> args. This also reuses existing xsi:type or py:pytype information, unless
> _pytype and/or _xsi are provided as parameters to DataElement()
> 
> Previously, DataElement() cut off all attributes if given an
> ObjectifiedDataElement instance.

Ok.

> - Type-checks the _value against the given type hint:
> You will run into the error anyway - sooner or
> later - when accessing the .pyval in any way, so why not during
> instantiation.

Ok.

> Tests are included for the described behaviour.

Cool, thanks.
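
The three behaviours discussed above can be illustrated with a small sketch. It assumes a current lxml release in which objectify.DataElement behaves as described in this thread; the "unit" attribute and the variable names are made up for illustration.

```python
from lxml import objectify

XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# A None value produces an element marked with xsi:nil="true".
nil_element = objectify.DataElement(None)
nil_flag = nil_element.get(XSI_NIL)

# Passing an existing data element keeps its attributes
# ("unit" is a hypothetical attribute used only for this example).
source = objectify.DataElement(5, attrib={"unit": "cm"})
copied = objectify.DataElement(source)
kept_unit = copied.get("unit")

# A value that does not fit the given type hint is rejected
# at construction time rather than on a later .pyval access.
try:
    objectify.DataElement(3.1415, _pytype="int")
    rejected = False
except ValueError:
    rejected = True

print(nil_flag, kept_unit, rejected)
```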

Stefan Behnel | 2 Jul 10:32

Re: Some XPath questions...

Hi Ian,

just to comment on your actual first post in this thread, which I kinda
overlooked because of the later discussion.

I think this is pretty cool stuff and I'd love to have this in lxml. The html
module really seems to be getting somewhere. I think we shouldn't even wait
too long with a release so that we get some more feedback on the new APIs.
Maybe I should fix lxml's versioning so that we can put out a 2.0alpha1 (and
not only alpha, beta, final).

Ian Bicking wrote:
> div:contains('celia') -- means a div where the textual content has the 
> word 'celia' in it, case insensitive.  At least, I think it's case 
> insensitive -- the CSS spec is annoyingly vague, but implementations 
> seem to work like this.  I translate this to:
> 
>    descendant-or-self::div[contains(css:lower-case(string(.)), 'celia')]
> 
> I added the lower-case function like:
> 
>    def _make_lower_case(context, s):
>        return s.lower()
>    etree.FunctionNamespace("css")['lower-case'] = _make_lower_case

"css" is not the namespace, it's the prefix. You can do this:

   ns = etree.FunctionNamespace("http://my/css/namespace")
   ns.prefix = "css"
   ns['lower-case'] = _make_lower_case
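
For illustration, here is a self-contained sketch of that registration together with the translated :contains() expression. The namespace URI and the sample document are placeholders invented for this example, not anything lxml defines.

```python
from lxml import etree

def _make_lower_case(context, s):
    # s is the XPath string produced by string(.)
    return s.lower()

# The URI is a made-up placeholder; only the prefix appears in the XPath.
ns = etree.FunctionNamespace("http://example.invalid/css-functions")
ns.prefix = "css"
ns["lower-case"] = _make_lower_case

root = etree.fromstring(
    "<body><div>Hello Celia</div><div>somebody else</div></body>"
)
matches = root.xpath(
    "descendant-or-self::div[contains(css:lower-case(string(.)), 'celia')]"
)
print([el.text for el in matches])
```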

Stefan Behnel | 2 Jul 18:35

lxml 1.3.1 on cheeseshop

Hi all,

I just released lxml 1.3.1. This is a bugfix release for the stable 1.3
series. Changelog follows.

Have fun,
Stefan

1.3.1 (2007-07-02)
==================

Features added
--------------

* objectify.DataElement now supports setting values from existing data
  elements (not just plain Python types) and reuses defined namespaces etc.

* E-factory support for lxml.objectify (``objectify.E``)

Bugs fixed
----------

* Better way to prevent crashes in Element proxy cleanup code

* objectify.DataElement didn't set up None value correctly

* objectify.DataElement didn't check the value against the provided type hints

* Reference-counting bug in ``Element.attrib.pop()``
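
As a rough illustration of the ``objectify.E`` entry above, the following sketch builds a small tree with the E-factory. The element and attribute names are arbitrary, and it assumes a current lxml release.

```python
from lxml import objectify

E = objectify.E

# Build a tree declaratively; plain Python values become typed data elements.
root = E.root(
    E.a(5),
    E.d("how", tell="me"),
)

a_value = root.a.pyval        # the integer stored in <a>
d_text = root.d.text          # the text content of <d>
d_attr = root.d.get("tell")   # keyword arguments become attributes
print(a_value, d_text, d_attr)
```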

Ian Bicking | 2 Jul 19:21

Re: Some XPath questions...

Stefan Behnel wrote:
> Hi Ian,
> 
> just to comment on your actual first post in this thread, which I kinda
> overlooked because of the later discussion.
> 
> I think this is pretty cool stuff and I'd love to have this in lxml. The html
> module really seems to be getting somewhere. I think we shouldn't even wait
> too long with a release so that we get some more feedback on the new APIs.
> Maybe I should fix lxml's versioning so that we can put out a 2.0alpha1 (and
> not only alpha, beta, final).

Yeah, I was thinking about writing up a summary of things that need to 
be done in the html package; there's still some outstanding stuff, but 
not too much.  The clean module needs to be cleaned up (I'm thinking of 
moving from a function to a class).  I'd like to make the usedoctest 
hack a little more general, as elsewhere I'm now using a similar hack to 
enable ELLIPSIS, and I'd like them not to conflict.  And then some docs, 
but I guess that's it.

> Ian Bicking wrote:
>> div:contains('celia') -- means a div where the textual content has the 
>> word 'celia' in it, case insensitive.  At least, I think it's case 
>> insensitive -- the CSS spec is annoyingly vague, but implementations 
>> seem to work like this.  I translate this to:
>>
>>    descendant-or-self::div[contains(css:lower-case(string(.)), 'celia')]
>>
>> I added the lower-case function like:
>>

Ian Bicking | 3 Jul 01:26

Re: Some XPath questions...

Stefan Behnel wrote:
>> So when I use // it works.  Huh.  I prefer descendant-or-self, because I 
>> find it peculiar to do a search from the root when you've called the 
>> method on some particular element (that may not be at the root).
> 
> There's also ".//*".

That seems to be equivalent to //*, i.e., // goes directly to the root 
regardless of context.

>>>>>> div:empty (no children, including text, maybe not including whitespace).
>>>>> Ouch. let me think about that one.
>>>> Yeah, I couldn't figure that one out.  I thought this might work:
>>>>      >>> xpath('E:empty')
>>>>      e[count(./children::*) = 0 and string(.) = '']
>>>> But maybe I don't understand how count() works; this isn't a valid XPath 
>>>> expression.
>>> You want "child" not "children". Using normalize-space(.) instead of
>>> string(.) will exclude whitespace. This does assume you are ignoring
>>> comments and PIs; I believe that's the behavior you want.
>> Cool, that seems to work right.
> 
> What about "e[not(*) and not(normalize-space())]" ?

Yes, that works too.

>> One query I'm realizing might be really hard (maybe too hard in XPath) 
>> is *:first-of-type, *:last-of-type, and *:only-of-type, since they match 
>> in a funny sort of way.  You can't really do:
>>

Ian Bicking | 3 Jul 01:45

Re: Some XPath questions...

Mike Meyer wrote:
> In <4689898E.9080509 <at> colorstudy.com>, Ian Bicking <ianb <at> colorstudy.com> typed:
>> Stefan Behnel wrote:
>>>> So when I use // it works.  Huh.  I prefer descendant-or-self, because I 
>>>> find it peculiar to do a search from the root when you've called the 
>>>> method on some particular element (that may not be at the root).
>>> There's also ".//*".
>> That seems to be equivalent to //*, i.e., // goes directly to the root 
>> regardless of context.
> 
> Not quite. '//*' always goes to the root. './/*' starts at the current
> node and matches from there down. If you always test at the root of
> the document, they'll look the same.

It seems to be changing the results when I replace 
'descendant-or-self::' with './/'.  I want to include the current node 
if it matches; at least to me, that seems most logical.  Also necessary 
when I was doing microformat parsing, as a single element can have 
multiple roles.  It seems like .// excludes the current node, only 
looking at descendants.
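
The difference between the three forms can be checked with a small sketch. The sample document and the helper name are invented for this example.

```python
from lxml import etree

outer = etree.fromstring(
    '<div id="outer"><div id="inner"><div id="leaf"/></div></div>'
)
inner = outer[0]

def ids(nodes):
    """Return the id attribute of each matched element, in document order."""
    return [node.get("id") for node in nodes]

# "//" always starts at the document root, whatever the context node is.
from_root = ids(inner.xpath("//div"))                    # ['outer', 'inner', 'leaf']

# ".//" starts at the context node but only yields its descendants.
below = ids(inner.xpath(".//div"))                       # ['leaf']

# "descendant-or-self::" also considers the context node itself.
with_self = ids(inner.xpath("descendant-or-self::div"))  # ['inner', 'leaf']

print(from_root, below, with_self)
```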

>>>>>>>> div:empty (no children, including text, maybe not including whitespace).
>>>>>>> Ouch. let me think about that one.
>>>>>> Yeah, I couldn't figure that one out.  I thought this might work:
>>>>>>      >>> xpath('E:empty')
>>>>>>      e[count(./children::*) = 0 and string(.) = '']
>>>>>> But maybe I don't understand how count() works; this isn't a valid XPath 
>>>>>> expression.
>>>>> You want "child" not "children". Using normalize-space(.) instead of
>>>>> string(.) will exclude whitespace. This does assume you are ignoring

Stefan Behnel | 3 Jul 08:54

Re: Some XPath questions...


Ian Bicking wrote:
>>>>>>>      >>> xpath('E:empty')
>>>>>>>      e[count(./children::*) = 0 and string(.) = '']
>>>>>>> But maybe I don't understand how count() works; this isn't a
>>>>>>> valid XPath expression.
>>>>>> You want "child" not "children". Using normalize-space(.) instead of
>>>>>> string(.) will exclude whitespace. This does assume you are ignoring
>>>>>> comments and PIs; I believe that's the behavior you want.
>>>>> Cool, that seems to work right.
>>>> What about "e[not(*) and not(normalize-space())]" ?
>>> Yes, that works too.
>>
>> That's the 'implicit conversion' I was talking about. You're relying
>> on 0 and the empty string being false. It's a standard idiom, and
>> pythonic, but I'm not sure you want to use it in automatically
>> generated code, since it means you can't generalize the code from "has
>> 0 children" to "has n children".
> 
> In this case it's a fixed expression used for e:empty, and nothing else,
> so it seems fine.  And possibly makes the resulting expression a bit
> easier to recognize from its CSS roots.

It's also likely faster. I don't think libxml2 optimises the comparisons, so
evaluating "not(*)" can stop and return false at the first child it finds,
while "count(./child::*) = 0" needs to count all children before it sees that,
oh, the number is bigger than 0.

Stefan
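
As a quick check that the two spellings of E:empty select the same elements, here is a sketch over an invented sample document. It only compares results and does not measure the speed difference suggested above.

```python
from lxml import etree

doc = etree.fromstring(
    "<root>"
    "<e/>"                   # 0: nothing inside
    "<e>  \n </e>"           # 1: whitespace only
    "<e>text</e>"            # 2: real text
    "<e><child/></e>"        # 3: child element
    "<e><!-- note --></e>"   # 4: comment only (ignored by both expressions)
    "</root>"
)

short_form = doc.xpath("e[not(*) and not(normalize-space())]")
count_form = doc.xpath("e[count(./child::*) = 0 and normalize-space(.) = '']")

# Compare by position among the children of <root>.
short_idx = [doc.index(el) for el in short_form]
count_idx = [doc.index(el) for el in count_form]
print(short_idx, count_idx)
```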

jholg | 3 Jul 09:43

lxml 1.3.1 setup.py bug

Hi,

the setup.py script in 1.3.1 seems to try to remove the dependency
on setuptools (which is a very good thing imho!) but fails:

Traceback (most recent call last):
  File "setup.py", line 7, in ?
    except pkg_resources.VersionConflict, e:
NameError: name 'pkg_resources' is not defined
1 lb54320 <at> adevp02 .../lxml-1.3 $ 

I must admit I don't fully understand the intention of the relevant code portion, as it raises ImportError
even if the pkg_resources import and version check run smoothly; maybe this is the intended behaviour?

try:
    import pkg_resources
    try:
        # make sure a recent enough setuptools is available
        pkg_resources.require("setuptools>=0.6c5")
    except pkg_resources.VersionConflict:
        # installed setuptools is too old: bootstrap the required version
        from ez_setup import use_setuptools
        use_setuptools(version="0.6c5")
    from setuptools import setup
except ImportError:
    # setuptools not installed: fall back to plain distutils
    from distutils.core import setup

(Note: This is untested code, I have not tested with setuptools installed)

Oh, btw I couldn't find a 1.3.1 tag in the repository when trying to check out 1.3.1.

Stefan Behnel | 3 Jul 15:16

Re: lxml 1.3.1 setup.py bug

Hi Holger,

jholg <at> gmx.de wrote:
> the setup.py script in 1.3.1 seems to try to remove the dependency on
> setuptools (which is a very good thing imho!) but fails:
> 
> Traceback (most recent call last):
>   File "setup.py", line 7, in ?
>     except pkg_resources.VersionConflict, e:
> NameError: name 'pkg_resources' is not defined
> 1 lb54320 <at> adevp02 .../lxml-1.3 $
> 
> I must admit I don't fully understand the intention of the relevant code
> portion, as it raises ImportError even if the pkg_resources import and version
> check run smoothly; maybe this is the intended behaviour?

Ah, great. That was plain debug code. :)

Thanks, I just re-released the sources. Could you check if it works now?

> Oh, btw I couldn't find a 1.3.1 tag in the repository when trying to check
> out 1.3.1.

Luckily, yes. I'll tag it with the fix applied. :)

Stefan

jholg | 3 Jul 15:40

Re: lxml 1.3.1 setup.py bug

Hi Stefan,

> Thanks, I just re-released the sources. Could you check if it works now?

Works for me now.

Note: I get

Building lxml version 1.3.1-44702
/apps/prod/lib/python2.4/distutils/dist.py:236: UserWarning: Unknown distribution option: 'zip_safe'
  warnings.warn(msg)

but simply ignore it because I bet this is just some setuptools-related stuff and can be safely ignored by
plain-old-distutillers like me.

Holger
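
The warning appears because 'zip_safe' is a setuptools-only keyword that plain distutils does not know. One way a setup script could avoid it is to pass the option only when setuptools is importable. The following is a minimal sketch of that idea, not the actual lxml setup.py; the function name and the placeholder metadata are assumptions.

```python
import importlib.util

def setup_options(have_setuptools):
    """Build keyword arguments for setup(); the metadata is a placeholder."""
    options = {"name": "example", "version": "0.0"}
    if have_setuptools:
        # Only setuptools understands this keyword; distutils would warn.
        options["zip_safe"] = False
    return options

# Detect setuptools without importing it.
HAVE_SETUPTOOLS = importlib.util.find_spec("setuptools") is not None
options = setup_options(HAVE_SETUPTOOLS)
print(sorted(options))
```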
