jholg | 2 Oct 17:58 2007

Re: trunk schematron tests core dump (was: annotate, pyannotate, xsiannotate)

Hi,

> > > Schematron uses XPath a lot, so I wouldn't be surprised if this was
> > > related to the XPath bug in libxml2 2.6.27. Is there any chance you
> > > could switch to
> [...]
> Unfortunately, using the latest & greatest libxml2/libxslt (2.6.33/1.1.22)
> doesn't solve the problem for me.

I'm trying to get some sensible information but am having real problems
with debugging: the line number information I see is just plain wrong,
even though I compiled with debugging on and everything. For example:

(gdb) info source
Current source file is src/lxml/etree.c
Compilation directory is /home/lb54320/pydev/LXML/lxml/
Located in /home/lb54320/pydev/LXML/lxml/src/lxml/etree.c
Contains 90795 lines.
Source language is c.
Compiled with stabs debugging format.
(gdb) b etree.c:70850
No line 70850 in file "src/lxml/etree.c".
(gdb)

No idea what I'm doing wrong here, at the moment.

So the info on the crash does not get much better than that backtrace at the moment:

Program received signal SIGSEGV, Segmentation fault.
[...]

James Graham | 2 Oct 22:33 2007

Re: Tag name validation and HTML

Stefan Behnel wrote:
> James Graham wrote:
>> The development branch of lxml 2 appears to restrict the characters that may 
>> appear in a tag name. Whilst this may be appropriate for XML, it does not match 
>> the behavior of all common HTML UAs and, as such, does not match the current 
>> draft of the HTML 5 spec [1].
> 
> This is actually not as simple as it might seem. The Element factory cannot
> distinguish between XML and HTML tags, so it cannot switch off validation for
> a particular tag. So the conservative solution would be to actually follow the
> HTML5 spec, as it is a superset of the XML spec, an extremely broad one even.
> But then there's not much left that you could honestly call validation. Also,
> I would still want to restrict ":" in tag names, as this has been a source of
> problems way too often. So that would just leave spaces and any of ":/>" as
> invalid characters in tag names.

The : thing is difficult because HTML UAs are expected to deal with : in 
the tag name and there is content in the wild that depends on this being 
accepted; MS Office produces "HTML" containing tags like <o:p>, for 
example. Since I, and I guess others too, want to use lxml to process 
random content that may have colons in the tag names, hard failure for 
this case is a problem. To make matters worse it is possible that the 
HTML spec will change in the future to introduce some sort of 
namespacing feature which may or may not use colons.
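
To make the two rule sets under discussion concrete, here is a minimal sketch. It assumes the liberal rule quoted above (reject only whitespace and any of ":/>") and adds a hypothetical HTML mode that also tolerates ":". The function name and the html_mode flag are made up for illustration and are not lxml's API.

```python
import re

# Rule quoted above: whitespace and any of ":/>" are invalid in a tag name.
_LIBERAL_INVALID = re.compile(r"[\s:/>]")
# Hypothetical HTML mode: same rule, but ":" is tolerated (e.g. <o:p>).
_HTML_INVALID = re.compile(r"[\s/>]")


def is_acceptable_tag_name(name, html_mode=False):
    """Return True if `name` passes the (illustrative) tag name check."""
    if not name:
        return False
    pattern = _HTML_INVALID if html_mode else _LIBERAL_INVALID
    return pattern.search(name) is None
```

With this sketch, "o:p" is rejected by default but accepted when html_mode=True, while a name containing a space is rejected in both modes.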

Given all of this I would prefer it if it were possible to have an 
HTML-specific mode with much more liberal rules than the XML mode. This 
could then be adapted to support any namespacing features HTML grows in 
the future. For example, if one could do something like

[...]

Lawrence Oluyede | 3 Oct 15:31 2007

Namespace serialization patch

I had the same problem Anders Bruun Olsen had in this thread:
http://comments.gmane.org/gmane.comp.python.lxml.devel/2924

What I'd like to know is whether I have to wait for 2.0 to be completed
(using the alpha is not an option AFAIK) to use it, or whether you plan
to release an interim 1.3.x version with that patch applied.

Thanks

-- 
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

Anders Bruun Olsen | 3 Oct 20:28 2007

Re: Namespace serialization patch

Lawrence Oluyede wrote:
> I had the same problem Anders Bruun Olsen had in this thread:
> http://comments.gmane.org/gmane.comp.python.lxml.devel/2924
> What I'd like to know is whether I have to wait for 2.0 to be completed
> (using the alpha is not an option AFAIK) to use it, or whether you plan
> to release an interim 1.3.x version with that patch applied.

Building LXML from SVN is really rather straightforward and of course
includes the fixes for that particular problem as well as others.
See the download page for instructions on building from SVN.

-- 
Anders

Lawrence Oluyede | 3 Oct 21:47 2007

Re: Namespace serialization patch

> Building LXML from SVN is really rather straightforward and of course
> includes the fixes for that particular problem as well as others.
> See the download page for instructions on building from SVN.

Personally, I don't have a problem with that, but AFAIK at work using
the SVN version is even less of an option than using the 2.0 alpha.

-- 
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

Mike Meyer | 3 Oct 22:50 2007

Dealing with segfaults in lxml?

I'm getting crashes - by which I mean the python process is
segfaulting and, with some tweaking of GNU/Linux, leaving me a core
file - while using lxml to parse data.
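
The message does not say what the "tweaking" was. One common approach, shown here purely as an assumption, is to raise the core file size limit for the process before the crash happens. The sketch below does that from inside Python with the standard resource module; the helper name is made up for illustration.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) limits in effect afterwards. Whether a core
    file is actually written still depends on system settings that this
    function does not touch.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling enable_core_dumps() at the top of the script that drives lxml has roughly the same effect as running "ulimit -c" with the hard limit in the shell that starts python.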

Versions:

OS: RHEL 5
Python: 2.5.1 (custom built).
lxml: 1.3.3
libxml: 2.6.26 (both compiled and built)
libxslt: 1.1.17

[Yes, I know those are a bit out of date, but we had to give our
client host requirements months ago, and those were current at the
time, and changing them is a non-trivial process, and I've already
started on it, but I'd rather not do that if I can avoid it....]

Rebuilding python with OPTS=-g (I set that for the lxml build as
well), I can get a "where" output that points at lxml:

#0  0x00002aaaaf906c3a in rename ()
   from /usr/local/lib/python2.5/site-packages/lxml/etree.so
#1  0x00002aaaaf906be7 in rename ()
   from /usr/local/lib/python2.5/site-packages/lxml/etree.so
#2  0x00002aaaaf8ebdfe in rename ()
   from /usr/local/lib/python2.5/site-packages/lxml/etree.so
#3  0x00002aaaaf966a5c in findOrBuildNodeNs ()
   from /usr/local/lib/python2.5/site-packages/lxml/etree.so

The first problem is that this isn't repeatable. I've got test data
[...]

Steve Lianoglou | 4 Oct 00:50 2007

Re: Dealing with segfaults in lxml?

> I'm getting crashes - by which I mean the python process is
> segfaulting and, with some tweaking of GNU/Linux, leaving me a core
> file - while using lxml to parse data.
>
> Versions:
>
> OS: RHEL 5
> Python: 2.5.1 (custom built).
> lxml: 1.3.3
> libxml: 2.6.26 (both compiled and built)
> libxslt: 1.1.17

As an aside (addendum? whatever ...), I recently got nailed with
segfaults and bus errors on OS X that did not seem to be 100%
reproducible.

I built lxml against:

libxml 2.6.30
libxslt 1.1.22
python2.5.1(and python2.4.4)
lxml 1.3.4
(all using MacPorts)

My code was basically generating large-ish documents (really not much
bigger than 4 MB or so) like so (inspired by the ElementTree
examples):

import lxml.etree as ET
root = ET.Element('graph', **root_attribs)
[...]

Eric Tiffany | 4 Oct 01:43 2007

Re: Dealing with segfaults in lxml?

On OS X, you might actually be using the system libs rather than the newer
libs (in /opt/local/lib, if you are using MacPorts, for example).  I had
lots of segfault problems until I realized that even though lxml was
claiming it was running with the newer libs, the info was only based on what
it was built with.  At least, that's what it seemed like.

Anyway, all my (segfault) problems went away when I exported

DYLD_LIBRARY_PATH=/opt/local/lib

into the environment where python was running.

Actually, python was running zope/plone, but I think this problem could be
similar to yours.
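
A quick way to check for the mismatch described above is to compare the libxml2 version lxml was compiled against with the one actually loaded at runtime. The sketch below assumes an lxml release that exposes both etree.LIBXML_COMPILED_VERSION and etree.LIBXML_VERSION as three-integer tuples. The helper name is made up for illustration, and the lxml import is guarded so the helper can be tried without lxml installed.

```python
def describe_libxml2(compiled, runtime):
    """Compare the build-time and runtime libxml2 version tuples."""
    compiled, runtime = tuple(compiled), tuple(runtime)
    if compiled == runtime:
        return "compiled and runtime libxml2 match: %d.%d.%d" % runtime
    return ("mismatch: compiled against %d.%d.%d, running with %d.%d.%d"
            % (compiled + runtime))


try:
    from lxml import etree
except ImportError:
    etree = None  # lxml not installed; only the helper above is usable

if etree is not None:
    print(describe_libxml2(etree.LIBXML_COMPILED_VERSION,
                           etree.LIBXML_VERSION))
```

If the two versions differ, the process is picking up a different libxml2 than the one lxml was built with, which is the situation that setting DYLD_LIBRARY_PATH worked around here.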

ET

On 10/3/07 6:50 PM, "Steve Lianoglou" <lists.steve <at> arachnedesign.net> wrote:

>> I'm getting crashes - by which I mean the python process is
>> segfaulting and, with some tweaking of GNU/Linux, leaving me a core
>> file - while using lxml to parse data.
>> 
>> Versions:
>> 
>> OS: RHEL 5
>> Python: 2.5.1 (custom built).
>> lxml: 1.3.3
>> libxml: 2.6.26 (both compiled and built)
>> libxslt: 1.1.17
[...]

Steve Lianoglou | 4 Oct 01:50 2007

Re: Dealing with segfaults in lxml?

> On OS X, you might actually be using the system libs rather than the newer
> libs (in /opt/local/lib, if you are using MacPorts, for example).  I had
> lots of segfault problems until I realized that even though lxml was
> claiming it was running with the newer libs, the info was only based on what
> it was built with.  At least, that's what it seemed like.
>
> Anyway, all my (segfault) problems went away when I exported
>
> DYLD_LIBRARY_PATH=/opt/local/lib
>
> into the environment where python was running.

Hmm .. interesting.

I was playing with DYLD_LIBRARY_PATH, but I thought that had to be  
set during compile time (of lxml).

Even though ... through my hunting on the intarweb, I came across a  
suggestion to use `otool` to see what libs were being used. So I  
tried like so:

$ otool -L /opt/local/Library/Frameworks/Python.framework/Versions/Current/lib/python2.4/site-packages/lxml/etree.so
/opt/local/Library/Frameworks/Python.framework/Versions/Current/lib/python2.4/site-packages/lxml/etree.so:
         /opt/local/lib/libxslt.1.dylib (compatibility version 3.0.0, current version 3.22.0)
[...]

Eric Tiffany | 4 Oct 04:06 2007

Re: Dealing with segfaults in lxml?

Check the man page for dyld, which notes

 DYLD_LIBRARY_PATH
     This is a colon separated list of directories that contain
     libraries. The dynamic linker searches these directories before
     it searches the default locations for libraries. It allows you
     to test new versions of existing libraries.

     For each library that a program uses, the dynamic linker looks
     for it in each directory in DYLD_LIBRARY_PATH in turn. If it
     still can't find the library, it then searches
     DYLD_FALLBACK_FRAMEWORK_PATH and DYLD_FALLBACK_LIBRARY_PATH in turn.

     Use the -L option to otool(1) to discover the frameworks and
     shared libraries that the executable is linked against.

I think otool is telling you what libs the .so would *like* to use, but the
environment will tell dyld where to look at runtime.  At least, that's the
way I interpret it.  Anyway, my segfaults and bus errors stopped.
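
The search order described in the man page excerpt can be modelled roughly as below. This is a simplified sketch of that description only, not of dyld itself: it ignores frameworks and anything else the man page covers, the function name is made up for illustration, and the file system check is injectable so the logic can be tried without any real libraries.

```python
import os
import posixpath


def resolve_library(install_path, environ, exists=os.path.exists):
    """Return the first existing candidate for a library, or None.

    Candidates, in order: each DYLD_LIBRARY_PATH directory, then the path
    recorded in the binary (what `otool -L` prints), then each
    DYLD_FALLBACK_LIBRARY_PATH directory.
    """
    leaf = posixpath.basename(install_path)
    candidates = []
    for directory in environ.get("DYLD_LIBRARY_PATH", "").split(":"):
        if directory:
            candidates.append(posixpath.join(directory, leaf))
    candidates.append(install_path)
    for directory in environ.get("DYLD_FALLBACK_LIBRARY_PATH", "").split(":"):
        if directory:
            candidates.append(posixpath.join(directory, leaf))
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None
```

In this model, a library recorded as /usr/lib/libxml2.2.dylib resolves to /opt/local/lib/libxml2.2.dylib once DYLD_LIBRARY_PATH=/opt/local/lib is set and that file exists, which matches the interpretation given above.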

ET

On 10/3/07 7:50 PM, "Steve Lianoglou" <lists.steve <at> arachnedesign.net> wrote:

>> On OS X, you might actually be using the system libs rather than the newer
>> libs (in /opt/local/lib, if you are using MacPorts, for example).  I had
>> lots of segfault problems until I realized that even though lxml was
>> claiming it was running with the newer libs, the info was only
[...]

