Ian Bicking | 2 Jan 21:34 2009

lxml build problem, -arch ppc -arch i386

Martin (copied) has been having a problem building Deliverance/lxml on a 
Mac using the latest static build stuff.  Given the error messages 
(http://paste.plone.org/25648 for /usr/bin/python and 
http://paste.plone.org/25646 with macports Python), it seems like it 
might be related to the architecture.  The compilation uses "-arch ppc 
-arch i386" pretty much unconditionally (in buildlibxml.py).

Right now the Deliverance installation procedure is a bit opaque in this 
regard, so we couldn't really try just editing buildlibxml.py and 
rerunning.  But I'm wondering why both -arch options are always there? 
Is the macports Python also a fat binary, or should it be contingent on 
which Python we're using?

-- 
Ian Bicking : ianb <at> colorstudy.com : http://blog.ianbicking.org

Michael Guntsche | 3 Jan 00:53 2009

Re: lxml build problem, -arch ppc -arch i386


On Jan 2, 2009, at 21:34, Ian Bicking wrote:

> Martin (copied) has been having a problem building Deliverance/lxml on a
> Mac using the latest static build stuff.  Given the error messages
> (http://paste.plone.org/25648 for /usr/bin/python and
> http://paste.plone.org/25646 with macports Python), it seems like it
> might be related to the architecture.  The compilation uses "-arch ppc
> -arch i386" pretty much unconditionally (in buildlibxml.py).
>
> Right now the Deliverance installation procedure is a bit opaque in this
> regard, so we couldn't really try just editing buildlibxml.py and
> rerunning.  But I'm wondering why both -arch options are always there?
> Is the macports Python also a fat binary, or should it be contingent on
> which Python we're using?

Both static libs are built as universal binaries. If you have a Python
build that is NOT universal, only the lib for your arch will be linked
during compilation.
I just tested current trunk with a universal Python build (downloaded
from python.org) and an i386 macports version, and both worked without
errors.
The error messages suggest that something else is going on; it might
be helpful to check whether lxml itself can be built on this system.
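
One quick way to check whether a given interpreter was built universal is to look at the -arch flags recorded in its build configuration. This is only a rough sketch: it assumes the flags show up in the CFLAGS config variable, and the helper name python_arch_flags is made up for illustration (it is not part of lxml or buildlibxml.py).

```python
import re
import sysconfig


def python_arch_flags(cflags=None):
    """Return the -arch values found in a compiler flag string.

    With no argument, inspect the CFLAGS the running interpreter was
    built with (assumption: the -arch flags are recorded there).
    """
    if cflags is None:
        cflags = sysconfig.get_config_var("CFLAGS") or ""
    return re.findall(r"-arch\s+(\S+)", cflags)


if __name__ == "__main__":
    archs = python_arch_flags()
    print("interpreter -arch flags:", archs or "none recorded")
```

Running this once with /usr/bin/python and once with the macports Python should show whether the two interpreters disagree about their architectures.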

Kind regards,

Martin Aspeli | 3 Jan 01:45 2009

Re: lxml build problem, -arch ppc -arch i386

Michael Guntsche wrote:
> On Jan 2, 2009, at 21:34, Ian Bicking wrote:
> 
>> Martin (copied) has been having a problem building Deliverance/lxml on a
>> Mac using the latest static build stuff.  Given the error messages
>> (http://paste.plone.org/25648 for /usr/bin/python and
>> http://paste.plone.org/25646 with macports Python), it seems like it
>> might be related to the architecture.  The compilation uses "-arch ppc
>> -arch i386" pretty much unconditionally (in buildlibxml.py).
>>
>> Right now the Deliverance installation procedure is a bit opaque in this
>> regard, so we couldn't really try just editing buildlibxml.py and
>> rerunning.  But I'm wondering why both -arch options are always there?
>> Is the macports Python also a fat binary, or should it be contingent on
>> which Python we're using?
> 
> Both static libs are built as universal binaries. If you have a Python
> build that is NOT universal, only the lib for your arch will be linked
> during compilation.
> I just tested current trunk with a universal Python build (downloaded
> from python.org) and an i386 macports version, and both worked without
> errors.
> The error messages suggest that something else is going on; it might
> be helpful to check whether lxml itself can be built on this system.

It definitely can, in that I have built it using this recipe: 

Stefan Behnel | 5 Jan 15:48 2009

Re: Working with <?xml-stylesheet ... ?>

Hi,

Martin Aspeli wrote:
> once I get the HtmlProcessingInstruction 
> node, how can I get the value of its pseudo-attributes (href and type, 
> in this case)? The attr dict is empty...

As you say, they are not attributes. The content of a processing
instruction is application specific plain text, according to the XML
specification.

http://www.w3.org/TR/REC-xml/#sec-pi

While there is some simple support for the xml-stylesheet processing
instruction in plain lxml.etree, it's not currently enabled in lxml.html,
and it's not available for any other PI target. Your best bet is to parse
the PI content yourself (.target and .text properties).
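
Parsing the PI content by hand could look roughly like the sketch below. It works on the string you get from the node's .text property and assumes the usual name="value" pseudo-attribute syntax; the function name parse_pseudo_attrs is only an illustration, not an lxml API, and entity references inside the values are left untouched.

```python
import re

# Matches name="value" or name='value' pairs, as used by xml-stylesheet.
_PSEUDO_ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pseudo_attrs(pi_text):
    """Parse the pseudo-attributes in a PI's text content into a dict."""
    attrs = {}
    for name, double_quoted, single_quoted in _PSEUDO_ATTR.findall(pi_text or ""):
        # Exactly one of the two groups can be non-empty for any match.
        attrs[name] = double_quoted or single_quoted
    return attrs


if __name__ == "__main__":
    print(parse_pseudo_attrs('href="style.xsl" type="text/xsl"'))
```

With a PI node in hand, something like parse_pseudo_attrs(pi.text) would then give you href and type, after checking that pi.target is "xml-stylesheet".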

Stefan

Ian Kallen | 5 Jan 16:44 2009

whitespace in lxml.html vs. lxml.html.soupparser

We're using CSSSelector to pull out document fragments. I noticed that
the fragments from lxml.html.soupparser parses don't have extra
whitespace (which is desirable) but fragments from lxml.html have extra
whitespace cruft. For example:

w/soupparser:

"""<div class="post"><a name="8720086857907265707"/>
<p/><div/>Josh Bancroft over at <a
href="http://www.tinyscreenfuls.com/">TinyScreenfuls</a> puts together
a great <a href="http://www.tinyscreenfuls.com/2008/01/site-statistics-i-care-about-as-a-blogger/">roundup
of stats</a> that matter to bloggers with Google Analytics screen
shots and meaningful context.  The comments are helpful
too.<br/><br/>Highly recommended.<br/><br/>Technorati Tags: <a
href="http://technorati.com/tag/stats" rel="tag">Stats</a>,<br/><a
href="http://technorati.com/tag/bloggers"
rel="tag">Bloggers</a>,<br/><a
href="http://technorati.com/tag/blogging" rel="tag">Blogging</a><div/>
</div>"""

w/o soupparser:

"""<div class="post"><a name="8720086857907265707"/>&#13;
    &#13;
    <p/><div/>Josh Bancroft over at <a
href="http://www.tinyscreenfuls.com/">TinyScreenfuls</a> puts together
a great <a href="http://www.tinyscreenfuls.com/2008/01/site-statistics-i-care-about-as-a-blogger/">roundup
of stats</a> that matter to bloggers with Google Analytics screen
shots and meaningful context.  The comments are helpful
too.<br/><br/>Highly recommended.<br/><br/>Technorati Tags: <a

Stefan Behnel | 6 Jan 10:57 2009

Re: whitespace in lxml.html vs. lxml.html.soupparser

Hi,

Ian Kallen wrote:
> We're using CSSSelector to pull out document fragments. I noticed that
> the fragments from lxml.html.soupparser parses don't have extra
> whitespace (which is desirable) but fragments from lxml.html have extra
> whitespace cruft. For example:
> 
> w/soupparser:
> 
> """<div class="post"><a name="8720086857907265707"/>
> <p/><div/>Josh Bancroft over at <a
> href="http://www.tinyscreenfuls.com/">TinyScreenfuls</a> puts together
> a great <a href="http://www.tinyscreenfuls.com/2008/01/site-statistics-i-care-about-as-a-blogger/">roundup
> of stats</a> that matter to bloggers with Google Analytics screen
> shots and meaningful context.  The comments are helpful
> too.<br/><br/>Highly recommended.<br/><br/>Technorati Tags: <a
> href="http://technorati.com/tag/stats" rel="tag">Stats</a>,<br/><a
> href="http://technorati.com/tag/bloggers"
> rel="tag">Bloggers</a>,<br/><a
> href="http://technorati.com/tag/blogging" rel="tag">Blogging</a><div/>
> </div>"""
> 
> w/o soupparser:
> 
> """<div class="post"><a name="8720086857907265707"/>&#13;
>     &#13;
>     <p/><div/>Josh Bancroft over at <a
> href="http://www.tinyscreenfuls.com/">TinyScreenfuls</a> puts together
> a great <a href="http://www.tinyscreenfuls.com/2008/01/site-statistics-i-care-about-as-a-blogger/">roundup

Volker Paulsen | 6 Jan 11:01 2009

Re: lxml 2.1.4/2.2beta1 Solaris 9 segv in test-suite

Hi Stefan,

On Tue, Dec 23, 2008 at 07:38:52AM +0100, Stefan Behnel wrote:
> Volker Paulsen wrote:
> > I just compiled lxml-2.1.4 (and lxml-2.2beta)
> > with gcc 4.2.4 against
> > 
> >   - libxml2-2.7.2
> >   - libxslt-1.1.24
> > 
> > Unfortunately the test "test_schematron_invalid_schema_empty" causes a
> > segmentation violation with Python 2.5 and Python 2.6;
> > 
> > Please find a gdb backtrace for Python 2.6 and lxml-2.1.4 (and
> > lxml-2.2beta) attached.
> 
> I don't think I've seen this before, might be specific to Solaris. From the
> stack trace, it's not certain that the problem is in lxml, as the error is
> handled purely inside libxml2 up to that point.
> 
> I'd say you're safe if you don't use schematron (which most people won't
> run into anyway). Could you try to reproduce this with 'xmllint' (comes
> with libxml2) and the empty schema given by the test case?
> 
> 	<schema xmlns="http://purl.oclc.org/dsdl/schematron" />
> 
> That would allow us to see if it's a problem with libxml2.

Actually, I am not an XML expert...


Stefan Behnel | 6 Jan 11:14 2009

Re: lxml 2.1.4/2.2beta1 Solaris 9 segv in test-suite

Hi,

Volker Paulsen wrote:
>     $ cat schematron.dsdl
>     <schema xmlns="http://purl.oclc.org/dsdl/schematron" />
> 
>     $ /usr/local/bin/xmllint --version
>     /usr/local/bin/xmllint: using libxml version 20702
>        compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib
> 
>     $ /usr/local/bin/xmllint schematron.dsdl
>     <?xml version="1.0"?>
>     <schema xmlns="http://purl.oclc.org/dsdl/schematron"/>
> 
>     $ /usr/local/bin/xmllint --valid schematron.dsdl
>     schematron.dsdl:1: validity error : Validation failed: no DTD found !
>     <schema xmlns="http://purl.oclc.org/dsdl/schematron" />
>                                                          ^ 
>     <?xml version="1.0"?>
>     <schema xmlns="http://purl.oclc.org/dsdl/schematron"/>
> 
> Is this helpful?

Almost. :)

Try:

     $ /usr/local/bin/xmllint --schematron schematron.dsdl schematron.dsdl

Volker Paulsen | 6 Jan 11:29 2009

Re: lxml 2.1.4/2.2beta1 Solaris 9 segv in test-suite

Hi,

On Tue, Jan 06, 2009 at 11:14:57AM +0100, Stefan Behnel wrote:
> > Is this helpful?
> 
> Almost. :)
> Try:
>      $ /usr/local/bin/xmllint --schematron schematron.dsdl schematron.dsdl
> Works for me here, also with libxml2 2.7.2.

There we are:

    $ /usr/local/bin/xmllint --schematron schematron.dsdl schematron.dsdl
    schematron.dsdl:1: element schema: Schemas parser error : The schematron document 'schematron.dsdl' has no pattern
    Schematron schema schematron.dsdl failed to compile
    <?xml version="1.0"?>
    <schema xmlns="http://purl.oclc.org/dsdl/schematron"/>
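
For anyone who wants to repeat this check on another box, the same xmllint call can be scripted. The following is just a sketch under a few assumptions: xmllint is on the PATH, the helper names build_cmd and run_check are invented for this example, and the empty schema is the one from the lxml test case quoted earlier in this thread.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# The empty schema from the lxml test case quoted in this thread.
EMPTY_SCHEMA = '<schema xmlns="http://purl.oclc.org/dsdl/schematron" />\n'


def build_cmd(xmllint, schema_path, doc_path):
    """Build the same xmllint command line that was run by hand above."""
    return [xmllint, "--schematron", str(schema_path), str(doc_path)]


def run_check(xmllint="xmllint"):
    """Run xmllint on the empty schema.

    Returns the exit status, or None if the executable cannot be found.
    """
    exe = shutil.which(xmllint)
    if exe is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        schema = Path(tmp) / "schematron.dsdl"
        schema.write_text(EMPTY_SCHEMA)
        # The schema file doubles as the document, as in the manual run.
        result = subprocess.run(build_cmd(exe, schema, schema),
                                capture_output=True, text=True)
        return result.returncode


if __name__ == "__main__":
    print("xmllint exit status:", run_check())
```

On POSIX systems a negative exit status from subprocess means the process was killed by a signal, so a crash inside libxml2 itself would show up that way rather than as a regular error code.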

Regards,
Volker Paulsen
-- 
  OrbiTeam Software GmbH & Co. KG           http://www.orbiteam.de/
  () Ascii Ribbon Campaign
  /\ Support plain text e-mail
