Alex Boese | 27 Mar 14:08 2015

Etree to string problematic?

I noticed a strange behavior. I can only describe it rather than post the code, because the code I write is fully owned by my employer.

I was using an lxml iterator to walk all of the nodes in some XML documents and look for
differences. Because of the rigor this comparison requires, I decided to convert some of these
elements to strings using the etree function of the same name (tostring).

What I observed in practice, but did not expect, is that the tostring function sometimes appends a
carriage return to the output. This seems to occur when there is whitespace between the closing tag and the
next tag, which in this case was another closing tag.

So if I had two identical documents which I pretty-printed using "xmllint --format filename", and added a carriage
return to the second one just outside of a closing tag, something like this might occur:
everything between the tags would be precisely the same, and the extra carriage return after the end tag would
cause tostring to produce different output.

I'm using a version of Python 2. I can also post the versions of lxml and libxml2 if needed. Has anyone
had this experience?

Sent from my Planet
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml <at> lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
Burak Arslan | 20 Mar 17:50 2015

how to make a unicode string valid for xml?

Hello,

I'm looking for a function like xml_unicode(some_unicode_string,
'ignore') that works like unicode(some_string, 'utf8', 'ignore'). Does
lxml export such a function? I looked around the source but I didn't see
any.

Best regards,
Burak
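
As far as I know, lxml does not expose such a helper, so here is a minimal sketch of one in plain Python 3. It drops every code point outside the `Char` production of the XML 1.0 specification. The function name `strip_invalid_xml_chars` and its `replacement` parameter are hypothetical and not part of lxml.

```python
import re

# Matches everything NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text, replacement=""):
    """Return text with characters that XML 1.0 forbids replaced (default: removed)."""
    return _INVALID_XML_CHARS.sub(replacement, text)

cleaned = strip_invalid_xml_chars("ok\x00\x0b text\ufffe")
print(repr(cleaned))  # 'ok text'
```

Passing `replacement="?"` would mimic the 'replace' error handler instead of 'ignore'.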
_________________________________________________________________
Maciej Fijalkowski | 11 Mar 15:28 2015

Inclusion of lxml-cffi into lxml

Hi

What would it take to include lxml-cffi
(https://github.com/amauryfa/lxml/tree/cffi) as an official part of
lxml? It works better on PyPy (the original lxml is slow there and
prone to bugs, notably segfaulting for me).

Cheers,
fijal
_________________________________________________________________
Omar Gutiérrez | 10 Mar 20:47 2015

Why is the apostrophe character not escaped?

I was wondering why the apostrophe is not automatically escaped like other characters.

For example:

< is transformed to &lt;

and

> is transformed to &gt;

but ' is not transformed to &apos;


Python              : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)

lxml.etree          : (3, 3, 3, 0)
libxml used         : (2, 9, 1)
libxml compiled     : (2, 9, 1)
libxslt used        : (1, 1, 28)
libxslt compiled    : (1, 1, 28)
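
For context: in XML character data only `<` and `&` strictly need escaping (`>` is commonly escaped as well), and `'` only matters inside an attribute value that is itself delimited by single quotes. A small sketch, assuming Python 3, of what lxml's serializer emits; the element and attribute names are made up for illustration:

```python
from lxml import etree

el = etree.Element("msg")
el.text = "a < b > c & it's"
el.set("note", "it's \"quoted\"")

# < > & are escaped; the apostrophe is written as-is. Attribute values are
# delimited with double quotes, so only " needs escaping there.
out = etree.tostring(el)
print(out)

# Parsing the output again gives back the original strings.
back = etree.fromstring(out)
print(back.text == el.text, back.get("note") == el.get("note"))  # True True
```

Since the unescaped apostrophe round-trips without loss, the output is still well-formed and equivalent to a version using `&apos;`.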


_________________________________________________________________
Markus Schöpflin | 3 Mar 10:23 2015

Replacing children via slicing

Hello,

given the following piece of code which replaces all A nodes in P with B:

 >>> from lxml import etree
 >>> tree = etree.fromstring("<P><A/><A/></P>")
 >>> l = tree.findall("A")
 >>> first, last = tree.index(l[0]), tree.index(l[-1])
 >>> tree[first : last + 1] = [ etree.Element("B") ]
 >>> etree.tostring(tree)
'<P><B/></P>'

This code works, but it feels kind of ugly.

Is there a more elegant way to replace a slice of the list of children 
(actually all children having the same name, but they always appear in a row) 
with another list of elements and keeping the document order intact?

TIA, Markus
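
One possible alternative, sketched under the assumption of Python 3: wrap the steps in a small helper that remembers where the first match sat, removes the old children, and inserts the new ones at that position. The name `replace_run` is hypothetical; it only uses `findall`, `index`, `remove` and `insert`, and unlike the slice version it does not depend on the matches being contiguous.

```python
from lxml import etree

def replace_run(parent, tag, new_children):
    """Replace all direct children named `tag` with `new_children`,
    placing them where the first old child used to be."""
    old = parent.findall(tag)
    if not old:
        return
    position = parent.index(old[0])
    for child in old:
        parent.remove(child)
    for offset, child in enumerate(new_children):
        parent.insert(position + offset, child)

tree = etree.fromstring("<P><X/><A/><A/><Y/></P>")
replace_run(tree, "A", [etree.Element("B"), etree.Element("C")])
print([child.tag for child in tree])  # ['X', 'B', 'C', 'Y']
```

Whether this is more elegant than the slice assignment is a matter of taste; it mainly moves the index bookkeeping out of the calling code.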

_________________________________________________________________
Axel | 2 Mar 14:14 2015

html parsing - cssselect

hi,

i'm new to python and cssselect.
i'm trying to get some links from a webpage.

http://pastebin.com/w35S8dJm

i have not the slightest idea why the specified selector does not return the expected results.

the selector does return the expected results in chrome console and firebug console.

thanks in advance for your help!

axel
_________________________________________________________________
Aaron Storm | 1 Mar 17:12 2015

Re: Programmatically accessing schema

Thanks for your reply Stefan. I'll look into the threads you pointed out in more detail.

I am kinda glad that I didn't miss an obvious solution =)

My need is rather simple, so maybe I can try querying the schema directly.

I did read a little bit about "Abbot" from MONK -- http://quest.library.illinois.edu/monk/project/ -- per Martin's suggestion (Thanks!), but it seems like it is no longer active and it is not a direct solution to my issue. 

Thanks!
Aaron


On Sunday, March 1, 2015 7:00 AM, "lxml-request <at> lxml.de" <lxml-request <at> lxml.de> wrote:

Date: Sat, 28 Feb 2015 21:28:37 +0100
From: Stefan Behnel <stefan_ml <at> behnel.de>
To: lxml <at> lxml.de
Subject: Re: [lxml] Programmatically accessing schema
Message-ID: <54F224F5.3030200 <at> behnel.de>
Content-Type: text/plain; charset=utf-8

Aaron Storm schrieb am 28.02.2015 um 12:29:
> I went through the documentation and couldn't find any hints on this.
> And I am not sure what keywords to search on google. Is there a way to
> programmatically (or api?) query the XSD for a specific element to get
> its spec? For example, I would like to know if /MyContainer/Container2
> can be repeated. Or if /MyContainer/Container1/Item1/ text() is
> optional.

Similar questions have been discussed in the past. These are related, for
example:

http://thread.gmane.org/gmane.comp.python.lxml.devel/7318

http://thread.gmane.org/gmane.comp.python.lxml.devel/5619

In general, figuring out what an XML Schema specification allows is rather
difficult. It can be done for simple cases (it's XML, you can search in
it), but schemas can be arbitrarily complex. Sometimes there are multiple
ways to express something, and covering all cases is cumbersome.

Stefan



_________________________________________________________________
Aaron Storm | 28 Feb 12:29 2015

Programmatically accessing schema

I went through the documentation and couldn't find any hints on this. And I am not sure what keywords to search on google.

Is there a way to programmatically (or api?) query the XSD for a specific element to get its spec?
 
For example, I would like to know if /MyContainer/Container2 can be repeated. Or if /MyContainer/Container1/Item1/ text() is optional.
 
my.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<MyContainer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="sample.xsd">
  <Container1>
    <Item1>String</Item1>
  </Container1>
  <Container2>
    <Item2>2014-12-17T09:30:47Z</Item2>
  </Container2>
  <Container2>
    <Item2>2015-01-17T09:30:47Z</Item2>
  </Container2>
</MyContainer>
```
 
sample.xsd
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="MyContainer">
    <xs:annotation>
      <xs:documentation>docs</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Container1">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" name="Item1" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element maxOccurs="unbounded" ref="Container2"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Container2">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Item2" type="xs:dateTime"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```


cheers,
Aaron
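
For a simple schema like sample.xsd, questions of this kind can be answered by treating the XSD as an ordinary XML document and reading the `minOccurs`/`maxOccurs` attributes with XPath. A rough sketch, assuming Python 3: the `occurs` helper is hypothetical, it relies on the XSD default of 1 for both attributes, and it will not cope with schemas that spread declarations across groups, type extensions or imports.

```python
from lxml import etree

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# sample.xsd from the post, inlined as bytes so the sketch is self-contained.
XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="MyContainer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Container1">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" name="Item1" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element maxOccurs="unbounded" ref="Container2"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Container2">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Item2" type="xs:dateTime"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema_doc = etree.fromstring(XSD)

def occurs(declaration):
    """Return (minOccurs, maxOccurs) as strings; both default to "1" in XSD."""
    return declaration.get("minOccurs", "1"), declaration.get("maxOccurs", "1")

# The use of Container2 inside MyContainer, and the declaration of Item1.
container2 = schema_doc.xpath('//xs:element[@ref="Container2"]', namespaces=XS)[0]
item1 = schema_doc.xpath('//xs:element[@name="Item1"]', namespaces=XS)[0]

print(occurs(container2))  # ('1', 'unbounded') -> may be repeated
print(occurs(item1))       # ('0', '1')         -> optional
```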

_________________________________________________________________
Pierpaolo Da Fieno | 25 Feb 20:21 2015

Isoschematron.Schematron not working as expected

Hello everyone,
while working on an ISO Schematron validation routine, I noticed that a report, even when triggered correctly, does not make validation return False.
If I check the validation_report, the report element is being triggered correctly, but the final result is True and the error_log is empty.
Finally I went back to a very simple example included in the isoschematron.Schematron docstring:

>>> from lxml import isoschematron
>>> schematron = isoschematron.Schematron(etree.XML('''
... <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
...   <pattern id="id_only_attribute">
...     <title>id is the only permitted attribute name</title>
...     <rule context="*">
...       <report test="@*[not(name()='id')]">Attribute
...         <name path="@*[not(name()='id')]"/> is forbidden<name/>
...       </report>
...     </rule>
...   </pattern>
... </schema>
... '''))
>>> xml = etree.XML('''
... <AAA name="aaa">
...   <BBB id="bbb"/>
...   <CCC color="ccc"/>
... </AAA>
... ''')
>>> schematron.validate(xml)
0
>>> xml = etree.XML('''
... <AAA id="aaa">
...   <BBB id="bbb"/>
...   <CCC/>
... </AAA>
... ''')
>>> schematron.validate(xml)
1

Now if I run the above code I always get True, even with the invalid XML input. Same situation: the validation_report is correct, but the return value is True and the error_log is empty.

I'm running Python 3.4.1 with lxml 3.5dev0. I get the same result with lxml 3.4.2.

Am I doing something horribly wrong without realizing it, or is there actually a bug here?

Best Regards
Pierpaolo Da Fieno
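
Since the validation_report itself is said to be correct, one way to work from it rather than from the boolean result is to keep the SVRL report with `store_report=True` and count the `svrl:successful-report` elements in it. This is only a sketch, assuming Python 3; the helper `fired_reports` is hypothetical, and it does not explain why `validate()` returns True.

```python
from lxml import etree, isoschematron

SVRL = {"svrl": "http://purl.oclc.org/dsdl/svrl"}

schematron = isoschematron.Schematron(etree.XML('''
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="id_only_attribute">
    <title>id is the only permitted attribute name</title>
    <rule context="*">
      <report test="@*[not(name()='id')]">Attribute
        <name path="@*[not(name()='id')]"/> is forbidden<name/>
      </report>
    </rule>
  </pattern>
</schema>
'''), store_report=True)  # keep the SVRL report of the last validation run

def fired_reports(doc):
    """Validate doc and return the <report> elements that fired."""
    schematron.validate(doc)
    return schematron.validation_report.xpath(
        "//svrl:successful-report", namespaces=SVRL)

bad = etree.XML('<AAA name="aaa"><BBB id="bbb"/><CCC color="ccc"/></AAA>')
good = etree.XML('<AAA id="aaa"><BBB id="bbb"/><CCC/></AAA>')

print(len(fired_reports(bad)) > 0)
print(len(fired_reports(good)) == 0)
```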
_________________________________________________________________
Charlie Clark | 25 Feb 12:45 2015

Avoiding validation errors when offline

Hi,

while travelling yesterday I got a shock when I ran my tests and they  
failed with a validation error:

xmlschema.pxi:87: in lxml.etree.XMLSchema.__init__  
(src/lxml/lxml.etree.c:174765)
     ???
E   lxml.etree.XMLSchemaParseError: Element  
'{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName  
value '{http://purl.org/dc/terms/}created' does not resolve to a(n)  
element declaration., line 19

My machine had been playing up a bit so I initially thought this might  
have come from a corrupt file but this was, fortunately (or perhaps  
unfortunately), not the case. The error arises, I think, because a schema  
(see below) is parsed with an online resource.

In the particular case I can probably remove the validation test as it's  
effectively redundant but I wondered what is the best way to deal with  
this. Should the test be skipped if I can somehow work out that there is no  
network connection? Is there a way of caching the online schema? Can lxml  
be allowed to fail gracefully if it can't load a remote resource?

Charlie

   <xs:import namespace="http://purl.org/dc/elements/1.1/"
     schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd"/>
   <xs:import namespace="http://purl.org/dc/terms/"
     schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcterms.xsd"/>
   <xs:import id="xml" namespace="http://www.w3.org/XML/1998/namespace"/>

   <xs:element name="coreProperties" type="CT_CoreProperties"/>

   <xs:complexType name="CT_CoreProperties">
     <xs:all>
       <xs:element name="category" minOccurs="0" maxOccurs="1" type="xs:string"/>
       <xs:element name="contentStatus" minOccurs="0" maxOccurs="1" type="xs:string"/>
       <xs:element ref="dcterms:created" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="dc:creator" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="dc:description" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="dc:identifier" minOccurs="0" maxOccurs="1"/>
       <xs:element name="keywords" minOccurs="0" maxOccurs="1" type="CT_Keywords"/>
       <xs:element ref="dc:language" minOccurs="0" maxOccurs="1"/>
       <xs:element name="lastModifiedBy" minOccurs="0" maxOccurs="1" type="xs:string"/>
       <xs:element name="lastPrinted" minOccurs="0" maxOccurs="1" type="xs:dateTime"/>
       <xs:element ref="dcterms:modified" minOccurs="0" maxOccurs="1"/>
       <xs:element name="revision" minOccurs="0" maxOccurs="1" type="xs:string"/>
       <xs:element ref="dc:subject" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="dc:title" minOccurs="0" maxOccurs="1"/>
       <xs:element name="version" minOccurs="0" maxOccurs="1" type="xs:string"/>
     </xs:all>
   </xs:complexType>

--

-- 
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226
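
One way to cache the online schemas, sketched here under several assumptions, is to keep local copies of the imported .xsd files and rewrite each `xs:import`'s `schemaLocation` to the local path before compiling. The helper `localize_imports` is hypothetical, and the two tiny schemas below are stand-ins written for this example, not the real Dublin Core or core-properties schemas. Python 3 is assumed.

```python
import os
import tempfile
from lxml import etree

XS_NS = "http://www.w3.org/2001/XMLSchema"

def localize_imports(schema_root, local_copies):
    """Point xs:import elements at cached files.
    local_copies maps a namespace URI to the path of a local .xsd copy."""
    for imp in schema_root.iter("{%s}import" % XS_NS):
        path = local_copies.get(imp.get("namespace"))
        if path is not None:
            imp.set("schemaLocation", path)
    return schema_root

# Stand-in for a previously downloaded copy of the remote schema.
cached = os.path.join(tempfile.mkdtemp(), "dcterms.xsd")
with open(cached, "wb") as f:
    f.write(b'''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://purl.org/dc/terms/" elementFormDefault="qualified">
  <xs:element name="created" type="xs:string"/>
</xs:schema>''')

# Cut-down stand-in for the schema that imports the remote one.
main = etree.fromstring(b'''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <xs:import namespace="http://purl.org/dc/terms/"
    schemaLocation="http://dublincore.org/schemas/xmls/qdc/2003/04/02/dcterms.xsd"/>
  <xs:element name="coreProperties">
    <xs:complexType>
      <xs:all>
        <xs:element ref="dcterms:created" minOccurs="0" maxOccurs="1"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>''')

schema = etree.XMLSchema(
    localize_imports(main, {"http://purl.org/dc/terms/": cached}))

doc = etree.fromstring(
    b'<coreProperties xmlns:dcterms="http://purl.org/dc/terms/">'
    b'<dcterms:created>2015</dcterms:created></coreProperties>')
is_valid = schema.validate(doc)
print(is_valid)
```

If caching is not an option, a test could also catch `etree.XMLSchemaParseError` around the `etree.XMLSchema(...)` call and skip itself, at the cost of hiding genuine schema errors.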
_________________________________________________________________
Martin Mueller | 24 Feb 19:25 2015

a 'tail' problem


I have a "tail" problem with the following XML fragment

<lg>
    <l>WHo doth desire the trump of fame, to sound vnto the Skies,</l>
    <l><pb/>Or els who seekes the holy place, where mighty Ioue he
lies,</l>
    <l>He must not by deceitfull mind, nor yet by puissant strength,</l>

</lg>

I want to move the <pb> element from being a child of <l> to being its previous sibling, and I use
this code:

for element in tree.iter():
	if element.tag == 'pb':
		parent = element.getparent()
		grandparent = parent.getparent()
		position = grandparent.index(parent)
		parent.text = element.tail
		grandparent.insert(position , (element))

Some of it works but it fails to get rid of the pb tail and produces this
output:

<lg>
    <l>WHo doth desire the trump of same, to sound vnto the skies,</l>
    <pb/>Or els who seekes the holy place, where mighty Ioue he lies,
<l>Or els who seekes the holy place, where mighty Ioue he lies,</l>
    <l>He must not by deceitfull mind, nor yet by puissant strength,</l>

</lg>

If there is a with_tail=False solution, I don't know where to stick the
command. 

Martin Mueller
Professor emeritus of English and Classics
Northwestern University
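
In lxml the text after `<pb/>` is stored as the tail of the pb element, and an element carries its tail with it when it is inserted elsewhere, which fits the output shown above. A minimal sketch of one possible fix, assuming Python 3: copy the tail into the line's text, clear the tail, then move the element. The function name `move_pb_before_line` is hypothetical, and the sample text is shortened.

```python
from lxml import etree

def move_pb_before_line(tree):
    # Collect the matches first, so the tree is not modified while iterating.
    for pb in list(tree.iter("pb")):
        line = pb.getparent()
        container = line.getparent()
        # Keep the text that followed <pb/> inside the line ...
        line.text = (line.text or "") + (pb.tail or "")
        # ... and drop it from pb, otherwise it travels along with the element.
        pb.tail = None
        container.insert(container.index(line), pb)

tree = etree.fromstring(
    "<lg>\n    <l>first line,</l>\n    <l><pb/>second line,</l>\n</lg>")
move_pb_before_line(tree)
print(etree.tostring(tree).decode())
```

With `pb.tail` cleared before the move, `<pb/>` ends up directly in front of its former parent `<l>` and the line keeps its text exactly once.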

_________________________________________________________________
