Henrik Zagerholm | 2 May 16:39 2011

Declare doctype when parsing fragment

Hello list,

I'm using lxml to parse and rebuild some XHTML/HTML pages. My problem is 
that the parser strips out all HTML entities if I do not specify a doctype.

I now know that I can use the load_dtd option when parsing a full document, 
but how do I do that with the fragment parser?

How can I specify a DTD when using lxml.html.fragment_fromstring()?

Cheers,
Henrik
Markus Feldmann | 4 May 13:49 2011

how to get attribute-values in a quick way

Hi All,

I am new to this language. I tried a bit but didn't get it to work.

Here is my code:
     def oeffneDatei(self, pfad = "daten.xml"):
         """Beschreibung"""
         try:
             datei = file(pfad,  'r')
             domDoc = etree.parse(datei)
             datei.close()
             return domDoc
         except:
             print 'File '+pfad+' could not be found. A new one has been created!'
             self.erstelleDatei()

     def ladeEinstellungen(self, bezeichnung="Einstellung1"):
         """Beschreibung"""
         for attribute in self.domDoc.findall("gruppe").items():
             print attribute
#            return attribute, attribute.get('client'), attribute.get('server')

My XML file contains the needed information:
<?xml version="1.0" ?>
<gruppe bezeichnung="Einstellung1" server="feld-server" 
client="feld-bertlap">
     <eintrag bezeichnung="Eigene Dateien">
         <benutzer user1="markus" user2="maria" user3="bernard" 
[...]

Markus Feldmann | 4 May 14:28 2011

Re: how to get attribute-values in a quick way

First, I only want to get the attribute values from the element 
"gruppe". How can I do that quickly?
Bob Kline | 4 May 14:48 2011

Re: how to get attribute-values in a quick way

On 05/04/2011 08:28 AM, Markus Feldmann wrote:
> First, I only want to get the attribute values from the element
> "gruppe". How can I do that quickly?
>

How about something like this?

tree = etree.parse("daten.xml")
for g in tree.findall("*//gruppe"):
     for name, value in g.attrib.iteritems():
         print "%s=%s" % (repr(name), repr(value))

Best regards,
Bob Kline
jholg | 4 May 14:54 2011

Re: how to get attribute-values in a quick way

Hi,

Since <gruppe/> is the root element in your example, you can get at its attributes directly like this:

>>> doc = etree.parse(f)
>>> doc.getroot().attrib
{'bezeichnung': 'Einstellung1', 'client': 'feld-bertlap', 'server': 'feld-server'}
>>>

Note that findall() works on sub-elements, so you won't get the element itself on which you call findall():

>>> print doc.findall.__doc__
findall(self, path)

        Finds all elements matching the ElementPath expression.  Same as
        getroot().findall(path).

>>> print doc.getroot().findall.__doc__
findall(self, path)

        Finds all matching subelements, by tag name or path.

>>>
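
To make the point about findall() concrete, here is a minimal sketch. It assumes a trimmed-down stand-in for daten.xml (only the attributes quoted in this thread) and falls back to the standard library's xml.etree.ElementTree if lxml is not installed, since both expose the calls used here.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same calls

# Trimmed-down stand-in for daten.xml; the real file has more content.
XML = (
    '<gruppe bezeichnung="Einstellung1" server="feld-server" client="feld-bertlap">'
    '<eintrag bezeichnung="Eigene Dateien"/>'
    '</gruppe>'
)

root = etree.fromstring(XML)

# findall() only searches below the element it is called on, so the
# root <gruppe> itself never shows up in the result.
print(root.findall("gruppe"))        # []
print(len(root.findall("eintrag")))  # 1

# The root's own attributes are available directly through .attrib.
attrs = dict(root.attrib)
print("%s / %s" % (attrs["server"], attrs["client"]))
```

This is why the original loop over findall("gruppe") finds nothing when <gruppe> is the document's root element.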

I tend to use XPath rather than lxml's ElementPath, so you might do something like:

>>> doc.xpath('//gruppe/@*')
['Einstellung1', 'feld-server', 'feld-bertlap']
>>> for attrResult in doc.xpath('//gruppe/@*'):
...     print attrResult.attrname
[...]

Markus Feldmann | 4 May 17:32 2011

Re: how to get attribute-values in a quick way

On 04.05.2011 14:54, jholg@gmx.de wrote:
>
> I tend to use XPath rather than lxml's ElementPath, so you might do something like:
>
> >>> doc.xpath('//gruppe/@*')
> ['Einstellung1', 'feld-server', 'feld-bertlap']
> >>> for attrResult in doc.xpath('//gruppe/@*'):
> ...     print attrResult.attrname

Thanks,

Can you explain the syntax of the symbols in '//gruppe/@*'? Or where is it 
documented on the homepage http://lxml.de/index.html?

regards Markus
Jason Viers | 4 May 17:37 2011

Re: how to get attribute-values in a quick way

On 5/4/2011 11:32, Markus Feldmann wrote:
> Can you explain the syntax of the symbols in '//gruppe/@*'? Or where is it
> documented on the homepage http://lxml.de/index.html?

XPath is a W3C standard, so you won't find it specifically documented on 
lxml's site.  There are some good references and tutorials throughout the 
web, though.
http://www.w3schools.com/xpath/xpath_syntax.asp
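
As a rough reading of '//gruppe/@*': '//gruppe' selects every <gruppe> element anywhere in the document, and '/@*' then selects all attributes of each of those elements. The sketch below spells out the same two steps with plain element calls. The helper name and the sample XML are assumptions made for illustration, and the xpath() call only runs when lxml is installed.

```python
try:
    from lxml import etree  # provides xpath()
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, no xpath()

# Stand-in for daten.xml, reduced to the attributes quoted in this thread.
XML = (
    '<gruppe bezeichnung="Einstellung1" server="feld-server" client="feld-bertlap">'
    '<eintrag bezeichnung="Eigene Dateien"/>'
    '</gruppe>'
)


def all_gruppe_attribute_values(root):
    """Hypothetical helper: the long-hand equivalent of '//gruppe/@*'."""
    values = []
    # '//gruppe': every <gruppe> element at any depth, including the root.
    for gruppe in root.iter("gruppe"):
        # '/@*': every attribute of that element, whatever its name.
        values.extend(gruppe.attrib.values())
    return values


root = etree.fromstring(XML)
print(all_gruppe_attribute_values(root))

# With lxml, the XPath expression itself does the same job in one call.
if hasattr(root, "xpath"):
    print(root.xpath("//gruppe/@*"))
```

The '@' prefix is what distinguishes an attribute from a child element in XPath, and '*' is the wildcard for "any name".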

Jason
Markus Feldmann | 4 May 17:38 2011

Re: how to get attribute-values in a quick way

I forgot to write that I only need the attributes of the "gruppe" 
element that contains the attribute "Einstellung1". There can be 
more than one "gruppe" element, so the next "gruppe" element would have 
the attribute "Einstellung2" and so on.

How can I filter for this quickly and integrate it into my "for loop"?

regards Markus
Terry Brown | 4 May 17:40 2011

Re: how to get attribute-values in a quick way

On Wed, 04 May 2011 17:32:47 +0200
Markus Feldmann <feldmann_markus@gmx.de> wrote:

> Can you explain the syntax of the symbols in '//gruppe/@*'? Or where is it 
> documented on the homepage http://lxml.de/index.html?

http://www.w3schools.com/xpath/default.asp
http://www.w3.org/TR/xpath/
http://en.wikipedia.org/wiki/XPath

Cheers -Terry
Jason Viers | 4 May 17:47 2011

Re: how to get attribute-values in a quick way

On 5/4/2011 11:38, Markus Feldmann wrote:
> I forgot to write that I only need the attributes of the "gruppe"
> element that contains the attribute "Einstellung1". There can be
> more than one "gruppe" element, so the next "gruppe" element would have
> the attribute "Einstellung2" and so on.
> How can I filter for this quickly and integrate it into my "for loop"?

doc.xpath('//gruppe[@Einstellung1]/@*')

This should select all the gruppe elements that have an attribute named 
"Einstellung1", and then select all of those elements' attributes.

Jason
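
Note that the expression above tests for an attribute *named* "Einstellung1". In the sample file quoted earlier, "Einstellung1" is the *value* of the bezeichnung attribute. If the goal is to match on that value, a predicate such as [@bezeichnung='Einstellung1'] may be closer to what is needed (in XPath: //gruppe[@bezeichnung='Einstellung1']/@*). The sketch below is only an illustration: the <einstellungen> wrapper element, the second group's values, and the function name are assumptions, not taken from the thread.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same findall() syntax

# Hypothetical file layout with several <gruppe> elements under one root.
# The wrapper element and the second group's values are made up.
XML = (
    '<einstellungen>'
    '<gruppe bezeichnung="Einstellung1" server="feld-server" client="feld-bertlap"/>'
    '<gruppe bezeichnung="Einstellung2" server="other-server" client="other-client"/>'
    '</einstellungen>'
)


def lade_einstellungen(root, bezeichnung="Einstellung1"):
    """Return the attributes of the first matching <gruppe>, or None."""
    # The predicate filters on the attribute's value, not on its name.
    # The value is pasted into the path as-is, so it must not contain quotes.
    path = ".//gruppe[@bezeichnung='%s']" % bezeichnung
    for gruppe in root.findall(path):
        return dict(gruppe.attrib)
    return None


root = etree.fromstring(XML)
einstellung = lade_einstellungen(root)
print("%s / %s" % (einstellung["client"], einstellung["server"]))
```

Because the predicate does the filtering, the loop body no longer needs to compare attribute values by hand.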
