varun bhatnagar | 31 Jul 09:32 2014

How to iterate through nodes of xml through xslt

Hi,

I have two xml files.

File1.xml

<?xml version="1.0" encoding="UTF-8"?>
<InfoTag>
<Procedure attrProc="TestProcA" attrLevel="1">
      <downTime>
        <acceptableDownTime>
          <all/>
        </acceptableDownTime>
        <downTimePeriod time="600000000"/>
      </downTime>
    </Procedure>
    
 <Procedure attrProc="TestProcB" attrLevel="2">
      <downTime>
        <acceptableDownTime>
          <all/>
        </acceptableDownTime>
        <downTimePeriod time="600000000"/>
      </downTime>
    </Procedure>
</InfoTag>


File2.xml

<?xml version="1.0" encoding="UTF-8"?>
<InfoTag>
<Procedure attrProc="TestProcC" attrLevel="3">
      <downTime>
        <acceptableDownTime>
          <all/>
        </acceptableDownTime>
        <downTimePeriod time="600000000"/>
      </downTime>
    </Procedure>
    
 <Procedure attrProc="TestProcD" attrLevel="4">
      <downTime>
        <acceptableDownTime>
          <all/>
        </acceptableDownTime>
        <downTimePeriod time="600000000"/>
      </downTime>
    </Procedure>
</InfoTag>


I am trying to produce an output file that looks like this:

Output.xml

<InfoTag>
<Procedure attrProc="1" attrLevel="### NOT UNIQUE ###">
      <downTime>
        <acceptableDownTime>
          <all/>
        </acceptableDownTime>
        <downTimePeriod time="600000000"/>
      </downTime>
    </Procedure>
 <Procedure attrProc="2" attrLevel="### NOT UNIQUE ###">
      <downTime>
        <acceptableDownTime>
          <all/>
        </acceptableDownTime>
        <downTimePeriod time="600000000"/>
      </downTime>
    </Procedure>
<Procedure attrProc="3" attrLevel="### NOT UNIQUE ###">
      <downTime>
        <acceptableDownTime>
          <all/>
        </acceptableDownTime>
        <downTimePeriod time="600000000"/>
      </downTime>
    </Procedure>
 <Procedure attrProc="4" attrLevel="### NOT UNIQUE ###">
      <downTime>
        <acceptableDownTime>
          <all/>
        </acceptableDownTime>
        <downTimePeriod time="600000000"/>
      </downTime>
    </Procedure>    
</InfoTag>

The number of <Procedure> elements can differ from run to run, so I have to read them from each XML file every time and merge them sequentially.
Can anyone tell me how to achieve this? How can I loop over every <Procedure> element and set its attrProc attribute to a sequential number?

Thanks,
BR,
Varun
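
A minimal sketch of the merge described above, done with a plain Python loop rather than XSLT. It uses the standard library's xml.etree.ElementTree, whose basic element API lxml.etree mirrors. The function name merge_procedures and the shortened sample documents are made up for illustration, and the attrLevel value simply follows the Output.xml sample in the post.

```python
import xml.etree.ElementTree as ET

# Shortened stand-ins for File1.xml and File2.xml (hypothetical sample data).
FILE1 = """<InfoTag>
  <Procedure attrProc="TestProcA" attrLevel="1"><downTime/></Procedure>
  <Procedure attrProc="TestProcB" attrLevel="2"><downTime/></Procedure>
</InfoTag>"""

FILE2 = """<InfoTag>
  <Procedure attrProc="TestProcC" attrLevel="3"><downTime/></Procedure>
  <Procedure attrProc="TestProcD" attrLevel="4"><downTime/></Procedure>
</InfoTag>"""


def merge_procedures(*documents):
    """Collect every top-level <Procedure> and renumber attrProc from 1."""
    merged = ET.Element("InfoTag")
    for text in documents:
        # findall() returns a list, so appending while looping is safe.
        for procedure in ET.fromstring(text).findall("Procedure"):
            merged.append(procedure)
    for number, procedure in enumerate(merged, start=1):
        procedure.set("attrProc", str(number))
        # Value copied from the desired Output.xml in the post.
        procedure.set("attrLevel", "### NOT UNIQUE ###")
    return merged


merged = merge_procedures(FILE1, FILE2)
print(ET.tostring(merged, encoding="unicode"))
```

The loop works for any number of input documents and any number of <Procedure> elements per document, which is the variable part of the question.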
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml <at> lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
Ivan Pozdeev | 30 Jul 21:16 2014

Use the default namespace from nsmap

Hello Lxml,

A quick search confirms that I'm far from the only one bitten badly by the lack of
automatic usage of the default `xmlns'. I've written a wrapper that
auto-prefixes the tags (like many others, I suspect; it's attached).

Since using the default namespace automatically right off the bat would be controversial,
I suggest making it an option to `xpath()' and collecting user feedback for some
time.

We can use my code for that. It doesn't handle the case where the
automatically-added prefix already exists but is written to be robust
otherwise.

Afterwards, we could start convincing the libxml2 team to support NULL for
xmlNs::prefix based on existing user experience.

-- 
Best regards,
 Ivan                          mailto:vano <at> mail.mipt.ru
#parses .csproj files, they have a single `xmlns' at the root tag

import re
import sys
import lxml.etree as lxml
e = lxml.parse(project_file)
r=e.getroot()
#there's no way to override default namespace in XPath (1.0 at least)
# so we have to keep prefixing all tags that don't already have a namespace and referencing the file's root namespace
#it's also assumed the queries given should always yield a single result on a correct file
def wrxpath():
	_nsm={'_':r.nsmap[None]}
	#regexes from http://www.w3.org/TR/xml11/#NT-Name ; except colon is excluded
	# from `_namestartchar' (http://www.w3.org/TR/xml11/#dt-name confirms it
	# can only be used to specify namespaces)
	_namestartchar=u'[A-Z] | _ | [a-z] | [\xC0-\xD6] | [\xD8-\xF6] | [\xF8-\u02FF] | [\u0370-\u037D] | [\u037F-\u1FFF] | [\u200C-\u200D] | [\u2070-\u218F] | [\u2C00-\u2FEF] | [\u3001-\uD7FF] | [\uF900-\uFDCF] | [\uFDF0-\uFFFD] | '+ \
		(u'[\U00010000-\U000EFFFF]' if sys.maxunicode>0xffff else u'[\uD800-\uDFFF]')
	#in 'narrow build' Python, chars beyond 0xffff
	# are represented as UTF-16 surrogate pairs which are
	# incorrect regex range syntax. So match chars
	# that can be in the surrogate pairs instead
	_namechar=_namestartchar + u' | - | \. | [0-9] | \xB7 | [\u0300-\u036F] | [\u203F-\u2040]'
	_name= '(?: ' + _namestartchar+' ) (?: '+_namechar+' )*'
	#from all NameTest's (http://www.w3.org/TR/xpath/#NT-NameTest),

	# we're only interested in prefixing a standalone "*" and UnprefixedName's
	rx=re.compile(r'(^|/)(\*|(?:%s))(?=([^\w:]|$))'%_name,flags=re.VERBOSE)
	del _namestartchar,_namechar,_name
	def _wrxpath(root,xpath):
		xpath=rx.sub(r'\1_:\2',xpath)
		return root.xpath(xpath,namespaces=_nsm)[0]
	return _wrxpath
wrxpath=wrxpath()

#example: retrieve project properties
pg=wrxpath(e,'PropertyGroup[not(@condition)]')
print 'pg=',pg
for name in ('ProductName','ApplicationVersion','MinimumRequiredVersion'):
	t=wrxpath(pg,name)
	print name+"=",t.text
del t,name
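
For comparison, here is a minimal sketch of the conventional alternative to rewriting the query: bind the root's namespace to an explicit prefix and write that prefix in every step. It uses the standard library's xml.etree.ElementTree so that it runs without extra packages; with lxml the URI would come from r.nsmap[None], as in the attachment. The helper default_namespace and the sample project content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened .csproj content with one root-level xmlns.
CSPROJ = """<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <ProductName>Demo</ProductName>
    <ApplicationVersion>1.0.0.0</ApplicationVersion>
  </PropertyGroup>
</Project>"""


def default_namespace(element):
    """Return the namespace URI of the element's own tag, or None."""
    # ElementTree stores qualified tags in Clark notation: "{uri}local".
    if element.tag.startswith("{"):
        return element.tag[1:].partition("}")[0]
    return None


root = ET.fromstring(CSPROJ)
namespaces = {"_": default_namespace(root)}

# Every step of the path has to carry the prefix explicitly.
group = root.find("_:PropertyGroup", namespaces)
product = group.find("_:ProductName", namespaces).text
version = group.find("_:ApplicationVersion", namespaces).text
print(product, version)
```

The explicit prefix is the repetition that the wrapper above tries to automate.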
J. Morris | 29 Jul 09:54 2014

Dash in encoding name inconsistency, lxml 3.3.5, OSX

Hi,

I’ve come across a weird inconsistency with lxml running on OSX 10.9.4.

Python              : sys.version_info(major=2, minor=7, micro=8, releaselevel='final', serial=0)
lxml.etree          : (3, 3, 5, 0)
libxml used         : (2, 9, 1)
libxml compiled     : (2, 9, 1)
libxslt used        : (1, 1, 28)
libxslt compiled    : (1, 1, 28)

This code should reproduce the problem:

from lxml import etree
parser = etree.HTMLParser(encoding='utf32')

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-1-c072399cda15> in <module>()
      1 from lxml import etree
----> 2 parser = etree.HTMLParser(encoding='utf32')

/Users/jmorris/anaconda/envs/py27/lib/python2.7/site-packages/lxml/etree.so in lxml.etree.HTMLParser.__init__ (src/lxml/lxml.etree.c:100669)()

/Users/jmorris/anaconda/envs/py27/lib/python2.7/site-packages/lxml/etree.so in lxml.etree._BaseParser.__init__ (src/lxml/lxml.etree.c:93393)()

LookupError: unknown encoding: 'utf32'

However, the following all work perfectly:

parser = etree.HTMLParser(encoding='utf-32')
parser = etree.HTMLParser(encoding='utf-16')
parser = etree.HTMLParser(encoding='utf16')
parser = etree.HTMLParser(encoding='utf8')
parser = etree.HTMLParser(encoding='utf-8')

For some reason only the utf-32 encoding requires a dash in its encoding string. This happens in python 3.3
as well.

Is this a problem in lxml or is it with one of the underlying libraries?

I know it seems pretty minor, but it’s causing my tests to fail.

Any thoughts?

Thanks,

J.
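
One possible workaround, sketched under the assumption that the failing lookup happens below Python's own codec registry: normalize the alias with the standard library's codecs.lookup() before handing it to the parser. The helper name canonical_encoding is made up for illustration. The post only confirms that 'utf-32', 'utf-16' and 'utf-8' are accepted by HTMLParser, so other canonical names would need checking.

```python
import codecs


def canonical_encoding(name):
    """Map an encoding alias to the canonical name Python registers for it."""
    # codecs.lookup() raises LookupError for names Python itself does not know.
    return codecs.lookup(name).name


for alias in ("utf32", "utf-32", "utf16", "utf8"):
    print(alias, "->", canonical_encoding(alias))
```

The result could then be passed on, for example as etree.HTMLParser(encoding=canonical_encoding('utf32')).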

Fabio Sangiovanni | 28 Jul 15:59 2014

lxml 3.3.5 + static deps on CentOS 6.5

Hi everybody,

I'm trying to build lxml 3.3.5 with static deps on a CentOS 6.5 box, so far without success.
CentOS 6.5 ships with libxml2 2.7.6 and libxslt 1.1.26.
I'd like to get the latest versions of both (libxml2 2.9.1 and libxslt 1.1.28).
After a lot of experiments, I came up with the following way to install lxml:

CFLAGS="$CFLAGS -fPIC -lgcrypt -ldl -lgpg-error -lrt" STATIC_DEPS=true pip2.7 install lxml

-fPIC: seems to be mandatory on x86_64 (build fails otherwise)
-lrt: fixes an issue with old versions of GCC (4.4.7 on CentOS 6.5), that results in 'undefined symbol clock_gettime' on import.
-lgcrypt -ldl -lgpg-error: fixes similar import issues when libgcrypt is installed on the system (it's the output of libgcrypt-config --libs).

But I still get an error on import that I haven't been able to fix:



Python 2.7.7 (default, Jun 16 2014, 15:17:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import html
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/lxml/html/__init__.py", line 42, in <module>
    from lxml import etree
ImportError: /usr/local/lib/python2.7/site-packages/lxml/etree.so: undefined symbol: libiconv
>>>



Is there some safe way to get static deps installed with lxml on a CentOS 6.5 system? Am I on the right track or am I totally missing something?
Should I maybe just use the system libs (their versions seem not to be recommended by the docs...)?

I'm using CPython 2.7.7.

Thanks for your help!

-- 
Fabio Sangiovanni
Robert Onslow | 24 Jul 12:31 2014

XSLT and WSGI problems

Dear All
I see that there is a well-known problem with XSLT and mod_wsgi in Apache.

Just to report that I am using XSLT, uwsgi and nginx, and am also getting a problem.

Is someone working on a solution? I am having to move off lxml completely for the moment.

Robert
Martin Mueller | 22 Jul 18:34 2014

problem with removing elements: bug or feature

I have run into the following problem with removing an element, and I
can't figure out whether it's a bug or whether I'm missing something.

I want to make a change to an element, depending on the next element, and
delete that next element. The change and deletion occur in a function.
The main program looks like this:

if element.getnext() is not None \
				and element.getnext().get('rend') == 'superscript':
	element = processSuperscripts(element, tree, changelog)

The function first spells out the change and then orders the deletion:

parent = element.getnext().getparent()
parent.remove(element.getnext())

This code works the first time, but it doesn't work on successive
occasions. I have used a workaround where I add a flag to the next
element and delete it in a separate loop. Thus my function ends

element.getnext().set('n', 'DELETE')

and the main program adds the loop:

for element in tree.iter(tei + 'w'):
    if element.get('n') == 'DELETE':
        parent = element.getparent()
        parent.remove(element)

This works: all instances of the first element are properly changed and
all instances of the second element are properly deleted.

Is there a way of making the deletion work inside the function without a
second "clean-up" loop?

Martin Mueller
Professor emeritus of English and Classics
Northwestern University
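
The post does not show enough of the main loop to say for certain why the in-function removal stops working after the first time. One common cause is removing siblings while an iterator over the same tree is still running. The sketch below avoids that by taking a snapshot of the parents first and walking each parent's children by index. It uses the standard library's xml.etree.ElementTree, and the function name merge_superscripts, the merge rule (appending the superscript text) and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def merge_superscripts(root):
    """Fold each rend="superscript" element into the sibling before it."""
    # Snapshot the parents so that removals cannot disturb the tree walk.
    for parent in list(root.iter()):
        index = 0
        while index + 1 < len(parent):
            current, following = parent[index], parent[index + 1]
            if following.get("rend") == "superscript":
                # Change the current element, then delete the next one.
                current.text = (current.text or "") + (following.text or "")
                current.tail = (current.tail or "") + (following.tail or "")
                parent.remove(following)
            else:
                index += 1


root = ET.fromstring(
    '<p><w>1</w><w rend="superscript">st</w><c> </c>'
    '<w>2</w><w rend="superscript">nd</w></p>'
)
merge_superscripts(root)
print(ET.tostring(root, encoding="unicode"))
```

Because the index only advances when nothing was removed, the change and the deletion happen in the same pass and no separate clean-up loop is needed.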

varun bhatnagar | 22 Jul 11:07 2014

Removing an element and strip space - XSLT

Hi,

I am experimenting with Python and XSLT. I have an XML file and I want to transform it into another XML file by deleting one of its elements. The XML is pasted below:

<?xml version="1.0" encoding="UTF-8"?>
<testNode>
<nodeInfo>
      <nodePeriod nodeTime="600000000"/>
      <nodeBase base="0" />
    </nodeInfo>
</testNode>


I want to remove the <nodeBase> element, and this is what my XSL file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  
  
<xsl:template match="/testNode/nodeInfo/nodeBase">
</xsl:template>

</xsl:stylesheet>

When I execute it my output looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<testNode>
<nodeInfo>
      <nodePeriod nodeTime="600000000"/>
      
    </nodeInfo>
</testNode>

I want to strip the whitespace left between <nodePeriod> and </nodeInfo>.
Can anyone suggest a way to do that?

Thanks,
BR,
Varun
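
A minimal sketch of one way to do this, assuming lxml is installed: add the standard XSLT 1.0 instruction xsl:strip-space to the stylesheet so that whitespace-only text nodes are dropped from the input before the templates run, and let indent="yes" lay out the result. The blank line in the output above is the whitespace text node that used to sit next to the removed element.

```python
from lxml import etree

SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<testNode>
<nodeInfo>
      <nodePeriod nodeTime="600000000"/>
      <nodeBase base="0" />
    </nodeInfo>
</testNode>"""

STYLESHEET = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes from every input element. -->
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything that is not matched below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- An empty template removes the element. -->
  <xsl:template match="/testNode/nodeInfo/nodeBase"/>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(STYLESHEET))
result = transform(etree.fromstring(SOURCE))
output = str(result)
print(output)
```

Note that strip-space also removes whitespace-only text elsewhere in the document, which may not be wanted for mixed-content XML.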
Martin Mueller | 19 Jul 19:20 2014

finding siblings that are not next or previous

Is there a simple way in lxml to say things like "the third house on the
block," "the next house but one," or "the last house on the block."?  I
understand getnext() and getprevious(), and it's possible to concatenate
those, but it's not very elegant, and I'm not sure how it scales.

I work with TEI documents where <w> elements alternate with <c> elements,
and very often what you do with a given <w> element depends on the
attributes of the next <w> element, which is the "next but one" element.

Martin Mueller
Professor emeritus of English and Classics
Northwestern University
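
A minimal sketch of the three cases in the question, using only list-style indexing on the parent, which both the standard library's xml.etree.ElementTree (used here) and lxml.etree support. The helper nth_following and the sample line are made up for illustration. lxml elements additionally offer itersiblings() and the XPath following-sibling axis through xpath(), which avoid passing the parent around.

```python
import xml.etree.ElementTree as ET


def nth_following(parent, element, tag, n=1):
    """Return the n-th later sibling of element with the given tag, or None."""
    siblings = list(parent)
    position = siblings.index(element)
    matches = [s for s in siblings[position + 1:] if s.tag == tag]
    return matches[n - 1] if len(matches) >= n else None


line = ET.fromstring(
    '<l><w n="1">the</w><c> </c><w n="2">next</w><c> </c><w n="3">house</w></l>'
)

third_child = line[2]   # "the third house on the block"
last_child = line[-1]   # "the last house on the block"

first_word = line[0]
next_word = nth_following(line, first_word, "w")           # skips the <c>
word_after_that = nth_following(line, first_word, "w", 2)  # two <w> further on
print(next_word.get("n"), word_after_that.get("n"))
```

Filtering by tag is what makes "the next <w>" independent of how many <c> elements sit in between.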

Arne Neumann | 16 Jul 14:17 2014

How to set xml:base attribute with ElementMaker?

Dear all,

I have to generate XML files that contain snippets like this one,
but I can't find a way to produce a "xml:base" attribute:

<markList xmlns:xlink="http://www.w3.org/1999/xlink" type="tok" 
xml:base="maz-1423.text.xml">
	<mark id="sTok1" xlink:href="#xpointer(string-range(//body,'',1,3))" />
	<mark id="sTok2" xlink:href="#xpointer(string-range(//body,'',5,10))" />
</markList>

I tried to set up the namespaces, but instead of "xml:base" I only
get "ns0:base".

NSMAP={None: 'xml',
        'xlink': 'http://www.w3.org/1999/xlink',
        'xml': 'xml'}

E = ElementMaker(nsmap=NSMAP)
etree.tostring(E("markList", {'type': 'tok', '{%s}base' % NSMAP['xml']: 
'maz-1423.text.xml'}))

'<markList xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="xml" xmlns:ns0="xml" ns0:base="maz-1423.text.xml" type="tok"/>'

I'd appreciate any help with this.

Best regards,
Arne Neumann
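
The "xml" prefix is reserved by the XML Namespaces specification and is always bound to the URI http://www.w3.org/XML/1998/namespace, so the attribute key has to use that URI rather than the string 'xml'. The sketch below shows this with the standard library's xml.etree.ElementTree so that it runs without extra packages. The same Clark-notation key is expected to work in the ElementMaker call from the post, but that is an assumption that has not been run here.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # reserved for the xml: prefix
XLINK_NS = "http://www.w3.org/1999/xlink"

# Ask the serializer to use "xlink" instead of a generated ns0 prefix.
ET.register_namespace("xlink", XLINK_NS)

mark_list = ET.Element(
    "markList",
    {"type": "tok", "{%s}base" % XML_NS: "maz-1423.text.xml"},
)
ET.SubElement(
    mark_list,
    "mark",
    {"id": "sTok1", "{%s}href" % XLINK_NS: "#xpointer(string-range(//body,'',1,3))"},
)

serialized = ET.tostring(mark_list, encoding="unicode")
print(serialized)
```

No xmlns:xml declaration is needed, because the xml prefix is predefined.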
Elhadi Falah | 16 Jul 11:23 2014

Apache2 crashes with segmentation fault

Hello,

We are using lxml in several of our applications with Python 2.6 and from time to time, the application stops responding after a segmentation fault error ( [notice] child pid 10544 exit signal Segmentation fault (11)), and this kind of backtrace:

Jul 1 15:24:48 server1 httpd: *** glibc detected *** /usr/sbin/apache2: munmap_chunk(): invalid pointer: 0x00007f6468bf2c00 ***

Jul 1 15:24:48 server1 httpd: ======= Backtrace: =========

Jul 1 15:24:48 server1 httpd: /lib/libc.so.6(+0x78bf6)[0x7f64767ecbf6]

Jul 1 15:24:48 server1 httpd: /usr/lib/libxml2.so.2(xmlCopyError+0xd1)[0x7f6473311801]

Jul 1 15:24:48 server1 httpd: /usr/lib/libxml2.so.2(__xmlRaiseError+0x30b)[0x7f6473312ecb]

Jul 1 15:24:48 server1 httpd: /usr/lib/libxml2.so.2(+0x393e5)[0x7f64733173e5]

Jul 1 15:24:48 server1 httpd: /usr/lib/libxml2.so.2(xmlParseDocument+0x2dc)[0x7f647332e5cc]

Jul 1 15:24:48 server1 httpd: /usr/lib/libxml2.so.2(+0x50895)[0x7f647332e895]

Jul 1 15:24:48 server1 httpd: /usr/lib/python2.6/dist-packages/lxml/etree.so(+0x8cbc2)[0x7f645691cbc2]

Jul 1 15:24:48 server1 httpd: /usr/lib/python2.6/dist-packages/lxml/etree.so(+0x2c7cf)[0x7f64568bc7cf]


After trying several versions of lxml we are still facing the issue. I've checked the system memory consumption, but everything looks fine to me: plenty of memory is available, and I don't see any process consuming an abnormal amount.

The issue is reproducible every time we run an Apache reload command (apache2 reload or apache2 graceful). As a workaround, we run apache2 restart.

We've followed the recommendations described at these two links, but we're still facing the issue.

http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Python_Simplified_GIL_State_API

http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Multiple_Python_Sub_Interpreters


Library version:

   print("%-20s: %s" % ('Python',           sys.version_info))

Python              : (2, 6, 5, 'final', 0)

   print("%-20s: %s" % ('lxml.etree',       etree.LXML_VERSION))

lxml.etree          : (2, 3, 5, 0)

   print("%-20s: %s" % ('libxml used',      etree.LIBXML_VERSION))

libxml used         : (2, 7, 6)

   print("%-20s: %s" % ('libxml compiled',  etree.LIBXML_COMPILED_VERSION))

libxml compiled     : (2, 7, 6)

   print("%-20s: %s" % ('libxslt used',     etree.LIBXSLT_VERSION))

libxslt used        : (1, 1, 26)

   print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))

libxslt compiled    : (1, 1, 26)

Apache 2.2.14


Here is the source code that triggers the issue:

ID_TRANSFORM = os.environ['APPLICATION_WORKING_PATH']+'/statics/xsl/list.xsl'

styledoc = lxml.etree.parse(ID_TRANSFORM)

transform = lxml.etree.XSLT(styledoc)

doc_root = lxml.etree.XML(str(atom))


Could you help us with this case?


Regards


_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml <at> lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
Sam Bull | 12 Jul 17:15 2014

Can I change maxvars?

I'm trying to process some XML files, and a few of them are several
thousand lines long, and with the moderately complicated XSL I'm using,
I seem to be hitting recursion limits.

I'm currently getting this message:
        lxml.etree.XSLTApplyError: xsltApplyXSLTTemplate: A potential
        infinite template recursion was detected.
        You can adjust maxTemplateVars (--maxvars) in order to raise the
        maximum number of variables/params (currently set to 15000).

It says I can adjust the value, but doesn't explain how, nor is this
value mentioned anywhere in the documentation.

I've just had to change the maxdepth, which can be done with
XSLT.set_global_max_depth(), but there doesn't appear to be an
equivalent for maxvars. How can I change this value?
