1 Mar 2008 09:33
Re: Setting URL from lxml.html.fromstring, etc
Stefan Behnel <stefan_ml <at> behnel.de>
2008-03-01 08:33:56 GMT
Hi Ian,

Ian Bicking wrote:
> OK. Then would the html base attribute just be a read-only property
> then? Like:
>
>     def base(self):
>         return super(HtmlElement, self).base
>     base = property(base)
>
> I'm not terribly concerned about whether it is read-only or not. It's a
> little fuzzy, since HTML is parsed to the lxml representation, and
> though it will probably be serialized to HTML again (if it is serialized
> at all) and HTML doesn't have anything like xml:base, the lxml
> representation is not itself exactly HTML. And if you serialize to
> XHTML, then xml:base is available.

Hmm, true. However, if you use lxml.html, you're likely to stay in the HTML
world, so I would prefer making this read-only. If you really want an
xml:base attribute, you can set it yourself, and if you really want to set
the document URL, it's better to be explicit than setting it through an
Element.

> Also translating HTML to XHTML is kind of an outstanding issue for
> lxml.html, and it seems reasonable to me that XHTML could be parsed into
> the same classes as HTML. The only real caveat there is that XHTML uses
> different (namespaced) tag names. If you remove the tag names, then the
> classes and the lookup applies just fine. (Presumably the lookup could
> be changed to support XHTML fairly easily.)

That's a different topic, so I think we should discuss that in a separate
thread.
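
For illustration, here is a minimal sketch of the two points discussed above: a getter-only property that shadows a writable one in a parent class, and setting xml:base by hand. It uses only the standard library. BaseElement is a hypothetical stand-in for the real parent class, and xml.etree.ElementTree stands in for lxml.etree, so none of this is the actual lxml.html implementation.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


class BaseElement:
    """Hypothetical stand-in for the parent class; 'base' is read/write here."""

    def __init__(self, url=None):
        self._url = url

    @property
    def base(self):
        return self._url

    @base.setter
    def base(self, value):
        self._url = value


class HtmlElement(BaseElement):
    # Same shape as the quoted snippet: only a getter is defined, so the
    # property becomes read-only on this subclass.
    def base(self):
        return super(HtmlElement, self).base
    base = property(base)


el = HtmlElement("http://example.com/page.html")
try:
    el.base = "http://example.org/"   # no setter on HtmlElement.base
    assigned = True
except AttributeError:
    assigned = False

# Being explicit instead: set xml:base yourself as a plain attribute.
div = ET.Element("div")
div.set(XML_BASE, "http://example.com/")
serialized = ET.tostring(div, encoding="unicode")
```

With this shape, reading el.base still goes through the parent's getter, while assignment raises AttributeError, and the xml:base attribute only appears where it was set on purpose.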