4 Jan 2007 19:45
dumpMetadata utf-8 question
Jay Soffian <jay-rpm <at> soffian.org>
2007-01-04 18:45:37 GMT
In dealing with some i18n RPMs recently, I noticed that dumpMetadata can generate XML that yum cannot parse on the receiving end, because in some cases it feeds libxml2 strings that are not UTF-8 encoded. The reason for this is two-fold:

1) The RPM in question (constructed for QA purposes) was encoded using the euc_jp encoding.

2) dumpMetadata does not pass all the strings it extracts from an RPM through utf8String. (In particular, the name of the RPM as well as the name portion of each of the PRCO entries.)

I've modified dumpMetadata to:

a) pass all strings through utf8String; and

b) allow you to optionally specify the encoding that was in use when the RPM was constructed.

This change causes problems downstream with yum when attempting to install the RPM: yum compares the UTF-8 encoded RPM name from the metadata to the name as represented by the raw bytes in the downloaded RPM header, the two are not equal, and so yum rejects the header. But at least the XML is valid, which allows the other RPMs in the repo to be installed.

Anyway, I'm wondering whether it was an intentional design decision not to pass all the bits handed to libxml2 through utf8String first.

Thanks,

j.
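
For illustration, here is a minimal sketch of the idea behind (a) and (b), written for Python 3. It is not the actual utf8String code from dumpMetadata: the function name to_utf8_text, the source_encoding parameter, the Latin-1 fallback, and the sample package name are all assumptions made up for this example.

```python
def to_utf8_text(raw, source_encoding=None):
    """Return `raw` as text that can safely be serialised as UTF-8.

    Hypothetical helper: `source_encoding` stands in for the optional
    "encoding the RPM was built with" setting described in (b).
    """
    if isinstance(raw, str):
        return raw
    # Try the caller-supplied encoding first, then UTF-8.
    candidates = [source_encoding] if source_encoding else []
    candidates.append("utf-8")
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Assumed last resort: Latin-1 maps every byte to a code point,
    # so this never raises (the result may be mojibake, but the XML
    # built from it is still well-formed).
    return raw.decode("latin-1")


# A made-up package name as it might sit in an euc_jp-encoded header.
name_raw = "日本語-pkg".encode("euc_jp")
name_txt = to_utf8_text(name_raw, "euc_jp")

# What would end up in the XML metadata versus the raw header bytes.
xml_bytes = name_txt.encode("utf-8")
print(xml_bytes != name_raw)
```

The final comparison shows the downstream mismatch described above: once the name is normalised to UTF-8 for the XML, its bytes no longer equal the raw bytes stored in the header.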