Brad Smith | 16 Apr 2005 06:13

Is there an easy way to get a simple dict of rpm attributes from xml metadata?

Hello everyone,

I'm currently working to add support for xml metadata into Fedora
Tracker, but am at a point where I'd like some advice:

The plan was to have the tracker's back-end script download the xml
files for a repository, then use code from Seth's importmetadata.py
(http://www.linux.duke.edu/~skvidal/metadata/) to load up a packageSack
with all of the metadata in it. I could convert each set of rpm data in
the sack into an instance of tracker's internal repo.rpmInfo class, since
I already have code to handle putting those into the database.

However, in examining packageSack.py, it appears that a lot of
information that tracker stores is not kept when a
packageObject.RpmXMLPackageObject instance is added to the sack. For
example, packageSack.addPackage() does not appear to store the
description field or any of the scripts, all of which Tracker would
need. 

So now I'm thinking that maybe the best approach would be to either
derive a new class from packageSack and override addPackage() with a
method that converts directly to repo.rpmInfo instances or just yank the
code for addPackage() and load*MD() from RpmXMLPackageObject into a new
class, since those three methods are really all I need for my purposes
(I think). 

Does anyone have suggestions for an alternative that I may be missing?
All I know about what tools have already been written is what I've
gleaned from the code I can find, so maybe there's something more suited
to my purposes. Something that takes an xml file and returns a dict akin
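to what rpm.headerLoad() returns would be ideal. Does such a thing
already exist?

Thanks, and sorry if any of this sounds dumb. =;)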

seth vidal | 16 Apr 2005 23:21

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?

> The plan was to have the tracker's back-end script download the xml
> files for a repository, then use code from Seth's importmetadata.py
> (http://www.linux.duke.edu/~skvidal/metadata/) to load up a packageSack
> with all of the metadata in it. I could convert each set of rpm data in
> the sack into an instance of tracker's internal repo.rpmInfo class, since
> I already have code to handle putting those into the database.
> 

sadly the metadata import routines there are a bit out of date.

> Does anyone have suggestions for an alternative that I may be missing?
> All I know about what tools have already been written is what I've
> gleaned from the code I can find, so maybe there's something more suited
> to my purposes. Something that takes an xml file and returns a dict akin
> to what rpm.headerLoad() returns would be ideal. Does such a thing
> already exist? 
> 
> Thanks, and sorry if any of this sounds dumb. =;)

Well it depends on what you're trying to do but you could always just:
import yum
foo = yum.YumBase()
foo.doConfigSetup()
foo.doRepoSetup()
foo.doSackSetup()

foo.pkgSack is a package Sack object and it's filled with PackageObject
instances.

then you can just do:

for pkg in foo.pkgSack:
    print pkg.returnSimple('description')
    print dir(pkg)

Hans-Peter Jansen | 17 Apr 2005 00:10

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?

On Saturday 16 April 2005 23:21, seth vidal wrote:
>
> Well it depends on what you're trying to do but you could always just:
> import yum
> foo = yum.YumBase()
> foo.doConfigSetup()
> foo.doRepoSetup()
> foo.doSackSetup()
>
> foo.pkgSack is a package Sack object and it's filled with PackageObject
> instances.
>
> then you can just do:
>
> for pkg in foo.pkgSack:
>     print pkg.returnSimple('description')
>     print dir(pkg)

Unfortunately, only root can do that with yum 2.3.2 here; as a regular
user, I get:

Python 2.4 (#1, Mar 22 2005, 21:42:42) 
[GCC 3.3.5 20050117 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import yum
>>> foo = yum.YumBase()
>>> foo.doConfigSetup()
>>> foo.doRepoSetup()
Baseurl(s) for repo: ['file:/net/xxx/opt/suse/i386/9.3/cd1']
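[...]

Apart from this, yum does really nice here on SuSE 9.3 ;)

No excessive memory hogging any more like in 2.0 times, and it feels
fast, man..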

seth vidal | 17 Apr 2005 00:19

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?

> Unfortunately, only root can do that with yum 2.3.2 here, as user,
> I get:

after:
foo.doConfigSetup()
add:
foo.conf.setConfigOption('cache', 1)

> Apart from this, yum does really nice here on SuSE 9.3 ;)
> 
> No excessive memory hogging any more like in 2.0 times, and it feels
> fast, man..

good, good.

-sv

Hans-Peter Jansen | 17 Apr 2005 16:11

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?

On Sunday 17 April 2005 00:19, seth vidal wrote:
> > Unfortunately, only root can do that with yum 2.3.2 here, as user,
> > I get:
>
> after:
> foo.doConfigSetup()
> add:
> foo.conf.setConfigOption('cache', 1)

That did the trick, thanks.

> > Apart from this, yum does really nice here on SuSE 9.3 ;)
> >
> > No excessive memory hogging any more like in 2.0 times, and it feels
> > fast, man..
>
> good, good.

Would you accept a patch to genpkgmetadata.py for an --update option that
checks timestamps before regenerating the metadata?

It still has to be written, but it will be a nice job for the plane ride on
my holiday. I'll do it anyway, but knowing it _may_ be accepted upstream is a
major incentive to do it well. ;)

Pete

seth vidal | 17 Apr 2005 16:17

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?


> That did the trick, thanks.
> 
> > > Apart from this, yum does really nice here on SuSE 9.3 ;)
> > >
> > > No excessive memory hogging any more like in 2.0 times, and it feels
> > > fast, man..
> >
> > good, good.
> 
> Would you accept a patch to genpkgmetadata.py for an --update option, checking 
> timestamps before shooting? 
> 
> It still has to be done, but will be a nice job while on plane in my holiday. 
> Will do it anyway, but knowing it _may_ be accepted upstream is a major call 
> to do it well. ;)
> 

if you can make the --update option not consume a huge amount of memory,
yes, I will accept it. Every mechanism I've been able to come up with
for --update either involved:
 1. a big cache dir that stores all this information broken apart
(bleah)
 2. or reading in all the current xml and storing it in memory - which
is not-small.
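
The memory concern in option 2 can be illustrated with a streaming parse. The following is only a sketch, written for current Python rather than the Python 2 used elsewhere in this thread. It assumes the usual primary.xml layout (a <package> element with <location href>, <time file> and <size package> children in the http://linux.duke.edu/metadata/common namespace). The function name index_primary and the sample data are made up for illustration and are not part of yum or genpkgmetadata.py. It keeps one small tuple per package instead of the whole document; whether that is small enough for a large repository has not been measured here.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the "common" elements in primary.xml.
COMMON_NS = "{http://linux.duke.edu/metadata/common}"


def index_primary(source):
    """Map each package's location href to (file mtime, package size).

    `source` is a filename or an open binary file object. Only the small
    tuples are kept; each <package> element is emptied once it is read.
    """
    index = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != COMMON_NS + "package":
            continue
        href = elem.find(COMMON_NS + "location").get("href")
        mtime = int(elem.find(COMMON_NS + "time").get("file"))
        size = int(elem.find(COMMON_NS + "size").get("package"))
        index[href] = (mtime, size)
        elem.clear()  # drop the children and text of the finished element
    return index


# Hand-written sample data, not taken from a real repository.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
  <package type="rpm">
    <name>foo</name>
    <time file="1113600000" build="1113500000"/>
    <size package="2048" installed="4096" archive="4200"/>
    <location href="foo-1.0-1.noarch.rpm"/>
  </package>
  <package type="rpm">
    <name>bar</name>
    <time file="1113700000" build="1113650000"/>
    <size package="1024" installed="3000" archive="3100"/>
    <location href="bar-2.0-3.i386.rpm"/>
  </package>
</metadata>
"""

for href, (mtime, size) in sorted(index_primary(io.BytesIO(SAMPLE)).items()):
    print(href, mtime, size)
```

An --update pass could compare such an index against the files on disk and re-read only the rpms whose time or size differ, but that step is not shown here.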

-sv

Hans-Peter Jansen | 17 Apr 2005 18:50

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?

On Sunday 17 April 2005 16:17, seth vidal wrote:
> >
> > Would you accept a patch to genpkgmetadata.py for an --update option,
> > checking timestamps before shooting?
> >
> > It still has to be done, but will be a nice job while on plane in my
> > holiday. Will do it anyway, but knowing it _may_ be accepted upstream is
> > a major call to do it well. ;)
>
> if you can make the --update option not consume a huge amount of memory,
> yes, I will accept it. Every mechanism I've been able to come up with
> for --update either involved:
>  1. a big cache dir that stores all this information broken apart
> (bleah)
>  2. or reading in all the current xml and storing it in memory - which
> is not-small.

How about just checking timestamps (those in repomd.xml and that of
repomd.xml itself) and making sure that no rpm file is newer than these;
otherwise, just recreate the metadata? Sure, it wouldn't be 100% waterproof,
but it would fit at least my needs!

Yes, this behavior should be properly documented.
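
A minimal sketch of such a check, in current Python rather than the Python 2 of this thread. The function name metadata_is_current is hypothetical and nothing like it is claimed to exist in genpkgmetadata.py. It compares only the modification time of repodata/repomd.xml against the rpm files and does not look at the timestamps recorded inside repomd.xml.

```python
import os


def metadata_is_current(repo_dir):
    """Return True if repodata/repomd.xml exists and no .rpm file
    below repo_dir has a newer modification time than it."""
    repomd = os.path.join(repo_dir, "repodata", "repomd.xml")
    if not os.path.exists(repomd):
        return False  # no metadata yet, so it has to be generated
    repomd_mtime = os.path.getmtime(repomd)
    for dirpath, _dirnames, filenames in os.walk(repo_dir):
        for name in filenames:
            if not name.endswith(".rpm"):
                continue
            if os.path.getmtime(os.path.join(dirpath, name)) > repomd_mtime:
                return False  # at least one package is newer than the metadata
    return True
```

A caller would skip regeneration when metadata_is_current(repo_dir) is true. The check is not waterproof: a package that was deleted, or copied in with an old modification time preserved, goes unnoticed.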

Pete

Brad Smith | 18 Apr 2005 03:27

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?

On Sat, 2005-04-16 at 14:21, seth vidal wrote:
> sadly the metadata import routines there are a bit out of date.

Meaning I shouldn't expect any of that to work with an actual xml-ified
repo?

Am I right in thinking that the code you provided would expect to
receive urls from a yum.conf? I need to be able to pass a url (or file
descriptor) for primary.xml or repomd.xml and get back a hash. But hey,
if this is as close as I'm going to get, then I can import and tweak as
necessary. I would have messed around with the code and answered some of
these questions myself, but I'm having a bit of trouble getting it to
run...
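
For illustration, here is a rough sketch of the "file in, hash out" idea using only the Python standard library instead of yum. It is written for current Python, not the Python 2 used in this thread, and it assumes the usual primary.xml layout in the http://linux.duke.edu/metadata/common namespace. The function name packages_from_primary, the choice of dict keys and the sample data are all made up for the example. It reads only a handful of fields and is not a replacement for what rpm.headerLoad() returns.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the "common" elements in primary.xml.
COMMON_NS = "{http://linux.duke.edu/metadata/common}"


def packages_from_primary(source):
    """Return one plain dict per <package> element in primary.xml.

    `source` may be a filename or an open binary file object.
    """
    packages = []
    root = ET.parse(source).getroot()
    for pkg in root.findall(COMMON_NS + "package"):
        version = pkg.find(COMMON_NS + "version")
        location = pkg.find(COMMON_NS + "location")
        # Fall back to empty attribute dicts if an element is missing.
        ver_attrs = version.attrib if version is not None else {}
        loc_attrs = location.attrib if location is not None else {}
        packages.append({
            "name": pkg.findtext(COMMON_NS + "name"),
            "arch": pkg.findtext(COMMON_NS + "arch"),
            "epoch": ver_attrs.get("epoch"),
            "version": ver_attrs.get("ver"),
            "release": ver_attrs.get("rel"),
            "summary": pkg.findtext(COMMON_NS + "summary"),
            "description": pkg.findtext(COMMON_NS + "description"),
            "location": loc_attrs.get("href"),
        })
    return packages


# Hand-written sample data, not taken from a real repository.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" packages="1">
  <package type="rpm">
    <name>example</name>
    <arch>noarch</arch>
    <version epoch="0" ver="1.0" rel="1"/>
    <summary>An example package</summary>
    <description>Longer text describing the example package.</description>
    <location href="example-1.0-1.noarch.rpm"/>
  </package>
</metadata>
"""

for info in packages_from_primary(io.BytesIO(SAMPLE)):
    print(info["name"], info["version"], info["description"])
```

Repositories normally ship the file compressed as primary.xml.gz, in which case the object returned by gzip.open() can be passed as `source`.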

> Well it depends on what you're trying to do but you could always just:

<code snipped>

What version of Yum should I be using for this? The version that comes
with FC2 doesn't seem to include a yum module, so I grabbed the 2.1.11
source rpm and copied the modules from it into site-packages. However,
the snippet you provided still didn't work (see below). The problems
persisted when I forced a proper installation of the binary rpm, just in
case I'd missed something (yes, I know this probably breaks package
installation, but for now I just wanted the modules in place). Here's
what happens:

Python 2.3.3 (#1, May  7 2004, 10:31:40)
[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
[...]

seth vidal | 19 Apr 2005 00:12

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?

On Sun, 2005-04-17 at 18:27 -0700, Brad Smith wrote:
> On Sat, 2005-04-16 at 14:21, seth vidal wrote:
> > sadly the metadata import routines there are a bit out of date.
> 
> Meaning I shouldn't expect any of that to work with an actual xml-ified
> repo?

ah - try this out - go look at repoview - it's in fedora extras - check
out the code he uses to produce his output - i think that's what you'll
want to do.

-sv

seth vidal | 24 Apr 2005 18:38

Re: Is there an easy way to get a simple dict of rpm attributes from xml metadata?

> How about just checking timestamps (those in repomd.xml and repomd.xml itself, 
> and making sure, that no rpm file is newer than this, otherwise just recreate 
> them? Sure, wouldn't be 100% waterproof, but would fit at least my needs!
> 
> Yes, this behavior should be correctly documented..
> 

So you just want a quick way to say 'if no changes, don't re-index'. I
thought you were asking for a quick way to really only index the
changed information, instead of re-reading all of it.

The former isn't a big deal - send me a patch and I'll probably pull it
in.

The latter will be difficult to make fast.

-sv
