seth vidal | 5 Aug 03:43 2003

metadata list and discussion over tapas tonight

Hi all,
 If you're getting this you've been force-subscribed to the rpm-metadata
format list. A list to discuss the data format for repositories of rpms.
If you don't know what I'm talking about then I did a really bad cut and
paste of your address :)

Tonight Adrian, Daniel, James, Jeff and I had dinner at a tapas place in
Durham and tried to figure out some general ideas of what we were
looking for, based on the email I've been slowly collecting over the past
couple of weeks.

I took some notes and have some summaries:
We ended up coming close to something like the following - essentially
Adrian's suggestion: multiple XML files for the data, describing the
following things:

distribution information: 
   which repositories/channels exist where and when they were last
updated, etc - useful for giving a place to look for the other files and
seeing what's changed, quickly.

package information not including file lists: 
   all the misc information considered important per package (to be
decided in terms of entries but it must include byte-ranges for the
headers per package). Should include almost complete file lists for
certain packages that can be 'whitelisted' in b/c they end up being used
in a file dependency a large percentage of the time.

package filelists: complete file listings per package
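
The byte-range requirement above could be used by a client roughly as follows. This is only a sketch: it assumes the package information file stores an inclusive start and end offset for each rpm header, and the URL, offsets and function name are made up for illustration.

```python
import urllib.request

def header_range_request(package_url, first_byte, last_byte):
    """Build a request for only the header bytes of a package file.

    first_byte and last_byte are inclusive offsets, as in HTTP Range syntax.
    """
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return urllib.request.Request(
        package_url,
        headers={"Range": f"bytes={first_byte}-{last_byte}"},
    )

# Hypothetical values that would come from the package information file.
req = header_range_request(
    "http://example.com/repo/foo-1.1-2.noarch.rpm", 440, 5119
)
print(req.get_header("Range"))
# A client would then call urllib.request.urlopen(req) and expect a
# 206 Partial Content response if the server honors the range.
```

The point of storing the range is that a client can fetch a full header on demand without downloading the whole package.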

Peter Bowen | 5 Aug 14:11 2003

Re: metadata list and discussion over tapas tonight

> distribution information: 
>    which repositories/channels exist where and when they were last
> updated, etc - useful for giving a place to look for the other files and
> seeing what's changed, quickly.

Good, this matches somewhat with Red Carpet's channels.xml file.

> package information not including file lists: 
>    all the misc information considered important per package (to be
> decided in terms of entries but it must include byte-ranges for the
> headers per package)

Again, seems to jibe with what Red Carpet has now.

> Should include almost complete file lists for
> certain packages that can be 'whitelisted' in b/c they end up being used
> in a file dependency a large percentage of the time.

So common file deps, or those that the server knows the client will
need, are included here?

> package filelists: complete file listings per package

One file per package on the server?

> obsoletes: listing of obsoletes tags - this could be alternatively
> included in package information not including file lists, depends on
> some more discussion - nothing too crazy.

I would like to see this in the package information file.

Daniel Veillard | 5 Aug 17:13 2003

Re: metadata list and discussion over tapas tonight

On Tue, Aug 05, 2003 at 08:11:48AM -0400, Peter Bowen wrote:
> > package filelists: complete file listings per package
> 
> One file per package on the server?

  Hum, no that would be unmanageable, I would think one file per
collection and association between the NVRE and the set of file
paths it exports. But I'm still not totally convinced it's the right way.
To me extracting the File info which may be needed for resolution
and putting them in the geberal packages description file is 
the simpler way for 99.99% of the cases.

> > - send in your list of things that should appear in the non-file
> > packages listing. What data from the header is critical/useful/desired.
> 
> I think that the absolute minimum list of things that need to be in the
> file are Name, Epoch, Version, Release, Arch, package file size, and
> absolute/relative path to the package file.

  I would tend to add the short description field (the one line one),
and truncated if it happens to be too long.
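
As a rough illustration of the minimum entry discussed so far (Name, Epoch, Version, Release, Arch, size, path, plus a truncated one-line summary), here is a sketch. The element and attribute names and the 80-character cutoff are assumptions, not anything the list has agreed on.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

MAX_SUMMARY = 80  # assumed cutoff; no length has been decided in the thread

@dataclass
class PackageEntry:
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    size: int          # package file size in bytes
    location: str      # absolute or relative path to the package file
    summary: str = ""  # the one-line short description

    def to_element(self) -> ET.Element:
        # Element and attribute names here are hypothetical.
        pkg = ET.Element("package", name=self.name, arch=self.arch)
        ET.SubElement(pkg, "version", epoch=self.epoch,
                      ver=self.version, rel=self.release)
        ET.SubElement(pkg, "size").text = str(self.size)
        ET.SubElement(pkg, "location").text = self.location
        # Truncate the summary if it happens to be too long.
        ET.SubElement(pkg, "summary").text = self.summary[:MAX_SUMMARY]
        return pkg

entry = PackageEntry("foo", "0", "1.1", "2", "noarch", 10240,
                     "RPMS/foo-1.1-2.noarch.rpm", "An example package")
print(ET.tostring(entry.to_element(), encoding="unicode"))
```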

> Looking at the packageinfo.xml that Red Carpet uses now, the main things
> that I see that we would want to add are possibly section, pretty name,
> install only, importance, license, plus some internal information.

  Licence makes sense too, short and very important policy wise...

> Section and pretty name are simply for display purposes, section roughly
> mapping to group, and pretty name allowing the client to display a more

Joe Shaw | 5 Aug 17:56 2003

Re: metadata list and discussion over tapas tonight

On Tue, 2003-08-05 at 08:11, Peter Bowen wrote:
> These are the main things that I see that Red Carpet uses that are not
> directly found in RPM headers.  We also have a couple of internal use
> fields that are essentially opaque data to the client, plus some fields
> that are duplicated from the rpm headers.  As long as we either allow
> for extra fields that some clients may not know about, or allow for XML
> Namespaces, extra data in the XML should not be problematic.
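
A small sketch of why extra fields need not be a problem: a client can read the elements it knows and skip the rest, whether they are namespaced or simply unfamiliar. The element names and the namespace URI below are invented for the example.

```python
import xml.etree.ElementTree as ET

KNOWN_FIELDS = {"name", "version", "arch"}  # hypothetical core element names

SAMPLE = """
<package xmlns:rc="http://example.com/ns/redcarpet">
  <name>foo</name>
  <version>1.1</version>
  <arch>noarch</arch>
  <rc:importance>urgent</rc:importance>
  <prettyname>Foo Toolkit</prettyname>
</package>
"""

def read_known_fields(xml_text):
    """Return only the fields this client understands, ignoring the rest."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        # Namespaced tags parse as "{uri}local", so they never match a
        # bare known name and are skipped along with unknown plain tags.
        if child.tag in KNOWN_FIELDS:
            fields[child.tag] = (child.text or "").strip()
    return fields

print(read_known_fields(SAMPLE))
```

A client that does understand the extra namespace can look those elements up explicitly; everyone else loses nothing.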

I agree with everything Peter said.  I would ask that we consider
keeping around information about older packages, at a minimum so we can
determine importance, but more importantly for a downgrade or rollback
situation.  Keeping around all that info might bloat things quite a bit,
though, maybe we could keep it in a separate file?  I'm not sure.

Joe

Joe Shaw | 5 Aug 18:00 2003

Re: metadata list and discussion over tapas tonight

On Tue, 2003-08-05 at 11:13, Daniel Veillard wrote:
> To me extracting the File info which may be needed for resolution
> and putting them in the geberal packages description file is 
> the simpler way for 99.99% of the cases.

I do too.  Obviously it may not work for cross-repository resolutions,
but I think that's actually a good thing in driving policy away from
file deps.

Joe

seth vidal | 5 Aug 18:52 2003

Re: metadata list and discussion over tapas tonight

On Tue, 2003-08-05 at 11:56, Joe Shaw wrote:
> On Tue, 2003-08-05 at 08:11, Peter Bowen wrote:
> > These are the main things that I see that Red Carpet uses that are not
> > directly found in RPM headers.  We also have a couple of internal use
> > fields that are essentially opaque data to the client, plus some fields
> > that are duplicated from the rpm headers.  As long as we either allow
> > for extra fields that some clients may not know about, or allow for XML
> > Namespaces, extra data in the XML should not be problematic.
> 
> I agree with everything Peter said.  I would ask that we consider
> keeping around information about older packages, at a minimum so we can
> determine importance, but more importantly for a downgrade or rollback
> situation.  Keeping around all that info might bloat things quite a bit,
> though, maybe we could keep it in a separate file?  I'm not sure.
> 

Do you mean older packages that aren't on the repository?

So if I've removed foo-1.1-1.noarch.rpm and replaced it with
foo-1.1-2.noarch.rpm you want to keep the info on foo-1.1-1?

or do you mean if I've just added foo-1.1-2 w/o deleting foo-1.1-1 then
have both sets of information?

-sv

seth vidal | 5 Aug 18:58 2003

Re: metadata list and discussion over tapas tonight

On Tue, 2003-08-05 at 11:13, Daniel Veillard wrote:
> On Tue, Aug 05, 2003 at 08:11:48AM -0400, Peter Bowen wrote:
> > > package filelists: complete file listings per package
> > 
> > One file per package on the server?
> 
>   Hum, no that would be unmanageable, I would think one file per
> collection and association between the NVRE and the set of file
> paths it exports.

I did a test of just a complete filelist from severn into a single file.
20M uncompressed; with gzip -9 it's 1.5MB. Let's say the xml syntax
information increases this by 500K - it's still 2MB.
Not a lot to xfer or store.
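
For anyone who wants to repeat the measurement on their own tree, a sketch of the same kind of test. The paths below are synthetic and far more repetitive than a real distribution, so the ratio printed will not match the roughly 13:1 implied by 20M going to 1.5MB.

```python
import gzip

def filelist_sizes(paths):
    """Return (raw_bytes, gzip_bytes) for a newline-separated file list."""
    raw = "\n".join(paths).encode("utf-8")
    packed = gzip.compress(raw, compresslevel=9)  # same level as gzip -9
    return len(raw), len(packed)

# Synthetic stand-in for a distribution's complete file list.
paths = [f"/usr/share/doc/pkg{i}/README" for i in range(1000)]
paths += [f"/usr/lib/pkg{i}/lib{i}.so" for i in range(1000)]

raw_size, packed_size = filelist_sizes(paths)
print(f"raw: {raw_size} bytes, gzip -9: {packed_size} bytes, "
      f"ratio: {raw_size / packed_size:.1f}x")
```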

>  But I'm still not totally convinced it's the right way.
> To me extracting the File info which may be needed for resolution
> and putting them in the geberal packages description file is 
> the simpler way for 99.99% of the cases.

geberal? I'm not sure I understand what you mean? Internally resolve the
file dep w/i the repository and dump the file lists?

> > I think that the absolute minimum list of things that need to be in the
> > file are Name, Epoch, Version, Release, Arch, package file size, and
> > absolute/relative path to the package file.
> 
>   I would tend to add the short description field (the one line one),
> and truncated if it happens to be too long.

Joe Shaw | 5 Aug 19:13 2003

Re: metadata list and discussion over tapas tonight

On Tue, 2003-08-05 at 12:52, seth vidal wrote:
> Do you mean older packages that aren't on the repository?
> 
> So if I've removed foo-1.1-1.noarch.rpm and replaced it with
> foo-1.1-2.noarch.rpm you want to keep the info on foo-1.1-1?
> 
> or do you mean if I've just added foo-1.1-2 w/o deleting foo-1.1-1 then
> have both sets of information?

Sorry, I should have been more clear.

I mean the latter: keeping around older versions of the packages, both
in metadata and in the repositories themselves.

So, yeah, consider foo 1.0 on my system and in the repository.

We release foo 1.1, which fixes a major security hole, and mark the
importance "urgent".

We then release foo 1.1.1, which is just a minor bug fix release, and
mark the importance "minor".

It's important to keep around the metadata for 1.0 and 1.1, in addition
to 1.1.1, so that we can properly convey the severity of the update
(updating 1.0 -> 1.1.1 should still be "urgent", not "minor"), but also
to allow the user to downgrade or rollback from 1.1.1 to 1.1 or 1.0.
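
The severity rule in that example can be sketched as "take the highest importance among all releases being skipped over". This assumes the metadata lists releases oldest to newest (real rpm version comparison is left out), and the importance given to 1.0 is a placeholder since only 1.1 and 1.1.1 are specified above.

```python
RANK = {"minor": 0, "urgent": 1}  # only the two levels named in the example

# Oldest to newest; the "minor" on 1.0 is a placeholder and is never
# consulted when 1.0 is the installed version.
RELEASES = [("1.0", "minor"), ("1.1", "urgent"), ("1.1.1", "minor")]

def effective_importance(releases, installed, target):
    """Highest importance among releases after `installed`, up to `target`."""
    versions = [version for version, _ in releases]
    lo = versions.index(installed)
    hi = versions.index(target)
    if hi <= lo:
        raise ValueError("target must be newer than the installed version")
    skipped = releases[lo + 1 : hi + 1]
    return max((imp for _, imp in skipped), key=RANK.__getitem__)

print(effective_importance(RELEASES, "1.0", "1.1.1"))  # urgent
print(effective_importance(RELEASES, "1.1", "1.1.1"))  # minor
```

This only works if the metadata for the intermediate releases is still around, which is the argument for keeping it.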

Joe

Joe Shaw | 5 Aug 19:27 2003

Re: metadata list and discussion over tapas tonight

On Tue, 2003-08-05 at 12:58, seth vidal wrote:
> I did a test of just a complete filelist from severn into a single file.
> 20M uncompressed; with gzip -9 it's 1.5MB. Let's say the xml syntax
> information increases this by 500K - it's still 2MB.
> Not a lot to xfer or store.

Not on the Duke backbone, but it's pretty painful on high latency
connections.  It's less of an issue for Red Carpet since we download
most of that data in the background, but it's probably not an efficient
transfer.

> >  But I'm still not totally convinced it's the right way.
> > To me extracting the File info which may be needed for resolution
> > and putting them in the geberal packages description file is 
> > the simpler way for 99.99% of the cases.
> 
> geberal? I'm not sure I understand what you mean? Internally resolve the
> file dep w/i the repository and dump the file lists?

I think he meant "general".  But yes, resolve the file deps internally
within the repository and list those provides/requires in the list of
provides/requires in the metadata file.  We added this recently into Red
Carpet and it works nicely (certainly much better than the whitelisting
we were doing previously), although it does mean a larger repository
overhead.  But I'd rather push the burden onto the server at file
generation time than onto every single client at dependency resolution
time.
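
To make the server-side approach concrete, a minimal sketch of resolving file deps at generation time. This is not Red Carpet's actual code; the dictionary layout and package names are assumptions. It collects every path-style requirement in the repository and adds a matching file to a package's provides only when something actually requires it.

```python
def resolve_file_deps(packages):
    """Fold needed file paths into each package's provides list.

    `packages` maps a package name to a dict with "files", "provides"
    and "requires" lists (a hypothetical in-memory layout).
    """
    # Any requirement that starts with "/" is treated as a file dependency.
    needed = {
        req
        for pkg in packages.values()
        for req in pkg["requires"]
        if req.startswith("/")
    }
    resolved = {}
    for name, pkg in packages.items():
        file_provides = sorted(f for f in pkg["files"] if f in needed)
        resolved[name] = {
            "provides": list(pkg["provides"]) + file_provides,
            "requires": list(pkg["requires"]),
        }
    return resolved

repo = {
    "bash": {"files": ["/bin/bash", "/bin/sh"], "provides": ["bash"],
             "requires": []},
    "myscripts": {"files": ["/usr/bin/myscript"], "provides": ["myscripts"],
                  "requires": ["/bin/sh"]},
}
print(resolve_file_deps(repo)["bash"]["provides"])
```

Because `needed` is built from one repository only, a file required by a package in another repository would not be listed, which is the cross-repository limitation already mentioned in the thread.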

> I'd be comfortable with that - if someone wants the whole description
> the various clients can grab the rpm header via the byte ranges and get

seth vidal | 5 Aug 20:11 2003

Re: metadata list and discussion over tapas tonight

> Not on the Duke backbone, but it's pretty painful on high latency
> connections.  It's less of an issue for Red Carpet since we download
> most of that data in the background, but it's probably not an efficient
> transfer.
> 

Well I'm not just thinking for Duke uses. I have some users of yum who
bitch and moan about bandwidth but none of them seem too frightened by
something measured in single digits of megabytes.

> I think he meant "general".  But yes, resolve the file deps internally
> within the repository and list those provides/requires in the list of
> provides/requires in the metadata file.  We added this recently into Red
> Carpet and it works nicely (certainly much better than the whitelisting
> we were doing previously), although it does mean a larger repository
> overhead.  But I'd rather push the burden onto the server at file
> generation time than onto every single client at dependency resolution
> time.

I'd kinda want to move away from that - it also limits you from being
able to run fun commands like yum provides '/some/misc/file' and being
able to search on it w/some percentage of correctness. I do like the
suggestion of the file listing as a separate file that Adrian had. It
increases the complexity but only barely.
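
A sketch of the kind of lookup a complete, separate file listing makes possible. This is not yum's implementation; the data layout and names are made up. It builds a reverse index from path to owning packages and queries it.

```python
from collections import defaultdict

def build_file_index(filelists):
    """Map each file path to the sorted list of packages that ship it.

    `filelists` maps a package name to its complete list of file paths,
    as a separate filelists file would supply (hypothetical layout).
    """
    index = defaultdict(set)
    for package, paths in filelists.items():
        for path in paths:
            index[path].add(package)
    return {path: sorted(owners) for path, owners in index.items()}

def who_provides(index, path):
    """Return the packages that ship `path`, or an empty list."""
    return index.get(path, [])

filelists = {
    "foo-1.1-2.noarch": ["/usr/bin/foo", "/usr/share/doc/foo/README"],
    "bar-2.0-1.noarch": ["/usr/bin/bar", "/usr/share/doc/foo/README"],
}
index = build_file_index(filelists)
print(who_provides(index, "/usr/share/doc/foo/README"))
print(who_provides(index, "/some/misc/file"))
```

If only the pre-resolved file deps are published, a query for an arbitrary path like the second one cannot be answered either way; with full listings the client can say for certain that nothing in the repository ships it.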

> The descriptions are pretty small and would compress well.  In our XML
> metadata format, which contains the name, EVR, arch, summary,
> description, dependency data (although not file lists), package sizes,
> URLs, and a few other minor things that Peter mentioned earlier, all of
> Red Hat 8.0 comes out to be about 335k compressed, which isn't bad. 