BP split progress and rationale
Mark A. Jensen <maj <at> fortinbras.us>
2016-06-01 05:06:01 GMT
I've made some significant progress towards a BP split. I know there
have been several earlier attempts, but I'm willing to take this one to
an actionable endpoint, with YAPC::NA 2016 as the goal date.
I have built a graph of all the module dependencies (parent-child and
horizontal) in Neo4j, and have been using this to design module
groupings that encompass functional areas and also have hierarchical
group dependencies such that the dependencies between groups are
minimized. I'm calling the groupings "packages".
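To make the grouping criterion concrete, here is a minimal sketch in Python (not the Neo4j queries actually used). It collapses module-level dependency edges into package-level dependencies and counts the edges that cross a package boundary, which is the number a good grouping keeps small. The function name, the handful of edges, and the module-to-package assignment are a toy sample chosen for illustration, not the real graph.

```python
from collections import defaultdict


def package_dependencies(module_deps, package_of):
    """Collapse module-level edges into package-level edges.

    module_deps: iterable of (module, required_module) pairs.
    package_of:  dict mapping each module to its package name.
    Returns (pkg_deps, cross_edges): pkg_deps maps a package to the set of
    other packages it needs; cross_edges counts the module edges that
    cross a package boundary.
    """
    pkg_deps = defaultdict(set)
    cross_edges = 0
    for module, required in module_deps:
        src, dst = package_of[module], package_of[required]
        if src != dst:
            pkg_deps[src].add(dst)
            cross_edges += 1
    return dict(pkg_deps), cross_edges


# Toy sample only; not the real BioPerl dependency graph.
module_deps = [
    ("Bio::Seq", "Bio::Root::Root"),
    ("Bio::SeqIO", "Bio::Seq"),
    ("Bio::SeqIO", "Bio::Root::IO"),
    ("Bio::SimpleAlign", "Bio::Seq"),
    ("Bio::SimpleAlign", "Bio::Root::Root"),
]
package_of = {
    "Bio::Root::Root": "BioPerl::Base",
    "Bio::Root::IO": "BioPerl::Base",
    "Bio::Seq": "BioPerl::Sequence",
    "Bio::SeqIO": "BioPerl::Sequence",
    "Bio::SimpleAlign": "BioPerl::Alignment",
}

pkg_deps, cross_edges = package_dependencies(module_deps, package_of)
```

Moving a module from one package to another and re-running this shows directly whether the move adds or removes cross-package edges.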
I am using the loose convention that "monophyletic" packages (groups of
modules that fall within a namespace) are named after the namespace, and
"polyphyletic" packages are named "BioPerl::<functional name>". The
following packages are currently pretty solid. The descriptions indicate
mainly what is encompassed by the contained modules, not rules for
BioPerl::Base - Bio::Root::*, general design pattern helpers (i.e.,
    many Bio::*I, Bio::Factory::*, Build helper classes)
BioPerl::Sequence - Bio::Seq, Bio::SeqIO, and SeqIO drivers that can do
    without annotations (e.g., fasta)
BioPerl::Alignment - alignment objects and parsers
BioPerl::Annotation - most annotation modules
BioPerl::SeqFeature - most SeqFeature modules
BioPerl::Tree - most Tree-related modules
BioPerl::DB - most Bio::DB::*, Bio::Das interfaces
BioPerl::Search - BLAST parsing and tiling
There are quite a few more. Examples of the logic: BioPerl::Base is
self-contained; all of its BP dependencies lie within the package itself.
BioPerl::Sequence requires only BioPerl::Base to satisfy all its BP
dependencies. BioPerl::Alignment requires BioPerl::Base and
BioPerl::Sequence. BioPerl::Search requires BioPerl::Base,
BioPerl::Sequence, and BioPerl::SeqFeature. And so on.
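The layering can be sketched as a small prerequisite table plus a depth-first walk that lists what must be installed, prerequisites first. This is only an illustration in Python: the function name is hypothetical, the table holds just the three packages whose prerequisites are fully stated above, and the walk assumes the package graph has no cycles.

```python
def install_order(package, requires):
    """Return `package` and everything it needs, prerequisites first.

    requires: dict mapping a package name to the list of packages it
    directly requires. Assumes the dependency graph is acyclic.
    """
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for prereq in requires.get(name, []):
            visit(prereq)
        order.append(name)  # appended only after all of its prerequisites

    visit(package)
    return order


# Direct prerequisites as described in the text above.
requires = {
    "BioPerl::Base": [],
    "BioPerl::Sequence": ["BioPerl::Base"],
    "BioPerl::Alignment": ["BioPerl::Base", "BioPerl::Sequence"],
}

fasta_only = install_order("BioPerl::Sequence", requires)
alignment = install_order("BioPerl::Alignment", requires)
```

Here `fasta_only` comes out as BioPerl::Base followed by BioPerl::Sequence, and `alignment` adds BioPerl::Alignment at the end.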
With a structure like this, a user who just needs Bio::PrimarySeq and
Bio::SeqIO to read some fasta files can get away with installing
BioPerl::Base and BioPerl::Sequence, about 141 modules, as opposed to
the full 805 modules, including that broadly useful one
Once finished, I'll propose setting many of the namespaces free as
separate CPAN packages - Bio::Restriction, Bio::DB::HIV, and others.
These can be packaged with their appropriate BioPerl::* prerequisites in
the metadata. I expect this will allow natural selection to operate much
more efficiently on the obsolete modules.
I will set up CPAN::Meta compliant metadata for everything.
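For a sense of what that metadata might look like, the Python sketch below builds a dict in the shape of a CPAN::Meta::Spec version 2 META.json and serializes it. Everything specific in it is a placeholder assumed for illustration: the distribution name "BioPerl-Sequence", the version, the author string, and the choice of Bio::Root::Root as the runtime prerequisite standing in for BioPerl::Base. The helper name is hypothetical too.

```python
import json


def meta_for(dist_name, abstract, runtime_requires):
    """Build a dict shaped like a CPAN::Meta::Spec version 2 META.json.

    runtime_requires maps module names to minimum versions ("0" = any).
    All concrete values below are placeholders, not real release data.
    """
    return {
        "name": dist_name,
        "version": "0.001",                 # placeholder
        "abstract": abstract,
        "author": ["BioPerl developers"],   # placeholder
        "license": ["perl_5"],
        "dynamic_config": 0,
        "generated_by": "hand-written sketch",
        "release_status": "testing",
        "meta-spec": {"version": "2"},
        "prereqs": {"runtime": {"requires": dict(runtime_requires)}},
    }


# Hypothetical distribution for the BioPerl::Sequence package; its
# prerequisite names a module that would ship in BioPerl::Base.
meta = meta_for(
    "BioPerl-Sequence",
    "Sequence objects and annotation-free SeqIO drivers",
    {"Bio::Root::Root": "0"},
)
meta_json = json.dumps(meta, indent=2, sort_keys=True)
```

Because CPAN prerequisites name modules rather than distributions, a split-off package would declare its BioPerl::* dependency by requiring some module that the prerequisite distribution provides.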
I have more thoughts but this is already too long.