Two possible paths forward for scheduler improvements: Mesos and PyGMO
Nick Coghlan <ncoghlan <at> redhat.com>
2014-09-19 08:01:11 GMT
We're currently contemplating two paths to a "better scheduler".
1. Kill the bespoke scheduler, and replace it with Mesos.
We still don't know just how far-reaching the ramifications of that would be, but
if anyone wants to try out Mesos, this is probably the place to start:
(That may involve waiting until Fedora 21 is actually released, but
that's not too far away.)
2. Improve the bespoke scheduler with PyGMO
The heart of the current scheduler is the "schedule_queued_recipes"
function. It essentially treats the recipe queue and the idle systems
as a 2-D matrix and tries to map one onto the other. However, it does so
incrementally, one recipe at a time, which makes it difficult to find a
"best fit" assignment: one that gets entire recipe sets running
immediately, minimises unused RAM or disk space, and so on.
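To make the incremental vs "best fit" difference concrete, here's a toy sketch. Everything in it is hypothetical: the `recipes` and `systems` dicts just map names to RAM in GB, and `greedy`, `best_fit` and `unused_ram` are illustrative helpers, not Beaker code or the real `schedule_queued_recipes` logic. The greedy version hands each recipe the first idle system that fits. The best-fit version brute-forces every assignment, preferring schedules that run more recipes and, among those, schedules that leave less RAM idle:

```python
from itertools import permutations

# Hypothetical toy data (not Beaker's schema): RAM in GB needed per queued
# recipe, and RAM in GB available on each idle system.
recipes = {"r1": 4, "r2": 32}
systems = {"s1": 32, "s2": 16, "s3": 8}


def unused_ram(plan, recipes, systems):
    """Total RAM left idle on the systems that were actually assigned."""
    return sum(systems[s] - recipes[r] for r, s in plan.items())


def greedy(recipes, systems):
    """Recipe-by-recipe: each recipe grabs the first free system that fits."""
    free = list(systems)
    plan = {}
    for r, need in recipes.items():
        match = next((s for s in free if systems[s] >= need), None)
        if match is not None:
            plan[r] = match
            free.remove(match)
    return plan


def best_fit(recipes, systems):
    """Whole-queue view: try every assignment (None = leave the recipe queued),
    prefer plans that schedule more recipes, then plans with less unused RAM."""
    names = list(recipes)
    slots = list(systems) + [None] * len(names)
    candidates = []
    for choice in permutations(slots, len(names)):
        plan = {r: s for r, s in zip(names, choice) if s is not None}
        if all(systems[s] >= recipes[r] for r, s in plan.items()):
            candidates.append(plan)
    return min(candidates, key=lambda p: (-len(p), unused_ram(p, recipes, systems)))


g = greedy(recipes, systems)
b = best_fit(recipes, systems)
print(g, unused_ram(g, recipes, systems))  # {'r1': 's1'} 28  (r2 never runs)
print(b, unused_ram(b, recipes, systems))  # {'r1': 's3', 'r2': 's1'} 4
```

The greedy pass gives the small recipe the big machine and strands the big recipe. Brute force is only viable for tiny inputs, though; searching that space efficiently for a real queue is where an optimiser library would come in.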
At PyCon New Zealand, I was introduced to a multi-objective optimiser
library published by the European Space Agency: https://esa.github.io/pygmo/
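Part of the appeal of a *multi-objective* optimiser is that it doesn't force all the goals into a single score up front. Multi-objective algorithms generally aim to approximate the Pareto front: the candidates that no other candidate beats on every objective at once. A minimal stdlib illustration of that idea, assuming made-up objective vectors (recipes left queued, unused RAM, unused disk) where lower is better. This shows the concept only and is not PyGMO's API:

```python
# Made-up objective vectors for four candidate schedules:
# (recipes left queued, unused RAM in GB, unused disk in GB); lower is better.
candidates = {
    "A": (0, 12, 300),
    "B": (0, 4, 500),
    "C": (1, 4, 500),
    "D": (0, 20, 400),
}


def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(cands):
    """Names of candidates that no other candidate dominates."""
    return sorted(
        name for name, v in cands.items()
        if not any(dominates(w, v) for w in cands.values())
    )


print(pareto_front(candidates))  # ['A', 'B']
```

Schedule A wastes less disk and B wastes less RAM, so neither beats the other and choosing between them becomes a policy decision; C is beaten outright by B, and D by A.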
Whereas switching to Mesos would be a big architectural change, adopting
PyGMO to make the current scheduler *better* might be feasible by
switching "schedule_queued_recipes" to an approach where it:
1. Reads the current recipe queue and idle system sets from the database
2. Organises them into a format suitable for handing over to PyGMO
3. Runs PyGMO over the data set with an appropriate cost function to be