Re: A new approach for monitoring simulation data
Rayene Ben Rayana <rayene.benrayana <at> gmail.com>
2010-01-20 08:08:54 GMT
Dear Ontje and Stefan,
Thank you for your design effort. I really enjoyed reading your proposal.
I have some comments, though.
On 19 Jan 2010, at 16:30, Stefan Scherfke wrote:
> Hello,
>
> Ontje and I want to present and discuss a new way for monitoring SimPy
> simulations. We have attached a commented sample implementation of our approach
> with some usage examples.
>
> Before we describe our idea, we want to point out some things we didn’t like
> about SimPy’s ``Monitor`` and ``Tally`` classes:
>
> * They use global variables.
> * If you monitor multiple attributes, timestamps are stored with each of them
> (data duplication, requires more memory than necessary).
Totally agree!
> * They combine data collection and analysis in one class. These are different
> functional aspects and should in our opinion be separated.
> * Coupling with Histogram and GUI stuff.
Partially agree. IMHO, an object usually combines several different functional aspects.
The only link between these aspects is that they use the same data: the object's attributes.
I don't think that having data collection and analysis in the same class is a real issue.
>
> We never used either of them for our projects, because they never really fit our
> needs. So we debated intensely what an optimal monitoring would look like. It
> should meet the following requirements:
>
> * Easy to use (simple API, little typing)
> * Memory and CPU efficiency
>
> * No impact on simulation speed if you don’t use it.
> * As little impact as possible if you use it.
>
> * Flexibility and easy extensibility
> * Separation of data collection and data analysis
>
>
> Description of our approach
> ===========================
>
> We know that there are many ways one might want to monitor simulations. We
> focused on the use case where you want to monitor several attributes for one
> process (maybe with timestamps), so that your data looks something like this:
>
> ([t0, t1, t2], [x0, x1, x2], [y0, y1, y2])
Could you please add a Monitor.zip() method that transposes the series using Python's builtin zip function?
http://docs.python.org/library/functions.html
This may be useful sometimes.
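A minimal sketch of what such a method might do. ``rotate`` is a hypothetical helper name used only for illustration; it is not part of SimPy or of the proposed Monitor class.

```python
def rotate(series):
    """Turn column-wise series into a list of per-sample rows."""
    return list(zip(*series))


# Column-wise data as in the proposal: (times, x values, y values).
data = ([0, 1, 2], [10, 11, 12], [20, 21, 22])
rows = rotate(data)
print(rows)  # [(0, 10, 20), (1, 11, 21), (2, 12, 22)]
```

Applying the same function to the rows gives the columns back (as tuples), so one helper covers both directions.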
>
> Our approach can easily be extended to meet other use cases, too, but more on
> this later. We also state that our approach does not cover data analysis. This
> is something that should in our opinion be done by specialized frameworks like
> e.g. NumPy, SciPy and Matplotlib.
I agree that analysis should be minimal, but not non-existent. There is no need to install
SciPy if the analysis can be done in one line of code.
Also, you could add an easy way to convert a Monitor to a NumPy array.
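To illustrate the kind of one-line analysis meant here, a small sketch using only the standard library, with an optional NumPy conversion at the end. The sample data and variable names are made up for the example; a real Monitor would expose its series in its own way.

```python
from statistics import mean

# Hypothetical column-wise monitor data: timestamps and one value series.
times = [0, 1, 2, 4]
values = [14, 14, 12, 6]

# Simple statistics need nothing beyond the standard library.
average = mean(values)
peak = max(values)

# Time-weighted mean: each value is assumed to hold until the next timestamp,
# so the last value (which has no duration) does not contribute.
durations = [t1 - t0 for t0, t1 in zip(times, times[1:])]
time_avg = sum(v * d for v, d in zip(values, durations)) / sum(durations)

try:
    import numpy as np  # optional, only needed for the conversion
    array = np.array([times, values])  # one row per series
except ImportError:
    array = None

print(average, peak, time_avg)  # 11.5 14 13.0
```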
>
>
> Basic idea
> ----------
>
> For performance reasons our monitoring works on SimPy processes rather than on
> the simulation itself. This has the advantage of a minimized or even nonexistent
> performance impact on the simulation speed if you monitor only a few processes or
> don’t monitor anything at all.
>
> We also wanted a very easy API: Create a monitor like ``m = Monitor(config)``
> and trigger the monitoring of a process’ current state by just calling ``m()``.
m() does not seem like a good idea to me. IMHO, m is an instance of a Monitor class,
so calling it does not really convey what happens.
m.append(), m.add() or m.watch() would be easier to understand:
a little more typing, but much clearer.
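Both spellings can coexist. The following is a toy class written for this comparison, not the Monitor from the attached script; it only assumes that a sample is passed as keyword arguments.

```python
class Monitor:
    """Toy monitor: one list of values per named series."""

    def __init__(self, *names):
        self.series = {name: [] for name in names}

    def append(self, **values):
        """Record one sample; every series must receive a value."""
        for name, store in self.series.items():
            store.append(values[name])

    # Keep the short call form as an alias for those who prefer it.
    __call__ = append


m = Monitor('time', 'x')
m.append(time=0, x=14)  # explicit and self-describing
m(time=1, x=12)         # same effect, less typing
print(m.series)  # {'time': [0, 1], 'x': [14, 12]}
```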
>
>
> How exactly to use it
> ---------------------
>
> Monitors should be instantiated in ``MyProc.__init__()``. You must pass
> some configuration for the *series* you want to create. Each *series* is
> defined by a *name* and a *collector* function::
>
>     class MyProc(Process):
>         def __init__(self):
>             # ...
>             self.monitor = Monitor(
>                 ('a', func1),
>                 ('b', func2),
>                 # ...
>             )
Great!
Did you manage to support optional parameters for the collector functions?
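One possible way to get this without changing the Monitor at all is to bind the parameters before passing the collector in. A sketch, assuming the monitor calls each collector with no arguments; ``scaled`` and ``Proc`` are hypothetical names for the example.

```python
from functools import partial


def scaled(obj, attr, factor=1):
    """Hypothetical collector: read obj.<attr> and scale it."""
    return getattr(obj, attr) * factor


class Proc:
    """Stand-in for a SimPy process with one monitored attribute."""

    def __init__(self):
        self.a = 5


proc = Proc()

# Bind all arguments up front so the result can be called with none.
doubled = partial(scaled, proc, 'a', factor=2)
doubled_lambda = lambda: scaled(proc, 'a', factor=2)

print(doubled(), doubled_lambda())  # 10 10
```

Either callable could then be used in a ``('name', collector)`` pair.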
>
> You can then retrieve the data for e.g. the series “a” via ``self.monitor.a``.
> The collector functions describe how the data should be collected. The two
> common cases here are:
>
> 1. Automatically get an attribute’s value (e.g. ``self.a``)
> 2. Manually pass the value to be monitored
>
> You can solve both cases with custom lambda functions, but we’ve also created a
> shortcut for each of them.
I am not sure that shortcuts are a good idea. Lambda functions are enough.
I agree that lambda functions are a bit hard to understand in the beginning.
But once this hurdle is passed, they are useful to every Python developer.
Adding shortcuts adds a level of complexity IMHO.
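For comparison, a sketch of both styles side by side. The ``get`` function below is a hypothetical reimplementation written for this example (the one in the attached script may work differently), and ``Proc`` is a stand-in for a real process.

```python
class Proc:
    """Stand-in for a SimPy process with two monitored attributes."""

    def __init__(self):
        self.a, self.b = 1, 2


def get(obj, *names):
    """Hypothetical shortcut: one (name, collector) pair per attribute."""
    # The default argument n=name binds each name when the lambda is created.
    return [(name, lambda n=name: getattr(obj, n)) for name in names]


proc = Proc()

# Plain lambdas: more typing, but nothing new to learn.
explicit = [('a', lambda: proc.a), ('b', lambda: proc.b)]
shortcut = get(proc, 'a', 'b')

proc.a = 10  # collectors read the current value when they are called
print([(n, f()) for n, f in explicit])  # [('a', 10), ('b', 2)]
print([(n, f()) for n, f in shortcut])  # [('a', 10), ('b', 2)]
```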
>
> An advanced example::
>
>     class MyProc(Process):
>         def __init__(self, sim):
>             # ...
>             self.monitor = Monitor(
>                 ('time', sim.now),
>                 get(self, 'a', 'b'),
>                 ('diff', manual)
>             )
>
> This will create a monitor containing the series ``time``, ``a``, ``b`` and
> ``diff``, where ``time`` stores the values returned by ``sim.now()``, ``self.a``
> and ``self.b`` get collected automatically, and a value for ``diff`` has to
> be passed manually. All this happens within the PEM of a process::
>
>     def run(self):
>         while True:
>             # ...
>             self.monitor(diff=self.get_diff())
>
> That’s it!
>
>
> Conclusion
> ==========
>
> We hope we could make our idea clear. We think our solution is quite clean and
> flexible. It can easily be extended (e.g. by adding other monitor classes) and
> also easily be added to SimPy without interfering with any existing
> functionality. Since it isn’t integrated into the simulation core, it won’t slow
> down the simulation, if you don’t want monitoring.
>
> The attached script contains the whole monitoring framework and an example
> simulation that tests and shows how to use it. You need SimPy, NumPy and
> Matplotlib to execute it (NumPy and Matplotlib are only required for the example
> analysis part at the end, so if you comment out these lines, you don’t need them).
>
> Other monitor classes we thought of are:
>
> * a *SimpleMonitor* that just creates one list of values (e.g. [x0, x1, ...]),
> * a *GroupedMonitor* that collects groups of series (useful for a central
> collector process which just monitors other processes in the simulation) and
> * aggregate monitors like e.g. an AvgMonitor. This could be useful if you are
> just interested in the average of all values (and not the values themselves),
> so it only needs to store the sum and count of all values.
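The aggregate idea can be sketched in a few lines. This is a toy class written for illustration under the description above (store only the sum and count); the method and attribute names are assumptions, not taken from the attached script.

```python
class AvgMonitor:
    """Toy aggregate monitor: keeps only a running sum and a count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def append(self, value):
        self.total += value
        self.count += 1

    @property
    def average(self):
        # There is no average before the first sample.
        return self.total / self.count if self.count else None


avg = AvgMonitor()
for x in [14, 14, 14, 12]:
    avg.append(x)
print(avg.average)  # 13.5
```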
>
> What do you think of it?
>
> Cheers,
> Ontje and Stefan
>
Other things to think about:
1. An easy way to obtain a throughput plot (delta_value / delta_time).
2. Checking whether the monitored data has changed since the last sample. If not,
   give the user the option to skip storing that row.
   Example: instead of

       [time, x, y]
       1, 14, 5
       2, 14, 5
       3, 14, 5
       4, 12, 6

   we can store

       [time, x, y]
       1, 14, 5
       4, 12, 6

   This is useful when the data does not change often.
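Both points can be sketched briefly. ``throughput`` and ``ChangeMonitor`` are hypothetical names for this example, and the throughput function assumes strictly increasing timestamps (otherwise it would divide by zero).

```python
def throughput(times, values):
    """Rate of change between consecutive samples: delta_value / delta_time."""
    pairs = list(zip(times, values))
    return [(v1 - v0) / (t1 - t0)
            for (t0, v0), (t1, v1) in zip(pairs, pairs[1:])]


class ChangeMonitor:
    """Toy monitor that stores a row only when the data columns change."""

    def __init__(self):
        self.rows = []

    def append(self, time, *data):
        if self.rows and self.rows[-1][1:] == data:
            return  # same values as the last stored row: skip it
        self.rows.append((time,) + data)


m = ChangeMonitor()
for row in [(1, 14, 5), (2, 14, 5), (3, 14, 5), (4, 12, 6)]:
    m.append(*row)
print(m.rows)  # [(1, 14, 5), (4, 12, 6)]
print(throughput([0, 2, 4], [0, 10, 40]))  # [5.0, 15.0]
```

Note that skipping unchanged rows only loses no information if the analysis treats each value as holding until the next stored timestamp.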
>
> <monitoring.py>
> ------------------------------------------------------------------------------
> Throughout its 18-year history, RSA Conference consistently attracts the
> world's best and brightest in the field, creating opportunities for Conference
> attendees to learn about information security's most important issues through
> interactions with peers, luminaries and emerging and established companies.
> http://p.sf.net/sfu/rsaconf-dev2dev
> _______________________________________________
> Simpy-users mailing list
> Simpy-users <at> lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/simpy-users
Thank you again for your work. Please keep in mind that these are just my thoughts as the developer of a
network simulator based on SimPy (NetPyLab, http://rayene.github.com/netpylab/). Others may
disagree, as they may have different needs.
Cheers,
Rayene BEN RAYANA
Telecom Bretagne, RSM Dpt