Re: [Taverna-hackers] Wrapping a Runner with a web service
Hi Scott, all,
On the face of it a workflow is an entity that consumes some data,
performs some processing and returns some results, having potentially
also had a side effect on its execution environment. At this level it
seems perfectly plausible that we can turn a workflow into a service.
The problems start to emerge when we match the full capabilities of the
workflow system against the limitations of 'vanilla' web services.
1) Workflows can run for a long time. Potentially they can run for
months; if you were to expose a workflow of that kind as a service you'd
have to move beyond the synchronous invocation pattern that a
conventional SOAP service supports.
- There are existing ways to do this, of course; we're not the only ones
with long-running services. But they immediately move you out of the
'really simple service' realm. Whether you do this with e.g. WSRF or
with a custom service interface (so your service interface is to the
workflow *engine* not to the workflow itself) is another choice to make.
2) Workflows can consume and emit very large data. Web services have no
consistent mechanism to handle this - there are approaches to do with
attachments and there are ad hoc mechanisms such as returning URLs to
results and accepting the same as inputs.
Both these issues may or may not be a problem in any given case, and
there are certainly simple workflows (short running, small data) which
could be wrapped up in a naive synchronous service interface. Exactly
what values of 'short' and 'small' we use here is a gray area, but it's
not going to give you much room to work in.
Another option is to create a hybrid system, where you force all data to
be passed by reference (HTTP URLs are pretty good for this). Taverna can
internally handle this, and we can even force the workflow to return
results in that form so your service logic wouldn't have to do very
much. This works around the data size issue (as URLs are 'small') but
not the run time problems.
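
As a rough illustration of the pass-by-reference idea (a sketch only, not
Taverna's actual reference management API), the service logic could look
something like the following. ReferenceStore, run_by_reference and the
example.org base URL are all made-up names for the purpose of the example,
and the "store" is just an in-memory dictionary standing in for something
that would really serve data over HTTP.

```python
import hashlib
from typing import Callable, Dict


class ReferenceStore:
    """Toy stand-in for a reference manager: maps URL-like strings to bytes.

    Hypothetical helper for illustration; a real deployment would put the
    data somewhere an HTTP GET on the reference can actually reach.
    """

    def __init__(self, base_url: str = "http://example.org/data/") -> None:
        self._base_url = base_url  # assumed base URL, not a real endpoint
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store the data and return a short reference to it."""
        ref = self._base_url + hashlib.sha256(data).hexdigest()[:16]
        self._blobs[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        """Resolve a reference back to the data it names."""
        return self._blobs[ref]


def run_by_reference(
    store: ReferenceStore,
    workflow: Callable[[Dict[str, bytes]], Dict[str, bytes]],
    input_refs: Dict[str, str],
) -> Dict[str, str]:
    """Resolve input references, run the workflow, return output references.

    Only references cross the service boundary, so the message size stays
    small however large the underlying data is.
    """
    inputs = {name: store.get(ref) for name, ref in input_refs.items()}
    outputs = workflow(inputs)
    return {name: store.put(value) for name, value in outputs.items()}
```

The caller only ever sees short strings: it registers its input, passes
{"seq": ref} to run_by_reference, and gets back a dictionary of references
it can dereference later (or hand straight to another service).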
The run length issues could be mitigated by adopting the typical
'submit, poll, get results, destroy' pattern. Of course, you'd have to
have some way of identifying the workflow instance you wanted to
interact with as part of the call (or in WS or HTTP headers).
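
To make the 'submit, poll, get results, destroy' pattern a bit more
concrete, here is a minimal in-memory sketch. It is not how any Taverna
engine exposes itself; WorkflowJobService, its method names and the status
strings are invented for illustration, and the "workflow" is just a Python
callable run on a background thread. The job ID returned by submit plays
the role of the workflow instance identifier mentioned above.

```python
import threading
import uuid
from typing import Any, Callable, Dict


class WorkflowJobService:
    """In-memory sketch of a 'submit, poll, get results, destroy' interface."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def submit(self, workflow: Callable[[Any], Any], inputs: Any) -> str:
        """Start the workflow in the background; return a job ID immediately."""
        job_id = uuid.uuid4().hex
        job: Dict[str, Any] = {"status": "RUNNING", "result": None, "error": None}
        with self._lock:
            self._jobs[job_id] = job

        def run() -> None:
            try:
                result = workflow(inputs)
                with self._lock:
                    job["result"] = result
                    job["status"] = "COMPLETE"
            except Exception as exc:  # record the failure instead of losing it
                with self._lock:
                    job["error"] = repr(exc)
                    job["status"] = "FAILED"

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def poll(self, job_id: str) -> str:
        """Return 'RUNNING', 'COMPLETE' or 'FAILED' for the given job."""
        with self._lock:
            return self._jobs[job_id]["status"]

    def get_results(self, job_id: str) -> Any:
        """Return the result of a completed job; raise if it is not complete."""
        with self._lock:
            job = self._jobs[job_id]
            if job["status"] != "COMPLETE":
                raise RuntimeError(f"job {job_id} is {job['status']}")
            return job["result"]

    def destroy(self, job_id: str) -> None:
        """Forget the job. This sketch does not interrupt a running workflow."""
        with self._lock:
            del self._jobs[job_id]
```

A client would call submit once, call poll at whatever interval suits the
expected run time, then get_results and finally destroy. Each of those
calls is short and synchronous, so they map onto ordinary web service
operations even when the workflow itself runs for a very long time.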
There is, however, a final problem with 'really' making these things
available as services. In T2 the 'workflow engine' is really a set of
components such as security agents, reference management systems and the
like. To run a workflow you need to assemble and mutually configure
these components - in some cases there is an obvious trivial assembly
which could be done implicitly, but in general you're looking at
configuring this federation prior to workflow launch - this would
immediately make the service Taverna-specific.
There's nothing to say, of course, that these approaches are mutually
exclusive. We could envisage a hierarchy of functionality where as you
move up the hierarchy you get increased control (configuration,
monitoring, support for long running workflows) at the cost of increased
interface complexity. At the top of this hierarchy would be the
full-blown peer management service framework, at the bottom would be vanilla
synchronous web services (with no exposure of the workflow system at
all). Any given application would live somewhere between these two extremes.
Tom