Peter Tröger | 16 Feb 2009 10:56

Conference call - Feb 17th - 17:00 UTC (incl. task list)

Dear all,

the bi-weekly DRMAA call is scheduled for Feb 17th, 2009. The meeting
starts at

17:00 UTC == 18:00 CET (Berlin/Poland time) == 9:00 PST (Vancouver time)

Phone conference line sponsored by Sun:

Phone number (toll-free from US): +001-866-545-5227
Access code: 5988285

Preliminary meeting agenda:

1. Meeting secretary for this meeting?
2. Acceptance of last meeting minutes
3. Voting: sub-state as string vs. sub-state as string to be parsed  
vs. sub-state as integer / enum
4. Job state discussion

We still have the following opinions under discussion:

Dan: TERMINATED state = resubmission might help, FAILED state =  
resubmission unlikely to help (machine problem, misconfiguration)
Andre: TERMINATED state = triggered by an external entity, FAILED  
state = job terminated by itself
Roger: FAILED state = job command line could not be executed
Drmaa1: FAILED state = job was running, but did not finish  
successfully for some reason


Roger Brobst | 16 Feb 2009 16:47

Re: Conference call - Feb 17th - 17:00 UTC (incl. task list)


> We still have the following opinions under discussion:
> 
> Dan: TERMINATED state = resubmission might help, FAILED state =
> resubmission unlikely to help (machine problem, misconfiguration)
> Andre: TERMINATED state = triggered by an external entity, FAILED  
> state = job terminated by itself
> Roger: FAILED state = job command line could not be executed
> Drmaa1: FAILED state = job was running, but did not finish  
> successfully for some reason
> 
> How is the new TERMINATED state related to the wif_ functions?
>
> Issue #5875 (originally from the PBS experience report) criticizes
> that FAILED currently expresses both user-requested termination and
> job failure. How is this issue solved by the newly chosen approach?

My concern about being able to distinguish between a
started and failed job vs. a job which was not started
is addressed by the check 'jobState == FAILED && wasAborted == TRUE'.
Upon review of the spec, I see that I had forgotten that
'wasAborted' was effectively a substate of FAILED.
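
A minimal sketch of that check, for illustration only. The JobState and
JobInfo types and the classify() helper are hypothetical names that mirror
the identifiers used in this thread; they are not taken from any language
binding.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    # Only the two states needed for this illustration.
    DONE = "DONE"
    FAILED = "FAILED"


@dataclass
class JobInfo:
    # Hypothetical record; field names follow the thread, not a real binding.
    job_state: JobState
    was_aborted: bool


def classify(info: JobInfo) -> str:
    """Split FAILED into 'never started' and 'started and failed'."""
    if info.job_state is not JobState.FAILED:
        return "not failed"
    # wasAborted is treated as a substate of FAILED: the job never ran.
    return "never started" if info.was_aborted else "started and failed"
```

With this reading, an application would only need the FAILED state plus
the wasAborted flag to tell the two cases apart.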

> Issue #5875 (originally from the PBS experience report)
> criticizes that FAILED currently expresses both
> user-requested termination and job failure.
> How is this issue solved by the newly chosen approach?

It would seem that, in general, the terminatingSignal could be
used to distinguish between a crash and a user-requested

Daniel Templeton | 17 Feb 2009 22:19

Meeting Minutes

o Attendees:
  - Hrabri Rajic
  - Peter Troeger
  - Daniel Templeton
  - Roger Brobst
  - Daniel Grueber
o 3 Feb "unofficial" Minutes -- accepted
o Job substates
  - Three alternatives: integer, string, or object
    . Integers
      Highly portable
      Must be interpreted according to DRM-specific header information
      Could be misinterpreted because of DRM mismatch
      Misinterpretation results in confused application
      Natural in all languages, but may clash stylistically with native
      enumerations
    . Strings
      Highly portable
      Must be interpreted according to DRM-specific header information
      Could be misinterpreted because of DRM mismatch
      Misinterpretation results in confused application
      Requires string interpretation -- lots of sources for errors
      Natural to Python, unnatural to other languages
    . Object -- void* or hash or Object, depending on the language
      Allows maximum flexibility per language and DRM
      Must be interpreted according to DRM-specific header information
      Failure to interpret correctly results in a segfault or
      ClassCastException
      Natural in most (all?) languages
  - We also have to consider how to represent in IDL

Peter Tröger | 18 Feb 2009 19:05

DRMAA2: Job Sub-State Discussion

Dear all,

this discussion thread is intended to finalize the job sub-state  
discussion from the last phone conference.

Quoting from the DRMAA2 Draft2:

"The jobStatus() method SHALL return the job status, together with an  
implementation specific sub state. This is intended to be a more  
detailed description of the current DRMAA job state, for example the  
specific kind of HOLD state (user-triggered, system-triggered). It   
MUST be allowed by the language binding to not retrieve this  
information (e.g. by passing a NULL value). Applications SHOULD NOT  
expect this information to be available in all cases."

Daniel T. did a great job in summarizing the current discussion  
status, which is about the data type for this sub-state information.

We have three alternatives: integer, string, or untyped struct /  
object pointer. All three constructs are expressible in IDL.

Strings require string comparison, which is only natural in scripting  
languages.
Integers might look inappropriate if the language supports native  
enumerations.

In all three cases, the returned information must be interpreted  
according to DRM-specific information (header file / type definition).  
This normally leads to a build dependency, e.g. in the form of
DRM-specific string constants, numerical constants, or type definitions.
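
The build dependency and the risk of a DRM mismatch can be illustrated
with a small sketch for the integer alternative. The two enumerations
below are invented for illustration and stand for the DRM-specific
constants of two different systems; no real DRM defines them this way.

```python
from enum import IntEnum


class DrmAHold(IntEnum):
    # Hypothetical sub-state constants shipped with "DRM A".
    USER_HOLD = 1
    SYSTEM_HOLD = 2


class DrmBHold(IntEnum):
    # Hypothetical sub-state constants shipped with "DRM B".
    SYSTEM_HOLD = 1
    USER_HOLD = 2


raw_sub_state = 1  # integer sub-state as it might be returned by DRM A

right = DrmAHold(raw_sub_state)  # interpreted with the matching constants
wrong = DrmBHold(raw_sub_state)  # interpreted with the wrong DRM's constants
```

Both conversions succeed, but only the first one yields the intended
meaning. The mismatch goes unnoticed by the application, which is the
misinterpretation risk noted for integers and strings.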

Peter Tröger | 18 Feb 2009 19:23

DRMAA2: Partial time stamp discussion

Dear all,

this discussion thread is intended to clarify the future of partial  
time stamps in DRMAA2. This feature of DRMAA 1.0 allows the expression  
of relative time information (e.g. "Start the job on next Monday"). It  
is used for the job template attributes "startTime" and "deadlineTime".

The DRMAA2 survey clearly showed that this is an unnecessary feature
from the user perspective. To our current knowledge, no
implementation besides SGE supports it. The standard implementation
strategy instead is to support complete date / time information in the
DRMAA string format, which is not compatible with anything.

The Condor experience report discusses this at great length and
proposes ISO8601 or RFC822 as the new data format for time information.
The following questions need to be clarified:

Is there any unknown use case / existing application for the partial
time stamp feature? Does it prevent us from dropping it?

What time standard could act as a replacement?

Is this time standard mappable to all relevant programming languages
(and their native date / time representation)?

Is this time standard mappable to all relevant DRM systems?

Can the standard also support the expression of an amount of time
(TimeAmount IDL type, used for limit values in job templates)?


Peter Tröger | 19 Feb 2009 11:45

Re: DRMAA2: Partial time stamp discussion

Answers inline ...

> this discussion thread is intended to clarify the future of partial
> time stamps in DRMAA2. This feature of DRMAA 1.0 allows the expression
> of relative time information (e.g. "Start the job on next Monday"). It
> is used for the job template attributes "startTime" and  
> "deadlineTime".
>
> The DRMAA2 survey clearly showed that this is an unnecessary feature
> from the user perspective. To our current knowledge, no
> implementation besides SGE supports it. The standard implementation
> strategy instead is to support complete date / time information in the
> DRMAA string format, which is not compatible with anything.
>
> The Condor experience report discusses this at great length and
> proposes ISO8601 or RFC822 as the new data format for time information.
> The following questions need to be clarified:
>
> Is there any unknown use case / existing application for the partial
> time stamp feature? Does it prevent us from dropping it?

I have never seen anybody use this.

> What time standard could act as a replacement?

I obviously vote for ISO8601 or RFC822. RFC1123 is an updated version  
of RFC 822 with Y2K support.
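
As a rough illustration of how both candidate formats map onto a native
date / time type, here is a sketch using only the Python standard
library. The example start time is an arbitrary value, and the variable
names are chosen for this sketch only.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Arbitrary absolute start time, expressed in UTC.
start = datetime(2009, 3, 2, 17, 0, tzinfo=timezone.utc)

# ISO 8601 text form of the same instant.
iso_text = start.isoformat()

# RFC 822 style text form with a four-digit year, as used in mail headers.
rfc_text = format_datetime(start)

# Both text forms parse back to the same native value.
iso_round_trip = datetime.fromisoformat(iso_text)
rfc_round_trip = parsedate_to_datetime(rfc_text)
```

Both round trips preserve the instant and its UTC offset, so at least
for this language either format would be mappable to the native
representation. Amounts of time (TimeAmount) are not covered by this
sketch.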

> Is this time standard mappable to all relevant programming languages
> (and their native date / time representation)?

Peter Tröger | 25 Feb 2009 10:12

Pending discussions

Dear all,

please remember that we have some pending discussions to settle before
the phone conference next week. Your participation in the DRMAA2
development process is very much needed ...

http://www.ogf.org/pipermail/drmaa-wg/2009-February/001007.html

http://www.ogf.org/pipermail/drmaa-wg/2009-February/001008.html

Thanks,
Peter.
--
  drmaa-wg mailing list
  drmaa-wg@...
  http://www.ogf.org/mailman/listinfo/drmaa-wg

Daniel Gruber | 2 Mar 2009 09:53

Re: DRMAA2: Job Sub-State Discussion

In order to reduce the risk of crashing applications when using
the object pointer approach, it could be recommended to
check for the DRM implementation before accessing (or getting)
the substate.

Better alternative:
Because this recommendation could easily be overlooked,
a new parameter naming the specific DRM/DRMAA implementation
could be added to the jobStatus()/jobSubStatus() function.
Based on this String(?) parameter, the jobSubStatus() function
returns a safe null pointer (when the DRM does not match)
or the element/class when it matches. Sure, this mechanism
could also be misused by checking for the DRM system at
runtime, but since the parameter is mandatory, the application
writer has to think about why it has to be used.
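
A rough sketch of this idea, assuming a hypothetical Session class with
a job_sub_status() method. Neither the class nor the method signature
comes from the DRMAA2 draft; they only illustrate the mandatory
DRM-name parameter.

```python
from typing import Dict, Optional


class Session:
    """Hypothetical DRMAA-like session bound to one DRM implementation."""

    def __init__(self, drm_name: str, sub_states: Dict[str, object]):
        self.drm_name = drm_name
        self._sub_states = sub_states  # job id -> DRM-specific sub-state

    def job_sub_status(self, job_id: str, expected_drm: str) -> Optional[object]:
        # The caller must name the DRM it was written for. On a mismatch
        # the method returns None instead of an object the caller would
        # misinterpret.
        if expected_drm != self.drm_name:
            return None
        return self._sub_states.get(job_id)


session = Session("SGE", {"42": "user hold"})
matched = session.job_sub_status("42", expected_drm="SGE")
mismatched = session.job_sub_status("42", expected_drm="Condor")
```

Here matched carries the sub-state, while mismatched is None, so an
application built against another DRM gets a safe null value rather
than a crash.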

my 2 cents,

Daniel

On 02/18/09 19:05, Peter Tröger wrote:
> Dear all,
>
> this discussion thread is intended to finalize the job sub-state  
> discussion from the last phone conference.
>
> Quoting from the DRMAA2 Draft2:
>
> "The jobStatus() method SHALL return the job status, together with an  
> implementation specific sub state. This is intended to be a more  

Peter Tröger | 2 Mar 2009 10:37

DRMAA2: TERMINATED vs. FAILED state

Dear all,

this discussion thread is intended to finalize the discussion about job 
states after execution end in DRMAA2.
In DRMAA1, there is only the FAILED state, expressing that the job was 
running but did not finish successfully for some reason. Piotr proposed 
a separation between FAILED and TERMINATED jobs:

http://www.ogf.org/pipermail/drmaa-wg/2009-January/000985.html

Meanwhile, we have had different proposals regarding this idea:

Option 1)
TERMINATED state = resubmission might help,
FAILED state = resubmission unlikely to help (machine problem, 
misconfiguration)

Option 2)
TERMINATED state = triggered by an external entity,
FAILED state = job terminated by itself

Option 3)
FAILED state = job command line could not be executed
TERMINATED state = something else happened

Option 4)
Stick with FAILED only, and express special circumstances via the new 
job sub-state information

Issue #5875 (originally from the PBS experience report) criticizes that

Peter Tröger | 2 Mar 2009 10:53

Conference call - Mar 3rd - 17:00 UTC

Dear all,

the bi-weekly DRMAA call is scheduled for Mar 3rd, 2009. The meeting
starts at

17:00 UTC == 18:00 CET (Berlin/Poland time) == 9:00 PST (Vancouver time)

Phone conference line sponsored by Sun:

Phone number (toll-free from US): +001-866-545-5227
Access code: 5988285

Preliminary meeting agenda:

1. Meeting secretary for this meeting?

2. Acceptance of last meeting minutes

3. Job sub-state data type - final voting: 
http://www.ogf.org/pipermail/drmaa-wg/2009-February/001007.html

4. Partial time stamp replacement:
http://www.ogf.org/pipermail/drmaa-wg/2009-February/001008.html

5. TERMINATED vs. FAILED state discussion:
http://www.ogf.org/pipermail/drmaa-wg/2009-March/001012.html

6. Discussion Kick-Off: Remodeling the JobInfo interface

Please come prepared for the open discussion points.