[auton-users] Reservations/Condor Testing

Jacob Joseph jmjoseph at andrew.cmu.edu
Tue Dec 7 14:25:55 EST 2004


To further test the Condor job batch submission system, mentioned last 
week, I have reserved 3 more lops, for a total of: lop2, 5, 6, 7, loq1, 
and loq3.  Lop1, 3, 4, loq2, and lor1 are still available for general use.

If you've got jobs to run, I'd appreciate wider testing at this point. 
Jobs may be submitted from lop1 using the following directions.

1. Create a spec file to tell Condor how to configure and run your job. 
  It is of the form:
--------------------------------------
# configure the job
Executable     = <path from current dir>
Arguments      = foo1 foo2 foo3
Image_Size     = 1300 Meg

Universe       = vanilla

Error  = condor.err.$(Process)
Output = condor.output.$(Process)
Log    = condor.log.$(Process)

# submits the job
Queue
--------------------------------------

Everything but 'Executable', 'Universe', and 'Queue' is optional, but 
'Image_Size' is very important for matching the job to a machine with 
enough memory to run it.  'Error', 'Output', and 'Log' name the files 
that receive the job's respective outputs, and '$(Process)' expands to 
the number of the run.  If you queue multiple runs by putting a count 
after 'Queue', the process number increments for each one.
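For example, a spec file ending like this (file names are illustrative) 
would queue three runs, writing condor.output.0 through condor.output.2:
--------------------------------------
Output = condor.output.$(Process)

# submits three jobs, with $(Process) = 0, 1, 2
Queue 3
--------------------------------------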

It is important that your job require nothing beyond the Arguments 
given here.  That is, it will not have your usual environment 
variables, PATH, or anything of the sort, so I recommend absolute paths 
if in doubt.  You can certainly get around this by using a script that 
sets up the appropriate environment as your executable.  If you're 
running java code, this means you'll have to set the CLASSPATH as a 
java argument.
Do note that you may also use "Getenv = True" to submit the job with all 
of your current environment variables.
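A minimal sketch of such a wrapper script (all paths and names here are 
hypothetical, and 'echo' stands in for the real program):

```shell
#!/bin/sh
# wrapper.sh - hypothetical script pointed to by 'Executable'.
# A vanilla-universe job starts with an almost empty environment,
# so set everything the real program needs, using absolute paths.
PATH=/usr/bin:/bin
export PATH
# A java job would also need its class path (example value only).
CLASSPATH=/tmp/classes
export CLASSPATH
# Run the real program, passing Condor's Arguments ("$@") through
# unchanged; 'echo' stands in for the actual executable here.
echo "PATH=$PATH" "$@"
```

Pointing 'Executable' at a script like this leaves the rest of the spec 
file unchanged.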


2. Submit the spec file using the 'condor_submit <foo.spec>' command.

You'll then see it added to the queue, visible with 'condor_q' and 
subsequently matched to an available machine.  You may observe a short 
delay before the job executes.

The status of the cluster machines may be observed with 'condor_status'. 
  Of course, each of these commands has any number of options, but 
hopefully this will get you started.  Do note that in order to allow 
single jobs to utilize all of the memory in a machine, 'condor_status' 
will report an extra vm (CPU) on each machine.  Don't believe it: each 
machine is still a dual CPU, and two jobs will be run on it provided 
they each use less than half the available memory.

There is a comprehensive user manual available at:
http://www.cs.wisc.edu/condor/manual/v6.7/2_Users_Manual.html

Thanks,
-Jacob


