[auton-users] Reservations/Condor Testing
Jacob Joseph
jmjoseph at andrew.cmu.edu
Tue Dec 7 14:25:55 EST 2004
To further test the Condor job batch submission system, mentioned last
week, I have reserved 3 more lops, for a total of lop2, 5, 6, and 7,
plus loq1 and loq3. Lop1, 3, 4, loq2, and lor1 are still available for
general use.
If you've got jobs to run, I'd appreciate wider testing at this point.
Jobs may be submitted using lop1, with the following directions.
1. Create a spec file to tell Condor how to configure and run your job.
It is of the form:
--------------------------------------
# configure the job
Executable = <path from current dir>
Arguments = foo1 foo2 foo3
Image_Size = 1300 Meg
Universe = vanilla
Error = condor.err.$(Process)
Output = condor.output.$(Process)
Log = condor.log.$(Process)
# submits the job
Queue
--------------------------------------
Everything but 'Executable', 'Universe', and 'Queue' is optional, but
'Image_Size' is very important for matching the job to a machine with
enough memory. 'Error', 'Output', and 'Log' name the files that receive
the job's standard error, standard output, and Condor's own log, and
'$(Process)' is the number of the run. If you queue multiple runs by
putting a number after 'Queue', the process number increments for each
one.
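For instance, a spec file that queues several runs at once might look
like this (a sketch; the executable name 'run_experiment' is a made-up
example):
--------------------------------------
Executable = run_experiment
Arguments = foo1 foo2 foo3
Universe = vanilla
Error = condor.err.$(Process)
Output = condor.output.$(Process)
Log = condor.log.$(Process)
# submit three runs; $(Process) becomes 0, 1, and 2
Queue 3
--------------------------------------
Each run gets its own error, output, and log file, since the file names
include '$(Process)'.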
It is important that your job require nothing other than the Arguments
given here: it will not inherit your usual environment variables, your
PATH, or anything of the sort. I recommend absolute paths if in doubt.
You can get around this by using, as your executable, a script that sets
up the appropriate environment. If you're running Java code, this means
you'll have to set the CLASSPATH as an argument to java.
Do note that you may also use "Getenv = True" to submit the job with all
of your current environment variables.
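As a concrete sketch of the wrapper-script approach, the script below
re-creates a minimal environment and then runs whatever program it is
given (the paths and the CLASSPATH value are assumptions; substitute
your own). You would point the 'Executable' line of your spec file at
the wrapper:

```shell
# Create the wrapper script (hypothetical paths; adjust for your setup).
cat > wrapper.sh <<'EOF'
#!/bin/sh
# Condor's vanilla universe gives the job almost no environment,
# so restore what the program needs before launching it.
export PATH=/usr/local/bin:/usr/bin:/bin
export CLASSPATH=/home/user/classes   # only needed for Java jobs
exec "$@"                             # run the real program
EOF
chmod +x wrapper.sh

# Quick sanity check outside Condor:
./wrapper.sh /bin/echo hello          # prints: hello
```

Inside a spec file, 'Executable = wrapper.sh' with the real program and
its options in 'Arguments' would then run the program under the restored
environment.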
2. Submit the spec file using the 'condor_submit <foo.spec>' command.
You'll then see it added to the queue, visible with 'condor_q' and
subsequently matched to an available machine. You may observe a short
delay before the job executes.
The status of the cluster machines may be observed with 'condor_status'.
Of course, each of these commands has any number of options, but
hopefully this will get you started. Do note that, in order to allow
single jobs to utilize all the memory in a machine, 'condor_status' will
report an extra vm (CPU) on each machine. Don't believe it. Each machine
is still dual-CPU, and two jobs will run on it provided each uses less
than half the available memory.
There is a comprehensive user manual available at:
http://www.cs.wisc.edu/condor/manual/v6.7/2_Users_Manual.html
Thanks,
-Jacob
More information about the Autonlab-users mailing list