Multi-Node Communication via MPI on the Auton Cluster?

Predrag Punosevac predragp at andrew.cmu.edu
Mon Apr 5 20:00:46 EDT 2021


Hi Zhe,

I hope you don't mind me replying to the mailing list as you are asking a
question that might be of concern to others.

I hold a terminal degree in pure mathematics and work in math-physics when
I have time. I am sure most of you know infinitely more about computing
than I do. Please take my answer with a grain of salt. There are two things
that people involved in HPC (high-performance computing) in the pre-GPU era
had to understand. One is OpenMP and the second one is MPI.

OpenMP is a way to program on shared memory devices. This means that
parallelism occurs where every parallel thread has access to all of your
data. You can think of it this way: parallelism can happen during the
execution of a specific for loop by splitting the iterations of the loop
among the different threads.

Our CPU computing nodes are built for multi-threaded computing.
Unfortunately, most of you are using Python. Python doesn't support
effective multi-threading for CPU-bound code due to the GIL (global
interpreter lock). Thus you have people spawning numerous scripts and
crushing machines. I don't know enough about R, which is essentially a
fancy wrapper on top of pure C, to tell you how efficient it is. I do know
enough about Julia to tell you that multi-threading is built in. Julia uses
the Threads.@threads macro to parallelize loops and Threads.@spawn to
launch tasks on separate system threads. Use locks or atomic values to
control the parallel execution.

https://docs.julialang.org/en/v1/manual/parallel-computing/
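
For the Python users among you, the usual workaround for the GIL on
CPU-bound work is process-based parallelism rather than threads. Here is a
minimal sketch of the idea using only the standard library; the function
and the numbers are placeholders, not a recipe:

    # split CPU-bound work across processes, not threads
    from multiprocessing import Pool

    def busy_work(n):
        # placeholder CPU-bound task; each call runs in its own process
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        inputs = [10_000_000] * 8
        # 4 worker processes, each with its own interpreter and its own GIL
        with Pool(processes=4) as pool:
            results = pool.map(busy_work, inputs)
        print(results)

Each worker is a separate OS process with its own memory, which is the same
shared-nothing model that MPI uses, just confined to a single machine.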

MPI is a way to program on distributed memory devices. This means that
parallelism occurs where every parallel process is working in its own
memory space in isolation from the others. You can think of it as every
bit of code you've written being executed independently by every process.
The parallelism occurs because you tell each process exactly which part of
the global problem it should be working on based entirely on its process
ID.

Historically, with the exception of the short Hadoop period when we ran
the Rocks cluster, which came pre-configured for distributed computing, we
didn't utilize distributed computing. If you force me to speculate why
that was the case, I think it is because the primary method of hardware
acquisition in our lab was (and still is) accretion. Our infrastructure
was too inhomogeneous, put together in an ad hoc fashion rather than by
careful design. Blame it on the funding sources. We have never had the
luxury of spending half a million dollars on a carefully designed cluster
utilizing InfiniBand. Currently, our hardware is homogeneous enough, and
40 Gigabit InfiniBand gear is dirt cheap now that national labs have
largely migrated to 100 Gigabit, so I could cobble together a few CPU or
even GPU clusters if I got a few thousand dollars for used InfiniBand.
IIRC Python uses the multiprocessing library for process-based parallelism

https://docs.python.org/3.8/library/multiprocessing.html

and does support distributed computing

https://wiki.python.org/moin/ParallelProcessing

but I am not familiar with it. Julia, which I use, does have native
support for distributed computing. Please see the above link.
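
To make the MPI model concrete for the Python crowd: if mpi4py and a
working MPI installation happen to be available in your environment (I
make no promises that they are on our machines), a minimal sketch of
rank-based work division looks something like this:

    # mpi_hello.py -- each process works on its own slice, selected by rank
    from mpi4py import MPI          # assumes the mpi4py package is installed

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()          # this process's ID
    size = comm.Get_size()          # total number of processes

    # carve the global range 0..99 into per-rank chunks
    local_sum = sum(range(rank, 100, size))

    # combine the partial results on rank 0
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)
    if rank == 0:
        print("total =", total)

You would launch it with something like mpiexec -n 4 python mpi_hello.py.
Whether such a launch can be made to span gpu16 and gpu17 is exactly the
configuration question I come back to below.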


The way in which you write an OpenMP and an MPI program, of course, is
also very different.

MPI stands for Message Passing Interface. It is a set of API declarations
for message passing (send, receive, broadcast, etc.), together with a
specification of what behavior should be expected from implementations. I
have not done enough C and Fortran programming to know how to use MPI
correctly. Also, for the record, I don't know C++. People who know me well
are well aware of how irritated I get when C and C++ are used
interchangeably in a single sentence.

The idea of "message passing" is rather abstract. It could mean passing
messages between local processes or between processes distributed across
networked hosts, etc. Modern implementations try very hard to be versatile
and to abstract away the multiple underlying mechanisms (shared memory
access, network IO, etc.).
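
To illustrate the idea, again in Python and again with the caveat that
mpi4py is my assumption rather than something I have verified on our
machines, a matched send and receive between two processes looks like
this:

    # mpi_message.py -- point-to-point message passing between two processes
    from mpi4py import MPI          # assumes mpi4py is available

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        msg = {"greeting": "hello", "from": rank}
        comm.send(msg, dest=1, tag=11)      # rank 0 sends ...
    elif rank == 1:
        msg = comm.recv(source=0, tag=11)   # ... rank 1 receives
        print("rank 1 got:", msg)

Launched with mpiexec -n 2 python mpi_message.py, the two processes may
live on one host or on two networked hosts; the implementation hides that,
which is the whole point of the abstraction.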

OpenMP is an API that is all about making it (presumably) easier to write
shared-memory multi-processing programs. There is no notion of passing
messages around. Instead, with a set of standard functions and compiler
directives, you write programs that execute local threads in parallel, and
you control the behavior of those threads (what resources they should have
access to, how they are synchronized, etc.). OpenMP requires the support
of the compiler, so you can also look at it as an extension of the
supported languages.

And it's not uncommon that an application can use both MPI and OpenMP.

I am afraid that if you were hoping for a pre-configured distributed
environment which would enable you to execute a single magic command like
mpiexec, you will be disappointed. This is an instance where using the
Pittsburgh Supercomputing Center is probably more appropriate. There are
limitations to the one-man IT department model currently utilized by the
Auton Lab. You have just exposed the ugly truth.

For the record, I would be far happier to spend more time on genuine HPC
and never be bothered with trivialities, but budgetary constraints are the
major obstacle.

Most Kind Regards,
Predrag

P.S. Please don't get me started with HPC GPU computing :-)



On Mon, Apr 5, 2021 at 5:00 PM Zhe Huang <zhehuang at cmu.edu> wrote:

> Hi Predrag,
>
> Sorry to bother you. I have been trying to run my experiment across
> multiple nodes (e.g. on both gpu16 and gpu17) in a distributed manner. I
> saw there is an MPI backend pre-installed on the Auton cluster. However, I
> tested it and I felt like it didn't work at all (I was using this command
> on gpu16 to run jobs on gpu17: mpiexec -n 8 -hosts gpu17.int.autonlab.org
> echo "hello").
>
> Actually, is there no cross-node communication on the cluster at all or
> did I do it wrong? If the latter is the case, could you point me to a
> one-liner working example? Thanks.
>
> Sincerely,
> Zhe
>

