Multi-Node Communication via MPI on the Auton Cluster?

Zhe Huang zhehuang at cmu.edu
Mon Apr 5 20:33:04 EDT 2021


Hi Predrag,

Thanks a lot for clarification.

In my original question, I should have stated that I have already
implemented my code with mpi4py so that it runs in a distributed
fashion. I should have asked directly about cross-node InfiniBand
communication (thankfully you provided that information in your
response, so it is clear to me now). I had already read up on this and
tried to set up jobs running across nodes in several different ways,
but failed. By the time I emailed you, I was 99% sure that the
infrastructure doesn't support this, but I wanted to confirm it with you.

I feel bad that my vagueness led to such a long reply on your end.
Thank you for your detailed and informative explanation; it is very much
appreciated.

Most sincerely,
Zhe

On Mon, Apr 5, 2021 at 8:01 PM Predrag Punosevac <predragp at andrew.cmu.edu>
wrote:

> Hi Zhe,
>
> I hope you don't mind me replying to the mailing list as you are asking a
> question that might be of concern to others.
>
> I hold a terminal degree in pure mathematics and work in math-physics when
> I have time. I am sure most of you know infinitely more about computing
> than I do. Please take my answer with a grain of salt. There are two things
> that people involved in HPC (high-performance computing) had to understand
> in the pre-GPU era. One is OpenMP and the second one is MPI.
>
> OpenMP is a way to program on shared-memory devices. This means that
> parallelism occurs where every parallel thread has access to all of your
> data. You can think of it this way: parallelism can happen during the
> execution of a specific for loop by splitting up the iterations of the
> loop among the different threads.
> Our CPU computing nodes are built for multi-threaded computing.
> Unfortunately, most of you are using Python. Python threads cannot run
> CPU-bound work in parallel because of the GIL (global interpreter lock).
> Thus you have people spawning numerous scripts and crashing machines. I
> don't know R, which is essentially a fancy wrapper on top of pure C, well
> enough to tell you how efficient it is. I do know enough about Julia to
> tell you that multi-threading is built in. Julia uses the Threads.@threads
> macro to parallelize loops and Threads.@spawn to launch tasks on separate
> system threads. Use locks or atomic values to control the parallel
> execution.
>
> https://docs.julialang.org/en/v1/manual/parallel-computing/
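>
> For the Python users, a minimal sketch of the same loop-splitting idea on
> a single node, assuming only a stock Python 3 and its standard library:
> hand the iterations to a pool of worker processes, since plain threads
> won't run CPU-bound work in parallel because of the GIL. The work
> function and the sizes here are made up purely for illustration:
>
>     # Minimal sketch: split a CPU-bound loop across local worker processes.
>     from multiprocessing import Pool
>
>     def work(i):
>         # stand-in for one expensive, independent loop iteration
>         return i * i
>
>     if __name__ == "__main__":
>         with Pool(processes=8) as pool:             # 8 workers on one node
>             results = pool.map(work, range(1000))   # iterations split among workers
>         print(sum(results))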
>
> MPI is a way to program on distributed-memory devices. This means that
> parallelism occurs where every parallel process works in its own memory
> space, in isolation from the others. You can think of it as: every bit of
> code you've written is executed independently by every process. The
> parallelism occurs because you tell each process exactly which part of the
> global problem it should work on, based entirely on its process ID.
> Historically, with the exception of the short Hadoop period when we ran
> the Rocks cluster, which comes pre-configured for distributed computing,
> we didn't utilize distributed computing. If you force me to speculate why
> that was the case, I think it is because the primary method of hardware
> acquisition in our lab was (and still is) accretion. Our infrastructure
> was too inhomogeneous, put together in an ad hoc fashion rather than by
> careful design. Blame it on the funding sources. We have never had the
> luxury of spending half a million dollars on a carefully designed cluster
> utilizing InfiniBand. Currently, our hardware is homogeneous enough, and
> 40 Gigabit InfiniBand gear is dirt cheap now that national labs have
> largely migrated to 100 Gigabit, so I could put together a few CPU or even
> GPU clusters if I got a few thousand dollars for used InfiniBand hardware.
> IIRC Python uses the multiprocessing library for multiprocessing
>
> https://docs.python.org/3.8/library/multiprocessing.html
>
> and does support distributed computing
>
> https://wiki.python.org/moin/ParallelProcessing
>
> but I am not familiar with it. Julia, which I use, does have native
> support for distributed computing. Please see the above link.
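>
> Going back to the rank-based idea above, a rough mpi4py sketch of "each
> process picks its slice of the global problem based on its process ID"
> could look like the following. It is only an illustration, it assumes a
> working MPI installation, and the problem size is arbitrary:
>
>     # Rough sketch: rank-based partitioning of a global loop with mpi4py.
>     from mpi4py import MPI
>
>     comm = MPI.COMM_WORLD
>     rank = comm.Get_rank()   # this process's ID
>     size = comm.Get_size()   # total number of processes
>
>     n = 1000                 # arbitrary global problem size
>     # each rank handles the indices i with i % size == rank
>     local_sum = sum(i * i for i in range(rank, n, size))
>
>     # combine the partial results on rank 0
>     total = comm.reduce(local_sum, op=MPI.SUM, root=0)
>     if rank == 0:
>         print("total:", total)
>
> On a single node you would launch it with something like
> mpiexec -n 4 python script.py; launching it across nodes is exactly
> where the cross-node configuration you are asking about comes in.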
>
>
> The way in which you write an OpenMP and an MPI program, of course, is
> also very different.
>
> MPI stands for Message Passing Interface. It is a set of API declarations
> for message passing (send, receive, broadcast, etc.), together with the
> behavior that should be expected from implementations. I have not done
> enough C and Fortran programming to know how to use MPI correctly. Also,
> for the record, I don't know C++. People who know me well are well aware
> of how irritated I get when C and C++ are used interchangeably in a single
> sentence.
>
> The idea of "message passing" is rather abstract. It could mean passing
> the message between local processes or processes distributed across
> networked hosts, etc. Modern implementations try very hard to be versatile
> and abstract away the multiple underlying mechanisms (shared
> memory access, network IO, etc.).
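>
> To make the send/receive/broadcast vocabulary concrete, here is a small
> mpi4py sketch. It again assumes a working MPI installation, needs at
> least two processes, and the payloads are made up:
>
>     # Small sketch of the basic message-passing primitives in mpi4py.
>     from mpi4py import MPI
>
>     comm = MPI.COMM_WORLD
>     rank = comm.Get_rank()
>
>     # point-to-point: rank 0 sends a Python object to rank 1
>     if rank == 0:
>         comm.send({"msg": "hello"}, dest=1, tag=11)
>     elif rank == 1:
>         data = comm.recv(source=0, tag=11)
>         print("rank 1 received:", data)
>
>     # collective: rank 0 broadcasts a value to every rank
>     value = 42 if rank == 0 else None
>     value = comm.bcast(value, root=0)
>     print("rank", rank, "has", value)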
>
> OpenMP is an API that is all about making it (presumably) easier to write
> shared-memory multi-processing programs. There is no notion of passing
> messages around. Instead, with a set of standard functions and compiler
> directives, you write programs that execute local threads in parallel, and
> you control the behavior of those threads (what resources they should have
> access to, how they are synchronized, etc.). OpenMP requires compiler
> support, so you can also look at it as an extension of the supported
> languages.
>
> And it's not uncommon that an application can use both MPI and OpenMP.
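>
> If you ever want a Python flavor of that hybrid pattern, a crude sketch is
> to use mpi4py across nodes and a local process pool within each rank. This
> is only an illustration under the same assumptions as above; note that some
> MPI implementations are picky about forking processes after MPI has been
> initialized, so treat it as a starting point rather than a recipe:
>
>     # Crude sketch of a hybrid layout: MPI ranks across nodes,
>     # a local process pool inside each rank.
>     from multiprocessing import Pool
>     from mpi4py import MPI
>
>     def work(i):
>         return i * i                        # stand-in for real per-item work
>
>     if __name__ == "__main__":
>         comm = MPI.COMM_WORLD
>         rank = comm.Get_rank()
>         size = comm.Get_size()
>
>         n = 10000
>         my_items = range(rank, n, size)     # this rank's share of the work
>         with Pool(processes=4) as pool:     # 4 local workers per rank
>             partial = sum(pool.map(work, my_items))
>
>         total = comm.reduce(partial, op=MPI.SUM, root=0)
>         if rank == 0:
>             print("total:", total)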
>
> I am afraid that if you were hoping for a pre-configured distributed
> environment that would let you execute a single magic command like
> mpiexec, you will be disappointed. This is an instance where using the
> Pittsburgh Supercomputing Center is probably more appropriate. There are
> limitations to the one-man IT department model currently used by the
> Auton Lab. You just exposed the ugly truth.
>
> For the record, I would be far happier to spend more time on genuine HPC
> and never be bothered with trivialities, but budgetary constraints are the
> major obstacle.
>
> Most Kind Regards,
> Predrag
>
> P.S. Please don't get me started with HPC GPU computing :-)
>
>
>
> On Mon, Apr 5, 2021 at 5:00 PM Zhe Huang <zhehuang at cmu.edu> wrote:
>
>> Hi Predrag,
>>
>> Sorry to bother you. I have been trying to run my experiment across
>> multiple nodes (e.g. on both gpu16 and gpu17) in a distributed manner. I
>> saw there is an MPI backend pre-installed on the Auton cluster. However, I
>> tested it and it seemed not to work at all (I was using this command
>> on gpu16 to run jobs on gpu17: mpiexec -n 8 -hosts gpu17.int.autonlab.org
>> echo "hello").
>>
>> Actually, is there no cross-node communication on the cluster at all, or
>> did I do it wrong? If the latter is the case, could you point me to a
>> working one-liner example? Thanks.
>>
>> Sincerely,
>> Zhe
>>
>

