High system CPU usage with recent numpy updates

David Bayani dbayani at andrew.cmu.edu
Mon Apr 30 17:32:24 EDT 2018


Thanks Ben. You beat me to the punch sending this out. Credit goes to you
for first suggesting that updates to scipy were responsible. As Ben pointed
out in person, this is expected to become more of an issue as machines in
the lab have their packages updated, which motivated this group email so
that everyone is aware now. A bit more material on the subject is provided
below in case anyone is interested:

The suggestion to set this flag was initially found at:

https://stackoverflow.com/questions/22418634/numpy-openblas-set-maximum-number-of-threads
There are some conversations and related links there that might be
enlightening. Checking the release notes of the newer versions of scipy and
numpy was also useful.
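
One quick way to check which BLAS/LAPACK backend a given numpy install was
built against (and hence which thread-control variables are likely to
apply) is numpy.show_config(); the exact sections printed vary by build,
but a threaded backend such as OpenBLAS will typically show up there:

>>> import numpy
>>> numpy.show_config()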

For me, this became an issue when multiple threads were being spawned even
though multithreading was not explicitly invoked in the code in question.
Specifically, using only the scipy sparse matrix libraries and watching the
NLWP column in htop, some machines could be seen using one thread and
others using as many threads as there are hardware thread contexts.
Machines that had single threads listed for the processes were using:
>>> import scipy
>>> scipy.__version__
'0.12.1'
>>> import numpy
>>> numpy.__version__
'1.7.1'
and machines that invoked multiple threads were using:
>>> import scipy
>>> scipy.__version__
'1.0.0'
>>> import numpy
>>> numpy.__version__
'1.14.0'
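
As a side note, the same kernel thread count that htop shows in the NLWP
column can also be read directly for a given process (PID below is a
placeholder for the actual process id):

[user at server ~]$ ps -o nlwp= -p PID
[user at server ~]$ grep Threads /proc/PID/status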
For the runs in question, life went much better after forcing the number of
threads to one. As Ben said, using more threads does not guarantee that runs
will go any faster; they may even go slower.
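
In case it helps anyone, here is a minimal sketch of forcing this from
inside Python. It assumes the usual environment variables for the common
BLAS backends; these are typically read when the libraries initialize, so
they need to be set before the first numpy/scipy import:

import os
# Set before the first numpy/scipy import, since the threading libraries
# read these when they initialize.
os.environ["OMP_NUM_THREADS"] = "1"        # OpenMP
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS, if numpy links it
os.environ["MKL_NUM_THREADS"] = "1"        # MKL, if numpy links it

import numpy
import scipy.sparse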

On Mon, Apr 30, 2018 at 3:59 PM, Matthew Barnes <mbarnes1 at andrew.cmu.edu>
wrote:

> This happens even with basic multiprocessing in Python, for example the
> multiprocessing.Pool.map operation. Don't be like me and accidentally start
> 2500 processes :)
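>
> To make the arithmetic concrete, here is a hypothetical sketch of how the
> counts multiply: each Pool worker is a separate process, and each worker's
> numpy/BLAS calls may spawn a thread per core, so the total is roughly
> (pool size) x (threads per worker):
>
> import multiprocessing
> import numpy
>
> def work(seed):
>     # Each worker process may itself spawn BLAS/OpenMP threads here.
>     rng = numpy.random.RandomState(seed)
>     a = rng.rand(1000, 1000)
>     return float(numpy.linalg.norm(numpy.dot(a, a)))
>
> if __name__ == "__main__":
>     pool = multiprocessing.Pool(processes=8)  # 8 workers x N threads each
>     try:
>         print(pool.map(work, range(8)))
>     finally:
>         pool.close()
>         pool.join()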
>
> On Mon, Apr 30, 2018 at 3:53 PM Benedikt Boecking <boecking at andrew.cmu.edu>
> wrote:
>
>> All,
>>
>> With newer versions of numpy (and maybe scipy), it is possible that some
>> operations use all available CPUs by default (thanks to David Bayani for
>> pointing this out). This can also happen if you use packages that rely on
>> numpy and scipy, such as statsmodels. On our servers this appears to be
>> caused by the use of the OpenMP API.
>>
>> While automatic multithreading can be a great feature, it can cause
>> trouble if it is combined with additional multiprocessing (e.g. your own
>> use of the multiprocessing or joblib libraries) or when multiple users
>> unwittingly spawn too many threads at the same time.
>>
>> If you want to control the number of threads used through OpenMP, set
>> the OMP_NUM_THREADS environment variable (to a reasonable number of
>> threads) when you run your python code:
>>
>> [user at server ~]$ OMP_NUM_THREADS=8 python yourscript.py
>>
>> Also, it is a great habit to run top or htop to monitor your resource
>> consumption to make sure you aren’t inconveniencing other users of our
>> lab’s resources.
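>>
>> For example, both top and htop accept a -u flag to show only one user's
>> processes:
>>
>> [user at server ~]$ htop -u yourusername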
>>
>> Best,
>> Ben
>>
>