[auton-users] LOW1 maintenance, Apr. 12 - 15
    Donghan (Jarod) Wang 
    donghanw at cs.cmu.edu
       
    Wed Apr 18 10:02:28 EDT 2012
    
    
  
Hi all,
LOW1 is up and running.
The memory stress test came up clean. However, Memtest86+ is not
guaranteed to pinpoint faulty memory, so I have put the machine back
into production to see whether the error recurs during normal
operation.
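If you want to check your own logs for a recurrence, here is a minimal
sketch in Python. It assumes the kernel messages are available as plain
text lines and that the error keeps the wording quoted further down in
this thread; the function name find_ecc_errors is only illustrative.

```python
import re

# Pattern based on the kernel message quoted in this thread. The exact
# wording may differ on other kernels or hardware (assumption).
ECC_PATTERN = re.compile(r"Northbridge Error \(node (\d+)\): DRAM ECC error")


def find_ecc_errors(lines):
    """Return (line_number, node_id) pairs for lines reporting a DRAM ECC error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ECC_PATTERN.search(line)
        if match:
            # group(1) is the node number captured from "(node N)".
            hits.append((number, int(match.group(1))))
    return hits
```

Passing it the lines of a saved kernel log (for example, the output of
dmesg written to a file) returns the matching line numbers and node IDs;
an empty list means no such message was found in that log.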
Best,
Jarod
On Sun, Apr 15, 2012 at 9:54 PM, Donghan (Jarod) Wang
<donghanw at cs.cmu.edu> wrote:
> Hi lab,
>
> The LOW1 maintenance will be extended to Tuesday, April 17th. The extension
> is necessary because the testing program (memtest86) needs a long time to
> scan LOW1's large memory, 512GB, for the faulty module.
>
> I apologize for any inconvenience that you may experience.
>
> Thanks,
> Jarod
>
>
> On Wed, Apr 11, 2012 at 8:06 PM, Donghan (Jarod) Wang <donghanw at cs.cmu.edu>
> wrote:
>>
>> Hi all,
>>
>> This is a friendly reminder that LOW1 will be down from tomorrow morning
>> (Apr. 12th) through the 15th for maintenance. Thanks for your attention.
>>
>> Jarod
>>
>>
>> On Tue, Apr 10, 2012 at 2:57 PM, Donghan (Jarod) Wang
>> <donghanw at cs.cmu.edu> wrote:
>>>
>>> Hello all,
>>>
>>> As you may have noticed, LOW1 has a faulty memory module that needs to be
>>> replaced. During the maintenance, which will take place Apr. 12th through
>>> 15th, all services on LOW1 will be unavailable. If this is a problem,
>>> please let me know at your earliest convenience.
>>>
>>> If you have ever seen the following message on LOW1, you have already
>>> experienced the memory issue.
>>>
>>> low1 kernel: Northbridge Error (node 0): DRAM ECC error detected on the NB.
>>>
>>> The maintenance is critical for resolving the issue and restoring a
>>> reliable computing environment on LOW1. During the downtime, a series of
>>> tests will be run to identify the faulty module; because of LOW1's large
>>> memory capacity, 512GB, detection takes longer than usual.
>>>
>>> Thanks,
>>> Jarod
>>>
>>> --
>>> Donghan (Jarod) Wang
>>> Research Programmer
>>> Robotics Institute
>>> Carnegie Mellon University
>>> 5000 Forbes Avenue
>>> Pittsburgh, PA 15213
>>> Email: donghanw at cs.cmu.edu
>>> Tel: +1 412 268 1238
>>
>>
>>
>
    
    