Systems Log

2009-03-27 Neolith NSC supports Earth Hour. Neolith, NSC's main compute resource, will have planned downtime on Saturday 2009-03-28 between 20:30 and 21:30. System service will be performed during the downtime. Jobs that cannot finish before 20:30 will remain in the queue and start when the downtime is over. The unusual choice of a Saturday evening lets NSC participate in the WWF Earth Hour 2009 campaign.
WWF Earth Hour page
Linköpings universitet supports Earth Hour.

2008-10-05 All systems Due to service of the power station that supplies NSC with electricity, all NSC resources (including but not limited to Neolith, Mozart, Green, Dunder, Tornado and Gimle) were offline from Sunday October 5th at 08:00. All systems were back online by Sunday evening. We apologize for the inconvenience of this full service stop. Please note that queued jobs were not affected by this stop.

2008-08-15 Mozart Due to a planned service stop Mozart was shut down on Friday, August 15. NSC apologizes for the inconvenience.

2008-06-02 All systems Due to an unplanned power outage all systems stopped on Monday at 16:51. All systems were back in operation by 20:00. NSC apologizes for the inconvenience.

2008-04-20 Mozart Mozart stopped on Sunday afternoon and was back in service by noon on Monday. NSC apologizes for the inconvenience.

2008-04-19 All systems Due to an unplanned power outage all systems stopped on Saturday at 08:30. By 14:30 on Saturday afternoon, most systems (all except Bluesmoke and Tank) were back in operation. Tank was back in service by 11:00 on Monday and Bluesmoke at 18:00. NSC apologizes for the inconvenience.

2008-04-04 All systems except Mozart Due to an unplanned power outage all systems except Mozart stopped on Friday at 16:45. All systems were back in service by Sunday. NSC apologizes for the inconvenience.

2008-04-03 All systems Due to a planned service stop for work on the power supply system, all NSC systems were shut down at 09:00. The systems were back in service at 21:30. NSC apologizes for the inconvenience.

2008-03-12 Neolith Due to an unplanned power outage on Wednesday morning, Neolith was shut down. The system was restarted at 13:00. NSC apologizes for the inconvenience.

2008-03-11 Green
Green had a scheduled service stop March 11, 09:00-12:00. The following was done: OS security updates, filesystem checks, and a batch queuing system upgrade. Note: due to a bug in the batch queuing system, we were unable to create a reservation within the batch queuing system for this stop as we normally do. Everyone was logged out at the time of the stop; no jobs could run and any running jobs were killed. NSC apologizes for the inconvenience.

2008-03-04 Neolith
Neolith had a scheduled service stop on Tuesday, March 4. The system was back in production at lunch time. The service stop included configuration and fail-over tests of a few central services in the cluster.


2007-10-01 Mozart
Mozart was unavailable from 08:00 until 14:30 due to hardware maintenance. Some system hardware was replaced. NSC apologizes for the inconvenience the stop caused.

2007-08-21 All systems
Due to a UPS failure all systems lost power on Tuesday morning. All systems were back in operation on Tuesday afternoon. NSC apologizes for the inconvenience the stop caused.

2007-05-26--28 All systems
Due to a planned power outage for the whole NSC computer building, a stop took place from Saturday evening May 26 at 18:00. All systems were back in full operation on Monday morning May 28. The stop included Monolith, Mozart, Bluesmoke, Green, Dunder, Tornado, Storage, Tank and sundry smaller systems, including NSC web, support and e-mail services. We at NSC regret the change of schedule and apologize for the inconvenience.

2007-05-12--14 All systems
The announced stop of all systems, due to work on the power distribution on Campus, has been postponed.

2007-05-09 Mozart
A problem with Mozart occurred at 13:20 today. NSC, together with SGI, is looking into the situation and trying to find out what caused the stop. Jobs that could be restarted have been restarted by the queuing system. NSC apologizes for the inconvenience.

2007-02-05--06 Monolith
On February 5, at 06:00 a.m. NSC stopped Monolith including the login nodes for system maintenance. Monolith, apart from the SMHI part, was back in service February 6 at 11:00 a.m. The SMHI file systems were back in service by 20:00. During the stop all major file systems were checked and security upgrades were carried out. NSC apologizes for the inconvenience.


2006-12-18 Mozart
On December 18 we upgraded Mozart's OS from SLES9 to SLES10. The system was unavailable from 12:00 until 21:00. NSC apologizes for the inconvenience.

2006-09-12 Mozart
On September 12 we replaced a faulty memory module in Mozart. The machine was unavailable from 13:00 until 15:30. Jobs that could not finish before Tuesday afternoon were started after the maintenance stop. We hope this resolves the problems with crashed jobs that we have experienced during the last week. NSC apologizes for the inconvenience.

2006-08-29 Monolith
Last night Monolith lost two thirds of its power supply. Monolith was back in service at 14:45. NSC apologizes for the inconvenience.

2006-08-27--28 All systems
Due to a power failure in NSC's computer hall on Saturday night, all systems were shut down. The systems were restarted on Monday morning. NSC apologizes for the inconvenience.

2006-08-23 Mozart
A problem with Mozart occurred at 12:44. NSC, together with SGI, is looking into the situation and trying to find what caused the reboot. Jobs that could be restarted have been restarted by the queuing system. NSC apologizes for the inconvenience.

2006-08-14 Mozart
Mozart will be shut down at 10:00 A.M. on Monday August 14th. A faulty CPU will be replaced. NSC apologizes for the inconvenience.

2006-08-10 Mozart
A hardware test will be performed on Mozart during Thursday August 10th. The machine will be unavailable from 08:00 to 17:00. Jobs that cannot finish before Thursday morning at 08:00 will be delayed and started after the maintenance stop. We hope to resolve the problems with unscheduled stops of the machine that we have experienced during the last week. NSC apologizes for the inconvenience.

2006-07-27 All systems
A false fire alarm in the computer room brought all systems down in the early morning.

2006-07-02 All systems
Sunday July 2nd the power distribution system within Valla Campus will be interrupted and all NSC services will be unavailable from Sunday morning 2006-07-02 at 08:00 A.M. All systems will be restarted Monday morning.

2006-05-30 Blixt
Blixt will be shut down at 11:00 AM in order to move the system to a new facility.

2006-05-23 Mozart
In preparation for extending the /home file system the Mozart system will be taken down at 08:00 Tuesday May 23 and restarted by Tuesday noon. Jobs that cannot finish before Tuesday 08:00 will be delayed and started after the interrupt. NSC apologizes for the inconvenience.

2006-05-22 Mozart
Due to I/O-problems on the /home file system Mozart will be restarted at 15:00. We apologize for the inconvenience.

2006-05-22 Monolith and Tornado
There will be a service maintenance stop on Monolith and Tornado during Monday. Monolith will stop at 12:30 p.m. The stop also includes the login nodes. Firmware on the disk systems will be upgraded.

2006-05-17 Dunder
During Wednesday there will be a stop on Dunder due to work on OS, MPI and Infiniband.

2006-05-15 Mozart
In preparation for extending the /home file system, the Mozart system will be taken down at 13:00 Monday May 15 and restarted Monday night. Jobs that cannot finish before Monday 13:00 will be delayed and started after the interrupt. NSC apologizes for the inconvenience.

2006-02-27 SGI3K
Due to service maintenance the SGI3K system will be unavailable on Monday 2006-02-27 from 12:00 until 15:00. A mother board will be replaced by SGI. Checkpointable jobs will be resumed after the break. NSC apologizes for the inconvenience.

2006-02-23 SGI3K
Due to problems with one mother board the SGI3K system stopped working at 10:07 a.m. and was unavailable during part of the day. Restartable jobs were resumed after the stop. NSC apologizes for the inconvenience.


2005-12-14 SGI3K
Due to problems with the tape facility the SGI3K system will be restarted Wednesday morning 2005-12-14 at 07:00. Checkpointable jobs will be resumed after the break. NSC apologizes for the inconvenience.

2005-11-01 All systems
Due to maintenance of the power distribution in our computer room, all NSC systems will be out of service from 09:00 on November 1. Please note that Monolith will shut down as early as 08:00. The systems are estimated to restart in the afternoon, by 17:00 at the latest. NSC apologizes for the inconvenience this stop may cause.

2005-09-10--11 All systems
Due to construction work in our computer room (installation of humidity control), all NSC HPC systems were out of service during the end of week 36. Monolith was shut down on Friday Sep 9 at 15:00. The SGI3K, Bluesmoke, Otto, Green, Dunder and Tornado were shut down on Saturday Sep 10 at 07:00. All systems were restarted on Sunday Sep 11 at 15:00. NSC apologizes for the inconvenience this stop may have caused.

2005-08-22 SGI3k
Monday August 22, 10:00-12:00 we will have a hardware maintenance stop to replace hardware that caused problems on August 18. Checkpointable jobs will be resumed after the break. NSC apologizes for the inconvenience.

2005-08-17 SGI3k
At 5:22 a.m. the SGI3k system crashed. All running jobs were lost. When the jobs were restarted, the scratch area /nsc_scr filled up and a number of jobs terminated. The scratch area has now been cleaned. Please resubmit your jobs. We apologize for the inconvenience.

2005-07-28--29 Monolith
The Monolith system was stopped for system maintenance from July 28, 12:00 until July 29, 15:00. The focus of this maintenance stop was to run tests on the nodes' local storage (system and /disc/local).

2005-07-16 All systems
Due to circumstances beyond our control, the cooling system in our computer room failed on Saturday. This caused most of our systems to go down in an uncontrolled fashion. We apologize for the inconvenience.

2005-07-14 All systems
Due to circumstances beyond our control, the cooling system in our computer room failed at around 9:30. This caused most of our systems to go down in an uncontrolled fashion. We apologize for the inconvenience.

2005-06-28--29 Monolith
The Monolith system was stopped for maintenance from June 28, 11:00 until June 29, 14:30. During the maintenance stop we upgraded kernels and Linux software and replaced some faulty disk system hardware.

2005-04-04 SGI
Due to power supply problems after the interrupt during the weekend, the SGI system will have a hardware maintenance interrupt on April 4 between 14:00 and 16:00. Checkpointable jobs will be continued after the break.

2005-04-01--03 Monolith
The Monolith system will be taken down for software and hardware maintenance at 15:00 on April 1. During the weekend, work will be performed on the cooling system in NSC's computer room. Monolith will be back in full operation Monday morning, April 4. We apologize for the inconvenience.

2005-04-02--03 All systems in NSCs computer room
Due to work on the cooling system in NSC's computer room, Monolith, Sgi3k, Bluesmoke, Blixt, Green and Otto will be shut down on Saturday morning at 06:00. We will be back in full operation Monday morning, April 4. For the Sgi3k we plan to be back in full operation Sunday afternoon, and checkpointable jobs will be restarted after the interrupt. We apologize for the inconvenience.

2005-02-21 Monolith
Due to SCALI network problems Monolith was stopped from 15:55 until 21:15 on February 21. All running jobs were stopped, and those marked as rerunnable (the default) were requeued. We apologize for the inconvenience this stop may have caused.
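
As a hedged sketch of the rerunnable default mentioned above: in standard PBS (the batch system used on Monolith in this era), a job script can opt out with the qsub `-r` option. The directives below use stock PBS syntax, not an NSC-specific template, and the resource line is invented.

```shell
#!/bin/sh
# Hypothetical PBS job script. "#PBS -r y" is the default: the batch
# system may requeue and rerun the job after a system stop.
# "#PBS -r n" opts out, for jobs that must not restart from scratch.
#PBS -r n
#PBS -l walltime=01:00:00     # invented resource request
result="job body runs here"   # the real compute work would go here
echo "$result"
```

Under plain sh the #PBS lines are ordinary comments; the batch system reads them only when the script is submitted with qsub.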


2004-10-04 Monolith
Monolith was stopped for system maintenance October 4 from 09:00 until 19:30. To address some stability problems that have occurred lately, the following was done: the driver software for the SCI network was downgraded and its configuration slightly tuned, and NFS over UDP was replaced with NFS over TCP throughout the system. We apologize for the inconvenience this stop may have caused.
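
The switch from NFS over UDP to NFS over TCP is typically a mount option. A hypothetical /etc/fstab entry (server name and export path invented for illustration; this is not NSC's actual configuration) might look like:

```
# "tcp" selects NFS over TCP instead of the older UDP transport
fileserver:/export/home  /home  nfs  tcp,hard,intr  0  0
```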

2004-09-08 Monolith
Monolith was stopped for system maintenance September 8th, from 09:00 until 13:00. A bad disk serving the file system /home was replaced. The stop did not affect users of login-2.nsc.liu.se. We are sorry for the inconvenience.

2004-08-31 All systems
Due to a failure in the cooling system in NSC's computer room, all systems were shut down during the night. By 04:20 a.m. on September 1, Monolith was back in service and had restarted most of the running jobs. In the morning, SGI3K and Bluesmoke were also back in service. We are sorry for the inconvenience.

2004-08-23 Monolith
Monolith was stopped for system maintenance between 11:00 a.m. and 8:00 p.m. The stop also included the login nodes. In our earlier security upgrade, we lost some of our SCALI network bandwidth. The main goal of the service stop was to regain it, with a new Linux kernel configuration.

2004-08-21 All systems
Due to a failure in the cooling system in NSC's computer room, all systems were shut down on Saturday at 02:30 a.m. The systems were back in service by 12:00.

2004-08-02 Otto
The computer Otto is now available after the security breach.

2004-08-02 Dayhoff
Now Dayhoff is back in operation.

2004-07-09 Ingvar and Lustig
The computers Ingvar and Lustig are now retired.

2004-07-09 Bluesmoke
Now Bluesmoke is back in operation.

2004-07-09 Monolith
Now Monolith is back in operation.

On July 7 we sent (paper) mails to all of our regular SMHI and SNAC users with new passwords. The mailing list was the same as last time, i.e. if you got a paper mail with a password last time, this week we have sent you a new one.

If you have got an account through Berkant Savas, and still need it, please send an e-mail to support@nsc.liu.se.

If you did not receive your password information at the latest on July 9 with your regular mail, please send us an e-mail with an explicit request to open your account when you need it next time.

Please stay alert for signs of intrusions on your computer systems. The attackers are using trojaned ssh clients at a number of academic sites to steal ssh passwords, which have later been used to gain ssh access to other systems.

We are very sorry about the long service stop.

As always, please mail your questions and comments to support@nsc.liu.se.

2004-06-30 SGI 3800
The SGI3k machine is back in service. However, for security reasons all users have been issued new passwords by paper mail again. These letters were sent 2004-06-29. For Monolith the timeplan is not yet clear. The system is still down for analysis and verification following the last intrusion, and will also go through a hardening process. It will likely remain unavailable for the rest of this week, at least. Our users will receive further information as we have it. Please accept our apologies for the inconvenience.

2004-06-22 System outage
Due to a thunderstorm, the cooling system in NSC's computer room failed. When temperatures started to rise, an emergency shutdown sequence was initiated and several computing resources were taken down, including Monolith and Sgi3k. The systems are now back on-line. We apologize for the inconvenience.

2004-06-16 Security breach - accounts disabled
During the last few days a number of intrusions have been detected on NSC systems. User accounts have been accessed using stolen passwords. In a couple of cases the intruder has managed to gain elevated privileges. Apparently the passwords were initially stolen from users who used ssh to log in to NSC systems from compromised computers at Chalmers. Due to the widespread nature of this security breach, all accounts on all NSC systems will be disabled. New passwords will be sent out by (paper) mail. More information on that process will be available as we work out the details.

If you have used ssh (or scp) to access other systems from NSC systems the last few days, you are strongly advised to change your password on those other systems. Also, please inform the administrators of those systems about this issue. (Refer them to support@nsc.liu.se for further technical information.) We understand that these measures will be of great inconvenience to our users. However we have to treat security issues in the strictest possible fashion, and we hope for your understanding.

2004-05-13 Monolith
Some users had problems logging in on Monolith this morning. This was caused by a problem with one of the disks; it was fixed before 08:45.

2004-04-22 SGI 3800
During last weekend one redundant part in the disk subsystem failed, and we needed to take the system down on Thursday to replace the faulty part. The stop occurred between 08:30 and 08:50. All jobs that could be checkpointed were restarted.

NSC regrets the inconvenience.

2004-04-18 All systems
Because of maintenance of the electricity network, all systems at NSC were unavailable during most of Sunday April 18. This meant all NSC systems located on campus were unavailable, including Monolith, Ingvar, Otto, Dayhoff, SGI 3800, Bluesmoke, and smaller systems, including NSC web and e-mail services. Monolith, SGI3K, Ingvar and Otto were shut down early Sunday morning, around 07:00, and were back on-line during Sunday afternoon. SGI3K jobs were checkpointed and resumed after the service stop. Dayhoff was shut down late Friday afternoon and was returned to service at noon on Monday. Bluesmoke was shut down for additional maintenance at 15:00 on Thursday afternoon (16/4), and was back in service late Tuesday. We apologize for the inconvenience.

2004-03-23--25 Monolith
The Monolith was unavailable due to software maintenance March 23, 11:00 a.m. - March 25, 01:00 a.m. The Monolith is now back in full service. We have tried a major upgrade of the SCALI software and have run many tests trying to make sure that we do not introduce new problems for our ScaMPI users. We are sorry to report that we had to go back to our old version of the software, since the new software created an unwanted instability in the SCALI network.

2004-02-23--29 Monolith
We have moved Monolith to a new computer room. The new environment for Monolith, with regard to cooling and electrical power, is much better, hopefully resulting in less future down-time.

Please tell us if you notice that Monolith behaves in any strange way! We have run a lot of tests on the system, but there might be unnoticed problems. We beg for your patience with possible remaining problems. In the rare case that you use IP numbers when connecting to the system, you need to know that we have changed them. Currently they are:

	monolith.nsc.liu.se		130.236.100.65
	login-2.monolith.nsc.liu.se	130.236.100.66
Also the IP addresses of www.nsc.liu.se and status.nsc.liu.se are changed. In case you have trouble getting your web browser to read these web pages, you may need to reset the DNS cache of your browser, e.g. by restarting your browser.

2004-02-11--12 Otto
We have moved Otto to a new computer room.

2004-02-11 Monolith, Ingvar
Monolith and Ingvar were unavailable Wednesday 11 February between 8 and 11 a.m. due to maintenance work on the cooling system in our computer room.

We shut down all compute nodes of Monolith but kept the login nodes up and running, so editing and file copying worked in the normal way. Queued jobs stayed queued as usual during the stop. The SGI was available.

2004-01-29 SGI
The SGI 3800 was unavailable January 29 because of software maintenance. The operating system was upgraded as well as the compilers. Now email from SGI works well (but email to SGI is not permitted).

Recompilation is recommended!

Running jobs were terminated since checkpointing was not possible across operating-system upgrade. Long and verylong jobs were not accepted in the days prior to the maintenance.


2003-11-25 -- 12-01 SGI
The SGI 3800 at NSC is back in full service after the move to new facilities. Unfortunately the batch system (LSF) had to be restarted with a backup database from Monday afternoon, November 24, which means that some jobs were lost or exited prematurely. Please check the status of your jobs and resubmit when appropriate. University users and SMHI development: if you have trouble logging in and you are not using DNS, please be aware that the system now has a new IP address: 130.236.100.71

Regrettably email from SGI did not work after the move. We fixed the problem during January.

2003-11-22 Monolith
NSC suffered from a complete power loss at 8:30. All systems were back up in operation at 12:00. Previously running jobs were lost. We are very sorry for the inconvenience.

2003-11-18 Monolith
November 18th, at 11:00 a.m. NSC stopped Monolith for system maintenance. Monolith was fully operational again 6:00 p.m. the same day. This was done:

  1. Scali network maintenance.
  2. Last week we installed a new version of the Scali network programs. Now we went back to the old versions, due to stability problems during the last week.

2003-11-12 Monolith
A little past noon today, an electrician working in our building accidentally dropped a thin cable where it must not be. The Monolith servers and a third of the compute nodes went down, and with the nodes a lot of the jobs were lost. Monolith is now back in service, and jobs that were marked as restartable (most jobs, as this is the default behaviour) have been restarted automatically. There might be problems that we do not know about, such as jobs losing some of their compute nodes but not all, so please look for error messages in your job results. We at NSC are very sorry about the inconvenience.

2003-11-11--12 Monolith
Monolith was stopped for system maintenance from November 11, 11:00 a.m. until November 12, 10.15 a.m. We have now upgraded ScaMPI and PBS to the newest versions.

2003-11-10 Monolith & Ingvar
Due to a power failure at 10.45 a.m. Monday, November 10, Monolith and Ingvar stopped. The systems were restarted again at 12.40. We apologize for the inconvenience.

2003-10-14--15 Monolith
Monolith was stopped for service maintenance from October 14, 11:00 until October 15, 14:00. We apologize for the inconvenience. The following has been done:

  1. Upgrade of the Linux kernels on our file servers, to get rid of an NFS problem that a few users have had big problems with.
  2. A new version of ScaMPI has been tested. It was less stable than the old one, so we did not upgrade.
  3. We have made the /home file system bigger, because of several accidents where compute jobs have overfilled it. Still, please run jobs in your /disk/global* directory and not in your /home directory, and please do not use your /home directory for large data that can easily be reproduced with a new job run on Monolith. The /home file system is mainly meant for small configuration files that are difficult to reproduce.
  4. Minor hardware service of the Scali network.
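
The scratch-versus-/home advice in item 3 can be sketched as a job script pattern. This is a hedged illustration, not an NSC-provided script; the scratch base falls back to /tmp so the sketch runs anywhere, whereas on Monolith it would be a /disk/global directory.

```shell
#!/bin/sh
# Sketch of the recommended pattern: do heavy I/O in the scratch area
# and keep /home for small, hard-to-reproduce files.
# SCRATCH_BASE is a stand-in; on Monolith it would be /disk/global.
SCRATCH_BASE=${SCRATCH_BASE:-/tmp}
WORKDIR="$SCRATCH_BASE/${USER:-nobody}/job_$$"
mkdir -p "$WORKDIR" && cd "$WORKDIR" || exit 1
echo "working in $WORKDIR"
# ... run the compute job here, writing large files under $WORKDIR ...
# then copy only small results back to /home, e.g.:
# cp summary.txt "$HOME/"
```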

2003-10-10 SGI
In order to improve overall reliability, maintenance work on the cooling equipment in the computer room was performed from Friday night October 10th at 18:00 until Saturday morning October 11th at 02:00. The SGI3K was unavailable during that time. Running jobs were checkpointed and restarted after the interrupt. NSC apologizes for the inconvenience.

2003-09-24--26 SGI
The SGI 3800 system was down Wednesday September 24 from 10:00 until 18:00 for an operating system upgrade. Regrettably the checkpointed jobs could not be restarted.

The update was not successful. We are now back to the version we had Wednesday morning 2003-09-24. Please check your batch-jobs. They might have been affected. NSC apologizes for the inconvenience.

2003-09-23 Monolith and Ingvar
Due to the power failure that struck large parts of Sweden around lunchtime, several of our systems went down, including Ingvar and Monolith. They are now back in operation.

2003-09-23 Lustig
Due to the power failure that struck large parts of Sweden around lunchtime, several of our systems went down, including Lustig. Lustig is also back now.

2003-09-09 Monolith
Tuesday morning, September 9th, at 11:00 a.m. NSC stopped Monolith for system maintenance. Monolith was available again at 11:00 p.m.

  1. We have changed to a Linux kernel that makes at least one of our important categories of parallel applications run much faster.
  2. The default Intel Fortran compiler (ifc) is now version 7.1. Earlier it was version 6.0.
  3. Our RAID disk systems have been updated with a new firmware to make them handle disk problems better. Some other minor changes have been done to get better read/write throughput.
  4. The Scali network programs are updated to fix some bugs and the Scali network itself has been tested to see if we can get it to work even better than today.

2003-08-25 SGI
We have had problems with some file systems this morning. The system was restarted and is back from 09:30. We are sorry for the inconvenience.

2003-08-19 Monolith
On Tuesday at 11:00-15:00 we reserved Monolith for system maintenance. Jobs that could not be scheduled to finish before Tuesday were held in queue and started when Monolith became available again.
During the system maintenance period we:

  1. Installed a new patched driver for the SCALI network (a severe race condition was discovered last week).
  2. Installed a new ScaMPI library with a modified backoff algorithm (to make nodes behave more friendly against each other).
This will hopefully resolve some of the problems we have seen with the SCALI network. There is no need to recompile or relink any applications.

2003-08-07 SGI
Due to a failed I/O subsystem the SGI crashed at 16:00. Back after a few hours. We are sorry for the inconvenience.

2003-08-06 SGI
Due to a failed I/O subsystem the SGI crashed at 16:30. Back after a few hours. We are sorry for the inconvenience.

2003-07-26--29 Monolith
Monolith was not available from Saturday morning due to a disk server crash. We had critical problems with two of the storage servers. It is back in operation from Tuesday 16:00.

2003-07-29 SGI
Due to swap disk problems the system behaved very badly this morning. Unfortunately all running jobs were lost. System back at 08:50. We are sorry for the inconvenience.

2003-07-25 SGI
Service stop on Friday July 25 at 13:00 to replace the faulty power supply. Back in operation at 17:00.

2003-07-23 SGI
The system stopped at 05:51 on Wednesday due to a faulty CPU power supply, which had to be replaced. The machine was up and running from Wednesday 11:00.

2003-07-15--16 Monolith
Monolith was not available due to system maintenance from Tuesday 10:00 until Wednesday 17:00.

2003-07-01 SGI
The system stopped responding around 12:10. We managed to get a dump out of the system and will send it over to SGI so they can analyse what went wrong. All jobs that could be restarted were restarted. System up again around 13:30.

2003-07-01 Monolith
Monolith was not available during the morning. It is now back in operation. We had a problem with one of the processors.

We are also having a problem with users who switched projects at midnight. We are changing the project information manually.

2003-06-26 SGI
The SGI 3800 was unavailable due to replacement of a CPU on Thursday June 26, between 11 and 13. All running jobs were checkpointed and later resumed.

2003-06-17--18 Monolith
On Tuesday morning at 10:00 a.m. we stopped Monolith for system maintenance and some system changes. This stop included also the login nodes. The system was available again from Wednesday 10:00. We did not disturb the run queue, so jobs that could not be started before the system stop started automatically some time after we had started Monolith again.

2003-06-09--10 Monolith
The thunderstorm Monday morning has regrettably stopped some of our computers. Monolith back in service Tuesday 13:00.

2003-06-09--10 Lustig
The thunderstorm Monday morning has regrettably stopped some of our computers. Lustig back in service Tuesday 12:00.

2003-05-27 SGI
The SGI 3800 was scheduled to be unavailable due to replacement of a power supply on Tuesday, May 27, between 12 and 14. Regrettably it crashed at 11:00 and all running jobs were lost. Back at 12:30 with the new power supply installed.

2003-05-18 Monolith
We had a problem with the queuing system Sunday morning.

2003-05-13--15 Monolith
From Tuesday morning at 10:00 a.m. we stopped Monolith for system maintenance and some system changes. This stop also included the login nodes. During the stop, the login node responded as if the password was not accepted.

As planned Monolith is available again from 10:00 a.m. on Thursday morning.

We did not disturb the run queue, so jobs that could not be started before the system stop started automatically after Monolith came back.

The main goal of the system stop was to improve system storage bandwidth and response times, and thus also interactive response times on the login nodes. System storage bandwidth has been increased as planned, and the SCALI network runs better than ever before. Some of the /disk/global users have been moved to the new file system /disk/global4, to increase disk performance for all users. The /home file system has not been split (yet), but has been moved to a separate file server.

2003-05-08--09 Grendel
Grendel was unavailable due to extensive maintenance work from 15:00 Thursday until 19:00 Friday.

2003-04-26--28 Monolith
We once again experienced serious disk problems on Monolith from Saturday 04:00. The computer is back in operation from Monday 1:30 pm.

The /disk/global/ file system has broken down, again. This time the file system was damaged beyond repair, so all data on it has been lost. We can only, once more, offer our sincere apologies. Since /disk/global is only intended as a temporary storage area, there are no back-ups made of it. Ironically, we had a number of improvements planned to avoid these kinds of problems, but this crash came before we had time to implement the planned changes. Some of the less time consuming changes have been made now; the new /disk/global file system is smaller (1 TB instead of 2 TB), and the underlying file system type has been changed to reiserfs. This should improve stability and performance, respectively. Then, at a later date, another file server will be added to Monolith, and approximately half the users will be migrated there, to share the load. We will get back to you on this. On a medium to long term scale, we are investigating alternative storage solutions.

The problem has been due to the heavy load on the /disk/global/ system. We have now received a software update that should be able to handle heavy load as well.

2003-04-23 SGI
Due to city power failure last night the SGI 3800 had to be restarted. All jobs had to be restarted from the beginning.

2003-04-22 SGI
Due to a failed disk on the SGI 3800 we needed to restart the system at noon in order to replace the failed disk. Checkpointable jobs were resumed.

2003-04-12 Monolith
We once again experienced serious disk problems on Monolith from Saturday 19:15 until around noon Sunday. Monolith was very slow Monday morning because of remaining disk problems. We are studying various options to achieve an acceptable solution.

2003-04-03 Monolith
We once again experienced disk problems on Monolith. During the restore process the computer was regrettably very slow!

2003-03-25--27 Monolith
From Tuesday morning we experienced disk problems on one of Monolith's storage nodes. The problem was fixed Thursday noon.

2003-03-20 Mail
The NSC mail server was down Thursday 19:00 - 21:00 for updating.

2003-03-17--18 Monolith and Ingvar
Monolith and Ingvar were stopped due to loss of cooling Monday evening. Back in operation at 09:00.

2003-03-09 Mail
The NSC mail server was down Sunday until Monday morning 08:00.

2003-03-07 SGI
The SGI 3800 system at NSC crashed Friday night at about 21:00. The system was back online at about 22:00. All jobs had to be restarted from the beginning. The reason for the crash is under investigation in cooperation with the vendor SGI.

2003-03-07 Monolith
Accounting calculations have been updated. One node hour now equals two CPU hours.
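The conversion is a plain multiplication; as an illustrative sketch (the function name is ours, and the two-CPUs-per-node factor follows from the stated one-node-hour-equals-two-CPU-hours rule):

```python
def cpu_hours(node_hours, cpus_per_node=2):
    """Convert accounted node hours to CPU hours.

    Under the updated Monolith accounting, one node hour
    equals two CPU hours (two CPUs per node).
    """
    return node_hours * cpus_per_node

# Example: a 10-hour job on 8 nodes uses 80 node hours,
# which is accounted as 160 CPU hours.
print(cpu_hours(8 * 10))  # 160
```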

2003-02-23 Mail
The NSC mail server was down Sunday until Monday morning 08:00.

2003-02-20 Monolith
The entire Monolith was down for maintenance Thursday 13:00-21:00.

2003-02-13 SGI
The SGI 3800 system at NSC was down for maintenance Thursday February 13, 08:00-09:00. All checkpointable running jobs were checkpointed before maintenance started and later resumed when the system was back online. There was a problem with one job; its user has been informed.

2003-02-10 Local clusters
Today, 10:45 - 12:15, we had a cooling problem in our computer room; the local clusters were therefore not in operation.

2003-01-31 T3E
The T3E service was discontinued January 31st!

2003-01-21 SGI
The SGI 3800 system at NSC crashed this morning at about 08:00. The system was back at 11:00. All jobs had to be restarted from the beginning.

The reason for the crash is under investigation in cooperation with the vendor SGI.

2003-01-16--20 T3E
The T3E stopped at 13:13, came back a few times, but stopped again. We performed full maintenance on it on Monday, and it was available from Monday 15:15.

2003-01-16--27 Monolith
We needed to perform a number of maintenance and troubleshooting tasks on Monolith, including upgrading BIOS on all nodes and replacing faulty network cabling. Because of this, Monolith was unavailable to users for a little more than one week.

2003-01-13 T3E
Due to a Gigaring problem we had a maintenance stop on the T3E System Monday around 9:00.


2002-12-28 SGI
The SGI 3800 crashed for an unknown reason at 10:00 and was back at 11:00. All running jobs were restarted from the beginning.

2002-12-18 Monolith
On Wednesday we installed a large storage area on Monolith. The system was therefore unavailable for a few hours and regrettably became unstable.

2002-12-05 T3E
Due to a torus problem (broken PE) we had a maintenance stop on the T3E system Thursday 12:00 - 16:00. All running jobs were checkpointed and later resumed.

2002-12-04 SGI
The SGI 3800 crashed for an unknown reason at 15:30 and was back at 17:30. All running jobs were restarted from the beginning.

2002-11-21 T3E
The T3E was restarted around noon, back after one hour.

2002-11-14 SGI
Thursday November 14, between 07:00 and 10:00, we performed a system reboot of the SGI 3800 to clear out some issues related to over-allocated memory.

All running jobs were checkpointed and later resumed. Batch queues "long", "long_med", "long_par" and "verylong" were not started until after the maintenance.

2002-11-12 Ingvar
The software on Ingvar was upgraded; the machine was therefore down for a few hours on Tuesday.

2002-10-22 SGI
The SGI3K system was down October 22, 12:00-14:00, to replace a bad processor. All running jobs were checkpointed and restarted from checkpoint when the system was back up.

2002-10-14 SGI
The SGI3K system was unavailable Monday October 14, between 11:00 and 16:00. We installed new disks (840 GByte) to allow for a larger scratch file system for academic users. Please direct all your scratch-files to this new filesystem /nsc_scr. As on the old /scratch file system, files not accessed for two weeks will be deleted. The old /scratch file system will be available at least until 2002-11-01.

2002-09-27 SGI
We needed to bring down the SGI 3800 system in order to clean up some software issues related to the recent operating system upgrade.

The system work took place Friday 27 September 08:00-10:00 approximately, during which time the system was unavailable to users. Running jobs were checkpointed and resumed.

2002-09-24 SGI
The SGI3K system was unavailable Tuesday September 24 due to system maintenance.

The operating system was upgraded and new versions of compilers and math libraries were installed. Swap and checkpoint areas were also increased. We expected this to take the whole day which meant that you were not able to access the system from around 08:00 until sometime in the evening. All jobs that were running were checkpointed and later restarted. Since both the operating system and the checkpoint areas were modified we could not guarantee that all jobs would be restarted from checkpoint. We regrettably had some jobs which could not be restarted.

2002-09-03 T3E
Tuesday 3 Sept 12:00 - 15:00 we had preventive maintenance on the Cray T3E. The system was unavailable under this period. All running jobs were checkpointed and restarted when the system was up running again.

2002-07-23 SGI
The SGI 3800 system was shut down Tuesday at 11:00 to replace a disk, upgrade the memory, and investigate yesterday's crash. The work was finished at 17:15 and everything seems OK.

2002-07-22 SGI
The SGI 3800 system crashed again Monday at 09:15 due to a problem with the checkpoint disk. The computer was available from 13:20 but did not work well; in some cases it produced empty output files.
The broken disk in /checkpoint was disconnected until a new one arrived. Unfortunately, some jobs did not checkpoint correctly because of this and could not continue running; please resubmit these jobs to the batch system. Pending jobs have also been lost.

2002-07-18--19 SGI
The SGI 3800 system crashed again Thursday at 16:35. Service engineers from SGI have replaced the system disk controller and have been running tests. The computer is available from Friday 15:30.

2002-07-16--17 SGI
The SGI 3800 system crashed again Tuesday at 15:30. We replaced a disk controller, swapped two CPU cards, and replaced one CPU. After running some tests, the computer is available from Wednesday 10:40.

2002-07-16--17 T3E
The T3E status via the Web and xlotto did not work from Tuesday 12:15 until Wednesday 15:00. The status via ftp worked fine.

2002-07-12--13 SGI
The SGI 3800 system crashed Friday at 23:00 and was back Saturday 10:25.

2002-07-11--12 SGI
The SGI 3800 system crashed Thursday, twice in a row between 11.00 and 12.00. A memory board was found bad and was replaced Friday July 12, between 11.00 and 13.00. All running jobs were checkpointed and resumed, if possible.

2002-07-09 SGI
The SGI3K was down between 11.00 and 14.00 Tuesday July 9, for hardware maintenance. All running jobs were checkpointed and resumed when the system was back up.

2002-07-05--07 NSC
Power outage at NSC on Friday, July 5, due to electrical work. The outage started on Friday afternoon and lasted until Sunday 15:00. This affected the PC cluster Ingvar, the NSC website, and the email and FTP services. The Cray T3E and SGI 3800, located at SAAB, remained available.

2002-07-03 Lustig
The computer "Lustig" was temporarily unavailable because it was moved to a new physical location in order to make room for the new Linux cluster. Lustig has a new IP number 130.236.190.14 and is now operating without the graphic pipes.

2002-07-01--02 SGI
NSC has expanded the SGI 3800 system to 128 processors and 128 GB of memory! This happened July 1-2 2002, during which time the system was unavailable. The work started Monday morning. All running jobs were checkpointed and were resumed when the system was back up Tuesday afternoon.

2002-06-24 T3E
The T3E system was rebooted 11:30 - 12:10 due to minor os_upgrade. All running jobs were checkpointed and restarted.

2002-05-27 SGI
We had a problem with the SGI3K system at 21:00 because of excessive memory utilization. It was back in operation Tuesday morning.

2002-05-19 T3E
The T3E status via the Web did not work Sunday. The status via ftp worked fine.

2002-05-10--13 T3E
The T3E status via the Web stopped working after the reboot, back Monday morning. The status via ftp worked fine.

2002-05-10 T3E
The T3E was rebooted around 07:30.

2002-05-01 T3E
The T3E status via the Web stopped working at 03:43 on May 1 and was back during the afternoon May 2. The status via ftp is working fine.

2002-04-18 SGI
We still had stability problems with the SGI3K system. Hardware maintenance took place at 14:00.

2002-04-16--17 SGI
We had several crashes on the SGI3K system on Tuesday and have indications of hardware problems. We tried to run the system at reduced capacity overnight, but the machine stopped Tuesday at 18:00. It was started again Wednesday at 07:55.

System maintenance took place Wednesday 11:00 - 14:30.

2002-04-16 SGI
The SGI3K was restarted this morning. We had partial failures in some services and preferred a controlled reboot, which made it possible to save most jobs through the checkpoint facility.

The system was down between approx. 10:20-10:30.

Most jobs were checkpointed and resumed after the reboot. A handful of jobs were not checkpointable and we have reason to believe that they were all MPI jobs that were run without the "-cpr" option to "-mpirun". As described in the User Guide, you must submit MPI jobs with

mpirun -cpr -np ...
to be able to checkpoint and restart the job.

2002-03-18 T3E
Temporary problem from 13:35, back at 14:25.

2002-03-18 T3E
Due to minor OS upgrade, the Cray T3E was rebooted Monday March 18 around 07:00. All running jobs were checkpointed and restarted.

2002-02-24 T3E
The T3E system was rebooted at 14:00 because of a network problem.

2002-02-22 SGI
The DMF tape robot access for users is back from 14:30 on Friday.

2002-02-21 SGI
System maintenance was made on the SGI at NSC Thursday February 21, in the afternoon between 14.00-17.00.

2002-02-20 Email
The email to and from NSC was down from the evening of Wednesday, February 20, until Thursday morning 09:15.

2002-02-20 SGI
We are still having problems with the system. It was back at about 09:00. Since the system crashed without checkpointing any jobs, all jobs were restarted from the beginning.

2002-02-19 SGI
Due to recent crashes we performed system maintenance Tuesday February 19 between 12.00-19.00. All running jobs were checkpointed and later resumed.

2002-02-15 SGI
The system crashed at 03:30 and was restarted at 07:30. All running jobs were lost and requeued. NSC apologizes for the inconvenience.

2002-02-12 SGI
Due to maintenance of the power distribution system and system software test the SGI 3800 was down Tuesday February 12, 10:00 - 15:00. Checkpointable jobs were continued after restart. NSC apologizes for the inconvenience.

2002-02-05 T3E
The T3E system was rebooted Tuesday at about 18:00 for the replacement of a pressure transducer. This transducer caused the stop of the T3E on Saturday afternoon, February 2.

2002-01-09 SGI
It was not possible to log in to the SGI 3800 from Wednesday evening until Thursday 09:15.

2002-01-08 T3E
We made a short maintenance break from 16:39 until 17:35 on the T3E in order to replace faulty processor 96.

2002-01-03 T3E
We had a short maintenance break from about 16:30 until 17:00 on the T3E system to replace the transducer that made the system stop on 2001-12-28 and 2002-01-01.

2002-01-01 T3E
Hardware failure at 01:49. Back at 10:09.


2001-12-28 T3E
Hardware failure at 13:07. The fix required some work; the system was back at 15:30. Some jobs were unfortunately lost. We regret the inconvenience.

2001-12-14 T3E
At 15:15 we lost one PE. We reconfigured the system and rebooted at 19:15. All jobs were checkpointed.
PE 96 remained down while we waited for a replacement part; it was fixed on 8 January.

2001-12-12 T3E
Due to work on the cooling system we stopped the T3E system 11:00 - 11:45. The jobs were checkpointed and restarted in the normal manner.

2001-12-04 SGI
At 15:30 the system crashed and all running batch jobs were terminated. We are investigating the reason, but all terminated jobs are unfortunately lost and need to be restarted. Check the status of your jobs and resubmit them accordingly.

2001-12-03 SGI
1) The system crashed early this morning and was rebooted. All running jobs were restarted from the beginning.

2) The project accounting was not reset December 1. This means that users who were in the "bonus" group on November 30 still are. We are working on fixing the problem, but until it is fixed we will manually move bonus jobs to the correct queues. The problem has now been fixed; bonus is working.

3) The interactive queue is closed. We have an intermittent problem where interactive jobs are not started. We are working on fixing the problem, but the queue will remain closed until we have resolved the issue. The problem has now been fixed; the interactive queues are working.

4) Mail does not work on the SGI at the moment.

2001-11-29 SGI
The software upgrade on the SGI 3800 last Wednesday was not successful. We have therefore scheduled a new maintenance pass this Thursday (November 29) between 08:00 and 12:00 for the upgrade of the operating system and message passing libraries on the SGI 3800 at NSC. This was successful and the system is in full operation from 15:00.

Running jobs could not be checkpointed since the operating system was upgraded. Consequently, batch queues were not permitted to start new jobs prior to the upgrade.

We are sorry about the inconvenience and the short notice but do not see much of an alternative. If you have any questions about the upgrade, please do not hesitate to send us an email at support@nsc.liu.se.

2001-11-21 SGI
The operating system and message passing libraries on the SGI 3800 at NSC were not upgraded on Wednesday November 21. The system was unavailable between 10:00 and 15:00. Running jobs were checkpointed and restarted.

2001-11-15 T3E
The T3E system was unavailable 07:00 - 08:00 due to maintenance.

2001-11-12 SGI
The computer was down from 09:00 until 09:30. It was not working well before the shutdown.

2001-10-30 T3E
The computer communication to SMHI and the academic community went down at 03:30 due to an interface problem. Back at 09:06.

2001-10-28 SGI
The computer was down because of a full scratch disk from 07:50 until 11:30 on Sunday.

2001-10-16 T3E
The computer went down at 12:38. Back 12:58.

2001-10-16 SGI
The computer was down for power distribution maintenance mid-day on Tuesday. Checkpointable jobs were continued after the interrupt.

2001-10-11 T3E
The computer was down for maintenance 17:30 to 19:00.

2001-10-05 SGI
Due to system performance testing, the SGI3K at NSC was offline Friday October 5, 08:00-12:00. Checkpointable jobs were continued after the interrupt.

2001-10-04 SGI
The system crashed at 23:30.

2001-09-26 SGI
Due to system performance testing, the SGI3K at NSC was offline Wednesday September 26, 08:00-13:30. Checkpointable jobs were continued after the interrupt.

2001-09-17 SGI
Some problems at about 11:00. Back at 11:52.

2001-09-04 SGI
A new Unix kernel was installed last Thursday. So far it looks good and we have not seen any of the problems we had the previous week.

In order to investigate an issue related to the tape robot we needed to bring the system down for a couple of hours. This happened Tuesday September 4 between 10 and 12. All jobs were checkpointed and later restarted.

2001-08-28--29 T3E
We got problems with the internal communication on the T3E on Tuesday evening, and the problem reappeared Wednesday 10:27. The problem was fixed Wednesday 23:25.

2001-08-29 SGI
Unfortunately, system work continued until Thursday morning on the SGI at NSC. Until then, only the test queues (test and test_par) were enabled.

In addition, the following jobs were started (manually):

  • All single processor jobs.
  • All Gaussian-98 (single and parallel) jobs.

Parallel (MPI, PVM, Shmem and OpenMP) jobs were not started until after a new Unix kernel was installed Thursday morning, 2001-08-30 at 08:20. The checkpointing failed, so you have to resubmit any lost jobs.

The system is working from 08:50.

2001-08-27 SGI
We got problems with the system at 10:20 and the system was partially available from 14:15 and completely from 17:30.

2001-08-16 Communication
Heavy local thunderstorm stopped the local computers at NSC (including Web and email) around 11 a.m. for about an hour.

2001-08-16 SGI
System restart probably due to thunderstorm. All jobs were restarted at about noon, some from checkpoint.

2001-08-16--09-06 SGI
During the next three weeks we will perform system tests on the SGI 3800 at NSC to evaluate the performance of the now fully configured system. During most of these tests interactive access will still be possible but we will inactivate all batch queues and checkpoint all running jobs. You can still submit batch jobs but they will not be started until after the system test. Some tests might require closing down all access to the system. Before each test, users will be notified through the sgi3k email list as well as a login message.

The first test was Thursday August 16, between approximately 08:30 and 11:30.

The second test was Thursday August 23, between approximately 08:30 and 11:30.

No tests are planned for the week 27 - 31 August.

2001-08-10 SGI
Short stop 13:30 - 14:00 due to a power problem.

2001-08-06 SGI
The SGI 3800 processors were upgraded from 400 MHz to 500 MHz during week 32 (August 6 - 10). The system was available again from Wednesday afternoon, August 8.

2001-07-25 T3E
The computer was down 04:07 to 08:30 due to a cooling problem.

2001-07-04 SGI
Hardware maintenance took place on Wednesday afternoon, July 4. Checkpointable jobs were continued after the restart.

2001-07-02 SGI
Hardware maintenance was done this afternoon. Checkpointing unfortunately failed and all jobs were restarted from the beginning. NSC apologizes.

2001-05-11 T3E
The T3E was rebooted Friday morning at about 07:00 for a minor system maintenance. Back 07:30.

2001-05-10 T3E
The T3E was rebooted Thursday morning at about 07:30 for a minor system maintenance. Back 08:25.

2001-04-19 T3E
The T3E was rebooted today at about 17:30 for a minor OS upgrade.

2001-04-19 SGI
Thursday afternoon, April 19, the operating system was upgraded on the SGI 3800 system at NSC. All running jobs were checkpointed but, due to the nature of the upgrade, unfortunately could not be restarted.

The computer was back in operation early Friday morning, but cron was not back in operation until Friday noon.

2001-04-12 T3E
The T3E was not rebooted Thursday morning as previously announced.

2001-03-15 T3E
The T3E was rebooted today at about 16:30 for hardware maintenance.

2001-03-07 SGI
The SGI 3800 is now available.

2001-03-05 SGI
The SGI 2400 is unavailable from March 5.

2001-02-19 T3E
The T3E was rebooted on Monday at about 07:00 due to a minor OS upgrade.

2001-02-16 WWW
Search with Swedish characters now works again.

2001-01-02 T3E
The T3E was partly unavailable from 15:46 until 16:35 today.

2001-01-02 T3E
The T3E project database was not completely updated, and erroneous email messages were sent to some users stating that they had already used all their time for January. The database has now been properly updated.


2000-12-29 T3E
The T3E was partly unavailable from 13:20 until 14:40 today. One processor was faulty.

2000-12-19 T3E
The T3E was down for maintenance Tuesday December 19 from 16:45.

2000-12-04 T3E
The T3E was rebooted today December 4 at 16:40 due to a failing processor.

2000-11-30 T3E
The T3E was rebooted Thursday November 30 at 16:50 due to a minor OS-upgrade.

2000-11-23 T3E
The T3E was rebooted today at 16:30, back at 17:30.

2000-11-09 T3E
The T3E was down a short while for maintenance from 16:30 today.

2000-11-06 SGI 2000
Due to preparations for the installation of the SGI3K the SGI2K was down Monday November 6 from 07:00 until 11:00. Some running jobs were lost.

2000-10-31 T3E
The T3E was down from 10:57 today. Back 11:25.

2000-09-26 SGI 2000
SGI 2000 was unavailable from 08:00 until early afternoon of Tuesday 26 September due to software maintenance.

2000-09-08 T3E
The T3E was down from 02:12 until 08:47. Problem with a command processor.

2000-09-01 C90
The C90 service was discontinued September 1st!
Available for SMHI until September 18.

2000-08-24 T3E
The T3E was down for maintenance from 17:30 Thursday afternoon.

2000-08-16 SGI 2000
On Wednesday August 16 we performed software maintenance (no interrupt).

2000-08-15 T3E
The T3E was scheduled for maintenance from 17:30 on Tuesday afternoon. This was however delayed to Wednesday morning. We regret the temporary stops in the T3E operation during Tuesday afternoon and Wednesday, due to problems with one of the 272 processors.

2000-08-07 SGI 2000
On Monday August 7 we performed software maintenance (no interrupt).

2000-08-01 C90
The C90 stopped on Monday evening at about 20:00 due to a CPU problem. Back on Tuesday at 08:25.

2000-07-10 Communication
The network stopped at 06:50 this morning! Back at 08:50.

2000-07-06 Documentation
The Cray documentation in DynaWeb on manul.nsc.liu.se was temporarily down. It now works again!

2000-07-06 SGI 2000
The computer was down for maintenance today until early evening.

2000-07-04 SGI 2000
The batch system LSF was not working from Monday 11:00 until Tuesday 11:00.

2000-06-30 C90
The C90 stopped on Friday afternoon at 17:38. Restarted 10:18 on Saturday.

2000-06-29 Communication
A short power failure at the university caused a stop of all campus computers at 13:45. Lustig was started at 16:00. Banan was started June 30 at 14:30.

2000-06-13 T3E
From 08:36 until 10:20 the T3E was not available from the university network.

2000-06-13 T3E
The T3E was rebooted at 07:00.

2000-06-08 T3E
The T3E was down from 07:00 due to microcode update for the fibre channel disks.

2000-05-11 T3E
The T3E was down from yesterday 22:28 until today 08:38.

2000-05-05 C90
The C90 was rebooted at 07:00 this morning in order to finally fix the memory problem.

2000-05-02 C90
The C90 halted during a few nights due to a memory problem. The problem was fixed this morning but some extra work is in progress.

2000-03-30 T3E
The T3E was down today from 17:30 for a test of the new UNICOS 2.0.5. It will be back on 2.0.4 as soon as possible.

2000-03-22--29 C90
The C90 halted on Wednesday evening (March 22) due to a memory problem and was restarted Thursday morning with, regrettably, only half the usual primary memory. The queues were therefore quite long. The memory board was replaced on Wednesday 29 March. All running jobs were checkpointed and restarted at 14:00. The operating system was, however, not immediately aware that we were back to a total of 256 Mwords of memory.

2000-03-16 C90
The C90 and its disks were reconfigured on Thursday 16 March during the regular service. Any job that was running but not finished at 16:00 was killed and may not be restarted.

2000-03-06 C90
The C90 was restarted Monday at 3 p.m. with, regrettably, only half the usual primary memory. As of Tuesday 2 p.m. it is back to full capacity.

2000-03-05--06 C90
The C90 was regrettably unavailable from Sunday 11:00 until Monday 08:30.

2000-02-12--14 C90
The C90 was regrettably unavailable from Saturday 06:52 until Monday 08:19.

2000-02-04--07 Communication
Due to reconfiguration of the network at Linköping university we have had some problems with the network.

2000-02-04 C90
A disk became faulty early in the morning and was repaired 09:00. During the day we had some problems with the queues, so some jobs were regrettably delayed.


1999-10-22 C90
Cray C90 was down 12:30 - 15:30 for special maintenance (60 Hz MG Set Drive Belt replacement).

1999-10-08 T3E
The I/O system became faulty at 22:45 Thursday night. The system was again available from 10:30 Friday morning.

1999-09-28 Communication
Short interrupts occurred in the communication 1999-09-28 08:00 - 10:00 due to a reconfiguration of the routers between the university and the Cray computers.

1999-07-20 T3E
The T3E was down 13:50 -16:10 on July 20 due to hardware maintenance.

1999-07-14 T3E
The T3E has been upgraded with 40 additional processors. It was unavailable from the morning of Tuesday July 13 until the evening of Wednesday July 14.

1999-07-14 C90
The C90 was down for a short while at 16:00 due to a severe thunderstorm.

1999-06-06 Communication
Due to work on the electrical net at the university the NSC mail, ftp, and web servers were unavailable on Sunday June 6 from 7 am to 5 pm.

1999-05-03 T3E
The operating system on the T3E was upgraded Monday May 3, 07:00 - 09:00.

1999-04-24 T3E
The T3E crashed at 23:45 on Saturday evening, due to a pump pressure fault. Back on Sunday morning 10:49.

1999-04-21 T3E
The T3E crashed at 12:00. The file system /nsc/home/ had to be restored; all files stored there between 02:00 and 12:00 that day were lost.

1999-04-11 T3E
The T3E was down due to I/O problems from Saturday at 23:00 until Sunday at 12:50. The data migration facility was not working until 15:00 on Monday 12 April.

1999-04-09 Communication
Due to a sudden failure of one of the SUNET servers in Stockholm the whole network was down from this morning (no national or international connection, only the local networks were up). Everything was working again from 10:30.

1999-03-16 T3E
The T3E stopped at 16:04, investigation in progress. Up and running again at 16:45.

1999-03-05 Communication
Due to the recent move of the university computing centre the Linköping university mail and some other network facilities were down from 5 pm to 7 pm.

1999-03-02 C90
The C90 was reconfigured with more secondary storage and striped for better performance.

1999-02-24 T3E
The T3E stopped at 04:25:33 due to "Pe 0xe6 Panic EMERGENCY (kernel) ASSERT: nc1vnops.c: 1838 (part >=0)". Up and running again at 06:01.

1999-02-13--14 Communication
Due to the move of the university computing centre the network could be partly down during this weekend. The centre performed the move in such a way that the network continued to work most of the time.

1999-02-04 T3E
Stop on the T3E from 14:08 until 16:00 due to another file problem.

1999-02-01 T3E
A file problem caused disturbances until 11:15. We now know the reason and will be able to avoid it in the future.

1999-01-28 Communication
On Thursday January 28 the telecommunications (including data transfer) to and from SAAB and the Crays were down between 00:00 and 01:00 due to updating.

1999-01-25 T3E
At 01:30 all disks timed out. The operator tried a restart at 02:30 but failed; a restart succeeded at 06:55.

1999-01-23 C90
The C90 behaved badly from 15:00 because the tape robot had not been working during the T3E maintenance, and too many files had been created on the system. The file system /stor/smhi/ became full at 03:40 on Sunday, and at 19:07 /usr/spool/ also became full. The system was rebooted Monday at 07:45.

1999-01-23 T3E
The T3E stopped at 01:55 due to problems with a power supply. The planned maintenance of the external power supply for the T3E was performed, together with a fix of the memory error in processor 3, work on the internal power supply, and a doubling of the /nsc/home/ disk area. The T3E was in operation again at 13:24, earlier than the planned time of 16:00.

1999-01-10 T3E
Processor number 3 of the T3E stopped at about 15:00 Sunday. The problem was solved Monday morning but reappeared at noon. The faulty processor was later renumbered in order not to break the contiguous block of processors with 256 Mbytes of memory.


1998-12-29 NSC
The NSC computers for email, ftp, and WWW were unavailable Tuesday 07:00-10:45 due to work on the electric power in the NSC building.

1998-12-28 C90
The C90 stopped at 23:37 on Sunday 27 December. Back in operation Monday 08:30.

1998-12-14 T3E
The T3E stopped at 09:31 today. Restarted at 10:33. Stopped again at 11:56. In operation again 13:32.

1998-12-13 C90
The C90 stopped at 02:32 on Sunday 13 December, due to power failure. It was restarted 11:24.

1998-11-24 C90
Problems with a processor caused a halt from 22:28 until 23:40.

1998-11-19 T3E
Problems with the file system caused a reboot between 08:48 and 09:08.

1998-11-13 T3E
We had bad luck with the T3E on this Friday the 13th. The error also occurred on Saturday at 21:50, and the T3E was restarted Sunday 09:00. On Monday we changed a parameter related to the disk storage (which was expanded on 1998-11-12); the problem is hopefully resolved.

1998-11-11 C90
We had a hardware error in a power supply for a disk controller from 22:00 yesterday. This caused problems for some jobs. The power supply was replaced at 10:30.

1998-10-29 T3E
The T3E was down for service 16:00 - 20:00.

1998-10-26 Tape station
We had service on the tape station from 16:30 for a few hours.

1998-10-16 C90
We had hardware errors from 08:12 this morning. The repair was completed at 11:00, but we started with the SMHI weather forecast. The jobs were released at 11:15.

1998-10-01 T3E
The regular service pass was somewhat delayed due to the work on the C90. The T3E was back in service a bit later than 20:00.

1998-10-01 C90
During the regular service pass 17:30 - 20:00 the C90 was down due to a memory board exchange.

1998-09-28 T3E
During the regular service pass 07:00 - 09:00 we upgraded to Programming Environment 3.0.2.

1998-09-17 C90
During the regular service pass 17:00 - 20:00 the C90 was down due to disk reconfiguration:

  • A faulty disk drive was replaced.
  • /nsc/tmp was striped over four disks, which increased its bandwidth to approximately 44 Mbytes/s. This will hopefully shorten the execution time of I/O-bound applications such as Gaussian.

1998-09-11 T3E
We regret that the batch system on the T3E was not working from Friday 17:30 until Monday 08:45. It was a scheduling problem.

1998-09-07 T3E
During the regular service pass 07:00 - 08:00 we installed the new version 2.0.3.22 of the unicos/mk operating system.

1998-09-06 Support
During Sunday afternoon our mailbox was flooded and any mail sent to NSC from Sunday 14:00 until Monday 08:00 was regrettably not received.

1998-09-03 T3E
During the regular service pass 16:00 - 20:00 we tested the new version 2.0.3.22 of the unicos/mk operating system.

1998-08-29 T3E
Processor number 3 became faulty at 11:25. This problem caused the GRM to hang. The machine was back with all processors at 15:20.

1998-08-29 C90
An electrical power failure due to a thunderstorm stopped the C90 from 10:00 until 12:10.

1998-08-25 C90
A thunderstorm caused a power failure; the C90 was stopped from 16:15 until 18:35.

1998-08-25 T3E
Processor number 4 became faulty. It was replaced Wednesday morning.

1998-08-24 T3E
The faulty processor number 158 was replaced this Monday morning.

1998-08-16 T3E
Sunday evening the T3E closed down because of an alarm from the cooling system. The problem was fixed 07:00 Monday morning.

1998-08-13 C90
A thunderstorm caused a power failure in Linköping at about 13:00. The C90 stopped for a while, but the new non-interrupt facility on the T3E worked well; it can handle longer interruptions than the somewhat simpler system protecting the C90.

1998-08-06 T3E
During the regular service pass the faulty PE was repaired.

1998-07-31 T3E
A reboot was done at 08:30 due to a memory error in a PE.

1998-07-30 Support
We have today had a problem with the support mail system. The messages were delayed, and no confirmation was sent. No messages were lost. The system now works again.

1998-07-30 T3E
A reboot at about 10:30 was necessary due to a SCSI problem.

1998-07-22 T3E
A reboot at about 09:00 was necessary due to an application causing problems with starting checkpointed NQE jobs.

1998-07-20 T3E
There was a disk crash on the T3E late Friday evening, 17 July. The system was regrettably not available from that time. No user files were lost. The disk was replaced on Monday morning and the system was up and running at 09:00.


Page last modified: 2009-03-27 14:27
For more information contact us at info@nsc.liu.se.