Latest News and Updates
06/17/08
01:46pm
1:45pm One of the AC units in the CTBP server room has failed, and most of the cluster nodes in the room have been shut down to prevent hardware damage.

05/19/08
09:42am
There was a power failure on one of the cluster switches early Sunday, which made some nodes and file servers inaccessible. This was fixed on Sunday at around 10am.

03/06/08
08:33am
All systems are up. Please report any problems to ctbp-help.

03/05/08
05:13pm
5pm After a campus-wide power outage at 11am today, most systems are back up, but connectivity to our clusters is intermittent. We will send an announcement when all systems are fully restored.

02/01/08
10:48pm
The ctbp1 cluster is now open.

01/30/08
05:23pm
The ctbp1 cluster is down for maintenance. We should be able to enable access tomorrow (1/31/08) around noon.

01/22/08
11:05am
The ctbp1 cluster is now open in test mode. If there are no significant issues we will go to production early next week.

01/17/08
10:35am
We are currently upgrading the ctbp1 cluster with new nodes. The cluster capacity has been reduced and the cluster will go down later today. A new announcement will be posted when the upgraded cluster is online.

12/17/07
06:34pm
The /gpfs filesystem on ctbp1 crashed around 5:30pm today. We are bringing it back on-line right now.

10/17/07
02:07pm
ctbp1 experienced a failure today around 11am. The system is back on-line.

10/04/07
02:36pm
2:25pm ctbp1 frontend is back on-line. Sorry for the inconvenience.

10/04/07
01:34pm
12:25pm The ctbp1 frontend is currently down; we are looking into this.

09/18/07
10:41am
*** SCHEDULED MAINTENANCE ***
The cluster GPFS file server will be offline on Thursday evening from 5 to 6pm. During that period, the /gpfs file system will not be accessible from ctbp1.

09/12/07
12:44pm
12:44pm The /gpfs file server is back on-line; the faulty RAM that caused the crash has been replaced.

09/12/07
11:27am
11:27am The /gpfs file server crashed - we are looking into this right now. This will most likely affect jobs on ctbp1.

07/19/07
08:35pm
8:35pm The cluster frontend ctbp1 crashed again due to a hardware failure. It is back up but the failed part will have to be replaced. A scheduled downtime will be announced when we receive the part.

07/12/07
12:05pm
12:05pm The ctbp1 cluster frontend crashed and was rebooted. Sorry for the inconvenience.

05/31/07
10:42am
10:41am The /home directory NFS server had to be rebooted. This should not impact jobs running from /gpfs or /scratch file systems. Apologies for any inconvenience. Please report any problems to ctbp-help.

05/31/07
09:45am
9:45am We are experiencing problems with user home directories on the cluster. We are looking into this.

03/20/07
11:52am
11:53am Along with a large part of the campus, we experienced a power outage that lasted 15 minutes. All CTBP servers, including the clusters, are coming back up and should be in production within the next 15 minutes. If you experience any problems, please send email to ctbp-help.

02/07/07
02:07pm
A new Myrinet-enabled version of NAMD (2.6) has been installed in /soft/linux/pkg/NAMD_2.6-gm. If you encounter any problems, please email ctbp-help.

11/28/06
11:33am
11:31am A glitch in the ctbp1 configuration caused home directory mounts to fail on the nodes. This has been fixed, and jobs in the queue should continue to run. Sorry for the inconvenience.

09/22/06
01:06pm
1:04pm Both clusters are back online.

09/18/06
10:04pm
9:59am Due to a cooling unit failure in the server room, both clusters underwent an emergency shutdown. We expect them to be back on-line sometime tomorrow.

07/31/06
09:28am
9:35am A short power outage in the building forced a reboot of most of the cluster and terminated running jobs. Sorry for the inconvenience.

06/21/06
06:36pm
6:45pm The ctbp1 frontend crashed and has been restarted. Some jobs may have been terminated during this process. Sorry for the inconvenience.

06/16/06
03:24pm
Amber v.9 is now available on both clusters.

03/29/06
03:17pm
2:33pm The ctbp1 NAS (/gpfs) is experiencing problems. This should be fixed shortly.

02/04/06
12:49pm
12:43pm The ctbp1 frontend crashed and was rebooted. Some of the jobs were terminated in the process. Sorry for the inconvenience.

01/23/06
02:45pm
9:50am The clusters are back in production.

01/20/06
03:59pm
3:50pm We are experiencing cooling problems in the server room; most of the cluster nodes will be shut down. The repair is expected sometime on Monday.

01/20/06
10:23am
10:15am All clusters are back in production. Please report any problems to ctbp-help @ ctbp.ucsd.edu.

01/19/06
01:19pm
1:05pm We were hit again by a power outage across west campus. All CTBP servers are back up, but both clusters are down due to an AC failure. We will power the clusters up after the AC unit is fixed.

01/12/06
10:58am
10:55am There was a power outage this morning and all CTBP servers and computing clusters were rebooted. All systems are back up now.

01/03/06
10:22am
9:50am /gpfs was rebooted.

11/28/05
10:22am
9:52am ctbp1 was rebooted. This has probably affected some of the jobs running on the cluster.

11/23/05
01:01pm
12:32pm /gpfs crashed again; the NAS box is being rebooted.

11/20/05
10:30am
10:28am GPFS went off-line this morning. The problem has been corrected, but some jobs may have failed while GPFS was unavailable.

10/18/05
02:35pm
1:15pm The NFS server has been rebooted which fixed the /gpfs problem.

10/18/05
01:57pm
12:45pm We are currently experiencing problems with the /gpfs partition on ctbp1. We are looking into this.

09/10/05
03:47pm
The NFS server hosting the /gpfs partition for the ctbp1 cluster crashed earlier today, and this may have caused some jobs to fail. The server is back on-line.

07/27/05
12:36pm
Due to high server room temperature, additional nodes had to be shut down around 11am today. Jobs running on those nodes were terminated. Sorry for the inconvenience.

07/12/05
05:26pm
This morning's power spike left some of the cluster switches in a non-operational state. They were rebooted, and all nodes should be accessible now.

06/25/05
06:08am
The CTBP NIS and mail servers went off-line last night. This has been resolved, and both are running now. Sorry for the inconvenience.

06/21/05
05:45pm
Two racks (c5 and c7) had to be shut down (and the jobs running on them terminated) due to insufficient cooling in the server room.

06/17/05
10:09am
9:43am The nodes on both clusters are being powered on; we should be back in full production shortly.

06/14/05
04:24pm
4:23pm The AC is down again, and all cluster nodes are off. The time needed for the repair is unknown at this moment; please watch this space for updates.

06/13/05
03:54pm
The AC failed again, the cluster is down.

06/11/05
01:30pm
One of the cooling units in the server room failed today around 1:30pm. The clusters are down until the unit is fixed.

06/09/05
07:39pm
There will be another power shutdown in Urey Hall this Saturday, June 11, 2005. All CTBP computers, including the clusters and the mail server, will be turned off late Friday (June 10, 2005). All servers should be back up and network connectivity restored on Saturday (June 11, 2005) after 1pm.

05/17/05
11:43am
Due to a power shutdown in Urey Hall, all CTBP computers, including the cluster and the mail server, will be turned off late Friday (May 20, 2005). All servers should be back up on Saturday afternoon (May 21, 2005).

05/09/05
04:00pm
A power failure in the server room caused half of the cluster to reboot; jobs running there were terminated. Sorry for the inconvenience.

05/05/05
04:20pm
Thu May 5 16:18:26 PDT 2005
One of the AC units in the server room failed, and the CTBP cluster was shut down. The cluster should be back up later today or early tomorrow.

04/27/05
10:20am
The CTBP cluster will be shut down this Friday (4/29/05) for system maintenance - a bad network switch has to be replaced. All jobs will be terminated and queuing will be disabled. The cluster should be back in production by 12pm.

04/01/05
09:24am
A system upgrade is scheduled for Tuesday 4/5 and Wednesday 4/6. The cluster will not be available during that time. The mail and web server will also be down briefly on Wednesday (4/6) morning.

02/14/05
05:26pm
Mon Feb 14 17:23:46 PST 2005

ctbp1 had to be rebooted. Sorry for the inconvenience.


01/10/05
10:54am
Due to a weather-related power outage, all servers were down from 8pm yesterday until 4am this morning. All systems are back up now.

12/28/04
03:12pm
Tue Dec 28 15:11:17 PST 2004
The cluster is back up in production.

12/27/04
10:21pm
Mon Dec 27 22:19:33 PST 2004
The cluster was emergency shut down due to high temperature in the server room.

12/07/04
10:21am
* System Notice *
All CTBP servers (including the mail/web server and the cluster with all its nodes) will be unavailable to users on Thursday, December 9, 2004 from 8am due to system maintenance. We will send a new announcement when the systems are back in production. Please do not log in before we are finished: the systems will be rebooted several times during the maintenance, and you could lose your work.

12/01/04
08:35am
The AC unit has been fixed, all cluster nodes are back in production.

11/29/04
06:58pm
Mon Nov 29 18:56:45 PST 2004
Due to an AC failure in the server room, we are performing an emergency shutdown of half of the cluster until this is fixed tomorrow. Sorry for the inconvenience.

10/24/04
11:12am
The cluster frontend underwent an emergency reboot. Running jobs should not be affected.

09/21/04
11:23am
Intel compilers (ifort/icc) version 8.1 have been installed on the cluster. See the CTBP software page for more details.

08/25/04
02:12pm
The newest patched Intel compiler (version 8.0.046) has been installed on the cluster. See the CTBP software page for more details.

06/29/04
09:28am
Due to a power outage, network connectivity to the cluster and the web/mail server was lost last night, and some nodes were rebooted. All systems are up now.

06/18/04
10:34am
*************************
Scheduled downtime notice
*************************
ctbp1.ucsd.edu and ctbp2.ucsd.edu will be down for a short system maintenance on Monday 6/21/04 from 2pm to 3pm. ctbp1.ucsd.edu will be switched from test to production mode afterwards. All batch queue policies will apply; for details, please check the SGE site policies.

06/04/04
08:45pm
The upgraded and expanded CTBP cluster is now open for testing. Please note that the cluster can be shut down at any time during the testing period. We expect to go to production mode in 2-3 days.

During the upgrade, we migrated the cluster queuing system from OpenPBS to GridEngine (SGE). All PBS scripts must be modified to run under SGE. Please see the SGE how-to page for examples. During the testing period no job limits are enforced, so please be considerate of other users.

04/27/04
11:49am
Matlab version 6.5.1 has been installed. For more details, please see the CTBP software page.

03/26/04
02:44pm
Amber version 8 has been installed on the cluster. Please see the CTBP software page for more details.

02/27/04
12:35pm
A brief campus-wide power surge at 11:45am today caused a cluster reboot; all jobs were killed.

01/25/04
05:48pm
A new version of the Intel C/C++/F77/F90 compilers (v8.0) has been installed on the cluster. For more information and usage, please see the Cluster HOW-TO.

01/21/04
03:36pm
*********************************
Scheduled Downtime Notice
*********************************
All three main CTBP servers (ctbp, ctbp1 and ctbp2) will be down for system maintenance on Friday 1/23/04 from 10am to approximately 3pm. A Dell technician is scheduled to update the RAID BIOS during that time. None of the servers will be accessible during the downtime, and jobs running on the cluster will be terminated. No incoming mail will be lost; it will be queued by the UCSD mail system until ctbp.ucsd.edu is back up.

12/26/03
11:02am
After a power failure in Urey Hall all systems are back on-line. All jobs running on the cluster at the time of the outage were purged.

12/20/03
01:29pm
The failed hard drive has been replaced and the RAID array rebuilt. ctbp1.ucsd.edu is back online.

12/19/03
06:32pm
The RAID array on the cluster frontend (ctbp1) failed. It is being rebuilt now, and the frontend will be accessible again once this is finished.

11/27/03
01:03pm
The cluster frontend (ctbp1) crashed and has been rebooted.

11/25/03
12:01pm
There was a power failure in the server room that brought down the main network switch. Although none of the computers crashed, the network link was down, which affected most of the running jobs on the cluster.

11/16/03
11:15pm
Our cluster made it onto the TOP500 Supercomputer Sites list.

09/30/03
10:43am
The CTBP cluster has been opened for testing. Only limited support is available at this moment, but we are interested in hearing your comments or requests; please send them to ctbp-help @ ctbp.ucsd.edu.



Please direct any questions or comments related to this web page to ctbp-help @ ctbp.ucsd.edu
Last modified: August 24 2007 12:57:48 pm.