| Latest
News and Updates |
06/17/08 01:46pm | 1:45pm One of the AC units in the CTBP server room has failed, and most of
the cluster nodes in the room have been taken down to avoid hardware
damage. |
|
05/19/08 09:42am | There was a power failure on one of the cluster switches early Sunday,
which made some nodes and file servers inaccessible. This was fixed
on Sunday around 10am. |
|
03/06/08 08:33am | All systems are up. Please report any problems to ctbp-help. |
|
03/05/08 05:13pm | 5pm After a campus-wide power outage today at 11am we
have most systems up, but connectivity to our clusters is
intermittent. We will send an announcement when all systems are back 100%. |
|
02/01/08 10:48pm | The ctbp1 cluster is now open. |
|
01/30/08 05:23pm | The ctbp1 cluster is down for maintenance. We should be able to
enable access tomorrow (1/31/08) around noon. |
|
01/22/08 11:05am | The ctbp1 cluster is now open in test mode. If there are
no significant issues we will go to production early next week. |
|
01/17/08 10:35am | We are currently upgrading the ctbp1 cluster with new
nodes. The cluster capacity has been reduced and the cluster will go
down later today. A new announcement will be posted when the upgraded
cluster is online. |
|
12/17/07 06:34pm | The /gpfs filesystem on ctbp1 crashed around
5:30pm today. We are bringing it back on-line right now. |
|
10/17/07 02:07pm | ctbp1 experienced a failure today around 11am. The system
is back on-line. |
|
10/04/07 02:36pm | 2:25pm ctbp1 frontend is back on-line. Sorry
for the inconvenience. |
|
10/04/07 01:34pm | 12:25pm ctbp1 frontend is currently down, we
are looking into this. |
|
09/18/07 10:41am | *** SCHEDULED MAINTENANCE ***
The cluster GPFS file server will be offline on Thursday evening from
5-6pm. During that period /gpfs file system will not be
accessible from ctbp1. |
|
09/12/07 12:44pm | 12:44pm /gpfs file server is back on-line, the faulty RAM which caused the crash has been replaced. |
|
09/12/07 11:27am | 11:27am The /gpfs file server crashed - we are looking into this right now. This will most likely affect jobs on ctbp1. |
|
07/19/07 08:35pm | 8:35pm The cluster frontend ctbp1 crashed again due to a hardware failure. It is back up but the failed part will have to be replaced. A scheduled downtime will be announced when we receive the part. |
|
07/12/07 12:05pm | 12:05pm The ctbp1 cluster frontend crashed and was rebooted. Sorry for the inconvenience. |
|
05/31/07 10:42am | 10:41am The /home directory NFS server had to be rebooted. This should not impact jobs running from /gpfs or /scratch file systems. Apologies for any inconvenience. Please report any problems to ctbp-help. |
|
05/31/07 09:45am | 9:45am We are experiencing problems with user home directories on the cluster. We are looking into this. |
|
03/20/07 11:52am | 11:53am We have experienced, together with a big part of the campus, a power outage which lasted 15 min. All CTBP servers including clusters are coming up, should be in production in the next 15 min. If you experience any problems please send email to ctbp-help. |
|
02/07/07 02:07pm | New Myrinet-enabled NAMD version (2.6) has been installed in
/soft/linux/pkg/NAMD_2.6-gm. If you encounter any
problems please email ctbp-help. |
|
11/28/06 11:33am | 11:31am A glitch in ctbp1 configuration
caused mounting of the home directories to fail on the nodes. This has
been fixed and jobs in the queue should continue to run. Sorry for the
inconvenience. |
|
09/22/06 01:06pm | 1:04pm Both clusters are back online. |
|
09/18/06 10:04pm | 9:59am Due to a cooling unit failure in the server room
both clusters were emergency shut down. We expect them to be back
on-line sometime tomorrow. |
|
07/31/06 09:28am | 9:35am A short power outage in the building forced a reboot
of most of the cluster and terminated running jobs. Sorry for the
inconvenience. |
|
06/21/06 06:36pm | 6:45pm The ctbp1 frontend crashed and has been
restarted. Some jobs may have been terminated during this
process. Sorry for the inconvenience. |
|
06/16/06 03:24pm | Amber v.9 is now available on both clusters. |
|
03/29/06 03:17pm | 2:33pm The ctbp1 NAS (/gpfs) is experiencing problems. This should be fixed shortly. |
|
02/04/06 12:49pm | 12:43pm The ctbp1 frontend crashed and was rebooted. Some
of the jobs were terminated in the process. Sorry for the inconvenience. |
|
01/23/06 02:45pm | 9:50am The clusters are back in production. |
|
01/20/06 03:59pm | 3:50pm We are experiencing cooling problems in the server
room, most of the cluster nodes will be shut down. ETA for the repair
is sometime on Monday. |
|
01/20/06 10:23am | 10:15am All clusters are back in production. Please
report any problems to ctbp-help @ ctbp.ucsd.edu. |
|
01/19/06 01:19pm | 1:05pm We were hit again by a west campus-wide power
outage. All CTBP servers are back up but both clusters are down due
to AC failure. We will be powering the clusters up after the AC unit
is fixed. |
|
01/12/06 10:58am | 10:55am There was a power outage this morning and all CTBP servers and computing clusters were rebooted. All systems are back up now. |
|
01/03/06 10:22am | 9:50am /gpfs was rebooted. |
|
11/28/05 10:22am | 9:52am ctbp1 was rebooted. This has probably
affected some of the running jobs on the cluster. |
|
11/23/05 01:01pm | 12:32pm /gpfs crashed again, the NAS box is
being rebooted. |
|
11/20/05 10:30am | 10:28am GPFS went off line this morning, the problem has
been corrected but some jobs may have failed due to GPFS
unavailability. |
|
10/18/05 02:35pm | 1:15pm The NFS server has been rebooted which fixed the
/gpfs problem. |
|
10/18/05 01:57pm | 12:45pm We are currently experiencing problems with
/gpfs partition on ctbp1. We are looking
into this. |
|
09/10/05 03:47pm | The NFS server with /gpfs partition for the ctbp1 cluster
crashed earlier today and this may have caused some jobs to fail. The
server is back on-line. |
|
07/27/05 12:36pm | Due to high server room temperature additional nodes had to be shut
down around 11am today. Jobs running on those nodes were
terminated. Sorry for the inconvenience. |
|
07/12/05 05:26pm | This morning's power spike left some of the cluster switches in
a non-operative state. They were rebooted and all nodes should be
accessible now. |
|
06/25/05 06:08am | CTBP NIS and mail servers went off line last night. This has been
resolved and both are running now. Sorry for the inconvenience. |
|
06/21/05 05:45pm | Two racks (c5 and c7) had to be shut down (and jobs running on them
terminated) because of insufficient cooling in the server room. |
|
06/17/05 10:09am | 9:43am The nodes on both clusters are being powered on,
we should be back in full production shortly. |
|
06/14/05 04:24pm | 4:23pm The AC is down again, all cluster nodes are
off. ETA for the repair is unknown at this moment, please watch this
space for updates. |
|
06/13/05 03:54pm | The AC failed again, the cluster is down. |
|
06/11/05 01:30pm | One of the cooling units in the server room failed today around
1:30pm. The clusters are down until the unit is fixed. |
|
06/09/05 07:39pm | There will be another power shutdown in Urey Hall this Saturday, June
11th, 2005. All CTBP computers including the clusters and mail server
will be turned off late Friday (June 10, 2005). All servers should be
back up and network connectivity restored on Saturday (June 11, 2005)
after 1pm. |
|
05/17/05 11:43am | Due to a power shutdown in Urey Hall, all CTBP computers including the
cluster and mail server will be turned off late Friday (May 20,
2005). All servers should be back up on Saturday afternoon (May 21,
2005). |
|
05/09/05 04:00pm | A power failure in the server room caused half of the cluster to reboot -
jobs running there were terminated. Sorry for the inconvenience. |
|
05/05/05 04:20pm | Thu May 5 16:18:26 PDT 2005 One of the AC units in
the server room failed and CTBP cluster was shut down. The cluster
should be back up later today or early tomorrow. |
|
04/27/05 10:20am | The CTBP cluster will be shut down this Friday (4/29/05) for system
maintenance - a bad network switch has to be replaced. All jobs will be
terminated and queuing will be disabled. The cluster should be back in
production by 12pm. |
|
04/01/05 09:24am | System upgrade is scheduled on Tuesday 4/5 and Wednesday 4/6. The
cluster will not be available during that time. Also the mail and web
server will be down for a short period of time on Wednesday (4/6)
morning. |
|
02/14/05 05:26pm | Mon Feb 14 17:23:46 PST 2005
ctbp1 had to be rebooted. Sorry for the inconvenience. |
|
01/10/05 10:54am | Due to a weather-related power outage, all servers were down from 8pm
yesterday until 4am this morning. All systems are back up now. |
|
12/28/04 03:12pm |
Tue Dec 28 15:11:17 PST 2004
The cluster is back up in production. |
|
12/27/04 10:21pm |
Mon Dec 27 22:19:33 PST 2004
The cluster was emergency shut down due to high temperature in the
server room. |
|
12/07/04 10:21am | * System Notice *
All CTBP servers (including the mail/web server and cluster including all
nodes) will be unavailable for users on Thursday, December 9, 2004 from
8am due to system maintenance. We will send a new announcement when the
systems are back in production. Please do not log in before we are
finished - the systems will be rebooted several times during the maintenance and
you could lose your work.
|
|
12/01/04 08:35am | The AC unit has been fixed, all cluster nodes are back in production. |
|
11/29/04 06:58pm | Mon Nov 29 18:56:45 PST 2004
Due to AC failure in the server room we are emergency shutting down
half of the cluster until this is fixed tomorrow. Sorry for the
inconvenience. |
|
10/24/04 11:12am | The cluster frontend was emergency rebooted. Running jobs should not be affected. |
|
09/21/04 11:23am | Intel compilers (ifort/icc) version 8.1 have been installed on the cluster. See
CTBP software
page for more details. |
|
08/25/04 02:12pm | Newest, patched Intel compiler (version 8.0.046) has been installed on
the cluster. See CTBP software
page for more details. |
|
06/29/04 09:28am | Due to a power outage, network connectivity to the cluster and web/mail
server was lost last night and some nodes were rebooted. All systems
are up now. |
|
06/18/04 10:34am |
*************************
Scheduled downtime notice
*************************
ctbp1.ucsd.edu and ctbp2.ucsd.edu will be
down for a short system maintenance on Monday 6/21/04 from 2pm to
3pm. ctbp1.ucsd.edu will be switched from test to
production mode after this. All batch queue policies will apply, for
details please check SGE site
policies. |
|
06/04/04 08:45pm |
The upgraded and expanded CTBP cluster is now open for testing.
Please note that the cluster can be shut down anytime during the
testing period. We are expecting to go to production mode in 2-3 days.
During the upgrade we migrated the cluster queuing system from OpenPBS to
GridEngine (SGE). All PBS scripts must be modified to run under
SGE. Please see SGE how-to
page for examples. During the testing period no job
limits are enforced, so please be considerate of other users. |
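As a rough illustration of what modifying a PBS script for SGE involves, the Python sketch below rewrites a few common `#PBS` directives into `#$` form. It is not taken from the SGE how-to page: the function name `pbs_to_sge` is hypothetical, the mapping covers only flags that generally carry over unchanged, and resource requests such as `nodes=...` are flagged for manual editing because parallel environment names are site-specific.

```python
import re

# Flags whose meaning is generally the same under PBS and SGE (assumption:
# the site has not customized them). Anything else is flagged for review.
SIMPLE_FLAGS = {"-N", "-o", "-e", "-q", "-M", "-S", "-V"}


def pbs_to_sge(script: str) -> str:
    """Hypothetical helper: rewrite common #PBS directives as #$ directives."""
    out = []
    for line in script.splitlines():
        m = re.match(r"^#PBS\s+(-\S+)\s*(.*)$", line)
        if not m:
            # Ordinary shell line: swap two well-known environment variables.
            line = line.replace("$PBS_O_WORKDIR", "$SGE_O_WORKDIR")
            line = line.replace("$PBS_JOBID", "$JOB_ID")
            out.append(line)
            continue
        flag, arg = m.group(1), m.group(2).strip()
        if flag in SIMPLE_FLAGS:
            out.append(f"#$ {flag} {arg}".rstrip())
        elif flag == "-j" and arg in ("oe", "eo"):
            # PBS joins stdout/stderr with "oe"; SGE uses a yes/no switch.
            out.append("#$ -j y")
        elif flag == "-l" and re.fullmatch(r"walltime=\d+:\d{2}:\d{2}", arg):
            # Wall-clock limit: PBS "walltime" corresponds to SGE "h_rt".
            out.append("#$ -l h_rt=" + arg[len("walltime="):])
        else:
            # Node counts, memory, etc. depend on local SGE configuration.
            out.append("# TODO(manual): " + line)
    return "\n".join(out)


if __name__ == "__main__":
    example = "\n".join([
        "#!/bin/sh",
        "#PBS -N test",
        "#PBS -l walltime=01:00:00",
        "#PBS -l nodes=2:ppn=2",
        "#PBS -j oe",
        "cd $PBS_O_WORKDIR",
    ])
    print(pbs_to_sge(example))
```

Lines marked `TODO(manual)` still need to be rewritten by hand against the local queue configuration before the script is submitted.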
|
04/27/04 11:49am | Matlab version 6.5.1 has been installed. For more details please see
CTBP software
page. |
|
03/26/04 02:44pm | Amber version 8 has been installed on the cluster. Please see CTBP software page for more details. |
|
02/27/04 12:35pm | A brief campus-wide power surge at 11:45 today caused a cluster reboot -
all jobs were killed.
|
|
01/25/04 05:48pm | New version of Intel C/C++/F77/F90 compilers (v8.0) has been installed on the cluster. For more information and usage please see Cluster HOW-TO. |
|
01/21/04 03:36pm | *********************************
Scheduled Downtime Notice
*********************************
All three main CTBP servers (ctbp, ctbp1 and ctbp2) will be down for
system maintenance on Friday 1/23/04 from 10am to approx 3pm. A Dell
technician is scheduled to update RAID BIOS during that time. None of
the servers will be accessible during the downtime and jobs running on
the cluster will be terminated. No incoming mail will be lost, it will
be queued by the UCSD mail system until ctbp.ucsd.edu is back up
again. |
|
12/26/03 11:02am | After a power failure in Urey Hall all systems are back on-line. All jobs running on the cluster at the time of the outage were purged. |
|
12/20/03 01:29pm | Failed hard drive has been replaced and the RAID array rebuilt. ctbp1.ucsd.edu is back online. |
|
12/19/03 06:32pm | RAID array on the cluster frontend (ctbp1) had a failure. It is being rebuilt now and frontend will be accessible again after this is finished. |
|
11/27/03 01:03pm | The cluster frontend (ctbp1) crashed and has been rebooted. |
|
11/25/03 12:01pm | There was a power failure in the server room which brought down the
main network switch. Although none of the computers crashed the
networking link was down which affected most of the running jobs on
the cluster. |
|
09/30/03 10:43am | The CTBP cluster has been opened for testing. Only limited support is available at this moment, but we are interested in hearing your comments or requests - please send them to ctbp-help @ ctbp.ucsd.edu. |
| |
Please direct any questions or comments related to this web page to
ctbp-help @ ctbp.ucsd.edu
Last modified: August 24 2007 12:57:48 pm.
|