Network downtime
Published: 5-20-2011 7:40 am
Greetings all,
On Wednesday afternoon of this week we received a phone call from HP
indicating that it was seeing some troublesome error messages being reported on
our primary storage unit that holds software and data for a great number of
servers on campus. HP's concern was that if the problem was not repaired, it
could lead to loss of data. Unfortunately, the repair process itself could also
lead to loss of data, which is not a good conundrum to be in.
First and foremost, rest assured that all of our campus data is backed up
to tape and housed outside of the Campus Services building in the event of an
emergency, so if the system were to fail, we do have tape backups of
everything. Tape, however, takes quite some time to restore, and the recovery
process would be rather drawn out, so we wanted a much faster recovery plan in
place in case we do lose data.
We have a second storage system on campus that is not yet fully sized to be
a full replica of the first machine, but HP has loaned us enough drives and we
have been busily copying all of the content from the first storage unit to the
second one. Most of our campus servers now have up-to-date data on both storage
systems. We will receive more drives this morning and sometime late this
afternoon/evening we hope to have the remaining servers duplicated on both
units.
As soon as that process is complete we're going to need to take nearly all
of the campus servers down and attempt the repairs on the first unit. If those
repairs are successful and quick we should be able to restore service fairly
rapidly. If those repairs are not so successful, we will reconfigure the
servers to use the second storage unit and bring things up as quickly as we can,
though it may take several hours to accomplish this.
The net result is that some time this evening or tomorrow morning we will
be taking nearly the entire network down to enact these repairs. We don't know
exactly when yet; it will depend on how quickly things finish copying over this
afternoon. We want to start as soon as we can to ensure that we have no
negative impact on classes next week or freshman registration.
If you had planned on working over the weekend, please e-mail me with phone
contact information and we'll make an effort to call you prior to taking the
systems down. We hope to be done with this process Saturday afternoon at the
latest, but at this point we have very few guarantees, other than the safety of
our data, which we are very confident of, due to the (now) multiple backups we
have in place.
Sorry for the long-winded e-mail, but we wanted you to be briefed on the
situation as accurately as we could.
Thanks, everyone, for your patience. When we have service restored, we'll
send out another notice.
JD