Network outage this morning
Published: 3-20-2014 8:50 am
As I'm sure all of you are aware, we had a severe network outage last night
that impacted pretty much every technology service we have. While we are still
piecing together what happened, what we know is this:
1) The first symptoms that we've been able to track down occurred shortly
after 2AM this morning.
2) The primary core switch/router that connects servers to the rest of
campus had one of its two power supplies without power when we arrived at 7AM.
3) One of the groups of ports on that switch appeared to be powered
4) Our notification system that is supposed to send us text messages in
the event of a failure didn't do so.
We think we have the system restored at this point. If folks in your area
are still experiencing issues, please have them reboot their computer (if they
haven't tried that since 8:30AM) and then call the help desk if they are still
having problems.
From what we can see, this appears to have been a cascading failure, a
perfect storm if you will. The power supply that was off in the primary switch
has a redundant unit that should have handled the load but, for some reason,
appears not to have. The primary switch also has a secondary switch that is
supposed to take over seamlessly, and that didn't happen either. The
notification system, which is designed to be largely isolated from other
systems so that it functions even during a severe issue, either had an
independent problem at exactly the wrong time or was impacted by the larger
failure when it shouldn't have been.
Rest assured that we'll be poring over logs from many systems and
engaging in forensic exercises to figure out exactly what happened so that we
can design the system to be more resilient.
Thank you for your patience and we're sorry for the interruption to your
In other news, Happy Vernal Equinox!