Michael Petrov
Co-Founder, CEO
6/15/2012
A blow for storage centralization/consolidation fans…

It may sound as if I am against enterprise storage, but everything should have its limits and safeguards. It is a pity that cases like the one I describe below deal such a devastating blow to such nice architectural design and engineering, throwing technology evolution under the bus. It hurts a lot when a 100% uptime record that has held for the last five years is knocked down to 99.9% by one major event. Here is the timeline of today's events:

 

1:30 pm - An email from the storage vendor:

Hello, this message is to communicate a potential issue on your storage array. Just released: an ETA regarding a potential code issue on systems with iSCSI-attached hosts. The SP (storage processor) may reboot after 248 days of runtime. A timer overflow condition within the operating system may occur after approximately 248 days of runtime. If this occurs, iSCSI-based network traffic can be interrupted. Depending on the length and severity of the interruption, the storage may interpret it as non-responsive driver software and initiate an SP reboot to clear the issue.
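The advisory does not say which counter overflows, so what follows is only a back-of-the-envelope sketch. It assumes, purely for illustration, a signed 32-bit counter that ticks 100 times per second (every 10 ms); such a counter runs out of positive values after 2^31 ticks, which lands at roughly 248.5 days of uptime. The names `COUNTER_BITS`, `TICK_HZ`, `overflow_after` and `overflow_date` are hypothetical, and the boot date in the usage lines is arbitrary, not taken from this incident.

```python
from datetime import datetime, timedelta

# Illustrative assumptions only (not stated in the vendor advisory):
# a signed 32-bit uptime counter that advances 100 times per second (10 ms ticks).
COUNTER_BITS = 32
TICK_HZ = 100


def overflow_after() -> timedelta:
    """Uptime at which a signed COUNTER_BITS-wide tick counter wraps negative."""
    max_ticks = 2 ** (COUNTER_BITS - 1) - 1  # largest positive signed value
    seconds = (max_ticks + 1) / TICK_HZ       # the tick after max_ticks wraps
    return timedelta(seconds=seconds)


def overflow_date(boot_time: datetime) -> datetime:
    """Wall-clock time at which a processor booted at boot_time reaches the wrap."""
    return boot_time + overflow_after()


limit = overflow_after()
print(f"Counter wraps after {limit.total_seconds() / 86400:.2f} days of uptime")

# Arbitrary example boot time, chosen only to show the calculation.
print("Booted 2012-01-01, wraps at:", overflow_date(datetime(2012, 1, 1)))
```

Under the same assumptions, setting `TICK_HZ` to 1000 (1 ms ticks) shrinks the window to about 24.9 days, which is why both the tick rate and the counter width matter when reading an advisory like this one.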

 

1:46 pm - The storage system received too many events in this poll cycle to display here. Contact your service provider to determine if any storage system error events occurred that you should address.

 

Jun 13 13:45:21 2012 CS_PLATFORM:NaviEventMonitor:CRITICAL:4 CLARiiON event number 0x7403 APM000000000  SP N/A  SoftwareRev 7.31.25 (0.84)  BaseRev 05.31.000.5.502  Description Storage Processor (SP B) is faulted. See alerts for details.. ExtCode1=Error; ExtCode2=VNXSISPA.

Time Stamp 06/13/12 17:45:11 (GMT) Event Number 7404 Severity Error Host VNXSISPA Storage Array APM000000000 SP A Device N/A Description Standby Power Supply (Bus 0 Enclosure 0 SPS B) is faulted. See alerts for details.

Time Stamp 06/13/12 17:44:57 (GMT) Event Number 2580 Severity Error Host VNXSISPA Storage Array APM000000000 SP N/A Device N/A Description Storage Array Faulted Bus 0 Enclosure 0 : Faulted Bus 0 Enclosure 0 SPS B : Removed

Time Stamp 06/13/12 17:44:43 (GMT) Event Number a23 Severity Critical Error Host VNXSISPA Storage Array APM000000000 SPA Device SP B Description Peer SP Down.

Time Stamp 06/13/12 17:45:11 (GMT) Event Number 7403 Severity Error Host VNXSISPA Storage Array APM000000000 SP A Device N/A Description Storage Processor (SP B) is faulted. See alerts for details.

Time Stamp 06/13/12 18:04:40 (GMT) Event Number 71274001 Severity Warning Host VNXSISPA Storage Array APM00112300499 SP N/A Device N/A Description FLARE_CPU_WATCHDOG: Flare not rescheduling

00 00 0c 00 02 00 34 00 d3 04 00 00 01 40 27 a1 01 40 27 a1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 71 27 40 01 66 62 65 5f 73 68 69 6d

Time Stamp 06/13/12 18:26:34 (GMT) Event Number 2080 Severity Error Host VNXSISPB Storage Array APM000000000 SP N/A Device N/A

 

3:05 pm - FULL SYSTEM crash.

The storage was down for 40 minutes, and the failure affected a 4-node Oracle RAC, a 3-node SQL cluster, a 2-node Exchange cluster, and an 8-node VMware cluster running around 60 virtual machines.

 

Now, if you are a CTO or IT director, you may want to question these manufacturers, especially when they preach consolidation and centralization.

It is not that the technology is bad, but… we still cannot trust it 100%. This is why we implement SANs in disaster recovery (DR) configurations for our clients.

 
