Michael Petrov
Co-Founder, CEO
1/12/2011
Complexity of cleaning up networks with SAN/NAS environments in case of virus outbreak.

Sometimes clients do get viruses. Despite all security and antivirus measures, computers can still become infected.

The case described here is interesting in several ways.

1.       How the client got infected. We had a snowstorm on the East Coast. People could not get to work, so management started asking us to open VPN connections to the environment for a large number of people. Before the snowstorm, VPN access was open to management only. We assumed that machines on the VPN were clean, and we let them into the secured perimeter over an encrypted tunnel without any restrictions. We had not planned for mass VPN connections from home computers, and one user brought an unknown “creature” onto the network.

 

2.       Even protected servers were infected, because the virus managed to disable the enterprise antivirus software and infect most of the network. This was a failure on the part of the antivirus software, but I don’t want to name the vendor here. Unfortunately, we did not have time to study the behavior of the virus and work out all the details of how it infected machines and propagated.

 

3.       Our segmentation into functional blocks paid off: only the terminal servers were infected, and the virus did not spread to database servers, mail services and other heavy lifters. So the segmentation philosophy proved itself again. Our view is that each functional segment should be managed separately, rather than managing everything centrally. Some admins like to manage one big firewall, one big database and so on. Although there are advantages to minimizing the number of management points, those architectures expose networks and systems to bigger failures. Technology fails where you least expect it, no matter how experienced you are. We have all had cases where, despite planning for failures, redundancy and backups, something still happened that nobody could have imagined. Segmentation reduces the risk of global failures by localizing problematic components.

 

4.       The client has a huge NAS running on an EMC Celerra. A single share hosts more than 8 million files, and we have 7 shares. An antivirus scan through those shares takes more than 5 hours, so infected users could plant viruses back on the NAS in areas that had already been scanned. We were constantly chasing our tail. We also use the same Celerra as a native SAN powering a SQL Server cluster, Oracle RAC and an Exchange cluster. Those servers and volumes were not affected by the virus because of our segmentation policies.

 

 

5.       Microsoft’s response. We were actually very surprised by how helpful Microsoft was in this situation. The antivirus software vendor was taking much longer to react, so we decided to contact Microsoft. After we opened a PPS ticket, we were informed that virus outbreaks are a Microsoft priority and that any fees would be reimbursed to us. We got an American support team (not offshore) and a very knowledgeable engineer. He told us why we were having such a problem cleaning up: apparently the virus infects DLLs and EXEs and encrypts them, so decrypting such files is a problem, and some antivirus software does not pick them up because of that. The virus is also polymorphic, so its signature changes from time to time. We explained the approach we had found successful in preventing the virus from spreading. The engineer told us that we were on the right path, but also gave us the official “Microsoft” way of preventing this kind of spread, in the form of a KB article.

Besides that, Microsoft provided us with two tools: one for online scanning and one for offline scanning. The tools do not update themselves, but IT groups can call Microsoft at any time and request a fresh copy, as the Microsoft team updates them every 6 hours.

The online tool scans the memory and file system of production servers. However, if the server has a rootkit, the tool may be useless, because rootkits hide viruses. The best way to clean such a server is to rebuild it completely. Another way is to shut it down, boot it from a CD-ROM built with the second, offline Microsoft tool, and scan the server while the OS on the local disks is not loaded in memory.

Another way to detect rootkits is to share the disk drives and scan the shares remotely from a clean server. A rootkit hides viruses and itself by patching the OS and changing how memory management and the file system behave. From a remote machine it cannot hide, so antivirus software on the remote machine will detect the infected files, including the rootkit itself.

We saw cases where a remote scan detected an infected file, but when we logged on to the infected server we could not see the file at all, because the rootkit was hiding it.
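The difference between the remote view and the local view can itself be used as a detection signal. The sketch below illustrates that idea in Python; it is not one of the Microsoft tools mentioned here, and the function names are hypothetical. It assumes you produce one file listing on the suspect server and a second listing of the same drive from a clean machine over a share, then compare the two. Paths are lower-cased on the assumption that the file system is case-insensitive.

```python
import os
from typing import Set


def list_relative_files(root: str) -> Set[str]:
    """Return every file path under root, relative to root,
    lower-cased and with forward slashes so listings are comparable."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/").lower())
    return found


def hidden_from_local_view(local_listing: Set[str],
                           remote_listing: Set[str]) -> Set[str]:
    """Files the clean remote machine can see but the suspect server
    did not report. These are candidates for rootkit-hidden files."""
    return remote_listing - local_listing
```

Files created between the two listings will also show up in the difference, so anything reported this way is only a candidate for closer inspection, not proof of infection.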

 

After successfully blocking the spread and scanning the servers offline one by one, we were able to clean up the whole environment. Most important for our client was that they were not down for a single minute: our architecture allowed us to keep the system afloat even though we had to do offline scanning on multiple boxes, and the viruses and virus scans made some production elements slower.
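The tail-chasing problem from point 4 (files being planted in areas of the NAS that a multi-hour scan has already passed) can be narrowed by following each full scan with a pass over only the files that changed while the scan was running. The Python sketch below illustrates that idea; it does not describe the tooling used in this incident, and the function name is hypothetical. It relies on file modification times, which malware can forge, so it is a heuristic for choosing what to rescan first and not a substitute for a full scan.

```python
import os
from pathlib import Path
from typing import Iterator


def files_modified_since(root: str, since_epoch: float) -> Iterator[Path]:
    """Yield files under root whose modification time is at or after
    since_epoch (seconds since the Unix epoch, e.g. from time.time())."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime >= since_epoch:
                    yield path
            except OSError:
                # The file vanished or is unreadable; skip it in this pass.
                continue
```

A possible usage, under the same assumptions: record `time.time()` just before the antivirus scan starts, and when the scan finishes, pass that timestamp and the share path to `files_modified_since` to get the list of files worth scanning again. Walking millions of files takes time of its own, so this only helps if listing metadata is much cheaper than scanning file contents.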

 

 

 

Lessons learned:

We learned that our segmentation philosophy worked well, as it kept the client’s production uninterrupted.

We also learned that we really need to limit users’ abilities on the network. This client’s users had too many rights and privileges on the terminal servers.

We decided on a plan to upgrade the client’s network topology to help prevent viruses from spreading and from stealing passwords or data.

We developed a constructive plan to implement better policies for AD, terminal services usage, VPN and other areas.

We are also in the process of rebuilding some elements of the infrastructure using virtual private cloud elements. A virtual server architecture will allow us to rebuild servers from snapshots quickly rather than cleaning them up.

   
