Friday, June 8, 2012

NTFS event ID 55 after disk extend on Server 2008

This isn't the first time I've run into this, but hopefully it will be the last.  The first time was with a volume containing SQL data.  This time it was one with Exchange mailbox stores.  A little bit of thought up front goes a long way toward preventing this, so hopefully the following helps.  (Chances are, though, that if you are reading this, you searched Google after the damage was already done, but hopefully it still helps and keeps you from jumping off the cliff!)

My Scenario:  An SBS 2008 server was running low on space.  It was running as a VM inside vSphere with the appropriate licenses to grow disks without downtime.  It was 2:30 AM on a Thursday and I was just trying to knock an item off my to-do list before bed.

My Typical Steps:  I increased the disk size on the VM, then went into Windows, opened Disk Management, rescanned the disks, right-clicked the volume, and selected Extend.  On 2008 (non-R2) there is a bug that seems to set the size 1 MB larger than possible, so I adjusted it down and then back up (if you have ever done it, you will know what I'm referring to) and finished.

The Problem:  After running the extend, I got an error message saying "Invalid parameter".  In Disk Management, the volume appeared to be resized.  However, I noticed that in Windows Explorer, the drive was not showing the additional space.

What I Did Next (not quite right):  In the past when I have seen that, I found that running "diskpart", selecting the appropriate volume, then running "extend filesystem" typically fixes Explorer.  I tried it and received an error (I wish I had taken a screen capture of it), but it also said the command was successful.  Explorer still didn't show the free space, and then I noticed a Windows Error Reporting window stating that "store.exe" had crashed.

I checked Event Viewer and found NTFS event ID 55 entries.

Since this was the Exchange volume, I then went into Exchange Management Console and noticed the mailbox store was down.  When I tried to mount it, I received an error.

Now panic sets in.  Corrupt disk/volume, a database that will not mount, it was 3 AM, and I was fighting something that should have been point, click, done.  Backups are good, but this server holds several hundred GB, so a restore would take way too long and potentially lose hours of data.

What I Did Next (better):  Since it is a VM on VMware, I had the luxury of creating a quiesced snapshot before proceeding so I did that.  Next, I stopped all services that might be interacting with the volume in question.  In this case, it was Exchange only so I stopped those services (and the Microsoft Search service).  Next, I dropped to a command line and ran chkdsk (no /F yet) on the drive.  It found an error:

Attribute record (128, "") from file record segment 6)

Did a little research on this and it looked like doomsday based on what I read.  Most people said they had to format.

Since I had my snapshot (and backups as worst case), I went ahead and tried running "chkdsk E: /F", allowed it to dismount the volume, and it ran through and said it fixed the problem.  I was then able to restart all the services and the mailbox store mounted!  Mailboxes were available and mail started flowing again.  Yay!
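
For reference, the order I used (a read-only chkdsk pass first, and /F only once a snapshot is in place) can be written down as a small Python sketch.  The helper name chkdsk_command is made up for illustration and E: is just the drive from this incident.  It only builds the command lines and does not run anything, since chkdsk /F needs an elevated prompt and will want to dismount the volume.

```python
import string
from typing import List


def chkdsk_command(drive: str, fix: bool = False) -> List[str]:
    """Build a chkdsk argument list for one drive (hypothetical helper).

    fix=False gives the read-only check; fix=True appends /F to repair.
    """
    letter = drive.strip().rstrip(":").upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"expected a single drive letter, got {drive!r}")
    cmd = ["chkdsk", f"{letter}:"]
    if fix:
        cmd.append("/F")
    return cmd


if __name__ == "__main__":
    # Step 1: look without touching anything.
    print(" ".join(chkdsk_command("E:")))
    # Step 2: only after a snapshot/backup exists, let chkdsk repair.
    print(" ".join(chkdsk_command("E:", fix=True)))
```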

Hindsight:  I might have been able to just restart the Information Store service, but that wouldn't have fixed the NTFS corruption.  After the first time we went through this, our team documented a pretty failsafe procedure for extending disks like this on Windows Server 2008 and up, and I failed to follow it.  (Windows 2003 is a different beast, but the process can be similar using diskpart and/or a tool called ExtPart.exe (32-bit only) if you need to extend a critical volume.)  Basically, for 2008 and up, we have found that the following works well:  extend the disk in VMware, do a fresh reboot of the guest, stop the appropriate services in the guest, take a snapshot with VMware, then extend the volume with diskpart's "extend" command, followed by diskpart's "extend filesystem".  So far, we have never had an issue doing it that way.  I decided to be a bit of a renegade because of lack of sleep, and because I have had such good luck lately extending disks in 2008 R2 on the fly through the GUI that I didn't feel the need to abide by the safe method.  Err... I mean... I just needed something good to write about so I decided to do this on purpose... yeah...
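
For anyone who wants the diskpart part of that procedure written down, here is a minimal Python sketch that generates the script text.  The function name build_diskpart_script and the drive letter are placeholders I picked for illustration.  The reboot, the service stops, and the VMware snapshot still have to happen first and are not covered here.

```python
import string


def build_diskpart_script(volume_letter: str) -> str:
    """Return diskpart script text that extends a volume, then its filesystem.

    Hypothetical helper: it only produces text and never calls diskpart.
    """
    letter = volume_letter.strip().rstrip(":").upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"expected a single drive letter, got {volume_letter!r}")
    lines = [
        f"select volume {letter}",  # pick the volume that needs the space
        "extend",                   # grow the volume into the new free space
        "extend filesystem",        # make the filesystem match the volume size
        "exit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_diskpart_script("E:"), end="")
```

Saved to a file (extend.txt, say; the name is only an example), the output can be run from an elevated command prompt with "diskpart /s extend.txt", or the same commands can simply be typed into an interactive diskpart session.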

No more renegade warrior for me, at least, not tonight :)  Special thanks to JK for somehow sensing I was in trouble, waking up and replying to an email, and being a calm voice with several good ideas (and for not rubbing it in my face that I didn't follow the safe method).

Note: From what I have seen, this only seems to be an issue on 2008 (non-R2).