Lightning never strikes twice in the same place, except where IT is concerned. In my experience, if you have a single copy of data with multiple points of failure, it is pretty much Sod's Law that all of those points will be affected at around the same time, resulting in the loss of that data. I think it is safe to say that many people still see RAID arrays as an alternative to backups...
Over the last weekend, we have been working hard to recover data from a failed NAS system for one of our customers. We are now reaching the point where we have to advise the customer that the data may well be lost, unless they are prepared to spend potentially thousands of pounds on a very advanced physical recovery, which would still have only a sliver of a chance of success.
The data was archived data that had been stored away on a NetGear NAS system configured as a 2-drive X-RAID. The NAS drive was already on the customer's site when we began managing some of their systems, but it wasn't (and still isn't) one of the devices that we support. Unfortunately, just as we were onboarding the systems that we were managing, the RAID in the NAS drive failed. At the customer's request, I checked the status of the drives and told them that Disk 1 had failed, that the RAID was in a degraded state, and that the disk needed to be replaced as soon as possible. Again, unfortunately for the client, before they could arrange to replace the HDD, the NAS drive itself failed, with no power to the unit at all. Lightning had struck twice in the same place.
At first, this was a non-critical issue, since the archived data wasn't being used. That changed just before last week, when an old customer of theirs requested some support and their data was, of course, on the now-broken RAID. It never rains but it pours!
We then happily took on the task of attempting a data recovery, but first we had to identify exactly how the NetGear NAS system built its arrays, since proprietary RAID formats tend to be unique to each vendor. We found that NetGear units complicate matters a little further by having multiple versions of the operating system, each using a different file system format: one uses ext2 and the other BTRFS (and you cannot migrate an older NAS system's drives to a newer system running the newer OS just by sliding the disks in). Then there's the method used to build the RAID, either LVM or mdadm. Add the extra complexity of the proprietary X-RAID and how it writes data to the disk, and it becomes a whole host of fun.
After extensive research, we found that the parity data was written to the 2nd disk in the array and that the data itself was only written to the 1st disk. We then started to investigate the state of the failed disk and the NAS system itself. The NAS system was simple: there was no power to the device, so we couldn't switch it on, and we moved on to the drive. The drive was making a clicking noise, which usually indicates a physical failure of some sort, so we opened the top of the disk to check the condition of the platters. The image at the top of this article shows the exact drive that we were looking at. I sent the image to a few data recovery companies specialising in advanced physical recoveries, and they simply came back with "not possible".
This is a shame, since we were able to take a backup image of the second drive without too much trouble, using a mixture of Linux commands such as ddrescue, Windows applications, and macOS with Homebrew installed (to add some much-needed missing commands). Without the primary drive, however, we were out of luck, and we now need to go down another avenue (old backup tapes) to see if it yields any better results.
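For readers wondering what "taking an image" of a failing drive involves, the core idea behind a tool like ddrescue is to copy the disk block by block, carry on past unreadable areas rather than aborting, keep every later block at its original offset, and record which ranges failed so they can be retried later. GNU ddrescue itself is far more elaborate (it keeps a mapfile and makes several passes over the bad areas), so the Python below is only a minimal sketch of that idea. It assumes the source can be read as a seekable file object, and the names `image_device` and `FlakyDisk` are hypothetical, invented for illustration. For real recoveries, use ddrescue rather than this.

```python
import io
from typing import List, Tuple


def image_device(src, dst, size: int, block_size: int = 4096) -> List[Tuple[int, int]]:
    """Copy `size` bytes from `src` to `dst`, one block at a time.

    Unreadable blocks are written as zeros so that later data keeps its
    original offset in the image; their (offset, length) ranges are
    returned so a later pass could retry just those areas.
    """
    bad_ranges: List[Tuple[int, int]] = []
    offset = 0
    while offset < size:
        length = min(block_size, size - offset)
        try:
            src.seek(offset)
            chunk = src.read(length)
            if len(chunk) != length:
                raise OSError("short read")
        except OSError:
            chunk = b"\x00" * length
            # Merge with the previous bad range if the two are contiguous.
            if bad_ranges and sum(bad_ranges[-1]) == offset:
                start, prev_len = bad_ranges[-1]
                bad_ranges[-1] = (start, prev_len + length)
            else:
                bad_ranges.append((offset, length))
        dst.seek(offset)
        dst.write(chunk)
        offset += length
    return bad_ranges


class FlakyDisk:
    """Hypothetical stand-in for a failing disk: reads in bad blocks raise OSError."""

    def __init__(self, data: bytes, bad_blocks: set, block_size: int):
        self._buf = io.BytesIO(data)
        self.bad_blocks = bad_blocks
        self.block_size = block_size

    def seek(self, offset: int) -> None:
        self._buf.seek(offset)

    def read(self, n: int) -> bytes:
        if self._buf.tell() // self.block_size in self.bad_blocks:
            raise OSError("simulated read error")
        return self._buf.read(n)


if __name__ == "__main__":
    data = bytes(range(256)) * 64          # 16 KiB of test data, four 4 KiB blocks
    disk = FlakyDisk(data, {1, 2}, 4096)   # blocks 1 and 2 are "unreadable"
    image = io.BytesIO()
    print(image_device(disk, image, len(data)))  # prints [(4096, 8192)]
```

Zero-filling rather than skipping the bad blocks is the important design choice here: file systems and RAID metadata are located by absolute offset, so an image with gaps closed up would be unusable even where its data is intact.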
One of the lessons learned here is that RAID systems are not backups and never will be, especially a 2-disk array that isn't even in a standard RAID 1 configuration. Physical failures can still cause data loss on a RAID system: multiple disk drives failing at the same time (remember that most RAID systems write data to all of the drives at the same time, and are usually populated with all of their disks at once), failed backplanes, failed RAID controllers, power failures and so on.
The biggest lesson learned here, and the one the title refers to, is to back up the data elsewhere, whether you think you need it or not. If you haven't deleted it, there's a reason why not, so it must have some value to you. Don't rely on the RAID system as your backup: back up the data either to another physical drive that is taken offsite or, better still, if your bandwidth allows, to the cloud.
Journey IT Services can provide a very cost-effective cloud backup strategy, which also keeps your data fully encrypted in the cloud to protect it while offsite. If you would like to find out more about how you can avoid data loss, get in touch and we will be more than happy to help.