====== Data recovery basics ======
===== Abstract =====
Being able to recover data from a formatted hard drive is quite remarkable. Whether or not you chose to delete your data, it is often still there, even if it is not easy to read.
The purpose of this paper is to understand how data recovery works. To do so, we explain its steps and some of the techniques involved. We first look at how an incident that calls for data recovery can occur. Then, we describe methods that computer forensics experts can use to find traces. Finally, we present methods to permanently erase data from hard drives so that no traces are left.
Such incidents have many causes, but human error is by far the most common. In his papers, Peter Gutmann explained how such data could still be read from the hard drives of his time, and gave advice on how to erase data securely.
This paper is also a warning to anyone selling, //via// eBay, hard drives that once contained personal data.
**Keywords:** recovery; data; traces
===== Introduction =====
Data recovery consists of retrieving lost digital information. The loss may have various causes, such as human error or hardware failure. How difficult the recovery is varies from case to case, and it can be a real challenge.
What is at stake depends on the case. For a private person, the main reason for wanting the data back is personal attachment to the files (holiday photos and videos, for example). For a company, the stakes are much higher: data loss has been ranked as the second biggest threat to cloud security, and a widely quoted figure states that 60% of companies that lose their data shut down within 6 months.
When a file is accidentally deleted, it is not actually removed from the computer. What is deleted is the file system entry that points to the file's data. The space the file occupied is marked as available to be written again, but the data itself stays on the disk until something else is stored in the same place. This means that the data can sometimes be restored, even months later, as long as no other file has overwritten it.
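To make the idea of deletion concrete, here is a minimal sketch of a hypothetical toy file system. The `ToyDisk` class, its block size and its allocation rule are invented for illustration; real file systems such as FAT or NTFS are far more complex. Deleting a file only removes its directory entry and marks its blocks as free, so reading the raw blocks still returns the data until a new file reuses them.

```python
BLOCK_SIZE = 4  # hypothetical block size, tiny to keep the example readable


class ToyDisk:
    """Hypothetical, simplified file system: a directory maps names to blocks."""

    def __init__(self, n_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(n_blocks)]  # raw content
        self.free = set(range(n_blocks))                             # allocation map
        self.directory = {}                                          # name -> (blocks, length)

    def write(self, name: str, data: bytes) -> None:
        chunks = [data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
                  for i in range(0, len(data), BLOCK_SIZE)]
        targets = sorted(self.free)[:len(chunks)]  # reuse the lowest free blocks first
        if len(targets) < len(chunks):
            raise OSError("disk full")
        for block, chunk in zip(targets, chunks):
            self.blocks[block] = chunk
            self.free.discard(block)
        self.directory[name] = (targets, len(data))

    def delete(self, name: str) -> tuple:
        # Deleting only drops the directory entry and marks the blocks as free;
        # the bytes stored in self.blocks are left untouched.
        entry = self.directory.pop(name)
        self.free.update(entry[0])
        return entry  # the "path" an undelete tool tries to reconstruct

    def read_raw(self, blocks: list, length: int) -> bytes:
        # Read blocks directly, bypassing the directory, as recovery software does.
        return b"".join(self.blocks[b] for b in blocks)[:length]


disk = ToyDisk(n_blocks=8)
disk.write("photo.jpg", b"holiday!")
blocks, length = disk.delete("photo.jpg")
print(disk.read_raw(blocks, length))   # b'holiday!'  (still on the disk)
disk.write("new.txt", b"XXXX")         # reuses the first freed block
print(disk.read_raw(blocks, length))   # b'XXXXday!'  (partly overwritten)
```

The second read shows why recovery should start as soon as possible: every new write may land on the freed blocks.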
===== 1. What to do to avoid such a loss =====
**i) Dropped/damaged drive**
It is highly recommended not to power on a drive after an accident such as a drop: doing so may damage it even more.
**ii) Data backup**
Given what it can take to get important data back, what can we do to avoid losing it in the first place?
Obviously, we should always keep a copy of at least the most important data. One option is a secure online backup. For example, the company Zodiac Aero Duct Systems chose to back up all its files online every night to avoid a major loss: all its computers stay on overnight so that the backup can run, which protects the data against loss or against encryption by attackers. Since this task is repetitive and tedious, automating it helps.
This is not the only option. Data can also be copied to external hard drives, although this option may be less suited to companies.
The more often you back up your files, the safer you are. If you back up only every 3 years and a crash or a cyberattack occurs, you will lose far more data than if you back up every week.
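Automating the backup can be as simple as a scheduled script. The sketch below is a hypothetical Python example, not the setup used by the company mentioned above: it copies every file modified since the previous run into a new timestamped folder, and could be launched every night by a scheduler such as cron or the Windows Task Scheduler. Remembering the time of the previous run and sending the copy to an off-site or online location are left out.

```python
import shutil
import time
from pathlib import Path


def incremental_backup(source: Path, backup_root: Path, since: float) -> list:
    """Copy every file under `source` modified after `since` (a Unix timestamp)
    into a new timestamped folder under `backup_root`; return the copied paths."""
    target = backup_root / time.strftime("%Y%m%d-%H%M%S")
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > since:
            dest = target / path.relative_to(source)   # keep the folder structure
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)                   # copy2 also keeps timestamps
            copied.append(dest)
    return copied
```

For example, `incremental_backup(Path("documents"), Path("backups"), since=last_run)` copies only what changed since `last_run`, which keeps nightly backups fast enough to run every day.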
**iii) Machine protection**
Using high-quality power cables, together with uninterruptible power supplies (UPS) or surge protectors, helps protect machines from the electrical surges that can damage them. Needless to say, antivirus software is essential to protect against attacks. Changing passwords regularly may also help.
**iv) Maintenance**
Backing up data is essential in any business. However, its benefits are limited if proper IT maintenance is not carried out. Unmaintained systems, such as operating systems that are no longer updated, obsolete software or outdated antivirus, are so many entry points for viruses and hackers. This is why some companies offer IT maintenance as a service to others.
===== 2. Steps for data recovery =====
Whatever the storage medium, the work can be divided into four steps.
**i) Repairing the storage medium**
This step is itself divided into two parts: the initial evaluation and the repair proper, which is quite expensive because it relies on hardware solutions. It is generally carried out by professionals. Knowing what caused the failure helps determine which kinds of damage may have occurred. The work is done with great care to avoid dust and further damage, for example under a laminar flow hood that provides a Class 100 clean environment. Dust particles too small to be seen with the naked eye can prevent the heads from reading properly, or even damage them, so a proper working environment is necessary to keep the chances of success as high as possible.
A few things should be kept in mind. First, this step may fail, even when carried out by professionals. Second, some failures require more than simply replacing the damaged part. For example, even if the printed circuit board (PCB) is the only faulty part, swapping it for another board will not, on its own, make the data readable again: for about 10 years, controller boards have stored unique adaptive data that has to match the data on the platters.
**ii) Bit-by-bit copy of the medium**
This requires either a hardware solution (a DeepSpar disk imager or similar) or a software one (dd_rescue, for example). Some options are better suited than others depending on the storage medium. The purpose of this step is to work on a copy, so that the original medium is used as little as possible and the risk of destroying the data completely is reduced.
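The core idea of an imaging tool such as dd_rescue is to keep copying when a sector cannot be read, instead of aborting. The following minimal Python sketch assumes the source behaves like a file object (`seek`/`read`) that raises `OSError` on an unreadable block; `FlakyDisk` is a hypothetical stand-in for a failing drive. Real tools work on raw device files, retry intelligently and keep a log of bad areas, which this sketch only approximates with a list of offsets.

```python
import io


def image_device(src, dst, size: int, block_size: int = 512) -> list:
    """Copy `size` bytes from `src` to `dst` block by block.
    Unreadable blocks are written as zeros and their offsets are returned."""
    bad = []
    for offset in range(0, size, block_size):
        length = min(block_size, size - offset)
        src.seek(offset)
        try:
            data = src.read(length)
        except OSError:
            data = b""
            bad.append(offset)
        dst.seek(offset)
        dst.write(data.ljust(length, b"\0"))   # pad short or failed reads with zeros
    return bad


class FlakyDisk:
    """Hypothetical failing drive: reading at an offset in `bad_offsets` fails."""

    def __init__(self, data: bytes, bad_offsets: set) -> None:
        self.data, self.bad_offsets, self.pos = data, bad_offsets, 0

    def seek(self, offset: int) -> None:
        self.pos = offset

    def read(self, n: int) -> bytes:
        if self.pos in self.bad_offsets:
            raise OSError("simulated I/O error")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk


disk = FlakyDisk(b"A" * 512 + b"B" * 512 + b"C" * 512, bad_offsets={512})
copy = io.BytesIO()
print(image_device(disk, copy, size=1536))   # [512]
```

The returned list of bad offsets plays the role of the log that lets a later pass come back to the damaged areas.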
**iii) The actual data recovery**
Files are then recovered with data recovery software, the most widely sold type of solution online. Some tools only repair partitions, while others rebuild the lost data itself. Such a tool scans the disk from which data was erased to locate recoverable data, then pieces it back together, whatever the file type (.jpg, .zip…) and the storage medium (SD card, USB drive…). Some tools can even recreate the original folder structure.
Needless to say, no such software is perfect. If the data we are looking for has been overwritten or otherwise corrupted, it will most probably not be recovered.
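One technique many recovery tools use when the file system entries are gone is file carving: scanning the raw bytes for known file signatures. The sketch below carves JPEG images by looking for the start-of-image marker (`FF D8 FF`) and the next end-of-image marker (`FF D9`). It assumes each file is stored contiguously and ignores complications such as fragmented files or thumbnails embedded in the image, which carry their own markers; the `max_size` limit is an arbitrary choice.

```python
JPEG_START = b"\xff\xd8\xff"   # SOI marker followed by the first byte of the next marker
JPEG_END = b"\xff\xd9"         # EOI marker


def carve_jpegs(raw: bytes, max_size: int = 10_000_000) -> list:
    """Return the byte ranges of `raw` that look like complete JPEG files."""
    found = []
    start = raw.find(JPEG_START)
    while start != -1:
        end = raw.find(JPEG_END, start + len(JPEG_START), start + max_size)
        if end == -1:                       # no end marker: truncated or not a JPEG
            start = raw.find(JPEG_START, start + 1)
            continue
        found.append(raw[start:end + len(JPEG_END)])
        start = raw.find(JPEG_START, end + len(JPEG_END))
    return found


disk_image = (b"junk" + b"\xff\xd8\xff\xe0photo1\xff\xd9"
              + b"\x00" * 8 + b"\xff\xd8\xff\xdbphoto2\xff\xd9")
print(len(carve_jpegs(disk_image)))   # 2
```

Because carving ignores file names and folders, it works even after a quick format, but it cannot restore the original folder structure on its own.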
**iv) Repairing corrupt or unusable files**
This last phase is done either by hand or with software, and it is a very important step for computer forensics specialists searching for traces. Two types of failure can be distinguished: software failures and hardware failures. The first calls for a software solution, the second for a hardware one. Hardware failures often allow partial data recovery, but the attempt may end with the destruction of the storage medium. Well-designed software recovery tools, by contrast, do not alter the medium. This is why the initial diagnosis is essential: it establishes whether the problem was caused by damaged hardware or by software (assuming a human mistake is not to blame). Applying a software approach to a hardware failure may make the attempt useless and, worse, make any later recovery impossible.
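Repairing a file by hand often starts with simple structural checks. As a hypothetical example, the sketch below triages a recovered JPEG: a file that does not start with the start-of-image marker is not treated as a JPEG at all, and one that lacks the end-of-image marker (typically because it was truncated) gets that marker appended, which may let a viewer display the part that survived. This is only a heuristic; real repair tools analyze the file's internal segments.

```python
JPEG_SOI = b"\xff\xd8"   # start-of-image marker
JPEG_EOI = b"\xff\xd9"   # end-of-image marker


def triage_jpeg(data: bytes) -> tuple:
    """Return (status, data), where status is 'not-jpeg', 'ok' or 'patched'."""
    if not data.startswith(JPEG_SOI):
        return "not-jpeg", data            # needs another tool or manual analysis
    if data.endswith(JPEG_EOI):
        return "ok", data
    return "patched", data + JPEG_EOI      # truncated: close the image


for sample in (b"\xff\xd8\xff\xe0...\xff\xd9", b"\xff\xd8\xff\xe0cut", b"PK\x03\x04"):
    print(triage_jpeg(sample)[0])          # prints ok, then patched, then not-jpeg
```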
===== 3. Recovering data to find traces =====
**i) On magnetic media**
Magnetic force microscopy (MFM) uses a sharp magnetized tip placed very close to the platter to be imaged. The tip's magnetic field interacts with that of the recorded pattern. Scanning the tip across the disk and measuring the interaction force produces an image of the data stored on it.
Magnetic force scanning tunneling microscopy (STM) uses a nickel probe tip held at a small bias potential, so that electrons from the surface under test can tunnel across the gap to the tip. The probe scans the whole surface while a feedback system adjusts its height to keep the current constant. The image is then generated as with MFM.
The advantages of MFM are that it does not require removing the protective layer of the disk and that results are quick to obtain. With the more expensive instruments, the process can be automated, and a whole disk image could theoretically be captured.
According to Gutmann's paper, several thousand scanning probe microscopes (SPMs, the family that includes MFM and STM) were in use back in 2009. As with MFM, the recovery process can be automated, with a PC acting as the controller.
When data is written to a magnetic disk, the head sets the polarity of the magnetic domains. However, since the head sets most, but not all, of them, writing a 1 over a 1 does not have exactly the same effect as writing a 1 over a 0: the first gives a signal closer to 1.05, the second closer to 0.95. This is partly because the head never writes at exactly the same position as before. With the help of a high-quality digital sampling oscilloscope, it is therefore possible to recover one or two layers of overwritten data. Software computes the ideal signal expected from the current data and subtracts it from the signal actually read: the 1.05 and 0.95 values become 0.05 and -0.05, which makes the previous layer of data readable. With more recent hard drives, this method is outdated.
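The subtraction step can be illustrated with the idealized numbers from the paragraph above. This minimal sketch assumes that a bit is read as its nominal value (0 or 1) shifted by +0.05 if the previous bit was a 1 and by -0.05 if it was a 0. The symmetric behavior for zeros is an assumption extending the text's example, and real signals are noisy analog waveforms, not clean numbers.

```python
def simulate_read(current: list, previous: list, residue: float = 0.05) -> list:
    """Idealized model (assumption): each bit reads as its nominal value,
    shifted by +residue if the old bit was 1 and by -residue if it was 0."""
    return [c + (residue if p else -residue) for c, p in zip(current, previous)]


def recover_previous(signal: list, current: list) -> list:
    """Subtract the ideal signal of the current data; the sign of what is
    left indicates the bit that was there before."""
    return [1 if s - c > 0 else 0 for s, c in zip(signal, current)]


current = [1, 1, 0, 0]
previous = [1, 0, 1, 0]
signal = simulate_read(current, previous)    # about [1.05, 0.95, 0.05, -0.05]
print(recover_previous(signal, current))     # [1, 0, 1, 0]
```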
**ii) Recovering data from RAM**
Data in RAM does not become unreadable the instant power is removed. Both SRAM and DRAM retain traces of the data they held after the system is shut down. Moreover, cooling the chips (down to -60°C) extends the time during which the data remains readable.
**a. SRAM **
Storing the same data for a long time alters an SRAM cell's preferred power-up state: the cell tends to power up in the state it held before it was switched off. On the oldest systems, this effect may even last for days.
**b. DRAM **
The charge stored in a DRAM cell's capacitor naturally leaks away over time. However, the electric field stresses the capacitor dielectric and causes small changes in the oxide. This stress can shift the threshold voltage, and a sodium deposit may build up on the negative pole. The effect of this stress can be measured through the cell itself.
===== 4. How to securely erase data =====
**i) From hard drives in general**
Several options are available to erase data. The simplest is formatting, which can be quick or normal (full). With a quick format, only the file system metadata is erased, so the files on the disk remain readable with recovery tools.
To avoid this, it is recommended to use a normal format or a zero-fill, which overwrites every sector with zeros. As discussed in the previous section, this is safe enough against an average user, but the data may still be reachable by a skilled and determined attacker.
This is why a US government standard recommends overwriting every bit with zeros, then with ones, and finally with random data. Another solution is to zero-fill the same medium several times.
The most effective solution remains to physically destroy the medium: the data is then permanently unreachable.
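The zero-fill and the three-pass overwrite described above can be sketched in a few lines of Python. This is an illustration under strong assumptions: it overwrites a single file in place, which on SSDs (because of wear leveling) and on journaling or copy-on-write file systems does not guarantee that the old physical blocks are overwritten. Whole-device erasure tools, or physical destruction, are the appropriate choices for real media. The pass names are hypothetical labels.

```python
import os

# Hypothetical pass names mapped to functions producing n bytes of each pattern.
PATTERNS = {
    "zeros": lambda n: bytes(n),
    "ones": lambda n: b"\xff" * n,
    "random": os.urandom,
}


def overwrite_file(path: str, passes=("zeros", "ones", "random"), chunk: int = 1 << 20) -> list:
    """Overwrite the contents of `path` in place, once per pass; return the passes done."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for name in passes:
            f.seek(0)
            remaining = size
            while remaining:
                n = min(chunk, remaining)
                f.write(PATTERNS[name](n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())   # ask the OS to push this pass to the device
    return list(passes)
```

A single zero-fill corresponds to `overwrite_file(path, passes=("zeros",))`; the default reproduces the zeros, ones, random sequence.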
**ii) Gutmann's method for magnetic disks**
In his paper, Gutmann proposed an overwriting scheme to counter the methods that make it possible to read old data on a magnetic disk. The idea is to flip the magnetic domains back and forth so as to saturate the recording layer to the greatest possible depth and erase all traces of the data once stored there. However, high frequencies only affect the surface of the recording layer, so the lowest possible frequency should be used. This is increasingly difficult because, as manufacturers increase drive capacity, drives use higher and higher frequencies.
Because of the encoding methods drives use to keep the head from losing track of its position, it is not possible to simply overwrite everything with zeros, then with ones, as many times as possible: the bits actually written to the platter differ from the user data. Run-length limited (RLL) codes prevent the peaks of the analog signal from overlapping and set a maximum number of consecutive zeros; without this limit, synchronization could be difficult.
The first of these codes is MFM (modified FM), in which each user data bit is paired with a clock bit. It is called (1,3) RLL because two consecutive 1 bits are always separated by at least one and at most three 0 bits. Intersymbol interference is mostly avoided, but this code limits drive capacity, which led to (2,7) and then (1,7) RLL.
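The (1,3) constraint of MFM can be checked with a short sketch. The encoder below follows the usual MFM rule (each data bit is preceded by a clock bit that is 1 only between two 0 data bits), assuming the bit written just before the sequence was a 0. The checker verifies that the runs of 0s between consecutive 1s stay within the (d, k) bounds.

```python
def mfm_encode(data: list, prev: int = 0) -> list:
    """MFM: output a clock bit then the data bit; the clock bit is 1 only
    when both the previous and the current data bits are 0.
    `prev` is the data bit written just before (assumed 0 by default)."""
    out = []
    for bit in data:
        clock = 1 if prev == 0 and bit == 0 else 0
        out += [clock, bit]
        prev = bit
    return out


def satisfies_rll(bits: list, d: int, k: int) -> bool:
    """True if every run of 0s between two consecutive 1s has length in [d, k]."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    gaps = [b - a - 1 for a, b in zip(ones, ones[1:])]
    return all(d <= g <= k for g in gaps)


encoded = mfm_encode([1, 0, 1, 1, 0, 0])
print(encoded)                           # [0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
print(satisfies_rll(encoded, 1, 3))      # True
print(satisfies_rll([1, 1, 0, 1], 1, 3)) # False: raw data breaks the d = 1 rule
```

The price of the constraint is visible in the output length: every data bit costs two channel bits, which is why denser codes such as (2,7) RLL followed.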
Apart from these three codes, Gutmann's method also targets a fourth one: Partial-Response Maximum-Likelihood (PRML) encoding. PRML uses digital filtering techniques to shape the read signal so that it exhibits the desired frequency and timing characteristics, followed by maximum-likelihood digital data detection to determine the most likely sequence of data bits written on the disk. The only thing that can be done against it is to overwrite with random patterns, because “the processing inside the drive is too complex to second-guess”.
The result of all this is a table of 35 overwrite passes, shown below this paragraph, that Peter Gutmann recommends writing to the disk. Whatever encoding the drive uses, the original data should then be unreachable. To make the method stronger, the passes can be performed in a random order, and the erasure can be improved by adding random passes before and after the fixed patterns. However, Peter Gutmann himself acknowledges that this method is outdated given how today's drives work.
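The structure of the 35-pass scheme can be expressed as data: four passes of random data, 27 fixed patterns in a random order, then four more random passes. The sketch below builds such a schedule. The 27 patterns are transcribed from Gutmann's published table as an assumption and should be checked against the paper, and the code only plans the passes (writing them would work like any other overwrite). Using `random.SystemRandom`, which draws on the operating system's randomness, is a design choice here so that the order cannot be predicted.

```python
import random

# The 27 fixed patterns (passes 5 to 31 of Gutmann's table), each given as the
# repeating byte unit that fills the sector. Transcribed as an assumption.
FIXED_PATTERNS = (
    [b"\x55", b"\xaa"]
    + [b"\x92\x49\x24", b"\x49\x24\x92", b"\x24\x92\x49"]
    + [bytes([0x11 * i]) for i in range(16)]          # 0x00, 0x11, ..., 0xFF
    + [b"\x92\x49\x24", b"\x49\x24\x92", b"\x24\x92\x49"]
    + [b"\x6d\xb6\xdb", b"\xb6\xdb\x6d", b"\xdb\x6d\xb6"]
)
RANDOM_PASS = None   # marker for a pass of random data


def gutmann_schedule(rng: random.Random) -> list:
    """Return the 35 passes: 4 random, the 27 fixed patterns shuffled, 4 random."""
    middle = list(FIXED_PATTERNS)
    rng.shuffle(middle)
    return [RANDOM_PASS] * 4 + middle + [RANDOM_PASS] * 4


schedule = gutmann_schedule(random.SystemRandom())
print(len(schedule))   # 35
```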
**iii) From RAM**
First, the time during which data remains in RAM can be shortened by heating the system (up to 40°C).
Overwriting the memory many times is not as effective as it is for magnetic disks: the oxide deposit would just be slightly strengthened or weakened depending on what is stored. The longer new data is stored in RAM, the harder it becomes to recover the old data.
According to Peter Gutmann, the best erasure is obtained by exposing the oxide to as much stress as possible, at the highest temperature and for the longest time possible. However, this may damage the RAM and shorten its lifespan.
To avoid the problem of DRAM data retention, he advises constantly flipping the bits so that a memory cell never holds a charge for too long, and the data therefore cannot be read after the RAM is powered off. This is impractical for the whole of RAM, but it should be used for very sensitive data such as encryption keys.
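The bit-flipping idea can be sketched as a small container that keeps a secret in inverted or normal form and tracks which. This is only a conceptual illustration: Python gives no control over where data sits in physical memory and may make copies (for example when `value()` builds a new `bytes` object), so a real implementation would be written in a lower-level language and call `flip()` periodically from a timer. The `FlippingSecret` class is hypothetical.

```python
class FlippingSecret:
    """Stores a secret whose bits are inverted on each flip(), so that the
    buffer does not keep the same charge pattern for long."""

    def __init__(self, secret: bytes) -> None:
        self._buf = bytearray(secret)
        self._inverted = False

    def flip(self) -> None:
        for i in range(len(self._buf)):
            self._buf[i] ^= 0xFF            # invert every bit in place
        self._inverted = not self._inverted

    def value(self) -> bytes:
        if self._inverted:
            return bytes(b ^ 0xFF for b in self._buf)
        return bytes(self._buf)


key = FlippingSecret(b"secret key")
key.flip()
print(key.value())   # b'secret key'
```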
===== 5. Case study: recovering data from a damaged hard drive =====
The hard drive in this case, housed in an external enclosure, started making strange noises after its owner dropped it. As in most cases, the cause was human. Because of the noise, the owner switched the drive off immediately and sent it to a specialized firm. His behavior was not ideal, though: he should not have powered the drive on after dropping it. The main reason is that the gap between the heads and the platters is tiny, so a drop can make the heads strike the platters. Sometimes the spindle hub bearing is damaged and the platters can no longer spin. The heads may also slam into the platters and damage them, ending up mangled at their tips.
Because this is a high-capacity drive, the heads are even closer to the platters, to allow more data to be stored on platters of the same size. Here, the gap is around 5 nanometers (about 12,000 times thinner than a human hair!).
To make this recovery possible, the expert used a donor ("parts") drive. Its parts must match the original drive: ideally, they should come from a drive built at the same manufacturing site and with a similar serial number. Without suitable parts, the chances of recovery are lower.
**i)** The first step for the experts is the initial evaluation: opening the enclosure and looking inside to find the problem. As mentioned, the drive should not be powered on, to avoid damaging the platters further. The problem can take several forms. For example, the motor could be destroyed, preventing the platters from rotating. The heads can also come apart and stick to the platter surface, which makes an audible sound; a damaged spindle bearing produces a similar noise. Finally, the drop could have broken the heads and made them slam into the edge of the platter.
In this case, the drop damaged the heads by slamming them into the edge of the platter. A photo of the most damaged head can be seen below.
After detecting this, the alignment of the tracks must be checked, because the owner powered the drive on after the drop. If the rotation is warped, the tracks become unreadable because the heads jump from one track to another, and several sets of heads are then needed. How much data can be recovered depends on the damage the heads caused to the platters while the drive was powered on.
Once the customer has agreed to the recovery and knows how much it will cost, the next step is the actual repair of the drive. First, every damaged part must be removed, and the whole drive inspected to make sure all debris has been removed.
The drive is then disassembled and the damaged components replaced, with the greatest care. Parts used for one recovery can be reused for another one later.
**ii)** Once the drive was reassembled, the expert moved on to the next step: copying the repaired drive. In this case, it did not work on the first try because of the platter damage caused when the drive was powered on after the drop. Once working heads are in place, the software initializes logical block addressing (LBA), which amounts to reading how the sectors are laid out on the drive.
The software then locates the partitions and reports the number of sectors in each. In cases like this one, bad sectors can make the drive reboot automatically, which is what happened here. For this reason, the expert first skipped some sectors to copy the undamaged areas. After another head replacement, the copy succeeded, and all that remained was to transfer the data to another drive.
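The strategy of copying the healthy areas first and returning to the problem areas later can be sketched as two passes. In this hypothetical Python example, `read_block(i)` stands for reading block `i` from the drive and raises `OSError` when the block cannot be read. After an error, the first pass jumps `skip` blocks ahead (an arbitrary value) to avoid hammering a damaged area, and the second pass retries the skipped blocks one by one. Tools such as GNU ddrescue follow a similar multi-phase approach, with far more refinements.

```python
def two_pass_copy(read_block, n_blocks: int, skip: int = 64):
    """Return (image, failed): the blocks read, keyed by index, and the
    indices that could not be read even on the second pass."""
    image, skipped = {}, []
    i = 0
    while i < n_blocks:                     # pass 1: grab the easy data first
        try:
            image[i] = read_block(i)
            i += 1
        except OSError:
            stop = min(i + skip, n_blocks)
            skipped.extend(range(i, stop))  # come back to this area later
            i = stop
    failed = []
    for i in skipped:                       # pass 2: retry skipped blocks one by one
        try:
            image[i] = read_block(i)
        except OSError:
            failed.append(i)
    return image, failed


def read_block(i: int) -> bytes:
    """Hypothetical drive on which block 5 is unreadable."""
    if i == 5:
        raise OSError("bad sector")
    return bytes([i])


image, failed = two_pass_copy(read_block, n_blocks=10, skip=3)
print(failed)   # [5]
```

Blocks 6 and 7 are skipped during the first pass and recovered during the second, which mirrors the expert's choice to secure the undamaged areas before working on the difficult ones.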
===== Conclusion =====
Digital information can be lost or hidden. Since such an incident can seriously affect a company, for example, and since finding deleted or hidden data can change the outcome of court cases, solutions to recover it are needed.
The operation is delicate, but data can be recovered from damaged hard drives, and even from RAM if it was stored there long enough. Such a loss can also be avoided by following a few useful tips that most people know, even if not everyone applies them.
One important question remains: is it possible to truly erase data from a storage medium? Nobody wants their ID scan to end up in the hands of a malicious stranger. Fortunately, there are methods that make data very hard to recover.
===== Sources =====
[1] https://www.stellarinfo.com/blog/the-basics-of-computer-data-recovery/
[2] https://www.r-studio.com/file-recovery-basics.html
[3] http://www.recovermyfiles.com/data-recovery-help/fundamentals.php
[4] https://www.fer.unizg.hr/_download/repository/ComFor-File_system_forensics-v10-pp-slides.pdf
[5] https://www.toptenreviews.com/how-does-data-recovery-software-work
[6] https://vimeo.com/140252745?width=853&height=480
[7] https://pdfs.semanticscholar.org/cf95/bc92261ea1b767db3c231fa57b508fc2716a.pdf
[8] https://web.archive.org/web/20070221201213/http://www.cypherpunks.to/~peter/usenix01.pdf