Secure your data with a RAID or JBOD system

Introduction

You hear about RAID often, but what is it really? It is the acronym for “Redundant Array of Independent Disks”. In plainer language, it is a method for grouping several hard drives together to improve performance, storage size and data security.

For the record, when it was created in 1987, the acronym stood for “Redundant Array of Inexpensive Disks”, but the price of hard drives has dropped a lot since then.
RAID lets you build one virtual hard drive out of several physical hard drives. The virtual hard drive thus created is called a cluster (the term “array” is also common), you see the idea :-).

But there are several types of RAID, and they don’t all do the same thing: they don’t have the same fault tolerance, the same usable disk space or the same performance. And as if that were not complicated enough, there are systems that look like RAID but are not RAID, such as JBOD, which Microsoft’s Storage Spaces builds on. We’ll come back to that too.
The different types are called RAID levels; the most common are levels 0 to 6. Among them, levels 0, 1 and 5 are the standard ones.

Level 0: called striping
Level 1: called mirroring, shadowing or duplexing
Level 2: called striping with parity (obsolete)
Level 3: called disk array with bit-interleaved data
Level 4: called disk array with block-interleaved data
Level 5: called disk array with block-interleaved distributed parity
Level 6: called disk array with block-interleaved double distributed parity

It is also possible to combine several RAID levels with each other, for example RAID 0 and RAID 5; the resulting system is then called RAID 05.

Hardware RAID vs. software RAID

Before going into the detail of each level, it is important to introduce the concepts of hardware RAID, software RAID and pseudo-hardware RAID. Doing RAID is good, but knowing which method to use and why is better, and that question comes even before asking yourself which level you want to use.

Hardware RAID

In this case, there is a dedicated RAID card, a physical card just like a graphics card or a network card. This card is called a RAID controller and usually has its own processor and memory so that it can work autonomously. It can sit either in the PC case or in the drive enclosure. With this setup, the operating system sees the RAID as a single disk. The OS does not know, and does not need to know, what the cluster is made of; it only sees the overall performance of the cluster.

Benefits

Fault detection and hot-swapping of disks (i.e. replacing them without stopping the machine) are supported.
In a configuration with many redundant disks, the load on the system is lighter.
Diagnostics and maintenance are performed in the background by the controller, without using the PC’s system resources.

Disadvantages

RAID card incompatibility: if a card breaks down, it must be replaced by the same card with the same firmware. Otherwise, the data is lost, because another RAID card cannot read it. The older a card is, the harder it will be to replace in the event of a breakdown.
Entry-level cards have a low-power processor and cannot compete with software RAID on a recent computer with a powerful processor.
The price: 200 to 1000 euros for the RAID card.
Integration with the installed OS can be difficult, depending on the software provided by the RAID card manufacturer.
A RAID card is specialized for a single type of block device.

Pseudo-hardware RAID

Many motherboards have a built-in RAID controller that handles RAID 0 and 1 on IDE or SATA drives. But this is a misnomer: it is not really hardware RAID, but rather a disk controller with advanced functions. In practice it is software RAID in which the software routines that manage the RAID have been moved out of the operating system and into the motherboard.

Benefits

It is a form of software RAID that, being integrated into the motherboard, can access the devices before the OS starts. The OS can therefore be installed on a RAID device.

Disadvantages

The limitations of software RAID also apply to pseudo-hardware RAID.
The BIOS features are limited, and hardware fault handling is not excellent.
With hardware RAID, there is a risk that you will not be able to replace a failed card with an identical one. That risk is greater here, because the motherboard must be replaced by an identical one with the same BIOS. In some cases, even without any failure, simply updating the BIOS can cause problems. This is all the more true because RAID controller manufacturers have an interest in ensuring that their equipment causes no problems, whereas on a motherboard RAID is a marketing option and not the main product being sold.

Software RAID

Here, RAID is handled entirely by a software layer of the operating system. This layer sits between the hardware driver layer and the file system layer of the OS. It is in this category that you find the systems that resemble RAID without being RAID, such as the JBOD-derived system that Microsoft has called Storage Spaces since Windows 8. Today most consumer operating systems can manage RAID (the real one) in software: Windows since XP, Mac OS X and Linux. The system offering the most choice and possibilities for software RAID is Linux.

Benefits

The price: no additional equipment is needed, and many free solutions exist.
Administration is very flexible with this method.
Compatibility over time: since it does not depend on specific hardware, any OS on which the right version of the software is installed can read the RAID system. It is easier to reinstall the same software on a new PC than to hunt down and buy a 10-year-old RAID card.

Disadvantages

The method relies on the driver layer of the devices that make up the RAID volume. This layer may be imperfect and lack important features, such as hardware fault detection or hot-swapping of disks.
RAID management uses system resources: a bit of processor time and, above all, the system bus. This limitation is mainly felt on RAID systems with redundancy, where the same data has to be transferred several times.
Using RAID on the system disk is not always possible.

RAID levels

Standard levels

Level 0: called striping

The principle of RAID 0 is the striped volume: the data is divided into bands (stripes) of a fixed size that are spread over the disks, so the disks work in parallel and data is read and written faster. However, the disks should have the same capacity, because bands that have no equivalent on the other disks will not be used by the system.

The main flaw of this method is that the loss of a single disk results in the loss of all the data on every disk. So we get a significant gain in speed but no data security. Concretely, if I put two 2 TB disks in RAID 0, I will have a 4 TB RAID drive that behaves like a physical 4 TB drive: in case of failure, all 4 TB are lost, but in use the speed is, in theory, multiplied by 2 thanks to parallelization.
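
To make the band idea concrete, here is a small Python sketch of how a logical block could be placed on the disks of a RAID 0 cluster, and of the resulting capacity. The function names and the choice of one block per band are assumptions made for illustration; real implementations use larger bands and their own layouts.

```python
def raid0_locate(logical_block: int, disk_count: int) -> tuple[int, int]:
    """Return (disk index, block index on that disk) for a logical block.

    Assumption: one block per band, so consecutive blocks land on
    consecutive disks and can be accessed in parallel.
    """
    if disk_count < 2:
        raise ValueError("RAID 0 needs at least 2 disks")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % disk_count, logical_block // disk_count


def raid0_capacity(disk_sizes_tb: list[float]) -> float:
    """Usable capacity: each disk only contributes as much as the smallest one."""
    return min(disk_sizes_tb) * len(disk_sizes_tb)
```

With two disks, blocks 0, 2, 4, … go to the first disk and blocks 1, 3, 5, … to the second, which is why the loss of one disk ruins every file larger than a single band.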

Level 1: called mirroring, shadowing or duplexing

RAID 1 is the exact opposite of level 0. Security is at its maximum, but you gain neither write speed nor space. Indeed, the capacity of the cluster is that of the smallest disk, so here again it is advisable to use disks of the same size. The data is copied to every disk. With 4 disks in level 1, I have the storage capacity of a single disk, but its content exists in 4 identical copies. So unless the whole cluster breaks down at once, because of a power surge for example, the failure of a disk causes no data loss.
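
The mirroring behaviour described above can be sketched with a toy model in Python. The class `Raid1Mirror` and its methods are hypothetical names invented for this illustration; disks are modelled as simple dictionaries, which is of course not how a real RAID layer stores data.

```python
class Raid1Mirror:
    """Toy model of a RAID 1 cluster: every write goes to every working disk."""

    def __init__(self, disk_count):
        if disk_count < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        # Each disk is a dict {block number: data}; None marks a failed disk.
        self.disks = [{} for _ in range(disk_count)]

    def write(self, block, data):
        # The same data is copied to all disks that are still working.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def fail(self, index):
        # Simulate the breakdown of one disk.
        self.disks[index] = None

    def read(self, block):
        # Any surviving disk holds a full copy, so read from the first one.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise OSError("all disks have failed")
```

With 4 disks, the data stays readable after 3 failures; only the loss of the last copy is fatal.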

Level 5: called disk array with block-interleaved distributed parity

RAID 5 can be seen as a blend of level 0 and level 1 that combines performance and safety. However, this level requires at least 3 hard drives (preferably identical).

Like level 0, it is a band system in which the matching bands on each disk are linked together. The difference is that, for N disks, only N-1 of the linked bands contain data; the last one is called the parity band. To simplify the notion as much as possible: you know that the smallest unit of data is the bit, and that a bit can only have two values, 0 or 1. Now take a system with 4 identical disks.

The first block of the first 3 disks holds data that follow each other (as in level 0, the data is split into bands so that it can be parallelized). The first block of the last disk holds the result of the sum of the first 3 bands. Let’s say, completely arbitrarily, that the bands on the first 3 disks hold 1, 1 and 0. At the same location on the 4th disk, there will be 2 (1 + 1 + 0).

So if disk 1 breaks down, the system knows that the value x that was in the first band of the first disk satisfies x + 1 + 0 = 2, and therefore that there was a 1 on the broken disk. In the event of a disk failure, it can thus regenerate the lost disk on a new one from the other 3. The data is parallelized over N-1 disks, and the system tolerates the failure of 1 disk. The storage capacity is therefore that of N-1 disks.
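
The sum used above is a simplification. Real implementations generally compute parity with XOR (exclusive or), which fits in a single bit and can be undone in exactly the same way. Here is a minimal Python sketch of the idea; the function name `xor_blocks` and the one-byte blocks are assumptions chosen for illustration.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the bytes found at the same position on each disk.
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Same values as in the text: the first 3 disks hold 1, 1 and 0.
data = [b"\x01", b"\x01", b"\x00"]
parity = xor_blocks(data)  # stored on the 4th disk

# Disk 1 breaks down: rebuild its content from the 3 remaining disks.
rebuilt = xor_blocks([data[1], data[2], parity])
```

Because XOR is its own inverse, the same function serves both to compute the parity and to regenerate a lost band.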

If I have 4 disks of 2 TB each, I will have 6 TB of storage in RAID 5, and 2 TB that serve exclusively for parity. Those 2 TB are spread over the 4 disks.
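
To picture how the parity is spread over the disks, the sketch below prints which disk holds the parity band for each row of bands. The rotation rule used here (parity starts on the last disk and moves one disk to the left on each row) is an assumption for illustration; actual layouts vary between implementations, and `raid5_layout` is a hypothetical name.

```python
def raid5_layout(disk_count, band_count):
    """Return one row per band: 'P' on the parity disk, 'D' on data disks.

    Assumption: parity starts on the last disk and shifts one disk to the
    left on each successive row, wrapping around.
    """
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    rows = []
    for band in range(band_count):
        parity_disk = (disk_count - 1 - band) % disk_count
        rows.append(["P" if disk == parity_disk else "D"
                     for disk in range(disk_count)])
    return rows


for row in raid5_layout(4, 4):
    print(" ".join(row))
```

Over 4 rows with 4 disks, each disk holds the parity exactly once, so each disk ends up giving a quarter of its space to parity.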

Other levels of RAID

Level 2: called striping with parity (obsolete)

RAID 2 combines level 0 with the writing of an error-correcting code (ECC), something that is nowadays built directly into hard drive controllers. The level of security is good but the performance is poor, hence its abandonment.

Level 3: called disk array with bit-interleaved data

RAID 3 looks like level 5, except that instead of working in blocks it works in bytes. And here the parity (remember the RAID 5 example, with its 2 TB of parity spread over the 4 disks?) is stored on a single disk only.

Level 4: called disk array with block-interleaved data

RAID 4 is similar to both level 3 and level 5: like level 3, it has a dedicated, undistributed parity disk, but like level 5, it works per block. The consequence is that, compared with level 3, level 4 needs less synchronization between disks. Because it works with larger data units, it is more efficient. The layout of level 4 is otherwise the same as that of level 3.

Level 6: called disk array with block-interleaved double distributed parity

RAID 6 is similar to level 5, but holds twice as much parity, spread over the disks. That is, instead of one parity band as in level 5, here there are two. So the storage capacity is that of N-2 disks, versus N-1 disks for level 5. In return, the cluster tolerates the failure of two disks. Level 6 is therefore used when there are many disks, while level 5 is suitable for smaller clusters.
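
The capacity rules given for levels 0, 1, 5 and 6 can be summed up in a few lines of Python. This is only a sketch for identical disks. The function name `usable_capacity` is hypothetical, and the minimum disk counts for levels 0, 1 and 6 are assumptions added here (the article only states the minimum for level 5).

```python
def usable_capacity(level, disk_count, disk_size_tb):
    """Usable capacity in TB for a cluster of identical disks."""
    # Minimum number of disks per level (only the value for level 5
    # comes from the article; the others are assumptions).
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError("only levels 0, 1, 5 and 6 are handled here")
    if disk_count < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return disk_count * disk_size_tb        # everything is usable
    if level == 1:
        return disk_size_tb                     # one disk, N copies
    if level == 5:
        return (disk_count - 1) * disk_size_tb  # one disk's worth of parity
    return (disk_count - 2) * disk_size_tb      # level 6: two disks' worth


print(usable_capacity(0, 2, 2))  # two 2 TB disks in RAID 0
print(usable_capacity(5, 4, 2))  # four 2 TB disks in RAID 5
```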

Combined levels

Before we finish with the RAID levels, let’s see how several RAIDs can be combined. In the example here, we will create a RAID 10, i.e. a RAID 1 followed by a RAID 0. That is, the disks are grouped in level 1, forming several independent clusters, and it is these clusters that are put in level 0.
Now that you know the definition of each RAID level, what does RAID 10 correspond to? Let’s say we have 4 disks of 2 TB. We first make two clusters of two disks. Each cluster is in RAID 1, i.e. the disks are identical two by two. Then these two clusters are themselves linked together in RAID 0, i.e. by bands. What’s the point of that? With this construction, when I add data, it is distributed per band as in level 0, so we gain performance. But RAID 0 has no security. This is where the RAID 1 behind each “disk” of the RAID 0 cluster comes into play: it duplicates each band of the RAID 0 inside a RAID 1 cluster, so every piece of data exists in duplicate. The fault tolerance is 1 disk per RAID 1 cluster, or half of the disks in the best case.
All RAID levels can be combined on the same principle. Since a cluster behaves like a single physical disk, you can put several clusters in RAID.
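
The fault tolerance of a RAID 10 depends on which disks fail, not only on how many. The Python sketch below checks whether a cluster survives a given set of failures: it does as long as each RAID 1 group keeps at least one working disk. The function name and the disk labels are hypothetical.

```python
def raid10_survives(mirror_groups, failed_disks):
    """Return True while every RAID 1 group still has a working disk.

    mirror_groups: list of groups, each group being a list of disk names.
    failed_disks: iterable of the names of the disks that have failed.
    """
    failed = set(failed_disks)
    return all(
        any(disk not in failed for disk in group)
        for group in mirror_groups
    )


# 4 disks: two RAID 1 pairs, themselves linked in RAID 0.
groups = [["A1", "A2"], ["B1", "B2"]]

print(raid10_survives(groups, {"A1", "B2"}))  # one failure in each pair
print(raid10_survives(groups, {"A1", "A2"}))  # both disks of the same pair
```

Two failures are survivable when they hit different pairs, but two failures inside the same pair take the whole RAID 0 down with them.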

The JBOD alternative

JBOD, for “Just a Bunch Of Disks”, is a data organization method. The principle is to assemble disks together into a cluster, as RAID does. In its basic form, it is not RAID, because the data is not written to several disks simultaneously but to one disk after the other. It is a simple concatenation of disks. You could see this method as the opposite of partitioning a hard drive. With this approach, the disks do not have to be the same size. And if one disk breaks down, the others remain readable. There is no duplication and no parity, so the data on the broken disk is lost.
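
Concatenation is easy to picture in code. The Python sketch below finds which disk holds a given position in a JBOD volume made of disks of different sizes; `jbod_locate` is a hypothetical name and the sizes are arbitrary units chosen for the example.

```python
def jbod_locate(offset, disk_sizes):
    """Return (disk index, offset on that disk) for a position in the volume.

    The disks are simply placed end to end, in the order given.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for index, size in enumerate(disk_sizes):
        if offset < size:
            return index, offset
        offset -= size  # skip this disk and keep looking on the next one
    raise ValueError("offset is beyond the end of the volume")


# Three disks of different sizes form one volume of 350 units.
print(jbod_locate(120, [100, 50, 200]))
```

Unlike RAID 0, a given file usually sits on a single disk here, which is why the other disks stay readable when one of them fails.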

So why am I telling you about it? Because since Windows 8 (for the consumer versions of the OS), Microsoft has offered a new way of managing data that is based on JBOD: Storage Spaces.
With Storage Spaces, the disks can no longer be used individually once they are assembled. You can also create a space that is much larger than the physical capacity you actually have, thanks to dynamic data allocation. The system simply tells the user to add a disk when space starts running out.
The real improvement is that it is now possible to have the equivalent of RAID 1 and something close to RAID 5; if you’re still following, that means mirrored data duplication and parity-based redundancy. Note that for parity, the ratio is not the same as for RAID 5: here the usable storage is about 2/3 of the total physical space. It also evolves little by little as disks are added, and with 6 disks it becomes the equivalent of level 6. Note too that the system reports problems with any of the disks so that they can be replaced in case of failure.

Data lost anyway?

Problems can always happen, and data loss is one of them, even if you’re using RAID. For data recovery, there are two approaches: software or “physical”.

Recovery software, it must be said, is rather ineffective, unlike the labs that recover data “physically”. I say “physical” in the sense that they need the storage medium itself, so that they can take it apart and retrieve the data. This requires specialized equipment, but also clean rooms: work areas where air quality, temperature and humidity are controlled in order to protect the media from contamination and damage.

One of the leaders in data recovery is Kroll Ontrack, which operates at 35 sites worldwide.

They do data recovery on all media and operating systems (hard drives, servers, …), 24/7, with 25 years of experience, 50,000 data recoveries a year worldwide, new dedicated tools under continuous development, …

Conclusion

In conclusion, the advantage of RAID is that it reduces the risk of data loss when a disk fails. It can also improve performance by parallelizing disk access, and it is not necessarily expensive to set up.

However, keep in mind that RAID is not a cure-all: it will not protect you from a fire, a power surge, a nuclear bomb or anything else that can make all your disks fail at the same time. (Nor from human error: if you delete a file, it is gone for good, just as on a normal disk, and restoring it is even more complicated!) To make up for this, it is possible to spread a RAID over the network, but that solution is reserved for professionals.

For your general knowledge, just know that this means doing RAID between different servers on the network, servers that are not necessarily in the same building, or even on the same continent in the most extreme cases.

RAID is easiest to use on Linux, which handles it perfectly from the command line; some software tools are also available. Windows can handle it too, but since Windows 8 the simplest solution there is Storage Spaces, which is native to the system and provides the main common functions of RAID. But then again, as with RAID, it does not protect against real disasters such as fires.

So at our level, for really sensitive data that you cannot afford to lose, the best option is to have an online storage space as well; it is paid for by subscription but not overpriced. Just be careful where you store your data! Prefer storage over which you keep control, so that you can encrypt your data yourself, or a storage space at a non-profit ISP. We may come back to this last point in more detail in a future article.
