cross-posted from: https://beehaw.org/post/24650125
Because nothing says “fun” quite like having to restore a RAID that just saw 140TB fail.
Western Digital this week outlined its near-term and mid-term plans to increase hard drive capacities to around 60TB and beyond with optimizations that significantly increase HDD performance for the AI and cloud era. In addition, the company outlined its longer-term vision for hard disk drives’ evolution that includes a new laser technology for heat-assisted magnetic recording (HAMR), new platters with higher areal density, and HDD assemblies with up to 14 platters. As a result, WD will be able to offer drives beyond 140 TB in the 2030s.
Western Digital plans to begin volume production of its first commercial HAMR hard drives next year, starting at capacities of 40TB (CMR) or 44TB (SMR) in late 2026, with production ramping in 2027. These drives will use the company's proven 11-platter platform with high-density media, as well as HAMR heads with edge-emitting lasers that heat an iron-platinum (FePt) alloy on top of the platters to its Curie temperature (the point at which its magnetic properties change), reducing its magnetic coercivity before data is written.



Holy fuck can you imagine how long it would take to re-stripe a failed drive in a z2 array 😭
My Z2 had a drive failure recently, with 4TB drives. Took me almost 3 days to resilver the array 😅. Fortunately I had a hot spare set up, so the resilver started as soon as the drive failed, but now a second drive is showing signs of failing soon, so I had to pay the AI tax (168€) to get one ASAP (arriving Monday), as well as a second, cheaper one (around 120€), which won't arrive until the end of April.
Not a clue. Care to eli5?
When you are running a server just to store files (a NAS), you generally set it up so multiple physical hard disks are joined together into an array, so if one fails, none of the data is lost. You can replace a failed drive by taking it out and putting in a new working drive, and then the system has to copy all of the data over from the other drives. This process can take many hours even with the 10-20 TB drives you get today, so doing the same thing with a 140 TB drive would take days.
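A rough back-of-the-envelope, assuming the rebuild has to read/write the whole drive at a sustained ~250 MB/s (that throughput figure is a guess, and real resilvers on a busy pool run slower):

```python
# Rough rebuild-time estimate: time to read/write a whole drive at a
# sustained rate (the 250 MB/s figure is an illustrative guess, not a spec).
def rebuild_hours(capacity_tb: float, throughput_mb_s: float = 250) -> float:
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / throughput_mb_s / 3600

for size_tb in (4, 20, 140):
    print(f"{size_tb} TB at 250 MB/s ~ {rebuild_hours(size_tb):.0f} hours")
# 4 TB ~ 4 h, 20 TB ~ 22 h, 140 TB ~ 156 h (6+ days),
# before any real-world overhead from a pool that is still serving requests.
```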
Thanks! So, why does it matter? It's a server, you can have it do the job unattended. Or does it affect other services so you're unable to use anything else until it finishes?
It will take a long time, and while it runs it will use a lot of resources, so the server can get bogged down. It is also a dangerous time for a NAS, because if you have a drive down and another drive dies, the whole pool can collapse. The process involves reading every bit on every drive, so it does put strain on everything.
Some people will go out of their way to buy drives from different manufacturing batches so if one batch has a problem, not all of their drives will fail.
The way striping works (at an eli5 level) is you have a bunch of drives and one is a check for everything else. So let's say you have four 10TB drives. Three would be data and one would be the check, so you get 30TB of usable space.
In reality you don't have a single drive working as a check; instead you spread the checks across all of the drives. If you map it out with "d" being data and "c" being check, it looks like this: dddc ddcd dcdd cddd
This way each drive has the same number of checks on it, which is also why we call it striping.
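A minimal sketch of the "check" idea, treating the check block as the XOR of the data blocks so any single missing block can be recomputed from the survivors (real RAID-Z parity and on-disk layout are more involved; the values here are just for illustration):

```python
# Minimal single-parity sketch: the "check" block is the XOR of the data
# blocks, so any one missing block can be rebuilt from everything else.
from functools import reduce

def parity(blocks):
    return reduce(lambda a, b: a ^ b, blocks)

stripe = [0b1010, 0b0110, 0b1100]   # three data blocks ("d d d")
check = parity(stripe)              # the "c" block

# Pretend the drive holding the second data block died; rebuild it from the rest.
survivors = [stripe[0], stripe[2], check]
rebuilt = parity(survivors)
assert rebuilt == stripe[1]
print(f"rebuilt block: {rebuilt:04b}")
```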
@SmoothLiquidation @Telorand They also claim up to 8x speed improvements with HAMR. Obviously that remains to be seen, but if the speed gains roughly matched the capacity gains, that would keep restriping times in the same ballpark.