When it comes to protecting our photography data, we all know we should do it, but a lot of us don’t know how to protect our data from loss. Should we use RAID 1 or RAID 6, striping or mirroring, NAS or DAS? I use KISS, or Keep It Simple Stupid.
Of course, each system or approach has its advantages and disadvantages, and not every approach is appropriate for everyone. How much risk of data loss you can tolerate, the cost in time and money, and how easy or hard the system is to use should all be considered when implementing a data protection plan. It can become complicated and overwhelming, but for the majority of us, it doesn't have to be.
For the majority of photographers, a simple approach that is easy to understand and follow is the best approach. After all, the best data protection plan in the world is worthless if we don’t use it. That’s why I’ve developed my own approach, for both my photo data and my business data, that minimizes the risk of loss while staying simple to follow. While no system or approach is 100 percent perfect, I believe my approach reduces the risk from viruses, complete data loss from theft or fire, and file corruption while also minimizing complexity and cost.
The approach I’m advocating requires a couple of hard drives, an off-site location, and a little discipline. I admit it does not eliminate all risk of data loss, but I believe it reduces that risk to a reasonable level for most photographers.
This is how I protect my data. First, I download all my photos from the camera to two separate hard drives. I like to use Photo Mechanic for this, as I can rename my files and apply any IPTC data I need to the files. Now I have two copies of my data, which provides redundancy: if one hard drive fails, I have another drive with the same data. One of the two drives is automatically backed up at regular intervals to a third hard drive using Apple’s Time Machine. The purpose of this third hard drive is to provide a version-controlled backup, which helps protect against a file being deleted or becoming corrupt. Remember, I’m also backing up business documents. You may not get the latest version back, but at least you have a version to work from.
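For readers who would rather script the first step than use an ingest tool, here is a minimal Python sketch of the idea: copy everything from the card to two drives and confirm each copy with a checksum. This is not how Photo Mechanic works internally; the function names `sha256_of` and `ingest` and the folder paths in the usage note are made up for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(card: Path, destinations: list[Path]) -> list[Path]:
    """Copy every file under `card` to each destination and verify each copy.

    Returns the copies whose checksum did not match the source file.
    """
    mismatches = []
    for source in sorted(p for p in card.rglob("*") if p.is_file()):
        expected = sha256_of(source)
        relative = source.relative_to(card)
        for root in destinations:
            target = root / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            if sha256_of(target) != expected:
                mismatches.append(target)
    return mismatches
```

A call might look like `ingest(Path("/Volumes/CARD"), [Path("/Volumes/PhotosA"), Path("/Volumes/PhotosB")])`, where the volume names are placeholders. An empty list back means both copies matched the card.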
The next step addresses data protection by isolation. By isolation I mean keeping a copy disconnected from my computer and network and also away from my physical location. This step addresses two concerns. The first is the risk of the data becoming corrupt due to a computer issue or being held hostage by ransomware. The second is protection from physical theft, fire, water damage, or a spilled beverage. I handle this by rotating two additional hard drives between my office and my credit union’s safe deposit box. The frequency of this step can be adjusted to suit one’s needs, but I do it once a month on the 15th, and I put a reminder in my calendar so I don’t forget. One of these drives is always in the safe deposit box at the credit union and the other is at my office. On the 15th of the month I take the drive that is at the office and, using Carbon Copy Cloner, make a complete copy of one of the two hard drives I use on a daily basis. Once this is completed, I take the freshly cloned drive to the credit union. I place it in the safe deposit box and take the drive that was in the box back to the office for the next month’s backup. I also grab a Tootsie-Pop that my credit union leaves out for their customers. Now I have a copy of my data that is never more than a month old, so that if I lost all the other hard drives I would still have one drive with the vast majority of my data safe at the credit union. Of course, if you want, you could add another drive to the rotation or rotate on a weekly basis.
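To make the "never more than a month old" claim concrete, here is a small Python sketch of the monthly schedule. It assumes the swap always happens on the scheduled day; the function names are invented for this example, and `day_of_month` should stay at 28 or below so the date exists in every month.

```python
from datetime import date


def next_rotation(today: date, day_of_month: int = 15) -> date:
    """Return the next rotation date on or after `today`."""
    if today.day <= day_of_month:
        return today.replace(day=day_of_month)
    if today.month == 12:
        return date(today.year + 1, 1, day_of_month)
    return date(today.year, today.month + 1, day_of_month)


def offsite_copy_age_days(today: date, day_of_month: int = 15) -> int:
    """Days since the off-site drive was last refreshed."""
    if today.day >= day_of_month:
        last = today.replace(day=day_of_month)
    elif today.month == 1:
        last = date(today.year - 1, 12, day_of_month)
    else:
        last = date(today.year, today.month - 1, day_of_month)
    return (today - last).days


print(next_rotation(date(2024, 12, 20)))          # 2025-01-15
print(offsite_copy_age_days(date(2024, 1, 14)))   # 30
```

The day before a swap is the worst case: the off-site copy is then up to 30 days behind, which is the window a weekly rotation would shrink.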
A final step I occasionally use for short-term data that I’m working on is cloud-based storage. I only use this for specific items, as I still don’t believe cloud-based storage is ideal for long-term storage due to the time to upload and download. This method is getting better but for me, I’d rather stick with my multiple hard drive and location method.
Remember, data protection is not a one-size-fits-all solution. So what solutions have you devised for data protection and, just as importantly, what level of risk are you willing to take with your data?
This sounds like the polar opposite of "keeping it simple, stupid".
It is about as simple as it gets while still technically being "backup".
I use a similar approach: I dump images into LR, which puts them onto an external HDD that is then backed up to a separate external HDD; both HDDs are automatically backed up to a cloud service.
I guess I'm thinking of "simple" from the standpoint of the amount of effort required on the part of the user. Having a RAID array and a service like Backblaze seems way simpler than physically moving multiple external hard drives around, occasionally cloning them, and swapping them in and out of a safe deposit box.
I will agree that from a technical standpoint, this solution is, indeed, much simpler than setting up a RAID array with a cloud backup service.
Mr. Jin, I'm not sure why you think this is the "polar opposite" of KISS. You know what is the "polar opposite" of KISS? The experience of some users spending several days or weeks going back and forth with Drobo support trying to resurrect a failed "RAID" array. And backing up to the cloud takes forever, and I do not want to expose my sensitive data to a third party. I would rather have multiple copies of my data on several drives that are rotated to a secure off-site location. The only differences from Doug's method are that I use ChronoSync to back up and my secure location is the family home, a 25-minute drive away. Very easy, like brushing your teeth.
I'm of the belief that the more automated a solution is, the more foolproof it is because the biggest hurdle we face when it comes to backups is our own laziness. Sure, it's easy to talk about manually connecting and disconnecting hard drives, swapping them back and forth from off-site locations such as a family member's home, but how many of us actually do this diligently? How many people start with such a system and then get complacent about it after a few weeks or a few months?
Even if you did manage to stick to doing this, how many man hours are spent applying such a solution vs. an automated system that is constantly working in the background?
As far as RAID arrays go, Drobo is pretty much a low-end solution and as such, I am not surprised that you or others have encountered problems with it. No solution is completely reliable, but this is one of those situations where the amount of money you invest will actually reduce the odds of failure. This means not only buying a good enclosure, but also spending the money on quality hard drives as well.
Backing up to the cloud does indeed take a long time, but that's really just the initial upload that takes long. Once you've gotten past that first bundle of data, the bandwidth consumption shouldn't be that much of an issue unless you're either shooting thousands of new images daily or creating tons of video content.
In my mind, the more times a system requires you to actively manage it, the less simple it actually is and the more points of failure there are because as I said, the #1 challenge to any back-up system is human complacency.
The key to any effective backup solution is redundancy and automation. How much or how little is a personal decision, but I think it's important that we be honest with ourselves about how much personal effort we're realistically going to be willing to exert and how disciplined we actually are. A 25 minute drive to drop off/pick up a hard drive every week or every month might be something you're willing to do, but I feel pretty safe in saying that the vast majority wouldn't put in such an effort, which is why I said that I don't think that this solution is actually "simple" in practice even though it is technically simple and perfectly logical.
Michael, I get that this isn't for everyone, and as I mention in the article, everyone needs to find a system that works for them. As far as the time involved, for me it is actually a very small amount of time. First, I'm lucky that my credit union is right on the way to town, so I pass it almost every day, if not several times a day. I know this might not be the case for others. As far as swapping out drives and cables, there really isn't much of that either. I plug in the rotated backup drive, hit Carbon Copy Cloner, and come back in about an hour. Very little of my time is actually spent dedicated to this system. Again, it is something that works for me and may not be suitable for others.
Thank you for reading the article and providing your response. We can learn from what people post here.
I don't use Drobo but it came to mind as Fstoppers not only listed it as gear they used for weddings (https://fstoppers.com/wedding-gear-software-and-tools-we-use) but also wrote articles about it (https://fstoppers.com/gear/editorial-hard-drives-what-when-and-where-buy...)
I avoid big capacity RAID systems after hearing about failed rebuilds due to large hard drive capacity.
A video by Joey L shows he employs a system similar to Doug's that relies on multiple hard drives.
I also have relatives a lot closer but do not want to have my backups in the same area due to a natural disaster in 2012 that affected a large part of my city.
I am also leery of offsite backups: remember the mad scramble to download images when Digital Railroad abruptly shut down?
I also have about 80-90 TB of RAW images and 4K video that are backed up on multiple drives, and I am not sure how many years that would take to upload. I also shoot anywhere from 15-50 GB a day.
"I'm of the belief that the more automated a solution is, the more foolproof it is because the biggest hurdle we face when it comes to backups is our own laziness."
Most pros I know are very serious about backing up their images offsite.
There are different types of RAID systems with different levels of redundancy. Some confusion also comes from the fact that one RAID configuration (RAID 0) actually offers no redundancy at all in exchange for increased performance. Of course the more redundancy you build into any system, the more it will cost since you're theoretically getting less storage out of each hard drive that you purchase.
Large hard drive capacity by itself isn't really an issue as long as it's part of a RAID array with an appropriate level of redundancy since the whole point of the array is to ensure that the failure of any one hard drive in the system regardless of how large it is will not result in data loss. I personally use 10TB hard drives and even if one of them went down, I wouldn't lose any data at all. Now if for some reason, two drives went down at the same time, I'd be boned because I don't have my system configured to cover that, but if you're really paranoid, you can configure yours differently.
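As a rough illustration of the redundancy-versus-capacity trade-off described above, here is a Python sketch for the common RAID levels. It assumes identical drives and ignores filesystem overhead and vendor-specific layouts; the function names are invented for this example.

```python
def usable_capacity_tb(level: int, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for identical drives at a common RAID level."""
    if level == 0:
        minimum, data_drives = 2, drives        # striping, no redundancy
    elif level == 1:
        minimum, data_drives = 2, 1             # every drive holds the same data
    elif level == 5:
        minimum, data_drives = 3, drives - 1    # one drive's worth of parity
    elif level == 6:
        minimum, data_drives = 4, drives - 2    # two drives' worth of parity
    else:
        raise ValueError(f"RAID {level} is not covered by this sketch")
    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return data_drives * drive_tb


def drive_failures_tolerated(level: int, drives: int) -> int:
    """How many drives can fail before data is lost."""
    return {0: 0, 1: drives - 1, 5: 1, 6: 2}[level]


for level, count in [(0, 2), (1, 2), (5, 4), (6, 4)]:
    print(
        f"RAID {level}, {count} x 10 TB:",
        usable_capacity_tb(level, count, 10), "TB usable,",
        drive_failures_tolerated(level, count), "failure(s) tolerated",
    )
```

Two 10 TB drives in RAID 1 give 10 TB of usable space and survive one failure but not two, which is the situation described in the comment above.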
There is always a risk with cloud storage that the company might go out of business. One way to mitigate this risk is by looking at their plans. If something sounds too good to be true and you're wondering how the hell they're actually making money, there's a pretty good chance that it's a company that's bleeding cash. Unfortunately, we have a tendency to try to take advantage of the best deals without critically thinking about whether the company in question is actually offering pricing plans that are sustainable to its business.
Also, even if a cloud storage company does happen to go out of business, you basically treat that as one prong of your redundancy failing. Cloud storage should be just one part of your overall backup strategy—not something you rely on solely.
Personally, I am a proponent of keeping a physical off-site back-up in the form of keeping a copy of your hard drives at a relative's house as well, but in truth, these backups should be treated as an absolute last resort due to the fact that the manual nature of the system means that they will not be as up to date as your on-site copy or your cloud copy. In essence, they are your "nuclear option" should literally everything else somehow fail.
You mentioned that you shoot anywhere from 15-50 GB a day. Most people who are driving out a hard drive to a safety deposit box or family member's house might do it once a week or once a month. So that's anywhere between 105 GB (15 GB a day for a week) and 1.5 TB (50 GB a day for a month) of data you would lose if that's what you're depending on in the event of a failure. The problem is not just the data itself, but the fact that it's your most recent data (and probably your most important).
Yes, uploading something to the tune of 80 TB to a cloud solution is going to take a LOOOONG time and that's something that you can't get around. That's all the more reason that you should find a good plan and start earlier rather than later because it's not like you're going to have any less data to upload later and the longer you go without doing it, the more daunting it will become.
Once again, there's nothing intrinsically wrong with doing what was described in the article, but I think that a good backup solution also needs to use automation so that the latest versions of your files are kept up to date at all times.
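The range quoted above is just daily shooting volume multiplied by the number of days between drive swaps. A tiny Python sketch, with a made-up function name, shows where the two ends come from:

```python
def data_at_risk_gb(daily_gb: float, days_between_swaps: int) -> float:
    """Newest data that exists only on-site just before the next swap."""
    return daily_gb * days_between_swaps


print(data_at_risk_gb(15, 7))    # light shooting, weekly swap: 105 GB
print(data_at_risk_gb(50, 30))   # heavy shooting, monthly swap: 1500 GB (1.5 TB)
```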
Whatever backup solution you use is a balance between calculated risk and the amount of money or resources (in the form of time and effort) that you're willing to spend. For me, I don't really have too much redundancy since I don't have all that much money and frankly speaking, the type of photography that I do rarely involves people asking me to look back through my archives to dig up old files and potentially re-edit them. In my case, my "nuclear option" for clients is that I upload all of the fully edited JPEG files to Google Albums that they have access to. Then again, I'm primarily a real estate and business headshot photographer. Your situation might be different.
Article on RAID rebuild issues:
http://www.enterprisestorageguide.com/raid-disk-rebuild-times
Please let me know how long your 10TB rebuild takes or if it was successful.
Also, please let me know how long it takes you to back up 105 GB-1 TB over your internet provider and what your backup provider's storage cap is.
I also keep selects on a laptop and entire assignments on a portable hard drive that is always with me. I do not delete these files from the laptop until a week after delivery to client is complete in case client needs additional files.
I honestly wouldn't be able to tell you since the only hard drive that I've ever had fail on me was an old Kingston HyperX SSD that I was using as a boot drive back in my gaming days. Since then, I've been in the regular habit of rotating my hard drives out after 3 years of use.
Just for some reference, I currently use 3 total drives. I have an SSD as my boot drive, 2 internal Western Digital Gold hard drives set up in a RAID 1. By next month, I will also have an external enclosure with 2 more 10TB Western Digital Gold hard drives, also set up in a RAID 1 configuration, to mirror my internal storage. So at that point why would I ever be rebuilding anything? The only thing that I would have to fear is a fire or some other disaster taking out everything at once, in which case I still have my cloud back-up to fall back on. Mind you, unlike you, I don't have 80 terabytes of data or anything ridiculous like that largely because once I'm done with a job and my client is satisfied, I am free to just go ahead and delete the RAW files since nobody will ever come looking for them. I also don't take any video so that probably helps.
In regard to how long it would take to upload data, that depends entirely on your internet connection. Fortunately, my ISP has no bandwidth caps which would add some complication to the process (I know many do) so it's just something that would run in the background 24/7 while I live my daily life. Uploading only one terabyte, however, would certainly not take more than a day for me.
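Since upload time depends entirely on the connection, a quick back-of-the-envelope calculation helps. This Python sketch assumes decimal terabytes, a sustained upload speed in megabits per second, and no protocol overhead or throttling; the speeds in the example are illustrative and were not given by anyone in this thread.

```python
SECONDS_PER_DAY = 86_400


def upload_days(size_tb: float, uplink_mbps: float) -> float:
    """Days needed to upload `size_tb` terabytes at a sustained uplink speed."""
    bits = size_tb * 1e12 * 8
    return bits / (uplink_mbps * 1e6) / SECONDS_PER_DAY


def required_mbps(size_tb: float, days: float) -> float:
    """Sustained uplink speed needed to upload `size_tb` within `days`."""
    bits = size_tb * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


print(round(required_mbps(1, 1), 1))    # 92.6 Mbps to move 1 TB in one day
print(round(upload_days(80, 100), 1))   # 74.1 days for 80 TB at 100 Mbps
```

So "1 TB in under a day" implies an uplink of roughly 93 Mbps or better running flat out, and an 80 TB archive at 100 Mbps would take about two and a half months of continuous uploading.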
Backblaze currently has no data caps for their standard "Business Backup" storage plan, which is only $50/year. So there's that. I'm not sure about other solutions since I haven't shopped around in a while, but I'm sure you can google them.
Once again, everyone's use case is different and you have to decide what the best system is for you. Any back-up system can fail and eventually every hard drive will fail regardless of how cheap or expensive it is.
For me, if, a month from now, I have some sort of situation where 4 hard drives (2 internal and 2 external) die on me at once (which is a possibility in the event of a fire or something since my back-up hard drives would not be off-site) and my cloud back-up service provider happens to go out of business all at the same time, then I'm fine with just chalking it up to fate playing a cruel joke on me and moving on with my life. I highly doubt that this will happen, but I do acknowledge the possibility.
Then again, I'm not working in an industry where someone is liable to sue me if I can't dig up their files from more than a few days prior so all I would really be losing is my personal work and I'm comfortable with losing it if it takes that level of misfortune for it to disappear. Mind you, along with all of this, Google would either have to go out of business or wipe out all of their images in Google Photos at the same time, too, for me to not have SOME form of the image.
So yeah, I don't think I'll be telling you about rebuild times for a 10TB hard drive because I don't see myself ever having to do it. For me, two mirrored RAID 1 systems, a cloud backup solution, and Google Photos for a "nuclear option" is security enough... Hell, it's probably overkill. :P
http://www.tomshardware.com/forum/257159-32-raid-substitute-backup
Good comment on RAID 1:
RAID 1 isn't a substitute for backup because there are a lot of risks that it can't protect against.
If you accidentally delete a file, it will instantly be removed from both mirrored copies.
If your disk is corrupted by a software bug or virus, the corruption will be done to both mirrored copies simultaneously.
If you're hit by a bad enough power surge, it'll probably fry both disks at the same time.
ALSO GOOD COMMENT
Also, I would like to add failures of the RAID layer itself.
In my estimation, most Windows (fake) RAIDs do not fail because of a disk failure, but because of a failure at the RAID level that causes a broken array. The user may not be able to recover from that situation, even though the data is still physically on the drives; the user has lost access to it. Any hasty action the user takes may destroy the data permanently.
So for all intents and purposes, a RAID-array should be seen as a single disk; it can fail. That's why you need a backup.
I have to agree with Michael that Drobo is a beginner solution.
Choose a more mainstream NAS, like Synology or even Netgear - Make sure that it has a bitrot protected file system (BTRFS or ZFS).
As for Backblaze - It's protected by strong encryption (impractical to crack).
Many NAS offer apps to backup to Backblaze or similar.
You're only looking at initial complexity, which isn't that much, then you can in effect forget about manual interaction.
One thing that ought to be made clear, is that RAID is NOT a backup solution, but a storage solution, and it ought to have a backup solution attached to it.
A Redundant Array of Independent Disks, or RAID, is typically used to achieve at least one of the following: more storage, faster storage, or more dependable storage. Even RAID 1 (mirroring) is not a backup solution, but a ‘more dependable storage’ solution. RAID is typically main storage, and main storage must be backed up.
Additionally, Network Attached Storage (NAS) and Direct-Attached Storage (DAS) solutions may contain RAID storage, and a NAS/DAS solution is not always secondary storage. Some NAS/DAS solutions are used as main storage, and, as always, main storage must be backed up. NAS can be used as part of a backup system, but is not necessarily the only part.
Karim you are correct and lots of people either fail to understand that RAID is not a backup or they simply don't care about having a backup.
This is absolutely correct. RAID in itself is not a back-up solution, but your on-site back-up solution should ideally employ a RAID array to minimize the potential for data loss due to hard drive failure because, as we all know, backup drives are no less capable of failing than our primary storage drives.
Correct, to some extent, which is why I mentioned that ① RAID is used primarily for more storage, faster storage, & more dependable storage, and ② That RAID is often used in NAS/DAS and other secondary storage solutions.
However, two copies of a non-RAID backup may be better than one mirrored (RAID1) copy of a backup, as two copies can be kept in separate places, and be less prone to failure from catastrophic disasters, whereas one RAID1 solution, (or any RAID or non-RAID solution), in one location, can still be totally destroyed by one disaster.
So RAID is NOT essential for backup, depending on how one does backup, but is a good idea, and does not negate having multiple backups in multiple locations.
Nice methods. Although, working for a HD company, I really don't trust an external HD's shelf life over the dependability of a well-established cloud service. My process is to import my photos into LR, upload them to Google Drive, and place them on a local NAS box. The Google Drive backup takes forever, but the LR and NAS box steps are quick. I've had the NAS box for a few years and even that has broken down. Although people seem to hate Google Drive here, it has been a life saver for me.
OK, I am probably not the typical photographer here, as my day job is as an IT architect and my solution may be beyond the technical capability of most people, but it involves staged, off-premises backups and is based on cheap and versatile Raspberry Pi computers running Linux.
All important stuff on my laptop is automatically copied over the Internet to my Nextcloud server, which is situated at a friend's house abroad. This server is backed up every hour to another server at my friend's house with Time Machine-like functionality, providing versioning and guarding against accidental deletions. Once a day (at 2 AM) my Nextcloud server is backed up over the Internet to my brother's RAID 1 NAS. At this point, the system has automatically provided three extra copies of my data, spread across two off-site locations. About once or twice a day (depending on the amount of work I have done), I manually (a simple click) back my laptop up over the local network to my storage server, which provides for quick, local retrieval in case my laptop gives up the ghost.
In the worst-case scenario (house burning down with the laptop, say), I ask my friend or my brother to copy the local data to (depending on the amount) a memory stick or a hard disk and get that sent to me by courier.
The original setup took some time to refine but works like a dream today and is all based on free, open source software and cheap hardware, apart from my brother's NAS. To give you an idea, each Raspberry Pi with a 4TB external disk, enclosure and power supply (one at home and two at my friend's) has cost me less than £120 (about $160). Add to that another Raspberry Pi at my brother's, and my total direct cost for the servers has been less than £400 (roughly $500)!