Ya know, we have it pretty darn good.
Our disk drives are super dense, super fast, really small, and amazingly reliable.
Except when they’re not!
Because of their data density, sometimes it’s really hard to figure out that the problem you’re chasing is a slightly corrupted file that you only access once in a great while.
Or that a subdirectory contains a group of corrupted files that you never access but that the system knows are problematic and is trying to fix instead of telling you, “Those files there are a steaming pile of crap… Sorry!”
I’ve been chasing a subtle wonky problem around my network for a while. Most of the time I was able to minimize the issue to the point that I forgot about it.
With the recent changes to the network, I caused a serious disturbance in the force! The disturbance was bad enough that I wasn’t able to minimize the problem anymore, its just as well, I hate “now ya see ‘em… now you don’t issues.”
Now that I’ve had to truly address the problem, I think I’ve narrowed it down to a bad drive in one of my TimeCapsules. This is of course the drive with backups dating back 4 years. Thank goodness I had a copy of that backup on a portable drive or I’d have been really bummed out.
I don’t know what the Time Capsule drive was actually doing, but I can tell you that it wasn’t backing up properly, and was sending all kinds of network broadcast messages.
Those millions of messages were impacting network performance across the board. Ironically, none of those messages actually said anything like, “Hey I’m in trouble please fix me!”
Sigh, there are times when I’d love to slap some engineers! Then again I can see that they’re trying to make things simple and user friendly. I just wish Apple, in all their endless forums would publish something like “For those of you that want to see all the errors… Set these parameters!”
It would make things a lot easier sometimes.
After a lot of troubleshooting, isolating the drive and now reformatting it, my network is operating at expected and acceptable speeds. Why reformat it? I’m just curious if any part of it is salvageable. Its possible that there’s nothing wrong with the drive, just that the data was corrupted by a power failure or something.
An example of the difference in speed is that 1GB backed up every two hours (Before Isolating the drive). Now 1 GB backs up in just a few minutes. You don’t really think about speed across your network until you have over 300GB of data that you want to back up. Then suddenly its a really big deal.
I can’t complain even if the Time Capsule drive is bad I’ve gotten 6 years out of it. That’s a pretty good run on a drive that’s running all the time. Now the question is do I trust the drive IF it passes reformatting? I can and have, begun backing up on another NAS drive. If the Time Capsule drive checks out OK do I dare trust it?
The really funny part of all the is that I never used to have backups. Typically when I delete something, I’m very Hillary about it. I mean for that shit to stay gone. But there have been times since I started using Apples Time Machine that I’ve pulled things back from the abyss.
It’s funny to me how quickly I adapt to and become reliant on new technology.
Then there’s the other part of me that doesn’t like change. That part thinks, “If the Time Capsule drive is really dead, I wonder if I could simply replace it with a new drive?”
After all if everything else in the TimeCapsule is still working why toss the whole machine? Although, a nice shiny new Time Capsule would fit very well in my wiring closet.
Update
I was overconfident. I thought I had figured the problem out and everything was working properly again.
I’ve tried all the tricks I know, I’ve tried unwrapping the ball of fur that is Time Machine backup. Still the backup system is behaving completely abnormally. I can get the system working properly for one backup, when the next scheduled backup starts things go to hell again.
This leaves me in an interesting situation.
I have other things to do besides do unpaid QA on Apple products.
I have a couple of websites that I must protect until delivery for example. I’ve got my own writing that I’d like to do.
Since I’ve resolved all the other issues, plaguing my network and have come to the conclusion that regarding the backups, the common denominator is my primary computer, as I see it I have 3 basic choices.
1) Turn off Time Machine, check its functionality after each release of patches to OS X. I believe that this problem was introduced with the last round of patches as I’m seeing log entires denoting bugs and failures that I’ve never seen before. Even after backing out the patches, the backups still only work once. I can make sure that the critical data is moved to offline storage manually
2) Hope that the backups I have are in fact reliable. Then reformat and reinstall my system from the ground up, paying attention to the partition table. As I recently learned with a friends computer, there can be partition table issues that cause Time Machine to fail arbitrarily with nothing more than a cryptically unhelpful message. That message, of course leads you right down the rabbit hole of wasted time and incorrect diagnostics. I suspect this is the loop I’m in right now. Like my friend, my computer is running just dandy although when it’s trying to do a Time Machine backup it does run a tad warm.
3) Locate, research, and then install, and use a 3rd party piece of backup software and retire Time Machine altogether. Time Machine is very convenient, and when it works, it works well. There are however other solutions and these other solutions provide more flexibility without some of the risks I’ve read of inherent to Time Machine. For example; there are multiple reports of corrupted indexing, corrupting the underlying data. This results in a backup being completely useless. Worse than useless in fact, because you don’t know the backup is trash, UNTIL you need to restore from it.
Perhaps a backup solution that permits cloning of the computer’s internal drive is a better solution. This means that there is always a bootable, functional copy of the system’s disk and it’s easy to check that it’s a working copy.
Perhaps a combination of all three solutions is the best way to protect my data. If I start with 3 then execute 2 then perhaps 1 won’t be necessary.
I’ve been at this off & on for the last week, its time to move on.
For those who are searching for answers, My experience with the Apple Forums has been next to usless, except as a time waster.
Here are the symptoms I’m seeing.
“Preparing Backup” <- This step can go on for hours. During this time the console log shows that the backup drive is mounted but there may be no log activity for 20, 30, or some random number of minutes. When activity is again shown in the log it is almost but not always a message saying;
Deep event scan at path:/ reason:must scan subdirs | new event db |
This happens even if you initiate a backup right after a successful backup, meaning there should be no need to rescan the system hasn’t changed.
process backupd[751] thread 36644 caught burning CPU! It used more than 50% CPU
This message makes sense because the whole system is very busy during indexing. But it shouldn’t be required to index the entire drive with each backup.
The following log is closer to normal but this log results in the system taking 12 hours to copy 12 GB. The new full disk 350GB backup to the same volume over the same connection only took six hours to begin with.
com.apple.backupd[751]: Starting automatic backupcom.apple.backupd[751]: Attempting to mount network destination URL: afp://UserName._afpovertcp._tcp.local./DriveNamecom.apple.backupd[751]: Mounted network destination at mount point: /Volumes/DriveName using URL: afp://UserName._afpovertcp._tcp.local./DriveNamecom.apple.backupd[751]: Disk image /Volumes/DriveName/ComputerName.sparsebundle mounted at: /Volumes/Time Machine Backupscom.apple.backupd[751]: Backing up to /dev/disk4s2: /Volumes/Time Machine Backups/Backups.backupdbcom.apple.backupd[751]: Event store UUIDs don't match for volume: Macintosh HDcom.apple.backupd[751]: Waiting for index to be ready (100)com.apple.backupd[751]: Deep event scan at path:/ reason:must scan subdirs|new event db|com.apple.backupd[751]: Reading cached event database from: /Volumes/Time Machine Backups/Backups.backupdb/ComputerName/2015-04-02-074526.inProgress/15A09E54-6451-4E09-A947-4F7EBAF8DE21/.CFB31849-81AA-376E-BE44-9BE66ECE17D2.eventdbcom.apple.backupd[751]: Using cached disk scancom.apple.backupd[751]: Saved event cache at /Volumes/Time Machine Backups/Backups.backupdb/ComputerName/2015-04-02-074526.inProgress/38C9D5E1-C5B6-4F3F-BB63-19C909718795/.CFB31849-81AA-376E-BE44-9BE66ECE17D2.eventdbcom.apple.backupd[751]: Not using file event preflight for Macintosh HDcom.apple.backupd[751]: Found 415344 files (11.39 GB) needing backupcom.apple.backupd[751]: 14.63 GB required (including padding), 1.69 TB availablecom.apple.backupd[751]: Copied 96408 items (1.03 GB) from volume Macintosh HD. Linked 335719.
kernel[0]: process backupd[751] thread 36644 caught burning CPU! It used more than 50% CPU (Actual recent usage: 61%) over 180 seconds. thread lifetime cpu usage 548.276287 seconds, (241.274677 user, 307.001610 system) ledger info: balance: 90003836101 credit: 536400616669 debit: 446396780568 limit: 90000000000 (50%) period: 180000000000 time since last refill (ns): 147338040650
spindump[444]: Saved cpu_resource.diag report for backupd version ??? (???) to /Library/Logs/DiagnosticReports/backupd_2015-04-02-082002_ComputerName.cpu_resource.diag
I have no answers for any of you looking to find out what happened to your nice reliable Time Machine backup. I can only say that since the last patch mine has been completely screwed.
The event logs look pretty much the same and the computers behavior is the same, if I’m backing up to a network attached drive or a locally attached external drive.
Virus scans show no issues, Malware scan shows no issues. Deleting and forcing a rebuild of spotlight indexes on all drives does not resolve the problem.
Disk Repair indicates that the internal drive and external drives are healthy. I’ve run the repair disk option on all drives, reformatted external drives, and am confident that whatever the problem is, it is not the external drives.
Everything points back to this particular system which has led me to the three options.
One more point of interest is this.
I have not been able to permanently exclude the backup drives from spotlight indexing. I’ve tried via Terminal and via the spotlight application in System Preferences.
The exclusions are deleted after a reboot.