Tuesday, September 16, 2014

Apple Time machine: accomplishment unlocked -- Decreased trust!


The saying is true: you only know backup works when you've restored.

I got a new laptop and restored from time machine backup on an external drive attached to my airport extreme at home. My time machine backups had been going fine for years. It took a long time, and the network got disconnected so I had to restart it once, but eventually claimed to finish.

And there's the rub it *claimed* to restore, but it only partially restored. It's one thing to error, but why is it Apple UI's lie to me? (For example, my iPhone's iMessage has shown my message as delivered, whereas my wife's showed no sign of the message.)

For anyone else who runs into a failed Time Machine restore I'd like to write down what I've found as I manually attempt to synchronize critical folders with the backup image.
  1. First, my home directory appears complete. The incomplete restorations all have to do with top-level directories like "/foo" and "/bar" that are either missing entirely or have the following problems.
  2. Some subdirectories are just missing. My Aperture photo library had most years but was totally missing 2012 and 2013, as well as other misc files.
  3. Some files were present but zero length.
  4. Finally, the weird thing is that some files were actually corrupted. I found photos that were smaller size in the restore than in the latest backup image on the time machine disk. Indeed, these jpegs, could still be viewed but would be missing part of the image. 
The last makes me especially nervous because it's NOT consistent with a (sequential) transfer of files and directories that got interrupted at a single point in time. Of course, I was guilty of restarting the restore process as I'd mentioned at the top. So maybe the bottom line here is that restarting it without manually wiping the HD in between is not safe. (It's not idempotent.)

Right now I'm using Unison to do the repairs, but for greatest peace of mind I should probably wipe it and restore again... The problem is that I used this machine for a while before noticing that the time machine restore was a partial failure!!

Monday, January 6, 2014

Five flavors of the parallel AND operation

The issue of parallel, logical and operations was recently discussed at length on my student, Lindsey Kuper’s, blog.
This is a follow-up to that post, with the purpose of laying out a taxonomy of conjunction operations that might occur in a parallel language. (The same discussion applies equally to disjunction.) Below I classify these operations into five levels, from basic to full featured. A good context to imagine as we go through this is a computationally intensive search problem with lots of nested parallel conjunctions and disjunctions that expose opportunities for parallelism.
Before we get started, some ground rules. We need the operation to take some kind of deferred computations or thunks. Further, we assume these can be turned into futures for exposing parallelism. In our case, we use Haskell and the Par monad so this means that we expect something like this:
result <- span=""> asyncAnd comp1 comp2 
where comp1 and comp2 are monadic computations that, once executed, return a Bool value. But the distinctions below could just as well apply to a plain, non-monadic and operation in a lazy, pure language — for example, Haskell’s (&&) operator.

Level One

  1. Strict: Evaluate both comp1 and comp2 in parallel, and then return the logical and of the resulting booleans.
This "level one" and is parallel, but is also less work-efficient than the sequential left-to-right short-circuiting && operation found in most programming languages. It always evaluates both computations fully. Thus, our next upgrade is short-circuiting.

Level Two

Here it helps to generalize a bit and consider an and that takes a list of an arbitrary number of input computations:
  1. Short-circuiting: for a list of input computations, evaluate them using a machine-specific degree of parallelism, but also stop launching new computations whenever a False is encountered in an already-completed computation.
This is an intermediate category which captures operations which have some degree of parallelism but are asymmetric, much like left-to-right sequential (&&). That is, some input computations have the ability to short-circuit (deschedule) others, but the relationship is not fair; it’s not all-to-all.
You can build an N-way and meeting this limited criteria using (1) a level-one parallel and together with (2) a short-circuiting sequential and. The two operators could be combined by a heuristic strategy for determining how many parallel tasks are appropriate, and then "bottoming out" to the sequential version. Still, as long as the sequential and is used, inputs are handled asymmetrically; and as long as a level-one parallel and is used we will run unnecessary computations that can’t be descheduled or canceled. Indeed, this is a real tension: work-efficiency vs. parallelism.

Level Three

  1. Short-circuiting, symmetric: all input computations are potentially evaluated in parallel, and any one of them returning False quickly can potentially cause the result of the entire and operation to become available immediately.
Ok, so this helps us get early answers. But it doesn’t impose any requirements as to the work-efficiency of the entire operation. That is, it doesn’t mandate that once an N-way and operation returns False that remaining computations will be cancelled in bounded time. Thus our fourth level concerns cancellation.

Level Four

  1. Symmetric with cancellation: When an early answer is given, all other threads/tasks are told to stop computing because their answers are no longer needed. There exists a bound on how much computation lingering child-computations of the and may do after this point in time.
Anecdotally, I think level-four ands are pretty rare in practical parallel languages and libraries, simply because [transitive] task cancellation is not a common feature of modern parallel runtime systems. I would be happy to be wrong about this! So please share any links to standard library documentation that mention level-four parallel ands out there in the wild.

Level Five

Our final stop is a feature that is not strictly a property of the conjunction operator itself, but rather a complementary feature we find useful in some of our applications. It concerns the ability of canceled computations to do useful work, by storing partial results that are computed before cancellation occurs.
In an unrestricted parallel language, canceled computations may have arbitrary side effects, which, in general, race with the cancellation message itself. In a deterministic language such side effects cannot be allowed; rather, it is usually safe to cancel only "pure" computations. There is one exception, however, a memoized pure function provides a "side effect free" way of storing partial work.
  1. Cancellation + memoization: Canceled threads never provide a final result, but they can contribute to other computations by adding to their memo-tables.
One example of an algorithm which benefits from this combination is the subtype-checking algorithm for equi-recursive types in Chapter 21 of Types and Programming Languages. Verifying a subtype relation between two types requires [parallelizable] conjunctions and disjunctions (e.g. when checking product and sum types), but at the same time memoization can help performance because when typechecking real code, the same subtype checks are made repeatedly.
Our recent draft paper on "taming the parallel effect zoo", discusses the combination of cancellation and memoization further.
Is there a level six? Is there anything else we might ask of our parallel logical operators?

Final thoughts: Parallel conjunction/disjunction in GHC

GHC with "sparks" (futures) can do level-two ands, for example by sparking the arguments to (&&). But short-circuiting must follow a predetermined order. Further, cancellation is not possible with sparks, because sparked computations cannot be removed from the spark pool.
If one could use internal GHC functions to determine whether an object is a THUNK or CONSTR at runtime, it would then be possible to do more flexible short-circuiting, for example, sparking some elements of a [Bool] while executing others on the current thread, and inbetween executions, polling for completion of the already-sparked futures. Yet this road seems unappealing for a number of reasons.
Alternatively, rather than using the spark mechanism, it is certainly possible to use IO and MVars to implement parallel-and in Haskell, as Conal Elliot does in this post, using an unamb combinator. But forkIO is vastly higher overhead than sparks, with our own Par monad’s futures falling somewhere inbetween. Moreover, our desire to include transitive cancellation and memoization features motivates us to build these operators on a custom Par monad rather than with IO or sparks directly.