Proving a Computer Program's Correctness (schneier.com)
12 points by billswift on Oct 2, 2009 | 30 comments


Yes, this stuff is hard. My doctoral thesis was about using formal methods to prove that a system met a security policy. Specifically, I developed an algebra for reasoning about a system and creating its code directly from a specification.

If that sort of stuff excites you it's here: http://www.jgc.org/blog/jgc-thesis.pdf


Actually, Edsger W. Dijkstra advocated this stuff for many years. Almost all of his writings are collected on his home page at the University of Texas, http://www.cs.utexas.edu/users/EWD/ . There have been some advances since, but as the Schneier article shows, not much that is very useful yet.


It seems to me like a big part of this would depend on a stable foundation. And in the software world, that foundation shifts a lot. Last I checked, no one in the industry had come together and agreed on a list of programs that would never be changed again. And even if you could stabilize it all, how long would it be before you're forced to port it to new hardware?

The moment anything shifts, any validation work you've done is called into question. Okay, for 30 seconds, your program was proven correct; but how about now? Did the change cause a problem? And how, exactly, would you detect it, without repeating the validation entirely?

To me, this does underscore the importance of keeping programs small, stable, and well understood. If anyone's bragging about how many millions of lines of code they have, as Microsoft once did in a bullet point for Windows, they should not be considered serious programmers. The bigger something is, the less it takes to introduce one of these unvalidated quirks.


Extrapolating the effort involved in proving huge numbers of complex safety properties for an operating system written in C to programs in general is a bit silly. Formal verification gets considerably easier in more abstract languages, and it becomes considerably easier to mechanise large chunks of the work (a Hindley-Milner type system is a kind of proof of a safety property, for example, and type checking is totally mechanical).
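
To make that concrete, here's a tiny Haskell sketch (the names are invented for illustration): the type of safeDiv forces every caller to handle the divide-by-zero case, and the compiler mechanically rejects any program that doesn't; a small safety property, proved by type checking alone.

    safeDiv :: Int -> Int -> Maybe Int
    safeDiv _ 0 = Nothing
    safeDiv x y = Just (x `div` y)

    use :: Int -> Int -> String
    use x y = case safeDiv x y of
      Nothing -> "undefined"  -- the type checker insists this case is handled
      Just q  -> show q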

So it might take 200,000 years to verify Windows, but considerably less to verify an equivalent amount of non-operating system Haskell code, for example.


The language specification and the compiler would have to be proven correct first, which is increasingly difficult with higher-level languages and with optimizing compilers. The critical aspect is that the MACHINE CODE has to be correct. That is one of the reasons the problem is so hard.


For this project, that's assumed. Provided you have some kind of formal semantics, you can define translations. And in fact, Adam Chlipala's work on certified compilers shows that it's not impossible.

http://adam.chlipala.net/papers/CtpcPLDI07/

(there are subsequent papers too, but that's probably a good place to start).
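
For a toy sense of the kind of theorem a certified compiler comes with (this is the classic textbook expression/stack-machine example in Haskell, not anything from the paper above):

    -- Source language and its semantics.
    data Expr = Lit Int | Add Expr Expr

    eval :: Expr -> Int
    eval (Lit n)   = n
    eval (Add a b) = eval a + eval b

    -- Target: a tiny stack machine.
    data Instr = Push Int | AddI

    exec :: [Instr] -> [Int] -> [Int]
    exec []            s           = s
    exec (Push n : is) s           = exec is (n : s)
    exec (AddI : is)   (y : x : s) = exec is (x + y : s)
    exec (AddI : _)    _           = error "stack underflow"

    compile :: Expr -> [Instr]
    compile (Lit n)   = [Push n]
    compile (Add a b) = compile a ++ compile b ++ [AddI]

    -- The correctness theorem a certified compiler proves, once
    -- and for all:  forall e s.  exec (compile e) s == eval e : s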


I've never gotten this. How do they know there aren't bugs in the proof? Mathematical proofs often have bugs. They're caught by inspection by other mathematicians. Sometimes it takes years before that happens.

The problem is that most proofs about code are likely to go without much inspection, just as most code goes without inspection.


Well, it works exactly like mathematics. You start with a small kernel that seems likely to be true, then you build on top of that with transforms likely to be true, and at the end you end up with a proof statement that is fairly likely to be true.

Then you sit and pound on the core axioms and transformations to make sure they are correct. This is impossible, in the strictest sense, but it is often practical.
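
That "small kernel plus trusted transforms" structure is exactly how LCF-style provers are built. Here's a minimal Haskell sketch (a toy propositional logic; the names are made up for illustration, and real provers are far richer): the Theorem constructor is hidden, so the only way to obtain a Theorem is through the kernel's few trusted functions, and those few lines are all you have to pound on.

    module Kernel (Formula (..), Theorem, theoremOf, axiom, modusPonens) where

    -- Formulas of a minimal propositional logic.
    data Formula
      = Var String
      | Implies Formula Formula
      deriving (Eq, Show)

    -- The constructor is NOT exported: a Theorem can only be produced
    -- by the trusted functions below. Trusting the kernel means
    -- trusting just these few lines.
    newtype Theorem = Theorem Formula

    theoremOf :: Theorem -> Formula
    theoremOf (Theorem f) = f

    -- Axiom schema K:  |- a -> (b -> a)
    axiom :: Formula -> Formula -> Theorem
    axiom a b = Theorem (a `Implies` (b `Implies` a))

    -- The one inference rule: from |- a -> b and |- a, conclude |- b.
    modusPonens :: Theorem -> Theorem -> Maybe Theorem
    modusPonens (Theorem (Implies a b)) (Theorem a')
      | a == a' = Just (Theorem b)
    modusPonens _ _ = Nothing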

All programs are mathematical proofs of one sort or another, so in the end it's the same problem every programmer faces: how do I know this program isn't buggy? Well, you don't, and yet here we are, with broadly functioning programs of great complexity. Going through a rigidly mechanical proof is ultimately a quantitative change: it takes your confidence in the lack of bugs from well below 100% to a value much closer to 100%, though still short of it. But as I like to say, a large enough quantitative change is a qualitative change. So while it is true that the proof may not be 100%, that doesn't mean it's useless.

Also, it should be pointed out that in the real world, real CPUs have a real error rate; a percentage of computations will go wrong in the hardware. Pushing your algorithms' failure probability below that point is largely useless. In the real world, the difference between 10 or 15 nines and actual, factual 100% isn't relevant, so the fact that the proofs can't get to 100% is a much smaller concern than you might think. It would only matter if our computers were 100% reliable in every other way. I think I can safely say that the dominant problem in computing today is not unreliable hardware, though; it's bugs in our algorithms, so this is the right approach. (Note I said dominant; massive server farms have to worry about unreliable hardware, but even then, you'll be handling that with algorithms and code, and your dominant problems on that front will be bugs in the code.)


But you didn't answer my point: mathematical errors that escape the original prover are caught by inspection by other mathematicians, and one thing we already know about the software business is that inspection is an extremely scarce resource relative to the amount of code produced.


Oh, well: "Verifying the kernel—known as the seL4 microkernel—involved mathematically proving the correctness of about 7,500 lines of computer code in a project taking an average of six people more than five years." Bruce pointed out, too, that this is not practical. In this case there was a lot of examination.

(Frankly, sounds pretty dull to me.)


Bootstrapping. You build the proof inside an automated theorem prover that's already been verified. In fact, one might have bootstrapped that from a simpler tool that's already been verified.


That makes more sense. In fact, it sounds like automated testing, only at a higher level of abstraction (proving general invariants rather than concrete examples).
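
Property-based testing sits partway along that spectrum. QuickCheck (a real Haskell library; the property below is just an example) lets you state a general invariant once and checks it on random inputs; a theorem prover would establish the same statement for all inputs.

    import Test.QuickCheck

    -- A general invariant, stated once for all lists rather than as
    -- a handful of concrete examples:
    prop_reverseInvolutive :: [Int] -> Bool
    prop_reverseInvolutive xs = reverse (reverse xs) == xs

    -- QuickCheck samples random inputs; a proof would cover them all.
    main :: IO ()
    main = quickCheck prop_reverseInvolutive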


...said for the first time a team had been able to prove with mathematical rigour that an operating-system kernel...

In this case, exactly what does "mathematical rigour" mean? I imagine there are several responses to this question, but even a conceptual overview of how math can prove bug-freeness would have been nice.


As has been said before, it's not about being "bug free" - it's about a number of safety properties always holding as long as the assumptions are true.

The safety properties might be things like who can write to what memory, and when it is acceptable for bits of the kernel to do certain things. Lock freedom in scheduling is probably also a big thing. I haven't looked at any of this very closely, but I'm just guessing based on the sorts of properties I would try to prove initially.
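
To give a flavour of that first kind of property (a purely hypothetical toy model in Haskell, nothing to do with the actual seL4 proof), "who can write to what memory" might be stated as an invariant over a machine model, which one would then prove holds in every reachable state:

    import qualified Data.Map as Map

    type PageId = Int
    type TaskId = Int

    -- A toy machine model: page ownership plus a trace of writes.
    data State = State
      { owner  :: Map.Map PageId TaskId
      , writes :: [(TaskId, PageId)]
      }

    -- The invariant: every observed write was made by the page's owner.
    memSafe :: State -> Bool
    memSafe st = all ok (writes st)
      where ok (t, p) = Map.lookup p (owner st) == Just t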


it's about a number of safety properties always holding as long as the assumptions are true

And no project is ever without such assumptions. You can't have abstractions without them.


I was using 'assumption' in the strict mathematical-logic sense. These assumptions are stated explicitly, and trivially the proof does not hold if the assumptions aren't true.


I was using 'assumption' to talk about the faith that abstractions and their (sometimes mathematical) assumptions can be successfully applied to solve real-world problems.

Some may argue that there's no guarantee that such proofs apply to real world systems. But really, there's no guarantee that any abstract model does. We humans have to get paid for something.


> Some may argue that there's no guarantee that such proofs apply to real world systems.

I would argue that those who would argue that have probably not spent a lot of time doing any real-world formal verification.

People seem to like arguing about its utility in the abstract without ever doing it.


You might be replying as if I'm disagreeing. If so, please re-read ancestor posts with the other spin.

In short: most of the objections people have to formal verification also apply to modeling something in software to begin with.


No, I wasn't suggesting you were disagreeing; I was simply responding to the hypothetical disagreement you said some might raise. My counter-argument applies just as well to anyone who has actually professed the beliefs you speculated some may hold.


It's not so hypothetical. It's more the typical response of a programmer when formal methods come up. But I meant it as an observation of typical reactions, not as a hypothetical debate.



If the correctness of a program requires the ability to terminate, how do these proofs get around the Halting Problem?


You can't, in the general case, prove whether or not an arbitrary program terminates. For many specific programs, you can.
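
For example (a Haskell sketch), termination of structurally recursive functions is provable, because each call recurses on something strictly smaller; it's programs like the second one below where no such argument is known:

    -- Provably terminating: each recursive call is on a structurally
    -- smaller list, and lists are finite, so the recursion is well-founded.
    len :: [a] -> Int
    len []       = 0
    len (_ : xs) = 1 + len xs

    -- By contrast, nothing structural shrinks here; whether this
    -- terminates for all n >= 1 is the open Collatz conjecture.
    collatz :: Integer -> Integer
    collatz 1 = 1
    collatz n
      | even n    = collatz (n `div` 2)
      | otherwise = collatz (3 * n + 1)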


These are not fully automatic machine proofs over arbitrary programs, so they are not getting round the halting problem.


Ah, this reminds me of when I used to get to play around with B (the B-Method).


What if your proof contains errors?

Your program is code, your proof is code, both are going to be buggy.

And what are you going to prove?


There was some discussion on the original post to HN about this announcement:

http://news.ycombinator.com/item?id=760547

In particular, look at this:

http://ertos.nicta.com.au/research/l4.verified/proof.pml

Which talks about the limits of provability.

In short, a "proof" that contains errors isn't really a proof of anything. And you build your proof inside an automated theorem prover so that it can be mechanically checked for internal inconsistency (i.e. contradictions) or for a complete inability to be false (i.e. tautology).


Your automated theorem prover contains bugs, too.


Read the link.

If you look at how something like Isabelle is constructed, only a small number of axioms are "hand coded", and the rest is built from those. You can always build an external proof-checking tool that checks that the proof you've constructed is correct. And if you're not confident in that, you can build another one.
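
To make the "external proof-checking tool" idea concrete, here's a toy Haskell sketch (the derivation format is invented for illustration): the prover emits a derivation as plain data, and a small independent program re-checks every step. If you distrust this checker, it's small enough to rewrite from scratch.

    data Formula = Var String | Implies Formula Formula
      deriving (Eq, Show)

    -- A derivation is a list of steps, each justified by an axiom
    -- schema or by modus ponens on two earlier steps.
    data Step
      = AxiomK Formula Formula   -- claims |- a -> (b -> a)
      | MP Int Int               -- modus ponens on steps i and j
      deriving Show

    -- Re-check a derivation, returning the formula proved at each
    -- step, or Nothing if any step is bogus.
    check :: [Step] -> Maybe [Formula]
    check = go []
      where
        go acc []                  = Just (reverse acc)
        go acc (AxiomK a b : rest) =
          go (Implies a (Implies b a) : acc) rest
        go acc (MP i j : rest)     = do
          let proved = reverse acc
          Implies a b <- index proved i
          a'          <- index proved j
          if a == a' then go (b : acc) rest else Nothing
        index xs i
          | i >= 0 && i < length xs = Just (xs !! i)
          | otherwise               = Nothing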



