Ø Classification of problems
The subject of computational complexity theory is dedicated to classifying problems by how hard they are. There are many different classifications; some of the most common and useful are the following. (One technical point: these are all really defined in terms of yes-or-no problems -- "does a certain structure exist?" rather than "how do I find the structure?")
- P.
Problems that can be solved in polynomial time. ("P" stands for
polynomial.) These problems have formed the main material of this course.
- NP. This
stands for "nondeterministic polynomial time" where
nondeterministic is just a fancy way of talking about guessing a solution.
A problem is in NP if you can quickly (in polynomial time) test whether a
solution is correct (without worrying about how hard it might be to find
the solution). Problems in NP are still relatively easy: if only we could
guess the right solution, we could then quickly test it.
NP
does not stand for "non-polynomial". There are many
complexity classes that are much harder than NP.
- PSPACE.
Problems that can be solved using a reasonable amount of memory (again
defined formally as a polynomial in the input size) without regard to how
much time the solution takes.
- EXPTIME.
Problems that can be solved in exponential time. This class contains most
problems you are likely to run into, including everything in the previous
three classes. It may be surprising that this class is not all-inclusive:
there are problems for which the best algorithms take even more than exponential
time.
- Undecidable. For
some problems, we can prove that there is no algorithm that always solves
them, no matter how much time or space is allowed. One very uninformative
proof of this is based on the fact that there are as many problems as
there are real numbers, and only as many programs as there are integers, so
there are not enough programs to solve all the problems. But we can also
define explicit and useful problems which can't be solved.
Although
defined theoretically, many of these classes have practical implications. For
instance P is a very good approximation to the class of problems which can be
solved quickly in practice -- usually if this is true, we can prove a
polynomial worst case time bound, and conversely the polynomial time bounds we
can prove are usually small enough that the corresponding algorithms really are
practical. NP-completeness theory is concerned with the distinction between the
first two classes, P and NP.
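To make the notion of fast verification that defines NP concrete, here is a minimal Python sketch (the function name and graph encoding are my own, assuming vertices 0..n-1 and undirected edges): checking a claimed Hamiltonian cycle takes polynomial time, even though no polynomial algorithm is known for finding one.

```python
def is_hamiltonian_cycle(n, edges, order):
    """Check in polynomial time whether `order` is a Hamiltonian cycle
    of the graph with vertices 0..n-1 and the given undirected edges."""
    edge_set = {frozenset(e) for e in edges}
    if sorted(order) != list(range(n)):      # each vertex exactly once
        return False
    # consecutive vertices (wrapping around) must be joined by edges
    return all(frozenset((order[i], order[(i + 1) % n])) in edge_set
               for i in range(n))

# A 4-cycle: checking a certificate is easy even when finding one is not.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_hamiltonian_cycle(4, square, [0, 1, 2, 3]))   # True
print(is_hamiltonian_cycle(4, square, [0, 2, 1, 3]))   # False
```

The verification cost is O(n + m): one pass over the claimed order and one set lookup per consecutive pair.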
Ø NP-completeness
The theory of NP-completeness is a solution to the
practical problem of applying complexity theory to individual problems.
NP-complete problems are defined in a precise sense as the hardest problems in
NP. Even though we don't know whether there is any problem in NP that is not in
P, we can point to an NP-complete problem and say that if there are any hard
problems in NP, that problem is one of the hard ones. (Conversely, if everything
in NP is easy, those problems are easy. So NP-completeness can be thought of as
a way of making the big P = NP question equivalent to smaller questions about the
hardness of individual problems.) So if we believe that P and NP are unequal,
and we prove that some problem is NP-complete, we should believe that it
doesn't have a fast algorithm.
Ø Problems of complexity theory
The most famous open problem in theoretical computer science is whether P = NP. In other words, if it's always easy to check a solution, should it also be easy to find the solution? We have no reason to believe it should be true, so the expectation among most theoreticians is that it's false. But we also don't have a proof. So we have this nice construction of complexity classes P and NP, but we can't even say that there's one problem in NP and not in P. So what good is the theory if it can't tell us how hard any particular problem is to solve?
Ø Reduction
Formally, NP-completeness is defined in terms of
"reduction" which is just a complicated way of saying one problem is
easier than another.
We
say that A is easier than B, and write A < B, if we can write down an algorithm
for solving A that uses a small number of calls to a subroutine for B (with
everything outside the subroutine calls being fast, polynomial time). There are
several minor variations of this definition depending on the detailed meaning
of "small" -- it may be a polynomial number of calls, a fixed
constant number, or just one call.
Then
if A < B, and B is in P, so is A: we can write down a polynomial algorithm
for A by expanding the subroutine calls to use the fast algorithm for B.
So
"easier" in this context means that if one problem can be solved in
polynomial time, so can the other. It is possible for the algorithms for A to
be slower than those for B, even though A < B.
As
an example, consider the Hamiltonian cycle problem. Does a given graph have a
cycle visiting each vertex exactly once? Here's a solution, using longest path
as a subroutine:
    for each edge (u,v) of G
        if there is a simple path of length n-1 from u to v
            return yes   // path + edge form a cycle
    return no
This
algorithm makes m calls to a longest path subroutine, and does O(m) work
outside those subroutine calls, so it shows that Hamiltonian cycle < longest
path. (It doesn't show that Hamiltonian cycle is in P, because we don't know
how to solve the longest path subproblems quickly.)
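The reduction above can be sketched in Python (a hypothetical illustration with my own encoding: vertices 0..n-1, undirected edges; the longest-path subroutine is a brute-force stand-in, since no fast algorithm for it is known, which is exactly the point of the example):

```python
from itertools import permutations

def has_simple_path(n, edges, u, v, length):
    """Brute-force stand-in for a longest-path subroutine: is there a
    simple path with `length` edges from u to v? Exponential time."""
    edge_set = {frozenset(e) for e in edges}
    for mid in permutations(set(range(n)) - {u, v}, length - 1):
        path = (u,) + mid + (v,)
        if all(frozenset((path[i], path[i + 1])) in edge_set
               for i in range(length)):
            return True
    return False

def hamiltonian_cycle(n, edges):
    """The reduction from the text: m subroutine calls plus O(m)
    work outside them, so Hamiltonian cycle < longest path."""
    for (u, v) in edges:
        if has_simple_path(n, edges, u, v, n - 1):
            return True   # the path plus edge (u, v) closes a cycle
    return False
```

Only the outer loop belongs to the reduction; the exponential cost is hidden entirely inside the subroutine, which is why the reduction says nothing about Hamiltonian cycle being in P.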
As a second example, consider a polynomial time problem such as minimum
spanning tree. Then for every other problem B, minimum spanning tree < B,
since there is a fast algorithm for minimum spanning trees using a subroutine for
B. (We don't actually have to call the subroutine, or we can call it and ignore
its results.)
Ø Cook's Theorem
We are now ready to
formally define NP-completeness. We say that a problem A in NP is NP-complete
when, for every other problem B in NP, B < A.
This
seems like a very strong definition. After all, the notion of reduction we've
defined above seems to imply that if B < A, then the two problems are very
closely related; for instance Hamiltonian cycle and longest path are both about
finding very similar structures in graphs. Why should there be a problem so
closely related to all the different problems in NP?
Theorem:
an NP-complete problem exists.
We
prove this by example. One NP-complete problem can be found by modifying the
halting problem (which without modification is undecidable).
Ø Bounded halting
This problem takes as input a program X and a number K. The problem is to find data which, when given as input to X, causes it to stop in at most K steps.
To
be precise, this needs some more careful definition: what language is X written
in? What constitutes a single step? Also for technical reasons K should be
specified in unary notation, so that the length of that part
of the input is K itself rather than O(log K).
For
reasonable ways of filling in the details, this is in NP: to test if data is a
correct solution, just simulate the program for K steps. This takes time
polynomial in K and in the length of the program. (Here's one point at which we
need to be careful: the program cannot perform unreasonable operations such as
arithmetic on very large integers, because then we wouldn't be able to simulate
it quickly enough.)
To
finish the proof that this is NP-complete, we need to show that it's harder
than anything else in NP. Suppose we have a problem A in NP. This means that we
can write a program PA that tests solutions to A, and halts within polynomial
time p(n) with a yes or no answer depending on whether the given solution is
really a solution to the given problem. We can then easily form a modified program
PA' to enter an infinite loop whenever it would halt with a no answer. If we
could solve bounded halting, we could solve A by passing PA' and p(n) as
arguments to a subroutine for bounded halting. So A < bounded halting. But
this argument works for every problem in NP, so bounded halting is NP-complete.
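As a rough sketch of the PA → PA' construction (assuming, purely for illustration, that solutions are passed as values to a Python checker function):

```python
def make_pa_prime(checker):
    """Given a polynomial-time solution checker PA, build PA', which
    halts quickly exactly when the checker would answer yes, and
    enters an infinite loop whenever it would answer no."""
    def pa_prime(candidate):
        if checker(candidate):
            return True        # halts within p(n) steps on a real solution
        while True:            # never halts on a non-solution
            pass
    return pa_prime

# PA' for a toy problem: "is the candidate an even number?"
pa_prime = make_pa_prime(lambda x: x % 2 == 0)
```

Asking a bounded-halting oracle whether some input makes `pa_prime` stop within p(n) steps is then the same as asking whether a solution exists.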
Ø How to prove NP-completeness in practice
The
proof above of NP-completeness for bounded halting is great for the theory of
NP-completeness, but doesn't help us understand other more abstract problems such
as the Hamiltonian cycle problem.
Most
proofs of NP-completeness don't look like the one above; it would be too
difficult to prove anything else that way. Instead, they are based on the
observation that if A < B and B < C, then A < C. (Recall that these
relations are defined in terms of the existence of an algorithm that calls
subroutines. Given an algorithm that solves A with a subroutine for B, and an
algorithm that solves B with a subroutine for C, we can just use the second
algorithm to expand the subroutine calls of the first algorithm, and get an
algorithm that solves A with a subroutine for C.)
As
a consequence of this observation, if A is NP-complete, B is in NP, and A <
B, B is NP-complete. In practice that's how we prove NP-completeness: We start
with one specific problem that we prove NP-complete, and we then prove that
it's easier than lots of others which must therefore also be NP-complete.
So
e.g. since Hamiltonian cycle is known to be NP-complete, and Hamiltonian cycle
< longest path, we can deduce that longest path is also NP-complete.
Starting
from the bounded halting problem we can show that it's reducible to a problem
of simulating circuits (we know that computers can be built out of circuits, so
any problem involving simulating computers can be translated to one about
simulating circuits). So various circuit simulation problems are NP-complete,
in particular Satisfiability, which asks whether there is an input to a Boolean
circuit that causes its output to be one.
Circuits
look a lot like graphs, so from there it's another easy step to proving that
many graph problems are NP-complete. Most of these proofs rely on
constructing gadgets, small subgraphs that act (in the context of
the graph problem under consideration) like Boolean gates and other components
of circuits.
SAT = { f | f is a Boolean formula with a satisfying assignment }
Ø 3-SAT
3-SAT = { f | f is in conjunctive normal form, each clause has exactly 3 literals, and f is satisfiable }
3-SAT is NP-complete.
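Membership of 3-SAT in NP hinges on the asymmetry described earlier: checking one assignment is fast, while the obvious search tries all 2^n of them. A minimal sketch (the clause encoding, with signed integers as literals, is my own choice):

```python
from itertools import product

def satisfies(clauses, assignment):
    """Polynomial-time check that an assignment satisfies a CNF formula.
    Literals are nonzero ints: +i means x_i, -i means NOT x_i."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

def brute_force_3sat(clauses, n_vars):
    """Exponential search over all 2^n assignments; NP membership only
    promises that `satisfies` is fast, not that this search is."""
    for bits in product([False, True], repeat=n_vars):
        assignment = dict(enumerate(bits, start=1))
        if satisfies(clauses, assignment):
            return assignment
    return None
```

For example, `brute_force_3sat([[1, 2, 3], [-1, 2, -3]], 3)` finds a satisfying assignment, while a formula like `[[1, 1, 1], [-1, -1, -1]]` (repeated literals are allowed) yields `None`.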
Ø NP-Completeness Proof Method
To show that Q is NP-complete:
1) Show that Q is in NP
2) Pick an instance, R, of your favorite NP-complete problem (e.g., Φ in 3-SAT)
3) Show a polynomial algorithm to transform R into an instance of Q
Ø Clique
· CLIQUE = { <G,k> | G is a graph with a clique of size k }
· A clique is a subset of vertices that are all pairwise connected
Reduce 3-SAT to Clique:
· Pick an instance of 3-SAT, Φ, with k clauses
· Make a vertex for each literal
· Connect each vertex to the literals in other clauses that are not its negation
Any k-clique in this graph corresponds to a satisfying assignment.
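The construction can be sketched in Python (the encoding, names, and brute-force clique search are my own, for illustration only): vertices are (clause index, literal) pairs, and edges join non-contradictory literals in different clauses.

```python
from itertools import combinations

def threesat_to_clique(clauses):
    """Build the reduction graph: one vertex per literal occurrence,
    edges between literals in different clauses that are not
    negations of each other. Runs in polynomial time."""
    vertices = [(i, lit) for i, clause in enumerate(clauses)
                         for lit in clause]
    edges = {frozenset((u, v))
             for u, v in combinations(vertices, 2)
             if u[0] != v[0] and u[1] != -v[1]}
    return vertices, edges

def has_clique(vertices, edges, k):
    """Brute-force check for a k-clique (exponential, for illustration)."""
    return any(all(frozenset((u, v)) in edges
                   for u, v in combinations(sub, 2))
               for sub in combinations(vertices, k))
```

Picking one vertex per clause in a k-clique reads off a consistent truth assignment, which is why a k-clique exists exactly when the formula is satisfiable.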
Ø Independent Set
INDEPENDENT SET = { <G,k> | G has an independent set of size k }
An independent set is a set of vertices with no edges between them.
How can we reduce this to clique?
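One possible answer, sketched under my own encoding assumptions (vertices 0..n-1, undirected edges): a vertex set is independent in G exactly when it is a clique in the complement of G, so a single call to a clique subroutine on the complement graph suffices.

```python
from itertools import combinations

def complement(n, edges):
    """Complement graph: an edge (u, v) is present exactly when it is
    absent in the original. A set is independent in G iff it is a
    clique in complement(G), giving INDEPENDENT SET < CLIQUE."""
    edge_set = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in edge_set]
```

The transformation itself costs only O(n^2) time, so the reduction is polynomial with one subroutine call.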
Ø Conclusion
We conclude that, given any instance I of problem A, we can construct in polynomial time a circuit whose known inputs are the bits of I, and whose unknown inputs are the bits of S, such that the output is true if and only if the unknown inputs spell a solution S of I. In other words, the satisfying truth assignments to the unknown inputs of the circuit are in one-to-one correspondence with the solutions of instance I of A. The reduction is complete.