Tuesday, 15 October 2013

NP-COMPLETE PROBLEMS

Ø Classification of problems:
  The subject of computational complexity theory is dedicated to classifying problems by how hard they are. There are many different classifications; some of the most common and useful are the following. (One technical point: these are all really defined in terms of yes-or-no problems -- does a certain structure exist rather than how do I find the structure.)
  • P. Problems that can be solved in polynomial time. ("P" stands for polynomial.) These problems have formed the main material of this course.
  • NP. This stands for "nondeterministic polynomial time" where nondeterministic is just a fancy way of talking about guessing a solution. A problem is in NP if you can quickly (in polynomial time) test whether a solution is correct (without worrying about how hard it might be to find the solution). Problems in NP are still relatively easy: if only we could guess the right solution, we could then quickly test it.
NP does not stand for "non-polynomial". There are many complexity classes that are much harder than NP.
  • PSPACE. Problems that can be solved using a reasonable amount of memory (again defined formally as a polynomial in the input size) without regard to how much time the solution takes.
  • EXPTIME. Problems that can be solved in exponential time. This class contains most problems you are likely to run into, including everything in the previous three classes. It may be surprising that this class is not all-inclusive: there are problems for which the best algorithms take even more than exponential time.
  • Undecidable. For some problems, we can prove that there is no algorithm that always solves them, no matter how much time or space is allowed. One very uninformative proof of this is based on the fact that there are as many problems as there real numbers, and only as many programs as there are integers, so there are not enough programs to solve all the problems. But we can also define explicit and useful problems which can't be solved.
Although defined theoretically, many of these classes have practical implications. For instance P is a very good approximation to the class of problems which can be solved quickly in practice -- usually if this is true, we can prove a polynomial worst case time bound, and conversely the polynomial time bounds we can prove are usually small enough that the corresponding algorithms really are practical. NP-completeness theory is concerned with the distinction between the first two classes, P and NP.

Ø NP-completeness
The theory of NP-completeness is a solution to the practical problem of applying complexity theory to individual problems. NP-complete problems are defined in a precise sense as the hardest problems in P. Even though we don't know whether there is any problem in NP that is not in P, we can point to an NP-complete problem and say that if there are any hard problems in NP, that problems is one of the hard ones.(Conversely if everything in NP is easy, those problems are easy. So NP-completeness can be thought of as a way of making the big P=NP question equivalent to smaller questions about the hardness of individual problems.)So if we believe that P and NP are unequal, and we prove that some problem is NP-complete, we should believe that it doesn't have a fast algorithm.

Ø Problems of complexity theory
The most famous open problem in theoretical science is whether P = NP. In other words, if it's always easy to check a solution, should it also be easy to find the solution?
We have no reason to believe it should be true, so the expectation among most theoreticians is that it's false. But we also don't have a proof...So we have this nice construction of complexity classes P and NP but we can't even say that there's one problem in NP and not in P. So what good is the theory if it can't tell us how hard any particular problem is to solve?

 Ø Reduction
Formally, NP-completeness is defined in terms of "reduction" which is just a complicated way of saying one problem is easier than another.
We say that A is easier than B, and write A < B, if we can write down an algorithm for solving A that uses a small number of calls to a subroutine for B (with everything outside the subroutine calls being fast, polynomial time). There are several minor variations of this definition depending on the detailed meaning of "small" -- it may be a polynomial number of calls, a fixed constant number, or just one call.
Then if A < B, and B is in P, so is A: we can write down a polynomial algorithm for A by expanding the subroutine calls to use the fast algorithm for B.
So "easier" in this context means that if one problem can be solved in polynomial time, so can the other. It is possible for the algorithms for A to be slower than those for B, even though A < B.
As an example, consider the Hamiltonian cycle problem. Does a given graph have a cycle visiting each vertex exactly once? Here's a solution, using longest path as a subroutine:
    for each edge (u,v) of G
    if there is a simple path of length n-1 from u to v
        return yes      // path + edge form a cycle
    return no
This algorithm makes m calls to a longest path subroutine, and does O(m) work outside those subroutine calls, so it shows that Hamiltonian cycle < longest path. (It doesn't show that Hamiltonian cycle is in P, because we don't know how to solve the longest path subproblems quickly.)
As a second example, consider a polynomial time problem such as the minimum spanning tree. Then for every other problem B, B < minimum spanning tree, since there is a fast algorithm for minimum spanning trees using a subroutine for B. (We don't actually have to call the subroutine, or we can call it and ignore its results.)

Ø Cook's Theorem
We are now ready to formally define NP-completeness. We say that a problem A in NP is NP-complete when, for every other problem B in NP, B < A.
This seems like a very strong definition. After all, the notion of reduction we've defined above seems to imply that if B < A, then the two problems are very closely related; for instance Hamiltonian cycle and longest path are both about finding very similar structures in graphs. Why should there be a problem that closely related to all the different problems in NP?
Theorem: an NP-complete problem exists.
We prove this by example. One NP-complete problem can be found by modifying the halting problem (which without modification is undecidable).

Ø  Bounded halting. This problem takes as input a program X and a number K. The problem is to find data which, when given as input to X, causes it to stop in at most K steps.
To be precise, this needs some more careful definition: what language is X written in? What constitutes a single step? Also for technical reasons K should be specified in unary notation, so that the length of that part of the input is K itself rather than O(log K).
For reasonable ways of filling in the details, this is in NP: to test if data is a correct solution, just simulate the program for K steps. This takes time polynomial in K and in the length of program. (Here's one point at which we need to be careful: the program can not perform unreasonable operations such as arithmetic on very large integers, because then we wouldn't be able to simulate it quickly enough.)
To finish the proof that this is NP-complete, we need to show that it's harder than anything else in NP. Suppose we have a problem A in NP. This means that we can write a program PA that tests solutions to A, and halts within polynomial time p(n) with a yes or no answer depending on whether the given solution is really a solution to the given problem. We can then easily form a modified program PA' to enter an infinite loop whenever it would halt with a no answer. If we could solve bounded halting, we could solve A by passing PA' and p(n) as arguments to a subroutine for bounded halting. So A < bounded halting. But this argument works for every problem in NP, so bounded halting is NP-complete.

Ø How to prove NP-completeness in practice
The proof above of NP-completeness for bounded halting is great for the theory of NP-completeness, but doesn't help us understand other more abstract problems such as the Hamiltonian cycle problem.
Most proofs of NP-completeness don't look like the one above; it would be too difficult to prove anything else that way. Instead, they are based on the observation that if A < B and B < C, then A < C. (Recall that these relations are defined in terms of the existence of an algorithm that calls subroutines. Given an algorithm that solves A with a subroutine for B, and an algorithm that solves B with a subroutine for C, we can just use the second algorithm to expand the subroutine calls of the first algorithm, and get an algorithm that solves A with a subroutine for C.)
As a consequence of this observation, if A is NP-complete, B is in NP, and A < B, B is NP-complete. In practice that's how we prove NP-completeness: We start with one specific problem that we prove NP-complete, and we then prove that it's easier than lots of others which must therefore also be NP-complete.
So e.g. since Hamiltonian cycle is known to be NP-complete, and Hamiltonian cycle < longest path, we can deduce that longest path is also NP-complete.
Starting from the bounded halting problem we can show that it's reducible to a problem of simulating circuits (we know that computers can be built out of circuits, so any problem involving simulating computers can be translated to one about simulating circuits). So various circuit simulation problems are NP-complete, in particular Satisfiability, which asks whether there is an input to a Boolean circuit that causes its output to be one.
Circuits look a lot like graphs, so from there it's another easy step to proving that many graph problems are NP-complete. Most of these proofs rely on constructing gadgets, small subgraphs that act (in the context of the graph problem under consideration) like Boolean gates and other components of circuits.
Ø  SAT

SAT = { f | f is a Boolean Formula with a satisfying assignment }
Ø  3-SAT
   3-SAT = { f | f is in Conjunctive Normal Form,  each clause has exactly 3 literals and f is satisfiable }
  3-SAT is NP-Complete


Ø NP-Completeness Proof Method
·        To show that Q is NP-Complete:
  1) Show that Q is in NP
  2) Pick an instance, R, of your favorite NP-Complete problem (ex: Φ in 3- SAT)
  3) Show a polynomial algorithm to transform R into an instance of Q


Ø Clique
·        CLIQUE = { <G,k> | G is a graph with a clique of size k }
·        A clique is a subset of vertices that are all connected


Reduce 3-SAT to Clique

·        Pick an instance of 3-SAT, Φ, with k clauses
·        Make a vertex for each literal
·        Connect each vertex to the literals in other clauses that are not the negation
      Any k-clique in this graph corresponds to a satisfying assignment
  •    Independent Set

  INDEPENDENT SET = { <G,k> | where G has an independent set of size k }
  An independent set is a set of vertices that have no edges
  How can we reduce this to clique?
Conclusion
We conclude that, given any instance I of problem A, we can construct in polynomial time a circuit whose known inputs are the bits of I, and whose unknown inputs are the bits of S, such that the output is true if and only if the unknown inputs spell a solution S of I. In other words, the satisfying truth assignments to the unknown inputs of the circuit are in one-to-one correspondence with the solutions of instance I of A. The reduction is complete.












No comments:

Post a Comment