24 March 2005

Counting partitions

This post is related to part A of Lars' contest. Your job there is to find a (0,1) matrix that maximizes a certain objective function. A (0, 1) matrix is a matrix whose elements are taken from the set {0, 1}. The row sum of a n x m matrix is a n-vector whose i-th element contains the sum of i-th row in the matrix. Similarly you can define the column sum of a matrix. An interesting (standard) problem is to find the necessary and sufficient conditions such that there exist a (0,1) matrix with given row and column sums; and if there is then construct one. But here I'm going to talk about another problem.

I wanted to partition the search space into matrices with certain row/column sums and then proceed with searching these regions in a certain order. The search for each region is implemented. But now I need to order these "regions" by how interesting they are. My first idea was to generate all possible row/column sums, order them by some score and then take the top R regions. In order to get the top R row/column sums I don't need to remember all possibilities and then order them. But I do need to generate them all (at least if I want to keep the definition of a row/column sum score easy to be changed). Fortunately my first instinct was to say: let's count how many possibilities are there to see if it is feasible to generate them all. It turns out they are too many :(

So what I want to present here is a way of counting possible row partitions of a (0,1) matrix. I will also restrict myself to square matrices as in Lars' contest; but the generalization is straightforward. What exactly do we count? The number of solution of the equation: x1+x2+...+xn=s. Here s is the total number of ones in the matrix and n is the number of rows (and columns) in the matrix. Each x is at least 0 and at most n. I will denote the number we search for as p(n, s, m), where m is the maximum value allowable for x, i.e. I have generalized the problem a bit. To paraphrase Dijkstra, sometimes making the problem more general means making it easier to solve.

My first idea was to count the solutions of the equation with sum s, maximum value per term m and number of terms n by counting first solutions with no zero term, then solutions with one zero term, etc. If you write this down you'll see that after very little manipulation (sorry, I hate to write math in HTML -- much worse than LaTeX) the following program works.

   3:let cache = Hashtbl.create 10000000;;
   4:
   5:let rec count n s m = 
   6:  match (s, m) with
   7:    (0, _) -> 1 (* order is relevant! *)
   8:  | (_, 0) -> 0
   9:  | _ -> if n < 0 || s < 0 || m < 0 then 0 else
  10:    try
  11:      Hashtbl.find cache (n, s, m)
  12:    with Not_found ->
  13:      let r = (count (n-1) s m) + (count n (s-n) (m-1)) in
  14:      if r < 0 then failwith "integer overflow" else
  15:      Hashtbl.add cache (n, s, m) r; r;;

Now we can use the above function. For example if we look at matrices that have roughly half elements equal to one and half equal to zero then thair partitions are given by:

  17:let possib n = count n (n * n / 2) n;;

It turns out that possib 18 is about 100 millions, possib 19 is about 400 millions, and possib 20 overflows 231. So there is no chance to generate all possible row sums for matrices of up to 59 rows :( I need a solution specific to the score I'm using for row/column sums...

No comments:

Post a Comment

Note: (1) You need to have third-party cookies enabled in order to comment on Blogger. (2) Better to copy your comment before hitting publish/preview. Blogger sometimes eats comments on the first try, but the second works. Crazy Blogger.