Matroids and the Greedy Method

Suppose we are given a set E = {e1,e2,...,en} of n objects, each with associated positive integer weights w(e1), w(e2),..., w(en). Let I be a subset of the power set (the set of all subsets) of E with the following two properties:
  1. For every I1 in I, if I2 is a subet of I1 then I2 is in I.
  2. For every pair I1, I2 in I such that |I2| = |I1|+1, there is an element e in I2 - I1 which makes I1 U {e} in I.
Then the pair (E,I) is called a matroid. Members of I are referred to as independent sets. Independent sets containing the most elements are called maximum cardinality independent sets. The problem of finding a maximum cardinality independent set (Imax) which minimizes (or maximizes) the sum of w(e) over all e in Imax may be solved as follows (the Greedy Method):

Set T = emptyset
Order all objects in E by increasing (decreasing) weight
Repeat the following until E is empty
    Let e be the lowest (highest) weight element in E
    Remove e from E
    If T U {e} is in I then add e to T
Output T

Next we present two examples: the Maximum Spanning Network Problem and the Integer Deadline Scheduling Problem with unit processing times.

First consider the problem of finding a Minimum Spanning Tree (network) in a weighted graph G. Let E be the set of edges in the graph G. Let I be the set of all forests of G (that is, all graphs using edges of G which have no cycles). Then (E,I) is a matroid.

To see this consider the two properties which define a matroid. Property (1) is satisfied since removing any edge from a forest results in another forest. Property (2) is satisfied by the following argument: Start with any I1,I2 in I with |I2| = |I1|+1. Consider three possibilites.

  1. There is an edge e of I2 which has at most one vertex in common with a vertex that is an endpoint of an edge of I1. In this case I1 U {e} is a forest.
  2. There is an edge e of I2 which has one vertex common to a vertex in one connected component of I1 and the other vertex common to a vertex in another connected component of I1. In this case, since both connected components are trees and there are no other edges between the components, I1 U {e} is a forest.
  3. For each edge < v,w > of I2, v and w are in the same connected component of I1. This case is impossible since it would require that at least one connected component of I1 contain a number of edges of I2 greater than the number of edges of I1 within that component. This implies that there is a cycle among edges of I2 in that component and this implies I2 is not a forest (contrary to hypothesis). Therefore, we can always choose an e from I2-I1 such that I1 U {e} is in I.

    Now, the maximum cardinality forest of G is a spanning tree of G. Also, the weight of a spanning network in the Minimum Spanning Network problem is the sum of the edges in the network (a tree). That is, the weight of a maximum cardinality independent set is the sum of weights of objects in that set. Hence, we may use the greedy method above to solve the Minimum Spanning Network problem (which is what we have been doing all along). The only part that has to be filled in is how to check whether adding a particular edge to a forest T keeps a forest. But, this is easy, as we saw.

    The Integer Deadline Scheduling problem is stated as follows:

    Integer Deadline Scheduling
    Instance: A set A = {a1,a2,...,an} of n jobs, each requiring unit time to complete, and associated with each job a is an integer deadline d(a) and an integer profit p(a).

    Produce: A schedule [ao1,ao2,...,aon] for a single processor such that the sum of the profits of all jobs which are processed before their deadline is maximum over all schedules.

    The solution here is trickier than for the Minimum Spanning Tree problem since, first, we can order jobs based on their deadlines or their profits so we must decide on which to base the ordering and, second, we need a schedule (that is, a list) as output and not a subset. As you might suspect, ordering on profit is correct (since we are trying to maximize profit). We use T to hold a subset of jobs which can be scheduled in such a way that all jobs in the subset can be completed before their deadlines. The final solution is obtained by finding that schedule for T (easily done by ordering the jobs in T by increasing deadline) and tacking onto that schedule all jobs not in T in any order.