Identifying Transcriptional Modules using Subspace Clustering
      - Amit Sinha

    Complex biological functions carried out by an organism are driven by a relatively small, finite set of genes. This is achieved in part through the combinatorial action of genes, wherein a subset of genes works as a module to carry out a cellular process. Any method that looks only for enrichment of single genes in a sample of interest is therefore likely to be ineffective. The problem is made harder because these modules overlap with each other, which leads to an exponential search space. Subspace clustering is an effective method for mining such data. In this work, we address the specific problem of finding transcriptional modules, in which a set of transcription factor genes drives the expression of a set of target genes. Since the search space of subspace clustering is exponential, a large number of patterns are found, many of which may occur just by chance. To filter out such noise, we devise a method that uses a random background data set to find patterns specific to the samples of interest, without increasing the complexity of the algorithm. We ran our algorithm on a data set of tumor protein p53 binding sites and were able to identify many known binding partners of p53. We also found several other proteins that experimental studies have suggested may be co-factors of p53.
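
    The background-filtering idea can be illustrated with a minimal sketch. It is not the algorithm used in this work: it enumerates factor subsets by brute force rather than with a subspace clustering method, and the names (enriched_modules, min_support, min_ratio), the thresholds, and the toy data with placeholder factors TF_A to TF_D are all assumptions made for illustration. Each sample is represented as the set of factors with a binding site in one sequence, and a pattern is kept only if it is frequent in the samples of interest and markedly rarer in a random background set.

```python
from itertools import combinations


def support(rows, pattern):
    """Fraction of rows (sets of factor names) that contain every factor in pattern."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if pattern <= row) / len(rows)


def enriched_modules(foreground, background, max_size=3,
                     min_support=0.5, min_ratio=2.0):
    """Return (pattern, fg_support, bg_support) for patterns specific to foreground.

    Brute-force enumeration, for illustration only; thresholds are hypothetical.
    """
    factors = sorted(set().union(*foreground)) if foreground else []
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(factors, size):
            pattern = frozenset(combo)
            fg = support(foreground, pattern)
            if fg < min_support:
                continue  # too rare in the samples of interest
            bg = support(background, pattern)
            if fg >= min_ratio * bg:
                # frequent in the foreground and much rarer in the background
                found.append((combo, fg, bg))
    return found


# Toy data with placeholder factor names (not real results).
foreground = [
    {"TF_A", "TF_B", "TF_C"},
    {"TF_A", "TF_B"},
    {"TF_A", "TF_B", "TF_D"},
    {"TF_A", "TF_C"},
]
background = [
    {"TF_B"},
    {"TF_C", "TF_D"},
    {"TF_A"},
    {"TF_B", "TF_D"},
]

modules = enriched_modules(foreground, background)
for combo, fg, bg in modules:
    print(combo, fg, bg)
```

    In this toy example TF_B alone is discarded because it is also common in the background, while the pair TF_A and TF_B is kept because the two never co-occur there.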

Last updated: November 7, 2009.