|
Identifying Transcriptional Modules using
Subspace Clustering
- Amit Sinha
abstract
|
Complex biological functions carried out by an organism are driven by a
relatively small set of genes. This is achieved in part by the
combinatorial action of genes, wherein a subset of genes works as a module
to carry out a cellular process. Any method that seeks enrichment of
single genes in a sample of interest is therefore likely to be
ineffective. The problem is made harder because these modules
overlap with each other, which leads to an exponential search space.
Subspace clustering is an effective method for mining such data. In this
work, we address the specific problem of finding transcriptional modules,
in which a set of transcription factor genes drives the expression of a
set of target genes. Since the search space of subspace clustering is
exponential, a large number of patterns are found, many of which may
occur just by chance. To filter such noise, we devise a method of using
a random background data set to find patterns specific to the samples of
interest, without increasing the complexity of the algorithm. We ran our
algorithm on a data set of tumor protein p53 binding sites. We were able
to identify many known binding partners of p53. We also found several
other proteins that experimental studies have suggested as co-factors
of p53.
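
The background-filtering idea can be illustrated with a minimal sketch. It is not the algorithm used in this work: it replaces subspace clustering with brute-force enumeration of small factor sets, and the function names, thresholds (`min_support`, `min_ratio`), and toy data (`TF_A`, `TF_B`, `TF_C`) are all hypothetical. Each sample is represented as the set of factors found in one region, and a pattern is kept only if it is frequent in the samples of interest and clearly rarer in the random background.

```python
from itertools import combinations

def support(pattern, samples):
    """Fraction of samples (sets of factor names) containing every item in pattern."""
    return sum(pattern <= s for s in samples) / len(samples)

def specific_patterns(foreground, background, max_size=2,
                      min_support=0.5, min_ratio=2.0):
    """Keep patterns frequent in the foreground and enriched over the background.

    Brute-force enumeration stands in for subspace clustering here;
    all thresholds are illustrative assumptions.
    """
    items = sorted(set().union(*foreground))
    kept = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            fg = support(pattern, foreground)
            if fg < min_support:
                continue  # too rare in the samples of interest
            bg = support(pattern, background)
            # Small constant avoids division by zero when bg == 0.
            if fg / (bg + 1e-9) >= min_ratio:
                kept.append((combo, fg, bg))
    return kept

# Toy data (made up): factors observed in each region.
foreground = [{"p53", "TF_C", "TF_A"}, {"p53", "TF_C"},
              {"p53", "TF_C", "TF_B"}, {"p53", "TF_A"}]
background = [{"TF_A"}, {"TF_C", "TF_B"}, {"TF_A", "TF_B"}, {"p53"}]

result = specific_patterns(foreground, background)
for combo, fg, bg in result:
    print(combo, fg, bg)
```

In this toy input, `TF_A` alone is as common in the background as in the foreground and is discarded, while the pair (`TF_A`, `p53`) never occurs in the background and is kept. The background is scanned with the same support function as the foreground, so the filter adds only a second counting pass per pattern.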
|
|