The Gene Ontology (GO) project provides structured descriptions of what is known about the biology of genes at different levels. Its controlled vocabularies (fixed sets of terms) describe gene function at 3 levels: biological process, molecular function, and cellular component.
These terms are species independent and hierarchically structured (see more information here).
Enrichment analysis asks the following question: when a small subset of genes is drawn from a large set of genes (the reference set), what is the probability that as many of the subset genes belong to a given functional category as were actually observed, compared with a randomly sampled subset of the same size? This is normally tested with either the hypergeometric test (sampling without replacement) or the binomial test (sampling with replacement).
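To make the hypergeometric case concrete, the one-sided p-value can be written as below. The symbols are notation introduced here for illustration only: N is the number of genes in the reference set, K is the number of reference genes annotated to the category, n is the size of the subset, and k is the number of subset genes annotated to the category.

$$
p = P(X \ge k) = \sum_{i=k}^{\min(K,\,n)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
$$

A small p-value suggests the category is over-represented in the subset beyond what random sampling would be expected to produce. This is only the basic term-by-term form; the Parent-Child-Union calculation used in the command further down additionally takes the annotations of parent terms into account.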
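Before you begin, you need to have 2 important files: the ontology file (the go.obo file from the downloads page) and an association (anno) file that maps genes to GO terms. The command below refers to these files as gene_ontology.obo and association.anno.
Once you have these 2 files, you can run Ontologizer. Ontologizer is a command-line tool that is very efficient when you have a large number of lists to analyze. You need to download the jar file to run it.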
The command:
java -jar Ontologizer.jar \
--association association.anno \
--go gene_ontology.obo \
--studyset your_input_list.txt \
--population population.txt \
--calculation Parent-Child-Union \
--mtc Westfall-Young-Single-Step \
--dot 0.05 \
--resamplingsteps 1000
Here population.txt is the full list of genes in the anno file. You only need to change it when you want to test against a different background set. You can also experiment with the other settings, such as --mtc and --resamplingsteps, to suit your analysis. With the --dot option, Ontologizer also generates a dot file, which can be used with Graphviz to visualize where in the GO hierarchy these genes are enriched.
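If you do not yet have a population.txt, one way to derive it from the association file is sketched below. This is a minimal sketch under assumptions not stated in the original: it assumes the anno file is in GAF format (tab-separated, comment lines starting with "!", gene symbol in column 3) and that your input list uses the same gene symbols. The function name make_population is hypothetical.

```shell
# Print the unique gene symbols found in a GAF-style association file.
# Assumptions: tab-separated columns, comment lines start with "!",
# and the gene symbol is in column 3.
make_population() {
    awk -F'\t' '$0 !~ /^!/ && $3 != "" { print $3 }' "$1" | sort -u
}

# Example usage:
#   make_population association.anno > population.txt
```

If your input list uses a different identifier (for example the database ID in column 2), change the column number accordingly.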
The dot file is created after the run is complete. You can convert it to png format for easy viewing:
dot -Tpng input.dot -o output.png