The Gene Ontology (GO) project provides structured descriptions of the known biological information about genes at different levels. Its controlled vocabularies (fixed sets of terms) describe gene function at three levels: biological process, molecular function, and cellular component.
These terms are species-independent and hierarchically structured (see the Gene Ontology website for more information).
Enrichment analysis tests whether a small subset of genes (the study set), sampled from a larger set of genes (the reference set), contains more genes from a given functional category than you would expect from a randomly sampled subset of the same size. This is normally done using either the hypergeometric test (sampling without replacement) or the binomial test (sampling with replacement).
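To make the hypergeometric case concrete, here is a sketch of the one-sided p-value. The symbols are introduced here for illustration only: N is the number of genes in the reference set, K the number of those annotated to the category, n the size of the study set, and k the number of study-set genes annotated to the category.

```latex
p = P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\,\binom{N-K}{n-i}}{\binom{N}{n}}
```

With made-up toy numbers N = 10, K = 4, n = 3 and k = 2, the sum is (6 x 6 + 4 x 1) / 120 = 1/3, so seeing two or more annotated genes in a study set of three would not be surprising. Real analyses also need a multiple-testing correction, because many categories are tested at once.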
Before you begin, you need two important files: the association file that maps your genes to GO terms (association.anno in the command below) and the ontology file (the go.obo file from the Gene Ontology downloads page).
Once you have these two files, you can run Ontologizer. Ontologizer is a command-line tool that is very efficient when you have a large number of gene lists to analyze. You need to download the jar file to run it.
The command:
java -jar Ontologizer.jar \
--association association.anno \
--go go.obo \
--studyset your_input_list.txt \
--population population.txt \
--calculation Parent-Child-Union \
--mtc Westfall-Young-Single-Step \
--dot 0.05 \
--resamplingsteps 1000
Here, population.txt is the full list of genes in your annotation file. You only need to change it when you want to test against a different background set. You can also experiment with other settings, such as --mtc and --resamplingsteps. With --dot, Ontologizer also writes a dot file that can be rendered with GraphViz to visualize the GO terms in which these genes are enriched.
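If you do not already have a population file, one way to derive it from the annotation file is sketched below. This is only a sketch under assumptions: it presumes a tab-delimited annotation file whose comment lines start with "!" and whose gene identifiers sit in a single column (column 3 holds the gene symbol in GAF files). The function name build_population and its arguments are hypothetical, so check your own file's layout before using it.

```shell
#!/bin/sh
# Write the unique gene identifiers from an annotation file, one per line.
# ASSUMPTIONS: tab-delimited input, comment lines start with "!", and the
# gene identifier is in the column you pass in (3 = symbol column in GAF).
build_population() {
    anno_file=$1   # annotation file, e.g. association.anno
    column=$2      # column number that holds the gene identifier
    out_file=$3    # output file, e.g. population.txt

    # Skip comment lines and empty fields, then de-duplicate.
    awk -F '\t' -v c="$column" '!/^!/ && $c != "" { print $c }' "$anno_file" \
        | sort -u > "$out_file"
}

# Example call (uncomment to run):
# build_population association.anno 3 population.txt
```

The identifiers in population.txt must be of the same type as those in your input list, otherwise the study set will not match the population.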
The dot file is created once the run is complete. You can convert it to PNG format for easy viewing:
dot -Tpng input.dot -o output.png
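When you have analyzed many lists, you will end up with many dot files. The sketch below loops the same dot command over every .dot file in a directory. The function name convert_dots is hypothetical, and it assumes GraphViz's dot is on your PATH.

```shell
#!/bin/sh
# Convert every .dot file in a directory to a PNG with the same base name.
# ASSUMPTION: the GraphViz "dot" executable is installed and on the PATH.
convert_dots() {
    dir=${1:-.}   # directory to scan; defaults to the current directory
    for f in "$dir"/*.dot; do
        [ -e "$f" ] || continue              # nothing matched the pattern
        dot -Tpng "$f" -o "${f%.dot}.png"    # input.dot -> input.png
    done
}

# Example call (uncomment to run):
# convert_dots .
```

Each PNG is written next to its source file, so the original dot files are left untouched.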