What dataset will we use throughout this text?

As much as possible, we will use Arabidopsis data from multiple NCBI BioProjects that together contain datasets for many of the most common data analyses. The following BioProjects were chosen.

| SeqType | Platform | ReadType | BioProject | Experiment |
| --- | --- | --- | --- | --- |
| ChIP-seq | Illumina | single | PRJNA316877 | Requirement for flap endonuclease 1 (FEN1) to maintain genomic stability and transcriptional gene silencing in Arabidopsis |
| ChIP-seq | Illumina | paired | PRJNA349052 | Centromere location in Arabidopsis is unaltered by extreme divergence in CENH3 protein sequence |
| ATAC-seq | Illumina | paired | PRJNA394532 | ATAC-seq profiling of open chromatin in the root tips |
| RNA-seq | Illumina | single | PRJNA312637 | RNA-seq analysis of transcriptomes in cae2-1, CA1-1 and cae2-1 CA1-1 Arabidopsis genotypes |
| RNA-seq | Illumina | paired | PRJNA348194 | Analysis of gene expression in a ATRX loss-of-function line |
| ncRNA | SOLiD | single | PRJNA169627 | Deep sequencing of small RNAs |
| microRNA | Illumina | single | PRJNA355875 | Differential expression of microRNAs in wildtype versus DCL1 mutants in Arabidopsis thaliana |
| Long Reads | PacBio | long-reads | PRJNA314706 | Diploid Arabidopsis thaliana genome sequencing and assembly |
| DNAseq | Illumina | paired-end | PRJEB13889 | Genome stability under UV-B in Arabidopsis thaliana |
| DNAseq | Illumina | mate-pair | SRX1434948 | Arabidopsis thaliana Genome sequencing and assembly |
| 16s-rRNA | Illumina | paired-end | MG-RAST:4457768.3-4459735.3 | Moving pictures of the human microbiome (“Moving Pictures” tutorial) |
| Shotgun metagenomics | Illumina | paired-end | ERX2017035 | A case of hepatic brucelloma studied by next generation sequencing |
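
Not every identifier in the BioProject column is a BioProject accession. The PRJNA and PRJEB entries are BioProjects, the SRX and ERX entries are individual experiment accessions, and one entry is an MG-RAST identifier. The short Python sketch below shows one way we might sort the identifiers by type before deciding how to retrieve each dataset. The names `classify_accession`, `group_by_kind`, and `DATASETS` are hypothetical helpers written for this illustration and do not come from any library.

```python
import re

# Prefix patterns for the identifier types that appear in the table above.
# These cover only the prefixes used in this text, not every accession format.
ACCESSION_PATTERNS = [
    (re.compile(r"^PRJ(NA|EB)\d+$"), "BioProject"),
    (re.compile(r"^[SE]RX\d+$"), "Experiment"),
    (re.compile(r"^MG-RAST:"), "MG-RAST"),
]


def classify_accession(accession):
    """Return the kind of record an identifier refers to, or 'unknown'."""
    for pattern, kind in ACCESSION_PATTERNS:
        if pattern.search(accession):
            return kind
    return "unknown"


# (SeqType, identifier) pairs copied from the table above.
DATASETS = [
    ("ChIP-seq", "PRJNA316877"),
    ("ChIP-seq", "PRJNA349052"),
    ("ATAC-seq", "PRJNA394532"),
    ("RNA-seq", "PRJNA312637"),
    ("RNA-seq", "PRJNA348194"),
    ("ncRNA", "PRJNA169627"),
    ("microRNA", "PRJNA355875"),
    ("Long Reads", "PRJNA314706"),
    ("DNAseq", "PRJEB13889"),
    ("DNAseq", "SRX1434948"),
    ("16s-rRNA", "MG-RAST:4457768.3-4459735.3"),
    ("Shotgun metagenomics", "ERX2017035"),
]


def group_by_kind(datasets):
    """Group identifiers by the kind of record they point to."""
    groups = {}
    for _seq_type, accession in datasets:
        groups.setdefault(classify_accession(accession), []).append(accession)
    return groups


if __name__ == "__main__":
    for kind, accessions in group_by_kind(DATASETS).items():
        print(kind, len(accessions), ", ".join(accessions))
```

Grouping the identifiers this way makes it clear which datasets can be looked up as whole projects and which point to a single experiment or to a different repository altogether.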

Why Arabidopsis?

Arabidopsis is one of several model organisms for which a significant amount of data has been collected across a wide variety of bioinformatics analysis problems. Additional example datasets from a variety of organisms will also be provided as problem sets to explore.

