What dataset will we use throughout this text?

Wherever possible, we will use Arabidopsis data from several NCBI BioProjects that together cover many of the most common types of data analysis. The following datasets were chosen.

SeqType	Platform	ReadType	BioProject	Experiment
ChIP-seq	Illumina	single	PRJNA316877	Requirement for flap endonuclease 1 (FEN1) to maintain genomic stability and transcriptional gene silencing in Arabidopsis
ChIP-seq	Illumina	paired	PRJNA349052	Centromere location in Arabidopsis is unaltered by extreme divergence in CENH3 protein sequence
ATAC-seq	Illumina	paired	PRJNA394532	ATAC-seq profiling of open chromatin in the root tips
RNA-seq	Illumina	single	PRJNA312637	RNA-seq analysis of transcriptomes in cae2-1, CA1-1 and cae2-1 CA1-1 Arabidopsis genotypes
RNA-seq	Illumina	paired	PRJNA348194	Analysis of gene expression in a ATRX loss-of-function line
ncRNA	SOLiD	single	PRJNA169627	Deep sequencing of small RNAs
microRNA	Illumina	single	PRJNA355875	Differential expression of microRNAs in wildtype versus DCL1 mutants in Arabidopsis thaliana
Long Reads	PacBio	long-reads	PRJNA314706	Diploid Arabidopsis thaliana genome sequencing and assembly
DNAseq	Illumina	paired-end	PRJEB13889	Genome stability under UV-B in Arabidopsis thaliana
DNAseq	Illumina	mate-pair	SRX1434948	Arabidopsis thaliana Genome sequencing and assembly
16S rRNA	Illumina	paired-end	MG-RAST:4457768.3-4459735.3	Moving pictures of the human microbiome (“Moving Pictures” tutorial)
Shotgun metagenomics	Illumina	paired-end	ERX2017035	A case of hepatic brucelloma studied by next generation sequencing
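
Since we will refer back to these accessions repeatedly, it can help to keep the table in a machine-readable form. The following is a minimal sketch, assuming the table is stored as tab-separated text; only three rows are copied here, the Experiment column is left out for brevity, and the helper names `load_datasets` and `accessions_for` are hypothetical, chosen only for illustration.

```python
import csv
import io

# Three rows copied from the table above; columns are tab-separated.
# The Experiment column is omitted here to keep the example short.
TABLE = (
    "SeqType\tPlatform\tReadType\tBioProject\n"
    "ChIP-seq\tIllumina\tsingle\tPRJNA316877\n"
    "RNA-seq\tIllumina\tsingle\tPRJNA312637\n"
    "RNA-seq\tIllumina\tpaired\tPRJNA348194\n"
)


def load_datasets(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def accessions_for(datasets, seq_type):
    """Return the accession of every dataset whose SeqType matches."""
    return [row["BioProject"] for row in datasets if row["SeqType"] == seq_type]


datasets = load_datasets(TABLE)
print(accessions_for(datasets, "RNA-seq"))  # ['PRJNA312637', 'PRJNA348194']
```

The same lookup works for any other value in the SeqType column once the remaining rows are added.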

Why Arabidopsis?

Arabidopsis is one of several model organisms for which a significant amount of data has been collected across a wide variety of bioinformatic analysis problems. Additional example datasets from a variety of organisms will also be provided as problem sets to explore.
