Last Update: 7 Dec 2018
RMarkdown: generate_heatmaps.Rmd

Clustering and Heatmap generation using R

Clustering and heatmap helps us to visualize trends in large dataset. Here, this method will describe how to create one in R

Preparing the dataset

Ideal dataset for heatmap is a matrix (preferably as a csv file), where there are rows and columns of data, like this:

1
2
3
4
5
6
7
8
9
10
11
12
$ cat test_dataset.txt
        A   B   C   D   E   F
Row_1   97  14  99  65  57  29
Row_2   56  18  50  8   58  98
Row_3   58  58  43  936 47  78
Row_4   2   55  5   65  48  5
Row_5   45  87  47  47  88  47
Row_6   69  6   69  55  52  88
Row_7   8   32  58  23  23  7
Row_8   6   3   21  32  3   87
Row_9   37  98  36  12  36  7
Row_10  85  75  34  25  2   9

This is a tab separated data, simply use the tr to make it csv file

1
cat test_dataset.txt | tr '\t' ',' > test_dataset.csv

The new data file should look like this:

1
2
3
4
5
6
7
8
9
10
11
12
$ cat test_dataset.csv
,A,B,C,D,E,F
Row_1,97,14,99,65,57,29
Row_2,56,18,50,8,58,98
Row_3,58,58,43,936,47,78
Row_4,2,55,5,65,48,5
Row_5,45,87,47,47,88,47
Row_6,69,6,69,55,52,88
Row_7,8,32,58,23,23,7
Row_8,6,3,21,32,3,87
Row_9,37,98,36,12,36,7
Row_10,85,75,34,25,2,9

Creating the heatmap

Open R in either Windows/Linux/Mac. You will need following packages for this tutorial. If you don’t have them, please install using the following commands:

1
2
3
install.packages(c("gplots", "vegan", "RColorBrewer", "Cairo"))
install.packages("BiocManager")
BiocManager::install("Heatplus")

Once done, load the necessary libraries to make sure they were installed correctly.

1
2
3
4
5
6
# load required packages
library(gplots)
library(Heatplus)
library(vegan)
library(RColorBrewer)
library(Cairo)

Now we can load the data and generate the heatmap.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# import the data file as matrix
INFILE = "data/test_dataset.csv"       # Path to CSV file
all.data <- as.matrix(read.csv (INFILE, sep=",", row.names=1))


all.data[1:3, 1:4]                        # take a look at your data
#>        A  B  C   D
#> Row_1 97 14 99  65
#> Row_2 56 18 50   8
#> Row_3 58 58 43 936
data.prop <- all.data/rowSums(all.data)   # convert the data into proportions, you can also log transform your data
data.prop[1:3, 1:3]                       # view transformed data
#>                A          B         C
#> Row_1 0.26869806 0.03878116 0.2742382
#> Row_2 0.19444444 0.06250000 0.1736111
#> Row_3 0.04754098 0.04754098 0.0352459

# create a color gradient for heatmap
scaleyellowred <- colorRampPalette(c("lightyellow", "red"), space = "rgb")(100)
# generate a heatmap without clustering
heatmap(as.matrix(data.prop), Rowv = NA, Colv = NA, col = scaleyellowred, margins = c(10, 2))

Add clustering.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Cluster the data, first rows
data.dist <- vegdist(data.prop, method = "bray")
row.clus <- hclust(data.dist, "aver")
# then columns, you can skip one of the clustering if you don't want to mess a particular row/column
data.dist.g <- vegdist(t(data.prop), method = "bray")
col.clus <- hclust(data.dist.g, "aver")
# generate a heatmap
p <- heatmap(as.matrix(data.prop),
   Rowv = as.dendrogram(row.clus),
   Colv = as.dendrogram(col.clus),
   col = scaleyellowred,
   margins = c(7, 8),
   xlab = "Column names",
   ylab = "Row names",
   main = "Title", pch=10)

The image is stored as plot p and can be saved as a scalabe vector.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# to save it as scalable vector graphics (change svg to png, pdf etc if you need other formats)
svg("heatmap.svg")
p
#> $rowInd
#>  [1]  3  4  8  1  2  6 10  7  5  9
#> 
#> $colInd
#> [1] 6 4 2 5 1 3
#> 
#> $Rowv
#> NULL
#> 
#> $Colv
#> NULL
dev.off()
#> quartz_off_screen 
#>                 2

You can customize the axis titles, chart title by modifying the commands above.