Adding text annotation to a clustering scatter plot (tSNE)

I have XY data (a 2D tSNE embedding of high-dimensional data) which I'd like to scatter plot. The data are assigned to several clusters, so I'd like to color-code the points by cluster and then add a single label for each cluster that has the same color as the cluster and is located outside the cluster's points (as much as possible).
Any idea how to do this in R, using either ggplot2 with ggrepel or plotly?
Here's the example data (the XY coordinates and cluster assignments are in df and the labels in label.df) and the ggplot2 part of it:
library(dplyr)
library(ggplot2)
set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)
label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))
ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none")
1 Answer
The geom_label_repel() function in the ggrepel package lets you add labels to a plot while "repelling" them so that they do not overlap with each other or with the data points. It takes only a slight addition to your existing code: summarize the data to get the coordinates where each label should go (here I chose the upper-left-ish region of each cluster, i.e. the min of x and the max of y) and merge the result with your existing data frame containing the cluster labels. Then pass this data frame to geom_label_repel() and map the variable that contains the label text to the label aesthetic in aes().
library(dplyr)
library(ggplot2)
library(ggrepel)
set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)
label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))
label.df_2 <- df %>%
  group_by(cluster) %>%
  summarize(x = min(x), y = max(y)) %>%  # upper-left corner of each cluster
  left_join(label.df, by = "cluster")    # attach the label text
ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none") +
  ggrepel::geom_label_repel(data = label.df_2, aes(label = label))
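If the labels still land on top of points, one possible variation (a sketch, not part of the answer's code) is to hand ggrepel every data point with an empty label, plus one labeled row per cluster. ggrepel does not draw empty-string labels but still repels the real labels away from those points. The choices here are assumptions made for illustration: anchoring each label at the cluster centroid (mean of x and y), the names centroids and repel.df, using geom_text_repel() rather than geom_label_repel(), and the max.overlaps and min.segment.length settings.

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- do.call(rbind, lapply(seq(1, 20, 4), function(i)
  data.frame(x = rnorm(50, mean = i, sd = 1),
             y = rnorm(50, mean = i, sd = 1),
             cluster = i)))
df$cluster <- factor(df$cluster)

# One anchor row per cluster, placed at the centroid (an assumption;
# any other summary such as the median would work the same way)
centroids <- df %>%
  group_by(cluster) %>%
  summarize(x = mean(x), y = mean(y), .groups = "drop") %>%
  mutate(label = paste0("cluster: ", cluster))

# Every data point with an empty label, followed by the labeled centroids.
# Empty labels are not drawn, but their points still push real labels away.
repel.df <- bind_rows(mutate(df, label = ""), centroids)

p <- ggplot(df, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  geom_text_repel(data = repel.df, aes(label = label),
                  max.overlaps = Inf,       # never drop a label for overlapping
                  min.segment.length = 0) + # always draw the connecting segment
  theme_minimal() +
  theme(legend.position = "none")
p
```

Because the labels inherit color = cluster from the main ggplot() call, each one keeps its cluster's color, and the segment drawn back to the centroid shows which cluster a displaced label belongs to.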