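Text analysis is a key technique for extracting insights from unstructured data. Here, we’ll analyze a small sample corpus in R with the tm and wordcloud packages: we clean the text, count how often each term appears, and draw a word cloud.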
Step 1: Install and Load Required Packages
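# Install the packages once; skip these two lines on later runs
install.packages("tm")
install.packages("wordcloud")
# Load the packages (wordcloud also attaches RColorBrewer, which provides brewer.pal)
library(tm)
library(wordcloud)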
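Calling install.packages() on every run re-downloads packages that are already present. A minimal sketch of a check that reports which packages are still missing is shown here; `missing_packages` is a hypothetical helper name, not part of tm or wordcloud.

```r
# Return the subset of `pkgs` that cannot be loaded on this machine
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- missing_packages(c("tm", "wordcloud"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
}
```

The install call is left commented out so the snippet has no side effects until you opt in.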
Step 2: Create a Sample Text Corpus
# Create a sample text corpus
texts <- c("R is great for data analysis.",
           "Data science is an exciting field.",
           "R and Python are popular programming languages.",
           "Data visualization is key to understanding data.")
# Create a corpus
corpus <- Corpus(VectorSource(texts))
# Preprocess the text: convert to lower case, remove English stop words, remove punctuation
# (stop words go first so that entries containing apostrophes, such as "don't", still match)
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, removePunctuation)
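To see what these three transformations do to a sentence, here is a rough base R approximation. It is only a sketch and not how tm implements them: `clean_text` and `demo_stop_words` are hypothetical names, the stop-word list is a tiny stand-in for stopwords("en"), and the whitespace cleanup at the end is an extra step that the tm code above does not perform.

```r
# Approximate lower-casing, stop-word removal, and punctuation removal in base R
clean_text <- function(x, stop_words) {
  x <- tolower(x)
  # Match whole words only, so "is" does not hit "analysis"
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  x <- gsub("[[:punct:]]+", "", x)
  # Collapse the gaps left behind by removed words
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

demo_stop_words <- c("is", "for", "an", "and", "are", "to")  # hypothetical short list
clean_text("R is great for data analysis.", demo_stop_words)
# [1] "r great data analysis"
```

With tm itself, printing `as.character(corpus[[1]])` is one way to check the cleaned text of the first document.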
Step 3: Create a Term-Document Matrix
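# Create a term-document matrix (rows are terms, columns are documents)
# Note: by default tm keeps only terms of three or more characters, so "r" is dropped
tdm <- TermDocumentMatrix(corpus)
tdm_matrix <- as.matrix(tdm)
# Sum each row to get the total frequency of each term, most frequent first
word_freqs <- sort(rowSums(tdm_matrix), decreasing = TRUE)
word_freqs_df <- data.frame(word = names(word_freqs), freq = word_freqs)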
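The idea behind the term-document matrix can be sketched in base R with table(). The `docs` vector below is an assumption: it is roughly what the four sample sentences look like after cleaning, typed in by hand rather than taken from tm. Unlike tm's default settings, this sketch keeps one-letter terms such as "r".

```r
# Hand-written stand-ins for the cleaned sample sentences (assumption)
docs <- c("r great data analysis",
          "data science exciting field",
          "r python popular programming languages",
          "data visualization key understanding data")

# Split each document into terms and record which document each term came from
tokens <- strsplit(docs, " ", fixed = TRUE)
doc_id <- rep(seq_along(tokens), lengths(tokens))

# Cross-tabulate: one row per term, one column per document
tdm_sketch <- table(term = unlist(tokens), doc = doc_id)

# Row sums give the total frequency of each term across the corpus
freqs <- sort(rowSums(tdm_sketch), decreasing = TRUE)
freqs[["data"]]
# [1] 4
```

If you want tm to keep short terms as well, passing `control = list(wordLengths = c(1, Inf))` to TermDocumentMatrix() should do it.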
Step 4: Generate a Word Cloud
# Create a word cloud
set.seed(1234)  # fix the random layout so the plot is reproducible
wordcloud(words = word_freqs_df$word, freq = word_freqs_df$freq, min.freq = 1,
          max.words = 100, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
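A word cloud shows relative frequency only through font size, so a bar chart of the same counts can be a useful companion when exact values matter. The sketch below uses base graphics. `word_freqs_demo` is a hypothetical stand-in with a few counts tallied by hand from the sample sentences; in practice you would pass the `word_freqs` vector from Step 3 instead.

```r
# Hand-tallied counts for a few terms from the sample sentences (stand-in data)
word_freqs_demo <- c(data = 4, analysis = 1, great = 1, science = 1)

# Keep at most the ten most frequent terms
top_terms <- head(sort(word_freqs_demo, decreasing = TRUE), 10)

# Horizontal bars, reversed so the most frequent term is drawn at the top
barplot(rev(top_terms), horiz = TRUE, las = 1,
        xlab = "Frequency", main = "Most frequent terms")
```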