Text Analysis with tm and wordcloud

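Text analysis is a key step in extracting insights from unstructured data such as free-form text. In this example, we’ll clean a small text corpus with the tm package, count how often each word occurs, and visualize those counts as a word cloud with the wordcloud package.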

Step 1: Install and Load Required Packages

```r
# Install the packages once (skip this if they are already installed)
install.packages("tm")
install.packages("wordcloud")

# Load them in every session
library(tm)
library(wordcloud)
```
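Calling `install.packages()` on every run downloads the packages again. A common alternative, sketched here under the assumption that installing from the `https://cloud.r-project.org` CRAN mirror is acceptable, is to install only the packages that are missing and then load everything:

```r
# Packages this tutorial needs
pkgs <- c("tm", "wordcloud")

# Find packages that cannot be loaded yet
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only those (the mirror URL is an assumption; any CRAN mirror works)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach the packages; loading wordcloud also attaches RColorBrewer (brewer.pal)
library(tm)
library(wordcloud)
```

Setting `repos` explicitly matters mostly for non-interactive runs such as `Rscript`, where R cannot prompt you to pick a mirror.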

Step 2: Create and Preprocess a Sample Text Corpus

```r
# Create a sample text corpus: one document per string
texts <- c("R is great for data analysis.",
           "Data science is an exciting field.",
           "R and Python are popular programming languages.",
           "Data visualization is key to understanding data.")

# Create a corpus from the character vector
corpus <- Corpus(VectorSource(texts))

# Preprocess the text: convert to lower case, remove punctuation,
# remove English stop words ("is", "an", "and", ...), then collapse
# the extra spaces that removeWords leaves behind
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)
```
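To check what the preprocessing actually does, the sketch below repeats it and prints each cleaned document. It uses `VCorpus()` instead of `Corpus()` (a choice made for this example: a VCorpus stores each string as a `PlainTextDocument`, so `content(corpus[[i]])` returns the text of document `i`, and it avoids the "transformation drops documents" warning some tm versions print for `Corpus()`). It also applies tm's `stripWhitespace`, which collapses the runs of spaces that `removeWords` leaves behind.

```r
library(tm)

texts <- c("R is great for data analysis.",
           "Data science is an exciting field.",
           "R and Python are popular programming languages.",
           "Data visualization is key to understanding data.")

# VCorpus keeps each string as a PlainTextDocument
corpus <- VCorpus(VectorSource(texts))

# Same preprocessing steps as in Step 2
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)  # collapse leftover double spaces

# Extract the cleaned text of each document as a character vector
cleaned <- vapply(seq_len(length(corpus)),
                  function(i) content(corpus[[i]]),
                  character(1))
print(cleaned)
```

For these four sentences the stop-word list removes "is", "for", "an", "and", "are" and "to", so the first document becomes "r great data analysis".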

Step 3: Create a Term-Document Matrix

```r
# Create a term-document matrix (rows = terms, columns = documents)
tdm <- TermDocumentMatrix(corpus)
tdm_matrix <- as.matrix(tdm)

# Total count of each term across all documents, most frequent first
word_freqs <- sort(rowSums(tdm_matrix), decreasing = TRUE)
word_freqs_df <- data.frame(word = names(word_freqs), freq = word_freqs)
```
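One detail the frequency table hides: the term `r` never appears in it. tm's term counting (used by `TermDocumentMatrix()`) discards words shorter than three characters by default (`wordLengths = c(3, Inf)`), so the lower-cased language name "r" is silently dropped. The sketch below, again using `VCorpus()`, compares the default with `control = list(wordLengths = c(1, Inf))` and uses `findFreqTerms()` to list the terms that occur at least twice.

```r
library(tm)

texts <- c("R is great for data analysis.",
           "Data science is an exciting field.",
           "R and Python are popular programming languages.",
           "Data visualization is key to understanding data.")

# Same preprocessing as in Step 2, on a VCorpus
corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))

# Default control: terms shorter than 3 characters are discarded
tdm_default <- TermDocumentMatrix(corpus)

# Keep terms of any length so that "r" survives
tdm_all <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Total frequency of each term, most frequent first
freq_default <- sort(rowSums(as.matrix(tdm_default)), decreasing = TRUE)
freq_all     <- sort(rowSums(as.matrix(tdm_all)), decreasing = TRUE)

print(freq_all)

# Terms that occur at least twice across the whole corpus
print(findFreqTerms(tdm_all, lowfreq = 2))
```

If the single-letter term matters for your analysis, pass the same `control` list to the `TermDocumentMatrix()` call in Step 3.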

Step 4: Generate a Word Cloud

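```r
# Create a word cloud
set.seed(1234)  # fix the random layout so the plot is reproducible
wordcloud(words = word_freqs_df$word, freq = word_freqs_df$freq,
          min.freq = 1,          # plot every word that occurs at least once
          max.words = 100,       # upper limit on the number of words drawn
          random.order = FALSE,  # plot in decreasing frequency, most frequent near the center
          rot.per = 0.35,        # proportion of words rotated 90 degrees
          colors = brewer.pal(8, "Dark2"))  # 8-color "Dark2" palette from RColorBrewer
```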
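When the script runs non-interactively (for example with `Rscript`), the cloud is easiest to keep if you draw it into a file device. A minimal sketch, assuming a PDF in R's temporary directory is acceptable (the file name `wordcloud.pdf` is hypothetical), opens `pdf()`, draws the same cloud, and closes the device with `dev.off()`:

```r
library(tm)
library(wordcloud)  # also attaches RColorBrewer, which provides brewer.pal()

texts <- c("R is great for data analysis.",
           "Data science is an exciting field.",
           "R and Python are popular programming languages.",
           "Data visualization is key to understanding data.")

# Rebuild the word frequencies with the same preprocessing as Steps 2 and 3
corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
word_freqs <- sort(rowSums(as.matrix(TermDocumentMatrix(corpus))),
                   decreasing = TRUE)

# Hypothetical output path: a PDF in R's per-session temporary directory
out_file <- file.path(tempdir(), "wordcloud.pdf")

pdf(out_file, width = 6, height = 6)  # open a PDF graphics device (size in inches)
set.seed(1234)                        # same seed as Step 4 for a repeatable layout
wordcloud(words = names(word_freqs), freq = word_freqs, min.freq = 1,
          max.words = 100, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
invisible(dev.off())                  # close the device so the file is written

cat("Word cloud written to", out_file, "\n")
```

Swapping `pdf()` for `png()` gives a raster image instead, but `png()` depends on the graphics capabilities of your R build, while `pdf()` is always available.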
