Data Manipulation with dplyr

In this example, we’ll use the dplyr package for data manipulation. We’ll filter, summarize, and arrange data.

Step 1: Install and Load dplyr

If you don’t have dplyr installed yet, you can install it with:

install.packages("dplyr")

Then, load the library:

library(dplyr)
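If the script will be run more than once, the install and load steps can be combined so that the package is only installed when it is missing. This is a minimal sketch, not a required pattern. The `repos` URL is an assumption added here so that the call also works in a non-interactive session with no CRAN mirror configured; substitute whichever mirror you normally use.

```r
# Install dplyr only if it is not already available, then load it.
# The repos value is an assumed CRAN mirror, not part of the original steps.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}
library(dplyr)
```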

Step 2: Create a Sample Dataset

You can keep using the dataset from the previous example, or create a new one:

# Create a sample dataset
set.seed(456)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)
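Before manipulating the data, it can help to confirm that it has the expected shape. The sketch below uses only base R and repeats the dataset definition so that it runs on its own. The checks at the end are illustrative additions, not part of the original steps.

```r
# Recreate the sample dataset exactly as defined above
set.seed(456)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

# Show column types and a numeric summary of each measurement
str(data)
summary(data[, c("age", "height", "weight")])

# Stop with an error if the dataset does not have the expected shape
stopifnot(nrow(data) == 100, ncol(data) == 4)
stopifnot(all(data$age >= 18), all(data$age <= 65))
```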

Step 3: Data Manipulation

  1. Filtering Data: Keep only the individuals who are older than 30.
# Filter data for individuals older than 30
filtered_data <- data %>% filter(age > 30)
head(filtered_data)
  2. Summarizing Data: Calculate the average height and weight for the filtered group, along with the number of individuals in it.
# Summarize to get mean height and weight for individuals older than 30
summary_stats <- filtered_data %>%
  summarize(
    mean_height = mean(height),
    mean_weight = mean(weight),
    count = n()
  )
print(summary_stats)
  3. Arranging Data: Sort the full dataset by height in descending order.
# Arrange data by height in descending order
arranged_data <- data %>% arrange(desc(height))
head(arranged_data)
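The three steps above can also be chained into a single pipeline instead of being stored in separate intermediate objects. The following is a sketch of that idea. The names `tallest_over_30` and `pipeline_stats` are illustrative, and the dataset definition is repeated so that the block runs on its own.

```r
library(dplyr)

# Recreate the sample dataset from Step 2
set.seed(456)
data <- data.frame(
  id = 1:100,
  age = sample(18:65, 100, replace = TRUE),
  height = rnorm(100, mean = 170, sd = 10),
  weight = rnorm(100, mean = 70, sd = 15)
)

# Filter, sort, and keep the five tallest individuals older than 30
tallest_over_30 <- data %>%
  filter(age > 30) %>%
  arrange(desc(height)) %>%
  head(5)
print(tallest_over_30)

# Filter and summarize in one pipeline, without an intermediate object
pipeline_stats <- data %>%
  filter(age > 30) %>%
  summarize(
    mean_height = mean(height),
    mean_weight = mean(weight),
    count = n()
  )
print(pipeline_stats)
```

Chaining keeps related steps together and avoids extra variables, while separate objects such as `filtered_data` are easier to inspect one step at a time.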
