R is a powerful tool for data analysis, statistics, and visualization. In this short tutorial, we will walk through a simple example that covers:
- Loading data from a CSV file
- Cleaning missing values
- Summarizing data by category
- Creating a bar chart
Step 1: Install and Load Required Packages
We will use the `dplyr` package for data manipulation and `ggplot2` for plotting. Install them once, then load them at the start of each new R session.
```r
# Run once per machine (or after upgrading R)
install.packages(c("dplyr", "ggplot2"))

# Run at the start of every session
library(dplyr)
library(ggplot2)
```
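If you rerun the script often, there is no need to reinstall packages every time. The sketch below shows one common pattern that installs only the packages that are missing. It relies on base R's `requireNamespace()`, which returns `FALSE` instead of raising an error when a package is not installed. The variable names `pkgs` and `missing_pkgs` are illustrative.

```r
# Packages this tutorial needs
pkgs <- c("dplyr", "ggplot2")

# requireNamespace() quietly returns FALSE for packages that are not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install only what is missing
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Attach every package; character.only = TRUE lets library() take a string
invisible(lapply(pkgs, library, character.only = TRUE))
```

Installing and loading stay separate steps here, just as in the two-part block above.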
Step 2: Load Your Dataset
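We’ll assume you have a file named `data.csv` in your working directory, with at least a `category` column and a numeric `value` column. You can check your working directory with `getwd()`.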
```r
df <- read.csv("data.csv")
head(df)  # preview the first six rows
```
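If you don't have a `data.csv` yet, the sketch below creates a small, made-up dataset with `category` and `value` columns and reads it back. It includes a couple of missing entries so that Step 3 has something to clean. The file goes to `tempdir()` so it cannot overwrite a real file. With real data, point `read.csv()` at your own path instead.

```r
# Hypothetical sample data: three categories, one missing category, one missing value
sample_df <- data.frame(
  category = c("A", "B", "A", "C", NA, "B", "C", "A"),
  value    = c(10, 15, 12, NA, 8, 20, 30, 14)
)

# Write to a temporary location (no row names), then read it back
csv_path <- file.path(tempdir(), "data.csv")
write.csv(sample_df, csv_path, row.names = FALSE)
df <- read.csv(csv_path)

# str() shows each column's type; since R 4.0.0, text columns are read as character
str(df)
```

`str()` is a quick way to confirm that `value` was read as a number rather than as text. A text column is a common sign of stray non-numeric entries in the CSV.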
Step 3: Clean the Data
Remove rows with missing values in the key columns, `category` and `value`.
```r
df_clean <- df %>%
  filter(!is.na(category), !is.na(value))
```
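Before filtering, it can help to see how much data you are about to drop. The sketch below uses a small hypothetical data frame. It counts missing values per column with `colSums(is.na(df))`, applies the same filter, and reports how many rows were removed.

```r
library(dplyr)

# Hypothetical data with one missing category and one missing value
df <- data.frame(
  category = c("A", "B", "A", "C", NA, "B", "C", "A"),
  value    = c(10, 15, 12, NA, 8, 20, 30, 14)
)

# Number of NAs in each column before cleaning
colSums(is.na(df))

df_clean <- df %>%
  filter(!is.na(category), !is.na(value))

# Compare row counts before and after the filter
cat("Removed", nrow(df) - nrow(df_clean), "rows\n")  # prints: Removed 2 rows
```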
Step 4: Summarize by Group
We calculate the mean value and the number of rows for each category.
```r
# One row per category: its mean value and how many rows it has
summary_tbl <- df_clean %>%
  group_by(category) %>%
  summarize(mean_value = mean(value, na.rm = TRUE),
            count = n())
summary_tbl
```
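`summarize()` can compute several statistics in one call, and `arrange()` can sort the result. The sketch below runs on hypothetical data. It adds a standard-deviation column and sorts categories from highest to lowest mean. The data is already clean, so `na.rm = TRUE` is omitted here.

```r
library(dplyr)

# Hypothetical cleaned data (no missing values)
df_clean <- data.frame(
  category = c("A", "B", "A", "B", "C", "A"),
  value    = c(10, 15, 12, 20, 30, 14)
)

summary_tbl <- df_clean %>%
  group_by(category) %>%
  summarize(
    mean_value = mean(value),
    sd_value   = sd(value),  # NA for a category with only one row
    count      = n()
  ) %>%
  arrange(desc(mean_value))  # largest mean first

summary_tbl
```

Sorting the table this way only changes row order. It does not change the order of the bars in the Step 5 chart, which ggplot2 decides separately.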
Step 5: Create a Bar Chart
Visualize the mean value by category using `ggplot2`.
```r
ggplot(summary_tbl, aes(x = category, y = mean_value)) +
  geom_col(fill = "steelblue") +  # bar height = mean_value as given
  labs(title = "Mean Value by Category",
       x = "Category",
       y = "Mean Value") +
  theme_minimal()
```
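By default, the bars follow the alphabetical order of `category`. One way to order them by height is base R's `reorder()`. The sketch below does that and also labels each bar with its row count using `geom_text()`. The summary table is hypothetical and stands in for the one built in Step 4.

```r
library(ggplot2)

# Hypothetical summary table with the same columns as in Step 4
summary_tbl <- data.frame(
  category   = c("A", "B", "C"),
  mean_value = c(12, 17.5, 30),
  count      = c(3, 2, 1)
)

# reorder() sorts category levels by -mean_value, i.e. tallest bar first
p <- ggplot(summary_tbl,
            aes(x = reorder(category, -mean_value), y = mean_value)) +
  geom_col(fill = "steelblue") +
  # Print the group size just above each bar
  geom_text(aes(label = paste0("n = ", count)), vjust = -0.5) +
  labs(title = "Mean Value by Category",
       x = "Category",
       y = "Mean Value") +
  theme_minimal()

print(p)
```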
Output
The result is a bar chart with one bar per category, where each bar's height is that category's mean value. By default, the categories appear in alphabetical order.
Next Steps
From here, you can:
- Try different plot types (scatter, line, histogram)
- Filter your data for specific conditions
- Export the summary table to a CSV file or save the chart as a PDF
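For the export step, base R's `write.csv()` saves a data frame, and ggplot2's `ggsave()` saves a plot. `ggsave()` picks the PDF format from the file extension. The file names below are placeholders. The files go to `tempdir()` so the example does not overwrite anything, so swap in your own paths when you use it.

```r
library(ggplot2)

# Hypothetical summary table standing in for the one built in Step 4
summary_tbl <- data.frame(
  category   = c("A", "B", "C"),
  mean_value = c(12, 17.5, 30)
)

p <- ggplot(summary_tbl, aes(x = category, y = mean_value)) +
  geom_col(fill = "steelblue") +
  theme_minimal()

# Placeholder output paths in a temporary directory
csv_out <- file.path(tempdir(), "summary_by_category.csv")
pdf_out <- file.path(tempdir(), "mean_by_category.pdf")

write.csv(summary_tbl, csv_out, row.names = FALSE)
ggsave(pdf_out, plot = p, width = 6, height = 4)  # width and height in inches
```

Passing `plot = p` explicitly avoids relying on `ggsave()`'s default of saving the last plot displayed.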