Day 2 – Data Wrangling | Data Analysis for Neuroscientists

Topics Covered

dplyr Verbs & Pipes

Master filter, select, mutate, rename, summarise, and arrange — chain them with the pipe operators %>% and |>.
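As a minimal sketch of how these verbs chain together, the example below runs one pipeline twice, once with %>% and once with the native |> pipe. The penguins_toy table and its values are hypothetical, loosely modelled on the Palmer Penguins columns, and are not taken from the course scripts.

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
penguins_toy <- tibble(
  species        = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 40.3, 46.1, 49.5),
  body_mass_g    = c(3750, 3250, 4500, 3900)
)

# magrittr pipe: each verb receives the result of the previous one
heavy <- penguins_toy %>%
  filter(body_mass_g > 3500) %>%        # keep rows matching a condition
  select(species, body_mass_g) %>%      # keep only these columns
  rename(mass_g = body_mass_g) %>%      # new_name = old_name
  mutate(mass_kg = mass_g / 1000) %>%   # add a derived column
  arrange(desc(mass_kg))                # sort, heaviest first

# Native pipe (requires R >= 4.1): same steps, same result
heavy_native <- penguins_toy |>
  filter(body_mass_g > 3500) |>
  select(species, body_mass_g) |>
  rename(mass_g = body_mass_g) |>
  mutate(mass_kg = mass_g / 1000) |>
  arrange(desc(mass_kg))

print(heavy)
```

For simple chains like this one the two pipes are interchangeable; %>% comes from magrittr (loaded with dplyr), while |> is built into base R.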

Grouping & Counting

Aggregate data with group_by() + summarise(), and quickly tally categories with count() and tally().
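The sketch below illustrates the grouping pattern on a small hypothetical table (penguins_toy and its values are invented for illustration). It computes per-species summaries and then counts rows per species in two equivalent ways.

```r
library(dplyr)

# Hypothetical toy data: two penguins per species
penguins_toy <- tibble(
  species     = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  body_mass_g = c(3700, 3300, 5000, 5400, 3800, 3600)
)

# One summary row per group
mass_by_species <- penguins_toy %>%
  group_by(species) %>%
  summarise(
    mean_mass = mean(body_mass_g),
    max_mass  = max(body_mass_g),
    sd_mass   = sd(body_mass_g),
    .groups   = "drop"               # return an ungrouped tibble
  )

# count() is shorthand for group_by() + tally()
n_by_species       <- penguins_toy %>% count(species)
n_by_species_tally <- penguins_toy %>% group_by(species) %>% tally()

print(mass_by_species)
print(n_by_species)
```

On real data with missing values, mean(), max(), and sd() return NA unless na.rm = TRUE is passed.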

Reshaping Data

Transform between wide and long formats using pivot_longer() and pivot_wider().
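A minimal round-trip sketch, assuming a small hypothetical table with two measurement columns (the names wide, long, measurement, and value_mm are chosen for illustration): pivot_longer() stacks the measurement columns into name/value pairs, and pivot_wider() spreads them back out.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per penguin, one column per measurement
wide <- tibble(
  id             = c(1, 2),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm  = c(18.7, 13.2)
)

# Wide -> long: one row per (penguin, measurement) pair
long <- wide %>%
  pivot_longer(
    cols      = ends_with("_mm"),
    names_to  = "measurement",
    values_to = "value_mm"
  )

# Long -> wide: recover the original layout
wide_again <- long %>%
  pivot_wider(names_from = measurement, values_from = value_mm)

print(long)
print(wide_again)
```

The long form is usually the more convenient shape for grouped summaries and for plotting several measurements with ggplot2.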

Missing Data

Detect, visualise, and handle NAs using naniar (gg_miss_var, gg_miss_upset) and visdat (vis_miss).
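The sketch below shows one way the three plotting functions might be called. It uses R's built-in airquality dataset as a stand-in (an assumption made for self-containment; the course scripts use the penguins data) and only draws the plots when naniar and visdat are installed.

```r
# Built-in dataset with NAs in two columns (Ozone and Solar.R)
dat <- airquality

# Numeric companion to the plots: number of NAs per column
na_per_column <- colSums(is.na(dat))
print(na_per_column)

# Heatmap of present vs missing cells
if (requireNamespace("visdat", quietly = TRUE)) {
  print(visdat::vis_miss(dat))
}

if (requireNamespace("naniar", quietly = TRUE)) {
  # Missing values per variable
  print(naniar::gg_miss_var(dat))
  # Which variables are missing together; needs at least
  # two variables that contain NAs
  print(naniar::gg_miss_upset(dat))
}
```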

Key Concepts

Filtering rows with logical conditions: ==, |, %in%, and range checks
Selecting and dropping columns — including helpers like starts_with(), ends_with(), contains()
Creating new columns with mutate() — ratios, unit conversions, and conditional labels via if_else()
Group-wise summaries: mean, max, sd with group_by() + summarise()
Reshaping: wide → long with pivot_longer(), long → wide with pivot_wider()
Diagnosing missingness: vis_miss() for a bird's-eye view, gg_miss_var() for per-variable bar charts, gg_miss_upset() for co-occurrence patterns
Cleaning data with drop_na() after understanding the missingness structure
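Several of the concepts above can be sketched on a small hypothetical table (toy and its values are invented for illustration, and the column choices deliberately differ from the homework tasks). This is one possible way to write each step, not the course's reference solution.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with a couple of missing values
toy <- tibble(
  species           = c("Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  island            = c("Torgersen", "Biscoe", "Dream", "Biscoe", "Dream"),
  flipper_length_mm = c(181, 217, 195, NA, 190),
  body_mass_g       = c(3750, 5000, 3800, 4800, NA)
)

# Logical conditions: equality with OR, set membership, and a range check
adelie_or_dream <- toy %>% filter(species == "Adelie" | island == "Dream")
two_species     <- toy %>% filter(species %in% c("Adelie", "Gentoo"))
mid_mass        <- toy %>% filter(between(body_mass_g, 3500, 4000))  # inclusive; NA rows are dropped

# Selecting and dropping columns with helpers
flipper_cols <- toy %>% select(species, starts_with("flipper"))
no_mass      <- toy %>% select(-contains("mass"))

# New columns: a unit conversion and a conditional label
labelled <- toy %>%
  mutate(
    mass_kg = body_mass_g / 1000,
    size    = if_else(body_mass_g > 4000, "large", "small")  # NA stays NA
  )

# Drop incomplete rows only after inspecting the missingness
complete <- toy %>% drop_na()

print(labelled)
print(dim(toy))
print(dim(complete))
```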

Scripts

📄 missing_data_viz.R

Missing data analysis with naniar & visdat — vis_miss() heatmap, gg_miss_var() bar chart, and gg_miss_upset() co-occurrence plot.

📄 dplyr_verbs.R

Core dplyr verbs on Palmer Penguins — filter, select, rename, mutate, group_by + summarise, count/tally, and pivot_longer/pivot_wider.

📄 data_wrangling_exercises.R

Hands-on exercises with chaining, OR logic, %in%, helper selectors, and multi-column mutate — includes 🧪 Try-It challenges.

Visual

Missing data visualisation generated by missing_data_viz.R

Homework

Filter penguins whose flipper length is between 190 and 210 mm (🧪 from day2_data_wrangling.R)
Select only columns that end with "_mm" using helper functions
Create a column called bill_area = bill_length_mm × bill_depth_mm
Run vis_miss() and gg_miss_upset() on the penguins dataset and interpret the patterns
Use drop_na() and compare the cleaned dataset dimensions to the original

Resources

Homework Downloads

📄 data_manip_intro_tidyverse.R

Introduction to tidy data manipulation — pipes, filter, select, mutate, and core dplyr workflows.

📄 data_manip_efficient_tidyverse.R

Efficient tidyverse patterns — across(), where(), rowwise operations, and performance-conscious wrangling.

📄 data_manip_advanced_tidyverse.R

Advanced tidyverse techniques — joins, nested data, purrr mappings, and complex reshaping pipelines.

