Getting Started with bidser

Introduction to bidser

bidser is an R package designed for working with neuroimaging data organized according to the Brain Imaging Data Structure (BIDS) standard. BIDS is a specification that describes how to organize and name neuroimaging and behavioral data, making datasets more accessible, shareable, and easier to analyze.

What is BIDS?

BIDS organizes data into a hierarchical folder structure with standardized naming conventions:

Subjects are identified by folders named sub-XX
Sessions (optional) are identified by folders named ses-XX
Data types are organized into modality-specific folders (anat, func, dwi, etc.)
Files follow specific naming patterns that encode metadata (subject, session, task, run, etc.)
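A small base-R sketch can show how the naming patterns in the list above encode metadata. It splits a file name into its key-value entities plus the trailing suffix. The helper `parse_bids_name()` is hypothetical and is not part of bidser. It assumes every entity has the `key-value` form and that the name contains no dots before its extension.

```r
# Hypothetical helper (not a bidser function): split a BIDS file name
# such as "sub-01_task-x_run-02_bold.nii.gz" into key-value entities.
parse_bids_name <- function(path) {
  fname <- basename(path)
  # Strip everything from the first dot on, e.g. ".nii.gz" or ".tsv"
  stem <- sub("\\..*$", "", fname)
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  # The final underscore-separated part is the suffix ("bold", "T1w", ...)
  suffix <- parts[length(parts)]
  pairs <- parts[-length(parts)]
  keys <- sub("-.*$", "", pairs)        # text before the first "-"
  values <- sub("^[^-]*-", "", pairs)   # text after the first "-"
  entities <- as.list(setNames(values, keys))
  entities$suffix <- suffix
  entities
}

ent <- parse_bids_name(
  "sub-01/func/sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz"
)
str(ent)
```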

What does bidser do?

bidser provides tools to:

Query and filter files based on BIDS metadata (subject, task, run, etc.)
Read event files that describe experimental paradigms
Work with fMRIPrep derivatives for preprocessed data
Navigate complex BIDS hierarchies without manually constructing file paths

Let’s explore these capabilities using a real BIDS dataset.

Loading a BIDS Dataset

We’ll use the ds001 dataset from the BIDS examples, which contains data from a “Balloon Analog Risk Task” experiment with 16 subjects.

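library(bidser)
library(dplyr)
library(tidyr)

# Download the ds001 example and index it as a BIDS project
ds001_path <- get_example_bids_dataset("ds001")
proj <- bids_project(ds001_path)
proj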
#> BIDS Project Summary 
#> Project Name:  bids_example_ds001 
#> Participants (n):  16 
#> Tasks:  balloonanalogrisktask 
#> Image Types:  anat, func 
#> Modalities:  (none) 
#> Keys:  folder, kind, relative_path, subid, suffix, type, run, task

The bids_project object provides a high-level interface to the dataset. We can see it contains 16 subjects with both anatomical and functional data.

Basic Dataset Queries

Dataset Structure

Let’s explore the basic structure of this dataset:

# List session labels (NULL means the dataset has no ses-XX level)
sessions(proj)
#> NULL

# Get all participant IDs
participants(proj)
#>  [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12" "13" "14" "15"
#> [16] "16"

# What tasks are included?
tasks(proj)
#> [1] "balloonanalogrisktask"

# Get a summary of the dataset
bids_summary(proj)
#> $n_subjects
#> [1] 16
#> 
#> $n_sessions
#> NULL
#> 
#> $tasks
#> # A tibble: 1 × 2
#>   task                  n_runs
#>   <chr>                  <int>
#> 1 balloonanalogrisktask      3
#> 
#> $total_runs
#> [1] 3

Finding Files by Type

bidser provides several ways to find files. Let’s start with the most common neuroimaging file types:

# Find all anatomical T1-weighted images
t1w_files <- search_files(proj, regex = "T1w\\.nii", full_path = FALSE)
head(t1w_files)
#> [1] "sub-01/anat/sub-01_T1w.nii.gz" "sub-02/anat/sub-02_T1w.nii.gz"
#> [3] "sub-03/anat/sub-03_T1w.nii.gz" "sub-04/anat/sub-04_T1w.nii.gz"
#> [5] "sub-05/anat/sub-05_T1w.nii.gz" "sub-06/anat/sub-06_T1w.nii.gz"

# Find all functional BOLD scans
bold_files <- func_scans(proj, full_path = FALSE)
head(bold_files)
#> [1] "sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz"
#> [2] "sub-01/func/sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz"
#> [3] "sub-01/func/sub-01_task-balloonanalogrisktask_run-03_bold.nii.gz"
#> [4] "sub-02/func/sub-02_task-balloonanalogrisktask_run-01_bold.nii.gz"
#> [5] "sub-02/func/sub-02_task-balloonanalogrisktask_run-02_bold.nii.gz"
#> [6] "sub-02/func/sub-02_task-balloonanalogrisktask_run-03_bold.nii.gz"
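The `regex` argument is a regular expression, so a literal dot has to be written as `\\.` inside an R string. The base-R sketch below uses `grepl()` on a hypothetical vector of relative paths to show why escaping and a trailing `$` matter. It illustrates regex matching in general and assumes nothing about how `search_files()` applies the pattern internally.

```r
# Hypothetical relative paths in the style returned with full_path = FALSE
paths <- c(
  "sub-01/anat/sub-01_T1w.nii.gz",
  "sub-01/anat/sub-01_T1w.json",
  "sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz",
  "sub-01/func/sub-01_task-balloonanalogrisktask_run-01_events.tsv"
)

# "T1w\\.nii" requires a literal ".nii" right after "T1w", so the JSON is excluded
t1w <- paths[grepl("T1w\\.nii", paths)]

# "$" anchors the match to the end, keeping only files ending in _bold.nii.gz
bold <- paths[grepl("_bold\\.nii\\.gz$", paths)]

# An unescaped "." matches any character, so it accepts "T1w_nii" as well
loose <- grepl("T1w.nii", "sub-01_T1w_nii.txt")
strict <- grepl("T1w\\.nii", "sub-01_T1w_nii.txt")

t1w
bold
c(loose = loose, strict = strict)
```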

Filtering by Subject and Task

One of bidser’s key strengths is filtering data by BIDS metadata:

# Get functional scans for specific subjects
sub01_scans <- func_scans(proj, subid = "01")
sub02_scans <- func_scans(proj, subid = "02")

cat("Subject 01:", length(sub01_scans), "scans\n")
#> Subject 01: 3 scans
cat("Subject 02:", length(sub02_scans), "scans\n")
#> Subject 02: 3 scans

# Filter by task (ds001 only has one task, but this shows the syntax)
task_scans <- func_scans(proj, task = "balloonanalogrisktask")
cat("Balloon task:", length(task_scans), "scans total\n")
#> Balloon task: 48 scans total

# Combine filters: specific subject AND task
sub01_task_scans <- func_scans(proj, subid = "01", task = "balloonanalogrisktask")
cat("Subject 01, balloon task:", length(sub01_task_scans), "scans\n")
#> Subject 01, balloon task: 3 scans
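Passing both `subid` and `task` keeps only files that satisfy both conditions. The same logical AND can be sketched in base R by extracting entities into a small table and subsetting it. The helper `get_entity()` and the file list (including the second task, `rest`) are hypothetical illustrations, not bidser internals.

```r
files <- c(
  "sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz",
  "sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz",
  "sub-02_task-balloonanalogrisktask_run-01_bold.nii.gz",
  "sub-02_task-rest_run-01_bold.nii.gz"  # hypothetical second task
)

# Hypothetical helper: pull the value of one entity (e.g. "sub") from each name
get_entity <- function(x, key) {
  nm <- basename(x)
  pattern <- paste0("(^|_)", key, "-([[:alnum:]]+)")
  groups <- regmatches(nm, regexec(pattern, nm, perl = TRUE))
  # groups[[i]] is c(full match, separator, value), or empty if no match
  vapply(groups, function(g) if (length(g) == 3) g[3] else NA_character_,
         character(1))
}

idx <- data.frame(
  file = files,
  subid = get_entity(files, "sub"),
  task = get_entity(files, "task"),
  stringsAsFactors = FALSE
)

# Both conditions must hold, mirroring subid = "01", task = "balloon..."
hits <- idx$file[idx$subid == "01" & idx$task == "balloonanalogrisktask"]
hits
```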

Working with Multiple Subjects

You can use regular expressions to select multiple subjects at once:

# Get scans for subjects 01, 02, and 03
first_three_scans <- func_scans(proj, subid = "0[123]")
cat("First 3 subjects:", length(first_three_scans), "scans total\n")
#> First 3 subjects: 9 scans total

# Get scans for all subjects (equivalent to default)
all_scans <- func_scans(proj, subid = ".*")
cat("All subjects:", length(all_scans), "scans total\n")
#> All subjects: 48 scans total
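Because `subid` is treated as a regular expression in these examples, it helps to check what a pattern matches before using it. In base R, an unanchored pattern matches anywhere in the string. The sketch below applies `grepl()` to the 16 ds001 participant IDs. Whether bidser anchors the pattern internally is not covered here, so the anchored form is shown only as the unambiguous way to write the intent in plain R.

```r
# The 16 participant IDs in ds001, as returned by participants(proj)
ids <- sprintf("%02d", 1:16)

# Unanchored: "1" matches every ID that contains a 1 anywhere
ids[grepl("1", ids)]

# "0[123]" happens to select only 01-03 within this particular ID set
ids[grepl("0[123]", ids)]

# Anchored alternation: exact matches only, whatever the ID set is
ids[grepl("^(01|02|03)$", ids)]
```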

Working with Event Files

Event files describe the experimental paradigm: when stimuli were presented, what responses occurred, and so on. Task-based fMRI models are built from these onsets and durations.

# Find all event files
event_file_paths <- event_files(proj)
cat("Found", length(event_file_paths), "event files\n")
#> Found 48 event files

# Read event data into a nested data frame
events_data <- read_events(proj)
events_data
#> # A tibble: 48 × 5
#> # Groups:   .task, .session, .run, .subid [48]
#>    .subid .session .run  .task                 data              
#>    <chr>  <chr>    <chr> <chr>                 <list>            
#>  1 01     <NA>     01    balloonanalogrisktask <tibble [158 × 2]>
#>  2 01     <NA>     02    balloonanalogrisktask <tibble [156 × 2]>
#>  3 01     <NA>     03    balloonanalogrisktask <tibble [149 × 2]>
#>  4 02     <NA>     01    balloonanalogrisktask <tibble [185 × 2]>
#>  5 02     <NA>     02    balloonanalogrisktask <tibble [184 × 2]>
#>  6 02     <NA>     03    balloonanalogrisktask <tibble [186 × 2]>
#>  7 03     <NA>     01    balloonanalogrisktask <tibble [150 × 2]>
#>  8 03     <NA>     02    balloonanalogrisktask <tibble [169 × 2]>
#>  9 03     <NA>     03    balloonanalogrisktask <tibble [175 × 2]>
#> 10 04     <NA>     01    balloonanalogrisktask <tibble [166 × 2]>
#> # ℹ 38 more rows

Let’s explore the event data structure:

# Unnest events for subject 01
first_subject_events <- events_data %>%
  filter(.subid == "01") %>%
  unnest(cols = c(data))

head(first_subject_events)
#> # A tibble: 6 × 6
#> # Groups:   .task, .session, .run, .subid [1]
#>   .subid .session .run  .task                 onset\tduration\ttrial_typ…¹ .file
#>   <chr>  <chr>    <chr> <chr>                 <chr>                        <chr>
#> 1 01     <NA>     01    balloonanalogrisktask "0.061\t0.772\tpumps_demean… /pri…
#> 2 01     <NA>     01    balloonanalogrisktask "4.958\t0.772\tpumps_demean… /pri…
#> 3 01     <NA>     01    balloonanalogrisktask "7.179\t0.772\tpumps_demean… /pri…
#> 4 01     <NA>     01    balloonanalogrisktask "10.416\t0.772\tpumps_demea… /pri…
#> 5 01     <NA>     01    balloonanalogrisktask "13.419\t0.772\tpumps_demea… /pri…
#> 6 01     <NA>     01    balloonanalogrisktask "16.754\t0.772\texplode_dem… /pri…
#> # ℹ abbreviated name:
#> #   ¹`onset\tduration\ttrial_type\tcash_demean\tcontrol_pumps_demean\texplode_demean\tpumps_demean\tresponse_time`
names(first_subject_events)
#> [1] ".subid"                                                                                                     
#> [2] ".session"                                                                                                   
#> [3] ".run"                                                                                                       
#> [4] ".task"                                                                                                      
#> [5] "onset\tduration\ttrial_type\tcash_demean\tcontrol_pumps_demean\texplode_demean\tpumps_demean\tresponse_time"
#> [6] ".file"
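In the output above, the events columns were not split. The header arrives as one column whose name contains literal `\t` separators, and each row is a single tab-joined string. BIDS events files are tab-separated, with `onset` and `duration` in seconds and `n/a` for missing values, so they can be read with an explicit tab delimiter. A minimal base-R sketch on a made-up file (the rows are illustrative, not ds001 data):

```r
# Toy events file with standard BIDS columns (values are made up)
events_path <- tempfile(fileext = "_events.tsv")
writeLines(c(
  "onset\tduration\ttrial_type",
  "0.5\t1.0\tcondA",
  "4.0\t1.0\tcondB",
  "8.5\tn/a\tcondA"
), events_path)

# Explicit tab separator; BIDS writes missing values as "n/a"
ev <- read.delim(events_path, sep = "\t", na.strings = "n/a",
                 stringsAsFactors = FALSE)
str(ev)
```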

Analyzing Event Data

Let’s do some basic exploration of the experimental design:

# How many events per subject (all runs and trial types combined)?
trial_counts <- events_data %>%
  unnest(cols = c(data)) %>%
  group_by(.subid) %>%
  summarise(n_trials = n(), .groups = "drop")

trial_counts
#> # A tibble: 16 × 2
#>    .subid n_trials
#>    <chr>     <int>
#>  1 01          463
#>  2 02          555
#>  3 03          494
#>  4 04          510
#>  5 05          419
#>  6 06          536
#>  7 07          492
#>  8 08          494
#>  9 09          497
#> 10 10          521
#> 11 11          471
#> 12 12          453
#> 13 13          485
#> 14 14          503
#> 15 15          411
#> 16 16          419

Working with Individual Subjects

The bids_subject() function provides a convenient interface for working with data from a single subject. It returns a lightweight object with helper functions that automatically filter data for that subject.

# Create a subject-specific interface for subject 01
subject_01 <- bids_subject(proj, "01")

# Get all functional scans for this subject
sub01_scans <- subject_01$scans()
cat("Subject 01:", length(sub01_scans), "functional scans\n")
#> Subject 01: 3 functional scans

# Read event data for this subject: events() returns a nested tibble
# with one row per run, so count rows rather than calling length()
sub01_events <- subject_01$events()
cat("Subject 01:", nrow(sub01_events), "runs with event data\n")
#> Subject 01: 3 runs with event data
sub01_events
#> # A tibble: 3 × 5
#> # Groups:   .task, .session, .run, .subid [3]
#>   .subid .session .run  .task                 data              
#>   <chr>  <chr>    <chr> <chr>                 <list>            
#> 1 01     <NA>     01    balloonanalogrisktask <tibble [158 × 2]>
#> 2 01     <NA>     02    balloonanalogrisktask <tibble [156 × 2]>
#> 3 01     <NA>     03    balloonanalogrisktask <tibble [149 × 2]>

This approach is particularly useful when you’re doing subject-level analyses:

subjects_to_analyze <- c("01", "02", "03")

for (subj_id in subjects_to_analyze) {
  subj <- bids_subject(proj, subj_id)
  scans <- subj$scans()
  events <- subj$events()
  cat(sprintf("Subject %s: %d scans, %d runs with event data\n",
              subj_id, length(scans), nrow(events)))
}
#> Subject 01: 3 scans, 3 runs with event data
#> Subject 02: 3 scans, 3 runs with event data
#> Subject 03: 3 scans, 3 runs with event data

The subject interface makes it easy to write analysis pipelines that iterate over subjects without manually constructing filters:

subject_trial_summary <- lapply(participants(proj)[1:3], function(subj_id) {
  subj <- bids_subject(proj, subj_id)
  event_data <- subj$events()
  n_trials <- if (nrow(event_data) > 0) {
    event_data %>% unnest(cols = c(data)) %>% nrow()
  } else {
    0L
  }
  tibble(subject = subj_id, n_trials = n_trials, n_scans = length(subj$scans()))
}) %>% bind_rows()

subject_trial_summary
#> # A tibble: 3 × 3
#>   subject n_trials n_scans
#>   <chr>      <int>   <int>
#> 1 01           463       3
#> 2 02           555       3
#> 3 03           494       3
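`bids_subject()` is described as returning a lightweight object whose helper functions filter for one subject. One common way to build such an object in R is a list of closures that capture the subject's files. The sketch below uses a hypothetical `make_subject()`, which is not bidser's implementation. Unlike bidser's `events()` helper, its `events()` returns file paths rather than parsed event data.

```r
# Hypothetical constructor (not bidser's implementation): the returned
# functions close over `mine`, the files belonging to one subject.
make_subject <- function(files, subid) {
  prefix <- paste0("sub-", subid, "_")
  mine <- files[startsWith(basename(files), prefix)]
  list(
    scans = function() mine[grepl("_bold\\.nii(\\.gz)?$", mine)],
    events = function() mine[grepl("_events\\.tsv$", mine)]
  )
}

files <- c(
  "sub-01/func/sub-01_task-x_run-01_bold.nii.gz",
  "sub-01/func/sub-01_task-x_run-01_events.tsv",
  "sub-02/func/sub-02_task-x_run-01_bold.nii.gz"
)

s01 <- make_subject(files, "01")
s01$scans()
s01$events()
```

Filtering once at construction keeps each helper call cheap. The trade-off in this sketch is that the object does not see files added after it was created.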

Advanced Querying

Custom File Searches

search_files() takes a regular expression for the file name (regex) and accepts BIDS keys such as subid and run as additional filters:

# Find all JSON sidecar files
json_files <- search_files(proj, regex = "\\.json$")
cat("Found", length(json_files), "JSON files\n")
#> Found 0 JSON files

# Find files for specific runs
run1_files <- search_files(proj, regex = "bold", run = "01")
cat("Found", length(run1_files), "files from run 01\n")
#> Found 16 files from run 01

# Complex pattern matching: T1w files for subjects 01-05
t1w_subset <- search_files(proj, regex = "T1w", subid = "0[1-5]")
cat("Found", length(t1w_subset), "T1w files for subjects 01-05\n")
#> Found 5 T1w files for subjects 01-05

Getting Full File Paths

Set full_path = TRUE when you need absolute paths, for example to pass files to an image reader:

# Get full paths to functional scans for analysis
full_paths <- func_scans(proj, subid = "01", full_path = TRUE)
full_paths
#> [1] "/private/var/folders/9h/nkjq6vss7mqdl4ck7q1hd8ph0000gp/T/RtmpYEiio1/bids_example_ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz"
#> [2] "/private/var/folders/9h/nkjq6vss7mqdl4ck7q1hd8ph0000gp/T/RtmpYEiio1/bids_example_ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz"
#> [3] "/private/var/folders/9h/nkjq6vss7mqdl4ck7q1hd8ph0000gp/T/RtmpYEiio1/bids_example_ds001/sub-01/func/sub-01_task-balloonanalogrisktask_run-03_bold.nii.gz"

# Check that files actually exist
all(file.exists(full_paths))
#> [1] TRUE
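The relative-versus-absolute distinction is the same one base R's `list.files()` makes. With the default `full.names = FALSE` it returns paths relative to the root, and `file.path(root, rel)` turns them back into absolute paths. A self-contained sketch on a throwaway directory (the layout is hypothetical):

```r
# Build a tiny BIDS-like tree in a temporary directory
root <- file.path(tempdir(), "toy_bids")
dir.create(file.path(root, "sub-01", "func"), recursive = TRUE,
           showWarnings = FALSE)
created <- file.create(file.path(root, "sub-01", "func",
                                 "sub-01_task-x_run-01_bold.nii.gz"))

# Relative paths, comparable to full_path = FALSE
rel <- list.files(root, recursive = TRUE)

# Absolute paths, comparable to full_path = TRUE
abs_paths <- file.path(root, rel)

rel
all(file.exists(abs_paths))
```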

Next Steps

This quickstart covered the basic functionality of bidser for querying BIDS datasets. For more advanced usage, see:

fMRIPrep integration: Working with preprocessed derivatives
Data loading: Reading neuroimaging data with neurobase or RNifti
Confound regression: Using physiological and motion regressors
Group analysis: Combining data across subjects efficiently

Reading files produced by fMRIPrep

If you have processed a dataset with fMRIPrep, bidser can read many of the resulting derivative files. When a project includes an fMRIPrep derivatives folder, pass fmriprep = TRUE to bids_project() to index the raw BIDS hierarchy together with the derivatives:

# Download an fMRIPrep example dataset
deriv_path <- get_example_bids_dataset("ds000001-fmriprep")
proj_deriv <- bids_project(deriv_path, fmriprep = TRUE)

proj_deriv

# Convenience functions for derivative files, e.g. preprocessed scans:
pscans <- preproc_scans(proj_deriv)
head(as.character(pscans))

# Read confound files
conf <- read_confounds(proj_deriv, subid = "01")
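fMRIPrep writes confounds as a tab-separated file with one row per volume, and derivative columns such as `framewise_displacement` are `n/a` in the first row. The minimal base-R sketch below pulls the six rigid-body motion parameters into a matrix. It assumes the column names `trans_x`, `trans_y`, `trans_z`, `rot_x`, `rot_y`, `rot_z` used by recent fMRIPrep releases (older releases named them differently) and uses a made-up file. It does not describe the structure that `read_confounds()` returns.

```r
# Made-up confounds file with fMRIPrep-style column names (assumed)
conf_path <- tempfile(fileext = "_desc-confounds_timeseries.tsv")
motion_cols <- c("trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z")
writeLines(c(
  paste(c("global_signal", "framewise_displacement", motion_cols),
        collapse = "\t"),
  "100.1\tn/a\t0\t0\t0\t0\t0\t0",
  "100.4\t0.12\t0.01\t0.00\t0.02\t0.001\t0.000\t0.000",
  "99.8\t0.08\t0.02\t0.01\t0.02\t0.001\t0.001\t0.000"
), conf_path)

conf_tbl <- read.delim(conf_path, sep = "\t", na.strings = "n/a")

# Six motion regressors as a numeric matrix (volumes x parameters)
motion <- as.matrix(conf_tbl[, motion_cols])
dim(motion)
```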