Filesystem Amarel Holmes Lab

Date: April 1, 2025 4:09 PM

Overview

This is a tutorial for norms of Holmes Lab’s file-space on Amarel

There are 3 main priorities: security, organization and functionality. If you have any suggestions for amendments to this structure, please contact Avram or the lab manager.

Jump to section:

Security

Only lab members and certain approved collaborators have access to the /f_ah1491_1 spaces. To get access to this file space, email help@oarc.rutgers.edu and request to be added to the holmes lab folder (/projects/f_ah1491_1) and the CAHBIR partition (p_dz268_1). You will also be added to the usergroup /projectsp/f_ah1491_1. (To get a collaborator access to Amarel, follow this tutorial.)

New folders should be created giving only read permissions to others, unless it’s a lab-general folder. Raw data should be read-only whenever possible.

Functionality

The main concern of our filesystem functionality is space and filecount limits. We are always near the limit of both, and so we use the /scratch/f_ah1491_1 for everything where weekly backups are not necessary.

Stored on /scratch/f_ah1491

open source datasets
datasets which are backed up on flywheel (internal datasets)
Data analysis processes should be done on scratch and then only needed outputs should be stored on the /projects/f_ah1491_1 space.

Stored in /projects/f_ah1491_1

Files in the /projects/f_ah1491_1 space should be compressed when possible.

General Process

Find the data you want to use in /scratch
Create or modify scripts you want to use in /projects and have them pull from data in scratch
Move script to /scratch if it makes intermediate files
Run script in scratch (or /projects if it doesn’t create intermediate files locally and you can save outputs/intermediate files to scratch)
Move output files (not intermediates/unused files) to the appropriate folder in /projects

The general approach for using /scratch is to copy your job’s files (input files, libraries, etc.) to /scratch, run your job and write output files to /scratch, then move the files you need to save to your /projects directory.

scp my-dir-of-output-files.tar.gz [gc563@amarel.rutgers.edu](mailto:gc563@amarel.rutgers.edu):/home/gc563

scp – Securely copies files back to your /home directory.
my-dir-of-output-files.tar.gz – The output archive.
/home/gc563 – The destination folder.

File Structures

The lab follows BIDS file structuring as much as possible for the organization of data files (see “BIDS” section below for details). Documentation, scripts and user folders are choices made by Holmes Lab.

/scratch/f_ah1491_1/

100 TB storage
Unlimited file count
Not backed up
Never deleted
Lab general

structure:

/scratch/f_ah1491_1
│── open_data/
│── internal_data/
│── users/

/internal_data or /open_data

/project_name/
│
├── bids/                     # BIDS-compliant raw dataset
│   ├── sub-01/
│   ├── sub-02/
│   ├── dataset_description.json
│   ├── participants.tsv
│   └── ...
│
├── derivatives/                  # BIDS derivatives (preprocessed data)
│   ├── fmriprep/                 # fMRIPrep outputs (preprocessed fMRI)
│   │   ├── sub-01/
│   │   ├── sub-02/
│   │   ├── logs/
│   │   ├── figures/
│   │   ├── reports/
│   │   ├── dataset_description.json
│   │   └── ...
│   │
│   ├── freesurfer/               # FreeSurfer outputs
│   │   ├── sub-01/
│   │   ├── sub-02/
│   │   ├── scripts/
│   │   ├── subjects/             # Standard FreeSurfer subjects directory
│   │   ├── dataset_description.json
│   │   └── ...

/users

structure:

/projects/f_ah1491_1/users/
│── <user_name>/
│── <user_name>/
│── <user_name>/

/projects/f_ah1491_1/

100 TB storage
5 million file count limit
Backed up frequently
Never deleted
Lab general

structure:

/projects/f_ah1491_1
│── open_data/
│── internal_data/
│── analysis_tools/
│── documentation/
│── users/

/internal_data or /open_data

structure:

/project_name/
│
├── behavioral/                     
│   ├── mindlamp/
│   ├── qualtrics/
│   ├── testmybrain/
│   ├── dataset_description.json
│   └── ...
├── processed_data/    
│   ├── timeseries/
│   └── ...
├── results/              
│   ├── connectivity/
│   ├── GLM_analysis/
│   ├── ICA_components , nmf , loadings, /
│   ├── dataset_description.json
│   ├── carrisa
	│   ├── ICA_components , nmf , loadings, /
│   └── ...
│   
├── results/                  # Processed results (stats, figures)
│   ├── group_level/
│   ├── individual_subjects/
│   ├── figures/
│   ├── tables/
│   ├── dataset_description.json
│   └── ...
│
├── scripts/                         # Scripts and notebooks
│   ├── preprocessing/            # fMRIPrep/FreeSurfer prep scripts
│   ├── analysis/                 # GLM, connectivity, clustering scripts
│   ├── visualization/            # Scripts for visualizing results
│   ├── utils/                    # Helper functions
│   ├── README.md                ## github explanation ssh 
│   └── ...
│
├── docs/                         # Documentation for project
│   ├── README.md
│   ├── methods.md
│   ├── references/
│   └── ...
│
└── logs/                         # Log files for pipeline runs
    ├── preprocessing_logs/
    ├── analysis_logs/
    ├── errors/
    ├── dataset_description.json
    └── ...

/analysis_tools

structure:

/projects/f_ah1491_1/analysis_tools/
│── preprocessing/      # fMRIprep, Freesurfer, DWI processing
│── analysis/           # Statistical analysis, GLMs, MVPA
│── visualization/      # Brain plots, ROI maps
│── utilities/          # Helper scripts (e.g., BIDS conversion, file renaming)
│── README.md          # Script usage guidelines

/documentation

structure:

/projects/f_ah1491_1/documentation/
│── guidelines.md      # Data management policies
│── tutorials/         # Pipeline guides, usage instructions
│── changelog.md       # Updates and notes on processing

/users

structure:

/projects/f_ah1491_1/users/
│── <user_name>/

BIDS Structure

The Brain Imaging Data Structure is a consistent file structure for neuroimaging. The specification can be browsed online in the BIDS specification. Below is an overview based on file types used in Holmes Lab studies.

BIDS compliance can be checked on https://bids-standard.github.io/bids-validator/

Naming specifications:

Subject IDs are “sub-” + the full ID. (ie ID “PCR200” → “sub-PCR200”)
Session labels are 2 digits, so the first session is “ses-01”
All naming conventions can be found in the BIDS specification

sub-001		
└── ses-10
		├── anat
		│   ├── sub-101_ses-10_T1w.json
		│   ├── sub-101_ses-10_T1w.nii.gz
		│   ├── sub-101_ses-10_T2w.json
		│   └── sub-101_ses-10_T2w.nii.gz
		├── fmap
		│   ├── sub-101_ses-10_dir-AP_epi.json
		│   ├── sub-101_ses-10_dir-AP_epi.nii.gz
		│   ├── sub-101_ses-10_dir-PA_epi.json
		│   └── sub-101_ses-10_dir-PA_epi.nii.gz
		├── func
		│   ├── sub-101_ses-10_task-language_run-01_bold.json
		│   ├── sub-101_ses-10_task-language_run-01_bold.nii.gz
		│   ├── sub-101_ses-10_task-language_run-01_events.tsv
		│   ├── sub-101_ses-10_task-language_run-01_sbref.json
		│   ├── sub-101_ses-10_task-language_run-01_sbref.nii.gz
		│   ├── sub-101_ses-10_task-rest_run-01_bold.json
		│   ├── sub-101_ses-10_task-rest_run-01_bold.nii.gz
		│   ├── sub-101_ses-10_task-rest_run-01_events.tsv
		│   ├── sub-101_ses-10_task-rest_run-01_sbref.json
		│   ├── sub-101_ses-10_task-rest_run-01_sbref.nii.gz
		│   ├── sub-101_ses-10_task-rest_run-02_bold.json
		│   ├── sub-101_ses-10_task-rest_run-02_bold.nii.gz
		│   ├── sub-101_ses-10_task-rest_run-02_events.tsv
		│   ├── sub-101_ses-10_task-rest_run-02_sbref.json
		│   └── sub-101_ses-10_task-rest_run-02_sbref.nii.gz
		├── sub-101_ses-10_scans.json
		└── sub-101_ses-10_scans.tsv
dataset_description.json
participants.tsv
task-rest_bold.json
task-flanker_bold.json
task-language_bold.json
task-elevator_bold.json
task-momentous_bold.json