✍️ Tutorial: Data from NDA

Last updated: Jun 12, 2025

Database: NIMH Data Archive (NDA) at the United States National Institutes of Health
Helpdesk Contact: ndahelp@mail.nih.gov

DARs

1) Have an eRA Profile

Make sure you have an eRA account. If you don’t have one, email the RBHS signing official, Gregory Werhner (gw266@research.rutgers.edu), to have one created for you.
Ask your PI which institution they are assigned to as a PI in NDA. Then enter your employment in your eRA profile, selecting the same institution as your PI.
1. For example, there are 2 versions of RBHS: “Rutgers Biomedical and Health Sciences” and “Rutgers Biomedical/Health Sciences - RBHS”. There’s also “Rutgers, the State University of NJ” and others. Make sure to select the one that matches your PI’s.
Change your contact email to your Rutgers email

2) Get Access to NDA

Create an NDA account (nda tutorial): Go to the NDA website: https://nda.nih.gov/. To create an account, click the “log in” button in the top right, which brings up a few “create account” options. Make sure to link the eRA or login.gov account that is registered with your institutional email.
If you already have an account, or once you have created one, go to “Profile” and make sure your institutional affiliation matches the one listed on your eRA account and your PI’s institutional affiliation.
1. If it doesn’t, email the helpdesk (ndahelp@mail.nih.gov) and ask for it to be changed.
  NOTE: It takes 48 hours for eRA institution changes to populate to NDA, so if you’ve just changed your eRA, wait to see if it populates to NDA.

3) Create a Data Access Request (DAR)

3.1) Request Access under a new Data Access Request (DAR) for an NDA dataset(s) you wish to use

If your PI doesn’t already have a DAR for the dataset you wish to use, you must request permissions. Additionally, if you wish to add or remove people from a DAR, you must submit a new DAR.

First, make sure your PI’s ‘Institutional affiliation’ is the same as yours
Have your PI go to Profile > Data Permissions > scroll down to the collection… > Request Access (PI must start a request to the data collection that holds the dataset you’re interested in)
Read the instructions carefully to fully understand the process and timeline. Make sure you have selected the correct institutional sponsorship, then click ‘Start Request’ to begin your application.
Fill in the project-specific fields. Example responses:
1. Name of Project: “Psychiatric connectomes”
  1. The name is not important and doesn’t have to match anything elsewhere
2. Research purpose: “Our research is looking at the relationship between biological markers, such as MRI and fMRI, and behavioral markers such as clinical phenotypes and task behavior.”
Fill in the other two fields. USE THIS LANGUAGE for these questions:
Data access plan: “The data will be stored on a secure server, in a password protected partition on the Rutgers University Amarel computing cluster. The folder will only be accessible by those listed in this DUA.”
Plan for deletion: “All data that has been downloaded from this dataset will be deleted from all our local or cloud-based machines when research is completed, or this DUC is expired, whichever comes first.”
Then add everyone who will access the data. On this page, make sure to add all individuals who will access, use, or analyze the data, regardless of their position title, or role in data usage. This includes any IT staff responsible for cleaning or managing the data.
1. If the person has an NDA account but you don’t see them in the “Known affiliated user” list, ask them to follow steps 1.1-1.3.
  1. Make sure to select the email associated with your collaborator’s NDA account. Only press ‘add new user’ if your collaborator doesn’t already have an NDA account.
For “Signing Official”, select Chrissa Pappannoiu (cp847@ored.rutgers.edu)
1. They will log into NDA to sign your eDAR.
2. Email Signing official:
  “Hi Chrissa, I’m a researcher in the Holmes Lab, PI: Avram Holmes, and we’re applying for access to the NDA dataset [Enter dataset name]. The Data Access Request is within NDA for your signature. Thanks so much and please let me know if you have any questions.”
Review the details of your eDAR to ensure everything is correct and click ‘Next’ to proceed.
After reading the Terms and Conditions, check the boxes to certify your agreement, then click ‘Confirm’ to complete the process.
Wait to get granted access (~10 business days)

3.2) Get added to your PI’s existing DAR for a collection/dataset

If additional individuals need data access after DAR approval, the lead recipient or new recipients must submit a new DAR using the “New Data Access Requests (DAR)” procedure (Steps in 3.1). The new DAR can reuse the same Research Data Use Statement if applicable.

3.3) Edit a DAR

Have the PI go to their Data Permissions dashboard and select ‘View Request’.
On the eDAR overview page, click ‘Edit’ in the top right corner.
A prompt will appear asking, “Are you sure you want to edit this Data Access Request?” Read the message carefully, then click ‘Yes’ to proceed.
Once confirmed, you can edit the DAR. Click ‘Start Request’ to proceed.

3.4) Renewing an eDAR

Log into your NDA account with your eRA Commons username and password. Navigate to your Data Permissions Dashboard and select ‘Renew Access’ under the ‘Actions’ menu for the eDAR you wish to renew.
In the Research Data Use Statement section, review the information and make any necessary updates. Once you’re done, click ‘Next’ to proceed.

Data Downloads

2) Creating Python environment for Data Downloads

Create Conda environment

#first create a conda environment for the NDA downloader tool. This might not be needed, but it was considered good practice on the Milgram cluster
conda create --name NDA_download  python=3.11
conda activate NDA_download

#Do this every time
#install nda-tools package:
pip install nda-tools

#see if it worked 
conda list
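
If you want a quicker check than scanning the `conda list` output, here is a minimal sketch that asks the shell whether it can find the `downloadcmd` command. The function name `check_tool` is made up for this example, and the sketch assumes that `pip install nda-tools` puts `downloadcmd` on your PATH while the environment is active.

```shell
# Hypothetical helper: report whether a command is visible on the PATH.
check_tool() {
    # command -v prints the location of a command if the shell can find it
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found; activate the NDA_download environment and install nda-tools again"
        return 1
    fi
}

# "|| true" keeps the shell going even when the tool is missing
check_tool downloadcmd || true
```

If it reports "not found", the most likely cause is that the conda environment is not active in the current terminal session.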

3) Downloading Data from NDA to Compute Cluster

Go to nda.nih.gov
Sign in
Search for the dataset you want
1. Get Data > Text Search > “dataset name”
Select it from the search results and press “Add to Workspace”
Check that the number of subjects shown in the “Filter Cart” in the upper right-hand corner is what you expect. If so, press ‘Create Data Package/Add Data to Study’
Press “Go to Data Packaging Page” (if it doesn’t open automatically) and use the checkboxes to select all the data you want
1. On the left, “Collections By Permission Group” lists the collections/datasets and all their iterations
2. On the right, “Data Structure by Category” lists the types of data within them
  1. Deselect anything you don’t want
Press “Create Data Package” button
1. Give it a clear name so you can refer back to it
2. Make SURE to check the box for “Include associated data files”
3. Press “Create Data Package”
You should get a confirmation popup. Go to the dashboard via the link in it, or go to your User Profile (https://nda.nih.gov/user/dashboard/profile.html) and click on Data Packages
Your packages should be listed there, including the one you just made. The new one will say Status: “Creating Package” for a couple of minutes, then change to Status: “Ready to Download” and show the size
When it says “ready to download” and the size seems right, copy the Package ID Number in the first column for the package you want to download
Open text editor (BBEdit, Textedit, VSCode, etc.) and paste this code
(downloadcmd is a command from the package nda-tools)
```
downloadcmd -dp 1225580 -d /projects/f_ah1491_1/Open_Data/NAPLS3 -wt 5
```

-wt = the number of files downloaded in parallel. Use a maximum of 10.
-dp = the package ID. Change 1225580 to YOUR PACKAGE ID
-d = the download directory. Change /projects/f_ah1491_1/Open_Data/NAPLS3 to the FOLDER YOU WANT TO DOWNLOAD TO
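
Optional: if you'd like to double check the command before saving it, the sketch below prints the `downloadcmd` line for a given package ID, folder, and `-wt` value, and refuses anything above the 10 parallel downloads recommended here. The function name `build_download_cmd` and its argument order are made up for this example and are not part of nda-tools; it only prints text and does not start a download.

```shell
# Hypothetical helper: print the downloadcmd line so it can be reviewed first.
build_download_cmd() {
    local package_id="$1"    # Package ID copied from the Data Packages table
    local target_dir="$2"    # folder to download into
    local workers="${3:-5}"  # value for -wt (parallel downloads), default 5

    # This tutorial recommends at most 10 parallel downloads.
    if [ "$workers" -gt 10 ]; then
        echo "workers must be 10 or fewer (got $workers)" >&2
        return 1
    fi
    printf 'downloadcmd -dp %s -d %s -wt %s\n' "$package_id" "$target_dir" "$workers"
}

# Dry run with the example values from this tutorial
build_download_cmd 1225580 /projects/f_ah1491_1/Open_Data/NAPLS3 5
# prints: downloadcmd -dp 1225580 -d /projects/f_ah1491_1/Open_Data/NAPLS3 -wt 5
```

Once the printed line looks right, that line is what goes into the file you save.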
1. Save this file as a NAME.sh file, and have NAME be relevant to the package you’re downloading
2. Go to terminal, input chmod u+rwx NAME.sh to make sure you have execute permissions
3. Code Template:
Just out file
```
  /path/to/my_script.sh 1>/path/to/my_script.out &
```
Out file and error file
```
  /path/to/my_script.sh 1>/path/to/my_script.out 2>/path/to/my_script.err &
```
This will run your script in the background (&), saving the terminal output to a file with the same name but a .out extension, and any error messages to a file with the same name but a .err extension
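
To see what the redirection does without touching any real data, here is a small sketch that uses a stand-in script in a temporary folder. The stand-in script and the `demo_dir` variable are made up for this example; the `1>`, `2>`, and `&` parts are the same as in the templates above.

```shell
# Create a throwaway folder and a stand-in for NAME.sh
demo_dir="$(mktemp -d)"
cat > "$demo_dir/my_script.sh" <<'EOF'
#!/bin/sh
echo "progress message"
echo "error message" >&2
EOF
chmod u+rwx "$demo_dir/my_script.sh"

# Same pattern as the template: stdout to .out, stderr to .err, run in background
"$demo_dir/my_script.sh" 1>"$demo_dir/my_script.out" 2>"$demo_dir/my_script.err" &
wait $!   # wait for the background job to finish (only needed for this demo)

cat "$demo_dir/my_script.out"   # prints: progress message
cat "$demo_dir/my_script.err"   # prints: error message
```

Nothing shows up in the terminal while the script runs because both streams go to files, which is why you check progress by reading the .out and .err files.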

It should prompt you here for your username and password from NDA. Make sure these are the credentials that link to the account where you created the data package!
Once the job starts running, you can check that it is running, and see its progress, by entering sacct
1. Your job should be listed in the table that sacct prints
2. Your job is the one that says ‘main’ at the top; in this example it’s ‘shell_dca+’ (which is the beginning of the shellfile.sh name). This is the one that should say ‘RUNNING’ or ‘COMPLETED’. If this one fails, the ‘batch’ or ‘extern’ rows may still say ‘COMPLETED’, but your job has failed

If it says “FAILED”:

Check the error file by displaying it. It’s called slurm.most.recent.err if you kept it as I have above; otherwise it’s called whatever you defined in the shellfile.sh file
vi slurm.most.recent.err
(:q to quit the vi viewer)
or
cat slurm.most.recent.err

Now that you see the problem, you can try to fix it!

If it says ‘permission denied’, make sure both .sh files are fully rwx accessible to you, and that the folders holding those files are as well
If it says ‘file cannot be found’, check all the filepaths in your files for errors, and make sure they match exactly where the files are
If it is a new error or you can’t figure it out, ask other members of your lab or Rutgers’ Office of Advanced Research Computing (help@oarc.rutgers.edu)
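
The first two checks can also be run from the terminal before resubmitting. This is only a sketch: the function name `check_download_setup` is made up for this example, and it covers just the two common problems listed above (a script you can't execute, and a folder that doesn't exist or isn't writable).

```shell
# Hypothetical helper: check a script and a download folder for common problems.
# Usage: check_download_setup /path/to/NAME.sh /path/to/download/folder
check_download_setup() {
    local script="$1"
    local target_dir="$2"
    local status=0

    if [ ! -f "$script" ]; then
        echo "file cannot be found: $script"
        status=1
    elif [ ! -x "$script" ]; then
        echo "permission problem: $script is not executable (try chmod u+rwx)"
        status=1
    fi

    if [ ! -d "$target_dir" ]; then
        echo "folder cannot be found: $target_dir"
        status=1
    elif [ ! -w "$target_dir" ]; then
        echo "permission problem: $target_dir is not writable"
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        echo "setup looks OK"
    fi
    return "$status"
}
```

If it prints "setup looks OK" and the job still fails, the problem is probably something else, so go back to the error file.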

4) More Information:

PDF Tutorial: https://rutgers.box.com/s/fh8iv3luan3xonevzyijf185shpzavvm

YouTube Tutorial: NDA Data Access Webinar

https://www.youtube.com/watch?v=53P6hEy-zaM