Tuesday, February 15, 2022

Tips for uploading data to NCBI SRA

 ### Case: 

If I have 20 bacterial samples, I will need to upload 20 x 2 = 40 FASTQ files of Illumina paired-end data and 20 x 1 = 20 FASTQ files of Nanopore data.
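Before starting the submission, it can help to list the files you expect to upload and confirm the counts. The sketch below is only an illustration: the sample names and the `_R1`/`_R2`/`_ont` file-name suffixes are assumptions, so adapt them to your own naming scheme.

```python
def expected_fastq_files(samples):
    """Return the FASTQ file names expected for each platform.

    The naming pattern here is hypothetical; change it to match your files.
    """
    # Paired-end Illumina: two files (R1 and R2) per sample.
    illumina = [f"{s}_R{read}.fastq.gz" for s in samples for read in (1, 2)]
    # Nanopore: one file per sample.
    nanopore = [f"{s}_ont.fastq.gz" for s in samples]
    return illumina, nanopore


# 20 placeholder sample names: sample01 ... sample20
samples = [f"sample{i:02d}" for i in range(1, 21)]
illumina, nanopore = expected_fastq_files(samples)
print(len(illumina), len(nanopore), len(illumina) + len(nanopore))
# prints: 40 20 60
```

Comparing these lists against the contents of your data folder is a quick way to spot a missing read file before the upload.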

1) Create a BioProject (get the BioProject accession).

2) Create a BioSample submission. Use the BioProject accession to fill in the Microbe template. The template depends on the organism (bacterial in this case) and can only be downloaded from NCBI while going through the submission steps.

3) For Illumina data 

   - Create an SRA submission (submission 1).

   - Fill in the metadata template using the SAMN numbers generated by the BioSample submission.

   - Upload the data using either Aspera or FTP.

4) For Nanopore data 

   - Create an SRA submission (submission 2).

   - Fill in the metadata template using the SAMN numbers generated by the BioSample submission.

   - Upload the data using either Aspera or FTP.
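Filling in the metadata template by hand for many samples is error-prone, so the rows can be generated instead. The following is a minimal sketch, not the official template: the column names are a small illustrative subset, the SAMN accessions and file names are placeholders, and everything should be checked against the template you download from NCBI during the submission.

```python
import csv
import io

# Illustrative subset of columns; the real template has more (assumption).
COLUMNS = ["biosample_accession", "library_ID", "platform",
           "library_layout", "filename", "filename2"]


def build_rows(biosamples, platform):
    """Build one metadata row per sample for the given platform.

    biosamples maps a sample name to its SAMN accession.
    """
    rows = []
    for name, samn in biosamples.items():
        if platform == "ILLUMINA":
            # Paired-end data: two files per row.
            layout = "paired"
            files = (f"{name}_R1.fastq.gz", f"{name}_R2.fastq.gz")
        else:
            # Nanopore data: a single file per row.
            layout = "single"
            files = (f"{name}_ont.fastq.gz", "")
        rows.append({
            "biosample_accession": samn,
            "library_ID": f"{name}_{platform.lower()}",
            "platform": platform,
            "library_layout": layout,
            "filename": files[0],
            "filename2": files[1],
        })
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder accessions; use the SAMN numbers from your BioSample submission.
biosamples = {"sample01": "SAMN00000001", "sample02": "SAMN00000002"}
illumina_rows = build_rows(biosamples, "ILLUMINA")
nanopore_rows = build_rows(biosamples, "OXFORD_NANOPORE")
print(to_tsv(illumina_rows))
```

The same `biosamples` mapping feeds both calls, so each sample keeps one SAMN accession across the Illumina and Nanopore tables.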



# Pointers:

1. It is best to create a BioProject first and get the PRJNA number.

2. Then create the BioSample submission. In the Microbe1.0 Excel template, provide the BioProject (PRJNA) number.

3. Then do the SRA submission.

4. For the SRA submission, if you have two types of data for the same sample, such as Illumina and Nanopore sequences, list the Illumina information first in the SRA_meta_acc file, followed by the Nanopore details.

5. The SAMN number (BioSample accession) will be the same for both the Illumina and Nanopore data.

6. If you have more than 1000 samples, a single submission is not possible at NCBI, so you have to split the data across several submissions.

7. The Aspera command line seems a bit confusing to use, while FTP seems straightforward. You just need to get the username and password from the SRA submission.

8. An advantage of FTP: even if the internet connection drops, we can resume from where the transfer stopped.

9. We cannot upload half of our data with FTP and the other half with Aspera. It is either Aspera or FTP.
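Pointers 6 and 8 can be sketched with Python's standard library. This is a rough illustration under assumptions, not NCBI's recommended procedure: `split_batches` and `resume_upload` are hypothetical helper names, the 1000 limit is simply the figure from pointer 6, and resuming relies on the server accepting the FTP `REST` command, which I have not verified for the SRA upload server.

```python
from ftplib import error_perm

MAX_PER_SUBMISSION = 1000  # limit as described in pointer 6


def split_batches(items, size=MAX_PER_SUBMISSION):
    """Split a list of samples into chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def resume_upload(ftp, local_path, remote_name):
    """Upload a file, continuing after the bytes already on the server.

    `ftp` is a connected, logged-in ftplib.FTP object.
    Returns the byte offset the upload resumed from.
    """
    ftp.voidcmd("TYPE I")  # binary mode, so SIZE reports a byte count
    try:
        offset = ftp.size(remote_name) or 0
    except error_perm:
        offset = 0  # file is not on the server yet
    with open(local_path, "rb") as handle:
        handle.seek(offset)  # skip the part that was already sent
        # rest=offset asks the server to continue writing at that position
        ftp.storbinary(f"STOR {remote_name}", handle, rest=offset or None)
    return offset


batches = split_batches(list(range(2500)))
print([len(b) for b in batches])
# prints: [1000, 1000, 500]
```

To use `resume_upload`, create the connection with `ftplib.FTP(host)`, call `login(user, password)` with the credentials shown in your SRA submission, and `cwd` into your upload folder first. If a transfer is interrupted, running the same call again should pick up from the size reported by the server.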