abyssMG

M2: External MAGs

Blandine Trouche October 2023

The accession information for the external MAGs used can be found in Supplementary Table S3. In total, we used 35 external references:

- 4 MAGs from the Mariana Trench (Zhong et al., 2020)
- 22 MAGs from the Mariana Trench and adjacent abyssal plains (Zhou et al., 2022)
- 9 MAGs from the North Atlantic abyssal plain (Kerou et al., 2021)

1- Downloading the data

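Example commands for downloading one genome (here GCA_012928605.1) with the NCBI Datasets command-line tool. The same steps are repeated for each accession listed in Supplementary Table S3: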

conda activate ncbi_datasets

cd NON_REDUNDANT_BINS
mkdir -p 01_EXT_FASTA
cd 01_EXT_FASTA

datasets download genome accession GCA_012928605.1 --include genome
unzip ncbi_dataset.zip
mv ncbi_dataset/data/*/*.fna .
rm -r ncbi_dataset*

# compress all downloaded fasta files
gzip *.fna
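The commands above fetch a single genome. One way to fetch all 35 references is to repeat the same steps in a loop over a list of accessions. This is a minimal sketch, assuming a plain-text file with one GenBank accession per line (for example copied from Supplementary Table S3) and that it is run from inside 01_EXT_FASTA; the function name fetch_ext_mags is hypothetical. The only change to the commands is unzip -o, so that any file the archive places at the top level on every iteration (such as its README) is overwritten instead of triggering an interactive prompt inside the loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download every accession listed in a file
# (one accession per line) with the ncbi datasets CLI, keep only the
# genome .fna files, then compress them. Run from inside 01_EXT_FASTA.
fetch_ext_mags() {
    local accession_file="$1"
    local acc
    # "|| [ -n "$acc" ]" also processes a last line lacking a newline
    while IFS= read -r acc || [ -n "$acc" ]; do
        # skip blank lines and lines starting with '#'
        [ -z "$acc" ] && continue
        case "$acc" in \#*) continue ;; esac

        # same commands as the single-accession example above
        datasets download genome accession "$acc" --include genome || {
            echo "download failed: $acc" >&2
            continue
        }
        unzip -o -q ncbi_dataset.zip      # -o: overwrite without prompting
        mv ncbi_dataset/data/*/*.fna .
        rm -r ncbi_dataset*
    done < "$accession_file"

    # compress all downloaded fasta files
    gzip *.fna
}
```

For example, fetch_ext_mags ext_accessions.txt, where ext_accessions.txt is a hypothetical file name.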

2- Running the Snakemake workflow

Create the TAB-delimited file that Anvi’o uses to find the FASTA files, ext_Zhou_Zhong_Kerou.txt, with a header line followed by one name and one path per genome:

name    path
B89T1L10    01_EXT_FASTA/GCA_022561135.1_ASM2256113v1_genomic.fna.gz
B7T1B11   01_EXT_FASTA/GCA_022561815.1_ASM2256181v1_genomic.fna.gz
B2D1T2    01_EXT_FASTA/GCA_022567895.1_ASM2256789v1_genomic.fna.gz
B1T1B5    01_EXT_FASTA/GCA_022572285.1_ASM2257228v1_genomic.fna.gz
B17T3L8   01_EXT_FASTA/GCA_022572765.1_ASM2257276v1_genomic.fna.gz
B16T1B3   01_EXT_FASTA/GCA_022573065.1_ASM2257306v1_genomic.fna.gz
B19T1B10    01_EXT_FASTA/GCA_022572565.1_ASM2257256v1_genomic.fna.gz
B26D1T2   01_EXT_FASTA/GCA_022571195.1_ASM2257119v1_genomic.fna.gz
B44T3L14    01_EXT_FASTA/GCA_022565935.1_ASM2256593v1_genomic.fna.gz
B56T1B5   01_EXT_FASTA/GCA_022564175.1_ASM2256417v1_genomic.fna.gz
B51T1B5   01_EXT_FASTA/GCA_022564515.1_ASM2256451v1_genomic.fna.gz
B52T3L11    01_EXT_FASTA/GCA_022564385.1_ASM2256438v1_genomic.fna.gz
B49T1B8   01_EXT_FASTA/GCA_022564815.1_ASM2256481v1_genomic.fna.gz
B5T1L6    01_EXT_FASTA/GCA_022563695.1_ASM2256369v1_genomic.fna.gz
B6T1L6    01_EXT_FASTA/GCA_022562615.1_ASM2256261v1_genomic.fna.gz
B10T1B5   01_EXT_FASTA/GCA_022574355.1_ASM2257435v1_genomic.fna.gz
B10T1B11    01_EXT_FASTA/GCA_022574415.1_ASM2257441v1_genomic.fna.gz
B12T1B11    01_EXT_FASTA/GCA_022573895.1_ASM2257389v1_genomic.fna.gz
B15D1T2   01_EXT_FASTA/GCA_022573365.1_ASM2257336v1_genomic.fna.gz
B15MC02   01_EXT_FASTA/GCA_022573335.1_ASM2257333v1_genomic.fna.gz
B10D1T1   01_EXT_FASTA/GCA_022545335.1_ASM2254533v1_genomic.fna.gz
B12D1T1   01_EXT_FASTA/GCA_022545345.1_ASM2254534v1_genomic.fna.gz
MTA1    01_EXT_FASTA/GCA_012928605.1_ASM1292860v1_genomic.fna.gz
MTA4    01_EXT_FASTA/GCA_012928615.1_ASM1292861v1_genomic.fna.gz
MTA5    01_EXT_FASTA/GCA_012928585.1_ASM1292858v1_genomic.fna.gz
MTA6    01_EXT_FASTA/GCA_012928565.1_ASM1292856v1_genomic.fna.gz
NPMR_NP_delta_1   01_EXT_FASTA/GCA_016276965.1_ASM1627696v1_genomic.fna.gz
NPMR_NP_theta_3   01_EXT_FASTA/GCA_016838785.1_ASM1683878v1_genomic.fna.gz
NPMR_NP_delta_2   01_EXT_FASTA/GCA_016838725.1_ASM1683872v1_genomic.fna.gz
NPMR_NP_iota_1    01_EXT_FASTA/GCA_016838825.1_ASM1683882v1_genomic.fna.gz
NPMR_NP_theta_2   01_EXT_FASTA/GCA_016838795.1_ASM1683879v1_genomic.fna.gz
NPMR_NP_theta_5   01_EXT_FASTA/GCA_016838745.1_ASM1683874v1_genomic.fna.gz
NPMR_NP_theta_4   01_EXT_FASTA/GCA_016838765.1_ASM1683876v1_genomic.fna.gz
NPMR_NP_delta_3   01_EXT_FASTA/GCA_016838865.1_ASM1683886v1_genomic.fna.gz
NPMR_NP_theta_1   01_EXT_FASTA/GCA_016838845.1_ASM1683884v1_genomic.fna.gz
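A wrong path or a duplicated name in this file is easier to catch before launching the workflow than after. This is a minimal sketch of such a check, assuming the two-column layout shown above (a header line, then one name and one path per line) and that it is run from NON_REDUNDANT_BINS so the relative paths resolve; check_fasta_txt is a hypothetical helper, not an Anvi’o command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Anvi'o): sanity-check a name/path file.
# Usage: check_fasta_txt FILE EXPECTED_COUNT
check_fasta_txt() {
    local fasta_txt="$1" expected="$2" n missing dups

    # number of genome entries, excluding the header and blank lines
    n=$(awk 'NR > 1 && NF > 0 {c++} END {print c + 0}' "$fasta_txt")
    if [ "$n" -ne "$expected" ]; then
        echo "expected $expected entries, found $n" >&2
        return 1
    fi

    # paths (column 2) that do not point to an existing file
    missing=$(awk 'NR > 1 && NF > 0 {print $2}' "$fasta_txt" |
        while IFS= read -r p; do
            [ -f "$p" ] || echo "$p"
        done)
    if [ -n "$missing" ]; then
        echo "missing files:" >&2
        echo "$missing" >&2
        return 1
    fi

    # names (column 1) used more than once
    dups=$(awk 'NR > 1 && NF > 0 {print $1}' "$fasta_txt" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicated names: $dups" >&2
        return 1
    fi

    echo "OK: $n genomes, all paths found, names unique"
}
```

For the file above this would be called as check_fasta_txt ext_Zhou_Zhong_Kerou.txt 35. Columns are split on any whitespace, so the check does not by itself confirm that the separators are TABs.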

Then run the Anvi’o contigs workflow with the configuration file config_contigs_ext.json:

conda activate anvio-7.1
cd NON_REDUNDANT_BINS

# checking the steps that will be run by the workflow 
anvi-run-workflow -w contigs \
                  -c config_contigs_ext.json \
                  --save-workflow-graph

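# Run the actual workflow: 36 parallel jobs; --keep-going continues with
# independent jobs if one fails; --rerun-incomplete redoes jobs left incomplete.
# To run elsewhere, add --directory your_working_directory right after
# --additional-params (not as a commented-out line: the '#' would also
# comment out the trailing backslash and cut the command short).
anvi-run-workflow -w contigs \
                  -c config_contigs_ext.json \
                  --additional-params \
                      --jobs 36 \
                      --keep-going --rerun-incomplete >> workflow_log_contigs_ext.txt 2>&1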
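Because the workflow output is appended to workflow_log_contigs_ext.txt, its progress can be followed from Snakemake's own messages, which include lines of the form "N of M steps (P%) done" and "Error in rule ...". This is a minimal sketch; the function name workflow_progress is hypothetical. Since the log is opened with >>, lines from earlier runs are kept: the last progress line reflects the most recent run, while the error count covers every run appended to the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a Snakemake log written by anvi-run-workflow.
workflow_progress() {
    local log="$1" last n_err
    # latest "N of M steps (P%) done" line written by Snakemake
    last=$(grep -E '^[0-9]+ of [0-9]+ steps \([0-9]+%\) done' "$log" | tail -n 1)
    # failed rules, counted over all runs appended to this log
    n_err=$(grep -c '^Error in rule' "$log")
    echo "progress: ${last:-no steps finished yet}"
    echo "failed rules: $n_err"
}
```

For example: workflow_progress workflow_log_contigs_ext.txt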