GATK gCNV doesn't recognize BAM index input which isn't located in the same origin folder as BAM file #7487

Closed
seunghun23 opened this issue Sep 29, 2021 · 9 comments

@seunghun23

Hi,

I'm trying to run "cnv_germline_cohort_workflow" from this workspace (https://app.terra.bio/#workspaces/help-gatk/Germline-CNVs-GATK4), and the workflow keeps failing at the "CollectCounts" step with the following error
in multiple shards:


A USER ERROR has occurred: Traversal by intervals was requested but some input files are not indexed.
Please index all input files:

samtools index /cromwell_root/dg.4DFC_3615e55e-6aa3-43e7-8d7b-6f2824071971/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam

This was surprising, since I had set the correct index file as an input. After some investigation, I realized that
the error seems to occur when the bucket path of the BAM index differs from that of the BAM file.
For example, look at the log below:

Attempting to download gs://gdc-tcga-phs000178-controlled/KIRC/DNA/WXS/BI/ILLUMINA/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam to /cromwell_root/dg.4DFC_3615e55e-6aa3-43e7-8d7b-6f2824071971/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam
Successfully activated service account; Will continue with download.
Activated service account credentials for: [kd5mqbpsed8lzz0kyz9tvkht-3274@dcf-prod.iam.gserviceaccount.com]
Download complete!
2021/09/29 15:46:14 Localizing input drs://dg.4DFC:ab4d57fa-bfff-4a48-bd96-f2866ecfe0e1 -> /cromwell_root/dg.4DFC_ab4d57fa-bfff-4a48-bd96-f2866ecfe0e1/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam.bai
Requester Pays project ID is Some(vanallen-firecloud-nih)
Attempting to download gs://gdc-tcga-phs000178-controlled/KIRC/DNA/WXS/BI/ILLUMINA/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam.bai to /cromwell_root/dg.4DFC_ab4d57fa-bfff-4a48-bd96-f2866ecfe0e1/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam.bai
Successfully activated service account; Will continue with download.
Activated service account credentials for: [kd5mqbpsed8lzz0kyz9tvkht-3274@dcf-prod.iam.gserviceaccount.com]
Download complete!

C345.TCGA-A3-3373-11A-01D-1421-08.5.bam and C345.TCGA-A3-3373-11A-01D-1421-08.5.bam.bai
were both downloaded successfully, but because these TCGA files use DRS URIs, they were copied to two separate Cromwell folders:

/cromwell_root/dg.4DFC_3615e55e-6aa3-43e7-8d7b-6f2824071971/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam
/cromwell_root/dg.4DFC_ab4d57fa-bfff-4a48-bd96-f2866ecfe0e1/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam.bai

GATK doesn't seem to recognize the BAM index when it is not in the same folder as the BAM file.
Could you add a symlink for the BAM and BAI files in the WDL script?

Thanks,
Seunghun
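
A minimal sketch of the symlink workaround requested above, assuming the goal is simply to place the BAM and its index side by side under the conventional `<name>.bam.bai` naming. The temporary directory stands in for `/cromwell_root`, and the folder and file names (`drs_bam`, `drs_bai`, `inputs`, `sample.bam`) are hypothetical placeholders, not taken from the gCNV WDL.

```shell
#!/bin/sh
set -eu

# Stand-in for /cromwell_root: the BAM and its index are localized into
# two different folders, as in the log above (names are hypothetical).
root="$(mktemp -d)"
mkdir -p "$root/drs_bam" "$root/drs_bai" "$root/inputs"
touch "$root/drs_bam/sample.bam" "$root/drs_bai/sample.bam.bai"

bam="$root/drs_bam/sample.bam"
bai="$root/drs_bai/sample.bam.bai"

# Link both files into a single working directory so that the index
# sits next to the BAM and is named <bam basename>.bai.
ln -s "$bam" "$root/inputs/$(basename "$bam")"
ln -s "$bai" "$root/inputs/$(basename "$bam").bai"

ls "$root/inputs"
```

In a WDL task, the same two `ln -s` lines would run at the top of the command block, and the tool would then be pointed at the linked BAM rather than the original path.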

@droazen
Collaborator

droazen commented Oct 12, 2021

@mwalker174 ^^

@ldgauthier
Contributor

@droazen isn't this a simple case of using another command line argument to specify the index path? And if that's necessary for DRS files, won't it be the same for all tools?

@droazen
Collaborator

droazen commented Oct 19, 2021

@ldgauthier It's true that this issue will affect other tools/pipelines, but to address this specific issue the gCNV WDL will need to be patched to either copy the BAI to the BAM location or make use of GATK's --read-index argument.
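
As a rough illustration of the second option, the sketch below builds a command line that passes the index explicitly with `--read-index`. It assumes the CollectCounts step wraps `gatk CollectReadCounts`; the BAM and BAI paths are copied from the log above, while the interval list and output file names are hypothetical. The command is printed rather than executed, so the sketch runs without GATK installed.

```shell
#!/bin/sh
set -eu

# Paths as localized by Cromwell in the log above.
bam="/cromwell_root/dg.4DFC_3615e55e-6aa3-43e7-8d7b-6f2824071971/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam"
bai="/cromwell_root/dg.4DFC_ab4d57fa-bfff-4a48-bd96-f2866ecfe0e1/C345.TCGA-A3-3373-11A-01D-1421-08.5.bam.bai"

# Hypothetical interval list and output names, for illustration only.
intervals="intervals.interval_list"
out="sample.counts.hdf5"

# Assemble the arguments; --read-index points GATK at the index directly,
# so it does not need to sit next to the BAM.
set -- gatk CollectReadCounts \
  -I "$bam" \
  --read-index "$bai" \
  -L "$intervals" \
  --interval-merging-rule OVERLAPPING_ONLY \
  -O "$out"

cmd_line="$*"
printf '%s\n' "$cmd_line"

# To actually run it (requires GATK on PATH): "$@"
```

Passing the index explicitly is harmless when the files already sit together, which is why it may make sense as the default in a WDL.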

@ldgauthier
Contributor

ldgauthier commented Oct 21, 2021

Any reason we wouldn't want to always use --read-index in WDLs, even if it's named as expected?

@asmirnov239
Collaborator

@seunghun23 I just merged a fix into master.

Closed by #7518.

@s-hoyt

s-hoyt commented Mar 8, 2022

Hi @asmirnov239 is it also possible to add this change to gatk/mutect2-gatk4? I am having the same problem there. Thanks!

@asmirnov239
Collaborator

@s-hoyt I am looking at the mutect2 WDL (https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2.wdl) and it seems like there is an option to specify BAM indices as a workflow level input. Are there any specific tasks that you had trouble with?

@shahab-sarmashghi

shahab-sarmashghi commented Aug 29, 2023

@asmirnov239 I'm getting the same error on Terra when BAM files and their indices are not in the same location. I'm running https://github.com/broadinstitute/gatk/blob/4.1.7.0/scripts/mutect2_wdl/mutect2_pon.wdl. The workflow input for the BAM index seems to be just a placeholder: if you follow the variables through the WDL, it's never actually used.

@shahab-sarmashghi

This seems to be an easy fix, similar to #7518: add the --read-index argument to every gatk command in scripts/mutect2_wdl/mutect2.wdl that processes the input BAM.
