Snakemake: how to use config files effectively

Updated: 2024-10-18 20:22:49

Problem description

I'm using the following config file format in Snakemake for some sequencing analysis practice. I have many samples, each with two fastq files:

samples:
  Sample1_XY:
    - fastq_files/SRR4356728_1.fastq.gz
    - fastq_files/SRR4356728_2.fastq.gz
  Sample2_AB:
    - fastq_files/SRR6257171_1.fastq.gz
    - fastq_files/SRR6257171_2.fastq.gz
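For orientation, here is a minimal sketch of what this file looks like once it has been loaded into `config`. It assumes the YAML parses to an ordinary mapping of sample names to lists; the dict literal below is a hand-written stand-in for the parsed file, not output from Snakemake itself.

```python
# Plain-Python stand-in for the parsed config.yaml (assumption: a mapping of
# sample names to lists of paths, as the YAML above suggests).
config = {
    "samples": {
        "Sample1_XY": [
            "fastq_files/SRR4356728_1.fastq.gz",
            "fastq_files/SRR4356728_2.fastq.gz",
        ],
        "Sample2_AB": [
            "fastq_files/SRR6257171_1.fastq.gz",
            "fastq_files/SRR6257171_2.fastq.gz",
        ],
    }
}

# Iterating over the mapping yields its keys, i.e. the sample names.
sample_names = list(config["samples"])
print(sample_names)

# Each sample name maps to a list holding both fastq paths.
for name, files in config["samples"].items():
    print(name, len(files), files)
```

Passing `config["samples"]` as a wildcard value therefore supplies the sample names, while indexing it with a sample name gives both files at once.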

I'm using the following rules at the start of my pipeline to run FastQC and to align the fastq files:

import os

# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
        expand("FastQC/{sample}_fastqc.zip", sample=config["samples"]),
        expand("bam_files/{sample}.bam", sample=config["samples"]),
        "FastQC/fastq_multiqc.html"

rule fastqc:
    input:
        sample=lambda wildcards: config['samples'][wildcards.sample]
    output:
        # Output needs to end in '_fastqc.html' for multiqc to work
        html="FastQC/{sample}_fastqc.html",
        zip="FastQC/{sample}_fastqc.zip"
    params: ""
    wrapper:
        "0.21.0/bio/fastqc"

rule bowtie2:
    input:
        sample=lambda wildcards: config['samples'][wildcards.sample]
    output:
        "bam_files/{sample}.bam"
    log:
        "logs/bowtie2/{sample}.txt"
    params:
        index=config["index"],  # prefix of reference genome index (built with bowtie2-build)
        extra=""
    threads: 8
    wrapper:
        "0.21.0/bio/bowtie2/align"

rule multiqc_fastq:
    input:
        expand("FastQC/{sample}_fastqc.html", sample=config["samples"])
    output:
        "FastQC/fastq_multiqc.html"
    params: ""  # no extra MultiQC arguments
    log:
        "logs/multiqc.log"
    wrapper:
        "0.21.0/bio/multiqc"

My issue is with the fastqc rule.

Currently both the fastqc rule and the bowtie2 rule create one output file generated using two inputs SRRXXXXXXX_1.fastq.gz and SRRXXXXXXX_2.fastq.gz.
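The behaviour described here follows from what the input function returns. The sketch below mimics that lookup in plain Python; `SimpleNamespace` is only a hypothetical stand-in for Snakemake's wildcards object, and `get_fastqs` is an illustrative name for the lambda used in the rules.

```python
from types import SimpleNamespace

# Hand-written stand-in for the parsed config.yaml from the question.
config = {
    "samples": {
        "Sample1_XY": [
            "fastq_files/SRR4356728_1.fastq.gz",
            "fastq_files/SRR4356728_2.fastq.gz",
        ],
    }
}


def get_fastqs(wildcards):
    """Same lookup as the lambda in the fastqc and bowtie2 rules."""
    return config["samples"][wildcards.sample]


# Hypothetical stand-in for the wildcards object of a single job.
wildcards = SimpleNamespace(sample="Sample1_XY")
inputs = get_fastqs(wildcards)
print(inputs)
```

Because the only wildcard is `sample`, one job is created per sample and that job receives both files. That suits a paired-end alignment, but it cannot yield one FastQC report per file.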

I need the fastqc rule to generate two outputs, a separate one for each of the fastq.gz files, but I'm unsure how to index the config correctly from the fastqc rule's input statement, or how to combine expand and wildcards to solve this. I can get an individual fastq file by adding [0] or [1] to the end of the input statement, but I can't get both files processed separately.

I've been messing around trying to get the correct indexing format to access each file separately. The current format is the only one I've managed that allows snakemake -np to generate a job list.

Any tips would be greatly appreciated.

Recommended answer

It appears that each sample has two fastq files, named in the format ***_1.fastq.gz and ***_2.fastq.gz. In that case, the config and code below should work.

config.yaml:

samples:
  Sample_A: fastq_files/SRR4356728
  Sample_B: fastq_files/SRR6257171
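With this layout each sample maps to a path prefix rather than a list of files. As a rough sketch of how a prefix plus a read number resolves to one file, the snippet below rebuilds a path in plain Python. The dict is a hand-written stand-in for the parsed config, `fastq_for` is an illustrative name, and `SimpleNamespace` is a hypothetical substitute for Snakemake's wildcards object.

```python
from types import SimpleNamespace

# Hand-written stand-in for the parsed prefix-style config.yaml.
config = {
    "samples": {
        "Sample_A": "fastq_files/SRR4356728",
        "Sample_B": "fastq_files/SRR6257171",
    }
}


def fastq_for(wildcards):
    """Join the sample's path prefix with the read number and file suffix."""
    return f"{config['samples'][wildcards.sample]}_{wildcards.num}.fastq.gz"


# One (sample, num) pair resolves to exactly one fastq file.
path = fastq_for(SimpleNamespace(sample="Sample_A", num="2"))
print(path)
```

The extra `num` value is what lets a rule address each of the two files on its own.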

Snakefile:

# read config info into this namespace
configfile: "config.yaml"
print (config['samples'])

rule all:
    input:
        expand("FastQC/{sample}_{num}_fastqc.zip", sample=config["samples"], num=['1', '2']),
        expand("bam_files/{sample}.bam", sample=config["samples"]),
        "FastQC/fastq_multiqc.html"

rule fastqc:
    input:
        sample=lambda wildcards: f"{config['samples'][wildcards.sample]}_{wildcards.num}.fastq.gz"
    output:
        # Output needs to end in '_fastqc.html' for multiqc to work
        html="FastQC/{sample}_{num}_fastqc.html",
        zip="FastQC/{sample}_{num}_fastqc.zip"
    wrapper:
        "0.21.0/bio/fastqc"

rule bowtie2:
    input:
        sample=lambda wildcards: expand(f"{config['samples'][wildcards.sample]}_{{num}}.fastq.gz", num=[1,2])
    output:
        "bam_files/{sample}.bam"
    wrapper:
        "0.21.0/bio/bowtie2/align"

rule multiqc_fastq:
    input:
        expand("FastQC/{sample}_{num}_fastqc.html", sample=config["samples"], num=['1', '2'])
    output:
        "FastQC/fastq_multiqc.html"
    wrapper:
        "0.21.0/bio/multiqc"
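Two details of the Snakefile above are easy to miss: `expand` fills a pattern with every combination of the given values, and the doubled braces in the bowtie2 input keep `{num}` as a literal placeholder until `expand` fills it. The sketch below imitates both in plain Python. `expand_sketch` is a simplified, hypothetical stand-in written for illustration, not Snakemake's actual `expand`.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Simplified stand-in for expand(): format the pattern with every
    combination of the supplied wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# Hand-written stand-in for the parsed prefix-style config.yaml.
config = {
    "samples": {
        "Sample_A": "fastq_files/SRR4356728",
        "Sample_B": "fastq_files/SRR6257171",
    }
}

# Targets as in rule all: one FastQC archive per (sample, num) pair.
# Iterating over the dict yields the sample names.
targets = expand_sketch(
    "FastQC/{sample}_{num}_fastqc.zip",
    sample=config["samples"],
    num=["1", "2"],
)
print(targets)

# As in the bowtie2 input: the f-string resolves the prefix immediately,
# while the doubled braces leave a literal {num} for the later expansion.
sample = "Sample_A"
pattern = f"{config['samples'][sample]}_{{num}}.fastq.gz"
print(pattern)
reads = expand_sketch(pattern, num=[1, 2])
print(reads)
```

In this sketch the fastqc targets come out as four separate files, so each fastq file gets its own job, while the bowtie2 input still resolves to both reads of one sample.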

Published: 2023-11-28 07:13:44
Article link: https://www.elefans.com/category/jswz/34/1641426.html