RNA-seq Differential Expression Analysis in Practice

2024-06-24 11:01:01

RNA-seq aligner STAR: (2) Usage

I. Parameters

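See the STAR manual for details.
(1) readFilesIn
Name(s) of the read files to map, including the path. If the files are compressed, use the readFilesCommand parameter to decompress them on the fly. For gzip files (*.gz) use --readFilesCommand zcat or --readFilesCommand gunzip -c; for bzip2 files use --readFilesCommand bunzip2 -c.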
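The rule above can be sketched as a small shell helper that picks the decompression command from the file extension. The function name read_files_command and the file names are hypothetical, made up for illustration; only the zcat and bunzip2 -c values come from the text above.

```shell
# Hypothetical helper: print the value to pass to --readFilesCommand
# for a given FASTQ path, or nothing if the file is not compressed.
read_files_command() {
    case "$1" in
        *.gz)  echo "zcat" ;;        # gzip-compressed reads
        *.bz2) echo "bunzip2 -c" ;;  # bzip2-compressed reads
        *)     : ;;                  # plain FASTQ: no command needed
    esac
}

# Possible use inside a mapping script (not run here):
#   cmd=$(read_files_command "$fq")
#   STAR ... --readFilesIn "$fq" ${cmd:+--readFilesCommand $cmd}
# $cmd is left unquoted on purpose so "bunzip2 -c" splits into two words.
```

For an uncompressed file the helper prints nothing, so the --readFilesCommand option is simply left out.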

(2) outFileNamePrefix
Prefix for the output file names (may include a path).

(3) outFilterMultimapNmax
Maximum number of multiple alignments allowed for a read; if exceeded, the read is considered unmapped.
(4) outSAMtype BAM SortedByCoordinate
Writes a coordinate-sorted BAM file, Aligned.sortedByCoord.out.bam, similar to the output of the samtools sort command. If this option causes problems, the manual recommends reducing --outBAMsortingThreadN from the default 6 to lower values (as low as 1).
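One detail worth seeing in code: STAR appends its fixed file names directly to the prefix, with no separator added. The sketch below only builds the expected path; the function name star_bam_path and the example prefixes are hypothetical.

```shell
# Hypothetical helper: given an --outFileNamePrefix value, print where the
# coordinate-sorted BAM is expected to land. The prefix is used verbatim.
star_bam_path() {
    printf '%s\n' "${1}Aligned.sortedByCoord.out.bam"
}

star_bam_path "bbam/sampleA"    # prefix without a separator
star_bam_path "bbam/sampleA_"   # prefix ending in "_" reads more cleanly
```

Ending the prefix with "_" or "." (or "/" for a directory that already exists) keeps the sample name visually separate from the fixed suffix.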
(5) outSAMattributes

  • NH: number of loci the read maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag.
  • HI: multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag.
  • NM: edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.
  • MD: string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.
  • XS: alignment strand according to --outSAMstrandField.
  • AS: local alignment score, +1/-1 for matches/mismatches, score* penalties for indels and gaps. For PE reads, total score for two mates. Standard SAM tag.
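As a rough illustration of the NH tag, the sketch below counts alignment records with NH equal to 1 versus greater than 1 in SAM text read from standard input. The function name count_nh is hypothetical. It counts records, not reads, so a read mapping to three loci contributes three records to the multi count.

```shell
# Hypothetical helper: tally SAM records by their NH:i: tag.
count_nh() {
    awk -F '\t' '
        /^@/ { next }                          # skip SAM header lines
        {
            for (i = 12; i <= NF; i++) {       # optional tags start at field 12
                if ($i ~ /^NH:i:/) {
                    n = substr($i, 6) + 0      # numeric value after "NH:i:"
                    if (n == 1) unique++; else multi++
                    break
                }
            }
        }
        END { printf "unique=%d multi=%d\n", unique, multi }
    '
}

# Possible use (assumes samtools is installed; not run here):
#   samtools view Aligned.sortedByCoord.out.bam | count_nh
```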

II. Index

STAR --runMode genomeGenerate --runThreadN 20 \
  --genomeDir /share2/pub/yangjy/yangjy/database/STAR_index69 \
  --outTmpDir /share2/pub/yangjy/yangjy/database/tmp \
  --genomeFastaFiles /share/pub/wangxy/software/genome/ucsc/hg38/hg38.fa \
  --sjdbGTFfile /share/pub/wangxy/Annotation/hg38/gencode.v34.annotation.gtf \
  --sjdbOverhang 69

error 1
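In my run, the newer STAR needed a tmp path, i.e. the extra parameter --outTmpDir, and that path must not already exist! The STAR_index69 directory above, on the other hand, must be created in advance!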
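The two directory requirements described here can be checked before launching STAR. This is only a sketch under that description; the function name prepare_star_dirs is hypothetical.

```shell
# Hypothetical pre-flight check:
#   $1 = genome index directory (created if missing)
#   $2 = temp directory for --outTmpDir (must NOT exist yet)
prepare_star_dirs() {
    genome_dir=$1
    tmp_dir=$2
    if [ -e "$tmp_dir" ]; then
        echo "tmp dir already exists, remove it or pick another: $tmp_dir" >&2
        return 1
    fi
    mkdir -p "$genome_dir"    # make sure the index directory is there
}

# Possible use before the genomeGenerate command (paths are placeholders):
#   prepare_star_dirs /path/to/STAR_index /path/to/tmp && STAR --runMode genomeGenerate ...
```

Failing early on a leftover tmp directory avoids finding out only after STAR has started.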
error 2
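If you write the script in the multi-line style I used above, make sure there is no space or any other character after each backslash! Otherwise the shell will not treat it as a line continuation. You can of course write everything on one line; I split it up because the parameters are easier to read that way, so follow your own habits.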
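A quick way to catch this mistake is to search the script for backslashes followed by trailing whitespace. The sketch below uses grep; the function name find_bad_continuations and the script name are hypothetical.

```shell
# Hypothetical helper: list lines where a backslash is followed by
# whitespace before the end of the line (a broken line continuation).
# Windows-style carriage returns after the backslash are caught too.
find_bad_continuations() {
    grep -nE '\\[[:space:]]+$' "$1"
}

# Possible use (not run here):
#   find_bad_continuations run_star.sh && echo "fix the lines listed above"
```

grep prints each offending line with its line number and returns a non-zero status when the script is clean.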

result

III. Mapping

for file in 'SRR' 'SRR' 'SRR' 'SRR' 'SRR' 'SRR' 'SRR' 'SRR'
do
  echo $file
  STAR \
    --runThreadN 40 \
    --genomeDir /share2/pub/yangjy/yangjy/database/STAR_index69 \
    --readFilesIn /share2/pub/yangjy/yangjy/rna-seq-data/GSE/fastq_data/$file.fastq \
    --outFileNamePrefix /share2/pub/yangjy/yangjy/rna-seq-data/GSE/bbam/$file \
    --outFilterMultimapNmax 500 \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMattributes NH HI NM MD XS AS
done

result
Having used the older versions, I can say the new version really is much, much faster.

THE END
