Code for processing VCF files with Python

Updated: 2025-09-08 03:38:31

Published December 7, 2023


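Problem background:

I have several hundred genomes of the same bacterial strain, but I only need the samples that differ substantially from one another for downstream analysis, so I want to discard the similar, redundant ones. How can this be done? Building a phylogenetic tree directly in MEGA is not feasible, because whole genomes are too large for it to analyze. Instead, I pick one sample as the reference genome, align the remaining samples to it, and call SNPs. This yields several hundred VCF files, each containing one sample's SNPs relative to the reference genome. A script then concatenates the SNPs of each sample into a single sequence and writes all sequences to one FASTA file, which MEGA can use to build the tree.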
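The idea described above can be tried on toy data before touching real VCF files. The following is a minimal sketch, not the author's script: the sample names, site keys, and the `build_fasta` helper are hypothetical, and each sample is assumed to be a dict that maps a `CHROM_POS` key to its ALT base.

```python
def build_fasta(ref_bases, samples):
    """Concatenate SNP bases per sample into equal-length FASTA records.

    ref_bases: dict mapping site key -> REF base (union over all samples)
    samples:   dict mapping sample name -> dict of site key -> ALT base
    """
    sites = sorted(ref_bases)  # one fixed site order shared by every record
    records = []
    for name, alt_bases in samples.items():
        # Use the ALT base where the sample has a SNP, otherwise the REF base
        seq = ''.join(alt_bases.get(site, ref_bases[site]) for site in sites)
        records.append('>%s\n%s\n' % (name, seq))
    return ''.join(records)


# Hypothetical toy data: three variable sites, two samples
ref_bases = {'chr1_10': 'A', 'chr1_25': 'G', 'chr1_40': 'T'}
samples = {
    'sampleA': {'chr1_10': 'C'},
    'sampleB': {'chr1_25': 'T', 'chr1_40': 'C'},
}
fasta_text = build_fasta(ref_bases, samples)
print(fasta_text)
```

Because every record is built over the same sorted list of sites, all sequences have the same length, which is what an alignment-based tool such as MEGA expects.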

The Python code is as follows:

# PyVCF is a Python package dedicated to parsing VCF files
import vcf
import os
import numpy
import collections

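# Directory containing the VCF files
filepath = r'C:'
filelist = os.listdir(filepath)
output_name = []
for ech in filelist:
    output_name.append(ech.replace('_', ''))
# Name used for the reference sequence, kept as the last entry
output_name += ['VibrioFF75']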

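dicREF_list = []
dicALT_list = []
for ech in filelist:
    ech_vcf = vcf.Reader(filename=os.path.join(filepath, ech))
    dicREF, dicALT = {}, {}
    for SNP in ech_vcf:
        # Keep SNP records only; indels and other variant types are skipped
        if SNP.is_snp:
            key = SNP.CHROM + '_' + str(SNP.POS)
            dicREF[key] = SNP.REF
            if len(SNP.ALT) > 1:
                # Multi-allelic site: keep only the first ALT allele
                dicALT[key] = SNP.ALT[0]
            else:
                dicALT[key] = SNP.ALT
    dicREF_list.append(dicREF)
    dicALT_list.append(dicALT)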

# Collect the SNP sites found across all VCF files
SNP_REF = {}
for i in dicREF_list:
    SNP_REF.update(i)
pos_list = list(SNP_REF.keys())
pos_list.sort()
print(len(pos_list))

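# Concatenate the SNPs of each VCF file: output ALT where the sample has a mutation at the site, otherwise REF, so that every SNP sequence has the same length
all_list = []
# enumerate avoids list.index(), which returns the first match when two samples have identical SNP sets
for ech_dic_index, ech_dic in enumerate(dicALT_list):
    ech_name = output_name[ech_dic_index]
    ech_list = []
    for pos in pos_list:
        if pos in ech_dic:
            theSNP = str(ech_dic[pos]).replace('[', '').replace(']', '')
            ech_list.append(theSNP)
        else:
            theSNP = str(SNP_REF[pos]).replace('[', '').replace(']', '')
            ech_list.append(theSNP)
    all_list.append(ech_list)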

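# If the concatenated SNP sequences are still too long, a second round of filtering is needed
snp_array = []
for pos in pos_list:
    theSNP = str(SNP_REF[pos]).replace('[', '').replace(']', '')
    snp_array.append(theSNP)
for i in all_list:
    snp_array = numpy.vstack((snp_array, i))
# Transpose so that each row is one SNP site and each column one sequence
snp_array = numpy.transpose(snp_array)
array_len = len(snp_array)
print(array_len)
snp_least = []
i = 0
# [...] The source text is cut off between "while i" and the FASTA header below:
# the body of the filtering loop and the per-sample records are missing.
allStr += '>' + output_name[-1] + '\n'
for pos in pos_list:
    theSNP = str(SNP_REF[pos]).replace('[', '').replace(']', '')
    allStr += theSNP
output = open(r'C:al_', 'w')
output.write(allStr)
output.close()

This code is for reference only; adapt it to your own problem.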
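The filtering criterion used in the second round cannot be recovered from the listing. As an illustration only, here is a minimal sketch of one plausible rule, assuming a site is worth keeping when it varies and its rarest base occurs in at least `min_count` sequences. The `filter_sites` helper, the `min_count` parameter, and the toy alignment are all hypothetical, and plain Python with `collections.Counter` is used instead of numpy to keep the example self-contained.

```python
from collections import Counter


def filter_sites(sequences, min_count=2):
    """Keep alignment columns whose rarest base occurs at least min_count times.

    sequences: list of equal-length strings, one per sample (plus reference)
    Returns the filtered sequences in the same order.
    """
    if not sequences:
        return []
    keep = []
    for column in zip(*sequences):  # iterate over sites (columns)
        counts = Counter(column)
        # Keep a site only if it varies and its rarest base is not too rare
        if len(counts) > 1 and min(counts.values()) >= min_count:
            keep.append(column)
    if not keep:
        return ['' for _ in sequences]
    # Transpose the kept columns back into one string per sequence
    return [''.join(bases) for bases in zip(*keep)]


# Hypothetical toy alignment: 4 sequences, 4 sites
sequences = ['ACGT', 'ACGA', 'TCGA', 'TCCT']
filtered = filter_sites(sequences, min_count=2)
print(filtered)
```

Raising `min_count` shortens the sequences further at the cost of discarding sites that distinguish only a few samples, so the threshold should be chosen with the downstream tree in mind.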

