python - Frequency of sequences in many files
Suppose I have 50 files (in the same folder), and each one contains the character ">" at the beginning of every entry I want to search. Examples:
    file1.txt
    >organism1
    >organism2
    >organism3
    >organism4
    >organism5

    file2.txt
    >organism3
    >organism4
    >organism5
    >organism6
My intention is to count how often each organism occurs in these files and generate a table. At the moment I count each one file by file to generate the table:
    Table 1. Frequency
    organism1    1
    organism2    1
    organism3    2
    organism4    2
    organism5    2
    organism6    1
So far, I can list the files in the folder, but I can't open them to do what I want.
    import sys
    from Bio import SeqIO
    import glob, os

    os.chdir(sys.argv[1])
    file_list = []
    for file in glob.glob("*.faa"):
        if file not in file_list:
            file_list.append(file)
    # until here, perfect
    for f in file_list:
        infile = open(f, 'r')
        fasta = SeqIO.parse(infile, 'fasta')
        seq = fasta.description  #.split("|")[2]
        print seq
The problem arises when I try to open the files in file_list.
    <generator object parse at 0x7f76867c7a00>
    <generator object parse at 0x7f76867c7a50>
    <generator object parse at 0x7f76867c7a00>
    <generator object parse at 0x7f76867c7a50>
You're using SeqIO.parse() as if it were SeqIO.read(), which won't work. SeqIO.parse() returns a generator, since it produces multiple results per file, so you have to iterate over it:
    import sys
    import glob
    import os
    from Bio import SeqIO

    os.chdir(sys.argv[1])
    file_list = []
    for file in glob.glob("*.faa"):
        if file not in file_list:
            file_list.append(file)

    for file_name in file_list:
        for fasta in SeqIO.parse(file_name, 'fasta'):
            description = fasta.description
            print(description)
This works in my environment: Python 3.6.0, Biopython 1.69. In my environment, your code generates the error:

    AttributeError: 'generator' object has no attribute 'description'

instead of producing the output you show. I see you're running Python 2.7, but which version of Biopython?
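To see why the attribute lookup fails, here is a minimal sketch that does not use Biopython at all. `Record` and `parse_headers` are hypothetical stand-ins I made up for illustration; they only mimic the "one record at a time" behaviour of a parser generator and are not part of any library.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed record; only the field we need.
Record = namedtuple("Record", "description")


def parse_headers(lines):
    """Lazily yield one Record per line that starts with '>'."""
    for line in lines:
        if line.startswith(">"):
            yield Record(line[1:].strip())


records = parse_headers([">organism1", ">organism2"])

# The generator itself is not a record, so it has no 'description'.
has_description = hasattr(records, "description")

# Iterating over the generator gives the individual records.
descriptions = [record.description for record in records]
```

The attribute lives on each yielded record, not on the generator, which is why the loop in the corrected code is needed.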
Also, why are you filtering duplicates out of the result of glob()?
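For the frequency table itself, something along these lines might work. It is only a sketch under a few assumptions: I swapped SeqIO.parse() for plain line reading so it runs without Biopython, I assume every entry name is a whole line starting with ">", and the names `count_organisms` and `print_table` are my own. With Biopython you would loop over `SeqIO.parse(path, 'fasta')` and count `record.description` instead.

```python
import glob
import os
from collections import Counter


def count_organisms(folder, pattern="*.faa"):
    """Count how often each '>' header name appears across all matching files."""
    counts = Counter()
    # glob() returns each matching path once, so no duplicate filter is needed.
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # Drop the leading '>' and the trailing newline.
                    counts[line[1:].strip()] += 1
    return counts


def print_table(counts):
    """Print one 'name<TAB>count' row per organism, sorted by name."""
    for name in sorted(counts):
        print(f"{name}\t{counts[name]}")
```

You would call it as `print_table(count_organisms(sys.argv[1]))` after importing sys. Counter keeps a single running total per name, so there is no need to build a table file by file.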