Biopython - read and write a fasta file
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
file_in ='gene_seq_in.fasta'
file_out='gene_seq_out.fasta'
with open(file_out, 'w') as f_out:
for seq_record in SeqIO.parse(open(file_in, mode='r'), 'fasta'):
# remove .id from .description record (remove all before first space)
seq_record.description=' '.join(seq_record.description.split()[1:])
# do something (print or edit seq_record)
print('SequenceID = ' + seq_record.id)
print('Description = ' + seq_record.description + '\n')
# write new fasta file
r=SeqIO.write(seq_record, f_out, 'fasta')
if r!=1: print('Error while writing sequence: ' + seq_record.id)
SeqIO.parse output (reading fasta file)
for any sequence, for example
>gi|16445223|ref|NC_002655.2|:1353261-1353530 stx2B Shiga toxin 2 subunit B
ATGAAGAAGATGTTTATGGCGGTTTTATTTGCATTAGCTTCTGTTAATGCAATGGCGGCG...
# sequence-ID contains all letters before first space
seq_record.id
'gi|16445223|ref|NC_002655.2|:1353261-1353530'
# description is the complete line after '>' (including the sequence-ID)
seq_record.description
'gi|16445223|ref|NC_002655.2|:1353261-1353530 stx2B Shiga toxin 2 subunit B'
# sequence
seq_record.seq
Seq('ATGAAGAAGATGTTTATGGCGGTTTTATTTGCATTAGCTTCTGTTAATGCAATG...TGA', SingleLetterAlphabet())
SeqIO.write
SeqIO.write concatenates the content of id and description:
'>' + seq_record.id + ' ' + seq_record.description
Attention! To avoid that the sequence-ID appears twice (as ID and in the description)
the ID needs to be removed from description record before writing
seq_record.description=' '.join(seq_record.description.split()[1:])
or alternatively use .replace to remove the sequence-ID from description
seq_record.description=seq_record.description.replace(seq_record.id+' ','',1)