Biopython pairwise alignment results in segmentation fault when run in loop

I am trying to run the pairwise global alignment method in Biopython in a loop for about 10,000 pairs of strings. Each string is about 20 characters long on average. Running the method for a single pair of sequences works fine, but running it in a loop, for as few as 4 pairs, results in a segmentation fault. How can this be solved?
from Bio import pairwise2

def myTrial(source, targ):
    if source == targ:
        return [source, targ, source]
    alignments = pairwise2.align.globalmx(source, targ, 1, -0.5)
    return alignments

sour = ['najprzytulniejszy', 'sadystyczny', 'wyrzucić', 'świat']
targ = ['najprzytulniejszym', 'sadystycznemu', 'wyrzucisz', 'świat']
for i in range(4):
    a = myTrial(sour[i], targ[i])
1 Answer
The segmentation fault isn't caused by the loop. It happens because you are passing non-ASCII characters to an alignment mode that accepts ASCII string inputs only. Luckily, Bio.pairwise2.align.globalmx also permits aligning lists that contain arbitrary strings of ASCII and non-ASCII characters as tokens, i.e. aligning lists of strings such as ['ABC', 'ABD'] with ['ABC', 'GGG'] to produce alignments like
['ABC', 'ABD', '-' ]
['ABC', '-' , 'GGG']
or, in your case, aligning lists of single characters, some of them non-ASCII, such as ['ś', 'w', 'i', 'a', 't'] and ['w', 'y', 'r', 'z', 'u', 'c', 'i', 's', 'z'] to produce alignments like
['ś', 'w', '-', '-', '-', '-', '-', 'i', 'a', 't', '-', '-']
['-', 'w', 'y', 'r', 'z', 'u', 'c', 'i', '-', '-', 's', 'z']
To accomplish this with Biopython, in your code, replace
alignments = pairwise2.align.globalmx(source, targ,1,-0.5)
with
alignments = pairwise2.align.globalmx(list(source), list(targ), 1, -0.5, gap_char=['-'])
So for an input of
source = 'świat'
targ = 'wyrzucisz'
the modified code will produce
[(['ś', 'w', '-', '-', '-', '-', '-', 'i', 'a', 't', '-', '-'],
['-', 'w', 'y', 'r', 'z', 'u', 'c', 'i', '-', '-', 's', 'z'],
2.0,
0,
12)]
instead of a segmentation fault.
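If it helps to confirm which pairs are affected before changing the call, a quick check along these lines can flag the inputs that contain non-ASCII characters. This is only a sketch: has_non_ascii and to_tokens are hypothetical helper names (not part of Biopython), and str.isascii() assumes Python 3.7 or later.

```python
def has_non_ascii(text):
    """Return True if text contains at least one non-ASCII character."""
    return not text.isascii()


def to_tokens(text):
    """Split a string into a list of one-character tokens."""
    return list(text)


# Input lists taken from the question.
sour = ['najprzytulniejszy', 'sadystyczny', 'wyrzucić', 'świat']
targ = ['najprzytulniejszym', 'sadystycznemu', 'wyrzucisz', 'świat']

# Collect the pairs in which either string has a non-ASCII character.
flagged = [(s, t) for s, t in zip(sour, targ)
           if has_non_ascii(s) or has_non_ascii(t)]

print(flagged)
# [('wyrzucić', 'wyrzucisz'), ('świat', 'świat')]
```

Converting every input with list(...) regardless of its content is simpler than branching on this check; the check is mainly useful for confirming the diagnosis.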
And since each token in the list is only one character long, you can also convert the resulting aligned lists back into strings using:
new_alignment = []
for aln in alignments:
    # Convert lists back into strings
    a = ''.join(aln[0])
    b = ''.join(aln[1])
    new_aln = (a, b) + aln[2:]
    new_alignment.append(new_aln)
In the above example, new_alignment would then be
[('św-----iat--', '-wyrzuci--sz', 2.0, 0, 12)]
as desired.
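If you need this conversion in several places, it can be wrapped in a small function. The following is a minimal sketch, assuming each alignment is a tuple whose first two items are lists of single-character tokens, as in the output above. The name join_alignment_tokens is hypothetical and not a Biopython function, and the sample is that same output pasted in as a literal, so the snippet runs without calling the aligner.

```python
def join_alignment_tokens(alignments):
    """Join the token lists of each alignment back into strings.

    Each alignment is expected to be a tuple whose first two items are
    lists of tokens; the remaining items (score, begin, end) are kept.
    """
    joined = []
    for aln in alignments:
        seq_a = ''.join(aln[0])
        seq_b = ''.join(aln[1])
        joined.append((seq_a, seq_b) + tuple(aln[2:]))
    return joined


# Literal copy of the alignment output for 'świat' vs 'wyrzucisz'.
sample = [(['ś', 'w', '-', '-', '-', '-', '-', 'i', 'a', 't', '-', '-'],
           ['-', 'w', 'y', 'r', 'z', 'u', 'c', 'i', '-', '-', 's', 'z'],
           2.0,
           0,
           12)]

print(join_alignment_tokens(sample))
# [('św-----iat--', '-wyrzuci--sz', 2.0, 0, 12)]
```

Joining only round-trips cleanly because every token and the gap marker are one character long; with multi-character tokens such as 'ABC' the columns would no longer line up.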