Biopython pairwise alignment results in segmentation fault when run in loop

I am trying to run the pairwise global alignment method in Biopython in a loop over about 10,000 pairs of strings. Each string is about 20 characters long on average. Running the method for a single pair of sequences works fine, but running it in a loop, for as few as 4 pairs, results in a segmentation fault. How can this be solved?

biopython

from Bio import pairwise2

def myTrial(source, targ):
    if source == targ:
        return [source, targ, source]
    alignments = pairwise2.align.globalmx(source, targ, 1, -0.5)
    return alignments

sour = ['najprzytulniejszy', 'sadystyczny', 'wyrzucić', 'świat']
targ = ['najprzytulniejszym', 'sadystycznemu', 'wyrzucisz', 'świat']
for i in range(4):
    a = myTrial(sour[i], targ[i])

1 Answer
The segmentation fault isn't happening because you are using a loop, but because you are passing non-ASCII characters to an alignment mode that accepts ASCII strings only. Luckily, Bio.pairwise2.align.globalmx also accepts lists whose tokens are arbitrary strings of ASCII and non-ASCII characters, i.e. it can align lists of strings such as ['ABC', 'ABD'] and ['ABC', 'GGG'] to produce alignments like

['ABC', 'ABD', '-'  ]
['ABC', '-'  , 'GGG']

or in your case, aligning lists of non-ASCII characters such as ['ś', 'w', 'i', 'a', 't'] and ['w', 'y', 'r', 'z', 'u', 'c', 'i', 's', 'z'] to produce alignments like

['ś', 'w', '-', '-', '-', '-', '-', 'i', 'a', 't', '-', '-']
['-', 'w', 'y', 'r', 'z', 'u', 'c', 'i', '-', '-', 's', 'z']

To accomplish this with Biopython, in your code, replace

alignments = pairwise2.align.globalmx(source, targ,1,-0.5)

with

alignments = pairwise2.align.globalmx(list(source), list(targ), 1, -0.5, gap_char=['-'])

So for an input of

source = 'świat'
targ = 'wyrzucisz'

the modified code will produce

[(['ś', 'w', '-', '-', '-', '-', '-', 'i', 'a', 't', '-', '-'],
  ['-', 'w', 'y', 'r', 'z', 'u', 'c', 'i', '-', '-', 's', 'z'],
  2.0, 0, 12)]

instead of a segmentation fault.
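To see which pairs from the question contain non-ASCII characters, a quick check along these lines may help. This is only a sketch: `has_non_ascii` is a hypothetical helper name, and `str.isascii()` assumes Python 3.7 or later.

```python
# Inputs copied from the question.
sour = ['najprzytulniejszy', 'sadystyczny', 'wyrzucić', 'świat']
targ = ['najprzytulniejszym', 'sadystycznemu', 'wyrzucisz', 'świat']


def has_non_ascii(text):
    """Return True if text contains at least one non-ASCII character.

    Hypothetical helper; relies on str.isascii() (Python 3.7+).
    """
    return not text.isascii()


# Indices of pairs where either string contains a non-ASCII character.
affected = [
    i for i, (s, t) in enumerate(zip(sour, targ))
    if has_non_ascii(s) or has_non_ascii(t)
]
print(affected)  # [2, 3]
```

The first two pairs are pure ASCII, which would explain why a single test pair can work while the loop fails later. The pair at index 3 is identical on both sides, so the question's `myTrial` returns before calling the aligner; the pair at index 2 ('wyrzucić' / 'wyrzucisz') is the one that reaches `globalmx` with a non-ASCII character.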

And since each token in the list is only one character long, you can also convert the resulting aligned lists back into strings using:

new_alignment = []
for aln in alignments:
    # Convert lists back into strings
    a = ''.join(aln[0])
    b = ''.join(aln[1])
    new_aln = (a, b) + aln[2:]
    new_alignment.append(new_aln)

In the above example, new_alignment would then be

[('św-----iat--', '-wyrzuci--sz', 2.0, 0, 12)]

as desired.
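The two steps (split into single-character tokens, then join the result back into strings) can be combined in one small wrapper. The sketch below is an illustration rather than part of Biopython: the name `align_as_strings` and its `align_fn` parameter are hypothetical, and `align_fn` is assumed to behave like `pairwise2.align.globalmx`, i.e. to accept two lists, the scoring arguments, and a `gap_char` keyword, and to return tuples whose first two items are the aligned lists.

```python
def align_as_strings(source, targ, align_fn, *score_args):
    """Align two strings character by character and return string alignments.

    Hypothetical wrapper: align_fn is assumed to be a pairwise2-style
    function (for example pairwise2.align.globalmx) that accepts lists
    of tokens and a gap_char keyword.
    """
    # Passing lists of one-character tokens instead of raw strings,
    # with a list-valued gap character to match.
    alignments = align_fn(list(source), list(targ), *score_args, gap_char=['-'])

    results = []
    for aln in alignments:
        # Each token is one character, so joining restores plain strings.
        a = ''.join(aln[0])
        b = ''.join(aln[1])
        # Keep the remaining fields (score, begin, end) unchanged.
        results.append((a, b) + tuple(aln[2:]))
    return results


# Intended usage (requires Biopython, so it is left as a comment here):
#   from Bio import pairwise2
#   align_as_strings('świat', 'wyrzucisz', pairwise2.align.globalmx, 1, -0.5)
```

Passing the aligner in as an argument keeps the wrapper independent of which `pairwise2` function is used, so the same code should work with other scoring variants that accept the same list inputs.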
