“Too many to count.” FL4K, the Beastmaster.

As you may already know, millions of puzzle solutions have been submitted by the community of untiring BL3 players. If you’ve ever wondered what happens to these solutions after they are gathered, this post may give you some of the answers you seek.

The realignment pipeline is the central step of the Borderlands Science project. The goal of this step is to create a complete alignment of the biological sequences. Why “re”alignment, you may ask? Puzzles are created from a base alignment, which we refer to as the PASTA alignment (we use the PASTA software), by slicing out small regions. Players then modify these regions in the game, and finally we receive the solutions. So we are not actually building an alignment from scratch; we are using what you submitted through the game to improve the existing one.

Before solutions can be processed in the realignment pipeline, they are filtered, and then their nucleotides (a.k.a. blocks) are mapped to their corresponding columns in the initial PASTA alignment. The realignment pipeline takes this mapping, which we call the consensus, and starts its work.

The realignment pipeline consists of 3 major steps:

1. The first step is to align each sequence’s PASTA version with its consensus. The go-to approach for aligning two sequences is dynamic programming, which in simple terms means breaking the problem into small sub-problems and saving their results so that they never have to be computed again. Because we are aligning a sequence to a consensus rather than to another sequence, we had to make some changes to this algorithm, but the core concept is the same. In short, the consensus (that is, the players) tells us where to add more gaps. The output of this step is a set of gapped sequences of varying lengths.
2. Because the altered sequences have varying lengths, we still do not have a Multiple Sequence Alignment (MSA). To address this, we progressively align the sequences into groups called profiles, until all sequences belong to one big profile. We start with the sequences that are closest to each other, and at each later step we merge the sequences or profiles that are closest to each other. To determine how close two sequences are, we count the number of edits required to turn one sequence into the other. For profiles, the measure is more complicated. At the end of this step, we have realigned our sequences into a valid MSA.
3. In the early version of the pipeline, we were done after the previous step. But we realized that the resulting MSA had some unwanted artifacts: mistakes that were visible when scanning the alignment by eye. We therefore added a post-processing (PP) step that takes the alignment and tries to homogenize it. You can see the effect of the PP in the picture below:

Before PP

After PP
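
To make the "number of edits" idea from the second step more concrete, here is a minimal Python sketch. It is not the pipeline's actual code: it assumes the classic Levenshtein edit distance (insertions, deletions and substitutions each cost 1), and the function names `edit_distance` and `closest_pair`, as well as the toy sequences, are made up for illustration. It also shows dynamic programming in miniature: each cell of the table is computed once from already-saved neighbours.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))  # cost of turning "" into each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # cost of turning a[:i] into ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # keep a match or substitute
            ))
        prev = curr
    return prev[-1]


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """Return the names of the two sequences with the smallest edit distance."""
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: edit_distance(seqs[pair[0]], seqs[pair[1]]),
    )


# Hypothetical toy input: the first two sequences differ by a single edit.
toy = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "TTTTTT"}
first_to_merge = closest_pair(toy)
```

In a progressive scheme like the one described above, the pair returned by `closest_pair` would be the first to be merged into a profile. Comparing profiles with each other needs a different measure, which this sketch does not attempt.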

All of the algorithms we have used are as simple as possible, in order to emphasize the role of the player-submitted solutions in the quality of the final alignment. This means there is a lot of room to grow. Even so, constructing the realignment pipeline has not been an easy task! The final working version of the software is the result of roughly 1.5 to 2 years of work. More than 20 different ideas and versions were dismissed because of either quality or efficiency, and each idea went through massive testing before it was accepted. All said, we are proud of what we have achieved today, and we have you to thank for that.

The many BLS players. Too many to count.

Parham