The goal is to align the DNA sequences of gut bacteria. DNA sequences contain important (but hidden) information about the function and ancestry of these bacteria. In the game, each column is a fragment of a sequence. By aligning these columns, it is possible to figure out which bacteria are related to which, and thus learn more about their history and function. For example, if we know what bacteria A does and we learn through alignment that bacteria B is its close cousin, we learn a lot about bacteria B. This initiative is carried out in close consultation with the Microsetta Initiative at UCSD.
There are two types of bricks in the game. The bricks that you can move, which are part of the puzzles, represent nucleotides. Nucleotides are to DNA sequences what letters are to a sentence. Each color represents one of A, C, G and T. The bricks you cannot move, on the left side of the puzzle, are there to guide the alignment. They represent the consensus of one million other bacteria, as aligned by a computer. Thus, solving a puzzle equates to trying to align a few sequences to a million others, and if you repeat this many times, you get millions of sequences aligned!
The high score is achieved by other players! Each puzzle is given to many players, and the high score you see is the highest score achieved on this puzzle by other human players. The target score is the basic objective to reach in order to help science by solving the puzzle. Currently, the target scores you see, which can sometimes seem quite low, are achieved by a computer.
The task of aligning sequences consists of trying to match the nucleotides of two sequences (at a time). For each pair of nucleotides (one from each sequence), the computer must decide whether to consider the two nucleotides “aligned” (and assign a score based on how well they match) or to add a gap. For example, the following is an alignment of the sequences AACAG and ATAG:
A A C A G
A T – A G
The objective is to maximize the number of correctly aligned nucleotides (matches, implemented in the game as the score), while minimizing the number of gaps (implemented in the game as the yellow tokens). Computers are not very gifted at this task because, in order to correctly evaluate the trade-off between score and gaps, one must consider the context of the other nucleotides and the other sequences. As of today, the most reliable alignments are obtained from humans doing it manually. This is why we are asking you to do it!
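To make the trade-off concrete, here is a minimal sketch that counts matches and gaps for the example alignment above. The function names, the use of "-" as the gap character, and the simple "matches minus penalty times gaps" formula are assumptions made for illustration; they are not the scoring used in the game.

```python
GAP = "-"  # assumed gap character for this sketch


def count_alignment(row_a, row_b):
    """Return (matches, mismatches, gaps) for two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_a, row_b):
        if a == GAP or b == GAP:
            gaps += 1          # one of the two rows has a gap in this column
        elif a == b:
            matches += 1       # same nucleotide in both rows
        else:
            mismatches += 1    # aligned, but different nucleotides
    return matches, mismatches, gaps


def combined_score(row_a, row_b, gap_penalty=1):
    """Illustrative trade-off: reward matches, charge a penalty per gap."""
    matches, _, gaps = count_alignment(row_a, row_b)
    return matches - gap_penalty * gaps


counts = count_alignment("AACAG", "AT-AG")
print(counts)                                           # (3, 1, 1)
print(combined_score("AACAG", "AT-AG"))                 # 2
print(combined_score("AACAG", "AT-AG", gap_penalty=4))  # -1
```

The same alignment looks good or bad depending on the gap penalty chosen, which is exactly the part that is hard to fix in advance for a computer.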
The issue with these small alignments is not that they are difficult in essence, but rather that we do not know how to evaluate them. If we do not know how to evaluate them, we cannot train an AI to solve them for us, because we do not have a “ground truth” to give it. The score you see is there to motivate you to solve the puzzles, but we are in fact more interested in the solutions the players provide than in the scores they achieve.
Our goal is to understand how humans intuitively solve simple alignment problems and to collect a lot of examples of human-solved alignments. The games don’t need to be difficult, because we are looking for input from humans about how to solve simple problems, which we can then generalize to the large-scale problem.
No! What makes alignments difficult to evaluate in the first place is that there are two main, clashing objectives: increasing the number of correctly aligned nucleotides, and decreasing the number of gaps. This means there is a trade-off between maximizing the score and minimizing the number of gaps. It is difficult to teach a computer how to evaluate this trade-off, but humans understand it very intuitively (based on what looks right). Moreover, computers tend to compare one pair of rows at a time, whereas humans can see the whole picture, which is key.
In other words, an untrained human is typically better than a computer at solving this problem, which is why we need your help. This is where the key advantage of citizen science comes in: different players play differently. Some players will play the game until they reach the target score and then move on immediately, often not using the full number of yellow tokens available. These players are minimizing the number of actions, without caring about the score. Other players will fight tooth and nail for the high score, most of the time using all their gaps. These players are maximizing the score without caring about the number of tokens used. And many players will stand somewhere in between.
All three types of players hold a key importance in our strategy because they will show us different ways of solving the problem.
By having access to many valid solutions from different people who think differently, across many different local contexts of the big alignment, we can look at everyone’s answers and see which solutions are more popular in different contexts.
This will allow us to train an Artificial Intelligence to align sequences based on what seems right to most humans. This sounds simple, but it requires a lot of data, which is why it has never been done until now.
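The idea of finding the most popular solution in each context can be sketched as a simple vote count. This is only an illustration under assumed names and data: the puzzle identifiers, the submission format, and the function `most_popular_solutions` are hypothetical, and the actual analysis pipeline is not described in this FAQ.

```python
from collections import Counter, defaultdict


def most_popular_solutions(submissions):
    """Map each puzzle id to (most common solution, share of players who chose it).

    `submissions` is an iterable of (puzzle_id, solution) pairs, where a
    solution is a list of aligned rows. Ties go to the solution seen first.
    """
    by_puzzle = defaultdict(Counter)
    for puzzle_id, solution in submissions:
        # Lists are not hashable, so store each solution as a tuple of rows.
        by_puzzle[puzzle_id][tuple(solution)] += 1

    result = {}
    for puzzle_id, counts in by_puzzle.items():
        solution, votes = counts.most_common(1)[0]
        result[puzzle_id] = (solution, votes / sum(counts.values()))
    return result


# Made-up submissions from four hypothetical players.
submissions = [
    ("puzzle-1", ["AT-AG"]),
    ("puzzle-1", ["AT-AG"]),
    ("puzzle-1", ["A-TAG"]),
    ("puzzle-2", ["-CGT"]),
]
popular = most_popular_solutions(submissions)
for puzzle_id, (solution, share) in popular.items():
    print(puzzle_id, solution, round(share, 2))
```

A real analysis would likely weigh more than raw counts, but a tally like this is the simplest way to see which solutions players converge on in a given context.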
Well, nothing is ever certain in citizen science, but we are quite confident, and we will be sharing results soon!
The assumptions we have described in this FAQ are not entirely novel. They have been tested in the past in other citizen science games, notably Phylo (game website) (academic paper), which was also produced by our group.
You can ask us anything on our subreddit!