It has been two months since we launched the Borderlands Science project. Your participation has been amazing: hundreds of thousands of Borderlands fans joined the project and generated tens of millions of puzzle solutions that we are now analyzing. This is a lot of data, and far more than we anticipated this early in the process.
For context, in 2010 we launched an experimental citizen science game called Phylo (http://phylo.cs.mcgill.ca) that tackles a task similar to that of Borderlands Science (i.e., the alignment of multiple DNA sequences) but for a different purpose (i.e., the analysis of human genes). The origin of the data is obviously different, but the principles of the two games are close enough to allow a comparison. Borderlands Science produced five times more data in its first 12 hours than Phylo did during its 10 years of operation! We are now two months past the launch, so you can imagine the size of the data set we are analyzing.
One of the important lessons we learned from Phylo is that we need between 15 and 20 solutions per puzzle to have a good chance of improving the DNA mapping. We are getting there. The histograms below show the level of completion of each batch of puzzles, sorted by difficulty level. The blue stacks indicate the number of puzzles already completed (i.e., puzzles for which we have received enough solutions), while the red parts show what remains to be done. As you can see, we have collected almost enough solutions to move to the next phase of the project.
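For the curious, here is a minimal Python sketch of how a completion tally like the one in the histograms could be computed. It is only an illustration: the threshold of 15 is the lower end of the range quoted above, and the record layout (puzzle ids mapped to difficulty levels) and the function name `completion_by_level` are assumptions made for this example, not our actual pipeline.

```python
from collections import Counter, defaultdict

# Lower end of the 15-20 range quoted above; the real cutoff may differ.
MIN_SOLUTIONS = 15


def completion_by_level(puzzle_levels, solution_puzzle_ids,
                        min_solutions=MIN_SOLUTIONS):
    """Return {level: (completed, remaining)} puzzle counts.

    puzzle_levels: dict mapping a (hypothetical) puzzle id to its difficulty level.
    solution_puzzle_ids: iterable with one puzzle id per submitted solution.
    """
    # Number of solutions received for each puzzle.
    counts = Counter(solution_puzzle_ids)

    # For each level, [completed, remaining].
    tally = defaultdict(lambda: [0, 0])
    for puzzle_id, level in puzzle_levels.items():
        if counts[puzzle_id] >= min_solutions:
            tally[level][0] += 1  # enough solutions: the "blue" part
        else:
            tally[level][1] += 1  # still needs solutions: the "red" part
    return {level: tuple(pair) for level, pair in tally.items()}
```

A puzzle with no solutions at all simply counts as remaining, since a `Counter` returns 0 for ids it has never seen.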
Still, we have already had a look at the solutions you returned, and what we see is that they significantly improve the base score previously set by our computer algorithms. Although this does not mean that we have a better mapping of the complete data sets yet, it does suggest that we are on the right path and that your solutions are valuable. The diagrams below show the average improvement per puzzle at levels 3 and 7. The values on the x-axis indicate the improvement of the score over the baseline set by the computer for these puzzles. The red lines show the median. More than half of the players managed to achieve a substantial improvement! What could be even more interesting is that the best solutions increase the base score by nearly 50 (at level 3) or 80 (at level 7). This is huge! It also suggests that top players found ways to improve the mapping that could not be found with basic approaches. We are taking a close look at these submissions.
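To make the quantities in these diagrams concrete, here is a small Python sketch of how the improvement over the baseline, its median, and the best improvement could be computed for a single puzzle. The function names and the idea of passing plain lists of scores are assumptions made for illustration; this is not the code we use for our analysis.

```python
from statistics import median


def improvements(baseline_score, player_scores):
    """Score gained by each player solution over the computer baseline."""
    return [score - baseline_score for score in player_scores]


def summarize(baseline_score, player_scores):
    """Summarize the improvements for one puzzle (hypothetical helper)."""
    deltas = improvements(baseline_score, player_scores)
    return {
        # Half of the solutions improve the baseline by at least this much.
        "median": median(deltas),
        # Largest improvement achieved by any single solution.
        "best": max(deltas),
        # Fraction of solutions that beat the baseline at all.
        "share_improved": sum(d > 0 for d in deltas) / len(deltas),
    }
```

A negative value in `improvements` means the solution scored below the computer baseline, which is why the median and the share of improved solutions are reported separately.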
Now, will the project be complete once all these puzzles are solved? Absolutely not! But this data gives us a solid base to produce preliminary results and advance to our next steps. What are they?
- We will first quantify the magnitude of the improvement made by individual players on single puzzles and provide a better picture of what humans are good at.
- Next, we will assemble the individual solutions to build a complete mapping of the full microbial DNA data set. Then, we will compare this mapping to the solutions returned by computer programs.
- Finally, based on the solutions already collected, we will build a brand-new set of puzzles. You can expect more challenging puzzles with an even greater potential to improve the analysis of microbiome samples.
Stay tuned! You will hear more about this in our upcoming posts. Meanwhile, continue to play Borderlands Science. We need your help more than ever and promise to make your contribution fun!