Phase 1 - Borderlands Science Documentation
Processing output files
File containing general information about the puzzles, their location within the initial alignment, and their solutions. It also contains part of the "pikdik" information for quick lookup.
Files
Prerequisites
- All the pickle files described in this section depend on dnapuzzle.py.
- Make sure you have "dnapuzzle.py" in the working directory.
- The pandas library is used in the Code section; however, it is not required to open the file.
Code
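To open the pickle file (replace filepath with the appropriate string):
#python
import pandas as pd
from dnapuzzle import Puzzle  # Puzzle class must be importable to unpickle "pikdik" objects
filepath = "extract_output_1_2021-06-19.pickle"
obj = pd.read_pickle(filepath)  # dict: stringified column list -> list of puzzle dicts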
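Since pandas is optional, the same file can also be opened with the standard-library pickle module. This is a minimal sketch; the helper name load_output is hypothetical. When unpickling reaches a Puzzle object, pickle imports the dnapuzzle module on demand, so dnapuzzle.py still has to be importable (e.g. in the working directory).

```python
import pickle
from typing import Any


def load_output(filepath: str) -> Any:
    """Load a processing-output pickle without pandas (hypothetical helper).

    pickle resolves dnapuzzle.Puzzle by importing ``dnapuzzle`` when it is
    first needed, so dnapuzzle.py must be on the import path.
    """
    with open(filepath, "rb") as fh:
        return pickle.load(fh)


# Usage (replace filepath with the appropriate string):
# obj = load_output("extract_output_1_2021-06-19.pickle")
```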
Format
The pickle file encodes a dictionary where:
ATTRIBUTE | TYPE | DESCRIPTION |
keys | [str] | Stringified list of column indexes in the original alignment (e.g. '[99, 100, 101, 102, 103, 104, 105]') |
values | [list] | List of dictionaries, each describing a distinct puzzle (see below) |
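Because the keys are stringified lists, they have to be parsed before the column indexes can be used as numbers. A minimal sketch using ast.literal_eval, which evaluates only Python literals; the helper name parse_columns and the toy dictionary sample_obj are illustrative assumptions, not part of the output format.

```python
import ast
from typing import List


def parse_columns(key: str) -> List[int]:
    """Convert a stringified key such as '[99, 100, 101]' into a list of ints."""
    # literal_eval accepts only literals, unlike eval(), so it cannot run code
    return [int(c) for c in ast.literal_eval(key)]


# Toy stand-in for the loaded dictionary (illustrative data only)
sample_obj = {"[99, 100, 101, 102, 103, 104, 105]": [{"originalCode": "VeoWZMUzLmzgbU3e"}]}

for key, puzzles in sample_obj.items():
    cols = parse_columns(key)
    print(f"columns {cols[0]}-{cols[-1]}: {len(puzzles)} puzzle(s)")  # columns 99-105: 1 puzzle(s)
```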
Puzzle description dictionary
Each element of a values list is a dictionary with the following attributes:
ATTRIBUTE | TYPE | DESCRIPTION |
originalCode | [str] | Puzzle ID in the BLS system (e.g. "VeoWZMUzLmzgbU3e") |
pikdik | [dnapuzzle.Puzzle] | Puzzle object (see below) |
nGaps | [int] | Number of gaps allowed in the puzzle (e.g. 6) |
score | [float] | Expected score for the puzzle (e.g. 17.0) |
pareto | [str] | Type of strategy used (e.g. "FalseSubopt") |
playerSolutions | [list] | List of players' solutions (e.g. [['TTG-A', 'GT--AA', 'GCGA', '-TGCA', 'CT', 'CT-GA']]) |
playerIDs | [list] | List of IDs of the players who submitted the solutions in playerSolutions (e.g. ['1269979']) |
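playerSolutions and playerIDs appear to be parallel lists, so each solution can be matched to the player who submitted it. A minimal sketch under that assumption; it also counts the gap symbols ('-') in each solution, assuming (not confirmed by this documentation) that every '-' counts toward nGaps. The helper name solutions_with_gap_counts and the toy record are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def solutions_with_gap_counts(puzzle: Dict[str, Any]) -> List[Tuple[str, int, bool]]:
    """Return (playerID, gaps used, within nGaps?) for each player solution.

    Assumes playerIDs[i] submitted playerSolutions[i] and that every '-'
    in a solution counts toward the nGaps limit.
    """
    results = []
    for player_id, solution in zip(puzzle["playerIDs"], puzzle["playerSolutions"]):
        gaps = sum(row.count("-") for row in solution)
        results.append((player_id, gaps, gaps <= puzzle["nGaps"]))
    return results


# Toy record built from the example values in the table above
puzzle = {
    "originalCode": "VeoWZMUzLmzgbU3e",
    "nGaps": 6,
    "score": 17.0,
    "playerSolutions": [["TTG-A", "GT--AA", "GCGA", "-TGCA", "CT", "CT-GA"]],
    "playerIDs": ["1269979"],
}
print(solutions_with_gap_counts(puzzle))  # [('1269979', 5, True)]
```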
pikdik: Puzzle Object
ATTRIBUTE | TYPE | DESCRIPTION |
pikdik.puzzle | [list] | Collapsed, symbol-wise split puzzle sequences (e.g. [['T', 'T', 'G', 'A', '-', '-', '-'], ['G', 'T', 'A', 'A', '-', '-', '-'], ['G', 'C', 'G', 'A', '-', '-', '-'], ['T', 'G', 'C', 'A', '-', '-', '-'], ['C', 'T', '-', '-', '-', '-', '-'], ['C', 'T', 'G', 'A', '-', '-', '-']]) |
pikdik.par_puzzle | [list] | Symbol-wise split Pareto solution of the puzzle (e.g. [['T', 'T', '-', 'G', 'A', '-', '-'], ['G', 'T', '-', 'A', 'A', '-', '-'], ['G', 'C', '-', 'G', 'A', '-', '-'], ['T', 'G', '-', 'C', 'A', '-', '-'], ['C', 'T', '-', '-', '-', '-', '-'], ['C', 'T', '-', 'G', 'A', '-', '-']]) |
pikdik.columns | [str(list)] | Column indexes in the original alignment (e.g. '[99, 100, 101, 102, 103, 104, 105]') |
pikdik.flanks | [list] | Column indexes used as flanks for the puzzle (e.g. [105, 106]) |
pikdik.consensus | [list] | Predefined guides for the puzzle, used in scoring (e.g. [('C', 'T'), ('C', 'T'), ('-', 'C'), ('C', 'T'), ('-', 'A'), ('G', 'A'), ('-', 'C'), ('-', 'G')]) |
pikdik.cons_scores | [list] | Number of times each nucleotide of the consensus appears in the alignment (e.g. [(6287, 2552), (3416, 2688), (9666, 1), (2938, 2669), (9545, 55)]) |
pikdik.n | [int] | Number of sequences (e.g. 6) |
pikdik.bonus | [float] | Multiplier applied when an aligned "line" perfectly matches the given consensus/guide (e.g. 1.15) |
pikdik.level | [int] | Difficulty level of the puzzle, in the range [1, 9] (e.g. 2) |
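pikdik.puzzle and pikdik.par_puzzle store each sequence as a list of single symbols padded with '-' to a common width, whereas playerSolutions stores plain strings. A minimal sketch that pads a player solution into the same symbol-wise layout and checks that it still holds the puzzle's nucleotides in order. It assumes the target width is the length of the puzzle rows and that trailing '-' are padding; the function names are hypothetical, and plain lists stand in for the actual Puzzle attributes.

```python
from typing import List

Grid = List[List[str]]


def to_symbol_grid(solution: List[str], width: int) -> Grid:
    """Split each solution string into symbols, right-padding with '-' to width.

    Rows longer than width are left as they are (str.ljust never truncates).
    """
    return [list(row.ljust(width, "-")) for row in solution]


def ungapped(grid: Grid) -> List[str]:
    """Join each row back into a string with every '-' removed."""
    return ["".join(s for s in row if s != "-") for row in grid]


def same_sequences(grid_a: Grid, grid_b: Grid) -> bool:
    """True if both grids hold the same nucleotides, row by row, ignoring gaps."""
    return ungapped(grid_a) == ungapped(grid_b)


# Example values from the table above (plain lists stand in for pikdik.puzzle)
puzzle_rows = [list("TTGA---"), list("GTAA---"), list("GCGA---"),
               list("TGCA---"), list("CT-----"), list("CTGA---")]
player = ["TTG-A", "GT--AA", "GCGA", "-TGCA", "CT", "CT-GA"]

grid = to_symbol_grid(player, width=len(puzzle_rows[0]))
print(grid[0])                            # ['T', 'T', 'G', '-', 'A', '-', '-']
print(same_sequences(grid, puzzle_rows))  # True
```

The same check works for pikdik.par_puzzle, since the Pareto solution also only inserts gaps into the collapsed sequences.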