Output Data and Processing Scripts

Phase 1 - Borderlands Science Documentation

Processing output files

This file contains general information about the puzzles, their locations within the initial alignment, and their solutions. It also includes part of the “pikdik” information for quick lookup.

Files

Prerequisites

  • All the pickle files described in this section depend on dnapuzzle.py.
  • Make sure “dnapuzzle.py” is in the working directory.
  • The pandas library is used in the Code section; however, it is not required to open the file.

Code

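To open the pickle file (replace filepath with the appropriate path string):
#python
import pandas as pd
# Importing Puzzle confirms that dnapuzzle.py is available for unpickling
from dnapuzzle import Puzzle

filepath = "extract_output_1_2021-06-19.pickle"
# The result is the dictionary described in the Format section
obj = pd.read_pickle(filepath)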
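
Since pandas is optional, the same file can also be read with the standard-library pickle module. The following is a minimal sketch, assuming the file is an uncompressed pickle; the helper name load_puzzles is hypothetical and not part of dnapuzzle.py.

```python
import pickle


def load_puzzles(filepath):
    """Load the puzzle dictionary with the standard-library pickle module.

    Hypothetical helper, not part of dnapuzzle.py.
    """
    # dnapuzzle.py must still be importable (e.g. in the working directory),
    # because unpickling re-creates the Puzzle objects stored under "pikdik".
    with open(filepath, "rb") as handle:
        return pickle.load(handle)
```

For example, load_puzzles("extract_output_1_2021-06-19.pickle") should return the same dictionary as the pandas call above.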

Format

The pickle file encodes a dictionary where:

ATTRIBUTE | TYPE | DESCRIPTION
--- | --- | ---
keys | [str] | Stringified list of column indexes in the original alignment (e.g. '[99, 100, 101, 102, 103, 104, 105]')
values | [list] | List of dictionaries that describe distinct puzzles (see below)
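
Because each key is a string rather than a list, it has to be parsed before the column indexes can be used as numbers. A minimal sketch using ast.literal_eval from the standard library; the helper name parse_columns is hypothetical.

```python
import ast


def parse_columns(key):
    """Convert a stringified column list such as '[99, 100]' into a list of ints.

    Hypothetical helper, not part of dnapuzzle.py.
    """
    # literal_eval only evaluates Python literals, so it is safer than eval()
    columns = ast.literal_eval(key)
    return [int(column) for column in columns]


columns = parse_columns('[99, 100, 101, 102, 103, 104, 105]')
print(columns[0], columns[-1], len(columns))  # 99 105 7
```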

Puzzle description dictionary

Each element of a values list contains the following attributes:

ATTRIBUTE | TYPE | DESCRIPTION
--- | --- | ---
originalCode | [str] | Puzzle ID in the BLS system (e.g. "VeoWZMUzLmzgbU3e")
pikdik | [dnapuzzle.Puzzle] | Puzzle object (see below)
nGaps | [int] | Number of gaps allowed in a puzzle (e.g. 6)
score | [float] | Expected score for a particular puzzle (e.g. 17.0)
pareto | [str] | Type of strategy used (e.g. "FalseSubopt")
playerSolutions | [list] | List of players' solutions (e.g. [['TTG-A', 'GT--AA', 'GCGA', '-TGCA', 'CT', 'CT-GA']])
playerIDs | [list] | List of IDs of the players who submitted the solutions in playerSolutions (e.g. ['1269979'])
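
To illustrate how these attributes fit together, the sketch below pairs each player ID with the corresponding solution and counts the gap characters in it. It assumes that playerIDs and playerSolutions are parallel lists (same order, same length), which the descriptions above suggest but do not state outright. The record is built from the example values above, and the helper names are hypothetical.

```python
def gaps_used(solution):
    """Count the '-' characters across all rows of one player's solution."""
    return sum(row.count("-") for row in solution)


def summarize_solutions(record):
    """Return (player ID, gaps used) pairs for one puzzle description dictionary.

    Assumes playerIDs[i] is the author of playerSolutions[i].
    """
    pairs = zip(record["playerIDs"], record["playerSolutions"])
    return [(player_id, gaps_used(solution)) for player_id, solution in pairs]


# Example record assembled from the sample values in the attribute list
record = {
    "originalCode": "VeoWZMUzLmzgbU3e",
    "nGaps": 6,
    "playerSolutions": [["TTG-A", "GT--AA", "GCGA", "-TGCA", "CT", "CT-GA"]],
    "playerIDs": ["1269979"],
}

print(summarize_solutions(record))  # [('1269979', 5)]
```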

pikdik: Puzzle Object

ATTRIBUTE | TYPE | DESCRIPTION
--- | --- | ---
pikdik.puzzle | [list] | Collapsed puzzle sequences, split symbol by symbol (e.g. [['T', 'T', 'G', 'A', '-', '-', '-'], ['G', 'T', 'A', 'A', '-', '-', '-'], ['G', 'C', 'G', 'A', '-', '-', '-'], ['T', 'G', 'C', 'A', '-', '-', '-'], ['C', 'T', '-', '-', '-', '-', '-'], ['C', 'T', 'G', 'A', '-', '-', '-']])
pikdik.par_puzzle | [list] | Pareto solution of the puzzle, split symbol by symbol (e.g. [['T', 'T', '-', 'G', 'A', '-', '-'], ['G', 'T', '-', 'A', 'A', '-', '-'], ['G', 'C', '-', 'G', 'A', '-', '-'], ['T', 'G', '-', 'C', 'A', '-', '-'], ['C', 'T', '-', '-', '-', '-', '-'], ['C', 'T', '-', 'G', 'A', '-', '-']])
pikdik.columns | [str(list)] | Column indexes in the original alignment (e.g. '[99, 100, 101, 102, 103, 104, 105]')
pikdik.flanks | [list] | Column indexes used as flanks for a puzzle (e.g. [105, 106])
pikdik.consensus | [list] | Predefined guides for a puzzle, used in scoring (e.g. [('C', 'T'), ('C', 'T'), ('-', 'C'), ('C', 'T'), ('-', 'A'), ('G', 'A'), ('-', 'C'), ('-', 'G')])
pikdik.cons_scores | [list] | Counts of how often each consensus nucleotide appears in the alignment (e.g. [(6287, 2552), (3416, 2688), (9666, 1), (2938, 2669), (9545, 55)])
pikdik.n | [int] | Number of sequences (e.g. 6)
pikdik.bonus | [float] | Multiplier for an aligned "line" that perfectly matches the given consensus / guide (e.g. 1.15)
pikdik.level | [int] | Difficulty level of the puzzle in the range [1, 9] (e.g. 2)
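
The symbol-wise lists in pikdik.puzzle and pikdik.par_puzzle are easier to compare once each row is joined back into a string. A minimal sketch working on plain lists copied from the examples above, so it runs without dnapuzzle.py; with a real file the same helper could be applied to pikdik.puzzle and pikdik.par_puzzle. The helper name rows_to_strings is hypothetical.

```python
def rows_to_strings(rows):
    """Join each symbol-wise split row into a single sequence string."""
    return ["".join(row) for row in rows]


# Example values copied from the pikdik.puzzle and pikdik.par_puzzle entries
puzzle = [
    ["T", "T", "G", "A", "-", "-", "-"],
    ["G", "T", "A", "A", "-", "-", "-"],
    ["G", "C", "G", "A", "-", "-", "-"],
    ["T", "G", "C", "A", "-", "-", "-"],
    ["C", "T", "-", "-", "-", "-", "-"],
    ["C", "T", "G", "A", "-", "-", "-"],
]
par_puzzle = [
    ["T", "T", "-", "G", "A", "-", "-"],
    ["G", "T", "-", "A", "A", "-", "-"],
    ["G", "C", "-", "G", "A", "-", "-"],
    ["T", "G", "-", "C", "A", "-", "-"],
    ["C", "T", "-", "-", "-", "-", "-"],
    ["C", "T", "-", "G", "A", "-", "-"],
]

# Print each starting row next to its Pareto-solution counterpart
for before, after in zip(rows_to_strings(puzzle), rows_to_strings(par_puzzle)):
    print(before, "->", after)
```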
