Welcome to PGR-TK’s documentation!
## Pangenome Research Toolkit
PGR-TK is a Python library for pangenome analysis. It includes features to (1) index a human pangenome assembly and query that index, (2) construct a Minimizer Anchored Pangenome (MAP) Graph, and (3) perform principal bundle decomposition on each haplotype to understand the repeat structures in a pangenome.
PGR-TK implements its most computationally intensive algorithms in Rust and exposes them through a Python interface. This enables scripting and interactive analysis, e.g., in a Jupyter Notebook.
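Since the MAP Graph is anchored on minimizers, a toy illustration of the idea may help. The following is a minimal sketch of plain window minimizers, assuming k-mers are ranked lexicographically. It is not PGR-TK's implementation, which is written in Rust and whose hashing and sparsification details are not described on this page. The function name `window_minimizers` and the sample sequence are hypothetical.

```python
def window_minimizers(seq, w, k):
    """Return (position, k-mer) pairs: the smallest k-mer in each run of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Index of the smallest k-mer in this window; ties go to the leftmost one.
        offset = min(range(w), key=lambda j: window[j])
        hit = (start + offset, window[offset])
        # Neighbouring windows often share a minimizer, so record each one once.
        if not picked or picked[-1] != hit:
            picked.append(hit)
    return picked


toy_seq = "ACGTTGCATGCAGTCCATGA"  # made-up sequence, for illustration only
dense = window_minimizers(toy_seq, w=3, k=4)
sparse = window_minimizers(toy_seq, w=6, k=4)
print(len(dense), "anchors with w=3;", len(sparse), "anchors with w=6")
```

In this toy scheme a larger window never selects more anchors than a smaller one, and typically selects fewer, which is the general trade-off between index size and anchor density.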
Here is a brief example using PGR-TK to generate a local MAP Graph of the MHC Class II locus:
import pgrtk
# lazy load an agc file of the reference without any SHIMMER index
ref_db = pgrtk.AGCFile("/data/pgr-tk-HGRP-y1-evaluation-set-v0.agc")
sdb = pgrtk.SeqIndexDB()
sdb.load_from_agc_index("/data/pgr-tk-HGRP-y1-evaluation-set-v0")
ref_file_name, roi_chr, roi_b, roi_e = 'hg19_tagged.fa', "chr6_hg19", 32130918, 32959917
padding = 0
# get a segment of a reference
roi_seq = ref_db.get_sub_seq(ref_file_name, roi_chr, roi_b-padding, roi_e+padding)
# get the hits from the pangenome reference
aln_range = pgrtk.query_sdb(sdb, roi_seq, merge_range_tol=200000)
# collect the target sequences from the hits
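# assumption: the index exposes a `seq_info` mapping of
# sequence id -> (contig name, source, length)
seq_info = sdb.seq_info.copy()
seq_list = []
for k in list(aln_range.keys()):
    ctg_name, source, _ = seq_info[k]
    rgns = aln_range[k].copy()
    # merge hit regions that lie within 1000 bp of each other
    rgns = pgrtk.merge_regions(rgns, tol=1000)
    for rgn in rgns:
        b, e, length, orientation, aln = rgn
        seq = sdb.get_sub_seq(source, ctg_name, b, e)
        # reverse-complement hits that align in the opposite orientation
        if orientation == 1:
            seq = pgrtk.rc_byte_seq(seq)
        seq_list.append(("{}_{}_{}_{}".format(ctg_name, b, e, orientation), seq))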
# Create a shimmer index database with smaller window (denser shimmers)
new_sdb = pgrtk.SeqIndexDB()
new_sdb.load_from_seq_list(seq_list, w=80, k=56, r=12, min_span=18)
new_sdb.generate_mapg_gfa(0, "/results/HLA-ClassII.gfa")
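Once the GFA file has been written, a quick sanity check is to count its records. The sketch below uses only the standard library and assumes the output follows the tab-separated GFA 1 layout (`S` segment lines, `L` link lines). Whether PGR-TK stores segment sequences inline or as `*` with an `LN:i:` length tag is not stated here, so the sketch handles both. The helper names `segment_length` and `summarize_gfa` and the sample records are hypothetical.

```python
from collections import Counter


def segment_length(fields):
    """Length of a GFA segment, from its inline sequence or its LN:i: tag."""
    seq = fields[2] if len(fields) > 2 else "*"
    if seq != "*":
        return len(seq)
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):
            return int(tag[len("LN:i:"):])
    return 0  # length not recorded


def summarize_gfa(lines):
    """Count records by type and sum the segment lengths."""
    counts = Counter()
    total_bp = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        counts[fields[0]] += 1
        if fields[0] == "S":
            total_bp += segment_length(fields)
    return counts, total_bp


# Made-up records standing in for a real file.
sample = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\t*\tLN:i:7",
    "L\t1\t+\t2\t+\t0M",
]
counts, total_bp = summarize_gfa(sample)
print(dict(counts), total_bp)
```

To run it on the file produced above, pass an open file handle instead of the sample list, for example `summarize_gfa(open("/results/HLA-ClassII.gfa"))`.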