Projects and Resources

bracken_plot – a Shiny app for taxonomic abundance visualization

As a graduate student studying metagenomics, I frequently use Derrick Wood’s Kraken2 software for accurate and lightning-fast classification of short reads. Bracken is complementary software that uses Kraken reports to produce genus- and species-level abundance estimates. Often, a first step in metagenomic analysis is examining the distribution of different organisms across samples. To that end, I’ve created a simple Shiny app for quick, customizable plotting of the taxonomic structure encoded in multi-sample Bracken reports.

Example output from the bracken_plot app showing genus-level relative abundance across oral microbiome samples. Don’t mind the human read contamination!

findmotifs.R – an R script for DNA motif identification in metagenomic bins

To understand the genetic context and biological relevance of DNA motifs, it’s often valuable to analyze their distribution and sequence conservation. Many programs exist for both the identification of novel motifs and the scoring of known motifs, though few of these tools work out of the box for highly fragmented bacterial genomes. To address this, I’ve written findmotifs.R – an easy-to-use R script that finds and scores short sequences using position-frequency matrices (PFMs). Given a set of contigs and a list of PFMs, the script returns an easy-to-parse table containing the sequence, score, and strand-wise position of each match above a user-defined threshold. findmotifs.R requires R version 4.0 and three packages: Biostrings, optparse, and stringr.
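For readers unfamiliar with PFM-based scoring, here is a minimal Python sketch of the general idea. It is not the code in findmotifs.R, which is written in R and may score matches differently. The sketch assumes one common approach: convert each PFM column to log2-odds weights (with a pseudocount and a uniform 0.25 background), slide the motif along both strands of a contig, and keep windows whose summed score meets a threshold. The function names, the pseudocount, the background, and the example motif and threshold are all hypothetical choices made for illustration.

```python
import math

# Translation table used to complement A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    `pfm` is a list of dicts, one per motif position, mapping base -> count.
    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in pfm:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def scan(contig_id, seq, pwm, threshold):
    """Score every window on both strands; keep those at or above threshold.

    Returns tuples of (contig, start, strand, sequence, score), where start
    is the 1-based leftmost position of the match on the forward strand.
    """
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in "ACGT" for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(pwm[j][b] for j, b in enumerate(window))
            if score >= threshold:
                start = i + 1 if strand == "+" else len(seq) - i - width + 1
                hits.append((contig_id, start, strand, window, round(score, 3)))
    return hits


# Hypothetical 4-bp motif (consensus TGAC) built from 10 aligned sites.
example_pfm = [{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}]
example_pwm = pfm_to_pwm(example_pfm)

# Two made-up contigs: one carries the motif on the forward strand,
# the other on the reverse strand.
contigs = {"contig_1": "AATGACCC", "contig_2": "CCGTCAGG"}

all_hits = []
for name, sequence in contigs.items():
    all_hits.extend(scan(name, sequence, example_pwm, threshold=6.0))

for hit in all_hits:
    print("\t".join(str(field) for field in hit))
```

With these toy inputs, each contig yields one match: contig_1 on the forward strand and contig_2 on the reverse strand, both starting at forward-strand position 3. Reporting reverse-strand hits in forward-strand coordinates is a design choice of this sketch; it keeps positions comparable across strands.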

findmotifs.R is invoked from the command line with user inputs. Progress for each motif prints to the console.