6.1 Software Development

The plyranges package provides a suite of verbs for manipulating genomic data represented as GRanges objects. Since its release on Bioconductor, it has been relatively successful: it has been downloaded 26,874 times from 14,271 unique IP addresses. I have also had the privilege of teaching workshops on plyranges at Bioconductor conferences, which in turn led to the development of the fluentGenomics workflow package outlined in Chapter 3. A broader impact of the work has been the discussion of fluent interfaces and tidy data within the Bioconductor community, which has motivated several ongoing projects exploring fluent interfaces for other types of omics data. The plyranges package is available to download from https://bioconductor.org/packages/plyranges and the fluentGenomics workflow is available to download from https://bioconductor.org/packages/release/workflows/html/fluentGenomics.html.

The superintronic software described in Chapter 4 has been used in S. Lee et al. (2020) to disentangle and visualise intron signal in RNA-seq data. This work again shows the strengths of providing a long-form representation of genomics data (in this case, coverage vectors). By leveraging plyranges, we were able to create a set of data descriptors that could be linked back to the raw data to discover genes thought to be associated with a real biological signal. An interesting extension of this work would be to apply it to single-cell and long-read transcriptomics data, where scalability and much larger design matrices would become challenges. The superintronic package is available to download from https://github.com/sa-lee/superintronic.

Finally, the liminal software aims to provide a more holistic approach to analysis tasks that require dimensionality reduction algorithms. We showed how interactive graphics and tours can be used together to identify problems with embeddings. Based on the case studies presented, I believe that the methods used in liminal could be broadly applicable to many high-dimensional datasets and nonlinear dimensionality reduction (NLDR) methods. The liminal package is available to download from https://github.com/sa-lee/liminal.