NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements

Ryan Connor, Rodney Brister, Jan P Buchmann, Ward Deboutte, Rob Edwards, Joan Martí-Carreras, Mike Tisza, Vadim Zalunin, Juan Andrade-Martínez, Adrian Cantu, Michael D’Amour, Alexandre Efremov, Lydia Fleischmann, Laura Forero-Junco, Sanzhima Garmaeva, Melissa Giluso, Cody Glickman, Margaret Henderson, Benjamin Kellman, David Kristensen, Carl Leubsdorf, Kyle Levi, Shane Levi, Suman Pakala, Vikas Peddu, Alise Ponsero, Eldred Ribeiro, Farrah Roy, Lindsay Rutter, Surya Saha, Migun Shakya, Ryan Shean, Matthew Miller, Benjamin Tully, Christopher Turkington, Ken Youens-Clark, Bert Vanmechelen, and Ben Busby

In January 2019, the National Center for Biotechnology Information (NCBI) launched a virus discovery hackathon. Over the three-day event, ten teams made up of more than 40 participants from six countries created a crowd-sourced set of analysis and processing pipelines. The input was a total of 4.2 million contiguous assemblies (contigs) from the NCBI Sequence Read Archive (SRA), each with a minimum length of 1 kb. The original goal of the hackathon was to develop an index of SRA run sets that could be searched by their viral content. The hackathon produced several use cases, including instruction manuals to guide future hackathons and clustering (see figure below), which proved a promising approach to data compression.

Published September 2019 in Genes, Special Issue: Viral Diagnostics Using Next-Generation Sequencing.