Informatics at CIDR

CIDR has a multi-disciplinary informatics team led by Sean Griffith, Informatics Software Project Manager, and Elvin Hsu, IT Systems & Support Manager. In addition to bioinformatics, database, systems, and user support professionals, the team includes a software development group that designs custom solutions for internal clients. Using a team-based, flexible development methodology, the software development team -- fluent in Java, Python, Hibernate, JDBC, Django, Perl, JavaScript, SQL and shell scripting -- works collaboratively with colleagues to build and maintain a variety of essential tools to support CIDR's genotyping and next-generation sequencing services:
- laboratory information management system (LIMS)
- automated research data analysis pipelines
- automated clinical diagnostic testing pipelines
- tailor-made reporting tools
- integration and customization of third-party products
Informatics Infrastructure
To meet the enormous computational and data management challenges created by genotyping and sequencing over 100,000 samples per year, we have continuously expanded and improved our informatics infrastructure. We also have access to the new Maryland Advanced Research Computing Center (MARCC) on the Hopkins Bayview campus. This HPC facility opened in May 2015 with over 19,000 cores (including large-memory and GPU nodes), 20 petabytes of storage and 100 gigabits/second network connectivity, and has optional CLIA-certified services.
Technical Reports
CIDR informatics staff, working closely with laboratory and scientific colleagues, regularly conduct technology evaluation projects that, if successful, may result in new production processes or systems. When appropriate and when time permits, we produce a project technical report to share what we've learned: Implementing a Third-Party Inventory System at CIDR.
CIDR-Authored Software Applications
Because of CIDR's high-throughput workflows, quality control standards, and customized services, off-the-shelf software products do not always meet our needs. When it is not possible to integrate third-party software into our workflows, CIDR's in-house software development team creates analytical, bioinformatics, laboratory information management and utility applications that are tightly integrated with our custom databases.
LastCall Pipeline and Genotyping Data Release Tools
Genotype calling and QC are performed by LastCall, a Java application integrated with the in-house developed Cerberus LIMS and Phoenix sample handling system. LastCall is an automated analysis pipeline for data generated by Illumina's Infinium genotyping products. It augments the functionality of Illumina's IAAP utility, which converts scanner-produced IDAT files into GTC files. LastCall reads these GTC files to calculate metrics including call rate, AA/AB/BB call frequencies, mean intensities for raw X and raw Y, estimated gender, and no-call and total call counts. The GTC files are also used to generate release genotype files and PLINK files via a set of Genotyping Data Release Tools, which mirror some of the functionality of Illumina's GenomeStudio but are designed to handle very large genotyping projects more smoothly.
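As a rough illustration of the per-sample metrics listed above, the following Python sketch computes call rate, genotype frequencies, counts and mean raw intensities from calls that have already been decoded. It is not LastCall code (LastCall is a Java application) and it does not parse GTC files. The function name, the "NC" no-call label, the dictionary keys, and the choice to compute AA/AB/BB frequencies over called loci only are all assumptions made for this example.

```python
from collections import Counter

NO_CALL = "NC"  # hypothetical label for a no-call; not taken from the GTC format


def summarize_calls(calls, raw_x, raw_y):
    """Summarize one sample's genotype calls and raw intensities.

    calls        -- list of "AA", "AB", "BB" or NO_CALL, one entry per locus
    raw_x, raw_y -- lists of raw intensities, one entry per locus
    """
    if not (len(calls) == len(raw_x) == len(raw_y)):
        raise ValueError("calls, raw_x and raw_y must have the same length")
    total = len(calls)
    if total == 0:
        raise ValueError("at least one locus is required")

    counts = Counter(calls)
    no_calls = counts[NO_CALL]
    called = total - no_calls

    def freq(genotype):
        # Assumption: frequencies are relative to called loci, not all loci.
        return counts[genotype] / called if called else 0.0

    return {
        "total_loci": total,
        "no_calls": no_calls,
        "calls": called,
        "call_rate": called / total,
        "aa_freq": freq("AA"),
        "ab_freq": freq("AB"),
        "bb_freq": freq("BB"),
        "mean_raw_x": sum(raw_x) / total,
        "mean_raw_y": sum(raw_y) / total,
    }
```

For example, a sample with calls AA, AB, BB and one no-call has a call rate of 3/4 under these definitions, and each called genotype has a frequency of 1/3.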
Bioinformatics Applications Manager (BAM)
CIDR staff typically access our information systems through Web or Java client-based graphical user interfaces created by the CIDR software development team. Most CIDR custom software is executed via the Bioinformatics Applications Manager (BAM). Written in-house using Java SE 8 and JavaFX, the BAM provides a single consistent desktop GUI for most applications, regardless of programming language or other constraints.
Cerberus
Designed to replace the legacy genotyping wet-bench LIMS developed in 2006, Cerberus is a MySQL-backed Java framework. Its tools reduce the disruption that deployments cause to lab processing, limit database schema complexity, and provide a catalogue of reusable graphical user interfaces (GUIs) for flexibly building and extending workflows for any protocol-driven lab technology. Cerberus has been in production use since early 2019, currently for Illumina Infinium protocols. Lab managers create experiments by queuing samples from our in-house sample handling system (Phoenix) and connecting these lists to code templates, which load the phases specific to each wet-bench protocol; each phase accepts required and optional user input. Phase classes common to several protocols can be reused for each job, reducing codebase complexity. Custom validations help use lab automation resources efficiently, check expected reagent information, and prevent sample swaps. A comment-tracking interface is accessible from any phase, so comments are not forgotten. Reports and utility jobs integral to downstream analysis and administrative needs are implemented in a consolidated manner. Multiple installations exist to support testing and the transition of jobs from software developers to lab management and into production use.
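The idea of protocol templates built from reusable phases can be sketched as follows. This is a minimal Python illustration of the pattern only, not Cerberus code (Cerberus is a Java framework). The class, the phase names, the input field names and the template names are all hypothetical, and the duplicate-ID check merely stands in for the kind of custom validation described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One wet-bench step and the user input it requires or optionally accepts."""
    name: str
    required: tuple = ()
    optional: tuple = ()

    def validate(self, user_input):
        # Reject input that lacks a required field or carries an unknown one.
        missing = [k for k in self.required if k not in user_input]
        if missing:
            raise ValueError(f"{self.name}: missing required input {missing}")
        allowed = set(self.required) | set(self.optional)
        unknown = [k for k in user_input if k not in allowed]
        if unknown:
            raise ValueError(f"{self.name}: unexpected input {unknown}")


# Hypothetical phases, defined once and shared between templates.
QUANTIFICATION = Phase("quantification", required=("reagent_lot",))
HYBRIDIZATION = Phase("hybridization", required=("reagent_lot", "oven_id"),
                      optional=("comment",))
STAINING = Phase("staining", required=("reagent_lot",))

# Hypothetical protocol templates: ordered lists that reuse the same phases.
TEMPLATES = {
    "protocol_a": [QUANTIFICATION, HYBRIDIZATION, STAINING],
    "protocol_b": [QUANTIFICATION, HYBRIDIZATION],
}


def run_experiment(template_name, samples, inputs_by_phase):
    """Validate a queued sample list against a template; return completed phase names."""
    if len(set(samples)) != len(samples):
        raise ValueError("duplicate sample IDs in queue")
    completed = []
    for phase in TEMPLATES[template_name]:
        phase.validate(inputs_by_phase.get(phase.name, {}))
        completed.append(phase.name)
    return completed
```

Because both templates refer to the same phase objects, a change to a shared phase applies to every protocol that uses it, which is the kind of reuse the description above credits with reducing codebase complexity.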
Phoenix and PhoenixWeb
Phoenix is a sample and project tracking system developed in-house by the CIDR software development team. Designed to accommodate the volume and complexity of sample management for CIDR's SNP genotyping and sequencing work, Phoenix has been in production use since 2015. Phoenix is tightly integrated with a web portal, PhoenixWeb, that allows investigators to check on the progress of their projects and respond to problem reports. Phoenix provides the flexibility and sophisticated problem handling that enable us to deliver the high-throughput, high-quality service that makes CIDR a unique national resource for the life science research community.
Contact
Please feel free to contact us if you are a CIDR project investigator with questions or concerns, or if you are interested in our informatics tools or services or in pursuing potential collaborations. We can most easily be reached at cidr_informatics at lists.johnshopkins.edu.