The accurate long-read Next-Generation Technology
MAIN STEPS IN PACBIO SEQUENCING
First step: Sample preparation
PacBio sequencing is highly sensitive to contaminants carried over from DNA or RNA extraction. It is therefore critical to subject the extracted nucleic acids to additional contaminant-removal steps, which depend on the source of the material. The purity of the sample as it enters the next step ultimately has a significant impact on sequencing yield.
Second step: Library preparation
This step involves a number of procedures that prepare the material for sequencing. For RNA, it involves enriching transcripts of the desired target size and reverse-transcribing them into double-stranded DNA. For RNA-derived and genomic DNA alike, it involves repairing any damage in the dsDNA and adding adapters to both ends. It also combines different methods of DNA shearing and size selection to obtain the desired average molecule size, which maximizes the downstream performance of the entire workflow in terms of CCS yield, raw sequencing yield, loading efficiency into the SMRTcell, and sequencing coverage.
Third step: PacBio sequencing
Sequencing itself begins with annealing the sequencing primers and binding the polymerase to the template. Loading efficiency, defined as the proportion of ZMWs (zero-mode waveguides) that produce viable sequencing reads, is affected by multiple factors, including the size of the DNA molecule and any contaminants carried over all the way to the sequencing chip. In an average sequencing reaction, more than half of the 8 million ZMWs of a Sequel II SMRTcell are expected to produce data, but only about a quarter of all ZMWs will generate a sufficient number of "passes" to produce a CCS (circular consensus sequencing) read. This means that, by limiting the bioinformatics to CCS reads, about half of the ZMWs that produced useful data are normally discarded. Therefore, while CCS accuracy is needed for most current PacBio applications, algorithms that can handle the sequencing noise of raw data, such as the ones created by Sequegenics, can be much more powerful.
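The arithmetic behind these proportions can be sketched in a few lines of Python. This is only an illustration: the function name `zmw_yield_summary` is hypothetical, and the counts in the usage example are round numbers chosen to match the approximate fractions quoted above (half of the ZMWs productive, a quarter yielding CCS reads), not measurements from a real run.

```python
def zmw_yield_summary(total_zmws, productive_zmws, ccs_zmws):
    """Summarize ZMW usage for one sequencing run (hypothetical helper).

    total_zmws      -- number of ZMWs on the chip
    productive_zmws -- ZMWs that produced viable sequencing reads
    ccs_zmws        -- productive ZMWs with enough passes for a CCS read
    """
    if total_zmws <= 0 or not (0 <= ccs_zmws <= productive_zmws <= total_zmws):
        raise ValueError("expected 0 <= ccs_zmws <= productive_zmws <= total_zmws, total_zmws > 0")

    # Loading efficiency: proportion of ZMWs that produce viable reads.
    loading_efficiency = productive_zmws / total_zmws
    # Fraction of all ZMWs that end up yielding a CCS read.
    ccs_fraction = ccs_zmws / total_zmws
    # Share of productive ZMWs dropped when the analysis is limited to CCS reads.
    if productive_zmws:
        discarded = (productive_zmws - ccs_zmws) / productive_zmws
    else:
        discarded = 0.0

    return {
        "loading_efficiency": loading_efficiency,
        "ccs_fraction": ccs_fraction,
        "discarded_productive_fraction": discarded,
    }


# Illustrative round numbers only (assumed, not measured):
# 8 million ZMWs, half productive, a quarter yielding CCS reads.
summary = zmw_yield_summary(8_000_000, 4_000_000, 2_000_000)
for name, value in summary.items():
    print(f"{name}: {value:.0%}")
```

With these assumed counts, half of the productive ZMWs fall outside the CCS set, which is the loss that a raw-read-aware analysis would aim to recover.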
Fourth step: Bioinformatics
Bioinformatic analysis of PacBio sequencing data is performed with tools that are usually developed specifically for long-read NGS. These tools leverage the length of the reads for multiple applications, such as building de novo assemblies, identifying structural variants, and calling haplotype variants. Some can handle the error rate of raw subreads, while others need CCS reads to be called before processing the data. Sequegenics offers a number of bioinformatic tools, which include end-to-end pipelines and a unique set of tools built around proprietary algorithms.
Fifth step: Analytics
Often, building an accurate genome assembly or calling haplotypes or structural variants from a sample is just the beginning. Especially in population genomics projects, downstream analytics can leverage the higher sequencing resolution achieved with long reads to drive new discoveries. AI and ML pipelines require input data in formats that allow rapid and efficient querying. Sequegenics has developed tools that automatically transform variant-calling data into a specialized data storage format compatible with downstream big data analytics.
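As a rough illustration of what such a transformation involves, the sketch below flattens VCF-style variant records into one row per alternate allele, a shape that tabular and columnar stores can query easily. It is not the Sequegenics format or tooling, which is not described here: the function name `vcf_to_rows`, the chosen columns, and the two example records are all assumptions made for the example.

```python
def vcf_to_rows(vcf_text):
    """Flatten VCF text into a list of dicts, one per alternate allele.

    Hypothetical helper: reads only the first eight fixed VCF columns
    and keeps a small, assumed subset of fields.
    """
    rows = []
    for line in vcf_text.splitlines():
        # Skip blank lines and header/meta lines.
        if not line or line.startswith("#"):
            continue
        chrom, pos, _vid, ref, alt, _qual, _flt, info = line.split("\t")[:8]

        # Parse the INFO column into a dict; flags without "=" become True.
        info_fields = {}
        for item in info.split(";"):
            if item in ("", "."):
                continue
            key, _, value = item.partition("=")
            info_fields[key] = value if value else True

        # Multi-allelic sites are split so that every row holds one allele.
        for allele in alt.split(","):
            rows.append({
                "chrom": chrom,
                "pos": int(pos),
                "ref": ref,
                "alt": allele,
                "svtype": info_fields.get("SVTYPE", ""),
            })
    return rows


# Made-up example records, for illustration only.
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\t.\tA\tG,T\t50\tPASS\t.\n"
    "chr2\t5000\t.\tC\t<DEL>\t60\tPASS\tSVTYPE=DEL;END=7000\n"
)
rows = vcf_to_rows(example_vcf)
for row in rows:
    print(row)
```

Rows of this shape could then be written to whichever storage format the analytics platform expects; that choice is left open here.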