Crowdsourcing: Viruses, Variants, and Genomic…

Jessica Hockett

3 hrs ago


4 Comments

Thomas Kenworthy

Not a specific answer, but check out Jamie Andrews' Substack, controlstudies.

He is doing it.


Jessica Hockett

Thanks. I'm aware of what he's doing but it doesn't truly address the questions above, AFAIK. Feel free to quote/cite from the work.


Rob (c137)

1h (edited)

Strain and variant are the same thing. Covid was full of renaming in order to create confusion and/or hype. I suspect most of it was just PR propaganda like "novel virus" which I recall not being able to find anything about back then.

DNA testing/genetics is unreliable and full of scientism belief leading to assumptions.

They did go into sequencing here.

https://controlstudies.substack.com/p/the-dna-hoax-4e4

Apparently they start with a template. This means that they need an assumption before they find the object.

And here's the part where forensic DNA testing is shown to be unreliable.

https://controlstudies.substack.com/p/the-dna-hoax-0a2

I recall a show where they mentioned that in court a DNA match can be challenged as there are many false positives.

Perhaps this is due to the template process?

Apparently PCR is used to amplify the DNA which is sequenced. This is another pseudoscientific invention that crazily assumes that the copies are identical to the original. If that were true, running many cycles would not lead to false positives, like copying a digital file over and over. Instead, PCR is more like an analog copy and we know that when copying a VHS or tape over and over one gets static more and more which is random noise aka not a perfect copy.

https://robc137.substack.com/p/pcr-fails-logic-from-the-start-sorry


henjin1024

26m

1. In the Fan Wu paper the authors did metagenomic sequencing: they sequenced total RNA in a sample of lung fluid, which mostly consisted of the genomic RNA of SARS-CoV-2, RNA expressed by SARS-CoV-2, RNA expressed by the human host, and RNA expressed by bacteria. The library kit used in the study included a gDNA eliminator step which removed most genomic DNA, but small amounts of DNA still remained in the sample. The authors then used reverse transcriptase to convert the RNA to DNA in order to sequence the DNA.

The authors then used the de novo assembler MEGAHIT to merge overlapping reads into longer contiguous segments (contigs). They got a total of 384,096 contigs with MEGAHIT, which some no-virus people misinterpreted to mean 384,096 candidate genomes of SARS-CoV-2. But actually almost all of the contigs were fragments of the human genome or of bacterial genomes, and there was likely only a single contig for SARS-CoV-2.
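As a rough illustration of what "merging overlapping reads into contigs" means, here is a toy greedy overlap merger. It is only a sketch: MEGAHIT itself works on de Bruijn graphs of k-mers, not on pairwise read overlaps, and the function names, the `min_len` parameter, and the example reads are all made up.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, -1, -1)  # (overlap length, index of left seq, index of right seq)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # nothing left to merge
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical reads: three overlap into one contig, one is unrelated.
reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGGGGG"]
contigs = greedy_assemble(reads)
```

With these made-up reads the three overlapping reads collapse into a single contig and the unrelated read stays on its own, which is the same reason a sample with one viral genome plus a lot of host and bacterial material yields one viral contig among many non-viral ones.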

In the version of Fan Wu's reads that is available for download from the Sequence Read Archive, human reads have been replaced with N letters for privacy reasons, so it's not possible to reproduce their MEGAHIT results exactly. But when I ran MEGAHIT with the same settings as the authors, I got only one contig for SARS-CoV-2, one contig for a Streptococcus phage, and two short contigs for a parvovirus, but no contigs for any other species of viruses: https://sars2.net/hamburgmath.html#Short_version.

In order to determine which contigs matched viruses, I downloaded a collection of about 15,000 virus reference sequences, and I aligned all of my contigs against the refseqs. Even if I had used a set of refseqs from 2019 that didn't yet include SARS-CoV-2, the contig for SARS-CoV-2 would still have aligned against the refseqs of SARS-CoV and the Bulgarian SARS-like virus BM48. Usually even a novel species of virus is similar enough to an existing species that it gets aligned against one or more sequences of previously published viruses.
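To show why a novel but related sequence still finds a match, here is a minimal sketch that scores a contig against a set of references by shared k-mers. This is a stand-in for a real aligner, not the method used above; the reference names, sequences, and the choice of `k = 4` are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(contig: str, refs: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Return (reference name, fraction of the contig's k-mers found in it)."""
    query = kmers(contig, k)
    if not query:
        raise ValueError("contig is shorter than k")
    scores = {name: len(query & kmers(seq, k)) / len(query) for name, seq in refs.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical references: ref_A is a relative of the contig, ref_B is unrelated.
refs = {
    "ref_A": "ATGGCGTACCTTAGGCTA",
    "ref_B": "TTTTTTCCCCCCAAAAAA",
}
contig = "GGCGTACCTTCG"  # resembles part of ref_A but differs near the end
name, score = best_reference(contig, refs)
```

The contig is not identical to any reference, yet most of its k-mers still occur in the related one, so it is picked as the closest match with a score below 1.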

2. In order to demarcate species of coronaviruses, the ICTV employs a framework which performs hierarchical clustering of aligned sequences in the 3CLpro, NiRAN, RdRP, ZBD and HEL1 domains, which produces a measure called PPD (pairwise patristic distance). Within the order _Nidovirales_, ICTV employs a PPD threshold of about 0.1 at the species level, 0.2 at the subgenus level, 0.9 at the genus level, 1.6 at the subfamily level, and 3.0 at the family level: https://ictv.global/report/chapter/coronaviridae/coronaviridae.
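Read as simple cutoffs, the thresholds quoted above could be turned into a lookup like the one below. This is a simplification: the framework clusters many sequences rather than applying a cutoff to a single pair, and the function name, the inclusive `<=` comparison, and the "different families" label are assumptions made for illustration.

```python
# Approximate PPD thresholds as quoted above, ordered from lowest to highest rank.
THRESHOLDS = [
    ("species", 0.1),
    ("subgenus", 0.2),
    ("genus", 0.9),
    ("subfamily", 1.6),
    ("family", 3.0),
]


def lowest_shared_rank(ppd: float) -> str:
    """Lowest rank at which two viruses with this PPD would fall in the same group."""
    if ppd < 0:
        raise ValueError("PPD cannot be negative")
    for rank, limit in THRESHOLDS:
        if ppd <= limit:  # assumption: thresholds treated as inclusive upper bounds
            return rank
    return "different families"
```

For example, under this reading a pair with a PPD of 0.05 would share a species, while a pair with a PPD of 0.15 would be different species within the same subgenus.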

3. Normally in genetics the term "variant" is synonymous with a mutation, so for example "variant calling" means finding what positions within sequencing reads have a mutation relative to a reference sequence. And people would say that current strains of Omicron have about 150 variants from the Wuhan strain, which typically consist of about 130-150 substitutions, 8-10 deletions, and 1 insertion.
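In that sense of the word, variant calling boils down to listing differences from a reference. The sketch below assumes the sample has already been aligned to the reference, with `-` marking gaps; real variant callers work from many reads and quality scores. The function name and the ad hoc output notation are hypothetical.

```python
def call_variants(ref_aln: str, sample_aln: str) -> list[str]:
    """List substitutions, insertions and deletions from a gapped pairwise alignment."""
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    pos = 0  # 1-based position of the last reference base consumed
    i = 0
    while i < len(ref_aln):
        r, s = ref_aln[i], sample_aln[i]
        if r == "-":  # gap in reference: bases inserted after position pos
            j = i
            while j < len(ref_aln) and ref_aln[j] == "-":
                j += 1
            variants.append(f"ins{pos}:{sample_aln[i:j]}")
            i = j
        elif s == "-":  # gap in sample: reference bases deleted
            j = i
            while j < len(ref_aln) and sample_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(f"del{pos + 1}-{pos + (j - i)}")
            pos += j - i
            i = j
        else:
            pos += 1
            if r != s:  # substitution, written as <ref base><position><sample base>
                variants.append(f"{r}{pos}{s}")
            i += 1
    return variants


# Made-up alignment with one substitution, one insertion and one deletion.
variants = call_variants("ACGT--ACGTACGT", "ACCTGGAC---CGT")
# variants == ["G3C", "ins4:GG", "del7-9"]
```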

The use of the term "variant" to refer to the major subtypes of SARS-CoV-2 is somewhat unusual. The equivalent term is "subtype" for HIV, influenza A, and RSV; "genotype" for measles, hepatitis, and rotavirus A; and "type" for HPV.

4. The 2009 swine flu strain of H1N1 had about 5% nucleotide distance to its closest neighbor in the HA segment. The HA segment is the equivalent of the spike protein in SARS-CoV-2, so it is more variable than other segments of the genome, and the nucleotide distance across all segments combined was lower than 5%. If your hypothetical virus detected in Wuhan had had a similarly low distance to previously documented strains of influenza, it wouldn't have been classified as a new species of virus.
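For reference, a nucleotide distance like the 5% figure is just the fraction of aligned positions that differ. A minimal sketch, assuming two pre-aligned sequences and skipping any column with a gap or ambiguous base (the function name and example sequences are made up):

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing bases among columns where both sequences have A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


# Made-up 20-base sequences that differ at one position: 1/20 = 5% distance.
d = p_distance("ACGTACGTACGTACGTACGT", "ACGTACGTACCTACGTACGT")
```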

5. Yes, if you're talking about metagenomic sequencing. But metagenomic sequencing of SARS-CoV-2 is rare in practice, and almost all of the millions of public sequences of SARS-CoV-2 were sequenced with a PCR-based protocol, where PCR primers were used to amplify a panel of overlapping segments of the genome. Sometimes the emergence of a new mutation led to amplicon dropout, where one segment out of the panel of segments ended up missing, but it was easy to tell if there were no reads that covered one part of the genome.

If a new SARS-like virus entered circulation in humans and someone tried to sequence it using a PCR amplicon panel designed for SARS-CoV-2, most amplicons might drop out. But if even one segment of the genome was amplified successfully, and the authors did variant calling for the reads that matched the segment, they could see that the segment had unusual mutations which had not been documented among variants of SARS-CoV-2. The authors could then do metagenomic sequencing of the same sample to sequence the whole genome of the new SARS-like virus.
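Spotting a missing amplicon is mostly a matter of counting reads per segment. Here is a minimal sketch, assuming reads are already mapped and given as (start, end) coordinates; the amplicon names, coordinates, and the `min_reads` cutoff are hypothetical and not taken from any real primer scheme.

```python
def dropped_amplicons(
    amplicons: dict[str, tuple[int, int]],
    reads: list[tuple[int, int]],
    min_reads: int = 10,
) -> list[str]:
    """Names of amplicons that have fewer than min_reads reads lying fully inside them."""
    counts = {name: 0 for name in amplicons}
    for r_start, r_end in reads:
        for name, (a_start, a_end) in amplicons.items():
            if a_start <= r_start and r_end <= a_end:
                counts[name] += 1
    return [name for name, n in counts.items() if n < min_reads]


# Hypothetical panel of three overlapping amplicons; no reads come from amp2.
amplicons = {"amp1": (1, 400), "amp2": (320, 720), "amp3": (640, 1040)}
reads = [(10, 380)] * 50 + [(650, 1030)] * 40
missing = dropped_amplicons(amplicons, reads)
# missing == ["amp2"]
```

If most of the panel showed up in `missing` while one or two amplicons still had reads, that would be the situation described above, where the surviving segment can be checked for unusual mutations.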

