Date China gathers 1B citizen genomes.


The New York Times (June 17, 2020) has an article on how the Chinese state is collecting a massive genomic database covering 700 million men (full coverage of the male population). The database is used to fight crime and has so far led to the capture of otherwise elusive criminals, much as similar databases have in the USA (e.g. the Golden State Killer case). It will also give the Chinese government enormous statistical power for genomics research, which could be used to train very accurate genomic prediction models for medical and eugenic purposes (artificial selection). Richard Lynn, a British researcher of intelligence, speculated as early as 2001 that China would pursue such eugenic technology in the first half of the 21st century. In 2018, Chinese researcher He Jiankui caused a shockwave when it was revealed that he had genetically altered embryos that were later born as healthy babies. Many Western governments or government bodies are also pursuing large-scale genome biobank projects (e.g. the US Million Veteran Program), though so far not at the scale of the Chinese program. Private consumer companies are also building massive databanks.

The question is: When will the Chinese government have gathered genome-wide data on 1 billion citizens?

This question resolves positively when a reputable scientific source reports that China has reached 1 billion genotyped or sequenced genomes from its own citizens. A reputable source is e.g. Nature News, MIT Tech Review, or similar, as well as any peer-reviewed paper, in the unlikely event that the milestone is reported in a journal before a science news source.

Further details:

  • The resolution date is the date on which this goal was reached, not the date on which it was reported. For the purpose of this question, both whole-genome sequencing and whole-genome microarray technology count.

  • By citizens, we mean citizens of Mainland China, Macau and Hong Kong

  • Genomes from monozygotic (identical) twins count as multiple different genomes for the purpose of this question

  • Genotyping only counts if it is both broad (samples widely in the genome) and deep (samples many loci, say >500k)
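As a rough illustration only, the counting rules above could be sketched as a small eligibility check. Everything named here (the `GenomeRecord` fields, the method labels, the `genome_wide` flag standing in for "broad") is a hypothetical construction for this sketch, not part of any real dataset or of the official resolution procedure. It also assumes one reading of the rules, namely that whole-genome sequencing always counts while microarray genotyping must pass the breadth and depth test.

```python
from dataclasses import dataclass

# Values taken from the question text; the names are illustrative assumptions.
ELIGIBLE_REGIONS = {"Mainland China", "Macau", "Hong Kong"}
ELIGIBLE_METHODS = {"whole_genome_sequencing", "whole_genome_microarray"}
MIN_LOCI = 500_000          # "deep": more than roughly 500k loci
TARGET = 1_000_000_000      # 1 billion genomes


@dataclass(frozen=True)
class GenomeRecord:
    """Hypothetical record for one person's genome-wide data."""
    citizen_region: str
    method: str
    loci_sampled: int
    genome_wide: bool  # stand-in for "broad": loci spread widely over the genome


def counts_toward_target(record: GenomeRecord) -> bool:
    """Return True if this record would count under the rules as read here."""
    if record.citizen_region not in ELIGIBLE_REGIONS:
        return False
    if record.method not in ELIGIBLE_METHODS:
        return False
    if record.method == "whole_genome_sequencing":
        return True  # assumption: sequencing needs no extra breadth/depth test
    # Microarray genotyping must be both broad and deep.
    return record.genome_wide and record.loci_sampled > MIN_LOCI


def target_reached(records) -> bool:
    """Each record counts separately, so identical twins add two genomes."""
    return sum(counts_toward_target(r) for r in records) >= TARGET
```

For example, a 700k-locus genome-wide array from a Hong Kong citizen would count, while a 50k-locus panel would not, and two identical twins with eligible records would contribute two genomes.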

