The $1,000 Challenge!
To go straight to the actual challenge click here.
Crack the code! The authors announce the HEGP (heegeepee;
hiːdʒiːpiː) Challenge, with a $1,000 (one thousand dollar) prize for
the individual or group who can crack HEGP encrypted data. HEGP, when
proven solid, will have a large impact on the way human genetics is
pursued today because it will allow for sharing genotype data while
preserving privacy of the individual. Not only is sharing data
required for reproducible Science, there is also a large interest in
hosting data on laptops and servers that need not be HIPAA compliant.
On this page we chat about cracking HEGP, Rubik cube, DNA, strawberries and Enigma.
To go straight to the actual challenge click here.
Relevance
With a publication in Genetics we present a novel Homomorphic encryption method named HEGP that allows for sharing of genotypes and phenotypes in the context of Genome Wide Association (GWA) studies. Importantly sharing but without giving away private information and making individuals identifiable. For more information check out a privacy researcher's perspective and a recent follow up paper on Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality by Tianjing Zhao, Fangyi Wang, Richard Mott, Jack Dekkers, and Hao Cheng, Genetics. 2024 Mar 6;226(3):iyad210. doi: 10.1093/genetics/iyad210. PMID: 38085098, where they combine Bayesian variable selection methods for genetic parameter estimation, genomic prediction, and GWAS.
With HEGP anyone can freely hand over genotype and phenotype data to anyone else and have them reproduce the results. Even better, the encoding can be stacked. So the second party can encrypt and combine their data independently and give it to a third party.
So any party can use our data to add it their own analysis. And they can share their data back with us. As such, genotype data can be stored on (public) servers and GWA analysis can be reproduced.
This is a breakthrough in FAIR data sharing and contrasts greatly with current practice of hiding/protecting genotype data and only providing summary statistics. One example is an important UK Biobank depression study where genetic markers are presented with their statistical significance. We can only assume that this study can be reproduced by a group of researchers having access to the original data. The truth is that these outcomes can not be reproduced by you or me! HEGP will make data sharing and reproducible analysis a reality.
Cite
Mott R, Fischer C, Prins P, Davies RW. Private Genomes and Public SNPs: Homomorphic Encryption of Genotypes and Phenotypes for Shared Quantitative Genetics. Genetics. 2020 Jun;215(2):359-372. doi: 10.1534/genetics.120.303153. Epub 2020 Apr 23. PMID: 32327562; PMCID: PMC7268998.
Simple explanation
Homomorphic encryption is a mathematical translation of data into an
encrypted form where the result of a computation is the same for the
unencrypted and encrypted forms. With HEGP a matrix of data is
transformed by a high-dimensional random linear orthogonal
transformation key as described in the paper (open access) and
visualised in below animation (hit ). The resulting matrix
scrambles the data while preserving the 'shape of the data' for
analysis.
One way to think about this is that when a Rubik cube gets rotated the
fields change colour, but the object still maintains its shape as a
three dimensional cube. With DNA the genotype/phenotype shape is
typically used to predict associations between genotypes and
phenotypes.
An example of a phenotype is a preference for
strawberry taste. An example of an associated genotype is a DNA
encoded olfactory receptor.
Here we display a typical example of genome-wide association (GWA) of phenotype against genotype:
Genes (on chromosomes) involved in some trait are marked. This is the backbone computation for finding genes involved in some trait and pursued in the UK Biobank involving half a million subjects. To find associations GWA is applied to find genes involved in, for example, cancer or COVID-19 mortality; i.e., the first step towards finding causality and potentially better treatments.
In above image data is shown before and after encryption. The unencrypted data contains three values while the encrypted data shows a normal distribution.
Enigma and why a challenge?
The Enigma machine is an encryption device developed and used in the
20th century to protect commercial, diplomatic and military
communication. It was employed extensively by Nazi Germany during
World War II, in all branches of the German military (source
wikipedia). Enigma
encrypted text by a transformation and was cracked by the Polish
Cipher Bureau in 1932 and the crack was used by the allied forces to
win the war. To ascertain HEGP is bullet proof, unlike ENIGMA, we
invite the algorithmic inclined to crack the code and make HEGP
history (one way or the other).