Genotyping: What is its output and how do we process it?
Now you may ask yourself: what happens after genotyping? How do we obtain results and user data from the GeneTitan machine? Let’s look at Figure 1 below, which shows the scheme we follow to process genotyping data.
Figure 1: Genotyping data processing
After genotyping on the GeneTitan machine, we obtain raw signal intensity values, which we then convert into meaningful genotype data. We store these intensity values as a set of five different files, including the CEL file, which we explain below.
What is a CEL file?
A CEL file is an Affymetrix GeneChip probe results file. It contains the intensity values of each probe set, where the probes represent genes. The CEL file stores the following information:
- intensity values
- the standard deviation of the intensity
- the number of pixels used to calculate the intensity value
- a flag to indicate an outlier as calculated by the algorithm
- and a user-defined flag indicating that the feature should be excluded from further analysis [1].
In other words, it contains information about single nucleotide polymorphisms (SNPs). Information about the probes is extracted from the DNA microarray image data by Affymetrix image analysis software, and genotypes are then derived through a process known as genotype calling. More about this in the following section.
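To make the fields listed above more concrete, here is a minimal Python sketch of how a single CEL feature could be represented in memory and how flagged features could be skipped. The class name, field names, and example values are hypothetical and chosen for illustration only; this is not the actual CEL file layout or a parser for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CelFeature:
    """One feature with the five fields described above (hypothetical layout)."""
    intensity: float  # measured signal intensity
    stdev: float      # standard deviation of the intensity
    n_pixels: int     # number of pixels used to calculate the intensity
    outlier: bool     # flagged as an outlier by the algorithm
    masked: bool      # user-defined flag: exclude from further analysis


def usable_features(features: List[CelFeature]) -> List[CelFeature]:
    """Keep only features that are neither outliers nor masked by the user."""
    return [f for f in features if not (f.outlier or f.masked)]


# Made-up example values, for illustration only.
features = [
    CelFeature(1520.0, 110.5, 9, False, False),
    CelFeature(87.0, 300.2, 9, True, False),   # outlier, dropped
    CelFeature(940.0, 75.1, 9, False, True),   # masked by the user, dropped
]
kept = usable_features(features)
print(len(kept), kept[0].intensity)
```

Only the first feature survives the filter, because the other two carry one of the two exclusion flags.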
What is genotype calling?
Genotype calling is the process of converting the intensity data in CEL files into VCF files containing the user's genotype data. It takes the intensity data from the CEL files, classifies every SNP, assigns a genotype, and produces the data files used by all other processes.
Let’s explain this in more detail. The GeneTitan machine measures the color intensity for each SNP, where each SNP has its own probe to which a stain (color) molecule attaches. Once the probe is photoexcited, it emits a signal in a specific wavelength range, and we record this signal in .CEL/.DAT files. The genotype calling process, in essence, converts these intensity measurements into concrete genotype information (for example CC, AC, or AA).
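The idea of turning intensities into a genotype can be illustrated with a toy Python sketch. It assumes that each SNP yields one signal per allele and that a fixed cut-off on the log2 ratio of the two signals is enough to separate the three genotypes. Both are simplifying assumptions made for this example: the function name, the threshold, and the signal values are hypothetical, and this is not the algorithm used by the actual genotype calling software.

```python
import math


def call_genotype(signal_a: float, signal_b: float,
                  allele_a: str = "A", allele_b: str = "C",
                  threshold: float = 1.0) -> str:
    """Toy genotype call from two positive allele signals.

    Uses a fixed cut-off on the log2 ratio of the signals (an assumption
    for illustration, not a production calling method).
    """
    contrast = math.log2(signal_a / signal_b)
    if contrast > threshold:
        return allele_a + allele_a   # signal dominated by allele A
    if contrast < -threshold:
        return allele_b + allele_b   # signal dominated by allele B
    return allele_a + allele_b       # comparable signals: heterozygous


# Made-up intensity pairs, for illustration only.
print(call_genotype(2000.0, 250.0))   # strong A signal
print(call_genotype(900.0, 1000.0))   # similar signals
print(call_genotype(200.0, 1800.0))   # strong C signal
```

With these made-up values the three calls are AA, AC, and CC, matching the three genotype classes mentioned above.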
Axiom Power Tools
Genotype calling can be performed with either of two tools. Axiom Analysis Suite, provided by Affymetrix, has a graphical user interface and supports various kinds of analysis. Axiom Power Tools is a set of command-line tools for analyzing Affymetrix microarray data.
We use Axiom Power Tools to perform genotype calling automatically in our pipeline, which takes CEL files and generates VCF data. Once genotype calling is finished, we have VCF files for all our users.
Variant Call Format (VCF) files contain meta-information about each SNP, such as its chromosomal position in the genome, an SNP identifier such as the rs number, the reference base, the non-reference bases, a quality score, etc. [2].
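As a rough illustration of these fields, the following Python sketch splits one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The helper name and the example record, including its rs number, are made up for this example. Real files also carry header lines and per-sample genotype columns, which this sketch ignores.

```python
from typing import Dict

# The eight fixed columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line: str) -> Dict[str, object]:
    """Split one VCF data line into its fixed columns (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, object] = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(fields[1])      # position on the chromosome
    record["ALT"] = fields[4].split(",")  # one or more non-reference bases
    return record


# Made-up record, for illustration only.
example = "1\t12345\trs0000001\tA\tC\t50\tPASS\t."
record = parse_vcf_line(example)
print(record["ID"], record["CHROM"], record["POS"], record["REF"], record["ALT"])
```

A dedicated VCF library is usually a better choice for real data; the manual split here is only meant to show where each piece of meta-information lives in a line.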
Extracted data may contain thousands of data points, making the files large. After obtaining our SNP data, we are ready to process it further, including SNP selection and report generation. More on that in the following articles.
Written by: Nermin Đuzić, M.Sc. in Genetics, Content Specialist