What is the output from genotyping and how do we process it? – BioCertica

Written by: Nermin Đuzić, M.Sc. in Genetics, Content Specialist

At the beginning of our series of articles, we introduced you to the basics of genetics and genetic testing. Then we explained DNA sample collection and extraction, followed by quality checks. These steps were necessary before genotyping, which we explained in the last few articles.

Now you may ask yourself: what happens after genotyping? How do we obtain results and user data from the GeneTitan machine? Let’s look at Figure 1 below, which shows the scheme that we follow to process genotyping data.

Figure 1: Genotyping data processing

After genotyping on the GeneTitan machine, we obtain raw signal intensity values that have to be converted to meaningful genotype data. These intensity values are stored as a set of five different files, including the CEL file, which we will explain below.

A CEL file is an Affymetrix GeneChip probe results file. It contains the intensity values for each probe set, where on a genotyping array each probe targets a specific genetic variant. The following information is stored within the CEL file:

  • intensity values
  • the standard deviation of the intensity
  • the number of pixels used to calculate the intensity value
  • a flag to indicate an outlier as calculated by the algorithm 
  • and a user-defined flag indicating that the feature should be excluded from future analysis [1].

In other words, it contains information about single nucleotide polymorphisms (SNPs). Probe information is extracted from the DNA microarray image data by Affymetrix image analysis software and then turned into genotypes by a process known as genotype calling. More about this in the following section.
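
The per-feature fields listed above can be pictured as a small record. The following Python sketch is only an illustration: the class name `CelFeature`, its field names, the sample values, and the filtering rule are all assumptions made for this example, and it does not read the real binary CEL format.

```python
from dataclasses import dataclass


@dataclass
class CelFeature:
    """One feature of a CEL file, modelled on the fields listed above.

    The field names are hypothetical and chosen for readability.
    """
    intensity: float   # intensity value
    stdev: float       # standard deviation of the intensity
    n_pixels: int      # number of pixels used to calculate the intensity
    is_outlier: bool   # outlier flag calculated by the algorithm
    is_masked: bool    # user-defined flag: exclude from future analysis


def usable_features(features):
    """Keep only features that are neither outliers nor user-masked."""
    return [f for f in features if not (f.is_outlier or f.is_masked)]


# Made-up values, purely for demonstration.
features = [
    CelFeature(1520.0, 85.3, 16, False, False),
    CelFeature(310.5, 40.1, 16, True, False),
    CelFeature(980.2, 60.7, 12, False, True),
]

kept = usable_features(features)
print(len(kept), "of", len(features), "features kept")
```

Dropping flagged features before further analysis is one plausible way to use the two flags; an actual pipeline may treat them differently.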

What is genotype calling?

Genotype calling is the process that converts the intensity data in CEL files into VCF files containing the user's genotype data. It takes the intensity data from the CEL files, classifies every SNP, assigns a genotype to each one, and produces the genotype data files used by all downstream processes.

Let’s explain this in more detail. The GeneTitan machine measures the color intensity for each SNP, where each SNP has its own probe to which a stain/color molecule is attached. Once the probe is photoexcited, it emits a signal in a specific wavelength range, and this is recorded in .CEL/.DAT files. In essence, genotype calling converts these intensity measurements into concrete genotype information (for example CC, AC, or AA).
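
To make the idea of turning intensities into genotypes more concrete, here is a deliberately simplified Python sketch. It assumes that each SNP yields one signal per allele and that a fixed threshold on the log2 ratio of the two signals separates the three genotypes. The function name `call_genotype`, the threshold, and the sample intensities are all hypothetical. Real genotype calling software is considerably more sophisticated and typically clusters the signals of many samples together, so this is not the algorithm used in our pipeline.

```python
import math


def call_genotype(intensity_a, intensity_b,
                  allele_a="A", allele_b="C", threshold=1.0):
    """Toy genotype call from two allele signal intensities.

    Compares the signals on a log2 scale:
      strongly A-dominated -> homozygous for allele A
      strongly B-dominated -> homozygous for allele B
      roughly balanced     -> heterozygous
    Returns None (no call) if either intensity is not positive.
    """
    if intensity_a <= 0 or intensity_b <= 0:
        return None
    contrast = math.log2(intensity_a / intensity_b)
    if contrast > threshold:
        return allele_a + allele_a
    if contrast < -threshold:
        return allele_b + allele_b
    return allele_a + allele_b


# Made-up intensities, purely for demonstration.
print(call_genotype(2400.0, 300.0))   # AA
print(call_genotype(1000.0, 1100.0))  # AC
print(call_genotype(300.0, 2400.0))   # CC
```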

Genotype calling can be performed either with Axiom Analysis Suite, software provided by Affymetrix that has a graphical user interface and supports various analyses, or with Analysis Power Tools (APT, formerly Affymetrix Power Tools), a set of command-line tools for analyzing Affymetrix microarray data.

We use APT to perform genotype calling automatically in our pipeline, which takes CEL files and generates VCF data. Once genotype calling is finished, we obtain VCF files for all our users.

Variant Call Format (VCF) files contain meta-information about each SNP, such as its chromosomal position in the genome, an SNP identifier such as an rs number, the reference base, the non-reference (alternate) bases, a quality score, etc. [2].
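
As a rough illustration of how these fields appear in practice, the Python sketch below splits one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The helper name `parse_vcf_line` is hypothetical, and the example line, including the identifier `rs0000001`, is made up for demonstration. For real work, an established VCF library is a better choice than hand-written parsing.

```python
# The eight fixed columns of a VCF data line, in order.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the fixed columns of a single VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based position
    record["ALT"] = record["ALT"].split(",")  # several alternates possible
    # A "." in the QUAL column means the quality score is missing.
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    return record


# Made-up data line, purely for demonstration.
example = "1\t12345\trs0000001\tA\tC\t50\tPASS\t."
rec = parse_vcf_line(example)
print(rec["CHROM"], rec["POS"], rec["ID"], rec["REF"], rec["ALT"], rec["QUAL"])
```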

The extracted data may contain thousands of data points, which makes the files large. After obtaining our SNP data, we are ready to process them further, including SNP selection and report generation. More on that in the following articles.

References

[1] Affymetrix. (n.d.). Affymetrix® CEL Data File Format.

[2] IGSR: The International Genome Sample Resource. (n.d.). VCF (Variant Call Format) version 4.0. 1000 Genomes (internationalgenome.org).