May 13, 2026

How to Validate Reference Genes with geNorm and NormFinder

If you're using GAPDH as your reference gene because "everyone in the lab uses it," you probably have unstable normalisation and don't know it. Validating reference genes with algorithms like geNorm and NormFinder is the single most impactful thing you can do to improve the reliability of your RT-qPCR data — and it takes maybe half a day of bench work plus an hour of analysis.

The short version: run a panel of 6–8 candidate reference genes across all your experimental conditions, then feed the Ct (or Cq) values into geNorm and NormFinder. geNorm ranks genes by pairwise variation and tells you how many reference genes you need. NormFinder estimates intra- and intergroup variation and picks the single most stable gene. Use both. If they agree, you're solid. If they disagree, look at why — it usually tells you something useful about your experimental system.

Choosing Your Candidate Panel

You need candidates before you can validate them, and the candidates should be biologically diverse. If you pick five ribosomal genes, they'll all co-regulate under translational stress and geNorm will happily tell you they're "stable" — because it measures pairwise consistency, not absolute stability. That's the most common pitfall with geNorm, and I'll come back to it.

A reasonable starting panel for mammalian cells:

GAPDH — glycolysis
ACTB — cytoskeleton
HPRT1 — purine salvage
B2M — MHC class I component
TBP — basal transcription
YWHAZ — signal transduction
RPL13A or RPLP0 — ribosomal (pick one, not both)
SDHA — mitochondrial electron transport

For plant work, PP2A, UBC, EF1α, and TUA are good starting points. For zebrafish, eef1a1l1 and rpl13a tend to do well, but validate anyway — that's the whole point.

Run these across every condition in your experiment: all treatment groups, all time points, all tissue types. Use the same RNA samples you'll use (or comparable ones) for your actual study. Minimum of three biological replicates per group, though five or more gives you much more confidence in the stability rankings. Keep the whole validation on a single plate if you can (or at least all samples for a given gene on the same plate); this minimises technical noise in the validation itself.

Each assay should have verified efficiency between 90–110% (ideally 95–105%) before you start. If efficiencies differ wildly between genes, the algorithms' assumptions start to break down, particularly for geNorm which assumes comparable efficiencies in its pairwise comparisons. Run standard curves with a 5-point, 4-fold dilution series (or 5-fold, your call) and confirm R² ≥ 0.98.
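As a rough illustration of that efficiency check, the sketch below fits Cq against log10 of the relative input for a dilution series and converts the slope to an efficiency via E = 10^(-1/slope). The function name and the Cq values are invented for the example; it uses plain least squares and nothing instrument-specific.

```python
import math

def standard_curve(log10_inputs, cqs):
    """Least-squares fit of Cq vs log10(input).

    Returns (slope, efficiency in percent, R squared).
    """
    n = len(cqs)
    mx = sum(log10_inputs) / n
    my = sum(cqs) / n
    sxx = sum((x - mx) ** 2 for x in log10_inputs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_inputs, cqs))
    syy = sum((y - my) ** 2 for y in cqs)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)
    amplification = 10 ** (-1.0 / slope)  # E, i.e. 2.0 for perfect doubling
    efficiency_pct = (amplification - 1.0) * 100.0
    return slope, efficiency_pct, r2

# Hypothetical 5-point, 4-fold dilution series (relative input 1, 1/4, 1/16, ...)
log10_inputs = [-i * math.log10(4) for i in range(5)]
cqs = [20.1, 22.1, 24.2, 26.1, 28.2]  # invented example values

slope, eff, r2 = standard_curve(log10_inputs, cqs)
passes = 90.0 <= eff <= 110.0 and r2 >= 0.98
print(f"slope={slope:.2f}  efficiency={eff:.1f}%  R2={r2:.4f}  pass={passes}")
```

The returned amplification factor (1 + efficiency/100) is the E you would carry forward as the gene-specific efficiency when converting Ct to relative quantities.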

How geNorm Works (and What M and V Actually Mean)

geNorm was published by Vandesompele et al. (2002) and remains the most widely used reference gene stability algorithm. It calculates a gene-stability measure M for each candidate by looking at the average pairwise variation of that gene with all other candidates across your samples.

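Here's the logic: if two genes keep a constant expression ratio (equivalently, a constant Ct difference when their efficiencies match) across all your samples, they're either both stable or both co-regulated. geNorm assumes that co-regulation of all candidates is unlikely (which is why your panel needs biological diversity). At each step, the gene with the highest M value is the least stable and gets eliminated, and M is recalculated for the genes that remain. The algorithm repeats this, removing one gene at a time, until two genes remain. Those last two can't be ranked against each other, so they're reported as the best pair.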
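To make the pairwise idea concrete, here is a minimal sketch of the M calculation and the stepwise elimination, as I understand them from Vandesompele et al. (2002). It is not the geNorm or qBase+ code: the function names and the example quantities are invented, and the input is assumed to be relative quantities on the linear scale.

```python
import math
from statistics import mean, stdev

def genorm_m(quantities):
    """quantities: {gene: [relative quantity per sample]} -> {gene: M}.

    M for a gene is the mean, over all other genes, of the standard
    deviation of the log2 expression ratios across samples.
    """
    genes = list(quantities)
    m = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(quantities[j], quantities[k])]
            pairwise_sd.append(stdev(log_ratios))
        m[j] = mean(pairwise_sd)
    return m

def genorm_eliminate(quantities):
    """Drop the highest-M gene until two remain.

    Returns ([(gene, M at elimination), ...] from least stable onwards,
             (the two surviving genes)).
    """
    remaining = dict(quantities)
    eliminated = []
    while len(remaining) > 2:
        m = genorm_m(remaining)
        worst = max(m, key=m.get)
        eliminated.append((worst, m[worst]))
        del remaining[worst]
    return eliminated, tuple(remaining)

# Invented relative quantities for four samples (not real data)
example = {
    "GENE_A": [1.00, 0.50, 0.25, 0.80],
    "GENE_B": [1.00, 0.55, 0.24, 0.85],
    "GENE_C": [0.40, 1.00, 0.90, 0.30],
}
print(genorm_m(example))
print(genorm_eliminate(example))
```

Note that two genes which rise and fall together get low pairwise SDs from this calculation whether or not either one is truly stable, which is exactly the co-regulation problem.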

Interpreting M values:

M < 0.5: highly stable (typical for homogeneous cell line experiments)
M < 1.0: acceptable for most experiments
M > 1.5: unstable — don't use this gene under these conditions

The other key output is the pairwise variation Vn/n+1, which tells you how many reference genes you need. V compares the normalisation factor calculated from the n most stable genes with the one calculated from n+1 genes. The original paper suggests a cutoff of 0.15: once Vn/n+1 drops below that, adding the next gene barely changes your normalisation factor, and n genes are enough. Treat 0.15 as a guideline rather than a hard threshold.
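For the curious, V can be sketched in a few lines. This assumes the definition from the original paper as I read it: the normalisation factor is the per-sample geometric mean of the chosen genes, and Vn/n+1 is the standard deviation of the log2 ratios between successive normalisation factors. Function names and data are invented for illustration.

```python
import math
from statistics import stdev

def normalisation_factor(quantities, genes):
    """Per-sample geometric mean of the relative quantities of `genes`."""
    n_samples = len(quantities[genes[0]])
    return [
        math.exp(sum(math.log(quantities[g][s]) for g in genes) / len(genes))
        for s in range(n_samples)
    ]

def pairwise_variation(quantities, ranked_genes):
    """ranked_genes: most stable first. Returns {"V2/3": ..., "V3/4": ...}."""
    v = {}
    for n in range(2, len(ranked_genes)):
        nf_n = normalisation_factor(quantities, ranked_genes[:n])
        nf_next = normalisation_factor(quantities, ranked_genes[:n + 1])
        log_ratios = [math.log2(a / b) for a, b in zip(nf_n, nf_next)]
        v[f"V{n}/{n + 1}"] = stdev(log_ratios)
    return v

# Invented relative quantities, genes already ordered from most to least stable
example = {
    "GENE_A": [1.00, 0.50, 0.25, 0.80],
    "GENE_B": [1.00, 0.55, 0.24, 0.85],
    "GENE_C": [1.00, 0.45, 0.30, 0.70],
    "GENE_D": [0.40, 1.00, 0.90, 0.30],
}
print(pairwise_variation(example, ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]))
```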

In practice:

If V2/3 < 0.15, two reference genes suffice.
If V2/3 > 0.15 but V3/4 < 0.15, use three.
If nothing drops below 0.15, your system is highly variable and you should consider whether your candidate panel is adequate — or whether your experimental conditions are simply too heterogeneous for a single normalisation strategy.
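The decision rule itself is simple enough to write down. A small sketch, with a hypothetical helper name and the 0.15 cutoff as an adjustable default:

```python
def genes_needed(v_values, cutoff=0.15):
    """v_values: [V2/3, V3/4, V4/5, ...] in that order.

    Returns the smallest n for which Vn/n+1 is below the cutoff,
    or None if no value drops below it.
    """
    for n, v in enumerate(v_values, start=2):
        if v < cutoff:
            return n
    return None

print(genes_needed([0.19, 0.13]))  # V2/3 too high, V3/4 below cutoff
print(genes_needed([0.30, 0.21]))  # nothing below cutoff
```

A `None` here is the "rethink your panel or your design" case, not a licence to use every gene you measured.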

A common scenario: you're comparing gene expression across four different mouse tissues. V2/3 is 0.22, V3/4 is 0.18, V4/5 is 0.11. Because V4/5 is the first value below 0.15, adding a fifth gene no longer changes the normalisation factor appreciably, so you'd use the top four reference genes. Yes, that's a lot of reference genes. That's also why single-tissue experiments are so much easier to normalise than multi-tissue panels.

The co-regulation trap: If you include RPL13A, RPLP0, and RPS18 (all ribosomal), geNorm will see them as beautifully stable pairs. They are — relative to each other. But if ribosomal biogenesis shifts across your conditions, all three move together and geNorm can't detect it. This is a fundamental limitation of pairwise comparison methods. It's not a bug, it's the math. Mitigate it by ensuring your panel spans different functional categories.

How NormFinder Works (and When It Disagrees with geNorm)

NormFinder (Andersen et al., 2004) takes a different approach. Instead of pairwise comparisons, it uses a model-based method that estimates both intragroup and intergroup variation for each candidate gene. You define your experimental groups (e.g., control vs. treated, or tissue A vs. tissue B vs. tissue C), and the algorithm separates systematic variation between groups from random variation within groups.

The output is a stability value — lower is better, similar to geNorm's M. But the key difference: NormFinder can identify a gene that is individually stable even if it doesn't pair well with others. It can also suggest the best combination of two genes, which is often more informative than the best single gene.
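To show what "intragroup" and "intergroup" mean in practice, here is a deliberately simplified decomposition on log2-scale data. It is not the NormFinder estimator: the published method adds variance corrections and combines the two components into a single stability value, none of which is attempted here. All names and numbers are invented, and each group needs at least two samples.

```python
from statistics import mean

def group_decomposition(log_expr, groups):
    """log_expr: {gene: [log2 expression per sample]}; groups: [label per sample].

    Returns {gene: {"inter": {group: deviation}, "intra_sd": {group: sd}}}.
    """
    genes = list(log_expr)
    labels = sorted(set(groups))
    idx = {g: [s for s, lab in enumerate(groups) if lab == g] for g in labels}
    # Mean over genes for each sample (stands in for overall sample input)
    sample_mean = [mean(log_expr[i][s] for i in genes) for s in range(len(groups))]
    gene_group = {i: {g: mean(log_expr[i][s] for s in idx[g]) for g in labels}
                  for i in genes}
    group_mean = {g: mean(gene_group[i][g] for i in genes) for g in labels}
    gene_mean = {i: mean(gene_group[i][g] for g in labels) for i in genes}
    grand = mean(list(gene_mean.values()))
    out = {}
    for i in genes:
        # Intergroup: how far this gene's group mean departs from expectation
        inter = {g: gene_group[i][g] - group_mean[g] - gene_mean[i] + grand
                 for g in labels}
        # Intragroup: spread of residuals within each group
        intra = {}
        for g in labels:
            resid = [log_expr[i][s] - gene_group[i][g] - sample_mean[s] + group_mean[g]
                     for s in idx[g]]
            intra[g] = (sum(r * r for r in resid) / (len(resid) - 1)) ** 0.5
        out[i] = {"inter": inter, "intra_sd": intra}
    return out

# Invented log2 expression values: GENE_C shifts upward in the treated group
example = {
    "GENE_A": [0.0, 0.1, 0.0, 0.1],
    "GENE_B": [0.1, 0.0, 0.1, 0.0],
    "GENE_C": [0.0, 0.1, 3.0, 3.1],
}
result = group_decomposition(example, ["control", "control", "treated", "treated"])
for gene, parts in result.items():
    print(gene, parts)
```

A gene with small intergroup deviations and small intragroup spread is the kind NormFinder rewards. With a panel this small, one shifting gene also drags the group means, which is one more reason to run six to eight candidates.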

When geNorm and NormFinder disagree:

This happens more often than you'd think, and it's usually informative. The typical scenario:

geNorm ranks GAPDH and ACTB as the top pair because they track each other tightly.
NormFinder ranks TBP first because it has low intergroup variation, even though it doesn't correlate with other genes as well.

In this case, NormFinder is probably giving you the better answer. GAPDH and ACTB might be co-regulated (both respond to cell proliferation rate, metabolic state, or serum concentration), and their agreement with each other masks their shared instability. TBP, which sits in a completely different pathway, may genuinely be less variable across your conditions.

My rule of thumb: trust NormFinder for ranking the single best gene, trust geNorm for telling you how many genes you need. Use both and look for consensus in the top 3–4 genes. If a gene appears in the top tier of both algorithms, use it with confidence.

Running the Analysis: A Practical Walkthrough

Let's say you have Ct data for 6 candidate genes across 4 groups (control, treatment A, treatment B, treatment C), 4 biological replicates each — 16 samples total. Here's the workflow.

1. Organise your data. You need a matrix: genes in columns, samples in rows, raw Ct values in cells. Average your technical replicates first (and flag any with SD > 0.5 Ct — those need re-running). This gives you a 16 × 6 table.
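If your export is one row per well, the averaging and flagging can be sketched like this. The data layout (a dict keyed by sample and gene) and the function name are assumptions for the example, so adapt them to whatever your instrument software produces.

```python
from statistics import mean, stdev

def summarise_replicates(wells, max_sd=0.5):
    """wells: {(sample, gene): [Ct of each technical replicate]}.

    Returns (mean Ct per sample/gene, list of keys whose SD exceeds max_sd).
    """
    means, flagged = {}, []
    for key, cts in wells.items():
        means[key] = mean(cts)
        # SD needs at least two replicates; single wells cannot be checked
        if len(cts) > 1 and stdev(cts) > max_sd:
            flagged.append(key)
    return means, flagged

# Invented triplicate Ct values
wells = {
    ("ctrl_1", "TBP"): [24.1, 24.2, 24.0],
    ("ctrl_1", "GAPDH"): [18.0, 18.1, 19.4],  # one well is clearly off
}
means, flagged = summarise_replicates(wells)
print(means)
print("re-run:", flagged)
```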

2. Convert Ct to linear scale for geNorm. geNorm works on relative quantities, not raw Ct. Convert each Ct to a relative quantity using:

$$Q = E^{(\text{Ct}_{\min} - \text{Ct}_{\text{sample}})}$$

where E is the gene's amplification efficiency (e.g., 2.0 for 100% efficiency) and Ct_min is the lowest Ct observed for that gene across all samples. This ensures the highest-expressed sample gets a value of 1 and everything else is a fraction.

If your efficiencies are all close to 2.0, you can simplify to $Q = 2^{(\text{Ct}_{\min} - \text{Ct}_{\text{sample}})}$. If they're not close, use gene-specific efficiencies; this matters.
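The conversion is a one-liner per gene. A minimal sketch, with a made-up function name, where `efficiency` is the amplification factor E (2.0 for 100%):

```python
def relative_quantities(cts, efficiency=2.0):
    """Convert one gene's Ct values (one per sample) to relative quantities.

    The sample with the lowest Ct gets 1.0; all others are fractions of it.
    """
    ct_min = min(cts)
    return [efficiency ** (ct_min - ct) for ct in cts]

print(relative_quantities([20.0, 21.0, 22.0]))       # [1.0, 0.5, 0.25]
print(relative_quantities([20.0, 21.0, 22.0], 1.9))  # gene-specific E of 1.9
```

Apply it column by column to your Ct table, passing each gene its own E if the efficiencies differ.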

3. Run geNorm. Feed the relative quantities into the geNorm algorithm. You can use:

qBase+ (the commercial successor to the original geNorm Excel applet, now part of CellCarta's Biogazelle suite)
The NormqPCR package in R (Bioconductor)
The ctrlGene R package
RefFinder (web tool at https://www.ciidirsinaloa.com.mx/RefFinder-master/) — runs geNorm, NormFinder, BestKeeper, and delta-Ct method simultaneously

Record the M values and the V values. Plot them — the stepwise elimination chart and the pairwise variation bar chart are the two figures you'll want for your paper's supplementary materials.

4. Run NormFinder. NormFinder takes raw Ct values or log-transformed linear quantities, depending on the implementation. Critically, you must define your experimental groups — this is what enables the intergroup variance estimation. Feed it the same 16 × 6 matrix plus a group identifier column.

The original NormFinder is an Excel add-in (still works fine), or use the R implementation in the NormqPCR package. It outputs a stability value per gene and the best two-gene combination.

5. Compare and decide. Make a table showing the ranking from each algorithm. Something like:

Rank	geNorm	NormFinder
1	TBP	TBP
2	YWHAZ	SDHA
3	SDHA	YWHAZ
4	HPRT1	HPRT1
5	ACTB	B2M
6	B2M	ACTB

In this (hypothetical but realistic) example, TBP wins both. YWHAZ and SDHA swap positions but are both in the top tier. Use TBP and YWHAZ (or TBP and SDHA) as your two reference genes. Check the geNorm V2/3 — if it's < 0.15, two genes are enough and you're done.
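If you'd rather not eyeball the table, a simple way to look for consensus is to average each gene's rank across the algorithms. This is just one reasonable choice of mine (an arithmetic mean of ranks), not something either algorithm prescribes, and the function name is hypothetical. The rankings are the ones from the table above.

```python
def consensus_ranking(*rankings):
    """Each ranking is a list of gene names, most stable first.

    Returns [(gene, mean rank), ...] sorted from best to worst;
    ties keep the order of the first ranking.
    """
    genes = rankings[0]
    mean_rank = {
        g: sum(r.index(g) + 1 for r in rankings) / len(rankings)
        for g in genes
    }
    return sorted(mean_rank.items(), key=lambda kv: kv[1])

genorm = ["TBP", "YWHAZ", "SDHA", "HPRT1", "ACTB", "B2M"]
normfinder = ["TBP", "SDHA", "YWHAZ", "HPRT1", "B2M", "ACTB"]

for gene, rank in consensus_ranking(genorm, normfinder):
    print(f"{gene}\t{rank}")
```

Tied mean ranks (YWHAZ and SDHA here) are a signal to look at the underlying M and stability values rather than to trust the ordering.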

That's a lot of manual wrangling. Converting Ct to relative quantities, iterating geNorm, recording M and V values — VoilaPCR Plus does the geNorm half for you: upload your validation plate and get M values, a stability ranking, and the V-based count of how many reference genes you need.

Mistakes I See Regularly

Validating in one condition and applying to another. You validated reference genes in your cell line under normoxia vs. hypoxia, then six months later you're using the same genes for a drug treatment experiment. Re-validate. It takes one qPCR plate.

Using too few samples. Running three samples per group with two groups gives you six data points per gene. That's not enough for stable rankings. geNorm's pairwise approach is particularly sensitive to small sample sizes — the rankings can shuffle with the addition of a single outlier sample. Aim for at least 12–15 total samples.

Ignoring efficiency differences. If GAPDH runs at 98% efficiency and HPRT1 runs at 88% efficiency, the Ct gap between them shifts systematically with template concentration. This biases geNorm's pairwise calculation. Either use efficiency-corrected relative quantities or get your assays into the 95–105% range before validating.
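You can put a number on that drift with a little arithmetic: the cycles needed to make up a dilution are log(fold) / log(E), so two assays with different E respond differently to the same change in template. A quick sketch using the 98% and 88% figures above (the function name is invented):

```python
import math

def cycles_per_dilution(efficiency_pct, fold=10.0):
    """Cycles needed to compensate for a `fold` dilution at this efficiency."""
    amplification = 1.0 + efficiency_pct / 100.0
    return math.log(fold) / math.log(amplification)

shift = cycles_per_dilution(88.0) - cycles_per_dilution(98.0)
print(f"Ct gap drifts by about {shift:.2f} cycles per 10-fold dilution")
```

That works out to roughly 0.28 cycles of drift in the GAPDH-HPRT1 Ct gap for every 10-fold change in input, which is variation geNorm will read as instability even if both genes are perfectly well behaved.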

Reporting geNorm M values without saying which software you used. The original geNorm and qBase+ calculate M identically, but some implementations (including some older RefFinder versions) have had bugs or use slightly different elimination procedures. State your tool and version.

Not including this analysis in your paper. MIQE guidelines (Bustin et al., 2009) explicitly require reference gene validation. Reviewers increasingly check for it. Put the geNorm M and V charts and the NormFinder stability values in your supplementary data. It takes one supplementary figure and one supplementary table, and it preempts the most common reviewer criticism of any qPCR-based paper.

Wrapping Up

Once you've validated your reference genes, the rest of your analysis — ΔΔCt calculations, statistical comparisons, fold-change plots — becomes dramatically more trustworthy. It's the foundation, and skipping it is like running a Western blot without confirming your antibody specificity. If you want to skip the manual data wrangling and spreadsheet gymnastics, VoilaPCR can run stability analysis on your candidate reference genes and flag unstable ones automatically when you upload your qPCR data. But whether you use a tool or do it in R or Excel, just do it. Your data deserves stable normalisation.