CLI_tools source code on github: pairwise.fst

###### Genotype

Populations | AA | Aa | aa |
---|---|---|---|
Subpopulation 1 | 125 | 250 | 125 |
Subpopulation 2 | 50 | 30 | 20 |
Subpopulation 3 | 100 | 500 | 400 |

*N* (number of individuals genotyped, i.e. the sum of each row in the table above):

- Population 1: 500
- Population 2: 100
- Population 3: 1,000

Remember that the number of alleles is **TWICE** the number of genotypes.

**Step 1. Calculate the gene (allele) frequencies**

Each `AA` homozygote contributes two copies of allele `A`, and each heterozygote contributes one. Note that the denominator will be twice `Ni` (twice as many alleles as individuals).

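Eqns FST.1

`p1 = (2*125 + 250) / (2*500) = 0.5`

`p2 = (2*50 + 30) / (2*100) = 0.65`

`p3 = (2*100 + 500) / (2*1000) = 0.35`

`p1` means the frequency of allele `A` in population 1.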
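As a rough illustration of Step 1, the allele counting can be sketched in Python. The names `GENOTYPES` and `allele_freq_A` are chosen for this sketch only; they are assumptions and are not taken from the original material or from the CLI_tools code.

```python
# Genotype counts (AA, Aa, aa) copied from the table above.
GENOTYPES = {
    "Subpopulation 1": (125, 250, 125),
    "Subpopulation 2": (50, 30, 20),
    "Subpopulation 3": (100, 500, 400),
}


def allele_freq_A(n_AA, n_Aa, n_aa):
    """Frequency of allele A: two copies per AA, one per Aa, over 2N alleles."""
    n_individuals = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n_individuals)


for name, counts in GENOTYPES.items():
    p = allele_freq_A(*counts)
    # q follows from p + q = 1 when there are only two alleles.
    print(f"{name}: p = {p:.2f}, q = {1 - p:.2f}")
```

The function takes raw genotype counts rather than frequencies so that the "twice as many alleles as individuals" denominator stays visible in the code.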

**Step 2. Calculate the expected genotypic counts under Hardy-Weinberg Equilibrium**, and then calculate the **excess or deficiency of homozygotes in each subpopulation**:

`Pop. 1 Expected AA = 500*0.5^2 = 125 (= observed)`

Note that sum of two types of homozygote excess = amount of heterozygote deficiency. These quantities have to balance (it’s a mathematical necessity, given that `p + q = 1`).

`Pop. 3 Expected AA = 1,000*0.35^2 = 122.5 (observed has deficiency of 22.5)`

Summary of homozygote deficiency or excess relative to `HWE`:

- Pop. 1. Observed = Expected: perfect fit
- Pop. 2. Excess of 15.5 homozygotes: some inbreeding
- Pop. 3. Deficiency of 45 homozygotes: outbred or experiencing a Wahlund effect (isolate breaking).
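The Step 2 bookkeeping can be sketched as follows. This is a minimal illustration under the two-allele assumption of this example; `hwe_expected_counts` and `homozygote_excess` are hypothetical helper names introduced here, not functions from any library.

```python
def hwe_expected_counts(n_AA, n_Aa, n_aa):
    """Expected (AA, Aa, aa) counts under HWE for one subpopulation."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele a
    return n * p**2, n * 2 * p * q, n * q**2


def homozygote_excess(n_AA, n_Aa, n_aa):
    """Observed minus expected homozygotes (negative means a deficiency)."""
    exp_AA, _exp_Aa, exp_aa = hwe_expected_counts(n_AA, n_Aa, n_aa)
    return (n_AA - exp_AA) + (n_aa - exp_aa)


for label, counts in [("Pop. 1", (125, 250, 125)),
                      ("Pop. 2", (50, 30, 20)),
                      ("Pop. 3", (100, 500, 400))]:
    print(label, round(homozygote_excess(*counts), 2))
```

Because the expected counts sum to the observed total, any homozygote excess returned here is matched by an equal heterozygote deficiency, which is the balance described above.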

**Step 3. Calculate the local observed heterozygosities** of each subpopulation (we will call them `Hobs s`, where the `s` subscript refers to the `sth` of `n` populations – 3 in this example). Here we count genotypes:

`Hobs 1 = 250/500 = 0.5`

**Step 4. Calculate the local expected heterozygosity, or gene diversity, of each subpopulation** (modified version of Eqn 35.1):

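Eqns FST.2

`Hexp 1 = 1 - (0.5^2 + 0.5^2) = 0.5`

`Hexp 2 = 1 - (0.65^2 + 0.35^2) = 0.455`

`Hexp 3 = 1 - (0.35^2 + 0.65^2) = 0.455`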

(With two alleles it would actually be easier to use `2pq` than to use the summation format of Eqn 35.1.)

**Notation**: Note that I am using `p1` and `q1` here (where the subscripts refer to subpopulations 1 through 3). We would need to use multiple subscripts if we were using the notation of Eqn 35.1, where the alleles are `pi` (and the `i` refers to alleles 1 to `k`). Indeed, with real multi-locus multipopulation data, we would have a triple summation and three subscripts: one for alleles (`i = 1 to k`), one for the loci (`l = 1 to m`), and one for subpopulations (`s = 1 to n`).

**Step 5. Calculate the local inbreeding coefficient of each subpopulation** (same as Eqn 35.4, except that we are subscripting for the subpopulations):

Eqn FST.3: `Fs = (Hexp s - Hobs s) / Hexp s`, where `s` (`s = 1 to 3`) refers to the subpopulation

`F1 = (0.5 - 0.5) / 0.5 = 0`
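Steps 3 to 5 can be combined in one small function. The sketch below assumes two alleles and uses the form `(expected - observed) / expected` for the local inbreeding coefficient, which matches the `F1` line above. The function name `local_heterozygosity_and_F` is made up for this illustration.

```python
def local_heterozygosity_and_F(n_AA, n_Aa, n_aa):
    """Return (H_obs, H_exp, F_s) for one subpopulation with two alleles."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    h_obs = n_Aa / n              # Step 3: count heterozygous genotypes
    h_exp = 1 - (p**2 + q**2)     # Step 4: gene diversity (equals 2pq here)
    # Step 5: F_s is undefined if h_exp == 0 (one allele fixed);
    # that case is not handled in this sketch.
    f_s = (h_exp - h_obs) / h_exp
    return h_obs, h_exp, f_s


for label, counts in [("Pop. 1", (125, 250, 125)),
                      ("Pop. 2", (50, 30, 20)),
                      ("Pop. 3", (100, 500, 400))]:
    h_obs, h_exp, f_s = local_heterozygosity_and_F(*counts)
    print(f"{label}: Hobs = {h_obs:.3f}, Hexp = {h_exp:.3f}, F = {f_s:.4f}")
```

A positive `F` indicates fewer heterozygotes than expected and a negative `F` indicates more, which lines up with the Step 2 summary for Pop. 2 and Pop. 3.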

**Step 6. Calculate p (p-bar, the frequency of allele A) over the total population.**

[Note that if we had more alleles we could put this and Step 7 all together as a single “global gene frequencies” step, or have one for each allele frequency].

`(2*125+250+2*50+30+2*100+500)/(1000+200+2000) = 0.4156` {genotype splitting method}

or (yields same answer)

`(0.5*1000+0.65*200+0.35*2000)/(1000+200+2000) = 0.4156` {using Eqn FST.1 values for ps}

Note that we weight by **population size**.

**Step 7. Calculate q (q-bar, the frequency of allele a) over the total population**

`(2*125+250+2*20+30+2*400+500)/(1000+200+2000) = 0.5844`

Check: `p-bar + q-bar = 0.4156 + 0.5844 = 1.0` (as required by Eqn 31.1).

The check doesn’t guarantee that our result is correct, but if they don’t sum to one, we know we miscalculated.
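The two routes to p-bar in Step 6, and the sum-to-one check, can be sketched like this. The function names are illustrative assumptions; the weighting uses the number of alleles (2N) in each subpopulation, as in the denominators above.

```python
GENOTYPES = [(125, 250, 125), (50, 30, 20), (100, 500, 400)]  # (AA, Aa, aa)


def pooled_p_by_counting(genotypes):
    """Genotype splitting: count every copy of A across all subpopulations."""
    a_copies = sum(2 * n_AA + n_Aa for n_AA, n_Aa, _n_aa in genotypes)
    total_alleles = sum(2 * (n_AA + n_Aa + n_aa) for n_AA, n_Aa, n_aa in genotypes)
    return a_copies / total_alleles


def pooled_p_by_weighting(genotypes):
    """Average the local p values, weighting each by its number of alleles."""
    weighted_sum = 0.0
    total_alleles = 0
    for n_AA, n_Aa, n_aa in genotypes:
        n_alleles = 2 * (n_AA + n_Aa + n_aa)
        p_s = (2 * n_AA + n_Aa) / n_alleles
        weighted_sum += p_s * n_alleles
        total_alleles += n_alleles
    return weighted_sum / total_alleles


p_bar = pooled_p_by_counting(GENOTYPES)
q_bar = 1 - pooled_p_by_weighting(GENOTYPES)
print(f"p-bar = {p_bar:.4f}, q-bar = {q_bar:.4f}, sum = {p_bar + q_bar:.1f}")
```

An unweighted mean of the three local frequencies would give a different answer here because the subpopulations differ tenfold in size.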

**Step 8. Calculate the global heterozygosity indices (over Individuals, Subpopulations and Total population)**

Note that the first two calculations employ a weighted average of the values in the whole set of subpopulations.

`HI` based on **observed** heterozygosities in **individuals** in subpopulations

Eqn FST.4: `HI = (0.5*500 + 0.3*100 + 0.5*1000) / 1600 = 0.4875`

`HS` based on **expected** heterozygosities in **subpopulations**

Eqn FST.5: `HS = (0.5*500 + 0.455*100 + 0.455*1000) / 1600 = 0.4691`

`HT` based on **expected** heterozygosities for overall total population (using global allele frequencies and a modified form of Eqn 35.1):

Eqn FST.6: `HT = 1 - (p-bar^2 + q-bar^2) = 1 - (0.4156^2 + 0.5844^2) = 0.4858`

or we could calculate it as `2 * p-bar * q-bar = 2 * 0.4156 * 0.5844 = 0.4858`.
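The three global heterozygosity indices of Step 8 can be sketched as below. This assumes, as my reading of the weighted-average note above, that `HI` and `HS` are averages weighted by the number of individuals in each subpopulation. The function name `global_heterozygosities` is hypothetical.

```python
GENOTYPES = [(125, 250, 125), (50, 30, 20), (100, 500, 400)]  # (AA, Aa, aa)


def global_heterozygosities(genotypes):
    """Return (H_I, H_S, H_T) for two alleles, weighting by subpopulation size."""
    n_total = 0
    sum_obs = 0.0   # running sum of H_obs,s * N_s
    sum_exp = 0.0   # running sum of H_exp,s * N_s
    a_copies = 0    # copies of allele A over the whole population
    for n_AA, n_Aa, n_aa in genotypes:
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2 * n)
        q = 1 - p
        sum_obs += (n_Aa / n) * n
        sum_exp += (1 - (p**2 + q**2)) * n
        a_copies += 2 * n_AA + n_Aa
        n_total += n
    p_bar = a_copies / (2 * n_total)
    q_bar = 1 - p_bar
    h_i = sum_obs / n_total
    h_s = sum_exp / n_total
    h_t = 1 - (p_bar**2 + q_bar**2)
    return h_i, h_s, h_t


h_i, h_s, h_t = global_heterozygosities(GENOTYPES)
print(f"HI = {h_i:.4f}, HS = {h_s:.4f}, HT = {h_t:.4f}")
```

Note that `HT` is computed from the pooled allele frequencies, not by averaging the local expected heterozygosities; that difference is what `FST` later measures.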

**Step 9. CALCULATE THE GLOBAL F-STATISTICS**

Compare and contrast the global `FIS` below with the **local inbreeding coefficient** `Fs` of Step 5. Here we are using a weighted average of the individual heterozygosities over all the subpopulations.

Both `FIS` and `Fs` are, however, based on the **observed** heterozygosities, whereas `FST` and `FIT` are based on expected heterozygosities.

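Eqn FST.7: `FIS = (HS - HI) / HS = -0.0393`

Eqn FST.8: `FST = (HT - HS) / HT = 0.0344`

Eqn FST.9: `FIT = (HT - HI) / HT = -0.0036`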
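A minimal sketch of Step 9 follows, using the standard definitions of the three F-statistics in terms of `HI`, `HS` and `HT`. The input values are recomputed from the genotype table under the assumption that `HI` and `HS` are size-weighted averages; the function name `f_statistics` is chosen for this example only.

```python
def f_statistics(h_i, h_s, h_t):
    """Return (F_IS, F_ST, F_IT) from the three heterozygosity indices."""
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Unrounded inputs derived from the genotype table (assumed weighting by N):
h_i = (250 + 30 + 500) / 1600                  # observed heterozygotes / individuals
h_s = (0.5 * 500 + 0.455 * 100 + 0.455 * 1000) / 1600
h_t = 2 * (1330 / 3200) * (1870 / 3200)        # 2 * p-bar * q-bar

f_is, f_st, f_it = f_statistics(h_i, h_s, h_t)
print(f"FIS = {f_is:.4f}, FST = {f_st:.4f}, FIT = {f_it:.4f}")

# Consistency check: (1 - FIT) should equal (1 - FIS) * (1 - FST).
print(abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12)
```

Working from unrounded inputs avoids small last-digit differences that appear if the four-decimal values of `HI`, `HS` and `HT` are substituted by hand.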

**Notation note**: the subscripts `I`, `S`, and `T` are not summation subscripts. They simply indicate the level of our analysis. Likewise, the `s` on `Fs` in Step 5 or on the `ps` in Step 1 (the `s` was implicit there) just tells us what we are referring to. In contrast, the subscripts for Eqn 35.1 and 35.2 are used in summations and change as we work through the pieces of the calculation.

**Step 10. Finally, draw some conclusions about the genetic structure of the population and its subpopulations**.

- One of the possible HWE conclusions we could make:
  - Pop. 1 is consistent with `HWE` (results of Step 2).
- Two of the possible “local inbreeding” conclusions we could make from Step 5:
  - Pop. 2 is inbred (results of Step 5), and
  - Pop. 3 may have disassortative mating or be experiencing a Wahlund effect (more heterozygotes than expected).
- Conclusion concerning overall degree of genetic differentiation (`FST`):
  - Subdivision of populations, possibly due to genetic drift, accounts for approx. 3.4% of the total genetic variation (result of Eqn FST.8 `FST` calculation in Step 9).
- No excess or deficiency of heterozygotes over the total population (`FIT` is nearly zero).