Python modules

Access functions in the Python module with commands like the following:

import selectiontest
selectiontest.selectiontest.test_neutrality(sfs, variates0=None, variates1=None, reps=10000)

Calculate \(\rho\), the log odds ratio of the data for the distribution given by variates0 over the distribution given by variates1.

Parameters:
  • sfs (list) – Site frequency spectrum, e.g. [1, 3, 0, 2, 1]
  • variates0 (numpy array) – Array of variates from null hypothesis distribution. Default uses Wright-Fisher model.
  • variates1 (numpy array) – Array of variates from alternative distribution. Default uses the uniform model.
  • reps (int) – Number of variates to generate if default is used.
Returns:

\(\rho\) (value of log odds ratio). Values may be inf, -inf or nan if one or both probabilities underflow to zero.

Return type:

numpy.float64
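
As an illustration of what the log odds ratio measures (a minimal sketch, not the package's implementation; `log_odds_sketch` is a hypothetical name), each variate can be viewed as a probability vector over SFS classes, and the likelihood of the data under a distribution is the average multinomial likelihood over its variates. The multinomial coefficient is identical in numerator and denominator, so it cancels in the ratio:

```python
import math

def log_odds_sketch(sfs, variates0, variates1):
    # Average the (unnormalized) multinomial likelihood of the SFS over
    # each set of probability-vector variates, then take log10 of the ratio.
    def mean_likelihood(variates):
        total = 0.0
        for q in variates:
            lik = 1.0
            for count, p in zip(sfs, q):
                lik *= p ** count
            total += lik
        return total / len(variates)
    return math.log10(mean_likelihood(variates0) / mean_likelihood(variates1))

# A spectrum rich in low-frequency variants scores higher under the
# distribution that puts more weight on the first class.
rho = log_odds_sketch([2, 1], [[0.7, 0.3], [0.8, 0.2]], [[0.5, 0.5]])
```

A positive \(\rho\) favours the distribution given by variates0; a negative value favours variates1.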

selectiontest.selectiontest.calculate_D(sfs)

Calculate Tajima’s D from a site frequency spectrum.

Parameters:
  • sfs (list) – Site frequency spectrum, e.g. [1, 3, 0, 2, 1]
Returns:

Value of Tajima’s D.

Return type:

numpy.float64

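
For reference, Tajima's D can be computed from the SFS alone with the standard formula (an illustrative re-derivation, not the package source; `tajimas_d_sketch` is a hypothetical name):

```python
import math

def tajimas_d_sketch(sfs):
    # Standard Tajima's D: difference between mean pairwise diversity (pi)
    # and Watterson's estimator (S / a1), scaled by its sampling variance.
    n = len(sfs) + 1                     # sample size
    S = sum(sfs)                         # number of segregating sites
    if S == 0:
        return float('nan')
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # mean pairwise difference from the SFS
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / (n * (n - 1) / 2.0)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of low-frequency variants drives D negative; an excess of intermediate-frequency variants drives it positive.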
selectiontest.selectiontest.sample_wf_distribution(n, reps)

Calculate variates for the probability distribution Q under the Wright-Fisher model.

Parameters:
  • n (int) – Sample size
  • reps (int) – Number of variates to generate.
Yields:

numpy.ndarray – Array of variates (length n - 1)
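
To illustrate the idea (a pure-Python sketch under the standard Kingman coalescent; the package's actual sampling scheme may differ, and `wf_branch_variates_sketch` is a hypothetical name), one can simulate a coalescent tree and record the relative total branch length subtending i sample members, for i = 1 to n - 1:

```python
import random

def wf_branch_variates_sketch(n, reps, rng=random):
    # While k lineages remain, draw an exponential epoch length with rate
    # k(k-1)/2, credit that length to the SFS class of every live lineage
    # (the number of sample members it subtends), then merge a random pair.
    for _ in range(reps):
        lineages = [1] * n               # leaves subtended by each lineage
        lengths = [0.0] * (n - 1)        # class i is stored at index i - 1
        while len(lineages) > 1:
            k = len(lineages)
            t = rng.expovariate(k * (k - 1) / 2.0)
            for leaves in lineages:
                lengths[leaves - 1] += t
            a, b = rng.sample(range(k), 2)
            merged = lineages[a] + lineages[b]
            lineages = [x for i, x in enumerate(lineages) if i not in (a, b)]
            lineages.append(merged)
        total = sum(lengths)
        yield [x / total for x in lengths]
```

Each yielded vector is a point on the probability simplex, so it can play the role of a variates0 row in test_neutrality.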

selectiontest.selectiontest.sample_uniform_distribution(n, reps)

Calculate variates for the uniform probability distribution Q.

Parameters:
  • n (int) – Sample size
  • reps (int) – Number of variates to generate.
Returns:

Array of variates, shape (reps, n - 1)

Return type:

numpy.ndarray
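
Sampling uniformly from the probability simplex can be sketched with normalized standard exponentials, which is equivalent to Dirichlet(1, ..., 1) (an illustrative sketch, not the package source; `uniform_simplex_variates_sketch` is a hypothetical name):

```python
import random

def uniform_simplex_variates_sketch(n, reps, rng=random):
    # Normalized i.i.d. Exp(1) draws are Dirichlet(1, ..., 1), i.e.
    # uniform on the (n-2)-dimensional probability simplex.
    out = []
    for _ in range(reps):
        e = [rng.expovariate(1.0) for _ in range(n - 1)]
        s = sum(e)
        out.append([x / s for x in e])
    return out
```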

selectiontest.selectiontest.compute_threshold(n, seg_sites, sreps=10000, wreps=10000, fpr=0.02)

Calculate threshold value of \(\rho\) corresponding to a given false positive rate (FPR). For values of \(\rho\) above the threshold we reject the null (by default neutral) hypothesis.

Parameters:
  • n (int) – Sample size
  • seg_sites (int) – Number of segregating sites in sample.
  • sreps (int) – Number of SFS configurations and of uniform variates to generate.
  • wreps (int) – Number of Wright-Fisher variates to generate.
  • fpr (float) – Selected FPR tolerance.
Returns:

Threshold value for log odds ratio

Return type:

numpy.float64
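
The core idea is an empirical quantile: simulate \(\rho\) under the null hypothesis many times and return the value exceeded by a fraction fpr of the simulated statistics (a minimal sketch of the principle, not the package's implementation; `empirical_threshold_sketch` is a hypothetical name):

```python
import math

def empirical_threshold_sketch(null_rhos, fpr=0.02):
    # Sort the simulated null statistics and return the (1 - fpr) empirical
    # quantile; rejecting the null when rho exceeds this threshold gives a
    # false positive rate of approximately fpr.
    xs = sorted(null_rhos)
    k = min(len(xs) - 1, math.ceil((1.0 - fpr) * len(xs)) - 1)
    return xs[k]
```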

selectiontest.selectiontest.piecewise_constant_variates(n, timepoints, pop_sizes, reps=10000)

Generate variates corresponding to a piecewise constant demographic history.

Parameters:
  • n (int) – Sample size
  • timepoints (array-like) – Times at which population changes (in generations, backward from the present).
  • pop_sizes (array-like) – Population sizes between timepoints (only relative sizes matter).
  • reps (int) – Number of variates to generate.
Yields:

numpy.ndarray – Variates
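
One building block of such a simulation is mapping coalescent time, scaled to the initial population size, to real time under a piecewise-constant history (an illustrative helper only, assuming pop_sizes[0] applies from time 0 to timepoints[0], pop_sizes[1] to the next interval, and so on; `rescale_time_sketch` is a hypothetical name):

```python
def rescale_time_sketch(u, timepoints, pop_sizes):
    # Scaled time accrues at rate 1/size, so a segment of real length L
    # holds L/size scaled units; walk the segments until u is used up.
    start, remaining = 0.0, u
    for size, end in zip(pop_sizes, list(timepoints) + [float('inf')]):
        capacity = (end - start) / size
        if remaining <= capacity:
            return start + remaining * size
        remaining -= capacity
        start = end
    return start
```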

selectiontest.selectiontest.vcf2sfs(vcf_file, panel, coord, start, end, select_chr=True)

Get the site frequency spectrum (SFS) from VCF data for a given population and sequence. The panel file is used to select samples.

Parameters:
  • vcf_file (pyVCF Reader, see https://pyvcf.readthedocs.io/en/latest/) – Variant details
  • panel (pandas DataFrame) – Proband details
  • coord (str) – Coordinate (e.g. chromosome).
  • start (int) – Start position of sequence.
  • end (int) – End position of sequence.
  • select_chr (bool) – If True, sample first chromosome. If False, use both.
Returns:

  • list – Site frequency spectrum
  • int – Sample size
  • list – Names of variants common to all elements of the sample.

The module vcf2sfs uses the pyVCF library for VCF processing: see https://pypi.org/project/PyVCF/.
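
The tallying step at the heart of this function can be sketched without any VCF parsing: given per-variant derived-allele counts, build the SFS by dropping non-segregating sites and counting how many variants fall in each frequency class (an illustrative sketch, not the package source; `counts_to_sfs_sketch` is a hypothetical name):

```python
from collections import Counter

def counts_to_sfs_sketch(derived_counts, n):
    # Keep only segregating sites (derived count strictly between 0 and n)
    # and tally them into an SFS of length n - 1.
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]
```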