BED file input class.
#include <varfiles.hpp>
|
uint64_t | _numLines () |
| Get number of lines in the _bimFile
|
|
uint64_t | _famLines () |
| Get number of lines in the _famFile
|
|
uint64_t | _famLines (fstream &fam) |
| Copy the .fam file and count number of lines.
|
|
void | _ld (const char *snp1, const char *snp2, const size_t &N, const unsigned short &pad, double &rSq, double &Dprime, double &dcnt1, double &dcnt2) |
| Between-SNP linkage disequilibrium (LD)
|
|
void | _ld (const char *snp1, const char *snp2, const PopIndex &popID, vector< double > &rSq, vector< double > &Dprime, vector< double > &dcnt1, vector< double > &dcnt2) |
| Between-SNP LD within populations.
|
|
| VarFile () |
| Default constructor (protected)
|
|
|
fstream | _famFile |
| Corresponding .fam file stream.
|
|
fstream | _bimFile |
| Corresponding .bim file stream.
|
|
string | _fileStub |
| File name stub (minus the extension)
|
|
string | _fileName |
| File name.
|
|
size_t | _nCols |
| Number of elements in a row.
|
|
size_t | _elemSize |
| Size of each element in bytes.
|
|
fstream | _varFile |
| Variant file stream.
|
|
static const vector< char > | _masks = {static_cast<char>(0x03), static_cast<char>(0x0C), static_cast<char>(0x30), static_cast<char>(0xC0)} |
| Genotype bit masks.
|
|
static const unordered_map< char, string > | _tests |
| Genotype bit tests.
|
|
BED file input class.
Reads BED files and the auxiliary files that come with them (.fam and .bim) as necessary. Only the SNP-major version is supported.
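In the SNP-major layout, each SNP occupies a run of bytes with four two-bit genotypes per byte, lowest-order bits first. The following is a minimal sketch of how masks with the same values as `_masks` could unpack one SNP. The helper names `decodeSnp`, `bytesPerSnp`, and `padPairs` are hypothetical and not part of this class; the two-bit codes follow the PLINK .bed convention.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Same values as the _masks member listed above.
const std::array<unsigned char, 4> masks = {0x03, 0x0C, 0x30, 0xC0};

// Hypothetical helper, not part of BedFileI: unpack the two-bit genotype
// codes of nInd individuals from the bytes of one SNP (SNP-major layout).
// PLINK .bed codes: 0 = homozygous first allele, 1 = missing,
// 2 = heterozygous, 3 = homozygous second allele.
std::vector<unsigned char> decodeSnp(const std::vector<unsigned char> &bytes, std::size_t nInd) {
    std::vector<unsigned char> codes;
    codes.reserve(nInd);
    for (std::size_t i = 0; i < nInd; ++i) {
        const std::size_t slot = i % 4;        // position of the bit pair within its byte
        const unsigned char b  = bytes[i / 4]; // byte holding individual i
        codes.push_back(static_cast<unsigned char>((b & masks[slot]) >> (2 * slot)));
    }
    return codes;
}

// Bytes needed per SNP, and the number of unused bit pairs in the last byte.
std::size_t bytesPerSnp(std::size_t nInd) { return (nInd + 3) / 4; }
std::size_t padPairs(std::size_t nInd) { return 4 * bytesPerSnp(nInd) - nInd; }
```

With five individuals, for example, a SNP takes two bytes and the last byte carries three bit pairs of padding.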
◆ BedFileI()
sampFiles::BedFileI::BedFileI |
( |
const string & |
stubName | ) |
|
|
inline |
File name constructor.
- Parameters
-
[in] | stubName | file name minus the extension |
◆ _famLines() [1/2]
uint64_t BedFileI::_famLines |
( |
| ) |
|
|
protected |
Get number of lines in the _famFile
Assumes Unix-like line endings. The result is equal to the number of individuals.
- Returns
- number of lines in _famFile
◆ _famLines() [2/2]
uint64_t BedFileI::_famLines |
( |
fstream & |
fam | ) |
|
|
protected |
Copy the .fam file and count number of lines.
Assumes Unix-like line endings. The result is equal to the number of individuals. The current object's .fam file is copied to the provided file stream, which should be open for reading. If it is not open, the function throws a string object "Output .fam filestream not open".
- Parameters
-
- Returns
- number of lines in _famFile
◆ _ld() [1/2]
void BedFileI::_ld |
( |
const char * |
snp1, |
|
|
const char * |
snp2, |
|
|
const PopIndex & |
popID, |
|
|
vector< double > & |
rSq, |
|
|
vector< double > & |
Dprime, |
|
|
vector< double > & |
dcnt1, |
|
|
vector< double > & |
dcnt2 |
|
) |
| |
|
protected |
Between-SNP LD within populations.
Calculates two LD statistics ( \( r^2 \) and \( D' \)) between two SNPs from a BED file. Missing values are ignored. If there are fewer than three haplotypes with data present at both loci, the return values are -9. This value is also returned if one of the loci is monomorphic after taking out missing data at the other SNP. Minor (not necessarily derived) allele counts are also reported to enable downstream filtering. Note that the populations are assumed diploid and the counts are of haploid chromosomes (i.e. one homozygote yields a count of 2). The values are calculated within each population as indicated by the PopIndex object. The results are returned in the supplied vectors, which are assumed to be of the correct size. Since this is an internal function not exposed to the user, the sizes are not checked, to save on computation steps. Care must be taken that the char arrays passed to the function have lengths compatible with the number of individuals indexed by PopIndex. This is not checked.
- Parameters
-
[in] | snp1 | first SNP |
[in] | snp2 | second SNP |
[in] | popID | population index |
[out] | rSq | vector of \( r^2 \) estimates |
[out] | Dprime | vector of \( D' \) estimates |
[out] | dcnt1 | vector of minor allele counts at locus 1 |
[out] | dcnt2 | vector of minor allele counts at locus 2 |
◆ _ld() [2/2]
void BedFileI::_ld |
( |
const char * |
snp1, |
|
|
const char * |
snp2, |
|
|
const size_t & |
N, |
|
|
const unsigned short & |
pad, |
|
|
double & |
rSq, |
|
|
double & |
Dprime, |
|
|
double & |
dcnt1, |
|
|
double & |
dcnt2 |
|
) |
| |
|
protected |
Between-SNP linkage disequilibrium (LD)
Calculates two LD statistics ( \( r^2 \) and \( D' \)) between two SNPs from a BED file. Missing values are ignored. If there are fewer than three haplotypes with data present at both loci, the return values are -9. This value is also returned if one of the loci is monomorphic after taking out missing data at the other SNP. Minor (not necessarily derived) allele counts are also reported to enable downstream filtering. Note that the populations are assumed diploid and the counts are of haploid chromosomes (i.e. one homozygote yields count of 2).
- Parameters
-
[in] | snp1 | first SNP |
[in] | snp2 | second SNP |
[in] | N | length of the genotype vector in bytes (four genotypes per byte) |
[in] | pad | number of bit pairs of padding in the last byte |
[out] | rSq | the \( r^2 \) estimate |
[out] | Dprime | the \( D' \) estimate |
[out] | dcnt1 | minor allele count at locus 1 |
[out] | dcnt2 | minor allele count at locus 2 |
◆ _numLines()
uint64_t BedFileI::_numLines |
( |
| ) |
|
|
protected |
Get number of lines in the _bimFile
Assumes Unix-like line endings. The result is equal to the number of SNPs.
- Returns
- number of lines in _bimFile
◆ sample()
void BedFileI::sample |
( |
BedFileO & |
out, |
|
|
const uint64_t & |
n |
|
) |
| |
Sample SNPs and save to BED file.
Sample \(n\) SNPs without replacement from the file represented by the current object and save them to the out object. Uses Vitter's [3] method. The number of samples has to be smaller than the number of SNPs in the file.
- Parameters
-
[in] | out | output object |
[in] | n | number of SNPs to sample |
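To illustrate sequential sampling without replacement, here is a minimal sketch that picks \(n\) of \(N\) SNP indices in a single ordered pass. It uses plain selection sampling (Knuth's Algorithm S), which is simpler than, and not necessarily the same as, the Vitter method cited above. `sampleIndices` is a hypothetical helper, and throwing on `n >= N` is an assumption of this sketch.

```cpp
#include <cstdint>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper: choose n distinct indices from [0, N) in increasing
// order, each subset of size n being equally likely.
std::vector<std::uint64_t> sampleIndices(std::uint64_t n, std::uint64_t N, std::mt19937_64 &rng) {
    if (n >= N) {
        throw std::invalid_argument("n must be smaller than the number of SNPs");
    }
    std::vector<std::uint64_t> picked;
    picked.reserve(n);
    for (std::uint64_t i = 0; i < N && picked.size() < n; ++i) {
        const std::uint64_t needed = n - picked.size(); // samples still to take
        const std::uint64_t left   = N - i;             // records not yet visited
        std::uniform_int_distribution<std::uint64_t> pick(0, left - 1);
        if (pick(rng) < needed) {                        // accept with probability needed/left
            picked.push_back(i);
        }
    }
    return picked;
}
```

Because the indices come out sorted, the selected SNPs could be copied from the input in one forward pass, without seeking backwards.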
◆ sampleLD() [1/2]
void BedFileI::sampleLD |
( |
const PopIndex & |
popID, |
|
|
const uint64_t & |
n |
|
) |
| |
LD among sampled sites within populations.
Samples sequential pairs of SNPs and calculates two LD measures ( \( r^2 \) and \( D' \)) within populations indicated by PopIndex. Saves the results to a file named after the input file stub (the name preceding the .bed and related extensions) with _LD.tsv appended. Each line is tab-delimited with the chromosome number (from the .bim file), between-SNP distance, non-reference allele count for each SNP, \( r^2 \), and \( D' \). Missing data are ignored (only pairwise-complete observations are included). If one of the SNPs is monomorphic or if the total number of pairwise present genotypes is fewer than three, the LD measures are returned as -9 to indicate missing values.
- Parameters
-
[in] | popID | population index |
[in] | n | number of SNP pairs to sample |
◆ sampleLD() [2/2]
void BedFileI::sampleLD |
( |
const uint64_t & |
n | ) |
|
Linkage disequilibrium among sampled sites.
Samples sequential pairs of SNPs and calculates two LD measures ( \( r^2 \) and \( D' \)). Saves the results to a file named after the input file stub (the name preceding the .bed and related extensions) with _LD.tsv appended. Each line is tab-delimited with the chromosome number (from the .bim file), between-SNP distance, non-reference allele count for each SNP, \( r^2 \), and \( D' \). Missing data are ignored (only pairwise-complete observations are included). If one of the SNPs is monomorphic or if the total number of pairwise present genotypes is fewer than three, the LD measures are returned as -9 to indicate missing values.
- Parameters
-
[in] | n | number of SNP pairs to sample |
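The output naming and line layout described above can be sketched as follows. Both helpers are hypothetical; the column order is taken from the description, while the numeric formatting (default stream formatting) is an assumption.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: output name is the file stub with _LD.tsv appended.
std::string ldFileName(const std::string &stub) {
    return stub + "_LD.tsv";
}

// Hypothetical helper: one tab-delimited record in the documented column order:
// chromosome, between-SNP distance, allele count at each SNP, r^2, D'.
std::string ldLine(const std::string &chrom, unsigned long distance,
                   double cnt1, double cnt2, double rSq, double Dprime) {
    std::ostringstream os;
    os << chrom << '\t' << distance << '\t' << cnt1 << '\t' << cnt2
       << '\t' << rSq << '\t' << Dprime;
    return os.str();
}
```

Downstream code reading such a file would treat -9 in the last two columns as missing rather than as a valid LD value.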