|
| GenoTableBin () |
| Default constructor.
|
|
| GenoTableBin (const std::string &inputFileName, const uint32_t &nIndividuals, std::string logFileName) |
| Constructor with input file name.
|
|
| GenoTableBin (const std::string &inputFileName, const uint32_t &nIndividuals, std::string logFileName, const size_t &nThreads) |
| Constructor with input file name and thread count.
|
|
| GenoTableBin (const std::vector< int > &maCounts, const uint32_t &nIndividuals, std::string logFileName) |
| Constructor with count vector.
|
|
| GenoTableBin (const std::vector< int > &maCounts, const uint32_t &nIndividuals, std::string logFileName, const size_t &nThreads) |
| Constructor with count vector and thread count.
|
|
| GenoTableBin (const GenoTableBin &toCopy)=delete |
| Copy constructor (deleted).
|
|
GenoTableBin & | operator= (const GenoTableBin &toCopy)=delete |
| Copy assignment operator (deleted).
|
|
| GenoTableBin (GenoTableBin &&toMove) noexcept=default |
| Move constructor.
|
|
GenoTableBin & | operator= (GenoTableBin &&toMove) noexcept=default |
| Move assignment operator.
|
|
| ~GenoTableBin ()=default |
| Destructor.
|
|
void | saveGenoBinary (const std::string &outFileName) const |
| Save the binary genotype file.
|
|
void | allJaccardLD (const InOutFileNames &bimAndLDnames, const size_t &suggestNchunks=static_cast< size_t >(1)) const |
| All by all Jaccard similarity LD with locus names.
|
|
void | saveLogFile () const |
| Save the log to a file.
|
|
Class to store binary compressed genotype tables.
Converts genotype data to a lossy compressed binary code. Genotypes are stored in memory in a one-bit format: the bit is set for the minor allele and unset for the major allele. Bits corresponding to missing data are unset (this is the same as mean imputation), and heterozygotes are set with 50% probability.
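The one-bit encoding described above can be illustrated with a minimal sketch. The helper name `binarizeLocus`, the use of `std::mt19937`, and the bit order within each byte (individual 0 in the least significant bit) are assumptions made for illustration only; they are not taken from the GenoTableBin implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, not part of GenoTableBin: packs one locus into bits.
// counts holds minor allele counts (0, 1, 2) or a negative value for missing data.
std::vector<std::uint8_t> binarizeLocus(const std::vector<int> &counts, std::mt19937 &rng) {
    std::vector<std::uint8_t> bits((counts.size() + 7) / 8, 0);
    std::bernoulli_distribution coin(0.5);
    for (std::size_t i = 0; i < counts.size(); ++i) {
        assert(counts[i] <= 2);
        bool setBit = false;
        if (counts[i] == 2) {
            setBit = true;       // minor allele homozygote: bit set
        } else if (counts[i] == 1) {
            setBit = coin(rng);  // heterozygote: set with 50% probability
        }
        // major allele homozygotes (0) and missing data (negative) stay unset
        if (setBit) {
            // assumed bit order: individual i maps to bit (i % 8) of byte (i / 8)
            bits[i / 8] |= static_cast<std::uint8_t>(1U << (i % 8));
        }
    }
    return bits;
}
```

Because heterozygotes are resolved by a random draw and missing data collapse onto the major allele, the original genotypes cannot be recovered from the packed bits, which is why the encoding is lossy.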
BayesicSpace::GenoTableBin::GenoTableBin |
( |
const std::string & | inputFileName, |
|
|
const uint32_t & | nIndividuals, |
|
|
std::string | logFileName, |
|
|
const size_t & | nThreads ) |
Constructor with input file name and thread count.
The file should be in the plink .bed format. Heterozygotes are assigned the major or minor allele at random; missing genotypes are assigned the major allele. If necessary, alleles are re-coded so that the set bit is always the minor allele. The number of threads requested is the maximum that will be used; the actual number depends on available system resources.
- Parameters
-
[in] | inputFileName | input file name |
[in] | nIndividuals | number of genotyped individuals |
[in] | logFileName | name of the log file |
[in] | nThreads | maximal number of threads to use |
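For readers unfamiliar with the input layout, the sketch below decodes one byte of a plink .bed file in the usual variant-major mode. Each byte holds four two-bit genotype codes, lowest-order bits first: 00 is homozygous for the first .bim allele, 01 is missing, 10 is heterozygous, and 11 is homozygous for the second .bim allele. The names `BedGeno`, `bedBytesPerLocus`, and `decodeBedByte` are hypothetical and do not come from GenoTableBin; how the class itself reads the file is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two-bit genotype codes used in plink .bed files (hypothetical enum name).
enum class BedGeno : std::uint8_t { HomA1 = 0, Missing = 1, Het = 2, HomA2 = 3 };

// Number of bytes one locus occupies: four individuals per byte, rounded up.
std::size_t bedBytesPerLocus(std::uint32_t nIndividuals) {
    assert(nIndividuals > 0);
    return (static_cast<std::size_t>(nIndividuals) + 3) / 4;
}

// Unpack the four genotype codes stored in one .bed byte.
std::array<BedGeno, 4> decodeBedByte(std::uint8_t byte) {
    std::array<BedGeno, 4> calls{};
    for (std::size_t i = 0; i < 4; ++i) {
        // individual i within the byte sits in bits (2i + 1) and 2i
        calls[i] = static_cast<BedGeno>((byte >> (2 * i)) & 0x03);
    }
    return calls;
}
```

The .bed file does not record the number of individuals, which is consistent with the constructor taking `nIndividuals` as an argument. The first .bim allele is not guaranteed to be the minor one, which is consistent with the re-coding step described above.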
BayesicSpace::GenoTableBin::GenoTableBin |
( |
const std::vector< int > & | maCounts, |
|
|
const uint32_t & | nIndividuals, |
|
|
std::string | logFileName ) |
|
inline |
Constructor with count vector.
Input is a vector of minor allele counts (0, 1, or 2), with -9 marking missing data. Heterozygotes are assigned the major or minor allele at random; missing genotypes are assigned the major allele. The counts are checked and re-coded if necessary so that set bits represent the minor allele. This function should run faster if 0 is the major allele homozygote. While the above values are the norm, any negative number is interpreted as missing, any odd number as 1, and any non-zero even number as 2. The input is a vectorized matrix of genotypes. The original matrix has individuals on rows and is vectorized by row.
- Parameters
-
[in] | maCounts | vector of minor allele numbers |
[in] | nIndividuals | number of genotyped individuals |
[in] | logFileName | name of the log file |
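The input conventions described above (interpretation of non-standard values and row-wise vectorization) can be sketched as follows. The function names `normalizeCount` and `countAt` are hypothetical and serve only to illustrate the stated rules; they are not part of the GenoTableBin interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map an arbitrary input value to -9 (missing), 0, 1, or 2 using the stated rules.
int normalizeCount(int raw) {
    if (raw < 0) {
        return -9;             // any negative number is missing
    }
    if (raw % 2 != 0) {
        return 1;              // any odd number is a heterozygote
    }
    return raw == 0 ? 0 : 2;   // 0 stays 0, any other even number is 2
}

// Look up individual iInd at locus jLoc in a matrix vectorized by row
// (individuals on rows, loci on columns).
int countAt(const std::vector<int> &maCounts, std::uint32_t nIndividuals,
            std::size_t iInd, std::size_t jLoc) {
    assert(nIndividuals > 0 && maCounts.size() % nIndividuals == 0);
    const std::size_t nLoci = maCounts.size() / nIndividuals;
    assert(iInd < nIndividuals && jLoc < nLoci);
    return normalizeCount(maCounts[iInd * nLoci + jLoc]);
}
```

Under this layout, a vector of length 6 with `nIndividuals` equal to 2 describes three loci, with the first three elements belonging to the first individual.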
BayesicSpace::GenoTableBin::GenoTableBin |
( |
const std::vector< int > & | maCounts, |
|
|
const uint32_t & | nIndividuals, |
|
|
std::string | logFileName, |
|
|
const size_t & | nThreads ) |
Constructor with count vector and thread count.
Input is a vector of minor allele counts (0, 1, or 2), with -9 marking missing data. Heterozygotes are assigned the major or minor allele at random; missing genotypes are assigned the major allele. The counts are checked and re-coded if necessary so that set bits represent the minor allele. This function should run faster if 0 is the major allele homozygote. While the above values are the norm, any negative number is interpreted as missing, any odd number as 1, and any non-zero even number as 2. The input is a vectorized matrix of genotypes. The original matrix has individuals on rows and is vectorized by row. The number of threads requested is the maximum that will be used; the actual number depends on available system resources.
- Parameters
-
[in] | maCounts | vector of minor allele numbers |
[in] | nIndividuals | number of genotyped individuals |
[in] | logFileName | name of the log file |
[in] | nThreads | maximal number of threads to use |
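The re-coding step mentioned above can be illustrated on the counts of a single locus. This is a sketch of one plausible approach, not the library's implementation: if the counted allele has a frequency above 0.5 among non-missing genotypes, each count `c` is replaced by `2 - c`. The function name `recodeToMinor` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: make sure the counted allele at one locus is the minor one.
// Negative values mark missing data and are left untouched.
void recodeToMinor(std::vector<int> &locusCounts) {
    long long alleleSum = 0;  // total copies of the counted allele
    long long nPresent  = 0;  // number of non-missing genotypes
    for (int c : locusCounts) {
        if (c >= 0) {
            assert(c <= 2);
            alleleSum += c;
            ++nPresent;
        }
    }
    // frequency = alleleSum / (2 * nPresent); above 0.5 means alleleSum > nPresent
    if (alleleSum > nPresent) {
        for (int &c : locusCounts) {
            if (c >= 0) {
                c = 2 - c;    // swap the roles of the two alleles
            }
        }
    }
}
```

When 0 already denotes the major allele homozygote, the swap branch is never taken, which would be consistent with the note that such input should run faster.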
void BayesicSpace::GenoTableBin::allJaccardLD |
( |
const InOutFileNames & | bimAndLDnames, |
|
|
const size_t & | suggestNchunks = static_cast< size_t >(1) ) const |
All by all Jaccard similarity LD with locus names.
Calculates linkage disequilibrium among all loci using Jaccard similarity and \(r^2\) as the statistics. The result is a vectorized lower triangle of the symmetric \(N \times N\) similarity matrix, where \(N\) is the number of loci. Row and column locus names are also included in the tab-delimited output file. The lower triangle is vectorized by column (i.e., all correlations of the first locus, then all remaining correlations of the second, etc.). If the result does not fit in RAM, the calculation proceeds in blocks and results are saved to disk periodically.
- Parameters
-
[in] | bimAndLDnames | names of the input .bim file, which has the locus names, and of the output LD value file |
[in] | suggestNchunks | suggested number of chunks (forces processing in chunks) |
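As an illustration of the statistic and output order described above, the sketch below computes the Jaccard similarity of two bit-packed loci (shared set bits divided by bits set in either locus) and lists locus pairs in column-wise lower-triangle order. The function names are hypothetical, the byte-wise bit counting is chosen for portability rather than speed, and excluding the diagonal is an assumption; none of this is taken from the GenoTableBin source.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Jaccard similarity of two loci stored as packed bits: |A and B| / |A or B|.
double jaccard(const std::vector<std::uint8_t> &a, const std::vector<std::uint8_t> &b) {
    assert(a.size() == b.size());
    std::size_t nBoth   = 0;  // bits set in both loci
    std::size_t nEither = 0;  // bits set in at least one locus
    for (std::size_t i = 0; i < a.size(); ++i) {
        nBoth   += std::bitset<8>(a[i] & b[i]).count();
        nEither += std::bitset<8>(a[i] | b[i]).count();
    }
    // returning 0 when neither locus has a set bit is an assumption of this sketch
    return nEither == 0 ? 0.0 : static_cast<double>(nBoth) / static_cast<double>(nEither);
}

// Locus pairs in column-wise lower-triangle order: all pairs involving locus 0,
// then the remaining pairs involving locus 1, and so on (diagonal excluded).
std::vector<std::pair<std::size_t, std::size_t>> pairOrder(std::size_t nLoci) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t col = 0; col < nLoci; ++col) {
        for (std::size_t row = col + 1; row < nLoci; ++row) {
            pairs.emplace_back(col, row);
        }
    }
    return pairs;
}
```

With the diagonal excluded, the number of pairs is \(N(N-1)/2\), which grows quadratically with the number of loci and explains why the result may need to be processed in chunks.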