|
| GenoTableHash () |
| Default constructor.
|
|
| GenoTableHash (const std::string &inputFileName, const IndividualAndSketchCounts &indivSketchCounts, const size_t &nThreads, std::string logFileName) |
| Constructor with input file name and thread number.
|
|
| GenoTableHash (const std::string &inputFileName, const IndividualAndSketchCounts &indivSketchCounts, std::string logFileName) |
| Constructor with input file name.
|
|
| GenoTableHash (const std::vector< int > &maCounts, const IndividualAndSketchCounts &indivSketchCounts, const size_t &nThreads, std::string logFileName) |
| Constructor with count vector and thread number.
|
|
| GenoTableHash (const std::vector< int > &maCounts, const IndividualAndSketchCounts &indivSketchCounts, std::string logFileName) |
| Constructor with count vector.
|
|
| GenoTableHash (const GenoTableHash &toCopy)=delete |
| Copy constructor (deleted)
|
|
GenoTableHash & | operator= (const GenoTableHash &toCopy)=delete |
| Copy assignment operator (deleted)
|
|
| GenoTableHash (GenoTableHash &&toMove) noexcept=default |
| Move constructor.
|
|
GenoTableHash & | operator= (GenoTableHash &&toMove) noexcept=default |
| Move assignment operator.
|
|
| ~GenoTableHash ()=default |
| Destructor.
|
|
void | allHashLD (const float &similarityCutOff, const InOutFileNames &bimAndLDnames, const size_t &suggestNchunks=static_cast< size_t >(1)) const |
| All by all LD from hashes.
|
|
std::vector< HashGroup > | makeLDgroups (const size_t &nRowsPerBand) const |
| Assign groups by linkage disequilibrium (LD)
|
|
void | makeLDgroups (const size_t &nRowsPerBand, const InOutFileNames &bimAndGroupNames) const |
| Assign groups by LD and save to a file with locus names.
|
|
void | ldInGroups (const SparsityParameters &sparsityValues, const InOutFileNames &bimAndLDnames, const size_t &suggestNchunks=static_cast< size_t >(1)) const |
| LD in groups.
|
|
void | saveLogFile () const |
| Save the log to a file.
|
|
Class to store compressed genotype tables.
Provides facilities to store and manipulate compressed genotype tables. Genotypes are stored in a one-bit format: a bit is set for the minor allele and unset for the major allele. Bits corresponding to missing data are unset (this is the same as mean imputation); bits for heterozygotes are set with 50% probability.
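A minimal sketch of the one-bit format described above, assuming one byte-packed bit vector per locus with individual i stored in bit i % 8 of byte i / 8. The function name packLocus and the bit order are hypothetical illustrations and are not taken from the GenoTableHash implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not part of GenoTableHash): packs the minor allele
// counts of one locus into bytes, one bit per individual.
// Assumes 0 is the major allele homozygote and 2 the minor allele homozygote.
std::vector<std::uint8_t> packLocus(const std::vector<int> &maCounts, std::mt19937 &rng) {
    std::vector<std::uint8_t> packed( (maCounts.size() + 7) / 8, 0 );
    std::bernoulli_distribution coin(0.5);
    for (std::size_t iInd = 0; iInd < maCounts.size(); ++iInd) {
        const int count = maCounts[iInd];
        bool setBit     = false;
        if (count == 2) {
            setBit = true;           // minor allele homozygote: bit set
        } else if (count == 1) {
            setBit = coin(rng);      // heterozygote: set with 50% probability
        }
        // 0 (major allele homozygote) and negative (missing) values leave the bit unset
        if (setBit) {
            packed[iInd / 8] |= static_cast<std::uint8_t>(1U << (iInd % 8));
        }
    }
    return packed;
}
```

Passing the random number generator by reference lets the caller fix the seed, which makes the heterozygote assignment reproducible.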
BayesicSpace::GenoTableHash::GenoTableHash |
( |
const std::vector< int > & | maCounts, |
|
|
const IndividualAndSketchCounts & | indivSketchCounts, |
|
|
const size_t & | nThreads, |
|
|
std::string | logFileName ) |
Constructor with count vector and thread number.
Input is a vector of minor allele counts (0, 1, or 2), with -9 marking missing data. Heterozygotes are assigned the major or minor allele at random; missing genotypes are assigned the major allele. The counts are checked and re-coded if necessary so that set bits represent the minor allele. This function should run faster if 0 is the major allele homozygote. While the above values are the norm, any negative number is interpreted as missing, any odd number as 1, and any non-zero even number as 2. The input is a vectorized matrix of genotypes: the original matrix has individuals on rows and is vectorized by row. The binary stream is then hashed using a one-permutation hash (OPH; one sketch per locus). Bits are permuted using the Fisher-Yates-Durstenfeld algorithm, and empty bins are filled using the Mai et al. (2020) algorithm. The number of threads specified is the maximum that will be used; the actual number depends on system resources.
- Parameters
-
[in] | maCounts | vector of minor allele numbers |
[in] | indivSketchCounts | number of individuals and sketches |
[in] | nThreads | maximal number of threads to use |
[in] | logFileName | name of the log file |
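The input conventions described for this constructor can be illustrated with a short sketch. It assumes that "vectorized by row" means the usual row-major layout, so that the genotype of individual i at locus j sits at position i * nLoci + j. Both function names are hypothetical and do not appear in the class interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: maps an arbitrary input value to the canonical coding
// (-9 for missing, otherwise 0, 1, or 2) using the rules stated in the text.
int canonicalCount(int rawCount) {
    if (rawCount < 0) {
        return -9;                  // any negative number is missing
    }
    if (rawCount % 2 != 0) {
        return 1;                   // any odd number is a heterozygote
    }
    return rawCount == 0 ? 0 : 2;   // zero stays zero, other even numbers become 2
}

// Hypothetical helper: position of individual iIndividual at locus jLocus
// in a matrix with individuals on rows, vectorized by row (row-major).
std::size_t vectorizedIndex(std::size_t iIndividual, std::size_t jLocus, std::size_t nLoci) {
    return iIndividual * nLoci + jLocus;
}
```

Under this layout the number of loci is the length of maCounts divided by the number of individuals.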
BayesicSpace::GenoTableHash::GenoTableHash |
( |
const std::vector< int > & | maCounts, |
|
|
const IndividualAndSketchCounts & | indivSketchCounts, |
|
|
std::string | logFileName ) |
|
inline |
Constructor with count vector.
Input is a vector of minor allele counts (0, 1, or 2), with -9 marking missing data. Heterozygotes are assigned the major or minor allele at random; missing genotypes are assigned the major allele. The counts are checked and re-coded if necessary so that set bits represent the minor allele. This function should run faster if 0 is the major allele homozygote. While the above values are the norm, any negative number is interpreted as missing, any odd number as 1, and any non-zero even number as 2. The binary stream is then hashed using a one-permutation hash (OPH; one sketch per locus). Bits are permuted using the Fisher-Yates-Durstenfeld algorithm, and empty bins are filled using the Mai et al. (2020) algorithm.
- Parameters
-
[in] | maCounts | vector of minor allele numbers |
[in] | indivSketchCounts | number of individuals and sketches |
[in] | logFileName | name of the log file |
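The permutation and hashing steps named above can be sketched roughly as follows. This is an illustration under stated assumptions, not the class's implementation: the function names, the sentinel kEmptyBin, and the use of std::vector< bool > for the bits are all hypothetical; the number of bits is assumed to be divisible by the number of bins; and empty bins are only flagged, so the Mai et al. (2020) filling step is not shown.

```cpp
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical sentinel marking a bin with no set bits.
constexpr std::size_t kEmptyBin = std::numeric_limits<std::size_t>::max();

// Fisher-Yates-Durstenfeld shuffle: walk from the last element down,
// swapping each element with a uniformly chosen element at or before it.
std::vector<std::size_t> durstenfeldPermutation(std::size_t nElements, std::mt19937_64 &rng) {
    std::vector<std::size_t> perm(nElements);
    std::iota(perm.begin(), perm.end(), 0);
    for (std::size_t i = nElements; i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(perm[i - 1], perm[pick(rng)]);
    }
    return perm;
}

// One-permutation hash of a single locus: split the permuted bits into
// nBins equal bins and record the offset of the first set bit in each bin.
// Assumes perm.size() == bits.size() and bits.size() % nBins == 0.
std::vector<std::size_t> ophSketch(const std::vector<bool> &bits,
                                   const std::vector<std::size_t> &perm,
                                   std::size_t nBins) {
    const std::size_t binSize = bits.size() / nBins;
    std::vector<std::size_t> sketch(nBins, kEmptyBin);
    for (std::size_t iBin = 0; iBin < nBins; ++iBin) {
        for (std::size_t offset = 0; offset < binSize; ++offset) {
            if (bits[perm[iBin * binSize + offset]]) {
                sketch[iBin] = offset;
                break;
            }
        }
    }
    return sketch;
}
```

Applying the same permutation to every locus is what makes the resulting sketches comparable between loci.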
void BayesicSpace::GenoTableHash::allHashLD |
( |
const float & | similarityCutOff, |
|
|
const InOutFileNames & | bimAndLDnames, |
|
|
const size_t & | suggestNchunks = static_cast< size_t >(1) ) const |
All by all LD from hashes.
Calculates linkage disequilibrium among all loci using a modified OPH. The result is a vectorized lower triangle of the symmetric \(N \times N\) similarity matrix, where \(N\) is the number of loci. All values belong to the same group. Row and column locus names are also included in the tab-delimited output file. The lower triangle is vectorized by column (i.e., all correlations of the first locus, then all remaining correlations of the second, etc.). If suggestNchunks is set, the data are processed in at least the given number of chunks even if everything fits in RAM. If the resulting chunks are still too big to fit in RAM, the number is adjusted up. Otherwise, the number of chunks is set automatically. If the .bim file name is left blank or the file does not exist, base-1 locus indexes are used instead of locus names.
- Parameters
-
[in] | similarityCutOff | only save pairs with at least this similarity |
[in] | bimAndLDnames | names of the .bim file with locus names and of the output LD results file |
[in] | suggestNchunks | minimum number of chunks to process the data in (adjusted up if the chunks do not fit in RAM) |
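The column-wise vectorization of the lower triangle can be made concrete with an index calculation. The sketch below assumes that the diagonal is excluded and that loci are indexed from 0; both function names are hypothetical and are not part of the class interface.

```cpp
#include <cstddef>

// Hypothetical helper: number of locus pairs in the strict lower triangle
// of an nLoci x nLoci matrix.
std::size_t nLocusPairs(std::size_t nLoci) {
    return nLoci * (nLoci - 1) / 2;
}

// Hypothetical helper: position of the pair (iRow, jCol), with iRow > jCol,
// in the lower triangle vectorized by column. Columns before jCol contribute
// (nLoci - 1) + (nLoci - 2) + ... elements, i.e. jCol * (2 * nLoci - jCol - 1) / 2.
std::size_t lowerTriangleIndex(std::size_t iRow, std::size_t jCol, std::size_t nLoci) {
    const std::size_t columnStart = jCol * (2 * nLoci - jCol - 1) / 2;
    return columnStart + (iRow - jCol - 1);
}
```

For four loci the pairs appear in the order (1,0), (2,0), (3,0), (2,1), (3,1), (3,2), which matches "all correlations of the first locus, then all remaining correlations of the second".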