Sample SNPs
Fast ordered sampling of rows from large text or binary files. Special cases for DNA variant files (.bed, VCF, HapMap, etc).
sampFiles::GtxtFileI Class Reference

Text file input class.

#include <varfiles.hpp>

Public Member Functions

 GtxtFileI ()
 Default constructor.
 
 GtxtFileI (const string &fileName)
 File name constructor.
 
 GtxtFileI (const string &fileName, const bool &head)
 File name constructor with header specification.
 
 GtxtFileI (const GtxtFileI &in)=default
 Copy constructor.
 
GtxtFileI & operator= (const GtxtFileI &in)=default
 Copy assignment.
 
 GtxtFileI (GtxtFileI &&in)=default
 Move constructor.
 
GtxtFileI & operator= (GtxtFileI &&in)=default
 Move assignment.
 
 ~GtxtFileI ()
 Destructor.
 
void open ()
 Open stream to read.
 
void sample (GtxtFileO &out, const uint64_t &n, const bool &headSkip)
 Sample rows and save to a text file.
 
void sample (const uint64_t &n, const bool &headSkip, const char &delim, vector< string > &out)
 Sample rows and export to a vector of strings.
 
uint64_t nlines ()
 Number of SNPs in the object.
 

Protected Member Functions

virtual uint64_t _numLines ()
 Get number of rows in the text file.
 

Detailed Description

Text file input class.

Reads text files, skipping or copying the header as necessary.

Constructor & Destructor Documentation

◆ GtxtFileI() [1/2]

sampFiles::GtxtFileI::GtxtFileI ( const string &  fileName)
inline

File name constructor.

Parameters
[in]  fileName  file name including extension

◆ GtxtFileI() [2/2]

sampFiles::GtxtFileI::GtxtFileI ( const string &  fileName,
const bool &  head 
)
inline

File name constructor with header specification.

Parameters
[in]  fileName  file name including extension
[in]  head  header presence

Member Function Documentation

◆ _numLines()

uint64_t GtxtFileI::_numLines ( )
protectedvirtual

Get number of rows in the text file.

Assumes Unix-like line endings. The header, if present, is not counted. Overridden in some, but not all, derived classes.

Returns
number of rows
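
The counting rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual implementation: `countRows`, `countRowsExample` and the `hasHeader` flag are hypothetical names, and the sketch reads from any `std::istream` rather than from a `GtxtFileI` object.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of sampFiles): counts rows in a text stream.
// Rows are split on '\n' only; if hasHeader is true the first row is not counted.
std::uint64_t countRows(std::istream &in, bool hasHeader) {
    std::uint64_t n = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++n;
    }
    if (hasHeader && n > 0) {
        --n;  // the header is not a data row
    }
    return n;
}

// Usage check: one header line followed by two data rows.
void countRowsExample() {
    std::istringstream in("id\tpos\nrs1\t100\nrs2\t200\n");
    assert(countRows(in, true) == 2);
}
```

Because only `'\n'` is treated as a row terminator, a file that uses bare `'\r'` line endings would be read as a single row by this sketch.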

◆ sample() [1/2]

void GtxtFileI::sample ( const uint64_t &  n,
const bool &  headSkip,
const char &  delim,
vector< string > &  out 
)

Sample rows and export to a vector of strings.

Sample \(n\) rows without replacement from the file represented by the current object and output a vector of strings. Each field separated by the specified delimiter is stored as an element of the vector. Uses Vitter's [3] method. The number of samples has to be smaller than the number of rows in the file. The output vector is erased if it is not empty.

Parameters
[in]  n  number of SNPs to sample
[in]  headSkip  skip header? Ignored if there is no header
[in]  delim  field delimiter
[out]  out  output vector
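
One reading of the output layout described above is that the fields of all sampled rows end up in a single flat vector, which is emptied before being filled. The sketch below illustrates only that step. `splitRows` and `splitRowsExample` are hypothetical names that do not come from the library, and the rows are passed in directly instead of being sampled from a file.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of sampFiles): splits each row on `delim`
// and stores every field as one element of `out`. Any previous contents
// of `out` are discarded first.
void splitRows(const std::vector<std::string> &rows, char delim,
               std::vector<std::string> &out) {
    out.clear();
    for (const std::string &row : rows) {
        std::istringstream fields(row);
        std::string field;
        while (std::getline(fields, field, delim)) {
            out.push_back(field);
        }
    }
}

// Usage check: two tab-delimited rows of three fields give six elements,
// and the stale element that was in the vector is gone.
void splitRowsExample() {
    std::vector<std::string> out{"stale"};
    splitRows({"rs1\tA\tG", "rs2\tC\tT"}, '\t', out);
    assert(out.size() == 6);
    assert(out.front() == "rs1" && out.back() == "T");
}
```

Note that `std::getline` with a delimiter does not emit a trailing empty field, so a row ending in the delimiter yields one element fewer than a strict field count would suggest.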

◆ sample() [2/2]

void GtxtFileI::sample ( GtxtFileO &  out,
const uint64_t &  n,
const bool &  headSkip 
)

Sample rows and save to a text file.

Sample \(n\) lines without replacement from the file represented by the current object and save to the out object. Uses Vitter's [3] method. The number of samples has to be smaller than the number of rows in the file.

Parameters
[in]  out  output object
[in]  n  number of rows to sample
[in]  headSkip  skip header? Ignored if there is no header
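
To make the idea of ordered sampling without replacement concrete, the sketch below uses plain selection sampling (each line is kept with probability equal to the number of samples still needed divided by the number of lines still unread). This is deliberately a simpler stand-in and is not Vitter's method, which reaches the same kind of result while skipping over unselected rows. `sampleLines` and `sampleLinesExample` are hypothetical names, the total row count is assumed to be known in advance, and header handling is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <random>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of sampFiles): copies n of the `total`
// lines in `in` to `out`, preserving file order. Uses selection sampling,
// not Vitter's method. `total` must match the number of lines in `in`.
void sampleLines(std::istream &in, std::ostream &out, std::uint64_t total,
                 std::uint64_t n, std::mt19937_64 &rng) {
    if (n >= total) {
        throw std::invalid_argument("n must be smaller than the number of rows");
    }
    std::string line;
    std::uint64_t seen = 0;    // lines read so far
    std::uint64_t picked = 0;  // lines written so far
    while (picked < n && std::getline(in, line)) {
        // Keep this line with probability (n - picked) / (total - seen).
        std::uniform_int_distribution<std::uint64_t> draw(0, total - seen - 1);
        if (draw(rng) < n - picked) {
            out << line << '\n';
            ++picked;
        }
        ++seen;
    }
}

// Usage check: 4 of 10 numbered lines come back in increasing order.
void sampleLinesExample() {
    std::istringstream in("0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n");
    std::ostringstream out;
    std::mt19937_64 rng(2024);
    sampleLines(in, out, 10, 4, rng);
    std::istringstream result(out.str());
    std::string line;
    int count = 0;
    int previous = -1;
    while (std::getline(result, line)) {
        const int value = std::stoi(line);
        assert(value > previous);
        previous = value;
        ++count;
    }
    assert(count == 4);
}
```

Selection sampling draws one random number per line read, so its cost grows with the file length; avoiding that per-line cost is the usual motivation for skip-based methods on large files.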
