yieldReduce {Rsamtools}    R Documentation

Iterate through a BAM (or other) file, reducing output to a single result.

Description

Rsamtools files can be created with a ‘yieldSize’ argument that sets the number of records (the chunk size) input at one time (see, e.g., BamFile). yieldReduce iterates through the file, processing each chunk and reducing it with previously input chunks. This is a memory-efficient way to process large data files, especially when the final result fits in memory.

Usage

yieldReduce(X, MAP, REDUCE, DONE, ..., init, ITERATE = TRUE)

Arguments

X

A BamFile instance (or an instance of another class for which isOpen, open, and close methods are defined, and which supports input of sequential chunks).

MAP

A function of one or more arguments, X, ..., returning a VALUE passed to DONE and REDUCE.

REDUCE

A function of one (ITERATE=FALSE) or two (ITERATE=TRUE) arguments, returning the reduction (e.g., addition) of the argument(s). If missing, REDUCE is c (when ITERATE=TRUE) or identity (when ITERATE=FALSE).

DONE

A function of one argument, the VALUE of the most recent call to MAP(X, ...). If missing, DONE is function(VALUE) length(VALUE) == 0.

...

Additional arguments, passed to MAP.

init

(Optional) Initial value used for REDUCE when ITERATE=TRUE.

ITERATE

logical(1) determining whether the call to REDUCE is iterative (ITERATE=TRUE) or cumulative (ITERATE=FALSE).
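The default DONE described above tests only the length of the value returned by MAP. The base R sketch below illustrates one consequence: a MAP that returns a scalar summary (for example, a record count) never returns a zero-length value, so it needs its own DONE. The names default_done and count_done are hypothetical and used for illustration only; they are not part of Rsamtools.

```r
## Default termination test, as stated in the DONE argument description.
default_done <- function(VALUE) length(VALUE) == 0

default_done(integer(0))   # TRUE: a zero-length value stops the iteration
default_done(0L)           # FALSE: a scalar count of zero still has length 1

## Hypothetical custom test for a MAP that returns a record count.
count_done <- function(VALUE) VALUE == 0L
count_done(0L)             # TRUE
```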

Details

When ITERATE=TRUE, REDUCE is initially invoked with either the init value and the value of the first call to MAP or, if init is missing, the values of the first two calls to MAP.

When ITERATE=FALSE, REDUCE is invoked with a list containing a list with as many elements as there were calls to MAP. Each element is the result of an invocation of MAP.
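The two modes can be sketched in base R. This is an illustrative stand-in written from the description above, not the Rsamtools implementation: sketch_yield_reduce and make_map are hypothetical names, the zero-argument MAP stands in for MAP(X, ...), and in the ITERATE=FALSE branch REDUCE is assumed to receive a flat list of MAP values.

```r
## Hypothetical stand-in for yieldReduce(), using base R only.
## MAP is a zero-argument function playing the role of MAP(X, ...).
sketch_yield_reduce <- function(MAP, REDUCE, DONE, init, ITERATE = TRUE) {
    if (missing(DONE))
        DONE <- function(VALUE) length(VALUE) == 0
    if (missing(REDUCE))
        REDUCE <- if (ITERATE) c else identity
    if (ITERATE) {
        ## Seed with init, or with the first MAP value when init is missing.
        if (missing(init)) {
            result <- MAP()
            if (DONE(result)) return(list())
        } else {
            result <- init
        }
        repeat {
            value <- MAP()
            if (DONE(value)) break
            result <- REDUCE(result, value)   # fold each chunk into the result
        }
        result
    } else {
        ## Collect every MAP value, then reduce once at the end.
        values <- list()
        repeat {
            value <- MAP()
            if (DONE(value)) break
            values[[length(values) + 1L]] <- value
        }
        REDUCE(values)
    }
}

## Hypothetical chunk source: each call sums the next yield_size elements
## of x, and returns a zero-length vector once x is exhausted.
make_map <- function(x, yield_size) {
    pos <- 0L
    function() {
        idx <- pos + seq_len(min(yield_size, length(x) - pos))
        pos <<- pos + length(idx)
        chunk <- x[idx]
        if (length(chunk)) sum(chunk) else chunk
    }
}

## Chunks 1:4, 5:8, 9:10 give per-chunk sums 10, 26, 19.
iter_total <- sketch_yield_reduce(make_map(1:10, 4), `+`)          # 55
all_total <- sketch_yield_reduce(make_map(1:10, 4),
                                 function(values) sum(unlist(values)),
                                 ITERATE = FALSE)                  # 55
empty <- sketch_yield_reduce(make_map(integer(0), 4))              # list()
```

With ITERATE=TRUE only the running result and the current chunk are held at any time; with ITERATE=FALSE every MAP value is retained until the single call to REDUCE, which suits reductions that need all values at once.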

Value

The return value is the value returned by the final invocation of REDUCE, or init if it was provided and no data were yielded, or list() if init is missing and no data were yielded.

Author(s)

Martin Morgan mtmorgan@fhcrc.org

See Also

BamFile, TabixFile, RsamtoolsFile.

Examples

fl <- system.file(package="Rsamtools", "extdata", "ex1.bam")

## nucleotide frequency of mapped reads
bf <- BamFile(fl, yieldSize=500) ## typically, yieldSize=1e6
param <- ScanBamParam(
    flag=scanBamFlag(isUnmappedQuery=FALSE),
    what="seq")
MAP <- function(X, param) {
    value <- scanBam(X, param=param)[[1]][["seq"]]
    if (length(value))
        alphabetFrequency(value, collapse=TRUE)
    else value       # zero-length, so the default DONE ends the iteration
}
REDUCE <- `+`        # add successive alphabetFrequency vectors
yieldReduce(bf, MAP, REDUCE, param=param)

## coverage
if (require(GenomicAlignments)) {
    MAP <- function(X)
        coverage(readGAlignments(X))
    REDUCE <- `+`
    DONE <- function(VALUE)
        ## coverage() on zero GAlignments returns an RleList,
        ## each element of which has 0 coverage
        sum(sum(VALUE)) == 0L
    yieldReduce(bf, MAP, REDUCE, DONE)
}


[Package Rsamtools version 1.16.0]