matchPWM {Biostrings}R Documentation

PWM creating, matching, and related utilities

Description

Position Weight Matrix (PWM) creating, matching, and related utilities for DNA data. (PWM for amino acid sequences are not supported.)

Usage

PWM(x, type = c("log2probratio", "prob"),
    prior.params = c(A=0.25, C=0.25, G=0.25, T=0.25))

matchPWM(pwm, subject, min.score="80%", ...)
countPWM(pwm, subject, min.score="80%", ...)
PWMscoreStartingAt(pwm, subject, starting.at=1)

## Utility functions for basic manipulation of the Position Weight Matrix
maxWeights(x)
minWeights(x)
maxScore(x)
minScore(x)
unitScale(x)
## S4 method for signature 'matrix'
reverseComplement(x, ...)

Arguments

x For PWM: a rectangular character vector or rectangular DNAStringSet object ("rectangular" means that all elements have the same number of characters) with no IUPAC ambiguity letters, or a Position Frequency Matrix represented as an integer matrix with row names containing at least A, C, G and T (typically the result of a call to consensusMatrix).

For maxWeights, minWeights, maxScore, minScore, unitScale and reverseComplement: a Position Weight Matrix represented as a numeric matrix with row names A, C, G and T.

type The type of Position Weight Matrix, either "log2probratio" or "prob". See Details section for more information.
prior.params A positive numeric vector, which represents the parameters of the Dirichlet conjugate prior, with names A, C, G, and T. See Details section for more information.
pwm A Position Weight Matrix represented as a numeric matrix with row names A, C, G and T.
subject A DNAString, XStringViews or MaskedDNAString object for matchPWM and countPWM.

A DNAString object for PWMscoreStartingAt.

min.score The minimum score for counting a match. Can be given as a character string containing a percentage (e.g. "85%") of the highest possible score or as a single number.
starting.at An integer vector specifying the starting positions of the Position Weight Matrix relatively to the subject.
... Additional arguments for methods.

Details

The PWM function uses a multinomial model with a Dirichlet conjugate prior to calculate the estimated probability of base b at position i. As mentioned in the Arguments section, prior.params supplies the parameters for the DNA bases A, C, G, and T in the Dirichlet prior. These values result in a position independent initial estimate of the probabilities for the bases to be priorProbs = prior.params/sum(prior.params) and the posterior (data infused) estimate for the probabilities for the bases in each of the positions to be postProbs = (consensusMatrix(x) + prior.params)/(length(x) + sum(prior.params)). When type = "log2probratio", the PWM = unitScale(log2(postProbs/priorProbs)). When type = "prob", the PWM = unitScale(postProbs).

Value

A numeric matrix representing the Position Weight Matrix for PWM.

A numeric vector containing the Position Weight Matrix-based scores for PWMscoreStartingAt.

An XStringViews object for matchPWM.

A single integer for countPWM.

A vector containing the max weight for each position in pwm for maxWeights.

A vector containing the min weight for each position in pwm for minWeights.

The highest possible score for a given Position Weight Matrix for maxScore.

The lowest possible score for a given Position Weight Matrix for maxScore.

The modified numeric matrix given by (x - minScore(x)/ncol(x))/(maxScore(x) - minScore(x)) for unitScale.

A PWM obtained by reverting the column order in PWM x and by reassigning each row to its complementary nucleotide for reverseComplement.

Author(s)

H. Pages and P. Aboyoun

References

Wasserman, WW, Sandelin, A., (2004) Applied bioinformatics for the identification of regulatory elements, Nat Rev Genet., 5(4):276-87.

See Also

consensusMatrix, matchPattern, reverseComplement, DNAString-class, XStringViews-class

Examples

  ## Data setup:
  data(HNF4alpha)
  library(BSgenome.Dmelanogaster.UCSC.dm3)
  chr3R <- Dmelanogaster$chr3R
  chr3R

  ## Create a PWM from a PFM or directly from a rectangular
  ## DNAStringSet object:
  pfm <- consensusMatrix(HNF4alpha)
  pwm <- PWM(pfm)  # same as 'PWM(HNF4alpha)'

  ## Perform some general routines on the PWM:
  round(pwm, 2)
  maxWeights(pwm)
  maxScore(pwm)
  reverseComplement(pwm)

  ## Score the first 5 positions:
  PWMscoreStartingAt(pwm, unmasked(chr3R), starting.at=1:5)

  ## Match the plus strand:
  hits <- matchPWM(pwm, chr3R)
  nhit <- countPWM(pwm, chr3R)  # same as 'length(hits)'

  ## Post-calculate the scores of the hits:
  scores <- PWMscoreStartingAt(pwm, subject(hits), start(hits))

  ## Match the minus strand:
  matchPWM(reverseComplement(pwm), chr3R)

[Package Biostrings version 2.20.1 Index]