Tuesday, April 26, 2016

Parallel Computing for R: A convenient wrapper function for the R package "parallel"


I wrote the following wrapper function that makes it easy to call parallel computing in R.  You may want to change the "nCores" parameter to the number of CPU cores you have and want to utilize.


my.parallel <- function(X, FUN,
clFUN=c("parSapply", "parLapply", "parRapply", "parCapply"),
nCores=2, ...)
{
## PURPOSE: Wrapper for R Parallel Computing Package "parallel"
## ARGUMENT:
## X: a list or vector of data
## FUN: a function to be applied over the data X.
## clFUN: a string for the parallel version of the "apply" function
## varieties available from the R package "parallel".
## Only four versions are allowed.
## nCores: Numer of CPU cores to be utilized for parallel computing
## ...: other arguments that will be passed the function "clFUN".
## RETURN: Computed data that are returned from the "clFUN" call.
## DATE: 26 Apr 2016, 12:43
## -----------------------------------------------------
clFUN <- match.arg(clFUN)
require(parallel)
cl <- makeCluster(nCores, type="FORK")
res <- do.call(clFUN, list(cl=cl, X, FUN, ...))
stopCluster(cl)
res
}
if (F) { # Unit Test 
    parallel::detectCores()       # check how many CPU cores are available
base <- 2
f <- function(exponent) base^exponent
my.parallel(2:4, f) # 4 8 16
my.parallel(matrix(1:9, 3), sum, "parRapply") # 12 15 18
}

Thursday, February 11, 2016

Probit Model for Limit of Detection Estimation per CLSI EP17A2E Guidance with Application to Quantitative Molecular Measurement Procedures

Following is the R function:
lod.model <- function(dat)
{
## PURPOSE: Make a Probit Model Fitting and Ploting for LOD Data
## ----------------------------------------------------------------------
## ARGUMENT:
## dat: a data frame with three columns "conc", "success", "failure"
## "conc" is the Target Concentration Levels.
## "success" is the Number of Positive Results.
## "failure" is the Number of Negative Results.
## ----------------------------------------------------------------------
## RETURN: Plot, GLM model object, and 95% LOD Estimate.
## ----------------------------------------------------------------------
## AUTHOR: Feiming Chen, Date: 10 Feb 2016, 14:28

## Per "CLSI EP17A2E" Guidance, apply default recommendation:
## 1. Remove Zero Predictor Values.
## 2. Appply "log10" transformation on predictors.
## 3. Use Probit Model
## 4. Report 95% LOD (concentration level)

## remove 0 levels in "conc" since will do "log10" transform on predictor.
dat <- subset(dat, conc > 0)

r <- glm(cbind(success, failure) ~ log10(conc), data=dat, family=binomial(link="probit"))

grid <- seq(0.001, max(1, dat$conc), length.out=200)
preds <- predict(r, newdata=data.frame(conc=grid), type="link", se.fit=TRUE)

se <- 1.96 * preds$se.fit
upr <- unlist(r$family$linkinv(preds$fit + se))
lwr <- unlist(r$family$linkinv(preds$fit - se))
fit <- unlist(r$family$linkinv(preds$fit))

ce <- coef(r)
lod95 <- 10^((r$family$linkfun(0.95) - ce[1])/ce[2]) # 95% LOD Estimate

## Plot
with(dat, plot(conc, success/(success+failure),ylab="Probability", xlab="Concentration", log="x", pch=19, ylim=c(0, 1)))
title(main=paste("Probit Model, 95% LOD =", round(lod95, 3)))

lines(grid, fit)
lines(grid, lwr, lty=2)
lines(grid, upr, lty=2)
abline(h=0.95, lty=2, col="red")
abline(v=lod95, lty=2, col="red")
axis(2, 0.95, labels = "0.95")

invisible(list(r=r, lod95=lod95))
}
if (F) { # Unit Test
dat <- data.frame(conc=c(5.4e5, 1.1e5, 5e4, 1e4, 5e3, 1e3),
success=c(6, 6, 6, 3, 0, 0),
failure=c(0, 0, 0, 3, 6, 6))
lod.model(dat)

## CLSI Example: EP17A2E Table C1 & C2, Lot 1, LOD=0.077 CFU/mL.
dat <- data.frame(conc=c(0, 0.025, 0.05, 0.15, 0.3, 0.5, 0.006, 0.014),
success=c(0, 23, 29, 32, 32, 32, 11, 15),
failure=c(22, 9, 3, 0, 0, 0, 19, 15))
r <- lod.model(dat) # Verify 95% LOD = 0.077
}

Wednesday, February 10, 2016

Estimate a Non-parametric Distribution from Mean and Range Using Data Simulation and R Package "logspline"

When we have limited information (say, only median and range) about a distribution, we can simulate the data in some way and use the R package "logspline" to estimate the distribution nonparametrically.   Below is my R function that employs a method like this. 


calc.dist.from.median.and.range <- function(m, r)
{
## PURPOSE: Return a Log-Logspline Distribution given (m, r).
## It may be necessary to call this function multiple times in order to get
## a satisfying distribution (from the plot).
## ----------------------------------------------------------------------
## ARGUMENT:
## m: Median
## r: Range (a vector of two numbers)
## ----------------------------------------------------------------------
## RETURN: A log-logspline distribution object.
## ----------------------------------------------------------------------
## AUTHOR: Feiming Chen, Date: 10 Feb 2016, 10:35

if (m < r[1] || m > r[2] || r[1] > r[2]) stop("Misspecified Median and Range")

mu <- log10(m)
log.r <- log10(r)

## Simulate data that will have median of "mu" and range of "log.r"
## Distribution on the Left/Right: Simulate a Normal Distribution centered at "mu" and
## truncate the part above/below the "mu".
## May keep sample size intentionaly small so as to introduce uncertainty about
## the distribution.
d1 <- rnorm(n=200, mean=mu, sd=(mu - log.r[1])/3) # Assums 3*SD informs the bound
d2 <- d1[d1 < mu] # Simulated Data to the Left of "mu"
d3 <- rnorm(n=200, mean=mu, sd=(log.r[2] - mu)/3)
d4 <- d3[d3 > mu] # Simulated Data to the Right of "mu"
d5 <- c(d2, d4) # Combined Simulated Data for the unknown distribution

require(logspline)
ans <- logspline(x=d5)
plot(ans)
return(ans)
}
if (F) { # Unit Test
calc.dist.from.median.and.range(m=1e10, r=c(3.6e5, 3.1e12))
my.dist <- calc.dist.from.median.and.range(m=1e7, r=c(7e2, 3e11))
dlogspline(log10(c(7e2, 1e7, 3e11)), my.dist) # Density
plogspline(log10(c(7e2, 1e7, 3e11)), my.dist) # Probability
10^qlogspline(c(0.05, 0.5, 0.95), my.dist) # Quantiles
10^rlogspline(10, my.dist) # Random Sample
}