Thursday, February 11, 2016

Probit Model for Limit of Detection Estimation per CLSI EP17A2E Guidance with Application to Quantitative Molecular Measurement Procedures

Following is the R function:
lod.model <- function(dat)
{
## PURPOSE: Make a Probit Model Fitting and Ploting for LOD Data
## ----------------------------------------------------------------------
## ARGUMENT:
## dat: a data frame with three columns "conc", "success", "failure"
## "conc" is the Target Concentration Levels.
## "success" is the Number of Positive Results.
## "failure" is the Number of Negative Results.
## ----------------------------------------------------------------------
## RETURN: Plot, GLM model object, and 95% LOD Estimate.
## ----------------------------------------------------------------------
## AUTHOR: Feiming Chen, Date: 10 Feb 2016, 14:28

## Per "CLSI EP17A2E" Guidance, apply default recommendation:
## 1. Remove Zero Predictor Values.
## 2. Appply "log10" transformation on predictors.
## 3. Use Probit Model
## 4. Report 95% LOD (concentration level)

## remove 0 levels in "conc" since will do "log10" transform on predictor.
dat <- subset(dat, conc > 0)

r <- glm(cbind(success, failure) ~ log10(conc), data=dat, family=binomial(link="probit"))

grid <- seq(0.001, max(1, dat$conc), length.out=200)
preds <- predict(r, newdata=data.frame(conc=grid), type="link", se.fit=TRUE)

se <- 1.96 * preds$se.fit
upr <- unlist(r$family$linkinv(preds$fit + se))
lwr <- unlist(r$family$linkinv(preds$fit - se))
fit <- unlist(r$family$linkinv(preds$fit))

ce <- coef(r)
lod95 <- 10^((r$family$linkfun(0.95) - ce[1])/ce[2]) # 95% LOD Estimate

## Plot
with(dat, plot(conc, success/(success+failure),ylab="Probability", xlab="Concentration", log="x", pch=19, ylim=c(0, 1)))
title(main=paste("Probit Model, 95% LOD =", round(lod95, 3)))

lines(grid, fit)
lines(grid, lwr, lty=2)
lines(grid, upr, lty=2)
abline(h=0.95, lty=2, col="red")
abline(v=lod95, lty=2, col="red")
axis(2, 0.95, labels = "0.95")

invisible(list(r=r, lod95=lod95))
}
if (F) { # Unit Test
dat <- data.frame(conc=c(5.4e5, 1.1e5, 5e4, 1e4, 5e3, 1e3),
success=c(6, 6, 6, 3, 0, 0),
failure=c(0, 0, 0, 3, 6, 6))
lod.model(dat)

## CLSI Example: EP17A2E Table C1 & C2, Lot 1, LOD=0.077 CFU/mL.
dat <- data.frame(conc=c(0, 0.025, 0.05, 0.15, 0.3, 0.5, 0.006, 0.014),
success=c(0, 23, 29, 32, 32, 32, 11, 15),
failure=c(22, 9, 3, 0, 0, 0, 19, 15))
r <- lod.model(dat) # Verify 95% LOD = 0.077
}

Wednesday, February 10, 2016

Estimate a Non-parametric Distribution from Mean and Range Using Data Simulation and R Package "logspline"

When we have limited information (say, only median and range) about a distribution, we can simulate the data in some way and use the R package "logspline" to estimate the distribution nonparametrically.   Below is my R function that employs a method like this. 


calc.dist.from.median.and.range <- function(m, r)
{
## PURPOSE: Return a Log-Logspline Distribution given (m, r).
## It may be necessary to call this function multiple times in order to get
## a satisfying distribution (from the plot).
## ----------------------------------------------------------------------
## ARGUMENT:
## m: Median
## r: Range (a vector of two numbers)
## ----------------------------------------------------------------------
## RETURN: A log-logspline distribution object.
## ----------------------------------------------------------------------
## AUTHOR: Feiming Chen, Date: 10 Feb 2016, 10:35

if (m < r[1] || m > r[2] || r[1] > r[2]) stop("Misspecified Median and Range")

mu <- log10(m)
log.r <- log10(r)

## Simulate data that will have median of "mu" and range of "log.r"
## Distribution on the Left/Right: Simulate a Normal Distribution centered at "mu" and
## truncate the part above/below the "mu".
## May keep sample size intentionaly small so as to introduce uncertainty about
## the distribution.
d1 <- rnorm(n=200, mean=mu, sd=(mu - log.r[1])/3) # Assums 3*SD informs the bound
d2 <- d1[d1 < mu] # Simulated Data to the Left of "mu"
d3 <- rnorm(n=200, mean=mu, sd=(log.r[2] - mu)/3)
d4 <- d3[d3 > mu] # Simulated Data to the Right of "mu"
d5 <- c(d2, d4) # Combined Simulated Data for the unknown distribution

require(logspline)
ans <- logspline(x=d5)
plot(ans)
return(ans)
}
if (F) { # Unit Test
calc.dist.from.median.and.range(m=1e10, r=c(3.6e5, 3.1e12))
my.dist <- calc.dist.from.median.and.range(m=1e7, r=c(7e2, 3e11))
dlogspline(log10(c(7e2, 1e7, 3e11)), my.dist) # Density
plogspline(log10(c(7e2, 1e7, 3e11)), my.dist) # Probability
10^qlogspline(c(0.05, 0.5, 0.95), my.dist) # Quantiles
10^rlogspline(10, my.dist) # Random Sample
}