R Function Library: February 2016

Thursday, February 11, 2016

Probit Model for Limit of Detection Estimation per CLSI EP17A2E Guidance with Application to Quantitative Molecular Measurement Procedures

Following is the R function:

lod.model <- function(dat)
{
    ## PURPOSE: Make a Probit Model Fitting and Ploting for LOD Data
    ## ----------------------------------------------------------------------
    ## ARGUMENT:
    ##   dat: a data frame with three columns "conc", "success", "failure"
    ##        "conc" is the Target Concentration Levels.
    ##        "success" is the Number of Positive Results.
    ##        "failure" is the Number of Negative Results. 
    ## ----------------------------------------------------------------------
    ## RETURN: Plot, GLM model object, and 95% LOD Estimate. 
    ## ----------------------------------------------------------------------
    ## AUTHOR: Feiming Chen,  Date: 10 Feb 2016, 14:28

    ## Per "CLSI EP17A2E" Guidance, apply default recommendation:
    ##   1. Remove Zero Predictor Values.
    ##   2. Appply "log10" transformation on predictors.
    ##   3. Use Probit Model
    ##   4. Report 95% LOD (concentration level)

    ## remove 0 levels in "conc" since will do "log10" transform on predictor. 
    dat <- subset(dat, conc > 0)

    r <- glm(cbind(success, failure) ~ log10(conc), data=dat, family=binomial(link="probit"))

    grid <- seq(0.001, max(1, dat$conc), length.out=200)
    preds <- predict(r, newdata=data.frame(conc=grid), type="link", se.fit=TRUE)

    se <- 1.96 * preds$se.fit
    upr <- unlist(r$family$linkinv(preds$fit + se))
    lwr <- unlist(r$family$linkinv(preds$fit - se))
    fit <- unlist(r$family$linkinv(preds$fit))

    ce <- coef(r)
    lod95 <- 10^((r$family$linkfun(0.95) - ce[1])/ce[2]) # 95% LOD Estimate
    
    ## Plot
    with(dat, plot(conc, success/(success+failure),ylab="Probability", xlab="Concentration", log="x", pch=19, ylim=c(0, 1)))
    title(main=paste("Probit Model, 95% LOD =", round(lod95, 3)))

    lines(grid, fit)
    lines(grid, lwr, lty=2)
    lines(grid, upr, lty=2)
    abline(h=0.95, lty=2, col="red")
    abline(v=lod95, lty=2, col="red")
    axis(2, 0.95, labels = "0.95")

    invisible(list(r=r, lod95=lod95))
}
if (F) {                                # Unit Test 
    dat <- data.frame(conc=c(5.4e5, 1.1e5, 5e4, 1e4, 5e3, 1e3), 
                      success=c(6, 6, 6, 3, 0, 0), 
                      failure=c(0, 0, 0, 3, 6, 6))
    lod.model(dat)

    ## CLSI Example: EP17A2E Table C1 & C2, Lot 1, LOD=0.077 CFU/mL. 
    dat <- data.frame(conc=c(0, 0.025, 0.05, 0.15, 0.3, 0.5, 0.006, 0.014),
                      success=c(0, 23, 29, 32, 32, 32, 11, 15), 
                      failure=c(22, 9, 3, 0, 0, 0, 19, 15))
    r <- lod.model(dat)                 # Verify 95% LOD = 0.077
}

Wednesday, February 10, 2016

Estimate a Non-parametric Distribution from Mean and Range Using Data Simulation and R Package "logspline"

When we have limited information (say, only median and range) about a distribution, we can simulate the data in some way and use the R package "logspline" to estimate the distribution nonparametrically. Below is my R function that employs a method like this.


calc.dist.from.median.and.range <- function(m, r) 
{
    ## PURPOSE: Return a Log-Logspline Distribution given (m, r).
    ##          It may be necessary to call this function multiple times in order to get
    ##          a satisfying distribution (from the plot). 
    ## ----------------------------------------------------------------------
    ## ARGUMENT:
    ##   m: Median
    ##   r: Range (a vector of two numbers)
    ## ----------------------------------------------------------------------
    ## RETURN: A log-logspline distribution object.
    ## ----------------------------------------------------------------------
    ## AUTHOR: Feiming Chen,  Date: 10 Feb 2016, 10:35

    if (m < r[1] || m > r[2] || r[1] > r[2]) stop("Misspecified Median and Range")

    mu <- log10(m)
    log.r <- log10(r)

    ## Simulate data that will have median of "mu" and range of "log.r"
    ## Distribution on the Left/Right: Simulate a Normal Distribution centered at "mu" and
    ## truncate the part above/below the "mu".
    ## May keep sample size intentionaly small so as to introduce uncertainty about
    ## the distribution. 
    d1 <- rnorm(n=200, mean=mu, sd=(mu - log.r[1])/3) # Assums 3*SD informs the bound
    d2 <- d1[d1 < mu]                   # Simulated Data to the Left of "mu"
    d3 <- rnorm(n=200, mean=mu, sd=(log.r[2] - mu)/3)
    d4 <- d3[d3 > mu]                   # Simulated Data to the Right of "mu"
    d5 <- c(d2, d4)                     # Combined Simulated Data for the unknown distribution

    require(logspline)
    ans <- logspline(x=d5)
    plot(ans)
    return(ans)
}
if (F) {                                # Unit Test 
    calc.dist.from.median.and.range(m=1e10, r=c(3.6e5, 3.1e12))
    my.dist <- calc.dist.from.median.and.range(m=1e7, r=c(7e2, 3e11))
    dlogspline(log10(c(7e2, 1e7, 3e11)), my.dist) # Density
    plogspline(log10(c(7e2, 1e7, 3e11)), my.dist) # Probability
    10^qlogspline(c(0.05, 0.5, 0.95), my.dist) # Quantiles 
    10^rlogspline(10, my.dist) # Random Sample 
}

Thursday, February 11, 2016

Probit Model for Limit of Detection Estimation per CLSI EP17A2E Guidance with Application to Quantitative Molecular Measurement Procedures

Wednesday, February 10, 2016

Estimate a Non-parametric Distribution from Mean and Range Using Data Simulation and R Package "logspline"

Blog Archive