Wednesday, February 10, 2016

Estimate a Non-parametric Distribution from Mean and Range Using Data Simulation and R Package "logspline"

When we have limited information (say, only median and range) about a distribution, we can simulate the data in some way and use the R package "logspline" to estimate the distribution nonparametrically.   Below is my R function that employs a method like this. 


calc.dist.from.median.and.range <- function(m, r)
{
## PURPOSE: Return a Log-Logspline Distribution given (m, r).
## It may be necessary to call this function multiple times in order to get
## a satisfying distribution (from the plot).
## ----------------------------------------------------------------------
## ARGUMENT:
## m: Median
## r: Range (a vector of two numbers)
## ----------------------------------------------------------------------
## RETURN: A log-logspline distribution object.
## ----------------------------------------------------------------------
## AUTHOR: Feiming Chen, Date: 10 Feb 2016, 10:35

if (m < r[1] || m > r[2] || r[1] > r[2]) stop("Misspecified Median and Range")

mu <- log10(m)
log.r <- log10(r)

## Simulate data that will have median of "mu" and range of "log.r"
## Distribution on the Left/Right: Simulate a Normal Distribution centered at "mu" and
## truncate the part above/below the "mu".
## May keep sample size intentionaly small so as to introduce uncertainty about
## the distribution.
d1 <- rnorm(n=200, mean=mu, sd=(mu - log.r[1])/3) # Assums 3*SD informs the bound
d2 <- d1[d1 < mu] # Simulated Data to the Left of "mu"
d3 <- rnorm(n=200, mean=mu, sd=(log.r[2] - mu)/3)
d4 <- d3[d3 > mu] # Simulated Data to the Right of "mu"
d5 <- c(d2, d4) # Combined Simulated Data for the unknown distribution

require(logspline)
ans <- logspline(x=d5)
plot(ans)
return(ans)
}
if (F) { # Unit Test
calc.dist.from.median.and.range(m=1e10, r=c(3.6e5, 3.1e12))
my.dist <- calc.dist.from.median.and.range(m=1e7, r=c(7e2, 3e11))
dlogspline(log10(c(7e2, 1e7, 3e11)), my.dist) # Density
plogspline(log10(c(7e2, 1e7, 3e11)), my.dist) # Probability
10^qlogspline(c(0.05, 0.5, 0.95), my.dist) # Quantiles
10^rlogspline(10, my.dist) # Random Sample
}

No comments:

Post a Comment