Model of small counts from US Census ACS data

Krzysztof Sakrejda1,2

1Center for Social Epidemiology and Population Health, University of Michigan School of Public Health, Ann Arbor, Michigan

2Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, Michigan

Small Area Complications

  • Some difficult issues:
    • Estimated counts can be small
    • Estimated counts can be zero
    • Margin of Error (MOE) estimates can be larger than estimates

Non-zero small-area estimates

  • Estimated count
  • Model-based estimate of MOE from observed count data

Zero small-area estimates

  • Estimated zero count
  • Model-based estimate of MOE from ten-year census

A sampling distribution

(When census estimates are over-dispersed)

  • \(Pr[y|\alpha, \beta] = \binom{y + \alpha - 1}{\alpha - 1} \left(\frac{\beta}{\beta+1}\right)^\alpha\left(\frac{1}{\beta+1}\right)^y\)
  • \(E[y] = \frac{\alpha}{\beta} = \mu\)
  • \(Var[y] = \frac{\alpha}{\beta^2}\left(\beta + 1\right) = \sigma^2 \approx \left(\frac{MOE}{1.645}\right)^2\)[1]
  • \(\beta = \frac{\mu}{\sigma^2 - \mu}\)
  • \(\alpha = \frac{\mu^2}{\sigma^2 - \mu}\)
  • Either:
    • Simulate a \(y_k\) for each sample in a larger model; or
    • Sum out \(y_k\) according to \(Pr[y_k|\alpha,\beta]\)
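
The moment matching above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function names and the example values (an estimate of 40 with an MOE of 25) are hypothetical, and only the Python standard library is assumed.

```python
import math

Z90 = 1.645  # ACS margins of error are reported at the 90% level


def neg_binomial_params(estimate, moe):
    """Moment-match (alpha, beta) to an estimate and its MOE."""
    mu = float(estimate)
    var = (moe / Z90) ** 2
    if var <= mu:
        raise ValueError("not over-dispersed: the variance must exceed the mean")
    beta = mu / (var - mu)
    alpha = mu * beta  # equals mu^2 / (var - mu)
    return alpha, beta


def neg_binomial_log_pmf(y, alpha, beta):
    """log Pr[y | alpha, beta] in the parameterization used on this slide."""
    return (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(beta / (beta + 1.0))
        - y * math.log(beta + 1.0)
    )


# Hypothetical tract: estimate 40, MOE 25.
alpha, beta = neg_binomial_params(estimate=40, moe=25)
pmf = [math.exp(neg_binomial_log_pmf(y, alpha, beta)) for y in range(2000)]
mean = sum(y * p for y, p in enumerate(pmf))
print(alpha, beta, mean)
```

Summing `pmf` over a sufficiently wide range recovers the estimate as the mean and \((MOE/1.645)^2\) as the variance, which is a quick check that the matching was done correctly.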

A sampling distribution

(When census estimates are not over-dispersed)

  • Typically heavily populated tracts
  • \(Pr[y|\mu, \sigma] = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)\)
  • \(E[y] = \mu \)
  • \(Var[y] = \sigma^2 \approx \left(\frac{MOE}{1.645}\right)^2\)[1]
  • Either:
    • Simulate a \(y_k\) for each sample in a larger model; or
    • Sum out \(y_k\) according to \(Pr[y_k|\mu, \sigma]\)
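
One way to sum out \(y_k\) under the normal model is to place its mass on integers. The sketch below assumes, as an illustration only, that each integer \(n\) receives the probability of the interval \((n - 0.5, n + 0.5]\); the function names and example values are hypothetical.

```python
import math

Z90 = 1.645  # ACS margins of error are reported at the 90% level


def normal_cdf(x, mu, sigma):
    """Normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretized_normal_pmf(estimate, moe, lo, hi):
    """Mass on each integer n in [lo, hi], taken from (n - 0.5, n + 0.5]."""
    mu, sigma = float(estimate), moe / Z90
    return {
        n: normal_cdf(n + 0.5, mu, sigma) - normal_cdf(n - 0.5, mu, sigma)
        for n in range(lo, hi + 1)
    }


# Hypothetical heavily populated tract: estimate 5000, MOE 60.
pmf = discretized_normal_pmf(estimate=5000, moe=60, lo=4700, hi=5300)
print(max(pmf, key=pmf.get), sum(pmf.values()))
```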

A sampling distribution

(When census estimates are zero)

  • \(Pr[Y \le y|\lambda] = 1 - e^{-y/\lambda}\) (the exponential distribution function with scale \(\lambda\))
  • \(Var[y] = \lambda^2 \approx \left(\frac{MOE}{1.645}\right)^2\)[1]
  • \(\lambda = \frac{MOE}{1.645}\)
  • Either:
    • Simulate a \(y_k\) for each sample in a larger model; or
    • Sum out \(y_k\) according to \(Pr[y_k|\lambda]\)
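
A small sketch of the zero-estimate case, reading the expression above as the exponential distribution function with scale \(\lambda\). Placing the mass of \([n, n+1)\) on the integer \(n\) is an assumption made here for illustration, as are the function names and the example MOE of 12.

```python
import math

Z90 = 1.645  # ACS margins of error are reported at the 90% level


def exponential_scale(moe):
    """lambda = MOE / 1.645, so that the variance is lambda^2."""
    return moe / Z90


def exponential_cdf(y, lam):
    """1 - exp(-y / lambda) for y > 0, and 0 otherwise."""
    return 1.0 - math.exp(-y / lam) if y > 0 else 0.0


def discretized_exponential_pmf(moe, n_max):
    """Mass on each integer n in [0, n_max], taken from [n, n + 1)."""
    lam = exponential_scale(moe)
    return [
        exponential_cdf(n + 1, lam) - exponential_cdf(n, lam)
        for n in range(n_max + 1)
    ]


# Hypothetical tract with an estimate of zero and an MOE of 12.
pmf = discretized_exponential_pmf(moe=12, n_max=400)
print(exponential_scale(12), pmf[:3])
```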

Conceptual match

  • Sampling a larger population we are trying to estimate
  • For non-zero estimates the MOE informs us about over-dispersion of the counts
  • For zero estimates, the MOE tells us how many we might expect to *try* to count before we count one successfully
  • In all cases the MOE plus a single observation leave uncertainty about the distribution
  • Further uncertainty comes from the sampling of the counts from the distribution

Helpful properties

  • All sampled counts are positive integers
  • Sampled counts should include observed census counts
  • Calibration should match the calibration of census estimates

(Calibration is worth testing via simulation)
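
A rough sketch of such a simulation for the over-dispersed case: draw counts from the moment-matched negative binomial and record how often they fall within the estimate plus or minus the MOE. Using this coverage as the calibration measure, drawing via a gamma-Poisson mixture, and the example values are all assumptions made for illustration.

```python
import math
import random

Z90 = 1.645  # ACS margins of error are reported at the 90% level


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the modest rates used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def draw_neg_binomial(alpha, beta, rng):
    """Gamma-Poisson mixture: rate ~ Gamma(shape=alpha, rate=beta)."""
    return draw_poisson(rng.gammavariate(alpha, 1.0 / beta), rng)


def coverage(estimate, moe, n_draws=10000, seed=1):
    """Share of simulated counts within estimate +/- MOE."""
    rng = random.Random(seed)
    var = (moe / Z90) ** 2
    beta = estimate / (var - estimate)
    alpha = estimate * beta
    hits = sum(
        abs(draw_neg_binomial(alpha, beta, rng) - estimate) <= moe
        for _ in range(n_draws)
    )
    return hits / n_draws


# Hypothetical tract: estimate 40, MOE 25; a value near 0.9 is the target.
print(coverage(estimate=40, moe=25))
```

Because the negative binomial is skewed, the simulated coverage need not equal 0.90 exactly; the size of the gap is what the check is meant to reveal.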

Downstream Estimates

  • Populations are typically used as denominators in downstream estimates
  • When the numerator, \(x\), is known precisely, the uncertainty in the population, \(N\), can be included directly by calculating \(Pr[x/N]\) from \(Pr[N]\)
  • When the numerator should be estimated in the context of denominator uncertainty, there is no direct calculation.
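
For the known-numerator case, the direct calculation can be sketched as a change of variable on a discrete \(Pr[N]\). The dictionary representation, the per-100,000 scaling, and the choice to drop \(N = 0\) (where the rate is undefined) are assumptions for this example; the population probabilities are made up.

```python
def rate_distribution(x, pop_pmf, per=100_000):
    """Sorted (rate, probability) pairs for x / N, skipping N = 0."""
    kept = {n: p for n, p in pop_pmf.items() if n > 0 and p > 0}
    total = sum(kept.values())
    return sorted((per * x / n, p / total) for n, p in kept.items())


def quantile(pairs, q):
    """Smallest rate whose cumulative probability reaches q."""
    acc = 0.0
    for value, p in pairs:
        acc += p
        if acc >= q:
            return value
    return pairs[-1][0]


# Hypothetical population distribution and a known numerator of 2 events.
pop_pmf = {80: 0.1, 100: 0.6, 125: 0.3}
rates = rate_distribution(x=2, pop_pmf=pop_pmf)
print(rates, quantile(rates, 0.5))
```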

Arbitrary discrete distribution

  • Using the three models above, calculate \(Pr[N_i = n]\) for all plausible \(n\).
  • Drop estimates where \(Pr[N_i = n] \approx 0\)
  • Pick up to \(K\) equally spaced points in \(N\) to represent the distribution
  • Drop other points
  • Re-normalize so that \(\sum_k Pr[N_i = n_k] = 1\)
  • Each of the \(K\) points, \(n_k\), with weight \(\eta_k = Pr[N_i = n_k]\), stands in for a larger set of similar values
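
The thinning steps above can be sketched as follows. The tolerance used to decide which probabilities are approximately zero, the function name, and the spacing of points by position within the retained support are assumptions for this illustration.

```python
def thin_pmf(pmf, k, tol=1e-8):
    """Reduce a dict {n: Pr[N = n]} to at most k points with re-normalized weights."""
    if k < 2:
        raise ValueError("k must be at least 2")
    support = sorted(n for n, p in pmf.items() if p > tol)  # drop near-zero mass
    if not support:
        raise ValueError("no probability mass above the tolerance")
    if len(support) <= k:
        points = support
    else:
        step = (len(support) - 1) / (k - 1)
        points = sorted({support[round(i * step)] for i in range(k)})
    total = sum(pmf[n] for n in points)
    weights = [pmf[n] / total for n in points]  # re-normalize over kept points
    return points, weights


# Hypothetical flat distribution on 0..100, reduced to K = 5 points.
flat = {n: 1.0 / 101 for n in range(101)}
points, weights = thin_pmf(flat, k=5)
print(points, weights)
```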

Discrete-Poisson Mixture model

  • \( Pr\left[m_i|\lambda, \vec{\eta}\right]=\sum_{k=1}^K \eta_k\times \frac{(\lambda_{i,k})^{m_i} \exp{\left(-\lambda_{i,k} \right)}}{m_i!} \)
  • \( \lambda_{i,k} = \exp{\left(\beta_0 + \beta_{TRACT} + ... + \log 10^5 - \log n_k \right)} \)
  • Scaling:
    • The main dataset has \(n_{units} \times n_{timepoints} \) rows
    • The dimension resulting from population uncertainty can be included in a second step
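
A minimal sketch of the mixture likelihood for a single row, evaluated on the log scale with log-sum-exp for numerical stability. Here `linear_predictor` is a hypothetical stand-in for \(\beta_0 + \beta_{TRACT} + ...\), the rate follows the expression for \(\lambda_{i,k}\) as written above, and every \(n_k\) is assumed to be positive.

```python
import math


def poisson_log_pmf(m, log_lam):
    """log of the Poisson probability of m given the log rate."""
    return m * log_lam - math.exp(log_lam) - math.lgamma(m + 1)


def mixture_log_lik(m, linear_predictor, points, weights, per=1e5):
    """log Pr[m | lambda, eta], summing over the K population points."""
    terms = []
    for n_k, eta_k in zip(points, weights):
        log_lam = linear_predictor + math.log(per) - math.log(n_k)
        terms.append(math.log(eta_k) + poisson_log_pmf(m, log_lam))
    top = max(terms)  # log-sum-exp: factor out the largest term
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical row: 3 events, two population points with equal weight.
print(mixture_log_lik(3, math.log(0.002), points=[100, 200], weights=[0.5, 0.5]))
```

Each row costs \(K\) Poisson evaluations, so the work grows as \(n_{units} \times n_{timepoints} \times K\).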

References

  1. ACS Design and Methodology
  2. Stan Functions Reference
  3. A Bayesian hierarchical small-area population model...
  4. Various sums of Negative Binomial distributions...