Sampling of small counts from Census ACS estimates

Model of small counts from US Census ACS data

Krzysztof Sakrejda¹,²

¹Center for Social Epidemiology and Population Health, University of Michigan School of Public Health, Ann Arbor, Michigan

²Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, Michigan

Some difficult issues:
- Estimated counts can be small
- Estimated counts can be zero
- Margin of Error (MOE) estimates can be larger than estimates

\(Pr[y|\alpha, \beta] = \binom{y + \alpha - 1}{\alpha - 1} \left(\frac{\beta}{\beta+1}\right)^\alpha\left(\frac{1}{\beta+1}\right)^y\)
\(E[y] = \frac{\alpha}{\beta} = \mu\)
\(Var[y] = \frac{\alpha}{\beta^2}\left(\beta + 1\right) = \sigma^2 \approx \left(\frac{MOE}{1.645}\right)^2\)[1]
\(\beta = \frac{\mu}{\sigma^2 - \mu}\)
\(\alpha = \frac{\mu^2}{\sigma^2 - \mu}\)
Either:
- Simulate a \(y_k\) for each sample in a larger model; or
- Sum out \(y_k\) according to \(Pr[y_k|\alpha,\beta]\)
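The moment matching above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names `nb_params` and `nb_log_pmf` and the example estimate and MOE are hypothetical. It assumes the MOE is a 90% margin (hence the 1.645 divisor) and that the implied variance exceeds the estimate, since otherwise \(\alpha\) and \(\beta\) are not positive.

```python
import math

Z90 = 1.645  # divisor used in the slides; ACS MOEs are 90% margins


def nb_params(estimate, moe):
    """Match (alpha, beta) to an estimate and its MOE (hypothetical helper).

    Mean is alpha / beta, variance is alpha * (beta + 1) / beta**2.
    """
    mu = float(estimate)
    var = (moe / Z90) ** 2
    if mu <= 0 or var <= mu:
        raise ValueError("moment matching needs 0 < mean < variance")
    beta = mu / (var - mu)
    alpha = mu * beta  # equals mu**2 / (var - mu)
    return alpha, beta


def nb_log_pmf(y, alpha, beta):
    """Log of Pr[y | alpha, beta] for integer y >= 0 and real alpha > 0."""
    return (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(beta / (beta + 1.0))
        - y * math.log(beta + 1.0)
    )


# Made-up example values: estimate of 40 with an MOE of 25.
alpha, beta = nb_params(estimate=40, moe=25)
pmf = [math.exp(nb_log_pmf(y, alpha, beta)) for y in range(2000)]
```

The list `pmf` is what the "sum out" option needs; the "simulate" option would instead draw \(y_k\) from the same distribution.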

Typically heavily populated tracts
\(Pr[y|\mu, \sigma] = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(y-\mu)^2}{2\sigma^2}}\)
\(E[y] = \mu \)
\(Var[y] = \sigma^2 \approx \left(\frac{MOE}{1.645}\right)^2\)[1]
Either:
- Simulate a \(y_k\) for each sample in a larger model; or
- Sum out \(y_k\) according to \(Pr[y_k|\mu, \sigma]\)
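For the "sum out" option with the normal model, one possible sketch evaluates the normal density on integer counts and normalizes. The function name `normal_count_weights`, the four-standard-deviation window, and the truncation at zero are assumptions made for illustration, as are the example numbers.

```python
import math


def normal_count_weights(estimate, moe, z=1.645, width=4.0):
    """Normal weights on integer counts near the estimate (hypothetical helper)."""
    sigma = moe / z
    lo = max(0, math.floor(estimate - width * sigma))  # counts cannot be negative
    hi = math.ceil(estimate + width * sigma)
    counts = list(range(lo, hi + 1))
    dens = [math.exp(-0.5 * ((n - estimate) / sigma) ** 2) for n in counts]
    total = sum(dens)
    return counts, [d / total for d in dens]


# Made-up example values: estimate of 4200 with an MOE of 310.
counts, weights = normal_count_weights(estimate=4200, moe=310)
```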

\(Pr[Y \le y|\lambda] = 1 - e^{-y/\lambda}\)
\(Var[y] = \lambda^2 \approx \left(\frac{MOE}{1.645}\right)^2\)[1]
\(\lambda = \frac{MOE}{1.645}\)
Either:
- Simulate a \(y_k\) for each sample in a larger model; or
- Sum out \(y_k\) according to \(Pr[y_k|\lambda]\)
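A sketch of the zero-estimate case, under the assumption that integer counts get the probability mass the exponential distribution places on \([n, n+1)\). The helper name `zero_estimate_weights`, the tail cutoff, and the example MOE are hypothetical.

```python
import math


def zero_estimate_weights(moe, z=1.645, tail=1e-6):
    """Discretized exponential weights for a zero estimate (hypothetical helper)."""
    lam = moe / z  # scale of the exponential, so Var[y] = lam**2

    def cdf(y):
        return 1.0 - math.exp(-y / lam)

    # Stop where the remaining upper-tail mass falls below `tail`.
    n_max = math.ceil(-lam * math.log(tail))
    counts = list(range(n_max + 1))
    probs = [cdf(n + 1) - cdf(n) for n in counts]
    total = sum(probs)
    return counts, [p / total for p in probs]


# Made-up example: an estimate of zero published with an MOE of 12.
counts, probs = zero_estimate_weights(moe=12)
```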

Sampling a larger population we are trying to estimate
For non-zero estimates the MOE informs us about over-dispersion of the counts
For zero estimates the MOE tells us how many we might expect to *try* to count before we count one successfully
In all cases the MOE plus a single observation leave uncertainty about the distribution
Further uncertainty comes from the sampling of the counts from the distribution

Populations are typically used as denominators in downstream estimates
When the numerator, \(x\), is known precisely, the uncertainty in the population, \(N\), can be included directly by calculating \(Pr[x/N]\) from \(Pr[N]\)
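With a known numerator, the calculation amounts to carrying each population value's probability over to the corresponding rate. The sketch below assumes a discrete \(Pr[N]\) given as parallel lists, reports rates per 100,000, and drops population values smaller than the numerator (an assumption, since a population cannot be smaller than a count drawn from it). The name `rate_distribution` and the numbers are illustrative only.

```python
def rate_distribution(x, counts, weights, per=1e5):
    """Map a discrete Pr[N] to a discrete distribution of x / N (hypothetical helper)."""
    # Assumption: N must be at least x, and at least 1 to avoid dividing by zero.
    kept = [(n, w) for n, w in zip(counts, weights) if n >= max(x, 1)]
    total = sum(w for _, w in kept)
    return [(per * x / n, w / total) for n, w in kept]


# Made-up example: 10 events, three plausible population sizes.
rates = rate_distribution(10, counts=[50, 100, 200], weights=[0.25, 0.5, 0.25])
mean_rate = sum(r * w for r, w in rates)
```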
When the numerator should be estimated in the context of denominator uncertainty, there is no direct calculation.

Using the three models above, calculate \(Pr[N_i = n]\) for all plausible \(n\).
Drop estimates where \(Pr[N_i = n] \approx 0\)
Pick up to \(K\) equally spaced points in \(N\) to represent the distribution
Drop other points
Re-normalize so that \(\sum_k Pr[N_i = n_k] = 1\)
Each of the \(K\) points, \(n_k\), with weight \(\eta_k = Pr[N_i = n_k]\), stands in for a larger set of similar values
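The thinning steps above might look like the following sketch. It assumes the distribution arrives as parallel lists over consecutive integer counts, so that equally spaced indices correspond to equally spaced values of \(N\). The helper name `thin_support` and the cutoff `eps` are hypothetical.

```python
def thin_support(counts, probs, k, eps=1e-8):
    """Keep up to k equally spaced support points and re-normalize (hypothetical helper)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Step 1: drop values with negligible probability.
    kept = [(n, p) for n, p in zip(counts, probs) if p > eps]
    # Step 2: pick up to k equally spaced points and drop the rest.
    if len(kept) > k:
        step = (len(kept) - 1) / (k - 1)
        idx = sorted({round(j * step) for j in range(k)})
        kept = [kept[i] for i in idx]
    # Step 3: re-normalize the retained probabilities.
    total = sum(p for _, p in kept)
    return [n for n, _ in kept], [p / total for _, p in kept]


# Made-up example: a flat distribution on 0..100 reduced to 5 points.
points, weights = thin_support(list(range(101)), [1 / 101] * 101, k=5)
```

Because the index set is de-duplicated, the function can return fewer than `k` points, which matches "up to \(K\)".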

\( Pr\left[m_i|\lambda, \vec{\eta}\right]=\sum_{k=1}^K \eta_k\times \frac{(\lambda_{i,k})^{m_i} \exp{\left(-\lambda_{i,k} \right)}}{m_i!} \)
\( \lambda_{i,k} = \exp{\left(\beta_0 + \beta_{TRACT} + \dots + \log 10^5 - \log n_k \right)} \)
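The mixture likelihood can be evaluated on the log scale to avoid underflow. The sketch below takes the \(\lambda_{i,k}\) for one unit as already computed and only performs the weighted sum over the \(K\) points; the function name `log_mixture_poisson` and the example values are hypothetical.

```python
import math


def log_mixture_poisson(m, lam, eta):
    """Log of sum_k eta_k * Poisson(m | lam_k) for one unit (hypothetical helper)."""
    terms = [
        math.log(w) + m * math.log(l) - l - math.lgamma(m + 1)
        for l, w in zip(lam, eta)
        if w > 0  # points with zero weight contribute nothing
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Made-up example: observed count 3, two support points with equal weight.
lp = log_mixture_poisson(3, lam=[2.0, 4.0], eta=[0.5, 0.5])
```

In a full model this value would be summed over units and time points to give the log-likelihood.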
Scaling:
- The main dataset has \(n_{units} \times n_{timepoints} \) rows
- The dimension resulting from population uncertainty can be included using a second step