UCL CDS Symposium on Data Science in Public Health
Department of Statistical Science, UCL
Resources
Slides and code here: github.com/n8thangreen/data-science-in-health-talk
Title: Assessing the factors that determine health literacy in Newham and the size of their influence/impact
Health literacy is broadly defined as the ability to access, understand, appraise, and communicate health information, enabling individuals to engage in healthcare and maintain good health throughout their lives.
| Small Area Estimation (SAE) (Previous) | | HTA / Statistics Method (Me) |
|---|---|---|
| Weighted Logistic Regression with Synthetic Estimation | \(\longrightarrow\) | Multilevel Regression with Post-stratification (MRP) |
| Linear Plug-In Model (Equivalent to Regression-Synthetic Estimator at Unit Level) | \(\longrightarrow\) | Simulated Treatment Comparison (STC) |
| Residual-adjusted synthetic estimation | \(\longrightarrow\) | Targeted Maximum Likelihood Estimation (TMLE) (in causal inference) |
Skills for Life (SfL) Survey 2011 [MRP]
Newham Residents Survey 2023 (NRS) [MRP]
Additional data
The predicted probability for individual \(i\) is defined as:
\[
\hat{\pi}_i = \text{logit}^{-1} \left( \hat{\beta}_0 + \sum_{x} \hat{\beta}^{x}_{\gamma_x[i]} \right)
\]
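A minimal sketch of this prediction step, assuming the fitted intercept \(\hat{\beta}_0\) and the level effects \(\hat{\beta}^{x}\) are already available as plain numbers. The covariate names (`age_group`, `sex`) and coefficient values are hypothetical; in practice they would come from the fitted multilevel model.

```python
import math

def inv_logit(eta):
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted values: intercept beta_0 and one effect per level of each covariate x
beta_0 = -0.5
level_effects = {
    "age_group": {"16-34": 0.3, "35-64": 0.0, "65+": -0.4},
    "sex": {"female": 0.1, "male": -0.1},
}

def predict_prob(person):
    """pi_i = logit^{-1}(beta_0 + sum over x of the effect of person i's level gamma_x[i])."""
    eta = beta_0 + sum(level_effects[x][level] for x, level in person.items())
    return inv_logit(eta)

p = predict_prob({"age_group": "65+", "sex": "female"})
print(round(p, 4))
```

Each individual contributes one effect per covariate, selected by the level they fall into, which is what the index \(\gamma_x[i]\) expresses.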
The predicted health literacy probability for each demographic category (cell \(c\)) is weighted by that cell's proportion of the actual Newham population
11 covariates \(\rightarrow\) 13,824 cells
Post-stratified estimate is: \[ \hat{\pi}^{\text{mrp}} = \sum_{c = 1}^{|\mathcal{S}|} \frac{N_{c} \hat{\pi}_{c}}{N} \]
\(\mathcal{S}\) is the set of all covariate combinations
\(N_c\) is the population frequency for cell \(c\)
\(N\) is the total population size
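A minimal sketch of the post-stratification step on a toy grid, assuming the cell probabilities \(\hat{\pi}_c\) have already been predicted by the model. The two covariates, their levels, the probabilities and the population counts are all hypothetical; the real analysis uses 11 covariates and Newham population counts. The grid also shows why the number of cells grows quickly: \(|\mathcal{S}|\) is the product of the number of levels of each covariate.

```python
import itertools

# Hypothetical covariates and levels; the cells are every combination of levels,
# so the number of cells |S| is the product of the level counts (3 * 2 = 6 here).
levels = {
    "age_group": ["16-34", "35-64", "65+"],
    "sex": ["female", "male"],
}
cells = list(itertools.product(*levels.values()))

# Hypothetical model-predicted probabilities pi_c and population counts N_c per cell
pi_c = dict(zip(cells, [0.70, 0.65, 0.60, 0.55, 0.40, 0.35]))
N_c = dict(zip(cells, [12000, 11000, 15000, 14000, 5000, 4000]))

def poststratify(pi_c, N_c):
    """Post-stratified estimate: sum over cells of N_c * pi_c, divided by N = sum of N_c."""
    N = sum(N_c.values())
    return sum(N_c[c] * pi_c[c] for c in N_c) / N

print(len(cells), round(poststratify(pi_c, N_c), 4))
```

The same function gives estimates for a sub-population (e.g. a single ward) if `N_c` is restricted to that sub-population's counts.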
Adopt Surface Under the Cumulative Ranking Curve (SUCRA)
Percentage of the maximum possible cumulative rank an intervention can achieve
Provides a single value, where a higher SUCRA indicates a better overall rank relative to the others: \[ \text{SUCRA}_{ij} = \sum_{r=1}^{n-1} P_{ijr} / (n-1), \]
where \(P_{ijr}\) is the cumulative probability that variable \(i\) at level \(j\) is ranked \(r\) or better, and \(n\) is the number of items being ranked
Mean rank is \[ \mathbb{E}[\text{rank}(i,j)] = n - \sum_{r=1}^{n-1} P_{ijr} \]
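A minimal sketch of how SUCRA and the mean rank could be computed, assuming posterior draws of each item's effect size are available and that rank 1 goes to the largest effect. The factor names and simulated draws are hypothetical stand-ins for draws from the fitted model.

```python
import random

def rank_probabilities(draws):
    """p[k][r] = share of draws in which item k has rank r + 1 (rank 1 = largest value)."""
    n = len(draws[0])
    counts = [[0] * n for _ in range(n)]
    for d in draws:
        order = sorted(range(n), key=lambda k: d[k], reverse=True)
        for r, k in enumerate(order):
            counts[k][r] += 1
    return [[c / len(draws) for c in row] for row in counts]

def sucra_and_mean_rank(p_k):
    """SUCRA = sum_{r=1}^{n-1} P_r / (n - 1); mean rank = n - sum_{r=1}^{n-1} P_r."""
    n = len(p_k)
    cum, total = 0.0, 0.0
    for r in range(n - 1):  # ranks 1 .. n-1
        cum += p_k[r]       # cumulative probability of rank r + 1 or better
        total += cum
    return total / (n - 1), n - total

# Hypothetical posterior draws of effect sizes for three factors
rng = random.Random(2024)
names = ["factor_a", "factor_b", "factor_c"]
draws = [[rng.gauss(m, 0.2) for m in (0.5, 0.3, 0.1)] for _ in range(4000)]

p = rank_probabilities(draws)
for name, p_k in zip(names, p):
    s, mr = sucra_and_mean_rank(p_k)
    print(f"{name}: SUCRA = {s:.2f}, mean rank = {mr:.2f}")
```

The two summaries carry the same information: the mean rank equals \(n - (n-1)\,\text{SUCRA}\), so SUCRA is simply the mean rank rescaled to lie between 0 (always last) and 1 (always first).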
The job is not done once the modelling is finished ⛔
Borrow methods from other fields
Data issues are inevitable… deal with it
Clear communication of results to SMEs and decision-makers is crucial
It’s an iterative, team effort from project inception to decision
Nathan Green | UCL | n.green@ucl.ac.uk