Accessing Restricted Data at the
University of Maryland Federal
Statistical Research Data Center
Michel Boudreaux - HLSA
Andrew Fenelon - HLSA
What is the RDC?
A consolidated statistical computing platform linking
researchers to protected datasets through federal
agencies
Run by the US Census Bureau
Participating organizations include NCHS, AHRQ, BLS,
and Census
What kind of variables can you get?
Restricted geography
State, county, ZIP, tract, etc.
Full variable distributions
No top-coding
Full count files and unedited variables
No PUMS for us!
Linking Keys
Protected Identification Keys (PIKs) for merging individual-level
records across datasets
Administrative data
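To illustrate why the absence of top-coding matters, the sketch below compares the mean of a made-up income variable with and without a top-code. The income values and the 150,000 cap are invented for illustration and do not come from any Census or NCHS file.

```python
from statistics import mean


def top_code(values, cap):
    """Replace every value above `cap` with `cap` (a simple top-code)."""
    return [min(v, cap) for v in values]


# Hypothetical incomes, not real data.
incomes = [20_000, 45_000, 60_000, 150_000, 900_000]
capped = top_code(incomes, cap=150_000)

full_mean = mean(incomes)      # mean of the full distribution
capped_mean = mean(capped)     # mean after top-coding

print("full:", full_mean, "top-coded:", capped_mean)
```

With a top-code in place the upper tail is flattened, so means and upper percentiles computed from the capped variable understate the full distribution.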
University of Maryland RDC
https://marylandrdc.umd.edu/
Our RDC is a fantastic resource: affiliates of founding
schools can use it with no seat charge
(Researchers who use the RDC at NCHS in Hyattsville
are charged $300 per day of use)
Director: Professor Liu Yang – [email protected]
Administrator: Currently Vacant (Veronika Penciakova
starting in October!)
Figure: Neighborhood Disadvantage Index (y-axis 0 to 1.4) for Public Housing and Housing Choice Vouchers, Current vs. Pseudo-waitlist
Legend: p<0.10 * p<0.05 ** p<0.01
Steps in the RDC Process
1. Proposal Application
2. Certification (Special Sworn Status)
3. Data Merging and Upload (NCHS Only)
4. Analysis
5. Disclosure
Example Health Datasets in the RDC
Examples of Census Data
Demographic
Decennial, ACS, CPS, AHS, etc.
Economic
Economic Census, LEHD, Annual Survey of Manufactures, etc.
Administrative Data
IRS, Public Programs, SSA
Public information on the full holdings is hard to find, but they have a
lot. Assume they have what you want.
IRS data are usually very hard to get; it is best to have an inside
collaborator
Proprietary Data
In theory, Census will assign PIKs to any dataset you send them, which
allows it to be merged with any of these files. There is no extra charge
for this service.
RDC Application
NCHS
https://www.cdc.gov/rdc/b3prosal/PP300.htm
Census
https://www.census.gov/ces/rdcresearch/howtoapply.html
AHRQ
https://meps.ahrq.gov/mepsweb/data_stats/onsite_datacenter.jsp
BLS
https://www.bls.gov/rda/
NCHS Application
A. Abstract
B. Research Question
C. Background
D. Public Health Benefit
E. Data Requirements (data you need, years of surveys,
restricted variables, additional data merged)
F. Methodology (what you’ll be doing)
G. Output (Table Shells and presentation of results)
H. Data Dictionary (list of all public and restricted variables
you will use)
Census Application
Generally starts with a ‘pre-proposal’ that the local RDC
admin will give you feedback on
Provide benefit to Census
Hardest part, but RDC Admin will help you figure this out.
Demonstrate Scientific Merit
Requires non-public data
Be feasible
Pose no disclosure risk
Special Sworn Status
To use the RDC, you become an “employee” of the
Census Bureau, and you have to go through the
background check process to obtain that status
The whole process can take several months, so start early
(but the status lasts for life!)
In addition to filling out a ton of forms, you also have to
Submit fingerprint cards
Submit a passport photo
Be interviewed by the OPM Federal Investigator
List of SSS documents
Data Merging for NCHS Data
You provide your public use data, then the analyst merges
it with the restricted variables
Flow: You (public-use data) -> Your analyst (merges with restricted variables, removes identifying variables) -> Places file in RDC
$$$$
Data Merging for Census
You send any external data (not Census data) and then
you are on your own, which is good and bad.
You will be given access to the internal files that you have
permission to use. It is then up to you to merge them
Greater control, but requires that you have a deeper
understanding of the data
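Since the merge is left to the researcher, a minimal sketch of a left join on a linking key may help. It uses plain Python records rather than any RDC tool; the field names (`PIK`, `self_rated_health`, `program`), the key values, and the helper `left_merge_on_key` are all hypothetical, and the sketch assumes the key is unique in the right-hand file.

```python
def left_merge_on_key(left, right, key="PIK"):
    """Left-join two lists of dict records on `key`.

    Assumes `key` is unique in `right`. Every left record is kept and
    gets a `_matched` flag so the match rate can be checked afterwards.
    """
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key])
        out = dict(row)
        if match is not None:
            # Copy over the right-hand fields, skipping the key itself.
            out.update({k: v for k, v in match.items() if k != key})
        out["_matched"] = match is not None
        merged.append(out)
    return merged


# Hypothetical survey and administrative records, not real data.
survey = [
    {"PIK": "A1", "self_rated_health": "fair"},
    {"PIK": "B2", "self_rated_health": "good"},
]
admin = [{"PIK": "A1", "program": "voucher"}]

result = left_merge_on_key(survey, admin)
match_rate = sum(r["_matched"] for r in result) / len(result)
print(result, match_rate)
```

Keeping the unmatched records and reporting the match rate makes it easier to notice when a merge silently drops part of the sample.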
Analysis
Data analysis can be done with a variety of statistical
packages including SAS, Stata, and R via Unix
Stata at least is interactive. Not true for SAS.
You can have your analyst upload program code to the
RDC
Unfortunately you have to leave the RDC to consult any
online statistical programming help
Disclosure
Unfortunately, you can’t just take your results with you
when you leave. Disclosing results involves several steps
and differs by agency and analyst.
1. Run analysis and place .log file in ‘disclosure’ folder
2. Send analyst a disclosure request along with path of
folder
3. Wait
4. Receive results by email as long as the analyst deems them
releasable (this can be annoying and frustrating)
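Step 1 above can be sketched as a small helper that copies .log files into a ‘disclosure’ folder and returns the path to send to the analyst. This is only an illustration run against a temporary directory; the function name, the file name `model1.log`, and the folder layout are assumptions, and actual RDC procedures differ by agency and analyst.

```python
import shutil
import tempfile
from pathlib import Path


def stage_logs_for_disclosure(project_dir, folder_name="disclosure"):
    """Copy every .log file in `project_dir` into a disclosure subfolder.

    Returns the folder path (to include in the disclosure request)
    and the list of staged files.
    """
    project = Path(project_dir)
    target = project / folder_name
    target.mkdir(exist_ok=True)
    staged = []
    for log in sorted(project.glob("*.log")):
        shutil.copy2(log, target / log.name)
        staged.append(target / log.name)
    return target, staged


# Demonstration in a throwaway directory with a made-up log file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model1.log").write_text("regression output\n")
    folder, files = stage_logs_for_disclosure(tmp)
    staged_names = [f.name for f in files]
    disclosure_folder_name = folder.name
    print("Request path:", folder)
```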
Examples – Our Current Projects
HUD housing assistance and child health and well-being
(Andy, Michel, and Natalie)
NHIS/NHANES data linked with HUD records and geographic info
NHIS and NHANES merged with census tract and zip
code data (Quynh)
American Housing Survey and census tract data (Andy)
School-based health centers, school screening mandates,
and Medicaid eligibility: Effects on health and health care
access in the NHIS (Michel)
Adult Fair/Poor Health by Housing Assistance
Figure: y-axis 0% to 40%; groups are No Housing Assistance, Public Housing, Housing Choice Vouchers, and Multifamily Housing
Legend: p<0.10 * p<0.05 ** p<0.01
Source: Fenelon et al. (2017) AJPH
Adult Public Housing Current vs. Future
Figure: y-axis 30% to 44%; groups are Current and Future
Source: Fenelon et al. (2017) AJPH
Child Mental Health by Housing Assistance
Figure: y-axis 1.0 to 3.0; groups are Current Assistance and No Assistance
Legend: p<0.10 * p<0.05 ** p<0.01
Source: Fenelon et al. (2018) JHSB
Health days of school missed
Figure: y-axis 0 to 8; groups are All Programs, Housing Choice Vouchers, Public Housing, and Multifamily Housing
Campus Contacts
Executive Director: Liu Yang [email protected]
SPH Rep: Michel Boudreaux [email protected]
BSOS Rep: Andres Villareal [email protected]
Local Health Data users:
Andy Fenelon [email protected]
Quynh Nguyen [email protected]
Natalie Slopen [email protected]
Interns:
Seth Murray [email protected]
Claire Hou [email protected]