The Fragile Families survey documentation can be confusing. We’ve put together this blog post to help you find out what the variables in the Challenge data file mean.
Using the Fragile Families website
The first place to go to find out what a given variable represents is the Fragile Families and Child Wellbeing Study website: http://www.fragilefamilies.princeton.edu/
Once there, click the “Data and Documentation” tab.
This brings you to the main documentation for the full study. On the left, you will see a set of links that will take you to the documentation for particular waves of the data.
As an example, clicking the link for Year 9 (Wave 5) brings up the following page of documentation for this survey.
Let’s look at the mother questionnaire and codebook. On page 5 of the questionnaire, you will see the following question:
In the corresponding codebook, we see the count of respondents who gave each answer:
Two things are worth noting here.
- The question referred to in the questionnaire as A3B is called m5a3b in the codebook. The prefix “m5” indicates that this question comes from the mother’s wave 5 interview.
- Lots of people were coded -6 for “Skip.” Looking back at the questionnaire, we can see why they skipped this question: it was only asked when “PCG = NONPARENT AND RELATIONSHIP = FOSTER CARE.” For children not in foster care, the question would not be meaningful, so it wasn’t asked.
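If you work with the Challenge data in Python, a quick way to see how much of a variable is skip-coded is to tally its response codes. The sketch below is a minimal example, not an official tool: it assumes the rows come from something like `csv.DictReader`, with variable names as keys, and the toy rows are made up for illustration. Only -6 is treated as “Skip,” and non-numeric cells such as “NA” are counted separately as `None`.

```python
from collections import Counter

SKIP = -6  # the "Skip" code described above

def to_code(value):
    """Convert a raw cell like '-6' or '2.0' to an int; None if non-numeric."""
    try:
        return int(float(value))
    except ValueError:
        return None  # e.g. "NA" or an empty cell

def tally(rows, var):
    """Count how often each response code appears for variable `var`."""
    return Counter(to_code(row[var]) for row in rows)

def share_skipped(rows, var):
    """Fraction of all rows (including non-numeric ones) coded -6 for `var`."""
    counts = tally(rows, var)
    total = sum(counts.values())
    return counts[SKIP] / total if total else 0.0

# Toy rows standing in for csv.DictReader output (hypothetical values).
rows = [{"m5a3b": "-6"}, {"m5a3b": "-6"}, {"m5a3b": "1"}, {"m5a3b": "NA"}]
print(tally(rows, "m5a3b"))
print(share_skipped(rows, "m5a3b"))  # 0.5
```

A large share of -6 codes is usually a sign to go back to the questionnaire and read the skip logic for that question.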
In general, the questionnaires are the best source for information about why certain respondents were skipped past particular questions. For more information on all the ways data can be missing, see our blog post on missing data.
Structure of the variable names
The general structure of the variable names is [prefix for questionnaire type][wave number][question number].
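To make that structure concrete, here is a minimal sketch that splits a variable name into its three parts. It assumes the prefix is letters only, the wave is a single digit, and the question number starts with a letter; the pattern and the `parse_var` name are our own, and real variable names may include exceptions this regular expression does not cover.

```python
import re
from typing import NamedTuple

# Assumed pattern: letters (questionnaire prefix), one digit (wave),
# then the question number, which starts with a letter.
VAR_PATTERN = re.compile(r"^([a-z]+)(\d)([a-z]\w*)$")

class VarName(NamedTuple):
    prefix: str
    wave: int
    question: str

def parse_var(name):
    """Split e.g. 'm5a3b' into VarName('m', 5, 'a3b'); None if no match."""
    m = VAR_PATTERN.match(name.lower())
    if m is None:
        return None
    prefix, wave, question = m.groups()
    return VarName(prefix, int(wave), question)

print(parse_var("m5a3b"))   # VarName(prefix='m', wave=5, question='a3b')
print(parse_var("f5f23d"))  # VarName(prefix='f', wave=5, question='f23d')
```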
What are all the variable prefixes?
The most common prefixes are:
Constructed variables: An additional prefix
Some variables have been constructed from responses to several questions. These are often variables that are particularly relevant to the models many researchers want to estimate. These variables have an additional prefix, c, at the front of the variable name. For instance, cm1ethrace is the constructed race/ethnicity of the mother, from wave 1.
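If you want to flag constructed variables in code, one option is to check for the extra c in front of an otherwise ordinary name. This is a sketch under an assumption of ours: that a constructed name is c followed by letters, a wave digit, and the rest, as in cm1ethrace. Requiring letters between the c and the digit keeps the check from firing on a name whose regular prefix might itself be c, but you should still confirm against the codebook.

```python
import re

# Assumption: constructed = 'c' + ordinary name (letters, wave digit, rest).
CONSTRUCTED = re.compile(r"^c([a-z]+\d\w*)$")

def split_constructed(name):
    """Return (is_constructed, underlying_name) for a variable name."""
    m = CONSTRUCTED.match(name.lower())
    if m:
        return True, m.group(1)
    return False, name.lower()

print(split_constructed("cm1ethrace"))  # (True, 'm1ethrace')
print(split_constructed("m5a3b"))       # (False, 'm5a3b')
```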
What are the wave numbers?
It’s easy to talk about the questionnaires by the rough child ages at which they were conducted. This is how the documentation website is organized. However, the variable names always refer to wave numbers, not child ages. It’s important not to get confused on this point. The table below summarizes the mapping between wave numbers and approximate child ages.
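When working in code, a small lookup table helps keep wave numbers and ages straight. The mapping below is our sketch of the study schedule (Wave 1 around birth, then follow-ups at roughly ages 1, 3, 5, 9, and 15, so Year 9 is Wave 5); please check it against the documentation website before relying on it. The helper `wave_of` is hypothetical and assumes the wave is the first digit in the name.

```python
# Wave number -> approximate child age in years (0 = baseline, around birth).
WAVE_TO_AGE = {1: 0, 2: 1, 3: 3, 4: 5, 5: 9, 6: 15}
AGE_TO_WAVE = {age: wave for wave, age in WAVE_TO_AGE.items()}

def wave_of(var_name):
    """First digit in a name like 'm5a3b' or 'cm1ethrace' (assumed to be the wave)."""
    for ch in var_name:
        if ch.isdigit():
            return int(ch)
    raise ValueError(f"no wave digit in {var_name!r}")

print(WAVE_TO_AGE[wave_of("m5a3b")])  # 9
print(AGE_TO_WAVE[15])                # 6
```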
What are the question numbers?
Question numbers typically begin with a letter followed by a number, e.g. a3.
- In questionnaires, questions are referred to by question number alone.
- In codebooks, questions are referred to by a prefix and then a question number.
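Going from the questionnaire to the codebook is then mostly a matter of adding the prefix and wave number and switching to lowercase, as in the A3B to m5a3b example above. Here is a minimal sketch of both directions; it assumes questionnaire numbers are written in uppercase and contain only letters and digits. Constructed variables have no questionnaire number, so `questionnaire_number` returns nothing meaningful for them.

```python
import re

def codebook_name(prefix, wave, question_number):
    """Questionnaire number -> codebook name: ('m', 5, 'A3B') -> 'm5a3b'."""
    return f"{prefix.lower()}{wave}{question_number.lower()}"

def questionnaire_number(var_name):
    """Codebook name -> questionnaire number: 'm5a3b' -> 'A3B'.

    Assumes the name is letters + one wave digit + question number.
    """
    m = re.match(r"^[a-z]+\d(.+)$", var_name.lower())
    return m.group(1).upper() if m else None

print(codebook_name("m", 5, "A3B"))    # m5a3b
print(questionnaire_number("f5f23d"))  # F23D
```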
How do I find a question I care about?
Often you will be looking for a particular question. For instance, when modeling eviction or material hardship at age 15, you might want to include the same measures collected at age 9. If you search (Ctrl+F or Cmd+F) for “evicted” in the age 9 mother or father codebook or questionnaire, you will find these variables: m5f23d and f5f23d.
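Once you have found one version of a measure, say m5f23d, you can list the other respondents’ versions of the same question from the column names in the data file. This is a sketch under the assumption that parallel questions share a wave and question number across questionnaires, as m5f23d and f5f23d do here. A matching number is a lead, not a guarantee, so confirm each match in the codebook. The toy column list is illustrative only.

```python
import re

# Assumed name pattern: prefix letters, one wave digit, question number.
NAME = re.compile(r"^([a-z]+)(\d)([a-z]\w*)$")

def counterparts(var, columns):
    """Other columns with the same wave and question number as `var`,
    but a different questionnaire prefix (e.g. m5f23d -> f5f23d)."""
    m = NAME.match(var)
    if m is None:
        return []
    _, wave, question = m.groups()
    found = []
    for col in columns:
        c = NAME.match(col)
        if c and c.group(2) == wave and c.group(3) == question and col != var:
            found.append(col)
    return found

columns = ["m5f23d", "f5f23d", "m5a3b", "cm1ethrace"]  # toy header row
print(counterparts("m5f23d", columns))  # ['f5f23d']
```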