Our Blog

Our Blog

Machine-Readable Fragile Families Codebook

Uncategorized No comments

The Fragile Families and Child Wellbeing study has been running for more than 15 years. As such, it has produced an incredibly rich and complex set of documentation and codebooks. Much of this documentation was designed to be “human readable,” but, over the course of the Fragile Families Challenge, we have had several requests for a more “machine-readable” version of the documentation. Therefore, we are happy to announce that Greg Gundersen, a student in Princeton’s COS 424 (Barbara Engelhardt’s undergraduate machine learning class), has created a machine-readable version of the Fragile Families codebook in the form of a web API. We believe that this new form of documentation will make it possible for researchers to work with the data in unexpected and exciting ways.

There are three ways that you can interact with the documentation through this API.

First, you can search for words inside of question description field. For example, imagine that you are looking for all the questions that include the word “evicted”. You can find them by visiting this URL:
https://codalab.fragilefamilieschallenge.org/f/api/codebook/?q=evicted

Just put your search term after the “q” in the above URL.

The second main way that you can interact with the new documentation is by looking up the question data associated with a variable name. For example, want to know what is “cm2relf”? Just visit:
https://codalab.fragilefamilieschallenge.org/f/api/codebook/cm2relf

Finally, if you just want all of the questionnaire data, visit this URL:
https://codalab.fragilefamilieschallenge.org/f/api/codebook/

A main benefit of a web API is that researchers can now interact with the codebooks programmatically through URLs. For example, here is a snippet of Python 2 code that fetches the data for question “cm2mint'”:

>>> import urllib2
>>> import json
>>> response = urllib2.urlopen('https://codalab.fragilefamilieschallenge.org/f/api/codebook/cm2mint')
>>> data = json.load(response)
>>> data
[{u'source file': u'http://fragilefamilies.princeton.edu/sites/fragilefamilies/files/ff_mom_cb1.txt', u'code': u'cm2mint', u'description': u'Constructed - Was mother interviewed at 1-year follow-up?', u'missing': u'0/4898', u'label': u'YESNO8_mw2', u'range': [0, 1], u'unique values': 2, u'units': u'1', u'type': u'numeric (byte)'}]

We are very grateful to Greg for creating this new documentation and sharing it with everyone.

Notes:

  • Greg has open sourced all his code, so you can help us improve the codebook. For example, someone could write a nice front-end so that you can do more than just interact via the url.
  • The machine-readable documentation should include the following fields: description, source file, type, label, range, units, unique values, missing. If you develop code that can parse some of the missing fields, please let us know, and we can integrate your work into API.
  • The machine-readable documentation includes all the documentation that was in text files (e.g., http://fragilefamilies.princeton.edu/sites/fragilefamilies/files/ff_dad_cb5.txt). It does not include documentation that was in pdf format (e.g., http://fragilefamilies.princeton.edu/sites/fragilefamilies/files/ff_hv_cb5.pdf).
  • When you visit these urls, what gets returned is in JSON format, and different browsers render this JSON differently.
  • If there is a discrepancy between the machine-readable codebook and the traditional codebook, please let us know.
  • To deploy this service we used Flask, which is an open source project. Thank you to the Flask community.

About Matt Salganik

Matthew Salganik is a Professor of Sociology at Princeton University. He is also the author of Bit by Bit: Social Research in the Digital Age (http://www.bitbybitbook.com). You can learn more about his research at http://www.princeton.edu/~mjs3.

Add your comment