Metadata about variables
We are happy to announce that Challenge participant Connor Gilroy, a Ph.D. student in Sociology at the University of Washington, has created a new resource that should make working with the Challenge data more efficient. Specifically, he created a CSV file that labels each variable in the Challenge data file as categorical, continuous, or unknown. Connor has also open sourced the code he used to create the CSV file. We’ve had many requests for such a file, and Connor is happy to share his work with everyone! If you want to check or improve Connor’s work, please consult the official Fragile Families and Child Wellbeing Study documentation.
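A file like this is most useful for splitting variables by type before modeling, for example to one-hot encode the categorical ones and scale the continuous ones. The sketch below shows one way to do that in Python. It is only an illustration: the column names `variable` and `type` and the example rows are hypothetical, so check the actual header of Connor’s file and adjust them.

```python
import csv
import io
from collections import defaultdict


def group_variables_by_type(lines):
    """Group variable names by their type label.

    Hypothetical assumption: the metadata CSV has a header row with a
    'variable' column (the variable name) and a 'type' column whose value
    is 'categorical', 'continuous', or 'unknown'.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        # Normalize the label so stray spaces or capitals don't split a group.
        groups[row["type"].strip().lower()].append(row["variable"])
    return dict(groups)


# Made-up rows for illustration only; these are not real Challenge variables.
example = """variable,type
var_a,categorical
var_b,continuous
var_c,unknown
var_d,categorical
"""

groups = group_variables_by_type(io.StringIO(example))
print(groups["categorical"])  # ['var_a', 'var_d']
```

With the real file, you would pass an open file handle instead, e.g. `with open(path, newline="") as f: groups = group_variables_by_type(f)`, where `path` is wherever you saved the CSV.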
Connor’s resource is part of a Challenge tradition of participants open sourcing resources that make the Challenge easier for others. Other resources include:
- Greg Gundersen’s parsed and machine-readable codebooks
- Aarshay Jain, Bindia Kalra, and Keerti Agrawal’s constructed data dictionary
- Jeremy Freese’s helper Stata code
- Steve McKay’s helper R code
- Open sourced submissions from students in Princeton’s COS 424 (Interacting with Data)
- Dawn Koffman’s codebook support
If you have something that you’d like to open source, please let us know.
Finally, Connor’s work was part of a larger team project at the Summer Institute in Computational Social Science to build a full data processing pipeline for the Fragile Families Challenge. Stay tuned for a blog post about that project on Tuesday, July 18!