What would happen if hundreds of social scientists and data scientists worked together on a scientific challenge to improve the lives of disadvantaged children in the United States?
Overview
The Fragile Families Challenge is based on the Fragile Families and Child Wellbeing Study, which has followed thousands of American families for more than 15 years. During this time, the Fragile Families study collected information about the children, their parents, their schools, and their larger environments. These data have been used in hundreds of scientific papers and dozens of dissertations, and insights from these studies are routinely shared with policy makers around the world through the Future of Children, which is jointly published by Princeton University and the Brookings Institution.
Participants were challenged to use these data in a new way. Given all the background data from birth to year 9 and some training data from year 15, how well could participants infer six key outcomes in the year 15 test data?
This predictive modeling is not the end of the project, however. It is just the beginning. We will use the models submitted to the Fragile Families Challenge to advance the scientific goals of the project, and we will publish the results in scientific journals, both individually and collectively.
The Fragile Families Challenge is open to everyone, no matter where you live or what you do. In fact, we’re confident that some of the best ideas will come from unexpected places.
How to participate
The window for participation has now closed.
Apply to participate
Participants applied to participate, provided informed consent, signed a data protection agreement, and then downloaded our data files for the Fragile Families Challenge. These files included information about each family from birth to age 9 and some training data from age 15.
Build a model
Participants used the Fragile Families Challenge data and creative modeling strategies to infer six key outcomes at age 15. They could use whatever modeling strategy they thought would work best. Models were evaluated by mean squared error in a holdout set kept private until the end of the Challenge.
Upload your contribution
Each participant prepared a package that included their predictions, code, and a narrative explanation of their approach. They uploaded their contributions and saw their scores on the leaderboard. Participants could watch their score improve as they developed and uploaded new approaches!
Now that the Challenge has ended, submissions are being released open source in order to advance the scientific goals. The submission deadline was Aug. 1, 2017, 2pm EDT.
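Mean squared error, the evaluation metric named above, is the average squared gap between predicted and observed outcomes, so lower scores are better. The following is a minimal sketch of that calculation and not the organizers' scoring code: the function name, the GPA values, and the two example submissions are all made up for illustration.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical holdout GPAs (illustrative numbers, not Challenge data).
holdout_gpa = [3.0, 2.5, 3.5, 2.0]

# Two hypothetical submissions: a constant prediction and a fitted model.
baseline = [2.75, 2.75, 2.75, 2.75]
submission = [3.0, 2.75, 3.25, 2.25]

baseline_mse = mean_squared_error(holdout_gpa, baseline)
submission_mse = mean_squared_error(holdout_gpa, submission)

print(f"baseline MSE:   {baseline_mse}")
print(f"submission MSE: {submission_mse}")
```

In this toy example the constant prediction scores 0.3125 and the second submission scores 0.046875, so the second would rank higher on a leaderboard ordered by mean squared error.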
Why participate?
Help the world: The Fragile Families Challenge is designed to produce scientific knowledge that can be used to improve the lives of disadvantaged children in the United States. Even more than that, we hope the Fragile Families Challenge can serve as a model for how social scientists and data scientists can collaborate on problems of societal importance.
Learn new skills: The Fragile Families Challenge blends ideas from social science and data science. Maybe you’re a data scientist who wants to start working with social data? Maybe you’re a social scientist who wants to learn more about machine learning? Either way, the Fragile Families Challenge is for you. This blending of ideas also makes the Fragile Families Challenge ideal to assign in a class that you are teaching.
Get involved in scientific research: The Fragile Families Challenge is real scientific research. While working on the project, participants have a chance to interact with the other participating scientists and the distinguished researchers on our Board of Advisors.
Win prizes: We awarded prizes to participants who made important contributions to the project. All prize winners were given an all-expenses-paid trip to Princeton University for a scientific workshop.
Have fun: The Fragile Families Challenge could be worked on in teams, and we hoped that participants would enjoy working with data, learning new skills, and cooperating and competing with people from all over the world.
Publish papers: We are publishing the results of the Fragile Families Challenge in scientific journals, both individually and collectively. Participants who made important contributions had the opportunity to be a co-author on the paper describing the results of the Fragile Families Challenge.
Scientific goals
The Fragile Families Challenge is our attempt to create a new way of doing social research, one that is much more open to the talents and efforts of everyone. We expect that by combining ideas from social science and data science, we can, together, help address important scientific and social problems. And we expect that through a mass collaboration we will accomplish things that none of us could accomplish individually.
The Fragile Families Challenge involves two steps. In the first step, described above and now complete, participants built statistical and machine learning models of several important outcomes in the lives of the children. Participants then submitted their code, their model outputs, and a narrative explanation of their modeling strategy. We used the unreleased test set to evaluate each model. This first step is an example of the common task method, which David Donoho (2015) has called the “secret sauce” of machine learning.
In the second step, we will use the individual models and the community model to conduct substantive and methodological research. Because all of the materials created in the first stage will be released open source, we hope that others will dream up other cool things to do with them. The three projects below are just some examples of the kinds of research that can be done with the predictions, code, and narratives created in the first stage of the Fragile Families Challenge.
The community model can be used to identify and help us learn from children who are “beating the odds.” For example, consider children who the community model predicts to have a low grade point average and who actually have a high grade point average. By conducting qualitative, in-depth interviews with these children and their caregivers—as well as children who are struggling—we can help discover previously unmeasured and important factors impacting the lives of children. The newly discovered factors can then be collected in future waves of data collection for the Fragile Families study. This goal is discussed in greater detail in our blog post on the topic.
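One way to picture the "beating the odds" idea is as a comparison of each child's observed outcome with the model's prediction. The sketch below is only an illustration under stated assumptions: the `Child` record, the `beating_the_odds` function, the 1.0 GPA-point margin, and the data are all hypothetical, and the source does not specify how such children will actually be selected.

```python
from typing import List, NamedTuple


class Child(NamedTuple):
    child_id: str          # hypothetical identifier
    predicted_gpa: float   # prediction from a (hypothetical) community model
    actual_gpa: float      # observed outcome


def beating_the_odds(children: List[Child], margin: float = 1.0) -> List[Child]:
    """Return children whose actual GPA exceeds the prediction by at least
    `margin`, ordered from the largest gap to the smallest."""
    flagged = [c for c in children if c.actual_gpa - c.predicted_gpa >= margin]
    return sorted(flagged, key=lambda c: c.actual_gpa - c.predicted_gpa, reverse=True)


# Made-up records for illustration only.
children = [
    Child("A", predicted_gpa=2.0, actual_gpa=3.5),
    Child("B", predicted_gpa=3.0, actual_gpa=3.25),
    Child("C", predicted_gpa=2.5, actual_gpa=1.5),
    Child("D", predicted_gpa=2.25, actual_gpa=3.25),
]

candidates = beating_the_odds(children)
print([c.child_id for c in candidates])
```

The margin is a judgment call: a larger value yields fewer candidates with bigger gaps between prediction and outcome. Flipping the sign of the comparison would flag children who are doing worse than predicted.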
There are many issues that are potential targets for policy intervention in efforts to improve the lives of children. However, before actually intervening in the lives of children—either through randomized controlled trials or large-scale policy changes—it is important to make the best possible estimates using existing non-experimental data. For example, eviction is a natural target for policy intervention, but it is challenging to estimate the causal impact of eviction on children. We will use the community model to produce propensity scores for eviction. These propensity scores can then be used to estimate the effect of eviction on all outcomes that are measured in future waves of the Fragile Families study. Estimates of causal effects based on propensity scores are by no means perfect—they depend on strong and untestable assumptions—but when combined with sensitivity analysis they can provide useful estimates that can help inform the design of future randomized controlled trials. Further, through targeted in-depth interviews, we can assess the plausibility of these assumptions in this context. For more details on the causal inference goal of the Fragile Families Challenge, read our blog post on the topic.
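The paragraph above does not say which propensity-score estimator will be used. As one common option, the sketch below assumes inverse probability weighting in its normalized form, where each child is weighted by the inverse of the estimated probability of the treatment status they actually experienced. The function name, variable names, and numbers are hypothetical, and the propensity scores are simply given rather than estimated by a model.

```python
from typing import Sequence


def ipw_effect(
    treated: Sequence[int],
    propensity: Sequence[float],
    outcome: Sequence[float],
) -> float:
    """Normalized inverse-probability-weighted difference in mean outcomes
    between treated (1) and untreated (0) units."""
    if not (len(treated) == len(propensity) == len(outcome)):
        raise ValueError("all inputs must have the same length")
    if any(p <= 0.0 or p >= 1.0 for p in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    # Treated units are weighted by 1/p, untreated units by 1/(1 - p).
    treated_w = [t / p for t, p in zip(treated, propensity)]
    control_w = [(1 - t) / (1 - p) for t, p in zip(treated, propensity)]
    if sum(treated_w) == 0 or sum(control_w) == 0:
        raise ValueError("need at least one treated and one untreated unit")

    treated_mean = sum(w * y for w, y in zip(treated_w, outcome)) / sum(treated_w)
    control_mean = sum(w * y for w, y in zip(control_w, outcome)) / sum(control_w)
    return treated_mean - control_mean


# Made-up data: eviction indicator, assumed propensity scores, later outcome.
evicted = [1, 1, 0, 0]
propensity = [0.5, 0.25, 0.5, 0.75]
later_outcome = [2.0, 3.0, 3.0, 3.5]

effect = ipw_effect(evicted, propensity, later_outcome)
print(f"estimated effect: {effect:.3f}")
```

An estimate like this inherits the strong and untestable assumptions noted above, in particular that the propensity model captures every factor driving both eviction and the outcome, which is why it would be paired with sensitivity analysis.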
The dominant modeling strategies in the social sciences involve variations of the generalized linear model. However, social scientists are becoming increasingly interested in modeling approaches emerging from machine learning. Breiman (2001) characterizes these as two different cultures of modeling: one that focuses on informativeness and one that focuses on predictive performance. During the Fragile Families Challenge, researchers will use a variety of different modeling approaches, and we plan to explicitly compare these strategies in terms of their informativeness and predictive performance in order to assess the trade-offs between these two styles of modeling in a specific empirical context. It is our hope that this comparison will lead to insights about which ideas from machine learning can be fruitfully applied to social science problems where there are thousands—rather than millions—of observations.
Prizes
In order to recognize important contributions to the Fragile Families Challenge, we awarded a series of prizes. Anyone who won one of these prizes was offered an all-expenses-paid trip to the concluding workshop of the Fragile Families Challenge, which took place at Princeton University after the end of the Challenge. Beyond prizes, however, we hoped the main reason for participation would be because you are excited by the scientific and policy goals of the project.
May 1, 2017
May 10, 2017 (six awards)
Resources
Here are some specific materials we thought would be helpful to participants:
We held weekly office hours via Google Hangout to answer questions about the data.
Here are some more general materials that you might find helpful to provide context about the project:
If there is something that you need, please let us know (fragilefamilieschallenge@gmail.com).
Events
Upcoming events
None currently scheduled.
Past events
Thursday and Friday, November 16-17, Scientific Workshop
Princeton University (event info)
All are invited to a scientific workshop recapping the first stage of the Fragile Families Challenge. Prize winners will be offered an all-expenses-paid trip, but all are welcome. Those who cannot attend in person are invited to join by livestream (livestream link, no registration required). Videos from all talks at the workshop are now available.
Tuesday, October 17, The Fragile Families Challenge: What happened and what’s next
Princeton University
Louis A. Simpson International Bldg., Room 271 (event info)
1:30pm – 2:30pm
Friday, October 6, Combining Survey Social Science with Data Science Methods: Fragile Families Challenge and Beyond
University of Michigan
ISR-Thompson 1430 (event info)
3pm – 4pm
Sunday, August 13, Gathering at the American Sociological Association Annual Meeting in Montreal
Fragile Families and Child Wellbeing Study Booth, Exhibit Hall, Palais des Congrès de Montréal in Montréal, Québec (event info) (conference info)
2pm
June 23, Getting Started Workshop at Princeton with Livestream (event page)
Princeton University, Julis Romo Rabinowitz Building (Room 399). Livestream information will be posted here.
10:30am – 4pm
Co-sponsored by the Summer Institute in Computational Social Science.
June 2, Getting Started Workshop at UCLA
(slides from event) (video from event)
CCPR Seminar Room
4240 Public Affairs Building
12pm – 4pm
If you would like to participate, mention the UCLA workshop when you apply to participate!
Co-sponsored by the California Center for Population Research and the Center for Social Statistics.
April 27, Getting Started Workshop at PAA (slides from event)
Annual Meeting of the Population Association of America
Hilton Chicago, Conference Room 4G, 10am – 2pm
April 6, Getting Started Workshop at Indiana University (slides from event)
Social Science Research Commons, 3pm – 7pm
March 28, Getting Started Workshop at Princeton University
Visit to Sociology 503, open to everyone
190 Wallace Hall, 2:30pm – 5pm
Publishing
The Fragile Families Challenge is a scientific project, and we are publishing the results.
Opportunity 1: We are publishing a single paper presenting the design and results from the Fragile Families Challenge. Everyone who made a submission that met a set of basic criteria was invited, but not required, to be a co-author on this paper. There was no limit on the possible number of participants who would qualify as co-authors.
Opportunity 2: Separately, the journal Socius will publish a special issue on the Fragile Families Challenge. All participants in the Challenge were invited to submit a manuscript. Papers submitted to the special issue of the journal were peer-reviewed. For more details, see the call for papers.
About
The Fragile Families Challenge is physically housed in the Bendheim-Thoman Center for Research on Child Wellbeing at Princeton University. It is being organized by Matthew Salganik, Ian Lundberg, Alex Kindel, and Sara McLanahan.
The project is overseen and guided by a Board of Advisors:
We have received valuable web development assistance from Luke Baker and Paul Yuen of Agathon Group and Eric Carmichael of CK Collab. We have received valuable research assistance from Cathy Chen and Boriana Pratt. We received wonderful feedback on an early version of this project at a workshop on “Solution-Oriented Social Science” organized by Duncan Watts and Victoria Stodden as part of the Social Science Research Council working group on Digital Social Science.
All participants in the Fragile Families and Child Wellbeing Study have consented to have their data used for social research. These procedures, as well as procedures to make de-identified data available to researchers, have been reviewed and approved by the Institutional Review Board of Princeton University (#5767). The procedures for the Fragile Families Challenge have been reviewed and approved by the Institutional Review Board of Princeton University (#8061). In addition, we have taken further steps to protect the participants in the Fragile Families and Child Wellbeing Study. If you would like to know more, please send us an email.
This project relies on open source software, and we are particularly grateful to the communities behind the following projects:
The Fragile Families Challenge is supported by a grant from the Russell Sage Foundation.