What would happen if hundreds of social scientists and data scientists worked together on a scientific challenge to improve the lives of disadvantaged children in the United States?



Overview

The Fragile Families Challenge is a mass collaboration that will combine predictive modeling, causal inference, and in-depth interviews to yield insights that can improve the lives of disadvantaged children in the United States.  By working together we can discover things that none of us can discover individually.

The Fragile Families Challenge is based on the Fragile Families and Child Wellbeing Study, which has followed thousands of American families for more than 15 years.  During this time, the Fragile Families study collected information about the children, their parents, their schools, and their larger environments.

These data have been used in hundreds of scientific papers and dozens of dissertations, and insights from these studies are routinely shared with policy makers around the world through the Future of Children, which is jointly published by Princeton University and the Brookings Institution. Your challenge is to use these data in a new way.  Given all the background data from birth to year 9 and some training data from year 15, how well can you infer six key outcomes in the year 15 test data?

Schematic of the Fragile Families Challenge. Participants will use the background data from birth to year 9 and some training data from year 15 to make inferences about six key outcomes in the year 15 test data.

This predictive modeling is not the end of the project, however.  It is just the beginning.  We will use the models submitted to the Fragile Families Challenge to advance the scientific goals of the project, and we will publish the results in scientific journals, both individually and collectively.

The Fragile Families Challenge is open to everyone, no matter where you live or what you do.  In fact, we’re confident that some of the best ideas will come from unexpected places.


How to participate

Apply to participate

Apply to participate, sign a data protection agreement, and then download our data files for the Fragile Families Challenge. These files include information about each family from birth to age 9 and some training data from age 15.

Build a model

Use the Fragile Families Challenge data and a creative modeling strategy to infer six key outcomes at age 15. You can use whatever modeling strategy you think will work best. Models will be evaluated by mean squared error in a holdout set kept private until the end of the Challenge.
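
As a minimal sketch of how mean squared error works, assuming predictions and held-out true values are plain lists of numbers: the function name `mean_squared_error` and the example values below are illustrative only and are not the Challenge's actual scoring code.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical grade point averages for three children in a holdout set.
holdout_gpa = [3.0, 2.5, 3.5]
predicted_gpa = [2.5, 2.5, 4.0]
score = mean_squared_error(holdout_gpa, predicted_gpa)
```

A lower score means the predictions are closer to the held-out values, and large misses are penalized more heavily than small ones because each error is squared.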

Upload your contribution

Prepare a package that includes your predictions, your code, and a narrative explanation of your approach. Upload your contribution, and see your score on the leaderboard. Watch your score improve as you develop and upload new approaches! At the end of the Challenge, submissions will be released open source in order to advance the scientific goals. Deadline: Aug. 1, 2017


Why participate?

  • Help the world:

    The Fragile Families Challenge is designed to produce scientific knowledge that can be used to improve the lives of disadvantaged children in the United States. Even more than that, we hope the Fragile Families Challenge can serve as a model for how social scientists and data scientists can collaborate on problems of societal importance.

  • Learn new skills:

    The Fragile Families Challenge blends ideas from social science and data science. Maybe you’re a data scientist who wants to start working with social data?  Maybe you’re a social scientist who wants to learn more about machine learning?  Either way, the Fragile Families Challenge is for you. This blending of ideas also makes the Fragile Families Challenge ideal to assign in a class that you are teaching.

  • Get involved in scientific research:

    The Fragile Families Challenge is real scientific research. While working on the project, you’ll have a chance to interact with the other participating scientists and the distinguished researchers on our Board of Advisors.

  • Win prizes:

    We will award prizes to participants who make important contributions to the project. All prize winners will be given an all-expenses-paid trip to Princeton University for the scientific workshop at the end of the project.

  • Have fun:

    The Fragile Families Challenge can be worked on in teams, and we hope that you’ll enjoy working with data, learning new skills, and cooperating and competing with people from all over the world.

  • Publish papers:

    We will publish the results of the Fragile Families Challenge in scientific journals, both individually and collectively. Participants who make important contributions will have the opportunity to be a co-author on the paper describing the results of the Fragile Families Challenge.


Scientific goals

The Fragile Families Challenge is our attempt to create a new way of doing social research, one that is much more open to the talents and efforts of everyone. We expect that by combining ideas from social science and data science, we can—together—help address important scientific and social problems. And, we expect that through a mass collaboration we will accomplish things that none of us could accomplish individually.

The Fragile Families Challenge will involve two steps. In the first step, described above, participants will build statistical and machine learning models of several important outcomes in the lives of the children. Participants will then submit their code, their model outputs, and a narrative explanation of their modeling strategy. Then, we will use the unreleased test set to evaluate each model. This first step is an example of the common task method, which David Donoho (2015) has called the “secret sauce” of machine learning. At the end of the first step, we will optimally combine all the individual models into a community model. A variety of results about ensemble methods in machine learning suggest that this community model will perform better than the best individual model.
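
The simplest way to combine individual models is to average their predictions with equal weights. The sketch below illustrates that idea only; it is not the "optimal" combination mentioned above, whose method is not specified here, and the function names and data are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_predictions(model_predictions):
    """Equal-weight ensemble: average each child's prediction across models."""
    n_models = len(model_predictions)
    return [sum(preds) / n_models for preds in zip(*model_predictions)]


# Hypothetical true outcomes and predictions from three submitted models.
y_true = [3.0, 2.0, 4.0, 1.0]
submissions = [
    [3.5, 2.0, 3.0, 1.5],
    [2.5, 2.5, 4.5, 0.5],
    [3.0, 1.0, 4.0, 2.0],
]

community = average_predictions(submissions)
individual_mse = [mse(y_true, preds) for preds in submissions]
community_mse = mse(y_true, community)
```

Because squared error is convex, the equal-weight average can never score worse than the average of the individual models' scores. Whether it also beats the best single model depends on how different the models' errors are; in this made-up example the errors partly cancel, so it does.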

In the second step, we will use the individual models and the community model to conduct substantive and methodological research. Here are three examples:

  • Discover unmeasured and important factors
    The community model can be used to identify and help us learn from children who are “beating the odds.” For example, consider children who the community model predicts to have a low grade point average and who actually have a high grade point average. By conducting qualitative, in-depth interviews with these children and their caregivers—as well as children who are struggling—we can help discover previously unmeasured and important factors impacting the lives of children. The newly discovered factors can then be collected in future waves of data collection for the Fragile Families study. This goal is discussed in greater detail in our blog post on the topic.
  • Prioritize issues for intervention
    There are many issues that are potential targets for policy intervention in efforts to improve the lives of children. However, before actually intervening in the lives of children—either through randomized controlled trials or large-scale policy changes—it is important to make the best possible estimates using existing non-experimental data. For example, eviction is a natural target for policy intervention, but it is challenging to estimate the causal impact of eviction on children. We will use the community model to produce propensity scores for eviction. These propensity scores can then be used to estimate the effect of eviction on all outcomes that are measured in future waves of the Fragile Families study. Estimates of causal effects based on propensity scores are by no means perfect—they depend on strong and untestable assumptions—but when combined with sensitivity analysis they can provide useful estimates that can help inform the design of future randomized controlled trials. Further, through targeted in-depth interviews, we can assess the plausibility of these assumptions in this context. For more details on the causal inference goal of the Fragile Families Challenge, read our blog post on the topic.
  • Compare modeling approaches
    The dominant modeling strategies in the social sciences involve variations of the generalized linear model. However, social scientists are becoming increasingly interested in modeling approaches emerging from machine learning. Breiman (2001) characterizes these as two different cultures of modeling: one that focuses on informativeness and one that focuses on predictive performance. During the Fragile Families Challenge, researchers will use a variety of different modeling approaches, and we plan to explicitly compare these strategies in terms of their informativeness and predictive performance in order to assess the trade-offs between these two styles of modeling in a specific empirical context. It is our hope that this comparison will lead to insights about which ideas from machine learning can be fruitfully applied to social science problems where there are thousands—rather than millions—of observations.
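
To make the "beating the odds" idea from the first example concrete, here is a minimal sketch that flags children whose predicted outcome is low but whose observed outcome is high. The cutoffs, identifiers, and the function name `beating_the_odds` are assumptions chosen for illustration; how interview candidates will actually be selected is not described here.

```python
def beating_the_odds(child_ids, predicted, actual, low_cutoff, high_cutoff):
    """Return ids of children predicted at or below low_cutoff whose
    observed outcome is at or above high_cutoff."""
    return [
        cid
        for cid, p, a in zip(child_ids, predicted, actual)
        if p <= low_cutoff and a >= high_cutoff
    ]


# Hypothetical ids, community-model GPA predictions, and observed GPAs.
ids = ["a", "b", "c", "d"]
predicted_gpa = [2.0, 3.6, 2.1, 3.0]
actual_gpa = [3.8, 3.7, 2.0, 3.1]
candidates = beating_the_odds(ids, predicted_gpa, actual_gpa, 2.5, 3.5)
```

Swapping the comparison directions would flag the opposite group: children predicted to do well who are struggling.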

These three projects are just some examples of the kinds of research that can be done with the predictions, code, and narratives that are created in the first stage of the Fragile Families Challenge.  Because all of the materials created in the first stage will be released open source, we hope that others will dream up other cool things to do with them.
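
For the eviction example above, one common way to turn propensity scores into an effect estimate is inverse probability weighting; other estimators, such as matching, are equally possible, and which one will be used is not stated here. The sketch below is illustrative only: it assumes the propensity scores have already been estimated, and the function name `ipw_effect` and the data are hypothetical.

```python
def ipw_effect(treated, outcome, propensity):
    """Normalized inverse-probability-weighted difference in mean outcomes.

    treated: 1 if the family experienced the event (e.g. eviction), else 0.
    propensity: estimated probability of treatment, strictly between 0 and 1.
    """
    treated_num = treated_den = control_num = control_den = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must be strictly in (0, 1)")
        if t:
            # Treated units are weighted by 1 / e.
            treated_num += y / e
            treated_den += 1.0 / e
        else:
            # Control units are weighted by 1 / (1 - e).
            control_num += y / (1.0 - e)
            control_den += 1.0 / (1.0 - e)
    if treated_den == 0.0 or control_den == 0.0:
        raise ValueError("need at least one treated and one control unit")
    return treated_num / treated_den - control_num / control_den


# Hypothetical data: with equal propensities the estimate reduces to a
# simple difference in group means.
effect = ipw_effect(
    treated=[1, 1, 0, 0],
    outcome=[2.0, 3.0, 3.0, 4.0],
    propensity=[0.5, 0.5, 0.5, 0.5],
)
```

Like any propensity-score estimate, the result is only as credible as the assumption that the scores capture everything that drives both treatment and outcome, which is why sensitivity analysis matters.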


Prizes

In order to recognize important contributions to the Fragile Families Challenge, we plan to award a series of prizes. Anyone who wins one of these prizes will be offered an all-expenses-paid trip to the concluding workshop of the Fragile Families Challenge, which will take place at Princeton University after the end of the Challenge.

  • Best score for each outcome variable by May 10, 2017 (six awards)
  • Best score for each outcome variable by the end of the challenge, August 1, 2017 (six awards)
  • Most novel approach using ideas from social science (awarded by the Board of Advisors based on submitted narrative explanations)
  • Most novel approach using ideas from data science (awarded by the Board of Advisors based on submitted narrative explanations)
  • Foundational award (awarded by Board of Advisors to the participant who most helped other participants based on submitted narrative explanations)
  • Event-specific prizes (awarded at some events)
  • Wild card (awarded by Board of Advisors)

Whether you win a prize or not, however, we hope that your main reason to participate is that you are excited by the scientific and policy goals of the project.


Resources

Here are some specific materials that you might find helpful:

We will be holding weekly office hours via Google Hangout to answer questions about the data.

Here are some more general materials that you might find helpful to provide context about the project:

If there is something that you need, please let us know (fragilefamilieschallenge@gmail.com).


Events

We will occasionally host or take part in events to help people get started with the Fragile Families Challenge. This includes visiting classes that assign the Fragile Families Challenge. If you’d like to host an event like this, get in touch (fragilefamilieschallenge@gmail.com).

Upcoming events

June 2, Getting Started Workshop at UCLA

12-4pm
CCPR Seminar Room
4240 Public Affairs Building
If you would like to participate, mention the UCLA workshop when you apply to participate!
Co-sponsored by the California Center for Population Research and the Center for Social Statistics.

Past events

April 27, Getting Started Workshop at PAA (slides from event)

Hilton Chicago, Conference Room 4G, 10am – 2pm
Annual Meeting of the Population Association of America

April 6, Getting Started Workshop at Indiana University (slides from event)

Social Science Research Commons, 3pm – 7pm

March 28, Getting Started Workshop at Princeton University

190 Wallace Hall, 2:30pm – 5pm
Visit to Sociology 503, open to everyone

We will also host a workshop at Princeton University at the end of the first stage of the Fragile Families Challenge. At this workshop, participants will share their results and begin new collaborations.


Publishing

The Fragile Families Challenge is a scientific project, and we plan to publish the results. First, we will publish a single paper presenting the design and results from the Fragile Families Challenge. Everyone who makes a submission that meets a set of basic criteria will be invited—but not required—to be a co-author on this paper. The threshold for co-authorship will be determined soon and will be an absolute rather than a relative threshold, so that there is no limit on the possible number of participants who will qualify as co-authors. Further, we plan to organize a special issue of a journal where everyone who is a co-author on the collaborative paper will have the opportunity to submit an individual paper describing their contribution. The papers submitted to the special issue of the journal will be peer-reviewed so we cannot guarantee publication.

In addition to these two publishing opportunities, participants are welcome to publish their results in other ways.


About

The Fragile Families Challenge is physically housed in the Bendheim-Thoman Center for Research on Child Wellbeing at Princeton University.  It is being organized by Matthew Salganik, Ian Lundberg, and Sara McLanahan.

The project is overseen and guided by a Board of Advisors:

We have received valuable web development assistance from Luke Baker and Paul Yuen of Agathon Group and Eric Carmichael of CK Collab. We have received valuable research assistance from Cathy Chen and Boriana Pratt. We received wonderful feedback on an early version of this project at a workshop on “Solution-Oriented Social Science” organized by Duncan Watts and Victoria Stodden as part of the Social Science Research Council working group on Digital Social Science.

All participants in the Fragile Families and Child Wellbeing Study have consented to have their data used for social research. These procedures, as well as procedures to make de-identified data available to researchers, have been reviewed and approved by the Institutional Review Board of Princeton University (#5767). The procedures for the Fragile Families Challenge have been reviewed and approved by the Institutional Review Board of Princeton University (#8061). In addition, we have taken further steps to protect the participants in the Fragile Families and Child Wellbeing Study. If you would like to know more, please send us an email.

This project relies on open source software, and we are particularly grateful to the communities behind the following projects:

The Fragile Families Challenge is supported by a grant from the Russell Sage Foundation.