What would happen if hundreds of social scientists and data scientists worked together on a scientific challenge to improve the lives of disadvantaged children in the United States?

 

Read a quick overview from the Princeton University Office of Communications. More questions? Read blog posts here


Overview

The Fragile Families Challenge is a mass collaboration that will combine predictive modeling, causal inference, and in-depth interviews to yield insights that can improve the lives of disadvantaged children in the United States.  By working together we can discover things that none of us can discover individually.

The Fragile Families Challenge is based on the Fragile Families and Child Wellbeing Study, which has followed thousands of American families for more than 15 years.  During this time, the Fragile Families study collected information about the children, their parents, their schools, and their larger environments.

These data have been used in hundreds of scientific papers and dozens of dissertations, and insights from these studies are routinely shared with policy makers around the world through the Future of Children, which is jointly published by Princeton University and the Brookings Institution. Participants were challenged to use these data in a new way.  Given all the background data from birth to year 9 and some training data from year 15, how well could participants infer six key outcomes in the year 15 test data?

Schematic of the Fragile Families Challenge. Participants used the background data from birth to year 9 and some training data from year 15 to make inferences about six key outcomes in the year 15 test data.

This predictive modeling is not the end of the project, however.  It is just the beginning.  We will use the models submitted to the Fragile Families Challenge to advance the scientific goals of the project, and we will publish the results in scientific journals, both individually and collectively.

The Fragile Families Challenge is open to everyone, no matter where you live or what you do.  In fact, we’re confident that some of the best ideas will come from unexpected places.


How to participate

Apply to participate

The window for participation has now closed. Participants applied to participate, signed a data protection agreement, and then downloaded our data files for the Fragile Families Challenge. These files included information about each family from birth to age 9 and some training data from age 15.

Build a model

Participants used the Fragile Families Challenge data and creative modeling strategies to infer six key outcomes at age 15. They could use whatever modeling strategy they thought would work best. Models were evaluated by mean squared error in a holdout set kept private until the end of the Challenge.
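
To make the scoring rule concrete, here is a minimal sketch of mean squared error. It assumes the predictions and the true holdout values are aligned lists of numbers. The function name and the sample values are hypothetical and are not the Challenge's actual scoring code.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical holdout outcomes (for example, GPA) and one submission's predictions.
holdout_gpa = [3.0, 2.5, 3.5, 2.0]
predicted_gpa = [2.5, 2.5, 3.0, 3.0]

score = mean_squared_error(holdout_gpa, predicted_gpa)
print(score)  # lower is better; 0.0 would mean perfect predictions
```

Because the errors are squared, one large miss costs more than several small ones, so a submission is rewarded for avoiding badly wrong predictions.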

Upload your contribution

Each participant prepared a package that included their predictions, code, and a narrative explanation of their approach. They uploaded their contributions and saw their scores on the leaderboard. Participants could watch their score improve as they developed and uploaded new approaches! At the end of the Challenge, submissions are being released open source in order to advance the scientific goals. Deadline: Aug. 1, 2017, 2pm EDT


Why participate?

  • Help the world:

    The Fragile Families Challenge is designed to produce scientific knowledge that can be used to improve the lives of disadvantaged children in the United States. Even more than that, we hope the Fragile Families Challenge can serve as a model for how social scientists and data scientists can collaborate on problems of societal importance.

  • Learn new skills:

    The Fragile Families Challenge blends ideas from social science and data science. Maybe you’re a data scientist who wants to start working with social data?  Maybe you’re a social scientist who wants to learn more about machine learning?  Either way, the Fragile Families Challenge is for you. This blending of ideas also makes the Fragile Families Challenge ideal to assign in a class that you are teaching.

  • Get involved in scientific research:

    The Fragile Families Challenge is real scientific research. While working on the project, participants have a chance to interact with the other participating scientists and the distinguished researchers on our Board of Advisors.

  • Win prizes:

    We will award prizes to participants who make important contributions to the project. All prize winners will be given an all-expenses-paid trip to Princeton University for the scientific workshop at the end of the project.

  • Have fun:

    The Fragile Families Challenge could be worked on in teams, and we hoped that participants would enjoy working with data, learning new skills, and cooperating and competing with people from all over the world.

  • Publish papers:

    We will publish the results of the Fragile Families Challenge in scientific journals, both individually and collectively. Participants who made important contributions will have the opportunity to be a co-author on the paper describing the results of the Fragile Families Challenge.


Scientific goals

The Fragile Families Challenge is our attempt to create a new way of doing social research, one that is much more open to the talents and efforts of everyone. We expect that by combining ideas from social science and data science, we can—together—help address important scientific and social problems. And, we expect that through a mass collaboration we will accomplish things that none of us could accomplish individually.

The Fragile Families Challenge involves two steps. In the first step, described above and now complete, participants built statistical and machine learning models of several important outcomes in the lives of the children. Participants then submitted their code, their model outputs, and a narrative explanation of their modeling strategy. We are currently using the unreleased test set to evaluate each model. This first step is an example of the common task method, which David Donoho (2015) has called the “secret sauce” of machine learning. At the end of the first step, we are optimally combining all the individual models into a community model. A variety of results about ensemble methods in machine learning suggest that this community model will perform better than the best individual model.
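
As a rough illustration of why combining models can help, the sketch below averages the predictions of two hypothetical models with equal weights. Equal-weight averaging is a simple stand-in chosen for this example. The way the submissions are actually combined into the community model is not specified here, and all names and numbers are made up.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_predictions(prediction_sets):
    """Equal-weight ensemble: average each observation's predictions across models."""
    n_models = len(prediction_sets)
    return [sum(preds) / n_models for preds in zip(*prediction_sets)]


# Hypothetical true outcomes and predictions from two submitted models.
truth = [3.0, 2.0, 4.0, 1.0]
model_a = [3.5, 2.5, 4.0, 1.0]
model_b = [2.5, 2.0, 4.5, 1.0]

community = average_predictions([model_a, model_b])
individual_scores = [mse(truth, model_a), mse(truth, model_b)]
community_score = mse(truth, community)

print(individual_scores, community_score)
```

In this toy example the two models err in different places, so averaging cancels part of each error and the combined score beats both. That outcome is not guaranteed in general: an equal-weight average is only guaranteed to do no worse than the average of the individual models' squared errors, which is one reason to consider weighted combinations.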

In the second step, we will use the individual models and the community model to conduct substantive and methodological research. Here are three examples:

  • Discover unmeasured and important factors
    The community model can be used to identify and help us learn from children who are “beating the odds.” For example, consider children who the community model predicts to have a low grade point average and who actually have a high grade point average. By conducting qualitative, in-depth interviews with these children and their caregivers—as well as children who are struggling—we can help discover previously unmeasured and important factors impacting the lives of children. The newly discovered factors can then be collected in future waves of data collection for the Fragile Families study. This goal is discussed in greater detail in our blog post on the topic.
  • Prioritize issues for intervention
    There are many issues that are potential targets for policy intervention in efforts to improve the lives of children. However, before actually intervening in the lives of children—either through randomized controlled trials or large-scale policy changes—it is important to make the best possible estimates using existing non-experimental data. For example, eviction is a natural target for policy intervention, but it is challenging to estimate the causal impact of eviction on children. We will use the community model to produce propensity scores for eviction. These propensity scores can then be used to estimate the effect of eviction on all outcomes that are measured in future waves of the Fragile Families study. Estimates of causal effects based on propensity scores are by no means perfect—they depend on strong and untestable assumptions—but when combined with sensitivity analysis they can provide useful estimates that can help inform the design of future randomized controlled trials. Further, through targeted in-depth interviews, we can assess the plausibility of these assumptions in this context. For more details on the causal inference goal of the Fragile Families Challenge, read our blog post on the topic.
  • Compare modeling approaches
    The dominant modeling strategies in the social sciences involve variations of the generalized linear model. However, social scientists are becoming increasingly interested in modeling approaches emerging from machine learning. Breiman (2001) characterizes these as two different cultures of modeling: one that focuses on informativeness and one that focuses on predictive performance. During the Fragile Families Challenge, researchers will use a variety of different modeling approaches, and we plan to explicitly compare these strategies in terms of their informativeness and predictive performance in order to assess the trade-offs between these two styles of modeling in a specific empirical context. It is our hope that this comparison will lead to insights about which ideas from machine learning can be fruitfully applied to social science problems where there are thousands—rather than millions—of observations.
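
The "beating the odds" idea in the first example above can be sketched as a comparison between predicted and observed outcomes. This is only an illustration under stated assumptions: the record layout, the function name, and the gap threshold are hypothetical, and the selection rule actually used for interviews may differ.

```python
def largest_surprises(records, min_gap):
    """Split children into those doing much better or much worse than predicted.

    Each record is a dict with hypothetical keys "id", "predicted", and "observed".
    A child is flagged when |observed - predicted| is at least min_gap.
    """
    gaps = [(r["observed"] - r["predicted"], r["id"]) for r in records]
    # Largest positive gaps first: observed outcome far above the prediction.
    beating = [cid for gap, cid in sorted(gaps, reverse=True) if gap >= min_gap]
    # Most negative gaps first: observed outcome far below the prediction.
    struggling = [cid for gap, cid in sorted(gaps) if gap <= -min_gap]
    return beating, struggling


# Hypothetical predicted and observed grade point averages.
children = [
    {"id": "c1", "predicted": 2.0, "observed": 3.5},
    {"id": "c2", "predicted": 3.0, "observed": 3.0},
    {"id": "c3", "predicted": 2.5, "observed": 3.5},
    {"id": "c4", "predicted": 3.5, "observed": 2.0},
]

beating, struggling = largest_surprises(children, min_gap=1.0)
print(beating, struggling)
```

Children in either list are candidates for in-depth interviews, since a large gap suggests that something not captured in the measured data is shaping their outcomes.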

These three projects are just some examples of the kinds of research that can be done with the predictions, code, and narratives that are created in the first stage of the Fragile Families Challenge.  Because all of the materials created in the first stage will be released open source, we hope that others will dream up other cool things to do with them.
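
For the eviction example under "Prioritize issues for intervention," one common way to use propensity scores is inverse probability weighting. The sketch below assumes the propensity scores have already been estimated and that the usual strong assumptions hold (no unmeasured confounding, and scores strictly between 0 and 1). Weighting is only one option alongside matching and stratification, the project does not state which it uses, and the data and names below are hypothetical.

```python
def ipw_effect(treated, outcome, propensity):
    """Normalized inverse-probability-weighted difference in mean outcomes.

    treated:    1 if the child experienced the event (e.g., eviction), else 0
    outcome:    observed outcome for each child
    propensity: estimated probability of the event given background data
    """
    w1_sum = y1_sum = w0_sum = y0_sum = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t:
            w = 1.0 / e          # up-weight treated cases that were unlikely to be treated
            w1_sum += w
            y1_sum += w * y
        else:
            w = 1.0 / (1.0 - e)  # up-weight untreated cases that were likely to be treated
            w0_sum += w
            y0_sum += w * y
    if w1_sum == 0.0 or w0_sum == 0.0:
        raise ValueError("need at least one treated and one untreated case")
    return y1_sum / w1_sum - y0_sum / w0_sum


# Hypothetical data: two children who experienced eviction, two who did not.
evicted = [1, 1, 0, 0]
gpa = [2.0, 3.0, 3.0, 4.0]
scores = [0.5, 0.25, 0.5, 0.75]

effect = ipw_effect(evicted, gpa, scores)
print(effect)  # negative values mean lower outcomes among the evicted, after weighting
```

An estimate like this is only as credible as the propensity scores and assumptions behind it, which is why the sensitivity analysis and targeted interviews described above matter.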


Prizes

In order to recognize important contributions to the Fragile Families Challenge, we plan to award a series of prizes. Anyone who wins one of these prizes will be offered an all-expenses-paid trip to the concluding workshop of the Fragile Families Challenge, which will take place at Princeton University after the end of the Challenge.

  • Best score for each outcome variable by May 10, 2017 (six awards)
  • Best score for each outcome variable by the end of the challenge, August 1, 2017, at 2pm EDT (six awards)
  • Most novel approach using ideas from social science (awarded by the Board of Advisors based on submitted narrative explanations)
  • Most novel approach using ideas from data science (awarded by the Board of Advisors based on submitted narrative explanations)
  • Foundational award (awarded by Board of Advisors to the participant who most helped other participants based on submitted narrative explanations)
  • Event-specific prizes (awarded at some events)
  • Wild card (awarded by Board of Advisors)

Whether or not participants won a prize, however, we hoped their main reason for participating would be excitement about the scientific and policy goals of the project.


Resources

Here are some specific materials we thought would be helpful to participants:

We held weekly office hours via Google Hangout to answer questions about the data.

Here are some more general materials that you might find helpful to provide context about the project:

If there is something that you need, please let us know (fragilefamilieschallenge@gmail.com).


Events

Upcoming events

Tuesday, October 17, The Fragile Families Challenge: What happened and what’s next

1:30pm – 2:30pm
Princeton University
Louis A. Simpson International Bldg., Room 271 (event info)

Thursday and Friday, November 16-17, Scientific Workshop

Princeton University (event info)
All are invited to a scientific workshop recapping the first stage of the Fragile Families Challenge. Prize winners will be offered an all-expenses-paid trip, but all are welcome. Those who cannot attend in person are invited to join by livestream (livestream link, no registration required).

Past events

Friday, October 6, Combining Survey Social Science with Data Science Methods: Fragile Families Challenge and Beyond

3pm – 4pm
University of Michigan
ISR-Thompson 1430 (event info)

Sunday, August 13, Gathering at the American Sociological Association Annual Meeting in Montreal

2pm
Fragile Families and Child Wellbeing Study Booth, Exhibit Hall, Palais des Congrès de Montréal in Montréal, Québec (event info) (conference info)

June 23, Getting Started Workshop at Princeton with Livestream (event page)

10:30am – 4pm
Princeton University, Julis Romo Rabinowitz Building, Room 399 (livestream information will be posted here)
Co-sponsored by the Summer Institute in Computational Social Science.

June 2, Getting Started Workshop at UCLA
(slides from event) (video from event)

12-4pm
CCPR Seminar Room
4240 Public Affairs Building
If you would like to participate, mention the UCLA workshop when you apply to participate!
Co-sponsored by the California Center for Population Research and the Center for Social Statistics.

April 27, Getting Started Workshop at PAA (slides from event)

Hilton Chicago, Conference Room 4G, 10am – 2pm
Annual Meeting of the Population Association of America

April 6, Getting Started Workshop at Indiana University (slides from event)

Social Science Research Commons, 3pm – 7pm

March 28, Getting Started Workshop at Princeton University

190 Wallace Hall, 2:30pm – 5pm
Visit to Sociology 503, open to everyone


Publishing

The Fragile Families Challenge is a scientific project, and we plan to publish the results. There are two opportunities for publishing with us.

Opportunity 1: We will publish a single paper presenting the design and results from the Fragile Families Challenge. Everyone who makes a submission that meets a set of basic criteria will be invited—but not required—to be a co-author on this paper. There is no limit on the possible number of participants who will qualify as co-authors.

Opportunity 2: Separately, the journal Socius will publish a special issue on the Fragile Families Challenge. All participants in the challenge are invited to submit a manuscript. Papers submitted to the special issue of the journal will be peer-reviewed, so we cannot guarantee publication. For more details, see the call for papers.


About

The Fragile Families Challenge is physically housed in the Bendheim-Thoman Center for Research on Child Wellbeing at Princeton University.  It is being organized by Matthew Salganik, Ian Lundberg, Alex Kindel, and Sara McLanahan.

The project is overseen and guided by a Board of Advisors:

We have received valuable web development assistance from Luke Baker and Paul Yuen of Agathon Group and Eric Carmichael of CK Collab. We have received valuable research assistance from Cathy Chen and Boriana Pratt. We received wonderful feedback on an early version of this project at a workshop on “Solution-Oriented Social Science” organized by Duncan Watts and Victoria Stodden as part of the Social Science Research Council working group on Digital Social Science.

All participants in the Fragile Families and Child Wellbeing Study have consented to have their data used for social research. These procedures, as well as procedures to make de-identified data available to researchers, have been reviewed and approved by the Institutional Review Board of Princeton University (#5767). The procedures for the Fragile Families Challenge have been reviewed and approved by the Institutional Review Board of Princeton University (#8061). In addition, we have taken further steps to protect the participants in the Fragile Families and Child Wellbeing Study. If you would like to know more, please send us an email.

This project relies on open source software, and we are particularly grateful to the communities behind the following projects:

The Fragile Families Challenge is supported by a grant from the Russell Sage Foundation.