Progress prizes

We were glad to receive many submissions in time for the progress prizes! As described below, we have downloaded these submissions and look forward to evaluating them and determining the best submissions at the end of the Challenge.

We are excited to announce that progress prizes will be awarded to the best-performing models submitted by Wednesday, May 10, 2017 at 2pm Eastern time. We will not announce the winners, however, until after the Challenge is complete.

Here’s how it will work. On May 10, 2017 at 2pm Eastern time, we will download all the submissions on the leaderboard. However, we will not calculate which submission has the lowest error on the held-out test data until after the Challenge is complete. The reason for this delay is that we don’t want to reveal any information at all about the held-out test data until after the Challenge is over.

From the submissions that we have received by May 10, 2017 at 2pm Eastern time, we will pick, for each of the six outcome variables, the one with the lowest mean-squared error on the held-out test data. In other words, there will be one prize for the submission that performs best for grit, another prize for the submission that performs best for grade point average, and so on.
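To make the selection rule concrete, here is a minimal sketch of picking the lowest mean-squared-error submission separately for each outcome. It is an illustration only, not the scoring code used for the Challenge: the function names, the team names, the outcome keys "grit" and "gpa", and all of the numbers are made up for the example, and it assumes every outcome value is observed.

```python
def mean_squared_error(truth, predicted):
    """Average of the squared differences between paired values."""
    return sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)


def best_submission_per_outcome(truth_by_outcome, submissions):
    """Return {outcome: name of the submission with the lowest MSE}."""
    winners = {}
    for outcome, truth in truth_by_outcome.items():
        scores = {
            name: mean_squared_error(truth, predictions[outcome])
            for name, predictions in submissions.items()
        }
        # Each outcome is judged on its own, so different teams can win.
        winners[outcome] = min(scores, key=scores.get)
    return winners


# Hypothetical data for two of the six outcomes.
truth_by_outcome = {"grit": [3.0, 4.0, 2.0], "gpa": [2.5, 3.5, 3.0]}
submissions = {
    "team_a": {"grit": [3.0, 4.0, 3.0], "gpa": [2.0, 3.0, 3.0]},
    "team_b": {"grit": [2.0, 3.0, 2.0], "gpa": [2.5, 3.5, 3.5]},
}
winners = best_submission_per_outcome(truth_by_outcome, submissions)
```

In this toy example one team wins for grit and the other wins for grade point average, which is why a single submission does not need to be best on every outcome to earn a prize.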

All prize winners will be invited to participate in the post-Challenge scientific workshop at Princeton University, and we will cover all travel expenses for invited participants. If the prize-winning submission is created by a team, we will cover all travel expenses for one representative from that team.

We look forward to seeing the submissions.

Matthew Salganik is a Professor of Sociology at Princeton University. He is also the author of the forthcoming book Bit by Bit: Social Research in the Digital Age (http://www.bitbybitbook.com). You can learn more about his research at http://www.princeton.edu/~mjs3.


Aarshay Jain - April 26, 2017

A few clarifications:
1. Earlier this date was May 1st, but it has been shifted to May 10, right?
2. Can we know how the test set is split between the public leaderboard set, whose scores we currently see, and the hold-out set?
3. Will the final ranking be based on only the private part of the hold-out set, or on the combined public and private data?


Matt Salganik - April 28, 2017

Thanks Aarshay.
1) Yes, we moved back the deadline for logistical reasons.
2) The split of the data is 4/8 public, 1/8 leaderboard, and 3/8 held-out test.
3) The prizes will be based on the held-out test data. The leaderboard data will not be included in those calculations.
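For readers who want to picture the 4/8, 1/8, 3/8 proportions, here is a small sketch that divides a set of case IDs in those shares. It is purely illustrative: the thread does not say how the actual split was made, so the random shuffle, the seed, the function name split_ids, and the 800 made-up IDs are all assumptions.

```python
import random


def split_ids(ids, seed=0):
    """Split IDs into 4/8 public, 1/8 leaderboard, 3/8 held-out."""
    shuffled = list(ids)
    # Assumption: a seeded random shuffle stands in for the real procedure.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_public = n * 4 // 8
    n_leaderboard = n * 1 // 8
    return {
        "public": shuffled[:n_public],
        "leaderboard": shuffled[n_public:n_public + n_leaderboard],
        # Whatever remains (about 3/8) is the held-out test set.
        "held_out": shuffled[n_public + n_leaderboard:],
    }


parts = split_ids(range(800))  # 800 hypothetical case IDs
```

With 800 IDs this gives 400 public, 100 leaderboard, and 300 held-out cases, and no ID appears in more than one part.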


Aarshay Jain - May 2, 2017

Thanks Matt!

One more concern: in the training file, many of the outcome values are missing. Similarly, there will be missing values in the test data as well. How will we be evaluated on those cases? Will they be ignored, or do you have some baseline predictions for them?

Matt Salganik - May 2, 2017

You will not be evaluated on the cases with missing outcomes in the held-out test data.
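One way to picture this is a mean-squared error that simply skips cases whose true outcome is missing. The sketch below is an assumption about how such a calculation could look, not the Challenge's scoring code: the function name is hypothetical, missing values are assumed to be encoded as None or NaN, and the numbers are made up.

```python
import math


def mse_ignoring_missing(truth, predicted):
    """Mean-squared error over cases whose true outcome is observed."""
    pairs = [
        (t, p)
        for t, p in zip(truth, predicted)
        # Assumption: a missing outcome is stored as None or NaN.
        if t is not None and not math.isnan(t)
    ]
    if not pairs:
        raise ValueError("no observed outcomes to score")
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


truth = [3.0, None, 2.0, float("nan")]
predicted = [2.0, 9.0, 2.0, 9.0]
score = mse_ignoring_missing(truth, predicted)
```

Here only the first and third cases count, so the predictions of 9.0 for the two missing outcomes have no effect on the score.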

Jeremy Freese - May 4, 2017

Wait, the initial progress prize isn’t based on the leaderboard? I thought that was what the leaderboard was for?

Matt Salganik - May 4, 2017

Sorry for the confusion. The leaderboard is just to give you a sense of how you are doing.

Ian Lundberg - May 5, 2017

This is a common question, though, so thanks for asking. I just added a blog post to clarify (http://www.fragilefamilieschallenge.org/leaderboard/). The leaderboard doesn’t quite qualify as held-out data, since participants get some feedback from it. To deal with that, at the start of the Challenge we split the data so we could evaluate submissions on an entirely separate test set. Absent substantial overfitting to the leaderboard, I still expect those who do well on the leaderboard to do fairly well on the test set!

Jeremy Freese - May 5, 2017

Thanks for the clarification. (I had misunderstood this, which is a little unfortunate because I explained it to my students wrong. I had thought the initial progress prizes were going to be based on the leaderboard-quiz-set and the prizes at the end were based on the held-out-test-set-sample.)

Louis - May 10, 2017


When is the (final) deadline planned for the fragile families challenge?
(Postponing is always fine, but anteponing is more difficult to cope with)
Best, louis

Matt Salganik - May 12, 2017

Hi Louis,

We have just announced that the final submission deadline is Tuesday, August 1 at 2pm ET:

Take care,

Steve McKay - May 26, 2017

Do well on predicting the life chances of children – win an academic trip.
Do well predicting house prices slightly better – $1.2million (https://www.kaggle.com/c/zillow-prize-1)

