As we near the midpoint of the Challenge, we are excited to report on the progress of our first cluster of participants: student teams in COS424, the machine learning fundamentals course at Princeton. You can find some schematic analyses of their performance over time, modeling strategies, and more here. Some of the students have open-sourced their code for all participants to use and learn from; you can find that code here.
Thanks to all the COS424 students for their awesome contributions!
We were glad to receive many submissions in time for the progress prizes! As described below, we have downloaded these submissions and look forward to evaluating them and determining the best submissions at the end of the Challenge.
We are excited to announce that progress prizes will be awarded to the best-performing submissions as of Wednesday, May 10, 2017 at 2pm Eastern time. However, we will not announce the winners until after the Challenge is complete.
Here’s how it will work. On May 10, 2017 at 2pm Eastern time, we will download all the submissions on the leaderboard. However, we will not calculate which submission has the lowest error on the held-out test data until after the Challenge is complete. The reason for this delay is that we don’t want to reveal any information at all about the held-out test data until after the Challenge is over.
From the submissions that we have received by May 10, 2017 at 2pm Eastern time, we will pick, for each of the six outcome variables, the submission with the lowest mean-squared error on the held-out test data. In other words, there will be one prize for the submission that performs best for grit, another prize for the submission that performs best for grade point average, and so on.
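To make the per-outcome selection rule concrete, here is a minimal Python sketch. It is not the Challenge's actual scoring code. It assumes each submission is a mapping from outcome name to a list of predictions aligned with the held-out cases. The function names `mean_squared_error` and `pick_winners`, the team names, and all of the numbers are hypothetical and were made up purely for illustration. Only two of the six outcomes are shown.

```python
from typing import Dict, List


def mean_squared_error(predictions: List[float], truth: List[float]) -> float:
    """Average of the squared differences between predictions and held-out values."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)


def pick_winners(
    submissions: Dict[str, Dict[str, List[float]]],
    held_out: Dict[str, List[float]],
) -> Dict[str, str]:
    """For each outcome, return the submission with the lowest MSE (hypothetical helper)."""
    winners = {}
    for outcome, truth in held_out.items():
        # Each outcome is ranked separately, so different teams can win different prizes.
        scores = {
            name: mean_squared_error(preds[outcome], truth)
            for name, preds in submissions.items()
            if outcome in preds
        }
        winners[outcome] = min(scores, key=scores.get)
    return winners


# Toy, made-up numbers for two of the six outcomes; not real Challenge data.
held_out = {"grit": [3.0, 3.5, 2.5], "gpa": [2.8, 3.2, 3.9]}
submissions = {
    "team_a": {"grit": [3.1, 3.4, 2.6], "gpa": [2.0, 3.0, 3.0]},
    "team_b": {"grit": [2.0, 3.0, 3.0], "gpa": [2.9, 3.1, 3.8]},
}
print(pick_winners(submissions, held_out))  # {'grit': 'team_a', 'gpa': 'team_b'}
```

In this toy case, team_a has the lower error for grit and team_b has the lower error for GPA, so each would receive a separate prize.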
All prize winners will be invited to participate in the post-Challenge scientific workshop at Princeton University, and we will cover all travel expenses for invited participants. If the prize-winning submission is created by a team, we will cover all travel expenses for one representative from that team.
We'll provide food and a friendly, collaborative environment.
Work together to produce your first submission!
When: 10am – 2pm, Thursday, April 27
Where: Hilton Chicago, Conference Room 4G (Directions: come to the 4th floor; we're the room way down at the end.)
Who: You! Anyone involved in social science and/or data science can make an important contribution.
RSVP: Mention that you're coming to our PAA workshop when you apply to participate!
As part of the challenge, we’re interested in understanding and learning from the strategies participants are using to predict outcomes in the Fragile Families data. One major goal of the challenge is to learn how these strategies evolve and develop over time. We think that a more systematic understanding of how social scientists and data scientists think with data has the potential to better inform how statistical analysis is done. To do this analysis, we use the code and narrative analysis included with each submission.
Recently, we updated the code that evaluates predictions to ensure that groups don’t forget to include their code in their submissions.
What does this mean for me?
Make sure that your directory contains all of the code you used to generate your predictions.
It's not a problem if your code is split across multiple scripts or can't be executed as-is. When we review code submissions, we don't run the code.
If you run into an error when you submit your predictions that says you’ve forgotten your code, but your submission does actually contain the code you’ve been using, let us know as soon as possible!