CS 100 Lab 8: Titanic & Data Science

Objective

For this lab, you'll start from the provided notebook and make a submission to the Titanic competition that scores at least 78%.

Step 1: Fork the template notebook

You'll need a Kaggle account for this step. You should be able to create one by clicking on "Register with Google" on the Kaggle registration page.

After creating an account, visit the CS 100 Data Science Notebook and click on the "Copy and Edit" button at the top right of the page. After agreeing to the rules of the competition, you should be dropped into the notebook editor.

Step 2: Evaluate the notebook and submit results

Once in the notebook, select the first cell in the notebook and evaluate it by clicking on the "Run" button (the triangular play button in the toolbar). You can evaluate each cell (in succession) in this manner, or click on "Run All" to evaluate all the cells in the notebook at once.

To submit the results computed by the notebook, you first need to "commit" it. Do this by clicking on the "Save Version" button at the top right of the page, selecting "Save & Run All (Commit)", then clicking "Save". The resulting process will take a short while, during which the system re-runs all your code and generates an output file.

After the process completes, you can click on the number next to the "Save Version" button to show all the versions of the notebook. Next to the most recent version (in the "Version History" column), you can click on the ellipsis ("...") to find the "Submit to Competition" link. Click on that to bring up a dialog box where you can confirm your submission.

You can visit the submissions page for the Titanic competition to see your best score so far. Note that you can make at most 10 submissions per day, so don't submit carelessly!

The provided notebook (which uses a straightforward sex-based prediction model) should yield an accuracy score of 76.55%. In the next step, you will try to get this score past 78%.
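To make the idea of a sex-based model concrete, here is a minimal sketch. It is not the notebook's actual code: the rows are made-up dictionaries standing in for the rows of test_data, and the helper name predict_by_sex is hypothetical. The column name Sex and its values "female" and "male" are the ones used in the Titanic dataset on Kaggle.

```python
# Made-up rows standing in for test_data (assumption: the real notebook
# loads its rows from the competition's CSV files).
test_data = [
    {"Sex": "male"},
    {"Sex": "female"},
    {"Sex": "male"},
]


def predict_by_sex(row):
    """Predict 1 (survived) for women and 0 (did not survive) for men."""
    if row["Sex"] == "female":
        return 1
    else:
        return 0


# Build exactly one prediction per row, in row order.
predictions = []
for row in test_data:
    predictions.append(predict_by_sex(row))

print(predictions)  # prints [0, 1, 0]
```

Every row gets a prediction because the if is paired with an else. Keeping that property as you add clauses is what the later row-count check relies on.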

Step 3: Improving the score

The only code you will need to touch in your notebook is in the cell just under the heading "Making predictions". Specifically, you'll be adding additional if-else clauses in the for loop which iterates over all the rows in the test data. Of course, you can fiddle with the cells in the "Basic analysis" and "Working with rows manually" sections to print out / visualize information to help you form hypotheses about how to better make predictions.
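The workflow described above (look at the data, form a hypothesis, add a clause) might look roughly like the sketch below. Everything in it is illustrative: the rows are made up, the helper survival_rate_by is hypothetical, and the extra clause about young children is only an example of a hypothesis to check, not a rule known to raise the score. The column names Sex, Pclass, Age, and Survived come from the Titanic dataset; missing ages are represented here as NaN.

```python
from collections import defaultdict

# Made-up rows standing in for the training data (not real passengers).
train_data = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]


def survival_rate_by(rows, columns):
    """Return {group: fraction survived} for each combination of columns."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for row in rows:
        key = tuple(row[c] for c in columns)
        counts[key][0] += row["Survived"]
        counts[key][1] += 1
    return {key: s / total for key, (s, total) in counts.items()}


# Step 1: print survival rates per group to help form a hypothesis.
rates = survival_rate_by(train_data, ["Sex", "Pclass"])
for group, rate in sorted(rates.items()):
    print(group, round(rate, 2))

# Made-up rows standing in for test_data; the last age is missing (NaN).
test_data = [
    {"Sex": "female", "Pclass": 1, "Age": 30.0},
    {"Sex": "male", "Pclass": 3, "Age": 4.0},
    {"Sex": "male", "Pclass": 2, "Age": 40.0},
    {"Sex": "male", "Pclass": 3, "Age": float("nan")},
]

# Step 2: extend the prediction loop with an additional clause.
predictions = []
for row in test_data:
    if row["Sex"] == "female":
        predictions.append(1)
    elif row["Age"] < 10:
        # Example hypothesis only. NaN < 10 is False, so rows with a
        # missing age fall through to the final else.
        predictions.append(1)
    else:
        predictions.append(0)

print(predictions)
```

Note that the chain still ends in a plain else, so each row produces exactly one prediction no matter which clauses you add above it.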

If you're not familiar with the Python language (you're not expected to be!) and get stuck on syntax, please reach out to your TA for help and advice.

After changing the cell, make sure that the number of predictions you created matches the number of rows in test_data. The second cell in the "Making predictions" section contains an assertion that tests this for you.
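A check of this kind can be sketched as follows. This is an assumption about what such an assertion might look like, not a copy of the notebook's cell, and the rows are made up. A mismatch usually means some branch of the if-else chain never appends a prediction, or appends more than one.

```python
# Made-up rows standing in for test_data.
test_data = [{"Sex": "male"}, {"Sex": "female"}, {"Sex": "male"}]

# One prediction per row, using the baseline sex-based rule.
predictions = [1 if row["Sex"] == "female" else 0 for row in test_data]

# Raises AssertionError (with a readable message) if the counts differ.
assert len(predictions) == len(test_data), (
    f"expected {len(test_data)} predictions, got {len(predictions)}"
)
print("prediction count OK:", len(predictions))
```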

When you're ready to test your updated results, just make a new submission as explained in the previous section. Remember, your goal is to make a submission with a score of at least 78%!
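Because daily submissions are limited, it can help to estimate how well a rule does on the training data before submitting. The sketch below is one way to do that under stated assumptions: the rows are made up, and the helpers predict and accuracy are hypothetical names, not part of the notebook. Accuracy on the training data is only a rough guide; the competition score is computed on test rows whose outcomes you cannot see, so the two numbers will generally differ.

```python
# Made-up rows standing in for the training data (not real passengers).
train_data = [
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 0},
    {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 1},
    {"Sex": "male", "Survived": 0},
]


def predict(row):
    """The rule under test; swap in your own if-else clauses here."""
    if row["Sex"] == "female":
        return 1
    else:
        return 0


def accuracy(rows, predict_fn):
    """Fraction of rows where the prediction equals the known outcome."""
    correct = sum(1 for row in rows if predict_fn(row) == row["Survived"])
    return correct / len(rows)


score = accuracy(train_data, predict)
print(f"training accuracy: {score:.1%}")  # prints training accuracy: 60.0%
```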

Step 4: Submission (to us)

To let us view your results, first make sure your notebook is public. You can update the access settings from the notebook editor by clicking on "Share" at the top right and selecting "Public" in the drop-down menu. This will give you a public URL that you can submit using the lab submission form. Before submitting the URL, open it yourself and confirm that it shows a valid public score!