CANSSI National Case Study Competition 2019

Creating a better customer experience for BC Ferries with sailing analysis and delay prediction

Expand your statistical, collaboration and problem-solving skills in this Canadian Statistical Sciences Institute National Case Study Competition (CANSSI NCSC). Students will apply their knowledge to solve a real-world problem using a dataset about BC Ferries. Organizations are looking for graduates with real-world problem-solving skills, so the experience you gain here will better prepare you for a successful career in statistics.


The CANSSI NCSC is designed for students enrolled in undergraduate and graduate programs at Canadian universities, who will compete in a statistical prediction task. The data for this competition will be made available on September 3rd, and students will be able to submit their solutions online until October 3rd. Registration opens on September 3rd, and registration for the regional competitions will remain open until September 29th. Carleton University, Concordia University, MacEwan University, Simon Fraser University and the University of New Brunswick will host regional competitions with cash prizes to judge the solutions of their participating students. Winners of the regional competitions will be invited to compete in the final national poster championship at the CANSSI Headquarters at Simon Fraser University in Burnaby, BC, on November 2nd.

Register here:
and then join the competition on Kaggle:

Why Participate:

This case study competition provides a unique opportunity to develop your problem-solving skills and to build creative solutions to a real-world problem, skills that are highly valued in every kind of organization. Besides working collaboratively with your team, you’ll hone your presentation skills as you present your solutions to our judges. Winners also receive a cash prize and a guaranteed interview with Statistics Canada for a full-time or co-op position.

The Prizes:

  • The top two teams in each region will receive $300.
  • At the national final competition, cash prizes will be awarded to 1st place – $600, 2nd place – $300 and 3rd place – $150.
  • Winners will get guaranteed job interviews with Statistics Canada for full-time and co-op positions.

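Important Dates:

  • September 3rd: data release; registration opens
  • September 29th: registration for the regional competitions closes
  • October 3rd: deadline for online submission of solutions
  • November 2nd: national poster championship at Simon Fraser University, Burnaby, BC
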
Eligibility:

Students enrolled in undergraduate and graduate programs at a Canadian university or college may participate in this competition. People who are not enrolled in an undergraduate or graduate program may still participate, but they will not be eligible for the cash prizes or for judging in the regional competitions or the national poster championship.

The Challenge:

This national case study competition is about predicting delays of BC Ferries sailings around Vancouver’s harbours. The dataset consists of 61,880 sailings occurring between August 2016 and March 2018. It is split chronologically into a training dataset containing 80% of the sailings (49,504 sailings between August 2016 and November 2017) and a testing dataset containing the remaining 20% (12,376 sailings between November 2017 and March 2018). The task is to predict whether or not each sailing in the testing dataset was delayed. A variety of covariates are provided for each sailing (date, time of departure, departure terminal, arrival terminal, name of the vessel, and so on); these are described more fully in the Data section below. In addition to these covariates, some weather and traffic data are provided.
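Because every training sailing precedes every testing sailing, a random shuffle is a poor way to carve a validation set out of the training data: it lets a model peek at later dates. A minimal sketch of a time-ordered split follows, assuming each sailing is a dict with a hypothetical `scheduled_departure` key holding a `datetime` (the real column names come with the data release). The data is synthetic and only checks that an 80% cut of 61,880 sailings gives 49,504 and 12,376.

```python
from datetime import datetime, timedelta


def chronological_split(sailings, train_frac=0.8):
    """Split sailings into (train, test) by scheduled departure time.

    The earliest `train_frac` share of sailings becomes the training set,
    mirroring how the competition data is divided. 'scheduled_departure'
    is a hypothetical field name, not the official column name.
    """
    ordered = sorted(sailings, key=lambda s: s["scheduled_departure"])
    cut = round(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


# Synthetic stand-in data: 61,880 sailings 12 minutes apart, listed newest first
start = datetime(2016, 8, 1)
fake = [
    {"scheduled_departure": start + timedelta(minutes=12 * i)}
    for i in reversed(range(61_880))
]
train, test = chronological_split(fake)
print(len(train), len(test))  # 49504 12376
```

Applying the same function to the 49,504 training sailings yields a validation slice that sits later in time than the data used for fitting, much like the real testing set.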

In the regional competitions and the national poster championship, students will be judged on the accuracy of their delay predictions (percent correct) and on a poster in which they discuss their methods, their results, and any additional insight into the data gained from their analysis.
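Percent correct is plain classification accuracy: the share of testing sailings whose predicted delay indicator matches the true one, times 100. The sketch below assumes labels coded as 0 (not delayed) and 1 (delayed); that coding is an assumption. The second call shows why a baseline matters: if most sailings run on time, predicting "not delayed" for everything already scores the on-time rate.

```python
def percent_correct(y_true, y_pred):
    """Accuracy as a percentage: 100 * (matching predictions) / (total)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one prediction is required")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * hits / len(y_true)


print(percent_correct([1, 0, 0, 1], [1, 0, 1, 1]))  # 75.0
# An all-zero ("never delayed") guess scores the share of on-time sailings
print(percent_correct([1, 0, 0, 1, 0], [0] * 5))  # 60.0
```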

Poster preparation info and tips

Data:

The ferry dataset contains records of 61,880 sailings occurring between August 2016 and March 2018 on routes starting or ending at Horseshoe Bay, Swartz Bay, Tsawwassen or Departure Bay. For each sailing, the following information is provided:

  • Name of vessel
  • Scheduled departure time
  • Departure harbour
  • Arrival harbour
  • Date (including day of week and day of year)
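Day of week and day of year can both be derived from the scheduled departure, so a single timestamp per sailing covers them. A minimal record type for the fields listed above might look like this; the field names and the vessel name are hypothetical, since the official column names are not given here.

```python
from dataclasses import dataclass
from datetime import datetime

# Fixed English names so the output does not depend on the system locale
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


@dataclass(frozen=True)
class Sailing:
    """One scheduled sailing; field names are illustrative assumptions."""
    vessel: str
    scheduled_departure: datetime
    departure_terminal: str
    arrival_terminal: str

    @property
    def day_of_week(self) -> str:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday
        return WEEKDAYS[self.scheduled_departure.weekday()]

    @property
    def day_of_year(self) -> int:
        # tm_yday runs from 1 (January 1st) to 365 or 366
        return self.scheduled_departure.timetuple().tm_yday

    @property
    def route(self) -> tuple[str, str]:
        return (self.departure_terminal, self.arrival_terminal)


s = Sailing("Vessel A", datetime(2016, 8, 5, 7, 0), "Tsawwassen", "Swartz Bay")
print(s.day_of_week, s.day_of_year, s.route)
# Friday 218 ('Tsawwassen', 'Swartz Bay')
```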

For the 49,504 sailings in the training data, the actual duration of each sailing is provided, along with an indicator of whether or not the sailing was delayed. For the 12,376 sailings in the testing data, the actual duration and the delay indicator are withheld; the delay indicator is what must be predicted.
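Before fitting anything elaborate, it helps to have a simple baseline to beat. One common choice, not a method prescribed by the competition, is to predict the most frequent training label within each group of similar sailings and fall back to the overall majority for groups never seen in training. The sketch below groups by departure terminal, arrival terminal and scheduled hour; the keys `departure`, `arrival`, `hour` and `delayed` are hypothetical, and the tiny training set is made up.

```python
from collections import Counter, defaultdict


def fit_group_majority(rows, key):
    """Return a predictor giving the most common 'delayed' label per group.

    Groups never seen in training fall back to the overall majority label.
    """
    per_group = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        per_group[key(row)][row["delayed"]] += 1
        overall[row["delayed"]] += 1
    default = overall.most_common(1)[0][0]
    table = {g: c.most_common(1)[0][0] for g, c in per_group.items()}
    return lambda row: table.get(key(row), default)


def route_hour(row):
    # Group sailings by (departure terminal, arrival terminal, scheduled hour)
    return (row["departure"], row["arrival"], row["hour"])


# Made-up training rows for illustration only
train_rows = [
    {"departure": "Tsawwassen", "arrival": "Swartz Bay", "hour": 7, "delayed": 0},
    {"departure": "Tsawwassen", "arrival": "Swartz Bay", "hour": 7, "delayed": 0},
    {"departure": "Tsawwassen", "arrival": "Swartz Bay", "hour": 17, "delayed": 1},
    {"departure": "Tsawwassen", "arrival": "Swartz Bay", "hour": 17, "delayed": 1},
    {"departure": "Tsawwassen", "arrival": "Swartz Bay", "hour": 17, "delayed": 0},
]
predict = fit_group_majority(train_rows, key=route_hour)
print(predict({"departure": "Tsawwassen", "arrival": "Swartz Bay", "hour": 17}))  # 1
print(predict({"departure": "Horseshoe Bay", "arrival": "Departure Bay", "hour": 9}))  # 0
```

Its percent correct on a later, held-out slice of the training data is a reasonable bar for any richer model.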

A time series of temperature and humidity from Vancouver Harbour is also provided, along with a time series of temperature, humidity, pressure, wind speed and wind direction from Victoria Harbour. Finally, a time series of ordinal traffic volume data from the Lions Gate Bridge is provided, in which traffic is ranked on a scale from 1 to 5. The bridge links downtown Vancouver and North Vancouver and lies on a major arterial route towards the Horseshoe Bay ferry terminal.
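The weather and traffic series are recorded on their own clocks, so they have to be aligned with each sailing’s scheduled departure. One reasonable approach, assumed here rather than specified by the competition, is a backward "as-of" lookup: take the most recent observation at or before departure, optionally rejecting readings older than a chosen maximum age, so that no information recorded after departure leaks into a prediction. The timestamps and wind speeds below are invented. The same lookup works for the 1-to-5 traffic series, and pandas users can get equivalent behaviour from `pandas.merge_asof`, whose default direction is backward.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def asof_lookup(times, values, t, max_age=None):
    """Return the latest value observed at or before time t.

    `times` must be sorted ascending and aligned with `values`. Returns None
    if there is no earlier observation, or if the most recent one is older
    than `max_age` (a timedelta), when that limit is given.
    """
    i = bisect_right(times, t)  # number of observations with time <= t
    if i == 0:
        return None
    if max_age is not None and t - times[i - 1] > max_age:
        return None
    return values[i - 1]


# Invented hourly wind speeds standing in for the Victoria Harbour series
obs_times = [datetime(2017, 1, 10, h) for h in (6, 7, 8)]
wind_speed = [12.0, 18.5, 25.0]

print(asof_lookup(obs_times, wind_speed, datetime(2017, 1, 10, 7, 45)))  # 18.5
print(asof_lookup(obs_times, wind_speed, datetime(2017, 1, 10, 5, 30)))  # None
print(asof_lookup(obs_times, wind_speed, datetime(2017, 1, 10, 11, 0),
                  max_age=timedelta(hours=2)))  # None
```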

Further details about these data will be provided with the data release on September 3rd. These data are in the public domain and may be redistributed or modified.

Rules:

  • You may work in teams of up to three people.
  • You may use any libraries, software, programming languages or methods in this contest.
  • You may use any code you find on the internet provided that
    • The code is available under an open license (e.g.: anything from is fine).
    • You note the outside sources that you’ve used as a comment in your code.
  • You may use code written by other participants who aren’t on your team, provided that:
    • You have their permission to use it.
    • You note them as a source that you’ve used as a comment in your code.
  • You may ask professors, supervisors or other people outside the contest for help and advice, but all of the work must be done by your team.


Thank you also to BC Ferries and Transport Canada!