2018 CANSSI Annual Report

Message from the Director

A lot has happened in 2018 to solidify CANSSI’s place in the research landscape in Canada!

On May 25 we incorporated the Canadian Statistical Sciences Institute, also known as Institut canadien des sciences statistiques, under the Canada Not-for-profit Corporations Act. The first meeting of the Directors of the Corporation was held on June 2, followed by a meeting of the members. General Operating Bylaw No. 1 was enacted by the Directors and confirmed by the members of the Corporation on June 2. An initial notice was also filed with the Ontario Ministry of Corporations. We also presented our first Strategic Plan to the Board on June 2.

On December 7 we opened our official headquarters at Simon Fraser University. This exciting event was preceded by a Public Lecture at SFU’s downtown campus on the evening of December 6. Jeffrey Rosenthal kept the audience spellbound on the topic of “Luck, Chance and the Meaning of Life”. The CANSSI headquarters in the Big Data Hub on the Burnaby campus was opened by President Andrew Petter of SFU. The event showcased our newly renovated space and featured lightning talks from across the country. The Big Data Hub event planning team did a wonderful job of coordinating the several presentations.

CANSSI is a national institute, and drawing on statistical science across the country is key to our success. To strengthen this, we are establishing a set of Regional Centres for CANSSI. We have signed a collaboration agreement with Concordia University for CANSSI Québec, and are currently in the final stages of negotiating an agreement with the University of Toronto for CANSSI Ontario. Discussions with the University of Manitoba and Dalhousie University are underway.

Scientific Activities

Our second set of three CRTs ended their three-year projects in 2018, and CRTs 10–12 were launched on April 1. In December the Board approved Teams 13 and 14 to begin in 2019. Details of their activities and successes are provided in the following pages—the scope of research and collaboration represented by these groups is remarkable.

David Haziza, leader of Team 5, won the CRM-SSC award, which recognizes outstanding research during the first 15 years from a doctorate. The citation notes his “outstanding contributions to survey sampling theory and practice, …  and their impact on the practice of national statistical agencies.”

CANSSI was a co-sponsor at two invited paper sessions and two topic contributed paper sessions at the Joint Statistical Meetings in Vancouver; these sessions featured research of teams 4 and 10, as well as our support through the workshops program of the Simon Fraser sports analytics group.

Team 7, led by Farouk Nathoo, U Vic, and Linglong Kong, U Alberta, had a remarkable 10 papers published or accepted in 2018, with 10 more submitted for publication, in leading statistical journals such as the Journal of the Royal Statistical Society, and in important applied outlets, including Statistics in Medicine.

Three graduated CANSSI postdoctoral fellows showcased their research at the annual meeting of the Statistical Society of Canada. Keiran Campbell, a CANSSI PDF at UBC and BC Cancer Agency, was awarded a prestigious Banting Postdoctoral Fellowship.

A special issue of the Canadian Journal of Statistics featuring CRT research was published online, with print publication to be V1 of 2019.

Health Sciences

Our network of Health Science Collaborating Centres expanded by 4, and we now have 11 HSCCs across the country. These are designed to facilitate research collaborations between health scientists and statistical scientists, to offer experiential learning to graduate students, and to initiate projects on emerging issues in health and statistics. Trainees in their programs across the country in 2017-2018 included at least 12 postdocs, 16 PhD students, 46 master’s students and 4 undergraduates. Seven special events or workshops have been held under the HSCC auspices. The committee organized a panel discussion at the 2018 SSC meeting, to continue the focus on encouraging statistical scientists to apply for funding from CIHR. Four CIHR-funded researchers talked about their research and provided advice to future applicants.

The Health Sciences Committee also provided advice on briefing notes for CIHR leadership, which were prepared by the chair, Mary Thompson, and the Scientific Director. These have been sent to the President of CIHR and the Director of CIHR’s Institute of Genetics, with a view to follow-up conversations about the role CANSSI can play in supporting CIHR objectives.

Data Science

Thanks to the efforts of former Board member Chad Gaffield we offer a very successful course on “Introduction to Machine Learning in the Digital Humanities” at the Digital Humanities Summer Institute in Victoria. Course instructors Paul Barrett and Nathan Taback gave the week-long course in 2018.

A week-long data science workshop took place at the Fields Institute September 24 to 28; this was a retrospective workshop for the program on Big Data held there in 2015. The fifth day of the September workshop featured Data Science in Industry and was held at the MaRS Discovery District. The very successful day included keynote presentations from Mark Girolami, Director of the Data-Centric Engineering Program at the Alan Turing Institute, and Garth Gibson, CEO of the Vector Institute. Plenary talks were presented by Ruslan Salakhutdinov, Director of AI at Apple, Hilary Parker, Data Scientist at Stitch-Fix, Peter Misek, venture capitalist at the Business Development Bank of Canada, Madalin Mihailescu, CTO of Georgian Partners, and Jonathan Briggs of the CPPIB. Thanks are due to Helen Kontozopoulos from the Department of Computer Science Innovation Lab and Nathan Taback from the Department of Statistical Sciences for making Industry Day a sold-out success.

Industrial Innovation Committee

This committee, chaired by Thierry Duchesne, works closely with the industrial innovation platform at the mathematical sciences institutes, and with relevant industrial partners. In 2018 the committee held a series of “Connect” workshops, supported by NSERC, in Waterloo, Ottawa, Montreal and Calgary. The committee also organized an invited paper session at the annual meeting of the SSC, on “Making Connections with Industry: Resources and Advice”. Dave Campbell and Nancy Reid were invited to present CANSSI to the American Statistical Association’s Statistical Partnerships Among Academe, Industry and Government Committee at a speakers’ luncheon at the Joint Statistical Meetings in Vancouver.

Future

CANSSI’s current NSERC funding has been extended from the initial five-year award for two additional years, so we expect to submit a proposal to the CTRMS program in fall, 2020, with funding to flow in 2021. Of course we are hoping that we can substantially increase our funding in the next cycle, in line with our aspirations, opportunities, and the importance of data science to Canada. A search for the next Scientific Director is underway, and Regional Centre Directors will be appointed across the country, filling the role of the current Associate Directors for their regions.

Thanks

The names mentioned above are just a few of the many colleagues who dedicate time and effort to building CANSSI to its full potential. Our Scientific Advisory Committee reviews CRT proposals very thoroughly, and provides excellent and timely advice to CANSSI about research directions and organization.  The Board of Directors is a very active group that works hard to oversee all the aspects of CANSSI’s operations, and to guide us as we grow.

The Deputy Director, John Braun, and the Associate Directors Joanna Mills Flemming, Erica Moodie, Mary Thompson, Paul McNicholas, Mohammad Jafari Jozani, and Karen Buro provide invaluable assistance with the day-to-day management of CANSSI.

Angela Plagemann’s role expanded considerably with the establishment of the headquarters at SFU. She is now coordinating three part-time staff provided to CANSSI by SFU’s Big Data Hub, who help with communications, event planning, and business development. In addition, she designed and managed the renovation of our lovely space in the Big Data Hub. The last of the furniture is arriving as I write this; if you find yourself in Vancouver, please drop in to say hello!   

Overview

CANSSI is the national institute that advances the development, application, and communication of cutting-edge statistical and data science research and training.

Our Strategic Plan 2018 outlines six strategic priorities for fulfilling our mission, in multi-disciplinary research, in multi-disciplinary training, in leadership and in knowledge translation.

Our flagship program is the Collaborative Research Team Program: in 2018, we supported nine such teams. Teams 4, 5, and 6 completed their projects; Teams 7 and 8 will finish in 2019; Team 9 was in its second year; Teams 10–12 started in April.

CANSSI’s Workshops and Conferences Program saw a 13% increase in attendance with participants from across Canada as well as from an additional 33 countries. Our Postdoctoral Program offered six partial fellowships in 2017. The Distinguished Visitor Program supported four visitors across the country. We continue to support undergraduate datathons, such as the ASA DataFest and VanSASH. Six students took advantage of the support to attend SAMSI Undergraduate Workshops. There are currently eleven CANSSI Health Science Collaborating Centres across the country. Overall CANSSI’s programs and activities reached over 2300 users.

CANSSI Programs

Collaborative Research Teams

Joint Analysis of Neuroimaging Data: High-Dimensional Problems, Spatiotemporal Models and Computation

Project leader(s): Farouk Nathoo (University of Victoria) and Linglong Kong (University of Alberta)

This team was quite active in 2018. Team members gave 17 talks at conferences and workshops in Canada and abroad. Topics included quantile estimation with incomplete data, statistical methods in imaging genetics, spatial models in imaging genetics, and empirical likelihood and robust regression in diffusion tensor imaging data analysis.

Notable achievements include:

  • Postdoctoral fellow Peng Liu will take a lecturer position at University of Kent in the UK starting in February 2019.
  • New student Cailin Harris began an MSc in Bioinformatics.
  • PhD student Yin Song is scheduled to defend his PhD thesis in March, 2019.
  • MSc student Laila Yasmin is scheduled to defend her MSc thesis in March, 2019.
  • We have invited sessions at the SSC and JSM meetings in 2019.

This is the final year of the project. There are plans to continue the work through other funding mechanisms. The complete list of publications for this project is available here.

Rare DNA variants and human complex traits: improving analyses of family studies by better modeling the dependence structures

Project leader(s): Alexandre Bureau (Université Laval) and Karim Oualkacha (Université du Québec à Montréal)

This team gave an invited session at the SSC Meetings in June, held a one-week summer school on statistical genetics at the Université Laval in July, and hosted a workshop at BIRS in Banff in August. Two new graduate students joined the team. CRT trainee Christina Nieuwoudt submitted a manuscript describing her research with team collaborators from the BC Cancer Agency’s Lymphoid Cancer Families Study on simulating pedigrees ascertained for multiple disease-affected relatives. PhD student Roland Dossa visited Laval University to continue investigating, with CRT team member L. Lakhal-Chaieb, a new copula-based method to test for association between a set of rare variants and a dichotomous trait in the presence of familial data. CRT trainee C. Nieuwoudt’s manuscript describing her research with team collaborators from the BC Cancer Agency’s Lymphoid Cancer Families Study was published and her research was profiled by SFU Faculty of Science online. The team published two papers and another is in progress.

Statistical Analysis of Large Administrative Health Databases: Emerging Challenges and Strategies

Project leader(s): Grace Y. Yi (University of Waterloo), Robert Platt (McGill University) and X. Joan Hu (Simon Fraser University)

Grace Yi organized a weekly Data Science Research Group, which began on Feb. 1, 2018. The group included 5 Ph.D. students, 2 postdoctoral fellows, 2 Master’s students, 1 undergraduate student, 1 visitor, and 3 faculty members (Grace Yi, Wenqing He and Liqun Diao). Discussions during the meetings were related to this project. In addition, team members gave 7 talks at workshops and conferences around the world.

Other highlights from this team in 2018 include:

  • Di Shu (Ph.D. student, Waterloo) worked on two research projects on causal inference with an outcome variable subject to missingness and/or misclassification.
  • Menglan Pang (PhD student, jointly supervised by Platt and Abrahamowicz) completed coursework and worked on thesis research (developing flexible accelerated failure time models).
  • Gabrielle Simoneau (PhD student, Platt) continued research (paper under review at JASA).
  • Dongdong Li (PhD student, SFU) worked on thesis research on multiple event times with informative censoring, submitted two research papers, and defended in December.
  • Steve Ferreira Guerra (PhD student, jointly supervised by Platt and Abrahamowicz) completed coursework and PhD comp exam.
  • Xin (Shane) Liu (Postdoc, jointly supervised by Yi and Hu – 2018/09/01 to 2019/01/31) started a new research project, using CCHS data to assist prediction for federal elections.
  • A joint postdoctoral fellowship (partially funded by this CANSSI CRT and the NSERC grants from Grace Yi and Joan Hu) was offered to Xin Liu. For the period Feb. 1-July 31, 2018, Dr. Liu was working with Grace Yi and Wenqing He (Western University). The project was completed and the results were wrapped up as the paper, “X. Liu, G. Y. Yi, G. Bauman, and W. He (2018). Boosting Imbalanced-Spatial-Structured Support Vector Machine”, which was submitted for publication.

Towards Sustainable Fisheries: State Space Assessment Models (SSAMs) for Complex Fisheries and Biological Data

Project leader: Joanna Mills Flemming (Dalhousie University)

It was an exciting first year for this new CRT. Two collaborators, Anders Nielsen and William Aeberhard, kicked things off by offering a course titled ‘Stock Assessment via TMB’ from January 29/18 – February 2/18 in Halifax. It was funded by the Technical Expertise in Stock Assessment (TESA) program of the Department of Fisheries and Oceans Canada (DFO) and at capacity with over 30 participants. Shortly after the course was over, one of the participants, Danny Ings (Stock Assessment Biologist, DFO, Northwest Atlantic Fisheries Centre, St. John’s, Newfoundland) reached out to JMF in search of a student trainee to assist with a framework review for the cod stock off the south coast of Newfoundland (3PS cod). This stock is currently assessed with a survey-based (SURBA) model, but there is a need to develop a SSAM to deal with catch uncertainty and integrate more of the available data. The stock is co-managed with France (with respect to St. Pierre and Miquelon), and IFREMER (DFO’s equivalent in France) is willing to provide some analytical support. Mills Flemming’s new PhD student trainee, Jonathan Babyn, relocated to St. John’s in July 2018 and is now working with Ings on this initiative. Babyn will return to Dalhousie in May 2019 to write his thesis proposal (of which this research will form one component). 

PhD student Ethan Lawler was awarded a highly prestigious Vanier Canada Graduate Scholarship in April 2018. Vanier scholars must demonstrate a high standard of scholarly achievement and leadership skills. The title of his research proposal was “Improving fisheries stock assessment models through spatio-temporal, multi-species, and integrated data models.” Ethan attended the International Statistical Ecology Conference in St. Andrews, Scotland in July 2018. Mills Flemming is Lawler’s supervisor; Aeberhard and Chris Field serve on his supervisory committee.

MSc student Raphael McDonald commenced his program in May 2018. He is being co-supervised by Mills Flemming and Jeff Hutchings (Dalhousie Department of Biology). He successfully passed his Admission to Candidacy Examination in December 2018. His proposal was titled “Incorporating fine scale spatial information into stock assessment modelling frameworks: a case study using the Nova Scotia Inshore Scallop Fishery”. Collaborators Aaron MacNeil, Jessica Sameoto, and David Keith are all members of his supervisory committee.

The team’s inaugural meeting was held at the Centre de recherches mathématiques, Université de Montréal from June 5/18 – June 7/18. It was well attended (11 collaborators and 4 student trainees). On the first day, Nielsen provided a nice update on the latest enhancements to TMB. Noel Cadigan followed with a presentation describing new methodological developments for SSAMs. Sameoto and Field described the NS Inshore Scallop Fishery case study and Field led the subsequent group discussion. The second day consisted of interesting presentations by statistical collaborators, fisheries collaborators and a student trainee. The meeting concluded with a discussion of next steps.

During summer 2018, Mills Flemming gave an invited talk at the International Biometrics Conference in Barcelona, Spain and organized an invited session sponsored by CANSSI, titled “State Space Assessment Models for Complex Fisheries and Biological Data”, at the Joint Statistical Meetings in Vancouver. Nielsen, Louis-Paul Rivest and Sean Cox (Simon Fraser University) all gave talks in this session, chaired by Aeberhard. A new postdoctoral fellow, Yuan Yan (funded by this project and to be based at Dalhousie), was recruited in late August.

MSc student trainee Jiaxin Luo (co-supervised by Mills Flemming and Bruce Smith at Dalhousie) commenced her program in September 2018 and is examining the halibut longline survey standardization of catch for an index of abundance.  The lead assessment biologist, Brendan Wringe (Bedford Institute of Oceanography – BIO, DFO), has been working on implementing a multinomial model (to address hook competition) that was suggested by our former CRT fisheries collaborator Stephen Smith (now deceased).  There are some interesting and difficult statistical features to be resolved.

Rivest and Hugues Benoit developed a project and recruited a MSc student who began in September 2018 but quit in December 2018 after failing her qualifying courses. The project can be described as follows: Reliable fishery independent indices of relative abundance obtained from scientific surveys are very important to robust stock assessment modelling. This project will develop spatial modelling tools for inferring relative abundance in areas un-sampled by surveys, which is a very common problem in fisheries science that can produce important biases in abundance estimates if left unaddressed. Rivest has since recruited three senior statistics students to move this project along in anticipation of recruiting a replacement graduate student. These senior students are undertaking empirical descriptive analyses of the data for the case study used in the project. These analyses will allow for the subsequent modelling to move ahead more rapidly and efficiently once a graduate student is recruited.

A third MSc student was recruited in September 2018. This student is currently working part-time with Marie Auger-Méthé and Andrew Edwards to familiarize himself with the SSAMs used for Pacific Ocean Perch. In April 2019 he will be full-time on this project. Edwards visited Auger-Méthé at UBC in December 2018.

In October 2018, Nielsen succeeded in recruiting the International Council for the Exploration of the Sea (ICES) to host our Year 2 Meetings and Workshop in Copenhagen, Denmark (April 15/19 – April 17/19). Later that month Mills Flemming participated in the 3PS Cod Science Regional Peer Review Meeting.

Research Assistant Yihao Yin returned to Dalhousie in late November from a six-month visit to China. He is currently working under the direction of Mills Flemming, Sameoto and Keith to explore spatial relationships between meat weight and shell height for the NS Inshore Scallop Fishery case study. Sameoto and Keith are very confident that BIO will have a position for Yihao in Spring 2019.

The calendar year ended on a particularly high note with the CANSSI National Headquarters Grand Opening (and Board Meeting). Mills Flemming gave one of the Lightning talks (focused on this CRT) and led the local celebrations.

Spatial Modeling of Infectious Diseases: Environment and Health

Project leader: Mahmoud Torabi (University of Manitoba)

This project began in 2018. Team collaborators are Charmaine Dean (University of Waterloo), Mike Pickles (University of Manitoba), Rob Deardon (University of Calgary), Rhonda Rosychuk (University of Calgary), Cindy Feng (University of Saskatchewan), Subhash Lele (University of Alberta), and Erin Rees (Université de Montréal).

Two postdoctoral fellows were recruited: Leila Amiri at the University of Manitoba and Alisha Albert-Green at the University of Waterloo. The team also includes Md Mahsin, a PhD student at the University of Calgary.

Mahsin and Deardon gave a presentation called Geo-dependent individual-level models for infectious diseases transmission at the SSC Meetings in Montreal. They traveled to Mexico to present Spatial infectious disease models incorporating aggregate-level spatial structure at the International Environmetrics Society Meeting.

Statistical Methods for Challenging Problems in Public Health Microbiology

Project leader(s): Leonid Chindelevitch (Simon Fraser University), Alexandre Bouchard-Côté (University of British Columbia), and Jesse Shapiro (Université de Montréal)

This project also began in 2018. Its collaborators are Cedric Chauve (Simon Fraser University), Liangliang Wang (Simon Fraser University), Art Poon (Western University), Luis Barreiro (University of Chicago), Joseph Bielawski (Dalhousie University), David Buckeridge (McGill University), Kelly Burkett (University of Ottawa), Linda Wahl (Western University), William Hsiao (University of British Columbia), Jianhong Wu (York University), Maha Farhat (Harvard University), Caroline Colijn (Simon Fraser University), and Shamil Sunyaev (Harvard University).

Chindelevitch and Shapiro, in collaboration with Maxwell Libbrecht (SFU), have secured a large 3-year grant from Genome Canada, in the amount of $1,000,000, to work on a closely related topic: Machine learning to predict drug resistance in pathogenic bacteria.

Chindelevitch has been invited to speak about his work at the Conference on Antimicrobial Resistance organized by the Wellcome Trust, as well as at the launch of the CANSSI headquarters at SFU, about which he was also interviewed in the local press.

Poon has been awarded two CIHR grants, both on topics related to the CRT’s project: “Development, evaluation and implementation of genetic clustering methods for the real-time molecular surveillance of HIV outbreaks” and “Phylodynamics of HIV within hosts”.

Barreiro has been offered – and accepted – a position at the Department of Human Genetics at the University of Chicago. Colijn has been offered – and accepted – a Canada 150 Research Chair and moved to SFU.

Farhat co-organized a day-long seminar on the topic of “Data-Powered Strategies to Counteract Antibiotic Resistance”.

Conferences and Workshops

R and Spark: Tools for Data Science Workflows, April 12-13, Toronto, ON

This joint workshop with the National Institute of Statistical Sciences (NISS) was instructed by E. James Harner, Professor Emeritus of Statistics at West Virginia University. The course introduced data structures in R and their use in functional programming workflows relevant to data science. Participants began with the initial steps in the data science process: extracting data from source systems, transforming data into tidy form and loading data into distributed file systems, distributed data warehouses, and NoSQL databases. This workflow was illustrated by using the SparkR and sparklyr package frontends to Spark from R. SparkR and sparklyr were then used as interfaces for modeling big data using regression and classification supervised learning methods. Unsupervised learning methods, such as clustering and dimension reduction, were also covered. Additional methods, such as gradient boosting and deep learning, were illustrated using the h2o and rsparkling R packages. Finally, methods for analyzing streaming data were presented. The course finished with an in-depth example. The workshop attracted 16 participants from Manitoba, Ontario and Quebec.

Statistics Graduate Student Research Day, April 27, Toronto, ON

The 2018 Statistics Graduate Student Research Day focused on theoretical and methodological questions associated with spatial statistics, as well as novel applications of these methods. The 57 participants enjoyed a full day including 7 talks. Invited speakers were Murali Haran of Pennsylvania State University, Mikyoung Jun of Texas A&M University, Alex Schmidt of McGill University, and Joe Watson and Jim Zidek, both of the University of British Columbia.

Quantitative Risk Management & Financial Analytics, May 10, Ottawa, ON

Hosted by the Department of Mathematics and Statistics and the Telfer School of Management at the University of Ottawa, this workshop brought together researchers from academia and government working in the field as well as practitioners from the financial industry. Possibly the first meeting of this kind in Ottawa, the event attracted 54 participants, including 4 invited speakers:

  • Thomas Coleman of the University of Waterloo presented his recent work on option hedging using a novel approach based on machine learning techniques, thus highlighting the importance of big data techniques in finance.
  • Matt Davison of Western University discussed an application of data science to risk management. A financial institution grants loans based on some classification criteria. An important question is whether the customer should be issued a loan or not. Although the question is very simple, an answer can depend on a number of factors. Again, big data techniques come to the rescue.
  • Henry Lam of Columbia University is a leading expert in convex optimization. He discussed a novel approach to estimating tail probabilities in a nonparametric way, based on partial information available to the risk manager. An answer to such questions is very important from the point of view of extreme risk measures.
  • Luis Seco of the University of Toronto discussed the future of financial mathematics. At present, fund managers take very little risk while investing clients’ money, which has led to a number of issues in the past. In the new scheme, fund managers will need to invest their own money along with their clients’. This calls for new mathematical models for the evaluation of contracts.

In addition, six contributed talks were delivered by researchers from academia and the Bank of Canada.

The workshop organizers, Rafal Kulik (Mathematics and Statistics) and Jonathan Yu-Meng Li (Telfer), expect to expand the collaboration between academia, the government and the financial industry in Ottawa. They are planning two half-day workshops in the next academic year, as well as multi-day workshops and summer schools for graduate students in the near future.

Risk Analytics Day 2018, May 14, Toronto, ON

This workshop brought top financial risk analytics researchers to Toronto to discuss leading-edge developments in the statistical and computational tools needed to attack current and future problems facing financial institutions. There were 77 participants from UBC, McGill, Ryerson, Waterloo and Western as well as Scotiabank, TD Bank, Deloitte, Bank of Canada, and Sun Life Financial Asia. The 6 speakers were Matt Davison of Western University, Marius Hofert of the University of Waterloo, Johanna Nešlehová of McGill University, Hansheng Sun and Qiming Wang, both of Internal Ratings Management, Scotiabank, and Chen Zhou of Erasmus University and De Nederlandsche Bank.

Statistical Society of Canada Student Conference, June 2, Montreal, QC

Now in its sixth edition, the Canadian Statistics Student Conference (CSSC) is a one-day event that allows students and recent graduates with an interest in statistics to gather together, network, learn about others’ research, develop new skills, and be inspired by mentors. During the CSSC, students have the chance to present their research in the format of a talk or a poster and to attend several sessions including a skills session (focussed on developing skills related to a job in statistics), a career session (typically a career panel with experienced statisticians from various backgrounds) and a statistical workshop. This year’s keynote speaker, James Hanley from McGill University, ended the day with a light-hearted speech containing many anecdotes on statisticians’ struggles in their communications with collaborators. Hanley also reviewed a few of his favorite statistical problems, classic errors and student misunderstandings, and presented some funny applets that he uses regularly to help students understand different concepts.

139 students registered for the conference. There were 18 student talks and 12 students presented posters. The exit survey for this conference showed that the students really appreciated the event. 45% of the attendees rated the event with 5 out of 5, and 45% with 4 out of 5, for the question “How much did you enjoy the whole day?”.

SSC Annual Meeting, June 3-6, Montreal, QC

CANSSI is proud to sponsor this annual event. It brings together over 500 participants from the statistics and data science community. The event includes talks, workshops, meetings, a poster session, a job fair, an exhibitor hall and social events.

CANSSI sponsored a Postdoctoral Showcase on the first day. It featured 4 CANSSI Postdoctoral Fellows:

  • Chien-Lin Mark Su (McGill University), working with Russell Steele (McGill University) and Ian Shrier (Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital) talked about his project entitled Doubly Robust Estimation and Causal Inference for Recurrent Event Data.
  • David Soave (Ontario Institute for Cancer Research) working with Jerry Lawless (University of Waterloo) and Philip Awadalla (Ontario Institute for Cancer Research) discussed his project called Population Health Cohorts and Two-Phase Studies.
  • Oren Mangoubi (École polytechnique fédérale de Lausanne) working with Aaron Smith (University of Ottawa) and Nisheeth Vishnoi (École Polytechnique Fédérale de Lausanne) explained his project Mixing of Hamiltonian Monte Carlo Under Strong Log-concavity and High-dimensional Data.
  • Yun-Hee Choi (Western University) spoke on behalf of Agnieszka Krol (Lunenfeld-Tanenbaum Research Institute) who could not be there. Krol is working with Choi, Shelley Bull (Lunenfeld-Tanenbaum Research Institute), Razvan Romanescu (Lunenfeld-Tanenbaum Research Institute), Virginie Rondeau (INSERM U1219, Biostatistics team, University of Bordeaux), and Laurent Briollais (Lunenfeld-Tanenbaum Research Institute) on a project called Correlated Frailty Model for Analysis of Genetic Association in Family Studies.

On the second day, CANSSI sponsored a panel session entitled CIHR and Statistical Science. Mary Thompson (University of Waterloo), Mireille Schnitzer (Université de Montréal), Yan Yuan (University of Alberta), and Mark Oremus (University of Waterloo) discussed CIHR-funded collaborative research and methods development. They showcased some recent successful applications. They also described how they became involved in the research, the aims of their projects, and some key plans and achievements. This session included invaluable advice to future applicants.

The third CANSSI-sponsored session was called Statistical Research and Applications in Data Science. Organized by Jean-François Plante (HEC Montréal), this session included three talks:

  • Lysiane Charest (Outerminds) gave a talk entitled Statistics Applied to Video Games Development.
  • Jiannan Lu (Microsoft), Alex Deng (Microsoft), and Jonathan Litz (Microsoft) spoke about Trustworthy Analysis of Online A/B Tests: Pitfalls, Challenges and Solutions.
  • Sylvie Makhzoum (TD Assurance) and Catherine Paradis-Therrien (TD Assurance) discussed Uplift Models and Data Governance at TD Insurance.

Digital Humanities Summer Institute, June 4-8, Victoria, BC

This year, CANSSI supported the workshop “Introduction to Machine Learning in the Digital Humanities” given by Paul Barrett and Nathan Taback. This workshop took an introductory approach to machine learning in digital humanities topics. Participants studied essential concepts in statistical and machine learning and used these concepts to collect and analyze literary, historical, and social media data sets. 18 students from all over North America took part. CANSSI offered 5 full tuition scholarships and 3 discounted registrations.

AARMS Summer School, June 4-29, Charlottetown, PE

The summer school was intended for graduate students and promising undergraduate students from all parts of the world. 40 students and 4 professionals were admitted and registered for at least two of the four courses. Participants hailed from coast to coast in Canada as well as China, Sweden, Ukraine and the USA. CANSSI sponsored the participation of these five students: Justin Kamerman of the University of New Brunswick, Yuxuan Liu of Carleton University, Jacob Morehouse of the University of New Brunswick, Peng Tang of the University of Victoria, and Yingqi Wang of the University of Calgary.

The course offerings in 2018 were:

  • Functional Data Analysis for Big Data taught by Jiguo Cao of Simon Fraser University,
  • Machine Learning and Data Mining taught by Mark Schmidt of the University of British Columbia,
  • Foundations in Data Science and Applications taught by Osmar Zaiane of the University of Alberta,
  • Statistical Learning for High Dimensional Data taught by Wenqing He of Western University.

Causal Inference in the Presence of Dependence and Network Structure Thematic Month, June 11-July 6, Montréal, QC

This program was organized by Erica Moodie, David Stephens and Alexandra Schmidt, all of McGill University. Three workshops were held during this thematic month and included speakers and participants from a wide variety of career stages and geographical locations in North and South America and Europe.

  • Causal Adjustment in the Presence of Spatial Dependence, June 11-13
  • Causal Inference for Complex Graphical Structures, June 20-22
  • Discovery of Causal Structure in High Dimensions, June 25-27  

The goal of most, if not all, statistical inference is to uncover causal relationships; however, it is not generally possible to infer causality from standard statistical procedures. In the last three decades, the field of causal inference research has grown at a rapid pace, and yet much of the literature is devoted to relatively simple settings. In this month-long program, we aimed to push the frontiers of causal inference beyond simple settings to situations where data are complex, with features such as network or spatial structure. We put on a series of three three-day workshops that addressed current and novel aspects of causal inference, which involves the uncovering of relationships between variables in an observationally derived data collection setting. Throughout the month, high-profile and up-and-coming researchers presented and discussed new and challenging settings that have been studied in the conventional statistical literature, but not viewed through the lens of causal inference. The unifying theme of the program was complex dependence, with a particular focus on spatial, network, and graphical structures as well as high dimensionality.

The meetings also featured three Scholars in Residence:

  • Jim Zidek of the University of British Columbia is a leading expert in Bayesian decision analysis and spatial prediction. His current research includes the development of methods of designing networks for monitoring spatial pollution fields.
  • Thomas Richardson of the University of Washington is a leader in the field of causal inference and graphical models.
  • Nicolai Meinshausen of ETH Zurich has research interests that include computational statistics, high dimensional data, and causal discovery.

The Scholars and several other speakers stayed beyond the workshop in which they were speaking, and several collaborations were formed. Several students from Montreal noted how the venue, the coffee breaks, and the smaller size of the workshops all provided for exciting opportunities for them to meet and mix with some of the leading researchers in the field.

Geospatial Methods for Closing Global Mortality Data Divide, June 14-15, Toronto, ON

The workshop brought together statistical methodologists, applied statisticians, and health science researchers working in the area of global health.  Talks covered methods for exposure modelling, estimating mortality trends, assessing the effects of risk factors, and collecting data in conflict settings.  As a result of the workshop, several new collaborations and research projects have been initiated:

  • A collaboration between the Fiocruz foundation in Brazil, the Centre for Global Health Research in Toronto, and a PhD student at McGill has begun on quantifying the spatio-temporal pattern of road traffic accidents in Brazil.
  • Jon Wakefield and Patrick Brown are developing new methodology for spatio-temporal modeling of health survey data, and the Million Deaths Study data housed in Toronto in particular. 
  • Alex Stringer, a first-year PhD student in Statistics at Toronto, has taken as a thesis topic inferential methods suited to assessing environmental risk factors using data from low- and middle-income countries.
  • Methods presented by Howard Chang at Emory are being used by Laura Feldman at the Hospital for Sick Children to quantify the burden of disease from flu.

ICMP2018 – International Congress of Mathematical Physics, July 23-28, Montreal, QC

This was the 19th iteration of this conference. 572 participants from all over the world took part. The event included two free Public Lectures which were intended for a wider audience from the Montreal area. ICMP is the most important conference of the International Association of Mathematical Physics.  It provided an interdisciplinary platform for researchers, practitioners and educators to present and discuss the most recent innovations, trends, and concerns as well as practical challenges encountered and solutions adopted in the fields of Mathematical Physics.

R at Montreal 2018, July 4-6, Montreal, QC

Summer School on Mathematical and Statistical Model Uncertainty, July 23-27, Burnaby, BC

A joint CANSSI/SAMSI effort, this summer school was held at Simon Fraser University (SFU). The organizers were Derek Bingham (SFU), Paul Constantine (University of Colorado, Boulder), David Higdon (Virginia Tech), and Leanna House (Virginia Tech). There were 35 attendees (undergraduate and graduate students) from Canada and the USA.

The emerging area of uncertainty quantification (UQ) for computer models is the focus of the current SAMSI program Model Uncertainty: Mathematical and Statistical (MUMS). The aim of the MUMS program is to bring together statistical and mathematical scientists to tackle important common research problems in UQ and to train the next generation of UQ researchers. To get this started, CANSSI and SAMSI jointly sponsored this Summer School on UQ.

The 5-day summer school taught the statistical and mathematical foundations for UQ in a wide variety of applications. Topics included understanding the sources of variability (statistical and mathematical) when using computational models, sensitivity analysis, and methods for combining simulation data with real-world observations to make inferences about physical systems.

20th IMS New Researchers Conference, July 26-28, Burnaby, BC

The highlights of this conference included 6 plenary talks given by five senior professors and one IMS Tweedie Award winner, as well as panel discussions on funding and grant writing, mentoring and teaching, publishing, and collaborations. The conference also had 32 contributed short talks and a poster session. This conference promoted interaction and networking between new researchers and invited senior researchers in important statistical fields. It provided a unique opportunity for these new researchers to form collaborations with other new researchers and with more experienced colleagues. In total, 52 participants took part.

CIFAR Deep Learning and Reinforcement Learning Summer School, July 25-August 3, Toronto, ON

Since 2005, this summer school has been training graduate students from all over the world. Over 1200 applied, but only 200 were accepted. Students learned from top AI researchers like Yoshua Bengio (MILA), Geoff Hinton (UofT), and Richard Sutton (University of Alberta, Google DeepMind). This event was actually two summer schools back-to-back. The Deep Learning Summer School from July 25-July 31 covered such topics as machine learning, neural networks, generative models, optimization, language understanding, multimodal learning, computational neuroscience, and Bayesian neural nets. The Reinforcement Learning Summer School from August 1-3 covered an introduction to reinforcement learning (RL) and temporal-difference learning, policy search, off-policy learning, prediction machines, temporal abstraction, imitation learning, multi-tasking and transfer in RL, and safety in RL. In addition to the courses, students could take advantage of dinners and social events sponsored by AI companies to network and form collaborations.

Statistical Inference, Learning and Models for Data Science, September 24-28, Toronto, ON

This was a retrospective workshop for the thematic program “Statistical Models, Learning and Inference for Big Data”, held from January to June 2015. The new title reflected the shift in emphasis from the size of the data to the science of collecting, storing, querying, modeling, drawing conclusions, visualizing and communicating the insights derived.

The landscape is changing rapidly – there are now many new undergraduate and graduate programs in data science, there have been extensive developments in computing platforms for data carpentry, there is renewed emphasis on workflow for reproducible research, and there has been a substantial investment in deep learning through the Pan-Canadian artificial intelligence strategy. This workshop provided an opportunity to assess how these and other changes are impacting the research areas discussed during the thematic program. The event gathered close to 150 participants.

The workshop talks developed around two complementary strands: one on inference and data in particular contexts, and one on cross-cutting areas of mathematical, computational and statistical sciences. The cross-cutting areas included deep learning and statistical modelling, visualization, optimization, and inference. The substantive areas emphasized were social policy, health policy, networks, and environmental science.

The fifth day of the workshop focused on data science in industry and took place at the MaRS Discovery District. The opening keynote by Mark Girolami described the new data-centric engineering program at the Alan Turing Institute. The highlight of this talk was a description of the first 3-D printed stainless steel pedestrian bridge. The closing keynote was given by Garth Gibson, CEO of the Vector Institute. The plenary talk by Hilary Parker on her career path from statistics student to data scientist at the design company StitchFix was especially inspiring for the students in the audience. Plenary talks from lead scientists at Framework Venture Partners, Georgian Partners, and Apple also provided interesting views on the vast range of data science opportunities in industry. Panel discussions on training talent for data science and challenges for data science startups rounded out the day.

40th Annual Alberta Statisticians Meeting, September 29, Edmonton, AB

The 40th  Annual Meeting of Alberta Statisticians featured talks from a number of subject areas, specifically experimental design, statistical genetics, and generalized linear models. Talks discussing more specialized research problems like sparse covariance estimation in regression, the recovery of low rank matrices using M-estimation, and the development of a quantile-optimal treatment regime were provided. Other talks discussed the application of statistical tools to study the effect of toxicants with various initial concentrations on cell populations and to investigate the relationship between indigenous suicide in Canada.

The annual meeting is an important tradition that offers practitioners in Alberta the opportunity to share their current research. One of its objectives was to offer undergraduate and graduate students from Alberta the opportunity to share their work and get feedback from experts in their respective fields. This was achieved through the lecture-style sessions provided by each invited speaker. Another objective was to provide practitioners in Alberta the opportunity to share their work and network at two different social events. Interestingly, this 40th Annual Meeting attracted 40 participants. Coincidence?

Postdoctoral Fellows

In 2018, Luc Villandre began his fellowship at HEC Montréal, Bouchra Nasri began her fellowship at McGill University, and Nanwei Wang began his fellowship at the Ontario Institute for Cancer Research. Myriam Brossard completed her fellowship at the Lunenfeld-Tanenbaum Research Institute in Toronto, under the supervision of Shelley Bull and Radu Craiu, and Kieran Campbell completed his fellowship at the BC Cancer Agency, under the supervision of Alexandre Bouchard-Côté and Sohrab Shah.

In 2017-2018 we partnered with CRM-Statlab to support Mohamed Belalia at the Université du Québec à Montréal; he has since taken up a tenure-track position at the University of Windsor. The 2018-2019 CRM-Statlab award went to Mhalla-Marchand, who began her fellowship in September at HEC Montréal.

Our first joint postdoctoral fellow with the NSF-funded Institute SAMSI, Whitney Huang, moved to the Pacific Climate Impacts Consortium at the University of Victoria for the second year of his two-year position; the costs for this were shared among his supervisor, Adam Monahan, CANSSI, and SAMSI. In 2017-2018, Huang participated in SAMSI’s thematic program on Mathematical and Statistical Methods for Climate and the Earth System (CLIM).

Distinguished Visitor Program

In 2018, this program supported 4 Distinguished Visitor Lectures in Calgary, Edmonton, Montreal and Toronto.

Frank Harrell from the Vanderbilt University School of Medicine spoke at the University of Calgary on August 2 and 3. His public lecture was entitled “Reflections on Machine Learning vs. Statistical Models, and Precision Medicine”. In addition, he gave a 1.5-day-long workshop on Regression Modelling Strategies following the public lecture. The workshop included these broad topics: general aspects of fitting regression models, multivariate modeling, longitudinal models, logistic models and application of R software. During the reception, Harrell met with many students, researchers, faculty members and other attendees. About 100 people attended the lecture and workshop.

Grace Yi from the University of Waterloo visited the University of Alberta on September 6. She gave a public lecture called Making Sense of Noisy Data: Some Issues and Methods. She discussed the challenges presented by noisy data with measurement error, missing observations and high dimensionality. Since noisy data is common to so many fields, this talk attracted audience members from several disciplines as well as a few from industry. Yi also spent time having discussions with graduate students and postdoctoral fellows during her visit.

Jamie Robins from the Harvard T.H. Chan School of Public Health spoke at McGill University, September 20-21. His public lecture was entitled “Causal Inference and Machine Learning: Improved Inference under No Assumptions”. His talk was part of the McGill 2018 (Bio)Statistics Research Day which was organized by graduate students of the Department of Epidemiology, Biostatistics, and Occupational Health and the Department of Mathematics and Statistics. He presented a new method for improving confidence intervals for causal effects when the outcome regression and propensity score function have been fit using machine learning algorithms with completely unknown statistical properties. Higher order influence function based tests and estimators are the basis of the methodology. Robins’ talk attracted around 90 students and faculty members from McGill as well as other universities in Montreal. He participated in the whole research day, which consisted of student and postdoc presentations, two other keynote speakers, and a career panel. In the career panel, Robins shared experiences from his career path and academic life. He also shared his thoughts on the current job markets in biostatistics.

Susan Murphy from Harvard University visited the Fields Institute. On October 29, she gave a general lecture called “Improving Health: Mobile Interventions”. Critical questions in the optimization of mobile health interventions include: “Does the user benefit from a particular type of mobile health notification or text message?” and “Does the user’s current context such as location, time, mood impact the usefulness of the mobile health notification?” In this talk, she discussed the micro-randomized trial design and associated data analyses for use in optimizing mobile health interventions. The next day, a specialized lecture entitled “Challenges in Developing Learning Algorithms to Personalize Treatment in Real Time” covered her work in designing online “bandit” learning algorithms for use in personalizing mobile health interventions. These algorithms can help design a treatment plan that is adapted to the individual’s context; the context may include current health status, current level of social support and current level of adherence for example. This is particularly useful for those with chronic health conditions. Both lectures attracted a total of 116 participants.

Kick Start Program

This program supported two young researchers who sought to explore new collaborative research opportunities.

In late March and early April, Yan Yuan from the University of Alberta visited Shanghai to meet with epidemiology researchers, clinicians and students at the Shanghai Jiao Tong University and the Xin Hua Hospital. At the university, she began a collaboration to generate new knowledge that can be used to improve children’s health in China and globally. She also looked to form a collaboration with the clinicians at the hospital to identify critical needs and work together to benefit childhood cancer survivors. In addition, she visited Shanghai Fu Dan University where she began a third collaboration with the cancer epidemiology researchers.

Alex Gao attended the Data Study Group at the Alan Turing Institute in December. This week-long event brought together groups of researchers aiming to solve proposed real-world problems with a data-driven approach. Gao, a PhD student at the University of Toronto, used this opportunity to jump-start his research work in applied Bayesian modelling, and also connect with potential future collaborators in the data science community.

Student Support for SAMSI Workshops

CANSSI supported 6 students who attended Undergraduate Workshops at SAMSI. Three attended in May and three in October.

The focus for the SAMSI Undergraduate Modelling Workshop in May was Climate Change modelling, specifically how spatial statistics and applied mathematics models can be used to improve our understanding of the dynamic changes in Earth’s climate. This year’s group topics included modelling Arctic Sea Ice, rainforest coverage, rainfall, and ocean temperatures and were largely led by postdoc statisticians who have worked closely with some of the nation’s leading professors in climate change modelling. The students reported that this was an “amazing opportunity for math and statistics students to transform what they learned in class into practical solutions for some real-world problems.” They learned to apply multivariate analysis to temporal and spatial data and noted their appreciation of the mentors who helped them each day.

The SAMSI Undergraduate Workshop in October focused on precision medicine. Students reported that a highlight of the workshop was the career panel with grad students and postdocs because they found it very informative. Speakers included Eric Laber (NC State), Lisa Lavange (UNC), Peter Mucha (UNC) and Eric Rutter (NC State).

Undergraduate Datathons

VanSASH, September 22, Vancouver, BC

Organized by students, the Vancouver Sports Analytics Symposium and Hackathon 2018 was an extremely successful event. This year they partnered with the Vancouver Whitecaps soccer team, who provided the data. Judges and mentors from the Whitecaps, Best Buy, Cardinal Path, Keela, MLS, Boeing, Flipboard, the Canadian Sport Institute, EA Sports, Dialpad, the University of Washington and Simon Fraser University helped over 90 student participants.

There were 4 divisions: a Soccer Analytics stream and a Business Case stream, both split into Beginner and Advanced divisions. All divisions were expected to model data surrounding their problem area to gain insights into behavioural tendencies in the Whitecaps organization. The Soccer streams were given tagged event data from Major League Soccer games to investigate passing, shooting and corner kicks. The Business streams were given data about arena entrance scans and merchandise data to answer questions about conversion rates, fan behaviour within the arena, and creating a better in-game atmosphere.

All contestants in the beginner stream were required to attend a morning of workshops teaching them everything from intro coding skills in R to sports marketing tips, to plotting and graphics pointers. These workshops were a new addition to the event, based on last year’s feedback. Hearing success stories from participants who had limited coding experience showed that these workshops were much appreciated. Based on this year’s comments, the organizers plan to include a workshop on Python next year.

Many students from the hackathon joined the SFU Sports Analytics Club and have already begun working on projects within the club. In addition, the mentors and judges formed some connections across industry and academia. Finally, the event was able to improve in 3 of the most important areas that were identified after the first iteration: Venue and Food, Including Beginners, and Better Mentorship. Increased sponsorship this year allowed the organizers not only to avoid the negative feedback about food that they received last year but also to provide “the best food I’ve ever seen at a hackathon before”. Additionally, they were able to rent a better room for the hackathon, which encouraged collaboration and communication in the open space.

AGM Highlights

The Annual Meeting of CANSSI takes place the weekend before the Annual Meeting of the Statistical Society of Canada. In 2018 the meeting was at McGill University.

This year’s AGM was the first meeting of the members of the corporation, as the Canadian Statistical Sciences Institute, also known as Institut canadien des sciences statistiques, was incorporated on May 25. The members of the corporation include all our institutional members, currently 32 Departments of Statistics or of Mathematics and Statistics across the country. At the meeting the members confirmed Bylaw No. 1, which has been filed with the government, fixed the number of Directors at 12, and elected the first Board of Directors of CANSSI, Inc.

The members at the AGM also received the financial statements, the Annual Report for 2017, and our new Strategic Plan 2018.  
