On Friday, April 29, 2022, at 2:00 p.m. EDT (Ottawa time), Jean François Beaumont of Statistics Canada will present a Zoom webinar as part of a series organized by the CANSSI-funded Collaborative Research Team on Modern Techniques for Survey Sampling and Complex Data. We invite you to join us.

**This event is past.** A recording is available.

## Event details

**Handling Non-probability Samples through Inverse Probability Weighting with an Application to Statistics Canada’s Crowdsourcing Data**

Jean François Beaumont, Statistics Canada

Friday, April 29, 2022 | 2:00 p.m. EDT (Ottawa time)

**Abstract:** Non-probability samples are increasingly being explored by National Statistical Offices as an alternative to probability samples. However, it is well known that using a non-probability sample alone may produce estimates with significant bias, because the underlying selection mechanism is unknown. To reduce this bias, data from a non-probability sample can be integrated with data from a probability sample that contains auxiliary variables in common with the non-probability sample. We focus on inverse probability weighting methods, which involve modelling the probability of participation in the non-probability sample. First, we consider the logistic model along with the pseudo maximum likelihood method of Chen, Li and Wu (2020). We propose a variable selection procedure based on a modified Akaike Information Criterion (AIC) that properly accounts for the data structure and the probability sampling design. Then, we extend the Classification and Regression Trees (CART) algorithm to this data integration scenario, again properly accounting for the probability sampling design. We also propose a bootstrap variance estimator that reflects two sources of variability: the probability sampling design and the participation model. Our methods are illustrated using Statistics Canada's crowdsourcing and survey data.
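As a rough illustration of the starting point described in the abstract, the sketch below fits a logistic participation model with the pseudo maximum likelihood approach of Chen, Li and Wu (2020) and then forms an inverse-probability-weighted mean. It assumes a non-probability sample S_A for which only the covariates x and outcome y are used, and a probability sample S_B with the same covariates x and design weights d. It solves the pseudo score equation sum over S_A of x_i minus sum over S_B of d_i * pi_i * x_i = 0 by Newton-Raphson, with pi_i = expit(x_i' theta). The function names (`fit_participation_model`, `score`, `ipw_mean`) are hypothetical. The Hájek-type mean is only one of several IPW estimator forms. The speaker's modified AIC, CART extension and bootstrap variance estimator are not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def expit(eta: float) -> float:
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def score(theta: Vector, x_a: List[Vector], x_b: List[Vector], d_b: Vector) -> List[float]:
    """Pseudo score: sum_{S_A} x_i - sum_{S_B} d_i * pi_i * x_i (hypothetical helper)."""
    p = len(theta)
    u = [sum(xi[k] for xi in x_a) for k in range(p)]
    for xi, di in zip(x_b, d_b):
        pi = expit(sum(t * v for t, v in zip(theta, xi)))
        for k in range(p):
            u[k] -= di * pi * xi[k]
    return u


def fit_participation_model(x_a: List[Vector], x_b: List[Vector], d_b: Vector,
                            max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Newton-Raphson for the logistic pseudo maximum likelihood estimator (sketch)."""
    p = len(x_a[0])
    theta = [0.0] * p
    for _ in range(max_iter):
        u = score(theta, x_a, x_b, d_b)
        # Negative Hessian: sum over S_B of d_i * pi_i * (1 - pi_i) * x_i x_i^T
        h = [[0.0] * p for _ in range(p)]
        for xi, di in zip(x_b, d_b):
            pi = expit(sum(t * v for t, v in zip(theta, xi)))
            w = di * pi * (1.0 - pi)
            for j in range(p):
                for k in range(p):
                    h[j][k] += w * xi[j] * xi[k]
        step = solve(h, u)  # Newton update: theta + (-H)^{-1} U
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta


def ipw_mean(y_a: Vector, x_a: List[Vector], theta: Vector) -> float:
    """Hajek-type IPW mean over S_A, weighting each unit by 1 / pi_hat_i."""
    w = [1.0 / expit(sum(t * v for t, v in zip(theta, xi))) for xi in x_a]
    return sum(wi * yi for wi, yi in zip(w, y_a)) / sum(w)
```

With an intercept-only model, the fitted participation probability reduces to n_A divided by the estimated population size (the sum of the d_i over S_B), and the IPW mean reduces to the plain mean of y over S_A. This gives a quick sanity check before adding covariates.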