Surveys from large datasets of functional data and estimation of the mean and median curve with full and missing data

October 23, 2015, 1:30pm
Location: Statistics Canada, Ottawa
Room: Jean Talon meeting room
Speakers: Camelia Goga and Hervé Cardot, Université de Bourgogne

Abstract: In the near future, millions of electricity load curves of French households, measured at a very fine scale, will be available. Together, these load curves represent a huge amount of information that is difficult to store because of technical and budgetary constraints. In such situations, survey sampling techniques are attractive alternatives to signal compression techniques, since they can offer an interesting trade-off between the size of the data and the accuracy of estimators of simple indicators such as the mean or median curve of electricity consumption. I will present a panorama of the different strategies considered for estimating the mean or the median curve, with an application to a test population of French electricity consumption curves. Unfortunately, data collection may suffer from technical problems that result in missing values. This reduces the accuracy of the estimators and may introduce bias. Different approaches can be adapted to deal with missing data in this functional framework: nearest neighbor imputation, kernel smoothing of the discretized trajectories, or linear interpolation applied to the differences around the mean. I will compare these methods for estimating the mean curve of French electricity consumption under different missing-data scenarios.
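For readers unfamiliar with the setting, the following is a minimal sketch of the general idea, not the speakers' method. It assumes a Horvitz-Thompson estimator of the mean curve computed from a sample with known inclusion probabilities, and it fills gaps by plain linear interpolation within each curve, which is a simpler variant than the interpolation of differences around the mean mentioned in the abstract. The function names and the toy data are hypothetical.

```python
import numpy as np

def interpolate_missing(curve):
    """Fill NaN gaps in one discretized curve by linear interpolation in time.

    Assumes at least one observed value; gaps at either end are filled
    with the nearest observed value (np.interp holds endpoints constant).
    """
    curve = np.asarray(curve, dtype=float)
    t = np.arange(curve.size)
    observed = ~np.isnan(curve)
    return np.interp(t, t[observed], curve[observed])

def ht_mean_curve(sample_curves, inclusion_probs, population_size):
    """Horvitz-Thompson estimate of the population mean curve.

    Each sampled curve is weighted by the inverse of its inclusion
    probability, and the weighted sum is divided by the population size.
    """
    curves = np.asarray(sample_curves, dtype=float)
    weights = 1.0 / np.asarray(inclusion_probs, dtype=float)
    return weights @ curves / population_size

# Toy example (made-up numbers): 2 curves sampled from a population of 4
# by simple random sampling, so each inclusion probability is 2/4.
sample = [[1.0, np.nan, 3.0],
          [2.0, 4.0, 6.0]]
completed = [interpolate_missing(c) for c in sample]
mean_curve = ht_mean_curve(completed, [0.5, 0.5], population_size=4)
```

In this toy case the missing point is replaced by 2.0 and the estimated mean curve is [1.5, 3.0, 4.5].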

This talk is sponsored by the CANSSI CRT Project “Statistical Inference for Complex Surveys with Missing Observations”.
