The Price Indices team within the Time Series Research and Analysis Centre (TSRAC) is looking for a graduate student to work on research and development for price indexes. The Consumer Price Index (CPI) program is currently exploring numerous opportunities related to big data, and more specifically to scanner data. Scanner data are electronic records of transactions collected by reading the product code at store check-outs. These data include product descriptions; however, the descriptions are generally vague and incomplete, and they vary over time. To be used for statistical purposes, scanner data have to be classified by product type according to established standards, which is a challenge given the large volume of records.
The student will be expected to innovate in support of classifying the scanner data into the current CPI structure. In particular, the student will test different machine learning techniques for classifying the scanner data, evaluate their results, and identify the technique or techniques, if any, whose results are of sufficient quality for use in regular production. This project offers a unique opportunity to work on an innovative, high-visibility subject, and the student's contribution will be essential to its progress. Because the student will be responsible for researching and applying machine learning techniques, a solid command of this area is important. The student will also be expected to contribute to documentation explaining the techniques used and the results obtained. The work will be done within a multidisciplinary team of statisticians, economists and programmers.
If time permits, the student could be asked to explore different formulas for computing indexes with scanner data. In doing so, the student will become familiar with price index theory and recent developments in the field.
- Advanced skills in programming languages used for machine learning, such as R, Python or Java, as well as in other statistical programming languages such as SAS;
- Knowledge of machine learning techniques.
Four months, with the possibility of renewal. The term will begin in early 2017, depending on the student’s availability.
Interested candidates should send a current CV, including contact information for one reference, to Catherine Deshaies-Moreault (firstname.lastname@example.org), Senior Methodologist at Statistics Canada, by January 31, 2017.