Calibrating Reviews of NSERC Discovery Grant Applications


This is a reprint of an article by Duncan Murdoch in SSC’s Liaison, Volume 29, No. 4, November 2015.

Last year around this time (Liaison 28.4 pp 41-43) I wrote an article with advice on how to write external reviews of NSERC Discovery Grant applications.
As I said last year:

> If you receive one of these requests, you will be asked to confirm that you have no conflict of interest to prevent you from completing the review. If that’s the case, and you believe that you have sufficient expertise to carry it out, I hope that you agree to do the review. There are only about 10 members of the Evaluation Group (EG) with expertise in statistics, probability and related fields, with 5 assigned to each application, and they need your help to give proper depth to their review.

In that article I gave general advice on how to produce a useful review; this year I will be more specific about how to “calibrate” your review so that the committee can score it properly. There’s anecdotal evidence that statisticians are too hard on each other, but we understand how to calibrate measurement systems, so that’s what this article is trying to do.

Recall that the EG will score the application in each of three categories: excellence of the researcher (EoR, based on the past record), merit of the proposal (MoP, a prediction of future outcomes), and training of highly qualified personnel (HQP, both the past record and future plans).

Each of these categories is scored on a six-point scale. NSERC uses descriptive labels for the scores rather than numbers, but treats them exactly as if they were evenly spaced numeric scores: 1=Insufficient, 2=Moderate, 3=Strong, 4=Very Strong, 5=Outstanding, 6=Exceptional. The numeric values are summed to give a total score from 3 to 18; these totals are described as “bins”, and the grant value is determined by the bin. The highest score is 18 (bin A); this year someone in bin A would have received a grant of $52,000. The most frequent scores in the Mathematics and Statistics EG were 9 or 10 (bins J and I); they received grants of $11,000 and $14,000 respectively. Established researchers with scores below 9 received no funding. (An application can also be denied funding because of a score of 1 in any category, or a score of 2 in EoR.)
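To make the bin arithmetic concrete, here is a minimal sketch in Python of how three category ratings combine into a total and a bin. Only the label values, the A/I/J bin letters, and the three dollar amounts above come from the article; extending the lettering to a full A-to-P range is my own assumption.

```python
# Minimal sketch (not NSERC code) of the scoring arithmetic described above.
# Assumption: bin letters run from A (total 18) down to P (total 3); only
# bins A ($52,000), I ($14,000) and J ($11,000) are quoted in the article.

RATING_VALUE = {
    "Insufficient": 1,
    "Moderate": 2,
    "Strong": 3,
    "Very Strong": 4,
    "Outstanding": 5,
    "Exceptional": 6,
}

def bin_for(eor: str, mop: str, hqp: str) -> str:
    """Sum the three category scores (3-18) and map the total to a bin letter."""
    total = RATING_VALUE[eor] + RATING_VALUE[mop] + RATING_VALUE[hqp]
    return chr(ord("A") + 18 - total)

# Example: "Strong" in all three categories gives a total of 9, i.e. bin J,
# which corresponded to an $11,000 grant in the competition year described.
print(bin_for("Strong", "Strong", "Strong"))                 # J
print(bin_for("Exceptional", "Exceptional", "Exceptional"))  # A
```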

Although the bin value determines the funding, it’s important to think about the individual scores as you write your review, because that is how the discussion proceeds when each grant is evaluated. The EG members work hard to find appropriate scores in each category, without knowing (except in broad terms) what the resulting funding will be, because the funding is set after all the evaluations are done. So how can you, as an external referee, help them find these scores?

Last year I described the criteria used to determine the three scores. This year I’d like to describe the distribution of scores to help you calibrate your own review, based on newly released data from NSERC.

In the past two years, there have been 501 applicants evaluated by the Mathematics and Statistics Evaluation Group. (Some SSC members apply to other evaluation groups; I haven’t seen corresponding data from them.) Table 1 shows the distribution of scores in aggregate.

Before discussing the calibration, a note on the goal.
The applications are assigned to one of three subsections: Statistics (n=208), Pure Math (n=181), and Applied Math (n=112). There is variation in scoring between the subsections and between years; I haven’t attempted any analysis of the sources of variation, but in the interest of fairness, it seems desirable to reduce it. I hope that reporting the distribution of scores will help us as external reviewers to calibrate our reviews: we can’t tell the EG how to score an application, but we can use descriptive terms that fit the category where we think the application belongs.

Now, on to the discussion.
The EoR score is based on the record and impact of the
researcher in the last six years. Does the applicant you are reviewing fall within the top 2% of researchers in their area? If so, a rating of “Exceptional” is warranted. This judgment shouldn’t just be based on their reputation; you should carefully describe your evaluation of the past contributions, including the sample contributions. Looking at NSERC’s Merit Indicators grid (www.nserc-crsng.gc.ca/_doc/Professors-Professeurs/DG_Merit_Indicators_eng.pdf), we see that what distinguishes “Exceptional” from “Outstanding” is that the researcher is a leader in the community, making contributions at the highest level. Use words like those in your review, and the EG will find it easier to correctly place the application. Similarly, applicants in the top 15% in terms of record and impact should be rated “Outstanding”. Using the words “far superior” and “impact to a broad community” will communicate this rating to the EG.

The other categories tend to have fewer high scores. Less than 1% of researchers put together a proposal that was rated “Exceptional” in MoP. Recall that this category predicts
future outcomes: is the described approach near the very top in terms of innovation and likely impact? NSERC sees just over 100 applications in statistics each year, so think about whether this is such a good proposal that the whole community might only produce one like it in a year; that’s “Exceptional”. NSERC might see 7 or 8 proposals each year from statistics that deserve a rating of “Outstanding” in MoP.

Just less than 1% of ratings in HQP were “Exceptional” in the last two years. Thus this rating would indicate a statistician whose record and plans for their students are probably the best in the country in the year they apply. Around 7 statisticians in Canada should get an “Outstanding” HQP rating each year.

In all three categories, both the median and the mode of the ratings are “Strong”. Use words that signal that rating if you think it is appropriate. Very few applications should end up as “Insufficient” in any category, but each year that’s an appropriate rating in a few cases. I hope that this article will help reviewers decide how to word their reviews so that the EG can score the applicants at the level they deserve.

Duncan Murdoch
