Quality Assurance and Quality Control Estimates
for the Production Ageing of Northwest Atlantic Species

Data Presentation

For each species, a table is given summarizing the results of all tests which have been conducted. Within this table, the source of samples for each test is listed first, and is linked to more detailed results for each test. If the test samples were from a specific part of the species' range, the stock area is shown; if no stock area is listed, the test was for all management areas combined. Detailed results for each test include an agreement plot, an age-frequency table, and a summary of the test results for each production (or reference) age, in addition to the measures shown in the species table.

Visual inspection of the agreement plots and age-frequency tables can reveal the presence or absence of bias, though such inspection is not quantitative. Bowker's test of symmetry can be useful in quantifying bias when the sample size is large and variability is high.

Variability is measured via both percent agreement and the mean coefficient of variation (CV). Both measures are distorted when a bias is present (the CV rises and percent agreement falls), and thus will not accurately reflect variability if there is a bias. Variability levels depend on several factors, including the fish species, the age reader’s experience, and the structure used for age determination. Some species/structures are easier to age than others.

Statistical Measures

The following measures are used to characterize the results of tests of ageing consistency at the Fishery Biology Program at the Northeast Fisheries Science Center:

Coefficient of Variation (CV)

The mean coefficient of variation (CV, Campana et al. 1995, Chang 1982) is a relatively robust approach to quantifying agreement in fish ages. It yields results which are easier to compare between species and structures. Also, the contribution each fish makes to the CV is relative to the average age assigned to that fish; i.e., a 2-year error in ageing a young fish would increase the measure more than would a 2-year error in an older fish, as the percentage change in age is greater for younger ages.

The CV is based on the differences between the mean age and each age assigned to a given fish; the resulting per-fish values are then averaged over the entire sample set. When two ages are assigned to each fish, the CV is calculated as follows:

$$ \mathrm{CV} = \frac{100\%}{N} \sum_{j=1}^{N} \frac{\sqrt{\sum_{i=1}^{2} (X_{ij} - X_{j})^{2}}}{X_{j}} $$

where Xij is the ith age for the jth fish, Xj is the mean age of the jth fish, and N is the sample size.
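
The calculation described above can be sketched in Python. This is a minimal illustration, not the program's own code: it assumes the standard Chang (1982) form, in which each fish's sample standard deviation (two readings, so the divisor is 1) is divided by its mean age and the results are averaged and expressed as a percentage. The function name `mean_cv` and the handling of fish whose two ages are both zero are assumptions made for this example.

```python
from math import sqrt


def mean_cv(age_pairs):
    """Mean coefficient of variation (%) for (age_1, age_2) pairs, one pair per fish."""
    per_fish = []
    for a, b in age_pairs:
        mean_age = (a + b) / 2
        if mean_age == 0:
            # Assumption: both ages are 0, so the fish is counted as perfect
            # agreement rather than dividing by zero.
            per_fish.append(0.0)
            continue
        # Sample standard deviation of two readings (divisor R - 1 = 1).
        sd = sqrt((a - mean_age) ** 2 + (b - mean_age) ** 2)
        per_fish.append(sd / mean_age)
    return 100.0 * sum(per_fish) / len(per_fish)


# A 2-year disagreement weighs more heavily on a young fish than on an old one.
young = mean_cv([(1, 3)])
old = mean_cv([(10, 12)])
```

Here `young` is larger than `old`, which reflects the age-relative weighting described in the text.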

Campana (2001) indicates that many ageing laboratories around the world view CVs under 5% to be acceptable among species of moderate longevity and ageing complexity. His description applies to most of the species considered here.

Percent Agreement

The Fishery Biology Program has used this measure since the group’s inception, and considers levels of over 80% to be adequate. It is calculated based on the percentage of ages agreed upon relative to the total number aged:

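$$ \mathrm{PA} = 100\% \times \frac{\text{number of fish with both ages in agreement}}{\text{total number of fish aged}} $$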

For this measure, an error in ageing a young fish changes the measure by the same amount as would a similar error for an old fish. Therefore, this statistic is harder to compare between sample sets with different age distributions or across species.
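
As a minimal sketch of this measure, the function below counts the fish whose two ages match and divides by the number of fish aged. The name `percent_agreement` and the (age, age) pair layout are assumptions made for illustration.

```python
def percent_agreement(age_pairs):
    """Percentage of fish for which both assigned ages are identical."""
    agreed = sum(1 for a, b in age_pairs if a == b)
    return 100.0 * agreed / len(age_pairs)


# One disagreement lowers the result equally whether the fish is young or old.
pa_young_error = percent_agreement([(1, 2), (5, 5), (9, 9), (12, 12)])
pa_old_error = percent_agreement([(1, 1), (5, 5), (9, 9), (12, 13)])
```

Both example values are 75%, even though a one-year error is proportionally much larger for the age-1 fish than for the age-12 fish.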

Bowker's Test of Symmetry

For both types of precision test, a Bowker’s test (Hoenig et al. 1995, Bowker 1948) was used to test for departures from symmetry within the age-frequency table. Such asymmetries indicate the presence of a bias, although the test has low sensitivity when few disagreements exist. Where ages differ from one another, the Bowker's test compares counts in the age-frequency table which represent symmetric errors, such as the paired ages (3,4) and (4,3). If these paired counts differ sufficiently from one another, the test will return a significant P value.

This test statistic is calculated as a chi-square variable, as follows:

$$ \chi^{2} = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \frac{(n_{ij} - n_{ji})^{2}}{n_{ij} + n_{ji}} $$

where m is the maximum age in the data set, and nij is the number of fish in the ith row and jth column (Hoenig et al. 1995, Bowker 1948). Pairs for which nij + nji is zero are omitted from the sum. The degrees of freedom equal the number of remaining nij-nji comparisons, to a maximum of m(m-1)/2.

Because this test compares only ages which differ from each other, it is less informative where few differences exist. Therefore, this test was not applied to cases where the percent agreement was 90% or above. Tests of symmetry also are not conducted for accuracy tests, as the error is assumed to be entirely within the test age and therefore would be visible in the agreement plot and the age-frequency table.
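
The test statistic and its degrees of freedom can be sketched as follows. This is an illustrative implementation under stated assumptions, not the program's own code: the function name `bowker_statistic` is hypothetical, ages are taken to be non-negative integers, and the degrees of freedom are counted as the number of off-diagonal pairs in which at least one disagreement occurs.

```python
from collections import Counter


def bowker_statistic(age_pairs):
    """Return (chi_square, degrees_of_freedom) for a list of (age_1, age_2) pairs."""
    counts = Counter(age_pairs)  # counts[(i, j)] is n_ij; missing cells count as 0
    max_age = max(max(a, b) for a, b in age_pairs)
    chi_square = 0.0
    df = 0
    for i in range(max_age + 1):
        for j in range(i + 1, max_age + 1):
            n_ij = counts[(i, j)]
            n_ji = counts[(j, i)]
            if n_ij + n_ji == 0:
                continue  # no disagreements for this pair; skip to avoid 0/0
            chi_square += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
            df += 1
    return chi_square, df


# Eight fish aged (3,4) against two aged (4,3) suggests asymmetry;
# three (2,3) against three (3,2) does not.
pairs = [(3, 4)] * 8 + [(4, 3)] * 2 + [(2, 3)] * 3 + [(3, 2)] * 3 + [(5, 5)] * 10
chi_square, df = bowker_statistic(pairs)
```

A P value can then be obtained from the upper tail of a chi-square distribution with `df` degrees of freedom, for example with `scipy.stats.chi2.sf(chi_square, df)` if SciPy is available.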

Agreement Plot

The agreement plot graphically shows all age pairs in each test, usually with the production (or reference) ages on the x-axis. Data are jittered so as to improve visibility of overlapping data points. Jittering was accomplished by adding a random number in the range (-0.1, 0.1) to each age within the test. Zero ages were jittered in the range (0.0, 0.1). While not all points may be visible, the exact counts of age pairs may be seen in the age-frequency table below. The diagonal line indicates 1:1 agreement; ideally, all age pairs should fall along this line. This format is similar to that used by Robillard et al. (2009).
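
The jittering step described above might look like the following sketch. The function names and the optional `seed` parameter are assumptions for illustration. Note that Python's `random.uniform` may return an endpoint of the range, which differs negligibly from the open ranges stated in the text.

```python
import random


def jitter_age(age, rng):
    """Add a small random offset to one age so overlapping points separate."""
    if age == 0:
        # Zero ages are only shifted upward so they stay on the plot.
        return age + rng.uniform(0.0, 0.1)
    return age + rng.uniform(-0.1, 0.1)


def jitter_pairs(age_pairs, seed=None):
    """Jitter both ages of every (x, y) pair independently."""
    rng = random.Random(seed)
    return [(jitter_age(x, rng), jitter_age(y, rng)) for x, y in age_pairs]


jittered = jitter_pairs([(0, 0), (3, 4), (3, 4)], seed=1)
```

The jittered values are intended only for plotting; the counts in the age-frequency table should still be computed from the original integer ages.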

A common assumption in statistical presentation is that the x-axis portrays 'better' data than the y-axis. The age-bias plot invites this assumption, which is why we have opted to use agreement plots instead of the more typical age-bias plot. We aim to portray paired ages as equally likely within most tests, with neither set of ages expected to be more reliable. While the agreement plot is not perfect, it should be less prone to misinterpretation than the age-bias plot.

The only tests in which one set of ages is expected to be more reliable are (a) accuracy tests, where the reference age has been reviewed and agreed upon by multiple age readers, and (b) training situations, where one person is being trained by a more experienced person and inter-reader precision tests are used to measure the trainee's progress.

Age-Frequency Table

The age-frequency matrix shows the number of samples at each age, with the production (or reference) ages across the top and the test ages on the left. The grey boxes along the main diagonal of the matrix indicate the number of samples for which both ages are in agreement; fewer samples falling outside these boxes indicate better consistency. Numbers above this diagonal indicate fish which were given a lower age during the test, while numbers below it indicate fish which were given a higher test age; greater distance from the main diagonal indicates a greater difference between the two ages. Totals (at the right and bottom) indicate the age distribution within the test for both sets of ages.

When a test compares ages between two readers, one reader's ages are listed across the top; the other is on the left. No assumption is made in these tests as to which reader is expected to be more accurate or precise, except when one reader is listed as a trainee.
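
A table with this layout can be built from the age pairs as in the sketch below. The function name and return structure are assumptions for illustration; pairs are taken to be (production age, test age), with test ages as rows and production ages as columns.

```python
from collections import Counter


def age_frequency_table(age_pairs):
    """Build the matrix from (production_age, test_age) pairs.

    Returns (ages, matrix, row_totals, column_totals), where matrix[r][c] is
    the number of fish with test age ages[r] and production age ages[c].
    """
    counts = Counter(age_pairs)
    lowest = min(min(p, t) for p, t in age_pairs)
    highest = max(max(p, t) for p, t in age_pairs)
    ages = list(range(lowest, highest + 1))
    matrix = [[counts[(prod, test)] for prod in ages] for test in ages]
    row_totals = [sum(row) for row in matrix]           # test-age distribution
    column_totals = [sum(col) for col in zip(*matrix)]  # production-age distribution
    return ages, matrix, row_totals, column_totals


pairs = [(2, 2), (3, 2), (3, 3), (3, 4), (4, 4)]
ages, matrix, row_totals, column_totals = age_frequency_table(pairs)
```

In this example the fish with production age 3 and test age 2 falls above the main diagonal (a lower test age), and the fish with production age 3 and test age 4 falls below it.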

Results Summary

This table shows a breakdown of the test results for each production (or reference) age. It gives the total number at each age, the number agreed upon during the test, the percentage of agreements at that age, and the average test age. The number of samples agreed upon is the same as in the main diagonal of the age-frequency table. Again, for inter-reader precision tests, one person's age is chosen to be the basis for the other's results; aside from training exercises, this is not intended to indicate that either set of ages is expected to be more reliable.
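
The per-age breakdown can be sketched as follows. The function name and dictionary keys are hypothetical, chosen for this example; pairs are again taken to be (production age, test age).

```python
from collections import defaultdict


def results_summary(age_pairs):
    """Summarize test results for each production (or reference) age."""
    tests_by_age = defaultdict(list)
    for production_age, test_age in age_pairs:
        tests_by_age[production_age].append(test_age)

    summary = []
    for production_age in sorted(tests_by_age):
        test_ages = tests_by_age[production_age]
        agreed = sum(1 for t in test_ages if t == production_age)
        summary.append({
            "age": production_age,
            "total": len(test_ages),
            "agreed": agreed,
            "percent_agreement": 100.0 * agreed / len(test_ages),
            "mean_test_age": sum(test_ages) / len(test_ages),
        })
    return summary


summary = results_summary([(2, 2), (3, 2), (3, 3), (3, 4), (4, 4)])
```

For production age 3 in this example, three fish were aged, one was agreed upon, and the mean test age is 3.0, which shows how a mean test age can match the production age even when individual ages disagree.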





File Modified Jan 24, 2017