Northeast Fisheries Science Center Reference Document 07-01
Sandra J. Sutherland, Nina L. Shepherd, Sarah E. Pregracke, and John M. Burnett
Accuracy and Precision Exercises Associated with 2006 TRAC Production Aging
National Marine Fisheries Serv., Woods Hole Lab., 166 Water St., Woods Hole MA 02543-1026
Web version posted January 23, 2007
Citation: Sutherland SJ, Shepherd NL, Pregracke SE, Burnett JM. 2007. Accuracy and precision exercises associated with 2006 TRAC production aging. US Dep Commer, Northeast Fish Sci Cent Ref Doc 07-01; 20 p.
Information Quality Act Compliance: In accordance with section 515 of Public Law 106-554, the Northeast Fisheries Science Center completed both technical and policy reviews for this report. These predissemination reviews are on file at the NEFSC Editorial Office.
INTRODUCTION
In production aging programs, age reader accuracy can be thought of as how often the “right” age is obtained, and precision as how often the “same” age is obtained (Campana 2001). It is possible that, over time, an age reader may inadvertently change the criteria that are used for determining ages, thereby introducing a bias into the age data. This bias can be measured with accuracy tests, which consist of the age reader blindly examining known- or consensus-aged fish from established reference collections. An age reader may also make periodic mistakes, which introduces random errors into the data. The degree of this error can be measured with precision tests, which consist of the age reader blindly re-aging fish that they have already aged. Both accuracy and precision must be considered within a quality-control monitoring program.
Acceptable levels of aging accuracy and precision are influenced by factors such as species, age structure, and age reader experience. Although percent agreement is strongly affected by these differences, the staff of the Fishery Biology Program at the Northeast Fisheries Science Center (NEFSC) have long considered levels above 80% to be acceptable. The total coefficient of variation (CV) is less affected by these differences and, thus, is a better measure of aging error. In many aging labs around the world, total CVs of under 5% are considered acceptable among species of moderate longevity and aging complexity (Campana 2001), such as the species considered here.
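The two measures discussed above can be sketched in a few lines of Python. This is only an illustration, not the code used for this report: the function names are hypothetical, and the total CV is assumed here to be the mean of the per-fish CVs defined by Campana (2001), with each fish read twice.

```python
from math import sqrt

def percent_agreement(first_ages, second_ages):
    """Percentage of fish assigned the same age in both readings."""
    pairs = list(zip(first_ages, second_ages))
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

def total_cv(first_ages, second_ages):
    """Mean per-fish coefficient of variation (%), assuming two readings per fish."""
    cvs = []
    for a, b in zip(first_ages, second_ages):
        mean_age = (a + b) / 2.0
        if mean_age == 0:
            # Assumption: a fish aged 0 in both readings contributes a CV of 0.
            cvs.append(0.0)
            continue
        # Sample standard deviation of R = 2 readings (divisor R - 1 = 1).
        sd = sqrt((a - mean_age) ** 2 + (b - mean_age) ** 2)
        cvs.append(100.0 * sd / mean_age)
    return sum(cvs) / len(cvs)

# Made-up ages for five fish: one disagreement (3 vs. 4).
production = [2, 3, 3, 4, 5]
retest = [2, 3, 4, 4, 5]
print(percent_agreement(production, retest))   # 80.0
print(round(total_cv(production, retest), 2))  # about 4.04
```

Because each disagreement is scaled by the mean age of the fish, a one-year difference on a young fish raises the CV more than the same difference on an old fish.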
For over 35 years, scientists at the NEFSC Fishery Biology Program have regularly conducted production aging, determining the ages for large numbers of samples over a short period of time using established methods (Penttila and Dery 1988), for the species assessed by the Transboundary Resources Assessment Committee (TRAC). Historically, our approach to age-data quality control and assurance has been a two-reader system. In this approach, there are both a primary and a secondary age reader for each species. The primary age reader conducts all production aging, and the secondary age reader then ages a portion of those same samples using similar methods. The ages determined by the two readers are compared, and if they agree sufficiently (above 80% agreement), the production ages are considered valid. If not, the sources of disagreement must first be resolved. This interreader approach is still used in the course of training new readers in order to ensure consistency in application of aging criteria and in inter-laboratory sample exchanges. Budgetary and staffing constraints have made this approach less feasible, however, by reducing the number of species for which there are two competent age readers at this laboratory.
In the past few years, the NEFSC Fishery Biology Program has updated our approach to quality control and assurance. Intrareader tests of aging accuracy and precision, as described above, allow us to quantify the amount of inherent aging error and bias in the ages determined by each of our staff members. These values provide a measure of the reliability of the production age data used in stock assessments, and they may be directly incorporated into population models as a source of variability.
In conjunction with implementation of these tests, we have begun to establish reference collections of age samples for each species. These collections are necessary to evaluate aging accuracy. Fish of known age are difficult to obtain, so we have focused on assembling collections from age samples which have been included in aging exchanges with other laboratories. From those samples, we have selected those fish for which multiple experienced age readers agree on the age (see Silva et al. 2004 for more details).
As in past years, exercises were undertaken to estimate the accuracy and/or precision of U.S. production aging for the 2006 TRAC assessments (Legault et al. 2006; Gavaris et al. 2006; Van Eeckhaute and Brodziak [in press]) of Georges Bank stocks of cod (Gadus morhua), haddock (Melanogrammus aeglefinus), and yellowtail flounder (Limanda ferruginea). This report lists the results of those exercises.
METHODS
In all cases, the primary age reader for each species conducted the production aging and completed all accuracy and precision exercises. Subsamples were randomly selected to be re-aged in order to test age-reader accuracy (versus the reference collections) or precision (versus samples previously aged by that reader). When re-aging fish, the age reader had knowledge of the same data as during production aging (i.e. fish length, date captured, and area captured) but no knowledge of previous age estimates. During age-testing exercises, no attempts were made to improve results with repeated readings. There was also no attempt to revise the production ages in cases where differences occurred.
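The blind subsampling step described above might look like the following minimal sketch. The record fields (`fish_id`, `length_cm`, `date_captured`, `area`, `age`) and the function name are hypothetical and are not taken from any NEFSC database; the only point is that the reader's copy keeps the collection data but drops the earlier age.

```python
import random

def draw_blind_subsample(records, n, seed=None):
    """Randomly pick n records and return (answer_key, blind_copies).

    The blind copies keep everything except the previous age estimate,
    so the reader sees length, date, and area but not the earlier age.
    """
    rng = random.Random(seed)      # a seeded generator makes the draw repeatable
    chosen = rng.sample(records, n)  # sampling without replacement
    blind = [{k: v for k, v in rec.items() if k != "age"} for rec in chosen]
    return chosen, blind

# Made-up records for illustration only.
records = [
    {"fish_id": i, "length_cm": 40 + i, "date_captured": "2005-10-01",
     "area": "GB", "age": i % 6}
    for i in range(50)
]
answer_key, to_reader = draw_blind_subsample(records, 10, seed=1)
```

The answer key stays with whoever administers the test and is compared with the new readings only after the exercise is complete.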
Results are presented in terms of percent agreement, total coefficient of variation (CV), age-bias plots, and age-frequency tables (Campana et al. 1995; Campana 2001). In the precision exercises, Bowker’s test (Bowker 1948; Hoenig et al. 1995) was also used to test for deviations from symmetry in any case where the percent agreement fell below 90%. This test can be used to objectively detect a strong bias when comparing two sets of ages.
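For readers unfamiliar with the symmetry test mentioned above, the following is a minimal sketch of Bowker's test applied to paired ages. It is an illustration under stated assumptions, not the procedure used to produce Table 2: the function names are hypothetical, and the degrees of freedom are taken as the number of off-diagonal cell pairs with a nonzero combined count. The chi-square tail probability is computed from its closed forms for integer degrees of freedom so that the example needs only the standard library.

```python
from collections import Counter
from math import erfc, exp, pi, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum of chi^(2r-1) / (1*3*...*(2r-1))
    chi = sqrt(x)
    phi = exp(-x / 2.0) / sqrt(2.0 * pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(chi / sqrt(2.0)) + 2.0 * phi * total

def bowker_test(first_ages, second_ages):
    """Return (chi-square statistic, df, P) for symmetry of the age-frequency table."""
    counts = Counter(zip(first_ages, second_ages))  # cells of the age-frequency table
    ages = sorted(set(first_ages) | set(second_ages))
    stat, df = 0.0, 0
    for idx, i in enumerate(ages):
        for j in ages[idx + 1:]:
            n_ij, n_ji = counts[(i, j)], counts[(j, i)]
            if n_ij + n_ji > 0:
                stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
                df += 1
    p_value = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p_value

# Made-up example: 10 fish read as 3 then 4, but only 2 read as 4 then 3.
stat, df, p = bowker_test([3] * 10 + [4] * 2, [4] * 10 + [3] * 2)
```

A small P value indicates that disagreements fall mostly on one side of the diagonal, that is, one set of ages tends to be higher than the other. When disagreements are evenly split, the statistic is near zero and no bias is indicated.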
Age-reader accuracy was determined for both cod and haddock, from a random subsample drawn from the corresponding NEFSC otolith reference collection. For cod, this exercise was done after the completion of production aging. For haddock, exercises were completed both before and after production aging. Accuracy for yellowtail flounder aging was not assessed at present, because the reference collection for that species is not yet complete.
For all three species, age-reader precision was estimated from blind second readings of subsamples from each NEFSC survey (autumn 2005 and spring 2006). Similar precision tests were conducted for the 2005 NEFSC commercial port samples, with the haddock samples further broken down by commercial quarter.
RESULTS AND DISCUSSION
The total sample sizes associated with the accuracy and precision exercises were N = 225, 483, and 183 for cod, haddock, and yellowtail flounder, respectively. Results for cod are presented in Figures 1, 2, 3, and 4; for haddock in Figures 5, 6, 7, 8, 9, 10, 11, and 12; and for yellowtail flounder in Figures 13, 14, and 15. Results of the three accuracy tests are summarized in Table 1, while all precision exercise results are shown in Table 2. Bowker’s test was run for three of the haddock precision exercises and two of the exercises for yellowtail flounder; in no case did this test reveal a significant deviation from symmetry (Table 2).
For cod, the accuracy estimate was high (87% agreement), and the total CV (3.9%) was low. There was a mild tendency toward overaging (Figure 1). This accuracy has dropped slightly from last year (91% agreement and 1.5% CV, Sutherland et al. 2006), when another age reader conducted the production aging. Cod precision levels were high, ranging from 94 to 98% agreement and from 0.2 to 1.2% CV (Figures 2, 3, and 4). No bias was apparent in these exercises. Both the high accuracy and precision levels indicate that the cod age reader has maintained a reliable level of aging capability.
For haddock, both accuracy estimates were high (96 and 92% agreement, total CVs of 1.0 and 1.1%, Figures 5 and 6), indicating that the application of aging criteria has not changed in the past year. Precision levels ranged from 85 to 97% agreement and from 0.6 to 2.2% CV (Figures 7, 8, 9, 10, 11, and 12), indicating that age determinations were consistent. No bias was apparent in any of these exercises. Although this year’s results are lower than those in 2005 (median of 95% agreement and 0.7% CV, Sutherland et al. 2006), these precision levels are well within accepted limits. The high accuracy estimates and consistently high precision results indicate that the haddock age reader is continuing to provide reliable ages.
Precision levels for yellowtail flounder ranged from 82 to 90% agreement and from 1.6 to 5.1% CV (Figures 13, 14, and 15). In no case was the difference between the production and test ages greater than one year. There may have been a weak bias toward underaging during the precision exercise on autumn survey samples, but this was not found to be significant (P > 0.05, Bowker’s test). Overall, these precision levels are higher than they were last year, when the current age reader was still in training (73% agreement and 6.1% CV for U.S. samples, Sutherland et al. 2006). These high precision levels, combined with the improvement since last year, indicate that the new age reader has attained a reliable level of aging capability.
Among these three species, U.S. accuracy and precision measures did not fall below acceptable in-house levels in the past year’s production aging. In most cases, these levels were exceeded. Therefore, U.S. age determinations from recent production aging are considered to be reliable.
REFERENCES
Bowker AH. 1948. A test for symmetry in contingency tables. J Am Statistical Assoc. 43:572–574.
Campana SE. 2001. Accuracy, precision, and quality control in age determination, including a review of the use and abuse of age validation methods. J Fish Biol. 59:197–242.
Campana SE, Annand MC, McMillan JI. 1995. Graphical and statistical methods for determining the consistency of age determinations. Trans Am Fish Soc. 124:131–138.
Gavaris S, O'Brien L, Hatt B, Clark K. 2006. Assessment of eastern Georges Bank cod for 2006. TRAC Ref Doc. 2006/05; 48 p. Available at http://www.mar.dfo-mpo.gc.ca/science/trac/trac.html.
Hoenig JM, Morgan MJ, Brown CA. 1995. Analysing differences between two age determination methods by tests of symmetry. Can J Fish Aquat Sci. 52:364–368.
Legault CM, Stone HH, Clark KJ. 2006. Stock assessment of Georges Bank yellowtail flounder for 2006. TRAC Ref Doc. 2006/01; 66 p. Available at http://www.mar.dfo-mpo.gc.ca/science/trac/trac.html.
Penttila J, Dery LM. 1988. Age determination methods for northwest Atlantic species. NOAA Tech Rep NMFS 72; 135 p. Available at http://www.nefsc.noaa.gov/fbi/age-man.html
Silva V, Munroe N, Pregracke SE, Burnett J. 2004. Age structure reference collections: the importance of being earnest. In: Johnson DL, Finneran TW, Phelan BA, Deshpande AD, Noonan CL, Fromm S, Dowds DM, compilers. Current fisheries research and future ecosystems science in the Northeast Center: collected abstracts of Northeast Fisheries Science Center's Eighth Science Symposium, Atlantic City, New Jersey, February 3-5, 2004. Northeast Fish Sci Cent Ref Doc. 04-01; p. 60.
Sutherland SJ, Munroe N, Silva V, Pregracke S, Burnett J. 2006. Accuracy and precision exercises associated with 2005 TRAC production aging. Northeast Fish Sci Cent Ref Doc. 06-27; 17 p.
Van Eeckhaute L, Brodziak J. (in press). Assessment of haddock on eastern Georges Bank. TRAC Ref Doc. 2006/06. Available at http://www.mar.dfo-mpo.gc.ca/science/trac/trac.html.