Prostate-specific antigen testing accuracy in community practice

Background Most data on prostate-specific antigen (PSA) testing come from urologic cohorts comprised of volunteers for screening programs. We evaluated the diagnostic accuracy of PSA testing for detecting prostate cancer in community practice. Methods PSA testing results were compared with a reference standard of prostate biopsy. Subjects were 2,620 men 40 years and older undergoing PSA testing and biopsy from 1/1/95 through 12/31/98 in the Albuquerque, New Mexico metropolitan area. Diagnostic measures included the area under the receiver-operating characteristic (ROC) curve, sensitivity, specificity, and likelihood ratios. Results Cancer was detected in 930 subjects (35%). The area under the ROC curve was 0.67 and the PSA cutpoint of 4 ng/ml had a sensitivity of 86% and a specificity of 33%. The likelihood ratio was 1.28 for a positive test (LR+) and 0.42 for a negative test (LR-). PSA testing was most sensitive (90%) but least specific (27%) in older men. Age-specific reference ranges improved specificity in older men (49%) but decreased sensitivity (70%), with an LR+ of 1.38. Lowering the PSA cutpoint to 2 ng/ml resulted in a sensitivity of 95%, a specificity of 20%, and an LR+ of 1.19. Conclusions PSA testing had fair discriminating power for detecting prostate cancer in community practice. The PSA cutpoint of 4 ng/ml was sensitive but relatively non-specific, and the associated likelihood ratios only moderately revised probabilities for cancer. Using age-specific reference ranges and a PSA cutpoint below 4 ng/ml improved test specificity and sensitivity, respectively, but did not improve the overall accuracy of PSA testing.


Background
Prostate cancer is the most frequently diagnosed visceral cancer in the United States and the second leading cause of cancer death in men [1]. Unfortunately, there are no proven primary prevention strategies for prostate cancer and no curative treatments for distant-stage cancers [2,3].
Consequently, cancer control efforts have focused on detecting early-stage prostate cancer with screening tests and then aggressively treating the cancer with surgery or radiation. The most effective screening test is the prostate-specific antigen (PSA) assay, which in combination with digital rectal examination (DRE) substantially enhances the cancer detection rate [4]. The American Cancer Society and the American Urological Association recommend annual cancer screening with PSA testing and digital rectal examination for men with life expectancies greater than 10 years [5,6]. However, the United States Preventive Services Task Force and the American College of Physicians have not endorsed routine screening because there is no conclusive evidence that screening and treatment reduce morbidity and mortality from prostate cancer [7,8]. Another concern about prostate cancer screening is uncertainty about the diagnostic performance of PSA. The available data on PSA testing generally come from urologic case series comprised of volunteers responding to advertisements for screening [9-11]. However, PSA screening recommendations encompass the entire population of men at risk for prostate cancer, and results from the urologic literature may not be fully generalizable. We have not found any large community-based studies evaluating the accuracy of PSA testing.
In this report we link PSA testing and prostate biopsy data from the Albuquerque, New Mexico metropolitan area with population-based cancer registry data collected by the New Mexico Tumor Registry (NMTR), a participant in the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program. The objective of our study was to evaluate the diagnostic accuracy of PSA testing for detecting prostate cancer in community practice.

Data collection
We collected computerized data from four major clinical laboratories in Albuquerque, New Mexico on PSA testing occurring from January 1, 1995 through December 31, 1997. These laboratories provided testing services for university, Veterans Affairs, Health Maintenance Organizations, and private plan patients within the four-county Albuquerque metropolitan area. Data included test date, PSA level, and patient demographics, including birth date, age at testing, and personal identifiers. Additionally, we used the GUESS program, a validated algorithm developed at the University of New Mexico, to identify ethnic background based on surname [12]. We evaluated only men age 40 years and older at the time of initial testing. The PSA testing data were matched with the NMTR database to exclude PSA tests ordered for cancer surveillance and to identify incident cases of prostate cancer diagnosed between January 1, 1995 and December 31, 1998. The NMTR database provided information on cancer stage, using the SEER categories of local, regional, and distant. Trained medical record abstractors from the NMTR also collected data from area laboratories on all benign prostate biopsies during the same time period. The human subjects committees of the participating hospitals and laboratories approved the study protocol.

Data analysis
We evaluated the diagnostic accuracy of PSA testing using subjects in the PSA-tested cohort who had a confirmed diagnosis of incident prostate cancer and using subjects who underwent at least one prostate biopsy and were not diagnosed with prostate cancer during the study period. For a subject to be included in this analysis, we required that a PSA result be obtained within 12 months before a cancer diagnosis or a negative biopsy result. If a subject had multiple negative biopsies, we analyzed the first biopsy that could be linked to a PSA test within the preceding 12 months. If subjects had multiple PSA tests within 12 months preceding a negative biopsy or cancer diagnosis, we analyzed the first PSA test. Clinical characteristics of cases and controls were compared with chi-square tests for categorical variables and either t-tests or the Mann-Whitney U test for continuous variables. Linear regression analyses were used to test for linear trends. Statistical tests were performed with the software program Statistica [13].
We constructed receiver operating characteristic (ROC) curves by plotting sensitivity against 1 - specificity. We estimated the discriminating power of PSA testing by determining the area under the ROC curve using the method of Hanley and McNeil [14]. ROC curves were constructed for the entire cohort, for 10-year age ranges, and for non-Hispanic whites and Hispanics. We also constructed an ROC curve using only cases with localized cancers, the target of PSA screening.
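To make the area-under-the-curve calculation concrete, the following Python sketch computes the area as the probability that a randomly chosen case has a higher PSA value than a randomly chosen control (ties counted as one half), together with the standard error approximation published by Hanley and McNeil. The PSA values are hypothetical and chosen only for illustration, the function names are ours, and this is not the software used in the study.

```python
import math


def auc_rank(case_values, control_values):
    """Area under the ROC curve as P(case value > control value); ties count 0.5."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def hanley_mcneil_se(auc, n_cases, n_controls):
    """Standard error of the AUC using the Hanley and McNeil (1982) approximation."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_controls)
    return math.sqrt(variance)


# Hypothetical PSA values (ng/ml), not study data.
cases = [3.1, 5.2, 7.8, 12.0, 25.0]
controls = [1.2, 2.7, 5.2, 6.0, 8.1]

auc = auc_rank(cases, controls)
se = hanley_mcneil_se(auc, len(cases), len(controls))
print(f"AUC = {auc:.2f}, SE = {se:.2f}")
```

An area of 0.5 corresponds to a test with no discriminating power and an area of 1.0 to a perfect test.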
PSA accuracy was evaluated according to standard epidemiologic definitions for specificity, sensitivity, likelihood ratios, and predictive values [15]. Briefly, we defined sensitivity as the proportion of cancer cases with an elevated PSA; specificity is the proportion of non-cancer controls with a normal PSA. The positive predictive value of a test is the proportion of subjects with an abnormal test result who have the target disorder. The negative predictive value is the proportion of subjects with a normal test result who do not have the target disorder. A likelihood ratio compares the proportion of people with and without the target disorder within a stratum of diagnostic test results. Likelihood ratios provide a magnitude of probability revision using a version of Bayes' theorem:
Post-test odds for the target disorder = Pre-test odds for the target disorder × Likelihood ratio for diagnostic test results
The diagnostic accuracy of PSA testing was further evaluated by examining different PSA cutpoints, by stratifying analyses into five age ranges (40 to 49, 50 to 59, 60 to 69, 70 to 79, and ≥ 80 years), and by using age-specific PSA reference ranges [16]. We also looked at stratum-specific likelihood ratios and predictive values for the following PSA strata: < 2 ng/ml, 2 to 4 ng/ml, > 4 to 10 ng/ml, > 10 to 20 ng/ml, and > 20 ng/ml. An Excel spreadsheet developed by Peirce and Cornell was used to compute likelihood ratios and 95% confidence intervals for different PSA cutpoints and test-result strata [17].
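The definitions above can be sketched in Python as follows. The 2 × 2 counts are an assumption: they are approximate values reconstructed from the rounded sensitivity (86%) and specificity (33%) reported for the 4 ng/ml cutpoint among 930 cases and 1,690 controls, not the exact study counts. The confidence intervals use a common log-transformation approximation, which may differ from the method implemented in the spreadsheet cited above.

```python
import math


def diagnostic_metrics(tp, fn, fp, tn, z=1.96):
    """Accuracy measures and likelihood-ratio confidence intervals from a 2 x 2 table."""
    n_cases = tp + fn
    n_controls = fp + tn
    sens = tp / n_cases
    spec = tn / n_controls
    lr_pos = sens / (1.0 - spec)
    lr_neg = (1.0 - sens) / spec

    # Standard errors of the log likelihood ratios (log-transformation method).
    se_log_pos = math.sqrt(1 / tp - 1 / n_cases + 1 / fp - 1 / n_controls)
    se_log_neg = math.sqrt(1 / fn - 1 / n_cases + 1 / tn - 1 / n_controls)

    def interval(lr, se_log):
        return (lr * math.exp(-z * se_log), lr * math.exp(z * se_log))

    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": lr_pos,
        "lr_pos_ci": interval(lr_pos, se_log_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": interval(lr_neg, se_log_neg),
    }


# Approximate counts reconstructed from rounded percentages (an assumption).
metrics = diagnostic_metrics(tp=800, fn=130, fp=1132, tn=558)
for name, value in metrics.items():
    print(name, value)
```

With these reconstructed counts the likelihood ratios and their intervals land close to the values reported for the 4 ng/ml cutpoint; small differences from the published figures would be expected because the inputs are rounded.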

Subject characteristics
We obtained data on 41,261 men without a previous diagnosis of prostate cancer who underwent PSA testing at Albuquerque, New Mexico laboratories between January 1, 1995 and December 31, 1997. By the end of 1998, 2,620 (6.3%) of the testing cohort had undergone a prostate biopsy within 12 months following an initial PSA test and 930 men (2.3% of the testing cohort) were diagnosed with prostate cancer. The median age at testing was 61 years (25th percentile 52, 75th percentile 69); 63.4% of the men were non-Hispanic white and 28.3% were Hispanic. The median PSA value for cancer patients (7.8 ng/ml, 25th percentile 4.9, 75th percentile 14.2) was significantly higher than the median value for patients without cancer (5.4 ng/ml, 25th percentile 2.7, 75th percentile 8.1), P < 0.0001. Cancer patients were also significantly older, with a median age of 68 years (25th percentile 63, 75th percentile 67) vs. 66 years (25th percentile 60, 75th percentile 71), P < 0.0001.

Diagnostic accuracy
The discriminating power of PSA testing for detecting prostate cancer, as estimated by the area under the ROC curve (Figure 1), was 0.67 (SE 0.02). When we analyzed the ROC curve just using the 796 cases with localized cancers, we found a similar area of 0.64 (SE 0.01). The discriminating power remained relatively constant across age ranges, with areas of 0.70, 0.68, 0.63, 0.65, and 0.69 for men in their 40s, 50s, 60s, 70s, and 80s, respectively. The area under the ROC curve was 0.66 for non-Hispanic whites compared to 0.69 for Hispanics.
Estimates for sensitivity, specificity, and predictive values for different PSA cutpoints, stratified by age range, are reported in Tables 1 and 2. Data are presented for men in their 50s and 60s in Table 1, and for men in their 70s and all age groups combined (including men in their 40s and men 80 years and older) in Table 2. For the standard PSA cutpoint of 4 ng/ml, test sensitivity was 86% and specificity was 33%. With this cutpoint, the likelihood ratio for a positive test was 1.28 (95% CI 1.23 to 1.34) and 0.42 (95% CI 0.36 to 0.50) for a negative test.
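As an illustration of how these likelihood ratios revise the probability of cancer, the Python sketch below applies the odds form of Bayes' theorem. It assumes, for illustration only, that the pre-test probability equals the proportion of biopsied men found to have cancer (930 of 2,620); the function name is ours.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumed pre-test probability: cancer prevalence among biopsied men.
pre_test = 930 / 2620

after_positive = post_test_probability(pre_test, 1.28)  # PSA >= 4 ng/ml
after_negative = post_test_probability(pre_test, 0.42)  # PSA < 4 ng/ml

print(f"pre-test probability: {pre_test:.2f}")
print(f"after a positive test: {after_positive:.2f}")
print(f"after a negative test: {after_negative:.2f}")
```

Under this assumption a positive test moves the probability of cancer from roughly 35% to roughly 41%, and a negative test lowers it to roughly 19%, which shows how modest the revision is at the 4 ng/ml cutpoint.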
Raising the cutpoint to 10 ng/ml decreased the sensitivity to 38% while specificity increased to 84%. The associated likelihood ratio for a positive test was 2.38 (95% CI 2.08 to 2.72) and 0.74 (95% CI 0.70 to 0.78) for a negative test. Lowering the cutpoint to 2 ng/ml increased the sensitivity to 95% but dropped specificity to 20%. At this cutpoint the likelihood ratio for a positive test was 1.19 (95% CI 1.15 to 1.22) and 0.25 (95% CI 0.19 to 0.34) for a negative test. The positive predictive value for PSA was significantly correlated with PSA cutpoint level, P = 0.01 for linear trend, ranging from 39% for PSA levels ≥ 2 ng/ml to 78% for PSA levels ≥ 20 ng/ml. Stratum-specific likelihood ratios and predictive values are presented in Table 3. We found that the likelihood ratio for PSA levels between 4 and 10 ng/ml did not differ significantly from 1, indicating that no significant probability revision occurred with testing. PSA values less than 2 ng/ml or greater than 20 ng/ml produced the largest probability revisions for detecting prostate cancer. Table 4 shows the diagnostic accuracy for PSA levels ≥ 4 ng/ml stratified by age. The sensitivity of PSA significantly increased with age, going from 75% for men in their 40s to 90% in men 70 years and older, P = 0.03 for linear trend. However, specificity significantly decreased from 56% in the younger men to 27% in older men, P = 0.03 for linear trend. With age-specific reference ranges (Table 5) we found that, compared to the traditional cutpoint of 4 ng/ml, sensitivity was higher in the younger age ranges and specificity was higher in the older age ranges. Overall, however, the magnitudes of the likelihood ratios with age-specific reference ranges were similar to those found with the 4 ng/ml cutpoint, except for higher likelihood ratios following negative tests in men 70 years and older. The sensitivity and specificity of PSA, using either a cutpoint of 4 ng/ml or age-specific reference ranges, did not differ significantly between non-Hispanic white and Hispanic men (data not shown).

Figure 1
Receiver-operating characteristic curve for PSA testing in detecting prostate cancer. Numbers on curve represent PSA cutpoints.

Abbreviations: PSA = prostate-specific antigen. Sens = sensitivity. Spec = specificity. PPV = positive predictive value. NPV = negative predictive value.

Discussion
We evaluated the diagnostic performance of PSA testing using a community-based analysis of men who underwent prostate biopsy within 12 months of PSA testing. Data were analyzed for 930 prostate cancer cases and 1,690 controls ages 40 years and older. The area under the ROC curve was 0.67, indicating fair discriminating power for detecting prostate cancer. PSA testing performed equally well in detecting localized cancers and in detecting cancers across all age ranges and in non-Hispanic white and Hispanic men. The standard cutpoint of 4 ng/ml had a sensitivity of 86% and a specificity of 33% and was most sensitive, but least specific, for older men. The 4 ng/ml cutpoint was associated with a likelihood ratio of 1.28 for a positive test and 0.42 for a negative test, representing only moderate probability revisions [18]. PSA values less than 2 ng/ml or greater than 20 ng/ml were associated with large probability revisions. Likelihood ratios did not change substantially when we used age-specific reference ranges, though test sensitivity decreased with increasing age while specificity increased. Lowering the PSA cutpoint to 2 ng/ml raised the sensitivity to 95% but led to an 80% false positive rate.
Most previous reports from the urologic literature provided similar estimates for the discriminating power of the PSA test. Areas under the ROC curve have been reported to range from 0.65 to 0.77 in case series comprised of patients enrolled in screening trials [4,19-21] or followed in urologic practice [22]. Among urologic studies, we found only Labrie and colleagues reporting a substantially higher area under the ROC curve: 0.88 (SE 0.03) [23]. However, biopsies were performed only when digital rectal or transrectal ultrasound examinations were abnormal, which would inflate the apparent sensitivity of an elevated PSA level. Gann and colleagues reported an area under the ROC curve of 0.83 in a nested case-control study of Physicians' Health Study participants with 10 years of follow-up [24]. Stored serum samples from cases clinically diagnosed with prostate cancer and from age-matched controls were assayed for PSA. However, the specificity of PSA was probably overestimated because asymptomatic men were unlikely to be biopsied.
We identified only three population-based studies evaluating PSA testing performance [23,25,26]. The two urologic studies [23,25] randomly selected men from either electoral rolls or census records and invited them to have prostate examinations. However, neither study used PSA levels as a criterion for biopsy, so the reported predictive values are confounded with results from digital rectal examinations and transrectal ultrasonography. Jacobsen and colleagues conducted a retrospective case-control study analyzing 177 prostate cancer cases diagnosed in Olmsted County, Minnesota in the early 1990s [26]. PSA was highly discriminating, with an area under the ROC curve of 0.94 (SE 0.01) for all patients. Age-stratified analyses showed that the discriminating power remained high across all age groups, even for men in their 70s. Test sensitivity was approximately 85% for all age groups, though specificity decreased from 98% among men in their 50s to 81% among men in their 70s.
Methodologic differences in study design may explain the disparities in the results between the New Mexico and Minnesota cohorts. Controls in Olmsted County were drawn from a longitudinal Mayo Clinic study on the natural history of lower urinary tract symptoms. Men with initial PSA elevations > 4 ng/ml or an abnormal DRE were biopsied and cancer cases were excluded. However, men with normal PSA and DRE results did not undergo biopsy, thus potentially inflating estimates for specificity. Sensitivity may have been higher if urologists at the Mayo Clinic had a lower false negative biopsy rate than did New Mexico urologists.
Our estimates for the sensitivity (86%), specificity (33%), and positive predictive value (41%) for PSA levels ≥ 4 ng/ml were similar to previously reported values. In the urologic literature, sensitivities ranged from 67% to 90%, specificities ranged from 28% to 59%, and positive predictive values ranged from 30% to 43% [9-11,19,20,27,28]. However, almost all of the published studies, including our own, are flawed by potential work-up bias because men with elevated PSA levels were significantly more likely to be biopsied. In our cohort, men with a PSA level ≥ 4 ng/ml had a 15-fold increased rate of biopsy compared to men with normal values.
Accurately estimating the true and false negative rates for PSA requires that men with normal PSA values undergo biopsy, but we found only one small urologic series where all PSA-tested subjects were subsequently biopsied. Vallancien and colleagues biopsied 100 consecutive men with normal or non-suspicious digital rectal examinations and detected only 14 cancers, none with PSA levels below 10 ng/ml [29]. The Gann study provided the least biased estimate of sensitivity and specificity, but even these results were limited because asymptomatic cancers would not have been detected [24]. Additionally, serum was stored for about 10 years and PSA is not completely stable [30,31].
Modifications of the PSA level have been proposed to improve the discriminating power of the test. Oesterling and colleagues developed age-specific PSA reference ranges that lowered the cutpoint in younger men, to increase sensitivity, and raised the cutpoint in older men in order to increase specificity [16]. We found that using age-specific reference ranges did not substantially change likelihood ratios for prostate cancer, though we confirmed that sensitivity would increase in younger men and specificity would increase in older men.
The age-specific reference ranges have been further modified for racial differences in PSA and cancer risk [32]. Because African-Americans have an increased incidence of prostate cancer and higher PSA levels at diagnosis, the age-specific reference ranges have been adjusted to maintain a high sensitivity [32-34]. Our study cohort had too few African-Americans for a subgroup analysis, but we were able to compare non-Hispanic white with Hispanic men. We found that PSA testing discriminated equally well for Hispanics and non-Hispanic whites and that PSA cutpoints do not need to be adjusted for Hispanics. We are unaware of any other studies comparing the performance of PSA testing between non-Hispanic white and Hispanic men, though Abdalla and colleagues have reported similar PSA levels in non-Hispanic white and Hispanic men with and without prostate cancer [35,36].
Some investigators are now recommending that lower PSA cutpoints should be used as an indication for prostate biopsy [37-39]. Catalona and colleagues detected cancer in 22% of men biopsied with PSA levels between 2.6 and 4 ng/ml [37] and Lodding and colleagues detected cancers in 13% of men with PSA values between 3 and 4 ng/ml [39]. We found that lowering the cutpoint to 2 ng/ml, while greatly increasing sensitivity, led to an 80% false positive rate.

Abbreviations: PSA = prostate-specific antigen. LR+ = likelihood ratio for a positive test. LR- = likelihood ratio for a negative test. CI = confidence interval.
Aside from work-up bias, there were some other important limitations in our study. We do not know the indications for testing or results from digital rectal examinations. The positive predictive value of 41% that we found for a PSA cutpoint of 4 ng/ml was at the high end of values reported in screening studies and cancer was detected in 24% of men in our cohort with normal PSA levels. These findings suggest that our estimates for sensitivity and specificity may be less applicable to a true screening population. However, we believe that our results more accurately reflect community testing practices than the data reported by urologic series of volunteer subjects. Finally, our study cohort was largely comprised of non-Hispanic white and Hispanic men. Data suggest that the PSA assay may perform differently in African-Americans and our results may not be generalizable to other populations [32].

Conclusions
Our community-based study showed that PSA testing had fair discriminating power for detecting prostate cancer with an area under the ROC curve of 0.67. PSA testing had similar discriminating power for detecting localized cancers, and it performed equally well across age ranges and in different ethnic groups. The PSA cutpoint of 4 ng/ml was sensitive but relatively non-specific and likelihood ratios for this cutpoint demonstrated only moderate probability revisions. Although age-specific reference ranges improved sensitivity in younger men and specificity in older men, they did not substantially change likelihood ratios for cancer. Lowering the PSA cutpoint below 4 ng/ml increased test sensitivity but markedly decreased specificity.