Large-scale external validation and comparison of prognostic models: an application to chronic obstructive pulmonary disease
Guerra B (1), Haile SR (1), Lamprecht B (2,3), Ramírez AS (4), Martinez-Camblor P (5), Kaiser B (6), Alfageme I (7), Almagro P (8), Casanova C (9), Esteban-González C (10), Soler-Cataluña JJ (11), de-Torres JP (12), Miravitlles M (13), Celli BR (14), Marin JM (15), Ter Riet G (16), Sobradillo P (17), Lange P (18), Garcia-Aymerich J (19), Antó JM (20), Turner AM (21), Han MK (22), Langhammer A (23), Leivseth L (24), Bakke P (25), Johannessen A (26), Oga T (27), Cosio B (28), Ancochea-Bermúdez J (29), Echazarreta A (30), Roche N (31), Burgel PR (32), Sin DD (33), Soriano JB (34,35), Puhan MA (36,37); 3CIA collaboration.
External validations and comparisons of prognostic models or scores are a prerequisite for their use in routine clinical care but are lacking in most medical fields including chronic obstructive pulmonary disease (COPD). Our aim was to externally validate and concurrently compare prognostic scores for 3-year all-cause mortality in mostly multimorbid patients with COPD.
We relied on 24 cohort studies of the COPD Cohorts Collaborative International Assessment consortium, spanning primary, secondary, and tertiary care in Europe, the Americas, and Japan. Together, these studies include 15,762 patients with COPD (1,871 deaths and 42,203 person-years of follow-up).
We used network meta-analysis adapted to multiple score comparison (MSC), following a frequentist two-stage approach; thus, we were able to compare all scores in a single analytical framework accounting for correlations among scores within cohorts. We assessed transitivity, heterogeneity, and inconsistency and provided a performance ranking of the prognostic scores.
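To make the two-stage idea concrete, the sketch below pools per-cohort AUC differences for a single pair of scores with a DerSimonian-Laird random-effects model. It is a deliberately simplified, univariate stand-in and not the MSC network meta-analysis itself, which compares all scores jointly and accounts for correlations among scores within cohorts. The function name `pool_random_effects` and all input numbers are hypothetical.

```python
import math


def pool_random_effects(estimates, variances):
    """Pool per-cohort estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, standard error, between-cohort variance tau^2).
    """
    k = len(estimates)
    if k == 0 or k != len(variances):
        raise ValueError("need one variance per estimate")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw

    # Cochran's Q and the method-of-moments estimate of tau^2.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-cohort variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Stage 1 (assumed already done): per-cohort AUC differences between one score
# and the reference score, with their variances. These values are invented
# for illustration and are not taken from the study.
diffs = [0.020, 0.010, 0.018, 0.005]
variances = [0.0004, 0.0003, 0.0006, 0.0002]

# Stage 2: pool across cohorts and form a 95% confidence interval.
pooled, se, tau2 = pool_random_effects(diffs, variances)
ci_low, ci_high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled difference = {pooled:.4f} (95% CI {ci_low:.4f} to {ci_high:.4f})")
```

A full multiple score comparison would replace the second stage with a multivariate model over all score contrasts, which is what allows a single coherent ranking.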
Depending on data availability, between two and nine prognostic scores could be calculated for each cohort. The BODE score (body mass index, airflow obstruction, dyspnea, and exercise capacity) had a median area under the curve (AUC) of 0.679 [1st to 3rd quartile = 0.655 to 0.733] across cohorts.
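As an illustration of the quantities summarized here, the following sketch computes an AUC for one score in one cohort and then the median and quartiles of AUCs across cohorts. It assumes 3-year mortality is treated as a binary outcome with no censoring, which may differ from how the AUCs were actually estimated. The function `auc` and all data values are hypothetical.

```python
import statistics


def auc(scores, died):
    """AUC as the probability that a randomly chosen patient who died has a
    higher score than a randomly chosen survivor (ties count one half)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: score values and 3-year vital status for six patients.
example_scores = [2, 5, 3, 7, 4, 6]
example_died = [0, 1, 0, 1, 1, 0]
example_auc = auc(example_scores, example_died)

# Invented per-cohort AUCs for one score, summarized across cohorts.
cohort_aucs = [0.62, 0.66, 0.68, 0.70, 0.74]
median_auc = statistics.median(cohort_aucs)
q1, _, q3 = statistics.quantiles(cohort_aucs, n=4, method="inclusive")
print(f"example AUC = {example_auc:.3f}")
print(f"median = {median_auc:.3f} [1st to 3rd quartile = {q1:.3f} to {q3:.3f}]")
```

The pairwise loop is quadratic in cohort size; for large cohorts a rank-based formulation gives the same value more efficiently.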
The ADO score (age, dyspnea, and airflow obstruction) showed the best performance for predicting mortality (difference AUC_ADO - AUC_BODE = 0.015 [95% confidence interval (CI) = -0.002 to 0.032]; p = 0.08) followed by the updated BODE (AUC_updated BODE - AUC_BODE = 0.008 [95% CI = -0.005 to 0.022]; p = 0.23). The assumption of transitivity was not violated. Heterogeneity across direct comparisons was small, and we did not identify any local or global inconsistency.
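The reported confidence intervals and p-values can be related to each other with a simple back-calculation, sketched below. It assumes a symmetric Wald-type interval on the AUC-difference scale and a two-sided normal test, which the abstract does not state explicitly. The helper name `wald_from_ci` is hypothetical. Because the published estimates are rounded to three decimals, back-calculated p-values should agree only approximately with the reported ones.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic, and two-sided p-value from a
    point estimate and its 95% CI, assuming a symmetric normal-theory CI."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# ADO versus BODE, using the rounded values reported in the text.
se_ado, z_ado, p_ado = wald_from_ci(0.015, -0.002, 0.032)

# Updated BODE versus BODE, using the rounded values reported in the text.
se_upd, z_upd, p_upd = wald_from_ci(0.008, -0.005, 0.022)

print(f"ADO vs BODE: se = {se_ado:.4f}, z = {z_ado:.2f}, p = {p_ado:.2f}")
print(f"updated BODE vs BODE: se = {se_upd:.4f}, z = {z_upd:.2f}, p = {p_upd:.2f}")
```

For the ADO comparison this yields a p-value of about 0.08, matching the reported value; both intervals include zero, consistent with p-values above 0.05.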
Our analyses showed the best discriminatory performance for the ADO and updated BODE scores in patients with COPD. A limitation is that MSC network meta-analysis does not yet cover measures of calibration; future studies should extend it accordingly. MSC network meta-analysis can be applied to prognostic scores in any medical field to identify the best scores, possibly paving the way for stratified medicine, public health, and research.