Surgical risk scores are widely used to identify patients at high surgical risk who may benefit from transcatheter aortic valve implantation (TAVI). A multiparametric TAVI mortality risk score based on a French registry (FRANCE-2) has recently been developed. The aim of our study was to compare the 30-day mortality prediction performance of the FRANCE-2, EuroSCORE II and STS scores.
Methods
We retrospectively studied 240 patients from a single-center prospective registry who underwent TAVI between January 2008 and December 2015. All scores were assessed for calibration and discrimination using calibration-in-the-large and ROC curve analysis, respectively.
Results
The observed mortality was 5.8% (n=14). The median EuroSCORE II, STS and FRANCE-2 scores were 5.0 (IQR 3.2-8.3), 5.1 (IQR 3.6-7.1) and 2.0 (IQR 1.0-3.0), respectively. Discriminative power was greater for EuroSCORE II (C-statistic 0.67) and STS (C-statistic 0.67) than for FRANCE-2 (C-statistic 0.53), but the difference was not statistically significant (p=0.26). All scores showed adequate calibration.
Conclusions
All scores showed modest performance in early mortality prediction after TAVI. Despite being derived from a TAVI population, FRANCE-2 was no better than surgical risk scores in our population.
Transcatheter aortic valve implantation (TAVI) has emerged as a less invasive treatment alternative for patients with severe symptomatic aortic stenosis at high or very high surgical risk.1,2 Surgical risk scores are established tools for assisting in the decision-making process for these patients. The Society of Thoracic Surgeons Predicted Risk of Mortality3 (STS) score and the European System for Cardiac Operative Risk Evaluation4 (EuroSCORE II) are the most commonly used.
A multiparametric risk score for early mortality prediction has recently been derived based on a TAVI population from a French registry (FRANCE-2).5 In this registry, early mortality after TAVI was mainly related to age, severity of symptoms, comorbidities and access (transapical or other). The FRANCE-2 score is a simple additive score (ranging from 0 to 21) that can be used to predict early mortality after TAVI. In the internal validation, it showed only moderate discriminative ability,5 reflecting limited accuracy in the identification of high-risk patients.
In this study, we sought to externally validate the STS, EuroSCORE II and FRANCE-2 scores and to compare their performance in a TAVI population.
Methods
Patient population and data collection
The Valve Catheter Restorative Operation on Santa cruz hoSpital (VCROSS) was a single-center, prospective, observational study that included 240 consecutive patients who underwent TAVI between January 2008 and December 2015. The interventional strategy was decided after multidisciplinary discussion. Acceptance of a patient for TAVI required consensus of the heart team. All data on demographic, clinical, and procedural characteristics were prospectively entered in our institutional cathlab-based dedicated database. Outcome data during hospital admission and during the first 30 days were entered in the same database. The EuroSCORE II and STS scores were calculated using the online calculators. The FRANCE-2 score was calculated manually in each patient by matching the sum of points of the variables with the corresponding prediction, using the published nomogram.5 The study was approved by the local ethics committee and informed consent was obtained from all patients.
Statistical analysis
Data were tested for normal distribution using the Kolmogorov-Smirnov test and/or visual assessment of Q-Q plots. Continuous variables were expressed as median and interquartile range (IQR) and categorical variables as percentages. Categorical variables were compared using the chi-square test or Fisher's exact test, and continuous variables using the Mann-Whitney test.
The performance of the three models was analyzed in terms of discriminative power and calibration. Discrimination indicates the extent to which the model distinguishes between patients who will and will not die within the first 30 days. It was assessed by constructing receiver operating characteristic (ROC) curves for each model, and the curves were compared using the method described by DeLong et al.6 Calibration refers to the agreement between observed outcomes and predictions, and was assessed by calibration-in-the-large (the difference between the mean observed frequency of 30-day death and the mean predicted probability) and percent discordance ([expected percentage-observed percentage]/observed percentage). A statistically significant calibration-in-the-large result indicates significant miscalibration, whereas a non-significant result supports the validity of the prediction model.
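As a minimal illustration of these measures (a sketch, not the SPSS or MedCalc procedures used in the study), the code below computes the C-statistic as the proportion of concordant death/survivor pairs, calibration-in-the-large as a simple difference of means, and the percent discordance. The function names and the toy data are hypothetical, and the significance test for calibration-in-the-large is not reproduced here.

```python
def c_statistic(predicted, died):
    """Proportion of (death, survivor) pairs in which the patient who died
    had the higher predicted risk; tied pairs count as one half."""
    risk_dead = [p for p, d in zip(predicted, died) if d]
    risk_alive = [p for p, d in zip(predicted, died) if not d]
    if not risk_dead or not risk_alive:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for r_dead in risk_dead:
        for r_alive in risk_alive:
            if r_dead > r_alive:
                concordant += 1.0
            elif r_dead == r_alive:
                concordant += 0.5
    return concordant / (len(risk_dead) * len(risk_alive))


def calibration_in_the_large(predicted, died):
    """Mean observed frequency of death minus mean predicted probability."""
    n = len(predicted)
    return sum(died) / n - sum(predicted) / n


def percent_discordance(predicted, died):
    """([expected - observed] / observed) expressed as a percentage."""
    n = len(predicted)
    observed = sum(died) / n
    expected = sum(predicted) / n
    return 100.0 * (expected - observed) / observed


# Hypothetical toy data: predicted 30-day mortality and observed outcome (1 = died).
predicted = [0.02, 0.05, 0.10, 0.20, 0.40, 0.03, 0.30, 0.10]
died = [0, 0, 0, 1, 0, 0, 1, 0]

print("C-statistic:", c_statistic(predicted, died))
print("Calibration-in-the-large:", calibration_in_the_large(predicted, died))
print("Percent discordance:", percent_discordance(predicted, died))
```

A C-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination. With the sign convention used here, a negative calibration-in-the-large value or a positive percent discordance means the score overestimates mortality on average.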
All tests were two-sided and differences were considered statistically significant at a p-value below 0.05. Statistical analysis was performed with IBM SPSS 21.0 software (IBM SPSS Inc., Chicago, IL, USA) and MedCalc version 9.3.8.0 (MedCalc Software, Acacialaan, Ostend, Belgium).
Results
The mean age of the study population was 81±7 years, 57% were female and 72% were in New York Heart Association class III or IV. Mean aortic gradient and valve area were 51.3±15.7 mmHg and 0.68±0.18 cm², respectively. Fifty-three patients (22%) had reduced left ventricular ejection fraction (LVEF) (<40%), 65 (27%) had previous cardiac surgery and 58 (24%) had moderate to severe renal failure (six were on dialysis). The baseline characteristics are shown in Table 1. Transfemoral access was chosen in two-thirds of patients; alternative approaches were transapical, and less commonly transaortic and subclavian. Four types of devices were implanted: the balloon-expandable Sapien (Edwards Lifesciences®), the first-generation self-expandable CoreValve (Medtronic®), Portico (St. Jude Medical®), and Lotus (Boston Scientific®).
Table 1. Baseline characteristics of the study population.
Characteristic | All patients (n=240) | Alive at 30 days (n=226) | Dead at 30 days (n=14) | p |
---|---|---|---|---|
Age, years | 83 (78-87) | 83 (78-87) | 82 (74-87) | 0.875 |
Female | 57% | 57% | 43% | 0.298 |
BMI, kg/m² | 26.1 (23.4-28.8) | 26.1 (23.6-29.1) | 26.6 (25.9-26.9) | 0.641 |
Diabetes | 31% | 30% | 43% | 0.327 |
Coronary artery disease | 46% | 45% | 71% | 0.051 |
Previous cardiac surgery | 27% | 25% | 43% | 0.140 |
Atrial fibrillation | 23% | 23% | 21% | 0.914 |
Cerebrovascular disease | 11% | 11% | 14% | 0.721 |
Respiratory insufficiency | 17% | 17% | 7.1% | 0.319 |
Moderate to severe renal failure | 24% | 24% | 29% | 0.683 |
NYHA III or IV | 73% | 72% | 79% | 0.600 |
LVEF<40% | 22% | 19% | 73% | <0.001 |
LVEF, % | 55 (44-68) | 59 (45-68) | 37 (30-44) | <0.001 |
Transfemoral access | 66% | 66% | 71% | 0.673 |
EuroSCORE II | 5.0 (3.2-8.3) | 5.0 (3.4-7.5) | 9.7 (5.6-21.6) | 0.027 |
STS | 5.1 (3.6-7.1) | 4.9 (3.4-6.8) | 5.8 (5.1-7.8) | 0.033 |
FRANCE-2 | 2.0 (1.0-3.0) | 2.0 (1.5-3.0) | 2.5 (1.0-3.0) | 0.701 |
Aortic valve area, cm² | 0.70 (0.50-0.80) | 0.70 (0.60-0.80) | 0.55 (0.50-0.75) | 0.390 |
Mean gradient, mmHg | 50 (41-60) | 50 (41-60) | 34 (32-38) | <0.001 |
Values are median (interquartile range) unless stated otherwise.
There were 14 deaths in the first 30 days (5.8%). Patients who died had lower LVEF and a lower mean transaortic gradient than survivors. There was a trend toward higher mortality in patients with coronary artery disease (Table 1).
Performance of mortality prediction scores
Discriminative power
The median EuroSCORE II, STS and FRANCE-2 scores were 5.0 (IQR 3.2-8.3), 5.1 (IQR 3.6-7.1) and 2.0 (IQR 1.0-3.0), respectively, with a corresponding 30-day mortality prediction of 8%.
The C-statistics of EuroSCORE II and STS were numerically higher than that of FRANCE-2, but the difference was not statistically significant (C-statistic for EuroSCORE II: 0.67, p=0.029; STS: 0.67, p=0.029; FRANCE-2: 0.53, p=0.724; p=0.26 for the comparison between areas under the curve using the DeLong method) (Figure 1).
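To give a rough sense of the uncertainty around C-statistics estimated from only 14 events, the sketch below applies the Hanley and McNeil (1982) approximation for the standard error of an area under the ROC curve. This is a swapped-in approximation for illustration, not the DeLong method used in the study, and it treats each score independently, whereas the DeLong comparison accounts for the correlation between scores measured in the same patients. The function names are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_events, n_nonevents):
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_nonevents - 1) * (q2 - auc ** 2)
    ) / (n_events * n_nonevents)
    return math.sqrt(variance)


def approximate_ci(auc, n_events, n_nonevents, z=1.96):
    """Normal-approximation 95% interval, clipped to the [0, 1] range."""
    se = hanley_mcneil_se(auc, n_events, n_nonevents)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# C-statistics reported in the text; 14 deaths and 226 survivors at 30 days.
reported = {"EuroSCORE II": 0.67, "STS": 0.67, "FRANCE-2": 0.53}
for name, auc in reported.items():
    low, high = approximate_ci(auc, n_events=14, n_nonevents=226)
    print(f"{name}: C-statistic {auc:.2f}, approximate 95% CI {low:.2f}-{high:.2f}")
```

Under this approximation each C-statistic has a standard error of about 0.08, so the intervals are wide and overlap considerably, which is consistent with the non-significant comparison reported above.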
Calibration
Overall, EuroSCORE II, STS and FRANCE-2 overestimated early mortality ([expected percentage-observed percentage]/observed percentage) by 11.2%, 7.2% and 38.8%, respectively. The calibration plots for EuroSCORE II, STS and FRANCE-2 in quartiles are shown in Figure 2. Despite the higher discordance for FRANCE-2, all scores showed adequate calibration (calibration-in-the-large for EuroSCORE II -0.03, p=0.51; STS -0.25, p=0.54; and FRANCE-2 -0.38, p=0.29) (Figures 3-5).
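The percent discordance formula can be inverted to recover the mean predicted mortality implied by each reported overestimation. The sketch below does this arithmetic from the observed 14 deaths in 240 patients; the results are back-calculated approximations that assume the reported percentages were derived from the unrounded observed mortality, and they are not figures stated by the authors. The function name is hypothetical.

```python
# Observed 30-day mortality from the text: 14 deaths among 240 patients.
observed_pct = 100.0 * 14 / 240

# Reported overestimation, ([expected - observed] / observed) in percent.
overestimation_pct = {"EuroSCORE II": 11.2, "STS": 7.2, "FRANCE-2": 38.8}


def implied_expected_pct(observed, discordance):
    """Invert the percent discordance formula: expected = observed * (1 + d/100)."""
    return observed * (1.0 + discordance / 100.0)


implied = {
    name: implied_expected_pct(observed_pct, d)
    for name, d in overestimation_pct.items()
}

for name, expected in implied.items():
    print(f"{name}: implied mean predicted mortality {expected:.1f}%")
```

Under these assumptions the implied mean predicted mortality is roughly 6.5% for EuroSCORE II, 6.3% for STS and 8.1% for FRANCE-2, against an observed mortality of 5.8%.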
Discussion
In our study, all scores showed low discriminative power for prediction of early mortality, though with adequate calibration.
The early mortality rate of our population (5.8%) is within the range of other contemporary multicenter registries and one recent meta-analysis of more than 16 000 procedures (5.4-12.4%).7-11
There are conflicting data regarding the role of surgical risk scores for early mortality prediction in TAVI patients. In a Swiss study,12 EuroSCORE II performed better in predicting short- and long-term mortality compared with STS and logistic EuroSCORE. A French study showed that EuroSCORE II had moderate discrimination for 30-day mortality after TAVI.13 Watanabe et al.,14 in another French study, demonstrated that EuroSCORE II had low accuracy in predicting 30-day mortality in 435 patients undergoing TAVI.
The low accuracy of the STS score in predicting short-term mortality after TAVI was also demonstrated in a Canadian study involving 399 patients15 and in an Italian study that assessed 663 patients.16 The STS score was an independent predictor of mortality after surgical aortic valve replacement but not after TAVI in the PARTNER trial.17 In addition, in a real-world Brazilian registry, the surgical risk scores were also inaccurate in predicting mortality after TAVI.18 However, in a German registry19 including 36% transapical procedures, the STS score proved to be a good predictor of 30-day mortality after TAVI.
Major differences between the FRANCE-2, EuroSCORE II and STS scores in terms of included comorbidities are shown in Table 2. New prediction scores aim to improve accuracy by including features specific to TAVI, as in the FRANCE-2 score, which was derived from a large TAVI population and is designed to predict 30-day mortality by combining nine variables (age, body mass index, functional class, previous pulmonary edema, pulmonary hypertension, respiratory insufficiency, critical hemodynamic state, dialysis and approach). Nevertheless, FRANCE-2 had the worst performance of the three scores in our population. This may be because, being derived from a registry, it does not take into account prognostically important variables like LVEF, mitral regurgitation, obstructive pulmonary disease, cerebrovascular disease, chronic kidney failure, pulmonary hypertension, coronary artery disease or frailty.20–23 Additionally, procedural success indicators that influence prognosis, particularly perivalvular regurgitation and thrombotic and bleeding events, are not included, which limits its predictive ability. In our population, left ventricular function and mean gradient were independent predictors of death at 30 days (odds ratio [OR] 0.95, 95% CI 0.90-1.00; p=0.039 and OR 0.89, 95% CI 0.83-0.96; p=0.001, respectively). These variables are directly or indirectly included in the EuroSCORE II and STS scores.
Table 2. Main comorbidities included in each of the three models.
Comorbidity | STS | EuroSCORE II | FRANCE-2 |
---|---|---|---|
Peripheral vascular disease | yes | yes | no |
Renal failure | yes | yes | no |
Dialysis | yes | no | yes |
Pulmonary hypertension | no | yes | yes |
Neurological dysfunction | yes | yes | no |
Redo cardiac surgery | yes | yes | no |
Diabetes | yes | no | no |
Atrial fibrillation | yes | no | no |
COPD | yes | yes | yes |
NYHA class | yes | yes | yes |
LVEF | yes | yes | no |
Coronary artery disease | yes | no | no |
COPD: chronic obstructive pulmonary disease; LVEF: left ventricular ejection fraction; NYHA: New York Heart Association; STS: Society of Thoracic Surgeons.
These results suggest that risk prediction tools should not be used in isolation, although they can help in deciding on the best therapeutic option for the individual patient, since procedural success still plays a major role in this complex, albeit increasingly simplified, technique. The role of heart teams is crucial: in many centers they foster closer collaboration between cardiologists and cardiac surgeons, from screening to procedural planning and success.24
To our knowledge, this is the first study to compare the performance of the FRANCE-2, EuroSCORE II and STS risk scores in predicting 30-day mortality after TAVI. It is also one of the first external validations of the FRANCE-2 score.
Limitations
Some limitations of our study should be pointed out. First, it has the inherent limitations of a single-center retrospective study. Secondly, the small number of subjects may have limited the power of the statistical analysis and the ability to find statistical significance for many of the comparisons. Thirdly, the time span of the registry renders the group highly heterogeneous, especially considering that it included the first part of the learning curve of the TAVI program (from patient selection to valve implantation and postprocedural care) in our center. Fourthly, it is not possible to ascertain the extent to which confounders inherent to specific selection criteria for TAVI may have influenced mortality rates and thus the predictive ability of the scores. Finally, full validation of the FRANCE-2 score would require random assignment to either surgical valve replacement or TAVI in a prospective study.
Conclusions
A score derived from a TAVI registry, FRANCE-2, did not improve early mortality prediction after TAVI in comparison to the EuroSCORE II and STS surgical scores. Prospective studies are needed for further validation.
Conflicts of interest
The authors have no conflicts of interest to declare.