Background Prediction of clinical outcomes following allogeneic hematopoietic stem cell transplantation (Allo-HSCT) may support informed decision-making and treatment personalization. We sought to evaluate the performance of six commonly used models in predicting transplantation outcomes.
Methods This was a single-center retrospective study of patients undergoing Allo-HSCT between 2011 and 2015. We calculated the following scores: Hematopoietic Cell Transplantation-Comorbidity Index (HCT-CI), Comorbidity-Age index (AGE-HCTCI), Endothelial Activation and Stress Index (EASIx), Pretransplant Assessment of Mortality (PAM) and revised PAM (rPAM) scores, and the European Group for Blood and Marrow Transplantation (EBMT) score. Predictive accuracy was measured by discrimination, expressed as the area under the receiver operating characteristic curve (AUC), which ranges from 0.5 (no better than chance, i.e. poor) to 1 (perfect, i.e. excellent). Outcomes evaluated were overall survival (OS), non-relapse mortality (NRM), and relapse.
Results Among the 528 patients included, indications for transplantation varied, with acute myeloid leukemia (44%) the most common; non-malignant indications accounted for 3% of cases. Most patients received myeloablative conditioning (MAC; 74%), and HLA-matched sibling donors were the most frequent donor type (46%). Median follow-up was 2.5 years (interquartile range 1.7-3.9). The rPAM, which incorporates patient, disease, donor, and transplantation characteristics, had the highest discrimination for OS, NRM, and relapse at all time points (Figure 1). EASIx, a biomarker-based model integrating pre-transplantation creatinine, lactate dehydrogenase, and platelet (thrombocyte) count, had among the highest AUCs for day-100 OS. Across outcomes, AUCs were mostly 0.6 or below for the HCT-CI, AGE-HCTCI, and EBMT score. In a sub-analysis, the rPAM, EASIx, and AGE-HCTCI were more accurate in predicting 2-year OS in patients receiving MAC, whereas the EBMT score performed better in patients receiving reduced-intensity conditioning (RIC) (Figure 2).
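The two quantitative ingredients above, the EASIx score and AUC-based discrimination, can be illustrated with a minimal sketch. It assumes the standard EASIx definition (LDH in U/L × creatinine in mg/dL ÷ platelets in 10^9/L, often analyzed on a log2 scale) and computes a simple rank-based AUC for a binary outcome at a fixed time point. This is an illustrative assumption, not the study's analysis: it ignores censoring and competing risks (e.g. NRM versus relapse), which a time-dependent AUC would need to handle. The function names and example values are hypothetical.

```python
import math


def easix(ldh_u_per_l: float, creatinine_mg_per_dl: float,
          platelets_10e9_per_l: float) -> float:
    """EASIx = LDH (U/L) x creatinine (mg/dL) / platelets (10^9/L).

    Assumed standard definition; higher values indicate higher risk.
    """
    if platelets_10e9_per_l <= 0:
        raise ValueError("platelet count must be positive")
    return ldh_u_per_l * creatinine_mg_per_dl / platelets_10e9_per_l


def auc(scores: list[float], events: list[int]) -> float:
    """Rank-based AUC for a binary outcome (hypothetical helper).

    Probability that a randomly chosen patient with the event has a higher
    risk score than a randomly chosen patient without it; ties count 1/2.
    Simplification: no handling of censoring or competing risks.
    """
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical values, not data from this study.
    score = easix(300, 1.0, 150)
    print(score, math.log2(score))      # 2.0 1.0
    print(auc([1, 2, 3, 4], [0, 0, 1, 1]))  # 1.0 (perfect separation)
    print(auc([2, 2], [1, 0]))              # 0.5 (no better than chance)
```

Because the AUC here depends only on the ranking of scores, a log2 transformation of EASIx leaves its discrimination unchanged.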
Conclusions Prognostic models in Allo-HSCT have limited predictive performance. Among the models evaluated, the rPAM had the highest discrimination overall. Biomarker-based prediction with EASIx is promising for short-term outcomes.