A novel workflow based on machine learning that integrates TCRβ sequencing data for the identification and ranking of CRC neoantigens.

(A) Tumor biopsies and peripheral blood from CRC patients were subjected to targeted DNA-seq, RNA-seq, and TCR-seq. (B) Peptide-HLA binding and peptide-HLA-TCR binding were predicted with the indicated tools from the DNA-seq, RNA-seq, and TCR-seq data. (C) Machine learning models were then built on the peptide-HLA binding and peptide-HLA-TCR binding features to distinguish immunogenic antigens from non-immunogenic peptides. The immunogenicity of the neoantigen candidates prioritized by the model was validated by ELISpot to evaluate the effectiveness of this approach.

Tumor-infiltrating TCRβ profiles in 27 colorectal cancer patients.

(A) Bar plot of the distribution of TCR clonotypes among 27 CRC patients, split into two groups: clonotypes with a read count of 1 and clonotypes with a read count of 2 or more. (B) Pie charts of the recurrence rates of TCR clones, V segments, and J segments for TCR clones with a read count greater than 1. The charts illustrate the uniqueness of TCR clones and the shared presence of V and J segments. Heatmaps of the Z-scored read counts of (C) V segments and (D) J segments across the 27 samples. Some V and J segments were dominant in all samples. (E) Chord diagram of V-J rearrangements, showing random V and J combinations, with a few combinations occurring at high frequency.
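The split in panel (A) can be sketched as a simple tally. This is a minimal illustration only: it assumes each sequencing read has already been assigned a clonotype identifier, and the function name and the CDR3-like strings are hypothetical, not taken from the study.

```python
from collections import Counter

def split_clonotypes_by_read_count(clonotype_per_read):
    """Count clonotypes seen once versus clonotypes seen two or more times.

    clonotype_per_read: one clonotype identifier per sequencing read
    (hypothetical input format, assumed for illustration).
    """
    reads_per_clonotype = Counter(clonotype_per_read)
    single = sum(1 for n in reads_per_clonotype.values() if n == 1)
    multiple = sum(1 for n in reads_per_clonotype.values() if n >= 2)
    return {"read_count_1": single, "read_count_ge_2": multiple}

# Hypothetical reads: one clonotype seen three times, two seen once.
example_reads = ["CASSLGQETQYF", "CASSLGQETQYF", "CASSLGQETQYF",
                 "CASSPDRGYEQYF", "CAWSVGNTIYF"]
summary = split_clonotypes_by_read_count(example_reads)
```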

Peptide-TCR and peptide-HLA interactions are two complementary determinants of neoantigen immunogenicity.

(A) The histogram displays the HLA percentile distribution of immunogenic antigens (red bar) and non-immunogenic peptides (grey bar). (B) The percentage of immunogenic antigens (red bar) and non-immunogenic peptides (grey bar) is compared between two groups based on HLA percentile: < 2% and ≥ 2% (Chi-square test, p < 0.00001). (C) The histogram displays the TCR ranking distribution of immunogenic antigens (red bar) and non-immunogenic peptides (grey bar). (D) The percentage of immunogenic antigens (red bar) and non-immunogenic peptides (grey bar) is compared between two groups based on TCR ranking: < 2% and ≥ 2% (Chi-square test, p = 0.086). (E) The scatter plot illustrates the relationship between the HLA percentile and the TCR ranking of immunogenic antigens (red) and non-immunogenic peptides (grey). (F) The percentage of immunogenic antigens (red bar) and non-immunogenic peptides (grey bar) is analyzed in four distinct groups based on cutoffs of HLA percentile and TCR ranking. (G) The bar plot illustrates the sensitivity and specificity of three neoantigen prioritization approaches: neoantigen-HLA binding affinity alone (yellow bar), neoantigen-TCR binding ranking alone (blue bar), and the combined method using both features (red bar).
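The comparisons in panels (B) and (D) amount to a chi-square test on a 2x2 table (below versus at or above the 2% cutoff, immunogenic versus non-immunogenic). The following is a minimal sketch using only the Python standard library. It assumes a Pearson chi-square test without continuity correction; the legend does not say which variant was used, and the function names and example counts are hypothetical.

```python
import math

def contingency_by_cutoff(values, labels, cutoff=2.0):
    """Build a 2x2 table: rows are value < cutoff and value >= cutoff,
    columns are immunogenic and non-immunogenic."""
    table = [[0, 0], [0, 0]]
    for value, is_immunogenic in zip(values, labels):
        row = 0 if value < cutoff else 1
        col = 0 if is_immunogenic else 1
        table[row][col] += 1
    return table

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for a 2x2 table, without continuity correction (an assumption)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, not data from the study.
chi2, p_value = chi_square_2x2([[20, 5], [5, 20]])
```

For real analyses a statistics library would normally be preferred; the closed form here is only meant to make the arithmetic behind the reported p-values explicit.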

The combined model demonstrates improved sensitivity and specificity for neoantigen prioritization.

(A) The workflow for constructing the model. (B) The ROC curves show the performance of the combined model and the individual models in the discovery and validation cohorts. The bar graphs illustrate the sensitivity (C), negative predictive value (NPV) (D), and positive predictive value (PPV) (E) of the combined and individual models at specificity levels of at least 95% or 99% in the discovery and validation cohorts. (F) Ranking coverage scores for the specified models in the discovery and validation cohorts.
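Reporting sensitivity, NPV, and PPV "at a specificity of at least 95%" implies choosing a score threshold that satisfies the specificity constraint. One way to do this is sketched below. It assumes that higher scores mean "more likely immunogenic" and that, among the thresholds meeting the constraint, the one with the highest sensitivity is chosen; the source does not state the exact selection rule, and the function name and data are hypothetical.

```python
import math

def metrics_at_specificity(scores, labels, min_specificity=0.95):
    """Pick the threshold with the highest sensitivity among those whose
    specificity is at least min_specificity, and report the metrics."""
    if not any(labels) or all(labels):
        raise ValueError("need at least one positive and one negative label")
    best = None
    # math.inf (predict nothing as positive) guarantees a valid candidate.
    for threshold in sorted(set(scores)) + [math.inf]:
        tp = fp = tn = fn = 0
        for score, is_positive in zip(scores, labels):
            predicted_positive = score >= threshold
            if predicted_positive and is_positive:
                tp += 1
            elif predicted_positive:
                fp += 1
            elif is_positive:
                fn += 1
            else:
                tn += 1
        specificity = tn / (tn + fp)
        if specificity < min_specificity:
            continue
        sensitivity = tp / (tp + fn)
        if best is None or sensitivity > best["sensitivity"]:
            best = {
                "threshold": threshold,
                "sensitivity": sensitivity,
                "specificity": specificity,
                "ppv": tp / (tp + fp) if tp + fp else math.nan,
                "npv": tn / (tn + fn) if tn + fn else math.nan,
            }
    return best

# Hypothetical model scores and ELISpot-style labels (1 = immunogenic).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
strict = metrics_at_specificity(scores, labels, min_specificity=0.95)
relaxed = metrics_at_specificity(scores, labels, min_specificity=0.75)
```

In this toy example the strict setting tolerates no false positives, so one true positive is missed; relaxing the constraint recovers it at the cost of one false positive.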

Validation of neoantigens identified in silico from the novel workflow through ELISpot assays conducted on four CRC patients.

(A) A schematic diagram illustrates the procedural steps of neoantigen prioritization and the ELISpot assay. (B) The count of neoantigens identified from each pipeline. (C) The fold change in IFN-γ spots, relative to the wildtype peptides, is shown for 21 long peptides. Note: Only the mutants that result in a positive value in ELISpot are depicted, along with their corresponding amino acid changes and their associated rankings. (D) ELISpot assays on six long peptides resulting in at least a 2-fold change in IFN-γ spots. (E) The bar graphs display rank coverage scores of validated long peptides identified from the NetMHCpan tool (blue bar) or the combined method (red bar) for individual patients and all patients.

Quality control metrics for Tumor-Infiltrating Lymphocyte (TIL) TCRβ analysis.

(A) Distribution of CDR3β lengths in total TCR clones. (B) The scatter plot illustrates the relationship between Shannon-index and clonality.
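For readers unfamiliar with the two metrics in panel (B), the sketch below shows one common way to compute them from per-clonotype read counts. It assumes the natural-log Shannon index and clonality defined as one minus normalized Shannon entropy; the source does not give its formulas, so these definitions, the function names, and the example counts are assumptions.

```python
import math

def shannon_index(read_counts):
    """Shannon index H = -sum(p_i * ln p_i) over clonotype frequencies."""
    total = sum(read_counts)
    return -sum((c / total) * math.log(c / total)
                for c in read_counts if c > 0)

def clonality(read_counts):
    """Clonality as 1 - H / ln(N), with N the number of clonotypes.

    This definition is an assumption (a commonly used one). A repertoire
    with a single clonotype is treated as fully clonal by convention.
    """
    n_clonotypes = sum(1 for c in read_counts if c > 0)
    if n_clonotypes < 2:
        return 1.0
    return 1.0 - shannon_index(read_counts) / math.log(n_clonotypes)

# Hypothetical repertoires: perfectly even versus dominated by one clone.
even_repertoire = [1, 1, 1, 1]
skewed_repertoire = [97, 1, 1, 1]
```

Under these definitions the two metrics are inversely related for a fixed number of clonotypes, which is the kind of relationship a Shannon-index versus clonality scatter plot is typically used to check.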

Association between TIL TCRβ profiles and patients’ characteristics.

The bar plot and dot plot compare TCR clones, Shannon-index, and clonality between MSI-H and MSS (A, B, C), stage II and III (D, E, F), female and male gender (G, H, I), and distal and proximal tumor locations (K, L, M).

The performance of three machine learning models with three different algorithms is evaluated using receiver operating characteristic (ROC) curves.

The curves depict the performance of the combined model in the discovery cohort (A) and the validation cohort (B).