Schematic diagram of the BinaryClust method. Semi-supervised classification is first performed on selected markers in the user-defined marker expression matrix to classify and annotate the major cell types. Populations of interest can then be extracted and explored using unsupervised clustering methods followed by differential analysis. Figure was generated with BioRender (https://www.biorender.com/).

Agreement evaluation comparing manual gating and BinaryClust in the MPN cohort (n=9). Manual gating of B cells, CD4 T cells, CD8 T cells, dendritic cells, NK cells, monocytes and gamma delta T cells was performed by two independent experts using Cytobank, and the mean population percentages were compared with the BinaryClust results. Each dot represents one patient sample. (A) Scatter plot showing the correlation between the two methods, with the red line indicating perfect agreement (correlation coefficient = 1); (B) Bland-Altman plots of the two measurement methods across all cell populations, with the black line indicating the mean observed difference and the red dotted lines indicating the limits of agreement (mean difference ± 1.96 × standard deviation of the differences).
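As an illustration of the limits of agreement described in panel (B), the following is a minimal sketch using only the Python standard library. The function name `bland_altman_limits` and the sample percentages are hypothetical and are not taken from the study; the sketch assumes the conventional Bland-Altman definition (mean difference ± 1.96 × the sample standard deviation of the paired differences).

```python
from statistics import mean, stdev


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences (method_a - method_b);
    the limits are bias -/+ 1.96 times the sample SD of those differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    sd = stdev(differences)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical population percentages for three samples (illustration only).
manual = [10.0, 20.0, 30.0]
automated = [12.0, 19.0, 33.0]
bias, lower, upper = bland_altman_limits(manual, automated)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

In a plot of this kind, each paired difference is drawn against the mean of the two measurements, with horizontal lines at the bias and at the two limits.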

Comparison of manual gating (manual1 and manual2), BinaryClust and FlowSOM clustering results in the MPN cohort (n=9). Interaction plots showing the individual measurements (percentages) for each study participant, coloured by method, across the main cell lineages (B cells, CD8 T cells, gamma delta T cells, NK cells, dendritic cells, monocytes and CD4 T cells).

Precision, recall, F-measure, and adjusted Rand index (ARI) of the indicated clustering methods.
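For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from per-cell labels, using only the Python standard library. It assumes that the F-measure is the harmonic mean of precision and recall (F1) and that the ARI follows the standard pair-counting formula; the legend does not state the exact definitions used, and the function names and labels here are hypothetical.

```python
from collections import Counter
from math import comb


def precision_recall_f(reference, predicted, population):
    """Precision, recall and F1 for one population, treating `reference` as truth."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == population and p == population for r, p in pairs)
    fp = sum(r != population and p == population for r, p in pairs)
    fn = sum(r == population and p != population for r, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def adjusted_rand_index(reference, predicted):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(reference)
    if n < 2:
        return 1.0
    # Pair counts within contingency cells, reference groups and predicted groups.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(reference, predicted)).values())
    sum_ref = sum(comb(c, 2) for c in Counter(reference).values())
    sum_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = sum_ref * sum_pred / comb(n, 2)
    max_index = (sum_ref + sum_pred) / 2
    if max_index == expected:  # degenerate case, e.g. a single group in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical per-cell labels (illustration only).
gated = ["B", "B", "T", "T", "NK", "NK"]
clustered = ["B", "B", "T", "NK", "NK", "NK"]
print(precision_recall_f(gated, clustered, "T"))
print(adjusted_rand_index(gated, clustered))
```

Precision, recall and F-measure are computed per population against a reference labeling, whereas the ARI summarises agreement over all populations at once and does not depend on the label names.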

Runtime comparison of BinaryClust, FlowSOM and LDA. Bar chart showing the runtime (in seconds) of the three methods on three different datasets.

Cell type characterization and visualisation using the ImmCellTyper pipeline in the MPN dataset (n=9). (A) Intensity distribution of selected phenotypic markers used for BinaryClust classification, coloured by sample_id; (B) Pre-defined expression classification matrix for the MPN dataset: ‘+’ indicates positive, ‘-’ indicates negative and ‘A’ indicates ‘any’; (C) Proportions of the main cell lineages among all cells in the concatenated FCS files after classification; (D) Median marker expression heatmap of the BinaryClust classification results; (E) UMAP plots of a random downsample of 2000 cells per patient, coloured by main cell type based on BinaryClust classification (left) and manual gating results (right); (F) UMAP plots coloured by normalized expression of the indicated markers (CD3, CD4, CD8a, CD20, CD19, CD14 and CD56) across 2000 cells per sample.
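To make the ‘+’ / ‘-’ / ‘A’ notation of panel (B) concrete, the sketch below matches a single cell against a small classification matrix. It is an illustration under stated assumptions, not the BinaryClust implementation: the matrix entries, the marker values and the fixed per-marker thresholds are all hypothetical, and a fixed threshold is used here only as a stand-in for however positive and negative expression is actually determined.

```python
def classify_cell(expression, thresholds, matrix):
    """Return the first cell type whose rules all match, else 'unclassified'.

    Rules: '+' requires expression >= threshold, '-' requires expression
    below the threshold, and 'A' (any) places no constraint on the marker.
    """
    for cell_type, rules in matrix.items():
        matches = all(
            rule == "A"
            or (rule == "+" and expression[marker] >= thresholds[marker])
            or (rule == "-" and expression[marker] < thresholds[marker])
            for marker, rule in rules.items()
        )
        if matches:
            return cell_type
    return "unclassified"


# Hypothetical classification matrix and thresholds (illustration only).
matrix = {
    "CD4 T cells": {"CD3": "+", "CD4": "+", "CD8a": "-", "CD19": "-"},
    "CD8 T cells": {"CD3": "+", "CD4": "-", "CD8a": "+", "CD19": "-"},
    "B cells": {"CD3": "-", "CD4": "A", "CD8a": "A", "CD19": "+"},
}
thresholds = {"CD3": 1.0, "CD4": 1.0, "CD8a": 1.0, "CD19": 1.0}

cell = {"CD3": 3.2, "CD4": 2.5, "CD8a": 0.1, "CD19": 0.0}
print(classify_cell(cell, thresholds, matrix))
```

Cells that match no row are returned as unclassified in this sketch; how such cells are handled in practice is not described in the legend.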

Application of the ImmCellTyper pipeline to the COVID-19 patient dataset (n=82) published by Chevrier et al. (A) Intensity distribution of selected phenotypic markers used for BinaryClust classification, coloured by disease severity (n=22 healthy individuals, 28 mild COVID-19 patients and 38 severe COVID-19 patients); (B) Pre-defined marker expression classification matrix used for BinaryClust; (C) t-SNE plots of 1000 cells per sample, coloured by the main cell types generated by BinaryClust and faceted by study group; (D) The corresponding median marker expression heatmap of the BinaryClust results for the COVID-19 dataset.

Quantification and statistical analysis comparing the study conditions in the COVID-19 dataset (n=82). (A) Stacked histogram of the main cell type composition per individual generated by BinaryClust, grouped by study condition (healthy, mild and severe); (B) Boxplots representing cell abundance frequencies across the study conditions, faceted by main cell type; (C) State marker expression intensities compared between the study groups across the main cell types; (D) Clusters of monocytes and neutrophils were extracted from all cells for downstream interrogation. t-SNE plots of a random downsample of 1000 monocytes (D) and neutrophils (E) per sample, coloured by study condition and by Phenograph clustering results (k=60), respectively. Statistical significance is marked by asterisks: * P<0.05, ** P<0.01, *** P<0.001, **** P<0.0001.

Overall schematic outline of the ImmCellTyper workflow, with a description of each step.

Manual hierarchical gating strategy for the main cell lineages from human PBMC samples (MPN dataset, n=9). All .FCS files were cleaned up to remove doublets, normalisation beads and debris (please refer to the methods for the standard clean-up procedure) and pre-gated for CD45+ leukocytes. (A) Serial bi-axial scatter plots representing the gating diagram for T cell subsets (CD4 T cells, CD8 T cells and gamma delta T cells), NK cells and dendritic cells based on the indicated phenotypic markers; (B) Serial bi-axial scatter plots indicating the gating strategy used to isolate monocytes and B cells from leukocytes. All manual gating was done using the Cytobank platform (https://premium.cytobank.org/cytobank/).

Agreement evaluation between ImmCellTyper and manual gating in the influenza dataset (n=11). Manual gating was performed using Cytobank and exported manually. (A) Correlation plots between ImmCellTyper results and manual gating results for the percentages of CD4 T cells, gamma delta T cells, dendritic cells, NK cells, CD8 T cells and B cells, with the red line indicating perfect agreement (correlation coefficient = 1); (B) Bland-Altman plots of the two measurements in the indicated populations, with the black line indicating the mean difference between measurements and the dotted red lines indicating the limits of agreement (mean difference ± 1.96 × standard deviation of the differences); (C) Precision, recall and F-measure of the ImmCellTyper method in comparison with manual gating in the indicated cell populations.

Boxplots of the indicated cell percentages generated by the different methods. Statistical significance is marked by asterisks: * P<0.05, ** P<0.01, *** P<0.001.

(A) UMAP plots of normalized expression of the indicated markers (CD66b, HLADR, TCRgd, CD20, CD16, CD161) across 2000 cells per sample in the MPN dataset; (B) FlowSOM clustering was performed on the same dataset for comparison with BinaryClust, using k=20 followed by manual annotation of each cluster. The UMAP plot shows the merged FlowSOM clusters with biological annotation (downsampled to 2000 cells per sample); (C) The corresponding median marker expression heatmap after FlowSOM clustering and annotation.

Marker expression heatmaps of (A) monocytes and (B) neutrophils generated by Phenograph clustering.

Clean-up procedure of CyTOF data using Cytobank.

CyTOF antibody panel for the MPN cohort.