1 Introduction

One of the fundamental objectives in neuroscience is understanding how diverse neuronal cell types establish connections to form functional circuits. This understanding serves as a cornerstone for decoding how the nervous system processes information and coordinates responses to stimuli [1]. Despite its importance, the genetic mechanisms that determine the specific connections between distinct neuronal types, especially within complex brain structures, remain elusive [2, 3].

Recent advances in transcriptomics and connectomics provide opportunities to probe this question. Single-cell transcriptomics enables high-resolution profiling of gene expressions across neuronal types [4, 5], while connectomic data offer detailed maps quantifying connections between neuronal cell types [6, 7, 8]. However, the challenge of linking gene expressions derived from single-cell transcriptomics to the neuronal type connectivity evident in connectomic data, and thereby uncovering the genetic underpinnings of that connectivity, has yet to be fully addressed.

Drawing inspiration from the field of machine learning, particularly recommendation systems, we introduce a bilinear model to bridge this gap. This model, in the context of recommendation systems, has been successful in capturing intricate user-item interactions [9]. By treating the gene expressions of pre- and post-synaptic neurons and their connectivity akin to users, items, and their ratings, we adapt the architecture of recommendation systems to the neurobiological domain. We hypothesize that a similar model could capture the complex relationships between genetic patterns of presynaptic and postsynaptic neurons and their connectivity.

This bilinear modeling approach was first applied to a Caenorhabditis elegans (C. elegans) neuronal dataset, where it not only matched but slightly outperformed the spatial connectome model (SCM) in reconstructing the connectivity of electrical synapses or gap junctions from innexin gene expressions. Notably, it revealed additional genetic interactions beyond those uncovered by the SCM. When extended to mouse retinal neurons, we demonstrate that it could effectively reconstruct synaptic connectivity between bipolar cells (BCs) and retinal ganglion cells (RGCs) from their gene expressions. The model not only unveils connectivity motifs between BCs and RGCs but also provides biologically meaningful insights into candidate genes and the genetic interactions that orchestrate this connectivity. Furthermore, our model predicts potential BC partners for RGC transcriptomic types, with these predictions aligning substantially with functional descriptions of these cell types from previous studies. Collectively, this work significantly contributes to the ongoing exploration of the genetic code underlying neuronal connectivity and suggests a potential paradigm shift in the analysis of single-cell transcriptomic data in neuroscience.

2 Background and Related Work

2.1 Synaptic Specificity

The intricate neural networks that form the basis of our nervous system are a product of specific synaptic connections between different types of neurons. This specificity is not a mere coincidence but a meticulously orchestrated process that underpins the functionality of the entire network [3, 10]. Each neuron can form thousands of connections, or synapses, with other neurons, and the specificity of these connections determines the neuron’s function and, by extension, the network’s function as a whole.

Synaptic specificity encompasses both chemical synapses, which rely on neurotransmitter-mediated communication between pre- and post-synaptic neurons [3], and electrical synapses, where direct transmission of ions or small molecules occurs via gap junctions [10]. A classic example of chemical synaptic specificity is observed in the retina, where different types of BCs form specific synaptic connections with various types of RGCs [7, 11, 12]. These connections create parallel pathways that transform visual signals from photoreceptors to RGCs, which subsequently transmit the information to the brain [13, 14]. Meanwhile, specific gap junction connections, composed of connexins in vertebrates and innexins in invertebrates, have been observed between C. elegans neurons [15, 16, 17, 18, 19]. They function broadly in neural circuits of sensory processing and behavioral output [10, 20].

The genetic principles guiding the formation of these specific connections, particularly in complex brain structures, remain elusive. The brain’s complexity, with its billions of neurons and trillions of synapses, poses significant challenges in identifying the specific genes and genetic mechanisms that guide the formation of these connections. Despite advances in genetic and neurobiological research, such as understanding the roles of certain recognition molecules and adhesion molecules in synaptic specificity, the genetic foundation of connectivity between neuronal types is still largely unknown [3, 21, 10].

Emerging tools and technologies offer unprecedented opportunities to unravel these mysteries. Among these, the transcriptome and the connectome are particularly promising [3, 22]. The transcriptome, the complete set of RNA transcripts produced by the genome, can provide valuable insights into the genes that are active in different types of neurons and at different stages of neuronal development. This can help identify candidate genes that may play a role in guiding neuronal connectivity. The connectome, on the other hand, provides a detailed map of the connections between neurons. By combining information from the transcriptome and the connectome, it is possible to link specific genes to specific connections, thereby shedding light on the genetic basis of synaptic connectivity.

2.2 Previous Approaches

Prior research has reported several methodologies to unravel the genetic underpinnings of neuronal connectivity. For instance, Kaufman et al. showed a correlation between gene expression of C. elegans neurons and their connectivity [23], and Varadan et al. developed an entropy minimization approach for understanding the molecular logic of synaptic connectivity in C. elegans [24]. These models, however, did not fully account for spatial constraints for synaptic formation.

In response, subsequent studies proposed methodologies that integrate gene expressions with neuronal connectivity, taking into consideration physical contacts between neurons [25, 26, 27]. Specifically, the Spatial Connectome Model (SCM) in Kovács et al. correlates the gene expression of neurons with their connectivity via a rule matrix. This model aims to minimize the discrepancy between the connectivity predicted from gene expression and the observed connectivity. By restricting the analysis to neuron pairs that are in physical contact, the SCM transforms the original problem into a regression between the Kronecker product of the gene expression matrix and an edge list that captures neuronal connectivity [25].

Additionally, Taylor et al. introduced the network differential gene expression analysis (nDGE), a statistical method that expands upon traditional differential gene expression analysis by examining the co-expression of gene pairs between neuron pairs, comparing synaptic versus non-synaptic neuronal groups through t-tests. It incorporates physical contacts between neurons through the generation of “pseudoconnectomes” for null distribution estimation. Unlike multivariate methods such as the SCM, nDGE operates as a mass-univariate method, focusing on single gene pairs’ contributions to synaptic formation without considering the complex interactions among multiple co-expressed genes. This makes nDGE’s findings inherently conservative, ensuring strict control over type 1 errors but potentially underestimating the multifaceted nature of synaptic connectivity [27].

While the SCM and nDGE models have focused on the connectivity of individual neurons and were tested using C. elegans datasets, their generalization to neuronal cell types has not been explored. As we move from the invertebrate nervous systems to the neural architectures of vertebrates, such as those in mice or macaques, we need methodologies capable of unraveling the genetic basis of neuronal type connectivity [4, 28].

2.3 Collaborative Filtering

Our strategy draws inspiration from the concept of collaborative filtering using bilinear models, a technique fundamental to recommendation systems [29, 30]. These systems predict a user’s preference for an item (e.g., a movie or product) based on user-item interaction data.

Bilinear models capture the interaction between users and items via low-dimensional latent features [9, 31]. Mathematically, for user i and item j, we denote their original features as x_i ∈ ℝ^{1×p} and y_j ∈ ℝ^{1×q}, respectively. These features are then projected into a shared latent space with dimension d via the transformations x_i A (where A ∈ ℝ^{p×d}) and y_j B (where B ∈ ℝ^{q×d}). The predicted rating of the user for the item is then formulated as:

$$ r_{ij} = x_i A (y_j B)^\top = x_i A B^\top y_j^\top \tag{1} $$

In the context of collaborative filtering, the goal is to optimize the transformation matrices A and B to align the predicted rating r_ij with the ground-truth z_ij. This is expressed as the following optimization problem:

$$ \min_{A,B} \sum_{i,j} \left( z_{ij} - x_i A (y_j B)^\top \right)^2 \tag{2} $$

Or in the matrix form:

$$ \min_{A,B} \left\| Z - (XA)(YB)^\top \right\|_F^2 \tag{3} $$

Here, X and Y stack the user and item feature vectors as rows, Z is the matrix of ground-truth ratings, and the objective is to minimize the Frobenius norm of the residual matrix Z − (XA)(YB)^T.
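
As a concrete illustration of the bilinear prediction and its matrix form, here is a minimal NumPy sketch (array names and sizes are illustrative, not drawn from the datasets used later):

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, p, q, d = 100, 80, 20, 15, 5
X = rng.normal(size=(n_users, p))   # user features, one row per user
Y = rng.normal(size=(n_items, q))   # item features, one row per item
A = rng.normal(size=(p, d))         # projection of user features into the latent space
B = rng.normal(size=(q, d))         # projection of item features into the latent space

# Predicted rating matrix: R[i, j] = x_i A (y_j B)^T
R = (X @ A) @ (Y @ B).T             # shape (n_users, n_items)

# Squared Frobenius norm of the residual against ground-truth ratings Z
Z = rng.normal(size=(n_users, n_items))
loss = np.sum((Z - R) ** 2)
```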

In our study, we interpret neuronal connectivity through the lens of recommendation systems, viewing presynaptic neurons as “users”, postsynaptic neurons as “items”, and the synapses formed between them as “ratings”. Our chosen bilinear model extracts latent features of pre- and post-synaptic neurons from their respective gene expressions. One key advantage of the bilinear model is its capacity to assign different weights to the gene expressions of pre- and post-synaptic neurons, enabling the model to capture not just homogeneous but also complex, heterogeneous interactions fundamental to understanding neuronal connectivity. Prior studies have highlighted such heterogeneous interactions, noting the formation of connections between pre- and post-synaptic neurons expressing different cadherins, indicative of a heterogeneous adhesion process [32, 33].

3 Bilinear Model for Neuronal Type Connectivity

We discuss the bilinear model for neuronal type connectivity in the following two scenarios: the first in which gene expression and connectivity of each cell are known simultaneously and the second where connectivity and gene expressions of neuronal types are from different sources. The bilinear models for these two situations are illustrated in Figure 1.

Figure 1: Illustration of our approach. (a) In an ideal scenario where gene expression profiles and connectivity data of individual cells are available simultaneously, we establish the relationship between connectivity and gene expression profiles via two transformation matrices A and B. (b) In practical situations where gene expression profiles and connectivity data are derived from distinct sources, such as single-cell transcriptomic and connectomic data, we propose that the connectivity of individual cells and their latent gene expression features can be approximated by the averages of their corresponding cell types, and establish their relationship through transformation matrices Â and B̂.

3.1 Gene Expression and Connectivity of Each Cell are Known Simultaneously

3.1.1 Objective Function

We begin with an ideal scenario where both the gene expression profiles and connectivity of individual cells are known concurrently. In this setting, we have a presynaptic neuronal types and b postsynaptic neuronal types, indexed by i and j, respectively. Each type contains a number of neurons, signified as n_i for presynaptic and n_j for postsynaptic types. The gene expression vector for the kth cell in presynaptic type i is designated as x_(ik), where k ∈ {1, 2, …, n_i}, while for the lth cell in postsynaptic type j, it is y_(jl) with l ∈ {1, 2, …, n_j}. We depict the connectivity metric between a presynaptic neuron and a postsynaptic neuron as z_(ik)(jl).

Drawing from the principles of collaborative filtering, we develop the following optimization objective:

$$ \min_{A,B} \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{1}{n_i n_j} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} \left( z_{(ik)(jl)} - x_{(ik)} A \left( y_{(jl)} B \right)^\top \right)^2 \tag{4} $$

Here, A and B denote the transformation matrices we aim to learn. This formula can also be expressed in its matrix form as:

$$ \min_{A,B} \left\| W \circ \left( Z - (XA)(YB)^\top \right) \right\|_F^2 \tag{5} $$

In this equation, ∘ denotes the element-wise (Hadamard) product and W symbolizes a weight matrix whose elements are w_(ik)(jl) = 1/√(n_i n_j), so that the squared norm reproduces the 1/(n_i n_j) weighting of equation 4. As our study focuses on the genetic code of pre- and post-synaptic neuronal types rather than individual neurons, this weight matrix ensures that the model does not disproportionately favor neuronal types with a greater number of neurons over rarer types. Note that this formulation can be generalized to individual cell level analysis by treating each cell as a type and setting n_i = n_j = 1, thus allowing exploration of genetic underpinnings of connectivity at the single-cell resolution.

In the context of the high dimensionality of gene expressions, the bilinear model may face a common issue in machine learning called multicollinearity, a condition in which predictor variables are highly correlated with one another. To mitigate this, we can perform principal component analysis (PCA) on the gene expression vectors, transforming them into a new coordinate system and removing components with negligible eigenvalues to reduce redundant information. Alternatively, we can apply regularization techniques, such as L2 regularization (Ridge) or L1 regularization (Lasso), to effectively manage the multicollinearity. These regularization methods work by imposing a penalty on the size of the linear coefficients in the model, thereby shrinking the coefficients and stabilizing their estimates.

3.1.2 Optimization Algorithm

Incorporating L2 regularization, we minimize the following loss function with regularization hyperparameters λA and λB:

$$ L(A, B) = \left\| W \circ \left( Z - (XA)(YB)^\top \right) \right\|_F^2 + \lambda_A \left\| A \right\|_F^2 + \lambda_B \left\| B \right\|_F^2 \tag{6} $$

To optimize this function, we propose an alternative gradient descent algorithm. This algorithm alternates between updating the transformation matrices A and B, using the gradient descent optimization method.

The algorithm begins by initializing transformation matrices A and B using random values drawn from a standard normal distribution. The central aspect of the algorithm is an iterative loop that alternates the updates of A and B. During each iteration, the algorithm first computes the predicted connectivity metric Z using the current estimates of A and B. Subsequently, the gradient of the loss function with respect to the transformation matrices is calculated, and the matrices are updated by moving in the negative gradient’s direction. This iterative process is repeated until the transformation matrices A and B converge to a steady solution. Upon completion, the algorithm yields the optimized transformation matrices.

This gradient descent-based algorithm provides a computationally efficient solution to the bilinear mapping problem between gene expression profiles and connectivity metrics. As a result, it produces associations between gene expression profiles of cell types and their connectivity.

Algorithm 1: Alternative Gradient Descent (AGD) for the loss function in Section 3.1.2
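
A minimal NumPy sketch of such an alternating update for the weighted, L2-regularized loss above. This is an illustrative implementation rather than the reference one: the learning rate is an assumption, while the stopping rule (loss change below 10^−6, at most 10^6 iterations) is borrowed from Section 5.

```python
import numpy as np

def agd(X, Y, Z, W, d, lam_a=1.0, lam_b=1.0, lr=1e-3, max_iter=1_000_000, tol=1e-6, seed=0):
    """Alternating updates of A and B for the weighted, L2-regularized bilinear loss.

    X: (n_pre, p) presynaptic gene expressions; Y: (n_post, q) postsynaptic gene expressions
    Z: (n_pre, n_post) connectivity metrics; W: (n_pre, n_post) weights; d: latent dimensionality
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], d))     # initialized from a standard normal distribution
    B = rng.standard_normal((Y.shape[1], d))
    prev_loss = np.inf
    for _ in range(max_iter):
        resid = W * (Z - (X @ A) @ (Y @ B).T)    # weighted residual W ∘ (Z - (XA)(YB)^T)
        loss = np.sum(resid ** 2) + lam_a * np.sum(A ** 2) + lam_b * np.sum(B ** 2)
        if abs(prev_loss - loss) < tol:          # stop when the loss change is below the threshold
            break
        prev_loss = loss
        # Update A with B fixed, then recompute the residual and update B with the new A
        grad_a = -2 * X.T @ (W * resid) @ (Y @ B) + 2 * lam_a * A
        A -= lr * grad_a
        resid = W * (Z - (X @ A) @ (Y @ B).T)
        grad_b = -2 * Y.T @ (W * resid).T @ (X @ A) + 2 * lam_b * B
        B -= lr * grad_b
    return A, B
```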

3.2 Connectivity and Gene Expressions of Neuronal Types are from Different Sources

3.2.1 Objective Function

In real scenarios, gene expression profiles and connectivity information are often derived from separate sources, such as single-cell sequencing [34, 35] and connectome data [7, 36, 37]. Bridging these datasets requires classifying neurons into cell types based on their gene expression profiles and morphological characteristics. These cell types from different sources are subsequently aligned according to established biological knowledge (e.g., specific gene markers are known to be expressed in certain morphologically-defined cell types [38]).

The primary challenge in this scenario is that, while we can align cell types (denoted by indices i and j in equation 4), we are unable to associate individual cells (represented by indices k and l in equation 4). To tackle this issue, we adopt a simplifying assumption that the connectivity and latent gene expression features of individual cells can be approximated by the averages of their corresponding cell types. This premise hinges on the notion that the connectivity metrics and latent gene expression features of individual cells are close enough to the mean value of their corresponding cell types.

As a result, our optimization objective in equation 4 becomes:

$$ \min_{A,B} \sum_{i=1}^{a} \sum_{j=1}^{b} \left( z_{(i\cdot)(j\cdot)} - x_{(i\cdot)} A \left( y_{(j\cdot)} B \right)^\top \right)^2 \tag{7} $$

In this equation, z_(i·)(j·) denotes the mean connectivity metric between presynaptic cell type i and postsynaptic cell type j, while x_(i·) and y_(j·) represent the average gene expression vectors of cell types i and j, respectively.

While optimizing the transformation matrices A and B, we impose constraints on these matrices to ensure that the variance of latent gene expression features within each neuronal type is minimized. Specifically, we define ϵ as a small enough value and impose the following constraint on A:

$$ \operatorname{tr}\!\left( A^\top \Sigma_x A \right) \le \epsilon \tag{8} $$

where

$$ \Sigma_x = \frac{1}{\sum_{i=1}^{a} n_i} \sum_{i=1}^{a} \sum_{k=1}^{n_i} \left( x_{(ik)} - x_{(i\cdot)} \right)^\top \left( x_{(ik)} - x_{(i\cdot)} \right) \tag{9} $$

and the analogous constraint on B:

$$ \operatorname{tr}\!\left( B^\top \Sigma_y B \right) \le \epsilon \tag{10} $$

where

$$ \Sigma_y = \frac{1}{\sum_{j=1}^{b} n_j} \sum_{j=1}^{b} \sum_{l=1}^{n_j} \left( y_{(jl)} - y_{(j\cdot)} \right)^\top \left( y_{(jl)} - y_{(j\cdot)} \right) \tag{11} $$

These conditions assure that the latent gene expression features of individual cells are proximate enough to the average value within their respective cell types. With these constraints in mind, we formulate the optimization problem as follows:

$$ \min_{A,B} \left\| \bar{Z} - (\bar{X} A)(\bar{Y} B)^\top \right\|_F^2 \quad \text{subject to} \quad \operatorname{tr}\!\left( A^\top \Sigma_x A \right) \le \epsilon, \;\; \operatorname{tr}\!\left( B^\top \Sigma_y B \right) \le \epsilon \tag{12} $$

In this equation, Z̄ ∈ ℝ^{a×b} collects the mean connectivity metrics z_(i·)(j·), and X̄ ∈ ℝ^{a×p} denotes the average gene expressions of the a presynaptic cell types, wherein each element x̄_im is indicative of the average gene expression feature m within cell type i. Likewise, Ȳ ∈ ℝ^{b×q} represents the average gene expressions of the b postsynaptic cell types, with each element ȳ_jm signifying the average gene expression feature m in cell type j.

In practical application, we approximate Σx and Σy with their diagonal estimates diag(σ²_x1, …, σ²_xp) and diag(σ²_y1, …, σ²_yq) [39, 40]. We then transform the initial optimization problem into the following:

$$ \min_{\hat{A},\hat{B}} \left\| \bar{Z} - (\hat{X}\hat{A})(\hat{Y}\hat{B})^\top \right\|_F^2 \quad \text{subject to} \quad \|\hat{A}\|_F^2 \le \epsilon, \;\; \|\hat{B}\|_F^2 \le \epsilon \tag{13} $$

Here, elements in X̂ ∈ ℝ^{a×p} are defined as x̂_im = x̄_im/σ_xm and elements in Ŷ ∈ ℝ^{b×q} are given by ŷ_jm = ȳ_jm/σ_ym, with the rescaled transformation matrices Â = diag(σ_x1, …, σ_xp)A and B̂ = diag(σ_y1, …, σ_yq)B absorbing the feature-wise standard deviations. The optimization of this formulation tends to be computationally more tractable.

In summary, our methodology adapts when gene expression profiles and the connectivity matrix originate from distinct sources. Instead of aligning at the level of individual cells, we focus on the alignment of neuronal types. We achieve this by mapping gene expressions into a latent space via transformation matrices Â and B̂, with the optimization process aiming to minimize the discrepancies between these two sources of information while maintaining consistency of the gene expression features within individual neuronal types.

3.2.2 Optimization Algorithm

To solve the optimization problem as outlined in equation 13, we construct the following loss function:

$$ L(\hat{A}, \hat{B}) = \left\| \bar{Z} - (\hat{X}\hat{A})(\hat{Y}\hat{B})^\top \right\|_F^2 + \lambda_A \|\hat{A}\|_F^2 + \lambda_B \|\hat{B}\|_F^2 \tag{14} $$

where λA and λB are hyperparameters whose optimal values are determined through a grid search.

To optimize this loss function, we employ an alternative gradient descent algorithm analogous to that described in Section 3.1.2, by iteratively updating the transformation matrices Â and B̂.

Algorithm 2: Alternative Gradient Descent (AGD) for the loss function in Section 3.2.2

4 Datasets and Pre-processing

To validate and assess the efficacy of our bilinear model, we utilized two distinct datasets available from previous studies:

4.1 Gap Junction Connectivity and Innexin Expression Data of C. elegans Neurons

We first used a dataset of gap junction connectivity and innexin expressions of individual C. elegans neurons. Derived from the work of Cook et al. [41] and subsequently analyzed by Kovács et al. [25], this dataset included expression profiles of 18 innexin genes across 184 neurons, alongside detailed gap junction connectivity between these neurons. We followed the same procedure outlined by Kovács et al. to obtain the innexin expression matrix X and Y (in this case X = Y with the dimensions of 184 × 18), and the connectivity matrix between individual C. elegans neurons Z.

To incorporate spatial constraints by considering only neuron pairs in physical contact, we extracted a contact matrix from the dataset. This was transcribed into the weight matrix W in our model, with values set to 0 for neuron pairs without physical contact and 1 for those with contact. This enabled our bilinear model to focus on the 5,592 neuron pairs that exhibit physical contacts, restricting the analysis to biologically plausible connections.

The utilization of this dataset serves a dual purpose. It not only provides a validation for our bilinear model but also enables a direct comparison with the model employed by Kovács et al., offering a comprehensive evaluation of the bilinear model in the context of established connectomic research.

4.2 Single-cell Transcriptomic and Connectomic Data of Mouse Retinal Neurons

The second dataset encompassed data of mouse retinal neurons, integrating single-cell transcriptomic data from various studies with connectomic data obtained from the EyeWire project. The data provide us with connectivity information and gene expression profiles of mouse BCs and RGCs, and are important for applying our proposed bilinear model and testing its effectiveness in a more complex neuronal environment compared to the C. elegans dataset.

4.2.1 Single-cell Transcriptomic Data

The single-cell transcriptomic data include the gene expression profiles for two classes of mouse retinal neurons - presynaptic BCs as reported by Shekhar et al. [34], and postsynaptic RGCs as reported by Tran et al. [35].

Preprocessing of this data adhered to previously documented procedures [34, 35, 42]. The transcript counts within each cell were first normalized to align with the median number of transcripts per cell, followed by a log-transformation of the normalized counts. Highly variable genes (HVGs) were then selected using an approach based on establishing a relationship between the mean expression level and the coefficient of variation [43, 44, 45]. We focused on those cells whose types correspond with the neuronal types outlined in the connectomic data, as delineated later in Table S1, Table S2, and Table S3. This yielded two matrices, X and Y, representing presynaptic BCs and postsynaptic RGCs, where each row pertains to a cell and each column represents an HVG. The dimensions of X and Y are 22453 × 17144 and 3779 × 12926, respectively.

Next, we performed a principal component analysis (PCA) on these matrices to transform the gene expression data into the principal component (PC) space. We retained only the PCs that account for a cumulative 95% of explained variance. Consequently, the gene expression of the BCs in X and the RGCs in Y were featurized by their respective PCs, resulting in matrices of dimensions 22453 × 11323 and 3779 × 3142, respectively.
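
A sketch of this preprocessing pipeline using NumPy and scikit-learn; the HVG selection rule below is a simplified stand-in for the published mean versus coefficient-of-variation procedure, and the cutoff is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

def preprocess(counts):
    """counts: (n_cells, n_genes) raw transcript counts for one neuronal class."""
    # Normalize each cell to the median number of transcripts per cell, then log-transform
    per_cell = counts.sum(axis=1, keepdims=True)
    norm = counts / per_cell * np.median(per_cell)
    log_norm = np.log1p(norm)

    # Simplified HVG selection: keep genes whose coefficient of variation exceeds a cutoff
    mean = log_norm.mean(axis=0)
    cv = log_norm.std(axis=0) / (mean + 1e-8)
    hvg_mask = cv > np.quantile(cv, 0.5)          # illustrative cutoff
    hvg = log_norm[:, hvg_mask]

    # PCA retaining the components that explain 95% of the cumulative variance
    pca = PCA(n_components=0.95, svd_solver="full")
    pcs = pca.fit_transform(hvg)
    return pcs, pca, hvg_mask
```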

Based on each cell’s neuronal type, we computed the variance of gene expression features within these types. Mathematically, the variance of gene expression feature m within the BC types and within the RGC types is expressed as:

$$ \sigma_{x_m}^2 = \frac{1}{\sum_{i} n_i} \sum_{i} \sum_{k=1}^{n_i} \left( x_{(ik)m} - \bar{x}_{(i\cdot)m} \right)^2, \qquad \sigma_{y_m}^2 = \frac{1}{\sum_{j} n_j} \sum_{j} \sum_{l=1}^{n_j} \left( y_{(jl)m} - \bar{y}_{(j\cdot)m} \right)^2 \tag{15} $$

Taking x̄_(i·)m and ȳ_(j·)m to represent the average gene expression feature m of the BC type i and the RGC type j, we were able to construct the matrices X̂ and Ŷ, in which x̂_im = x̄_(i·)m/σ_xm and ŷ_jm = ȳ_(j·)m/σ_ym. In these matrices, each row represents a cell type, with the dimensions of X̂ being 25 × 11323 and Ŷ being 12 × 3142. These matrices serve to bridge the gene expression of BC types and RGC types with the connectivity matrix of these neuronal types derived from the connectomic data.
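
The construction of these type-level matrices can be sketched as follows (assuming `pcs` holds the PC-space expression features from the previous step and `types` holds an integer type label per cell; both names are illustrative):

```python
import numpy as np

def type_level_matrix(pcs, types):
    """Build a type-by-feature matrix with elements mean/std, as described above.

    pcs: (n_cells, n_features) PC-space gene expression features
    types: (n_cells,) integer cell-type label for each cell
    """
    labels = np.unique(types)
    means = np.stack([pcs[types == t].mean(axis=0) for t in labels])   # (n_types, n_features)
    # Pooled within-type standard deviation of each feature, as in equation (15)
    within = np.concatenate([pcs[types == t] - pcs[types == t].mean(axis=0) for t in labels])
    sigma = within.std(axis=0) + 1e-8
    return means / sigma, labels

# X_hat, bc_types = type_level_matrix(bc_pcs, bc_labels)     # 25 x 11323 in our data (illustrative names)
# Y_hat, rgc_types = type_level_matrix(rgc_pcs, rgc_labels)  # 12 x 3142
```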

4.2.2 Connectivity Data

The connectivity matrix of neuronal types is derived from connectomic data acquired through the process of serial electron microscopy (EM)-based reconstruction of brain tissues [6, 7, 8]. From these reconstructed tissues, connectivity measurements are usually expressed as either the contact area or the number of synapses between neurons [7, 46]. When normalized to the total contact area or total number of synapses of each neuron, the resulting metric, ranging from 0 to 1, signifies the percentage of contact area or synapses formed between neurons. This normalized metric provides a quantitative connectivity measure, where 0 indicates no connectivity and 1 implies complete connectivity between two neurons.

Our analysis utilized the neural reconstruction data of mouse retinal neurons, courtesy of the EyeWire project, a crowd-sourced initiative that generates 3D reconstructions of neurons from serial section EM images [47]. This extensive dataset facilitated the derivation of a comprehensive connectivity matrix between two classes of mouse retinal neurons - BCs [37] and RGCs [36]. The data were sourced from the EyeWire Museum (https://museum.eyewire.org/), which offers detailed information for each cell in a JSON file, including attributes like “cell id”, “cell type”, “cell class”, and “stratification”. The stratification profile describes the linear density of voxel volume as a function of the inner plexiform layer (IPL) depth [47, 37, 36].

We approximated the connectivity metric between a BC and an RGC using the cosine similarity of their stratification profiles. Let v_ik and v_jl denote the stratification profiles of the kth cell in BC type i and the lth cell in RGC type j, respectively. The connectivity metric z_(ik)(jl) between these two neurons can be expressed as:

$$ z_{(ik)(jl)} = \frac{v_{ik} \cdot v_{jl}}{\left\| v_{ik} \right\| \left\| v_{jl} \right\|} \tag{16} $$

This metric represents the degree of overlap in their voxel volume profiles within the IPL, yielding the connectivity matrix Z between mouse BCs and RGCs. To allow for both positive and negative values within the matrix, we standardized Z by subtracting its mean and then dividing by its standard deviation. Subsequently, the connectivity matrix Z̄ between mouse BC and RGC neuronal types was calculated, with each element z_(i·)(j·) representing the average of the connectivity metrics between cells of BC type i and cells of RGC type j.
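
A sketch of this connectivity construction (the stratification profiles and type labels are assumed inputs with illustrative names):

```python
import numpy as np

def connectivity_from_stratification(strat_bc, strat_rgc, bc_types, rgc_types):
    """Cosine similarity of stratification profiles, standardized and averaged per type pair.

    strat_bc: (n_bc_cells, n_depth_bins); strat_rgc: (n_rgc_cells, n_depth_bins)
    """
    nb = strat_bc / np.linalg.norm(strat_bc, axis=1, keepdims=True)
    nr = strat_rgc / np.linalg.norm(strat_rgc, axis=1, keepdims=True)
    Z = nb @ nr.T                                   # cell-by-cell cosine similarities, equation (16)
    Z = (Z - Z.mean()) / Z.std()                    # standardize to allow positive and negative values

    bc_labels, rgc_labels = np.unique(bc_types), np.unique(rgc_types)
    Z_bar = np.array([[Z[np.ix_(bc_types == i, rgc_types == j)].mean()
                       for j in rgc_labels] for i in bc_labels])
    return Z_bar                                    # (n_bc_types, n_rgc_types) type-level connectivity
```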

4.2.3 Correspondence of Mouse Retinal Cell Types

Aligning neuronal types as annotated in the single-cell transcriptomic data and those identified in the connectomic data was informed by findings from previous studies. Notably, a one-to-one correspondence exists between BC cell types classified by Shekhar et al. [34] and Greene et al. [37]. This correspondence is presented in Table S1.

Regarding RGC types, alignment between cell types annotated in Tran et al. [35] and Bae et al. [36] was established primarily based on the findings from Goetz et al. [38]. This study presents a unified classification of mouse RGC types, based on their functional, morphological, and gene expression features. The corresponding RGC types were mainly obtained from Supplementary Table S3 of Goetz et al. (Table S2), with additions derived from Supplementary Table S1 of Tran et al., based on the expressions of genetic markers of these RGC types (Table S3).

5 Model Training, Validation, and Comparison

Our approach of training and validating the bilinear model involved an iterative optimization of the transformation matrices using the AGD algorithm, as outlined in Section 3. The primary goal was to minimize the defined loss function. With the matrices initially generated from a standard normal distribution, the optimization process continued until the loss change was less than a threshold of 10^−6, or a maximum of 10^6 iterations were completed.

During optimization, we focused on two key hyperparameters: the regularization parameters, λA and λB, and the latent feature space dimensionality. Preliminary tests indicated that a lower loss was achieved when both regularization parameters were set equally, leading us to consolidate them into a single parameter, λ.

5.1 C. elegans Neuronal Dataset

For the C. elegans dataset, which provides simultaneous gene expression and connectivity data for individual cells, we employed the model configuration described in section 3.1. The model’s hyperparameters, λ and the latent feature space dimensionality, were fine-tuned through 5-fold cross-validation, exploring a range of values for λ and different dimensions for the latent feature space. The optimal hyperparameters were identified based on the lowest validation loss observed during cross-validation (Figure S1).
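
A sketch of this hyperparameter search; it assumes folds are formed over the neuron pairs carried by the weight matrix and reuses the `agd` sketch from Section 3.1.2, so the grid values and fold construction are illustrative rather than the exact published procedure:

```python
import numpy as np
from itertools import product
from sklearn.model_selection import KFold

def cv_grid_search(X, Y, Z, W, lambdas, dims, n_splits=5, seed=0):
    """Pick lambda and the latent dimensionality by 5-fold CV over the weighted entries of Z."""
    obs = np.argwhere(W > 0)                         # observed (in-contact) neuron pairs
    best, best_loss = None, np.inf
    for lam, d in product(lambdas, dims):
        fold_losses = []
        for train_idx, val_idx in KFold(n_splits, shuffle=True, random_state=seed).split(obs):
            W_train = np.zeros_like(W)               # keep only the training pairs in the weights
            tr = obs[train_idx]
            W_train[tr[:, 0], tr[:, 1]] = W[tr[:, 0], tr[:, 1]]
            A, B = agd(X, Y, Z, W_train, d, lam_a=lam, lam_b=lam)   # sketch from Section 3.1.2
            Z_hat = (X @ A) @ (Y @ B).T
            va = obs[val_idx]                        # validation loss on the held-out pairs
            fold_losses.append(np.mean((Z[va[:, 0], va[:, 1]] - Z_hat[va[:, 0], va[:, 1]]) ** 2))
        if np.mean(fold_losses) < best_loss:
            best, best_loss = (lam, d), np.mean(fold_losses)
    return best
```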

Given the prior utilization of this dataset in validating the SCM proposed by Kovács et al. [25], our bilinear model was positioned for a direct comparison with the SCM. The SCM introduced a rule matrix O with the aim to minimize the discrepancy between the observed connectivity and the gene expression-based predicted connectivity XOX^T, employing L2 regularization on O. Our bilinear model echoes this approach, where we seek to minimize the divergence between the connectivity matrix and the bilinearly predicted connectivity XA(XB)^T, with L2 regularization imposed on matrices A and B. In essence, the bilinear form decomposes the rule matrix into two lower-dimensional matrices, which represent projections onto latent dimensions.

To quantitatively compare the bilinear model’s transformation matrix product Ô = AB^T with the SCM’s rule matrix O, and to systematically identify the genetic interactions each model uniquely captured, we introduced the discrepancy score (DS). For each pair of corresponding entries ô_ij and o_ij at indices i and j, the DS is calculated as follows:

$$ \mathrm{DS}_{ij} = \frac{\left| \hat{o}_{ij} - o_{ij} \right|}{\left| \hat{o}_{ij} \right| + \left| o_{ij} \right|} \tag{17} $$

This metric, ranging from 0 to 1, quantifies the relative discrepancy between the two matrices, normalizing it in relation to their magnitudes. A score close to 1 indicates a large discrepancy, while a score near 0 suggests a negligible difference between the entries. Through this lens, we can further scrutinize the corresponding entries with the score above a certain threshold to reveal specific genetic interactions captured by one model but potentially missed by the other.
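
A short sketch of the DS computation, together with the thresholding described in Section 6.1.2 (DS > 0.5, excluding entry pairs where both magnitudes fall below 0.1):

```python
import numpy as np

def discrepancy_score(O_hat, O, eps=1e-12):
    """Entry-wise discrepancy score between the bilinear product AB^T and the SCM rule matrix."""
    return np.abs(O_hat - O) / (np.abs(O_hat) + np.abs(O) + eps)

def flagged_entries(O_hat, O, ds_thresh=0.5, mag_thresh=0.1):
    """Indices of entries with substantial disagreement, filtering out pairs where both
    values are small (regularization pushes unimportant coefficients toward zero)."""
    ds = discrepancy_score(O_hat, O)
    mask = (ds > ds_thresh) & ~((np.abs(O_hat) < mag_thresh) & (np.abs(O) < mag_thresh))
    return np.argwhere(mask)
```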

5.2 Mouse Retinal Neuronal Dataset

The model’s application to the mouse retina dataset, which involves gene expression and connectivity data from disparate sources, was facilitated by the approach outlined in section 3.2. Optimal hyperparameters were determined through 5-fold cross-validation, adjusting λ and exploring various dimensionalities for the latent feature space (Figure S2). Notably, the lowest validation loss was achieved with the dimensionality of two. Given the chosen hyperparameters, we performed the final round of training on the entire dataset to yield the definitive transformation matrices  and .

To assess the consistency of our model under PCA pre-processing across different replicates, we repeated the optimization procedure five times, each time adhering to the previously identified optimal hyperparameters. In the context of our solution, where Â = [u1 u2] and B̂ = [v1 v2], with vectors u1, v1 representing coefficients for the first latent dimension and u2, v2 for the second, we noted that negating the coefficients of any latent dimension in both matrices (for instance, Â = [−u1 u2] and B̂ = [−v1 v2]) results in an equivalent solution. Therefore, to compare solutions across different repetitions, we calculated the absolute value of the cosine similarity for each latent dimension’s coefficient vectors, and reported the similarity between solutions as the average of these values across the two latent dimensions. Moreover, we recognized that swapping the positions of the coefficient vectors (yielding Â = [u2 u1] and B̂ = [v2 v1]) also leads to an equivalent solution. To accommodate this, we evaluated both the original and swapped vector pairings for each repetition. The final measure of consistency was determined by taking the maximum of the two average absolute cosine similarities, ensuring a comprehensive and robust assessment of solution consistency across multiple runs.
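
A sketch of this consistency measure for a pair of two-dimensional solutions (handling the sign and column-order ambiguities as described; names are illustrative):

```python
import numpy as np

def abs_cosine(u, v):
    """Absolute cosine similarity between two coefficient vectors."""
    return np.abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def solution_similarity(A1, A2):
    """Average absolute cosine similarity between two solutions of shape (p, 2),
    taking the better of the original and swapped column pairings."""
    original = np.mean([abs_cosine(A1[:, 0], A2[:, 0]), abs_cosine(A1[:, 1], A2[:, 1])])
    swapped = np.mean([abs_cosine(A1[:, 0], A2[:, 1]), abs_cosine(A1[:, 1], A2[:, 0])])
    return max(original, swapped)
```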

We observed a high degree of consistency across multiple repetitions of the solutions under PCA pre-processing (Figure S3). The majority of the average absolute cosine similarity scores are close to 1, and even the minimum observed similarities are well above 0.75, suggesting that the optimization yields stable solutions.

6 Results

6.1 Comparative Analysis using C. elegans Neuronal Data

6.1.1 Reconstruction of C. elegans Gap Junction Connectivity from Innexin Expressions

Utilizing the C. elegans neuronal dataset, we first tried to reconstruct the gap junction connectivity network based solely on the expression profiles of innexin genes. Using A and B generated by the bilinear model, we processed the innexin expression data to predict gap junction connectivity between neuron pairs as XA(YB)^T (Figure 2a). This approach was then compared to the SCM proposed by Kovács et al. [25], which used a rule matrix O to correlate gene expression with observed connectivity in the form of XOX^T (Figure 2b).

Figure 2: Reconstructed gap junction connectivity from innexin expression data. (a) Connectivity matrix predicted by the bilinear model. (b) Connectivity matrix modeled from Kovács et al.’s SCM. (c) Observed gap junction connectivity matrix, serving as ground truth. The color spectrum from red to gray denotes the spectrum from strong connections to weak or no connections. (d) ROC curves from both the bilinear model and the SCM. Dashed line indicates the chance level.

The effectiveness of both models was evaluated against the observed gap junction connectivity matrix of C. elegans neurons (Figure 2c). Given the binary nature of the ground truth matrix (where 1 denotes a connection and 0 indicates its absence) and the continuous nature of the reconstructed connectivity matrices from both models, we conducted Receiver Operating Characteristic (ROC) analysis. This involves varying a threshold to binarize the continuous predictions and, at each possible cutoff, plotting the true positive rate against the false positive rate. This process yields the ROC curve, a graphical representation of the trade-off between sensitivity and specificity across thresholds (Figure 2d).

Subsequently, we calculated the Area Under the ROC Curve (AUC), providing a singular value summarizing the overall predictive performance of the model across all thresholds. The ROC-AUC metric is particularly informative as it aggregates the model’s effectiveness over all possible thresholds, with a score of 1 indicating perfect prediction and 0.5 denoting a performance no better than random chance. From the calculation, the bilinear model achieved a ROC-AUC score of 0.6435, slightly surpassing the SCM’s score of 0.6428. While both scores are reasonably close, the slight edge of the bilinear model indicates its nuanced efficiency in mapping gene expressions to connectivity. However, it is noteworthy that both scores, while above 0.5, are substantially distant from the ideal score of 1. This observation suggests that relying exclusively on innexin expression data might be insufficient for fully capturing the detailed gap junction connectivity in C. elegans.
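
A sketch of this ROC analysis using scikit-learn, assuming, as with the weight matrix of Section 4.1, that evaluation is restricted to neuron pairs in physical contact:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def roc_on_contacts(Z_true, Z_pred, contact):
    """ROC curve and AUC over the neuron pairs in physical contact.

    Z_true: binary observed gap junction matrix; Z_pred: continuous reconstruction XA(YB)^T;
    contact: binary physical-contact matrix of the same shape (all names are illustrative).
    """
    mask = contact > 0
    y_true, y_score = Z_true[mask], Z_pred[mask]
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return fpr, tpr, roc_auc_score(y_true, y_score)
```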

6.1.2 Comparison of Rule Matrix from SCM and Bilinear Transformation Matrices

In light of the challenge in fully capturing the C. elegans gap junction connectivity based on innexin expression data alone, instead of analyzing connectivity motifs between C. elegans neurons, our focus pivoted towards exploring and comparing the genetic rules inferred by both the bilinear model and the SCM, which was also the key discussion presented in Kovács et al. [25]. As mentioned in Section 5.1 and discussed in Section 7, the product of the bilinear transformation matrices, Ô = AB^T, can be interpreted as a lower-dimensional reconstruction of the rule matrix O used in the SCM. This perspective steered us to a meticulous comparative analysis between the two matrices.

The rule matrix solved from the SCM establishes a baseline for the comparison (Figure 3b). Against that, we compared the product of the bilinear transformation matrices (Figure 3a). Visualization of the two matrices suggests a high degree of similarity between them, which is quantitatively supported by a Pearson correlation coefficient of 0.90 (p < 0.001), underscoring a strong alignment.

Figure 3: Genetic rules from the bilinear model and the SCM. (a) The rule matrix AB^T derived from the bilinear model. (b) The rule matrix O from the SCM. Black boxes highlight entries with substantial differences.

To discern specific genetic interactions uniquely characterized by each model, we applied the DS metric to corresponding matrix entries (Figure S4a). This metric, ranging from 0 (no discrepancy) to 1 (maximum discrepancy), was thresholded at 0.5 to highlight entries with substantial differences. Further, to account for the regularization effect that pushes less important coefficients toward zero, we filtered out entry pairs where both values were less than 0.1 (Figure S4b,c). The remaining pairs are highlighted in black boxes in both matrices (Figure 3).

Comparing the values of highlighted entry pairs, we found that the bilinear model not only captured all genetic interactions identified by the SCM but also inferred additional ones: certain innexins (inx-11, inx-8, inx-5, and inx-2) were implicated in co-expression patterns within connected neurons, while another set (inx-11, inx-9, inx-3, inx-5, inx-7) was associated with an avoidance pattern, suggesting a lack of co-expression in neuron pairs forming gap junctions. These findings provide extra candidates to be tested in future experiments.

6.2 Application of Bilinear Model to Mouse Retinal Neuronal Data

6.2.1 Bilinear Model Reconstructs Neuronal Type-Specific Connectivity Map from Gene Expression Profiles

In our application of the bilinear model to the mouse retinal neuronal data, upon completion of the final training process, our optimized bilinear model produced the transformation matrices Â and B̂. We used these matrices to project the normalized single-cell transcriptomic data, X̂ and Ŷ, into a shared latent feature space. Consequently, we obtained projected representations for BC and RGC types, X̂Â and ŶB̂, respectively. With these latent representations, we were able to reconstruct the cell-type-specific connectivity matrix (X̂Â)(ŶB̂)^T (Figure 4a).

Figure 4: Reconstruction of connectivity map from gene expression profiles. (a) The reconstructed connectivity matrix, derived from the shared latent feature space projections. (b) The connectivity matrix obtained from connectomic data. Differences in color intensity represent the strength of connections, with dark red indicating strong connections and dark blue indicating weak or no connections.

To evaluate our model, we compared the reconstructed connectivity matrix with the one derived from connectomic data (Figure 4b). We calculated the Pearson correlation coefficient between entries of the two matrices to assess their agreement. The resulting correlation of 0.83 (p < 0.001) demonstrated a robust association between the transformed gene expression features and the connectomic data. This result attests to our model’s capability in capturing the relationship between these two distinct types of biological information.

To gain insights into our model’s reconstruction accuracy, we employed the DS metric to identify entries with substantial deviations between the reconstructed and the actual connectivity matrices (Figure S5a). This examination specifically quantified the extent of connections in the target matrix (positive entries) that were not captured in the model’s reconstruction (negative entries) (Figure S5b,c). Notably, the analysis revealed that only a small fraction, specifically 9 out of 115 connections, were not represented in the reconstructed matrix.

6.2.2 Bilinear Model Recapitulates Recognized Connectivity Motifs

Our cross-validation procedure indicated that the optimal number of latent dimensions was two (Figure S2). This finding suggested that these two dimensions capture the essential connectivity motifs between BC and RGC types. This led us to further investigate what these motifs are and how they differ from each other.

We first reconstructed connectivity using only the first latent dimension. The first dimension appeared to emphasize connectivity patterns between BCs and RGCs that laminate within the IPL’s central region, as well as those that laminate within the marginal region (Figure 5a,d,g). We then reconstructed connectivity using only the second latent dimension. Notably, the spotlight shifted to connections between BCs and RGCs that laminate within the outer and inner regions of the IPL, respectively (Figure 5b,e,h).

Figure 5: Distinct connectivity motifs revealed by the two latent dimensions. (a, b) The reconstructed connectivity using only latent dimension 1 or 2, respectively. Differences in color intensity represent the strength of connections. (c) BC types plotted in the latent feature space, with each point representing a specific BC type. Dashed lines indicate zero values for latent dimensions 1 and 2. (d, e) Stratification profiles of BC types in IPL, color-coded based on their positions along the first (d) or second (e) latent dimension. Red indicates BC types on the positive half, while blue indicates BC types on the negative half. (f) RGC types plotted in the latent feature space, with each point representing a specific RGC type. (g, h) Stratification profiles of RGC types in IPL, color-coded based on their positions along the first (g) or second (h) latent dimension. Dashed lines in (d) and (g) mark the positions of ON and OFF SACs [36]. BCs and RGCs stratifying between them tend to exhibit more transient responses, and those stratifying outside them exhibit more sustained responses. Dashed lines in (e) and (h) denote the boundary of the outer and inner IPL [36]. Synapses between BCs and RGCs in the outer retina mediate OFF responses, while those in the inner retina mediate ON responses.

To confirm these observations, we further visualized BC and RGC types within the two-dimensional latent feature space (Figure 5c,f). Grouping BC and RGC types based on whether they fell within the positive or negative halves of the latent dimensions, we color-coded their stratification profiles within the IPL by group. BCs and RGCs that fell within the positive half of latent dimension 1 tend to stratify within the IPL’s central region, delineated by the boundaries formed by the ON and OFF starburst amacrine cells (SACs) (Figure 5d,g). Conversely, those falling within the negative half of this dimension tend to stratify in the marginal region of the IPL. As for the second latent dimension, BCs and RGCs that fell within the positive half predominantly stratify in the inner region of the IPL (Figure 5e,h), while those within the negative half primarily stratify in the IPL’s outer region.
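
The single-dimension reconstructions in Figure 5a,b correspond to rank-1 terms of the bilinear product; a minimal sketch:

```python
import numpy as np

def reconstruct_per_dimension(X_hat, A_hat, Y_hat, B_hat, k):
    """Connectivity reconstructed from latent dimension k only (k = 0 or 1).

    A_hat: (p, 2); B_hat: (q, 2); X_hat: (25, p); Y_hat: (12, q), as defined in Section 4.2.1.
    """
    u = X_hat @ A_hat[:, [k]]          # BC types projected onto dimension k, shape (25, 1)
    v = Y_hat @ B_hat[:, [k]]          # RGC types projected onto dimension k, shape (12, 1)
    return u @ v.T                     # rank-1 contribution to the full reconstruction
```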

Interestingly, these distinct connectivity motifs align with two widely recognized properties of retinal neurons: kinetic attributes that reflect the temporal dynamics (transient versus sustained responses) of a neuron responding to visual stimuli, and polarity (ON versus OFF responses) reflecting whether a neuron responds to the initiation or cessation of a stimulus [48, 11, 12, 49]. This correlation implies that our bilinear model has successfully captured key aspects of retinal circuitry from gene expression data.

6.2.3 Bilinear Model Reveals Interpretable Insights into Gene Signatures Associated with Different Connectivity Motifs

The inherent linearity of our bilinear model affords a significant advantage: it enables the direct interpretation of gene expressions by examining their associated weights in the model. These weights signify the importance of each gene in determining the connectivity motifs between the BC and RGC types. We identified the top 50 genes with the largest positive or negative weights for BCs and RGCs across both latent dimensions. We plotted their weights alongside their expression profiles in the respective cell types (Figure 6).

Figure 6: Gene signatures associated with the two latent dimensions. (a, b) Weight vectors of the top 50 genes for latent dimension 1, along with their expression patterns in BC types (a) and RGC types (b). The weight value is indicated in the color bar, with the sign represented by color (red: positive and blue: negative), and the magnitude by saturation. The expression pattern is represented by the size of each dot (indicating the percentage of cells expressing the gene) and the color saturation (representing the gene expression level). BC and RGC types are sorted by their positions along latent dimension 1, as shown in Figure 5c,f, with the dashed line separating the positive category from the negative category. (c, d) Weight vectors of the top 50 genes for latent dimension 2, and their expression patterns in BC types (c) and RGC types (d), depicted in the same manner as in (a) and (b). BC and RGC types are sorted by their positions along latent dimension 2.
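
Because the model was trained on PCA-reduced features, gene-level weights such as those shown in Figure 6 can be recovered by mapping the latent weights back through the PCA loadings; the sketch below assumes the scikit-learn convention in which `pca.components_` has shape (number of PCs, number of genes):

```python
import numpy as np

def top_genes_per_dimension(A_hat, pca_components, gene_names, k, n_top=50):
    """Top genes by absolute weight for latent dimension k.

    A_hat: (n_PCs, d) transformation matrix learned in PC space
    pca_components: (n_PCs, n_genes) PCA loadings used during pre-processing
    """
    gene_weights = pca_components.T @ A_hat            # (n_genes, d): weights back in gene space
    order = np.argsort(-np.abs(gene_weights[:, k]))[:n_top]
    return [(gene_names[i], gene_weights[i, k]) for i in order]
```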

Our analysis unveiled distinct gene signatures associated with the connectivity motifs revealed by the two latent dimensions. In the first latent dimension, genes like CDH11 and EPHA3, involved in cell adhesion and axon guidance, carried high weights for BCs forming synapses in the IPL’s central region. In contrast, for BCs synapsing in the marginal region, we observed high weights in the cell adhesion molecule PCDH9 and the axon guidance cue UNC5D (Figure 6a). This pattern was echoed in RGCs but involved a slightly different set of molecules. For example, in RGCs forming synapses in the IPL’s central region, the cell adhesion molecule PCDH7 carried high weights, whereas for RGCs synapsing in the marginal region, cell adhesion molecules PCDH11X and CDH12 were associated with high weights (Figure 6b).

The second latent dimension revealed a comparable pattern, albeit with different gene signatures. For BCs laminating in the IPL’s outer region, high weights were assigned to guidance cues such as SLIT2, NLGN1, EPHA3 and PLXNA4, as well as the adhesion molecule DSCAM. For BCs in the inner region, the adhesion molecule CNTN5 was associated with a high weight (Figure 6c). In RGCs, we noticed that guidance molecules such as PLXNA2, SLITRK6 and PLXNA4 along with adhesion molecules CDH8 and LRRC4C were associated with high weights for cells forming synapses in the IPL’s outer region. In contrast, the adhesion molecule SDK2 was among the top genes for RGCs laminating and forming synapses in the IPL’s inner region (Figure 6d). Some of these genes or gene families, such as Plexins (PLXNA2, PLXNA4), Contactin5 (CNTN5), Sidekick2 (SDK2), and Cadherins (CDH8,11,12), are known to play crucial roles in establishing specific synaptic connections [50, 51, 52, 53, 32, 33, 54]. Others, particularly delta-protocadherins (PCDH7,9,11x), emerged as new candidates potentially mediating specific synaptic connections [3].

To elucidate the biological implications of these identified gene sets, we further conducted Gene Ontology (GO) enrichment analysis on the top genes through g:Profiler, a public web server for GO enrichment analysis [55, 56]. This tool allowed us to delve into the molecular functions, cellular pathways, and biological processes associated with these genes. Intriguingly, when we listed the top 10 significant GO terms for each latent dimension based on their adjusted p-values, we found two common themes: neuronal development and synaptic organization (Table S4). Table S4 also highlights the number of the top genes associated with each GO term, revealing that overall about 47% of these genes are involved in neural development and synaptic organization. Such findings underscore the potential roles of these genes in forming and shaping the specific connections between BC and RGC types.

6.2.4 Bilinear Model Predicts Connectivity Partners of Transcriptomically-Defined RGC Types

The success of recommendation systems in accurately predicting the preferences of new users inspired us to leverage the bilinear model for predicting the connectivity partners of RGC types whose interconnections with BC types remain uncharted. There are some RGC types defined from single-cell transcriptomic data [35], which lack clear correspondence with those identified through connectomics studies [36]. This discrepancy leaves the connectivity patterns of these transcriptionally-defined RGC types unknown, providing an opportunity for our model to predict their BC partners.

To accomplish this, we first projected these RGC types into the same latent space as those used to train the model (Figure 7a). We then employed this projection to construct a connectivity matrix between these RGC types and BC types (Figure 7b), facilitating educated estimates about their connectivity partners. For each transcriptionally-defined RGC type, we identified the top three BC types as potential partners, determined by the highest values present in the connectivity matrix. These three BC types could provide insight into the potential synaptic input to each RGC type. Detailed predictions are presented in Table S5.

Figure 7: BC partner prediction of transcriptionally-defined RGC types. (a) Projection of transcriptionally-defined RGC types with unknown connectivity into the same latent space as those with known connectivity. (b) The resulting predicted connectivity matrix between these RGC types and BC types. Transcriptionally-defined RGC types are named according to Tran et al. [35].
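
A sketch of this prediction step, assuming the new RGC types are type-averaged and standard-deviation-normalized with the same convention as the training types before projection with the learned B̂ (all names are illustrative):

```python
import numpy as np

def predict_bc_partners(Y_hat_new, X_hat, A_hat, B_hat, bc_type_names, n_partners=3):
    """Project uncharted RGC types into the shared latent space and rank candidate BC partners.

    Y_hat_new: (n_new_rgc_types, q) type-averaged, std-normalized expression of new RGC types
    """
    bc_latent = X_hat @ A_hat                        # known BC types in the latent space
    rgc_latent = Y_hat_new @ B_hat                   # new RGC types projected into the same space
    Z_pred = bc_latent @ rgc_latent.T                # predicted BC-by-RGC connectivity
    top = np.argsort(-Z_pred, axis=0)[:n_partners]   # highest-connectivity BC types per RGC type
    return [[bc_type_names[i] for i in top[:, j]] for j in range(Z_pred.shape[1])]
```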

Although the ground truth connectivity of these RGC types remains unknown due to the absence of matching types in connectomic data, Goetz et al. [38], via Patch-seq, attempted to match some transcriptomic types with functionally defined RGC types. These functional descriptions may hint at the BC partners of these RGC types. For instance, an RGC exhibiting OFF sustained responses is likely to be synaptically linked with BC types bc1-2, known to mediate OFF sustained pathways. Conversely, an RGC that displays ON sustained responses likely receives synaptic inputs from BC types bc6-9, which oversee ON sustained pathways. We summarized these functional descriptions in Table S5, referencing Figure 5A from Goetz et al. [38], and highlighted whether our predictions were consistent with these functional annotations. Among the ten predictions made, eight aligned with these functional descriptions, lending support to the predictive power of our model.

7 Discussion

7.1 Summary of Study

This study showcased a novel application of the bilinear modeling approach within the realm of gene expression analysis of neuronal type connectivity, drawing inspiration from recommendation systems - a machine learning domain focused on capturing intricate interactions between users and items and predicting user preferences. This analogy served as a useful framework in our study, where the roles of users and items in the recommendation systems are mirrored by presynaptic and postsynaptic neurons, respectively. Likewise, the user-item preference matrix corresponds to the synaptic connection matrix in neural circuits. The recommendation systems are based on the assumption that user preferences and item attributes can be represented by latent factors; similarly, our model assumes that synaptic connectivity between various neuron types is determined by a shared latent feature space derived from gene expression profiles.

The applicability and effectiveness of our bilinear model were validated using two different datasets. Applying it to the C. elegans neuronal dataset, which includes gap junction connectivity and innexin expression data at the individual neuron level, we showed that the model could be generalized to single-cell level connectivity by treating each cell as an individual type (Section 3.1), and could incorporate spatial constraints such as physical contact between neurons into the weight matrix (Section 4.1). In a more complex scenario where the transcriptomic and connectomic data are from different sources and aligned at the neuronal-type level, we demonstrated the model’s capability in decoding the genetic underpinnings of the connectivity between neuronal types (Section 3.2), using the mouse retinal neuronal dataset (Section 4.2). This emphasizes the model’s potential in offering insights into the genetic mechanisms that orchestrate synaptic connections across various nervous systems.

7.2 Insights from Analysis of C. elegans Dataset and Comparison with SCM

Using the C. elegans neuronal dataset, we conducted a comparative analysis between our bilinear model and the SCM, which correlates neuronal innexin expression with gap junction connectivity via a rule matrix [25, 26]. The SCM incorporates spatial constraints, such as physical contact between neurons, and represents the connectome as an edge list for regression against the Kronecker product of the gene expression matrix. Our model is closely related to the SCM, as it can be seen as a factorization of the rule matrix into the product of two lower-dimensional transformation matrices. This factorization not only yielded a performance comparable to, if slightly better than, the SCM in reconstructing the gap junction connectivity matrix, but also revealed potential new innexin interactions for experimental exploration (Figure 2; Figure 3).

Beyond these, a crucial advantage of our bilinear model lies in its computational efficiency, an attribute of significance when scaling to larger datasets, where the number of genes and the number of neurons or neuronal types escalate to the order of thousands, such as those of the mouse or macaque cortex [57, 58]. In such situations, the computational complexity of the SCM is substantial, given its reliance on the Kronecker product’s dimensions and subsequent matrix inversion. In contrast, the computational demands of our bilinear model, driven primarily by matrix multiplication during gradient descent, are considerably more manageable, offering scalability and feasibility even as dataset sizes increase. Furthermore, the requirement to calculate the Kronecker product in the SCM significantly heightens memory usage, which is critical when the data scale is large but memory resources are constrained. These advantages make our bilinear model a scalable solution when applied to other organisms and brain regions.

In assessing the bilinear model’s and the SCM’s performance in reconstructing C. elegans gap junction connectivity, the resulting modest ROC-AUC scores (approximately 0.64, much lower than the ideal 1.0) underscore the challenges in predicting electrical synapse specificity using innexin expressions alone. This suggests that additional molecular mechanisms, beyond innexin interactions, play crucial roles in forming specific electrical synaptic connections. Indeed, in the realm of chemical synapses, it’s increasingly recognized that synaptic specificity is significantly influenced by factors such as cell-cell adhesion and recognition molecules, rather than just the pre- or post-synaptic machinery [3]. Recent studies support this viewpoint. For instance, research on the C. elegans motor circuit has revealed how a developmental program fine-tunes cAMP signaling to guide neuron-specific assembly of electrical synapses [59]. Furthermore, the observed coexistence of electrical and chemical synapses in close proximity intimates potential shared mechanisms underlying their specificity [60].

7.3 Insights from Application to Mouse Retinal Neuronal Dataset

Applied to the mouse retinal neuronal dataset, our bilinear model successfully reconstructed a neuronal type-specific connectivity map from gene expression profiles and recapitulated two core connectivity motifs of the retinal circuit, representing synapses formed in central or marginal parts of the IPL, and synapses formed in outer or inner regions (Figure 4; Figure 5). These motifs align well with recognized properties of retinal neurons: kinetic attributes (transient versus sustained responses) and polarity (ON versus OFF responses) [48, 11, 12, 49]. Significantly, these motifs aren’t predefined or explicitly encoded into the model; instead, they emerge naturally from it, further attesting to the model’s power to capture key aspects of retinal circuitry.

The bilinear model also revealed unique insights into the gene signatures associated with the connectivity motifs. The weight vectors in the transformation matrices provide a means to assess the relative importance of individual genes. This direct interpretability is a significant advantage of the linear model, allowing for a more intuitive understanding of the gene-to-connectivity transformation process. Our analysis discovered distinct gene signatures associated with different connectivity motifs (Figure 6). Among these genes, some have been previously implicated in mediating specific synaptic connections, thereby validating our approach. For instance, Plexins A4 and A2 (PLXNA4, PLXNA2), predicted to be crucial for RGCs’ synapsing in the outer IPL, have been shown to be necessary for forming specific lamina of the IPL in the mouse retina, interacting with the guidance molecule Semaphorin 6A (SEMA6A) [50, 51]. Contactin5 (CNTN5), which our model predicted as vital for BCs forming synapses in the inner IPL, has been shown to be essential for synapses between ON BCs and the ON lamina of ON-OFF direction-selective ganglion cells (ooDSGCs) [52]. Sidekick2 (SDK2), predicted to be critical for RGCs’ synapses in the inner IPL, has been shown to guide the formation of a retinal circuit that detects differential motion [53]. Similarly, Cadherins (CDH8,11,12), whose combinations have been implicated in synaptic specificity within retinal circuits [32, 33], were highlighted for multiple connectivity motifs. In particular, Cadherin8 (CDH8), which our model predicted to be crucial for RGCs’ synaptic connections in the outer IPL, has been shown to be guided by the transcription factor Tbr1 for laminar patterning of J-RGCs, a type of OFF direction-selective RGCs [54]. In addition to these validated gene signatures, our analysis identified promising candidate genes that may mediate specific synaptic connections. Particularly, delta-protocadherins (PCDH7,9,11x) appeared as potential new candidates. While their roles in synaptic connectivity aren’t fully understood [3], mutations in delta-protocadherins in mice and humans have been linked with various neurological phenotypes, including axon growth and guidance impairments and changes in synaptic plasticity and stability [61, 62, 63, 64]. Future experimental studies are needed to validate these findings and further unravel the roles of these genes in neural circuit formation and function in the mouse retina.

The bilinear model's utility extends beyond the identification of gene signatures, emerging as a potent tool for hypothesis generation, particularly in predicting connectivity for transcriptionally defined neuronal types whose synaptic partners remain uncharted (Figure 7). Trained on data from a specific neural region, the bilinear model can anticipate synaptic partners for newly characterized transcriptional types within that region, thereby generating hypotheses about their functional roles within neural circuits. Furthermore, this model opens avenues for inferring neural wiring alterations resulting from genetic manipulations. For instance, by altering the genetic profile of certain neuronal types to create new transcriptionally defined types, we can use the model to predict changes in their synaptic partners, offering insights into the consequent reconfiguration of neural networks. This could be further extended to hypothesize how the brain rewires under neurodevelopmental disorders such as autism, where significant connectome changes suggest shifts in synaptic partner choices [65, 66]. With the recent availability of neuronal gene expression data for autism [67, 68, 69], our model stands poised to predict the implications of such genetic profiles on neural circuitry, guiding research toward understanding and treating this disorder.
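To make this hypothesis-generation use case concrete, here is a minimal sketch of how a trained bilinear model might score candidate BC partners for a newly characterized RGC transcriptomic type; matrix names and orientations are assumptions for illustration, with the inner-product readout following the latent-embedding structure of the model.

```python
import numpy as np

def predict_bc_partners(X_bc, A, y_rgc_new, B):
    # X_bc: BC types x genes expression matrix.
    # A, B: trained transformation matrices (genes -> latent dimensions).
    # y_rgc_new: 1 x genes expression vector of the newly characterized RGC type.
    bc_latent = X_bc @ A              # embed BC types in the shared latent space
    rgc_latent = y_rgc_new @ B        # embed the new RGC type
    scores = bc_latent @ rgc_latent.T # inner products = predicted connectivity
    return scores.ravel()             # one score per BC type; rank to get partners
```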

While our bilinear model offers valuable insights into the connectivity motifs of retinal circuits and the associated gene signatures, with many findings aligning with existing literature, it is important to acknowledge certain limitations of this study. Firstly, the model’s connectivity matrix was deduced from stratification profiles derived from EM reconstruction. Although prior research has indicated stratification as a meaningful indicator of connectivity within the mouse retina, as certain BC types preferentially connect with specific RGC types stratified in the same lamina [32, 53, 33], this metric may not capture the entire complexity of synaptic connections [70]. The incorporation of additional experimental data, such as electrophysiological measurements, could enhance both the accuracy and the reliability of the connectivity metrics. Secondly, the model, despite its overall success in reconstructing the connectivity matrix, missed several connections, notably among specific BC-RGC pairs such as those between RGC types 51, 5ti and BC types 3a, 3b, and 4 (Figure S5). This highlights the potential for a more complex approach, such as deep learning models, to capture the subtleties of synaptic connections. Finally, the list of top genes identified by our model is enriched with genes directly mediating synapse formation and maintenance, such as adhesion molecules (Figure 6; Table S4), yet overlooks transcription factors like Tbr1 known to affect synaptic specificity [54]. These factors, impacting various neuronal functionalities, might not be captured by a linear model that inherently favors predictor variables that strongly correlate with the target variable.
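Regarding the first limitation above, the following sketch illustrates one simple way a connectivity metric can be derived from stratification profiles, namely as the overlap of normalized IPL depth profiles; this is an illustrative assumption for intuition only, not necessarily the exact metric used in this study.

```python
import numpy as np

def overlap_connectivity(strat_bc, strat_rgc, eps=1e-12):
    # strat_bc: BC types x IPL depth bins; strat_rgc: RGC types x IPL depth bins.
    # Each row is a non-negative stratification density across IPL depth.
    bc = strat_bc / (strat_bc.sum(axis=1, keepdims=True) + eps)    # normalize profiles
    rgc = strat_rgc / (strat_rgc.sum(axis=1, keepdims=True) + eps)
    return bc @ rgc.T   # BC-by-RGC overlap scores as a connectivity proxy
```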

8 Future Directions

8.1 Experimental Validation of Candidate Genes

The bilinear model enables prediction of how synaptic connections may change when the expression of candidate genes is altered. Emerging genome editing technologies, particularly CRISPR/Cas9 [71, 72], offer a precise and efficient way to validate these predictions experimentally. By leveraging CRISPR/Cas9, targeted genetic manipulations, such as gene silencing or modification, can be conducted to assess their impact on synaptic connectivity. In the context of the mouse retina, CRISPR/Cas9 components can be delivered into BCs or RGCs through electroporation or adeno-associated virus (AAV) vectors, respectively, allowing for targeted gene intervention [73, 74].

The identification of delta-protocadherins (PCDH7, 9, 11x) as potential mediators of synaptic specificity in the mouse retina presents an exciting opportunity for experimental exploration. We propose to design CRISPR/Cas9 systems targeting these delta-protocadherins, similar to those detailed in a recent study [75]. Delivered to the mouse retina via AAV vectors, these systems are expected to knock down delta-protocadherin expression in RGCs [74]. With PCDH7 identified as a key factor in synapse formation within the central regions of the IPL, a focal point of our investigation will be RGC types such as W3B RGCs, which are known to stratify in these central layers [76]. The consequences of PCDH7 downregulation on the connectivity of W3B RGCs can be examined through multiple approaches [53]: immunohistochemical techniques or the use of transgenic markers can reveal morphological changes indicative of altered connectivity; electrophysiological assessments, such as targeted recordings from postsynaptic neurons during optogenetic stimulation of presynaptic partners, offer a functional probe into synaptic alterations. Similarly, as PCDH9 and PCDH11x are implicated in synaptic connections within the marginal regions of the IPL, candidate RGCs for examination could include ON and OFF sustained alpha RGCs, known for stratifying in the marginal layers of the IPL [77].

This experimental paradigm is not confined to the mouse retina but extends to a broad range of neuronal circuits, thanks to the flexibility and wide applicability of genome editing tools like CRISPR/Cas9 [78, 79, 80]. The capacity to induce targeted gene knockouts or modifications will empower researchers to validate our bilinear model’s predictions and explore the underlying genetic mechanisms for synaptic formation and maintenance. This endeavor opens new avenues for deciphering the complex interplay between genetics and neural circuit wiring, furthering our comprehension of the molecular mechanisms driving synaptic specificity.

8.2 Application to Other Neural Systems

Our bilinear model, while illustrated using the C. elegans and mouse retina datasets, holds significant potential for elucidating the genetic underpinnings of neuronal connectivity across various species and brain regions, contingent upon the availability of comprehensive gene expression profiles and synaptic connection data. For instance, the advent of a comprehensive single-cell transcriptome atlas for the adult fruit fly brain, alongside the recent establishment of its complete connectome, offers a fertile ground for extending our model to decipher the complex neural circuits of Drosophila [81, 82].

In the context of the mouse brain, the depth and breadth of single-cell sequencing efforts have unveiled a rich tapestry of transcriptomic cell types across cortex regions and the hippocampus [83, 84, 85, 57]. These efforts, in tandem with connectomic studies that meticulously map neuronal connections, lay a foundation for integrating transcriptomic and connectomic data [86, 87, 46, 88]. Such integration, especially across diverse brain regions, presents an exciting avenue to uncover both neuronal connection mechanisms that are shared by neuronal types across different regions and those unique to specific regions. The scalability of our bilinear model, akin to collaborative filtering’s effectiveness in commercial domains, supports the prospect of its cross-regional application. This approach positions our model at the forefront of efforts to explore how gene expression patterns contribute to the diversity of neuronal circuits across brain areas, moving us closer to a holistic understanding of the genetic blueprint of neuronal connectivity throughout the entire brain.

Nevertheless, we recognize the challenge that such well-aligned connectomic and transcriptomic data may not always be readily available. To address this, future research endeavors will also explore adaptations of our model to other available datasets, such as those that combine single-cell transcriptomic profiling with long-range neuronal projection mapping [89, 90]. Furthermore, our model is amenable to integration with trans-synaptic tracer-based sequencing methods [91, 92], expanding its utility in studies where detailed connectomic information is limited. Pursuing these avenues is pivotal in broadening the model’s utility and ensuring its relevance across a wider spectrum of brain connectivity research, making it an invaluable tool in the quest to unravel the complexities of neural circuitry.

8.3 Model Advancements

To enhance the model's fidelity and applicability, we propose several advancements. First, we advocate for the integration of auxiliary data types, including electrophysiological recordings, neuron tracing data, and additional omics modalities such as proteomics and epigenomics, to augment and enrich the model's training base [49, 91, 92, 93, 94]. These data modalities offer complementary insights into neuronal function and connectivity, providing valuable context that can inform and refine the model's predictions.

Second, we envision extending the bilinear model to incorporate non-linear interactions, capturing the intricate dynamics between gene expressions and synaptic connections. A potential pathway for this is through kernel methods or the integration of neural networks, specifically adopting the “two-tower model” framework renowned in modern recommendation systems (Figure 8). In this model, each “tower” is a deep neural network that undertakes the non-linear transformation of input features [95, 96]. This architecture has proven effective in capturing complex user-item interactions and could significantly enhance our model’s ability to decipher the nuanced relationships between genetics and neural connectivity.

Figure 8: Future direction: a two-tower deep learning model. (a) Gene expression profiles of pre- and post-synaptic neurons are transformed into latent embedding representations via deep neural networks. The connectivity metric between the pre- and post-synaptic neurons is predicted by taking the inner product of their respective latent embeddings.
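As a sketch of what such an architecture might look like in code, the snippet below implements the two-tower idea described in the figure caption using PyTorch; layer sizes and names are illustrative assumptions rather than a finished design.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """A small network that embeds gene expression into a latent space."""
    def __init__(self, n_genes, dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_genes, 128), nn.ReLU(),
            nn.Linear(128, dim),
        )

    def forward(self, x):
        return self.net(x)

class TwoTower(nn.Module):
    """Pre- and post-synaptic gene expressions pass through separate towers;
    connectivity is predicted as the inner product of the two embeddings."""
    def __init__(self, n_pre_genes, n_post_genes, dim=16):
        super().__init__()
        self.pre_tower = Tower(n_pre_genes, dim)
        self.post_tower = Tower(n_post_genes, dim)

    def forward(self, x_pre, x_post):
        z_pre = self.pre_tower(x_pre)     # (n_pre_types, dim)
        z_post = self.post_tower(x_post)  # (n_post_types, dim)
        return z_pre @ z_post.T           # predicted connectivity matrix
```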

9 Data and Code Availability

Pointers to the data used in this study and the source code of the bilinear model are available at https://github.com/muqiao0626/Bilinear_Model.

10 Supplementary Materials

Figure S1: Hyperparameter selection through cross-validation for the C. elegans neuronal dataset. (a) Heatmap plot of the logarithm (base 10) of the validation loss, showing variations with respect to λ across [10⁻⁸, 10⁻⁶, 10⁻⁴, 10⁻², 1] and dimensionality across [2, 4, 6, 8, 10, 12, 14, 16]. (b) Plot showing the logarithm (base 10) of the validation loss against λ over the range [10⁻⁸, 10⁻⁶, 10⁻⁴, 10⁻², 1]. (c) Plot displaying the logarithm (base 10) of the validation loss against dimensionality over the range [2, 4, 6, 8, 10, 12, 14, 16].

Figure S2: Hyperparameter selection through cross-validation for the mouse retinal neuronal dataset. (a) Heatmap plot of the logarithm (base 10) of the validation loss, showing variations with respect to λ across [0.1, 1, 10, 100] and dimensionality across [1, 2, 3, 4, 8]. (b) Plot showing the logarithm (base 10) of the validation loss against λ over the range [0.1, 1, 10, 100]. (c) Plot displaying the logarithm (base 10) of the validation loss against dimensionality over the range [1, 2, 3, 4, 8].
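For concreteness, the cross-validated grid search implied by the two captions above could be organized as in the sketch below; fit_bilinear and val_loss are hypothetical placeholders for the model's actual training and validation routines, and the grids shown are those from Figure S1.

```python
import numpy as np
from itertools import product

# Grids from the C. elegans caption above (the mouse grids are analogous).
lambdas = [1e-8, 1e-6, 1e-4, 1e-2, 1]
dims = [2, 4, 6, 8, 10, 12, 14, 16]

def grid_search(folds, lambdas, dims, fit_bilinear, val_loss):
    # folds: list of (train, val) splits; fit_bilinear/val_loss stand in for the
    # model's actual training and evaluation routines.
    log_losses = np.zeros((len(lambdas), len(dims)))
    for (i, lam), (j, d) in product(enumerate(lambdas), enumerate(dims)):
        fold_losses = [val_loss(fit_bilinear(train, lam=lam, dim=d), val)
                       for train, val in folds]
        log_losses[i, j] = np.log10(np.mean(fold_losses))
    return log_losses  # heatmap as in panel (a); marginals give panels (b), (c)
```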

Figure S3: Heatmaps showcasing the average absolute cosine similarities across five optimization repetitions for (a) Â and (b) B̂. The color scale reflects the value of the metric.

Figure S4: Detailed discrepancy analysis between the bilinear model and SCM genetic rules. (a) Discrepancy scores (DS) identifying divergences between the models' rule matrices. (b, c) Significant entries from the bilinear model's rule matrix (b) and the SCM's rule matrix (c), respectively, with DS exceeding 0.5 and matrix entries no less than 0.1.

Figure S5: Detailed discrepancy analysis between the reconstructed and the target connectivity matrices. (a) Discrepancy scores (DS) identifying divergences between the two matrices. (b, c) Specific connections present in the target matrix (c) that were not captured in the reconstructed matrix (b), with DS exceeding 0.5, indicating notable deviations.

Table S1: Correspondence of Mouse BC types [37, 34]

Table S2: Correspondence of Mouse RGC types [36, 35, 38]

Table S3: Correspondence of Mouse RGC types [36, 35, 38]

Table S4: Gene Ontology (GO) Terms Associated with Latent Dimensions in BCs and RGCs

Table S5: Predicted BC Partners of Transcriptionally-defined RGC Types