Analysis of neuronal morphology has until recently been limited to small data sets because of the time needed to generate a digitized representation of a neuron’s structure. Recent efforts to curate data from many different labs working in parallel on vastly differing topics have led to the NeuroMorpho.Org database, which currently provides access to, and metadata for, over 10,000 neurons from a variety of species, cell types, brain regions, and experimental conditions. No one would call 10,000 “Big Data”, but that number is expected to rise dramatically in the future. These curation efforts have grown more successful as data sharing has slowly shifted from the exception to the norm, a literature mining effort has taken off, and reconstruction techniques, including semi-automatic and fully automatic reconstruction, have advanced rapidly.
When a type of data grows this way, new analysis techniques and technology are required to take advantage of it. During the Human Genome Project, data growth was similarly enormous, and the field of sequence analysis became vital. A variety of analyses were developed, with heuristics specific to the data, to make processing fast and effective. When considering neuronal data, it is fair to ask whether such successful techniques might be borrowed so that the wheel need not be reinvented.
In determining whether the study of neuronal morphology might benefit from the techniques of gene sequence analysis, an important question is what makes neuronal morphology sufficiently similar to the objects studied in genetics that similar approaches might work. Genes are not themselves functional: they must be transcribed, and the resulting RNA used to generate chains of amino acids, which in turn fold up into functional proteins. The sequence information of the nucleotides and amino acids only partially reflects the function of the final 3-dimensional molecule. If a sequence representation is generated from the 3-dimensional functional neuronal tree (i.e. axon or dendrite), could it not be used to find relationships with other neuronal tree-derived sequences?
Those similarities provide a good starting point, but vital differences remain. The same gene in multiple people, and different but related genes within a person or species, derive from a single ancestor. Mutations generally occur one at a time, sometimes with one organism acquiring a version different from another's, producing variation in a population, and sometimes with a duplicate gene straying in function and acting in some alternative way. Neurons, on the other hand, develop from genetic instructions in the context of development and environmental conditions. There is no ancestor neuron in a literal sense. However, the basic instructions are the same, and we know that neurons of the same type share certain characteristics that allow them to fill the same role. So we might expect related neurons to vary a bit more than related genes. The functional interactions between amino acids in a protein also differ from those between branches in a neuronal tree, so assumptions about how likely particular types of amino acids are to match, or how likely an insertion or deletion is, cannot simply be applied to neuron-derived sequences.
With these ideas in mind, we have developed a basic encoding of neuronal branches as letters in a sequence, with the tree traversed from root to tip, always starting with the smaller subtree of a bifurcation. A bifurcation with two terminal child branches is a Terminating bifurcation, or T-node. A bifurcation with two bifurcating child branches generates two subtrees and is called an Arborizing bifurcation, or A-node. Finally, a bifurcation with one terminal child branch and one bifurcating child branch is called a Continuation, or C-node. This alphabet is simple and borrows from nucleotides, though without a “G”.
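The encoding just described can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the project's actual implementation: the `Branch` class and the `size` and `encode` functions are hypothetical names, terminal branches are assumed to contribute no letter, "smaller" is assumed to mean fewer bifurcations in the subtree, and ties are assumed to keep the children in their stored order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    """A branch of a strictly binary tree; with no children it is terminal."""
    left: Optional["Branch"] = None
    right: Optional["Branch"] = None

    def __post_init__(self) -> None:
        # Assumption: every bifurcation has exactly two child branches.
        if (self.left is None) != (self.right is None):
            raise ValueError("a bifurcation needs exactly two child branches")

    def is_terminal(self) -> bool:
        return self.left is None and self.right is None


def size(branch: Branch) -> int:
    """Count bifurcations in a subtree (our assumed measure of 'smaller')."""
    if branch.is_terminal():
        return 0
    return 1 + size(branch.left) + size(branch.right)


def encode(branch: Branch) -> str:
    """Return the A/C/T sequence for the tree rooted at this branch."""
    if branch.is_terminal():
        return ""  # assumption: terminal branches emit no letter
    children = [branch.left, branch.right]
    n_terminal = sum(child.is_terminal() for child in children)
    # Two terminal children: T; one terminal child: C; none: A.
    letter = {2: "T", 1: "C", 0: "A"}[n_terminal]
    # Visit the smaller subtree first; sort() is stable, so ties keep order.
    children.sort(key=size)
    return letter + "".join(encode(child) for child in children)


# An A-node whose subtrees are a T-node and a C-node ending in a T-node.
example = Branch(
    Branch(Branch(), Branch()),
    Branch(Branch(), Branch(Branch(), Branch())),
)
print(encode(example))  # ATCT
```

The sketch recomputes subtree sizes at every bifurcation and recurses once per level, which is fine for small examples; a long chain of C-nodes in a real reconstruction would call for cached sizes and an iterative traversal.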
From here, many kinds of analysis become possible. You can read more about Motif Analysis, in which different types of neurons are inspected to see which short sequences occur more or less frequently than expected, and Sequence Alignment and Clustering, in which sequences are aligned to determine similarity and then clustered to find relationships between groups of neurons. (Descriptions of these sub-projects are in progress.)
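To give a flavor of the motif idea, here is a rough sketch that counts every short subsequence of length k in a set of A/C/T sequences and compares each count with what would be expected if letters occurred independently. The function name `motif_enrichment` is hypothetical, and the independent-letter null model is our own simplifying assumption for illustration; it ignores the constraints the tree encoding places on which letters can follow which, and it is not necessarily the model used in the Motif Analysis sub-project.

```python
from collections import Counter
from math import prod


def motif_enrichment(sequences, k):
    """Map each observed k-letter motif to its observed/expected ratio."""
    # Letter frequencies pooled over all sequences.
    letters = Counter("".join(sequences))
    total = sum(letters.values())

    # Observed counts of every length-k window.
    observed = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    n_windows = sum(observed.values())

    ratios = {}
    for motif, count in observed.items():
        # Assumed null model: each letter is drawn independently.
        expected = n_windows * prod(letters[c] / total for c in motif)
        ratios[motif] = count / expected
    return ratios


for motif, ratio in sorted(motif_enrichment(["ATCT", "CCT", "ACTT"], 2).items()):
    print(motif, round(ratio, 2))
```

A ratio above 1 means the motif appears more often than the null model predicts, and a ratio below 1 means less often. Judging whether a difference is meaningful would need a better null model and a significance test, which this sketch leaves out.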