Pattern recognition is re-emerging as one of the most important aspects of Artificial Intelligence and neurological research. What has recently been determined is that a significant portion of neurological processing is actually pattern matching. Even what we think of as deductive reasoning is beginning to be seen as a process involving a tremendous amount of pattern matching in its initial phase.
This re-raises the question of machine learning algorithms and computational structures such as neural nets. Neural nets have been getting a bad rap lately; everyone seems to think that Support Vector Machines, AdaBoost, and other more recently derived algorithms have made them obsolete.
However, the simplicity of construction coupled with the complexity of ability that classic neural nets provide strikes me as a powerful place to step off from in search of computational models of effective neurological processes. Or, as I have said before, I do not solve differential equations when I catch a ball (nor even quadratic equations).
Classic neural nets learn to recognize and segregate patterns through altered strengths of the connections between simplistic computing elements (neurons that perform a simple sigmoid transfer or a discrete threshold transform).
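As a concrete picture of those simplistic computing elements, here is a minimal sketch (in Python; the function names are my own) of the two transfer functions just mentioned, a sigmoid neuron and a discrete-threshold neuron:

```python
import math

def sigmoid_neuron(inputs, weights, bias):
    # Weighted sum of the inputs, squashed through the sigmoid transfer function.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def threshold_neuron(inputs, weights, bias):
    # The discrete variant: fire (1) only if the weighted sum crosses zero.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0
```

Learning, in the classic formulation, consists of nothing more than adjusting the `weights` and `bias` values.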
But while neural net based pattern matching relies on strengthening and weakening connection potentials between these artificial neurons, there are other emergent effects which are not fed back into the pattern matching process.
Essentially, what I am proposing here is that, as in continuous equations, patterns of data have multiple derivatives, slopes if you will, that reflect overlying patterns which more advanced techniques can take into account.
Neural nets do this to some degree as it is, but multi-layer nets do it better at the pure data level. The hidden layer of modern neural nets effectively captures the first derivative of the data pattern in an ordered set of connection weights. The connections encode the rate of change of the incoming feature data for different inputs, with different variations in those rates of change encoding different patterns.
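To make the metaphor concrete, here is a minimal sketch (Python with NumPy; the layer sizes and random weights are arbitrary choices of mine, not a trained model) of a two-layer net whose hidden activations play the role of that "first derivative" representation of the input:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # The hidden activations h are the intermediate representation that
    # the essay likens to a first derivative of the data pattern.
    h = sigmoid(W1 @ x + b1)   # hidden layer
    y = sigmoid(W2 @ h + b2)   # output layer
    return h, y

# Arbitrary sizes: 4 input features, 3 hidden units, 2 output classes.
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((2, 3)), rng.standard_normal(2)
h, y = forward(rng.standard_normal(4), W1, b1, W2, b2)
```

The ordered set of values in `h` is exactly the kind of internal state I want to treat as data in its own right.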
Please note, this is all somewhat metaphorical. Describing an encoding of data patterns as derivatives or rates of change is not precisely accurate. However, Fourier transforms of image data perform a similar reduction, treating a collection of data as a distribution of frequencies.
If we carry the analogy a step further, then, we could talk about the second derivative of the data pattern which would be the first derivative of the neural net's resulting pattern. From THAT pattern we could begin to derive deeper recognition of internal structures to our original data.
So, if we had one neural net stacked above another (metaphorically speaking), we could have it watch for patterns in the lower-level net. These patterns would arise from the patterns it detected in the data. Allowing the second-level net to classify states in the first net provides a deeper, more refined set of nuances to the classification.
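A minimal sketch of that stacking, under my own assumptions about what gets passed upward (here, the lower net's hidden activations; the weights are random rather than trained, since this only illustrates the wiring):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# First-level net: sees the raw data (8 features -> 5 hidden units).
W1 = rng.standard_normal((5, 8))

# Second-level net: sees only the first net's internal state,
# never the original data (5 extracted features -> 3 meta-classes).
W2 = rng.standard_normal((3, 5))

def first_level(x):
    return sigmoid(W1 @ x)

def second_level(state):
    # Classifies states of the first net, not the data itself.
    return sigmoid(W2 @ state)

x = rng.standard_normal(8)
meta = second_level(first_level(x))
```

The point of the wiring is that `second_level` never receives `x`; its entire input is a pattern of the first net's activity.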
My recent experiments with this idea show a great deal of promise. However, the key remains to identify specific features of the first net to pass on to the second. The math associated with this effort is still somewhat obscure.
But the idea of stacked nets watching each other's patterns is similar to the way our brain's networks watch each other. By feeding back patterns of connections and classifications about the primary networks, the secondary (and perhaps tertiary) networks can provide non-linear effects that act as perturbing noise in the process of pattern recognition.
Note that I am not talking about extra hidden layers in a network. It has been largely shown that extra hidden layers beyond about 2 do not add any benefit to the processing of the net. I suspect this is actually because we do not construct the nets with sufficient complexity.
However, I am talking about discontiguous nets, one being driven by features extracted from the other, which is in turn driven by features extracted from a document or text corpus. Recognizing these deeper patterns would give us nuanced results such as the recognition of trend data and subtleties within the structure of the original data.
This is a bit different from the other, more common use of the term stacked neural nets. In that use, the same data (text, for example) is passed to multiple nets that are trained to seek specific types of first-level patterns. While each net may feed some information to the next net to receive the data, they are essentially parallel, and all of them are generating this first derivative I spoke of. In my model, the higher-order network is completely unaware of the actual original text or data coming to it. Rather, it is examining patterns of neuronal connections that it sees in the lower-level net, without any knowledge of how they came to be.