This post records a minor success in my computer vision project: training a LeNet-5 style convolutional neural net to recognize a stylus. The structure of a convolutional neural network is designed to automatically learn features from the image that are significant for detecting categories.
The object of the game for my project is to examine any given patch in an image and detect the presence of the stylus and, if the stylus is detected, to determine its dimensions and location. While categorization was not too difficult, the regression portion, determining the location, remains problematic.
The neural net was created using PyNeurGen with modified classes for Convolutional and Subsampling layers. A StochasticNode class, created for drop-out nodes, simulates unreliable units. Basically, drop-out is a method of regularization. At some point, when the code settles down, I will add those classes to the PyNeurGen project.
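To make the drop-out idea concrete, here is a minimal NumPy sketch of the technique — not PyNeurGen's StochasticNode class, just an illustration: during training, each activation is zeroed with probability p, and the survivors are rescaled so the expected output is unchanged.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Randomly zero each activation with probability p during training.

    Surviving activations are scaled by 1/(1-p) ("inverted dropout")
    so the expected activation stays the same at inference time.
    """
    if not training or p == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p   # True where the node survives
    return activations * mask / (1.0 - p)

# Roughly half the activations are dropped; the rest are doubled.
dropped = dropout(np.ones(8), p=0.5, rng=np.random.default_rng(0))
```

At inference time (`training=False`) the activations pass through untouched, which is what makes the train-time rescaling necessary.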
My initial approach used an output that signified category and location simultaneously, reasoning that the same inputs are going into both calculations. If the stylus was not detected, all the location coordinates would be 0. If detected, then draw the outline.
However, all too often, the tentative lines that appeared were erratic. For each and every point, the network was deciding both (1) whether a stylus was present, and (2) either the specific location of the point or a zero. It makes more sense to detect the stylus presence first and then, given that belief, calculate the location.
Possibly, running additional epochs would have made the problems sufficiently dissipate, but intuitively it seemed like a poor approach.
Eventually, it occurred to me that a better structure adds an additional output layer. In the new format, only the two categorization nodes sit in the former output layer. Then, using that a priori determination of category, the category nodes inform the location nodes without ambiguity.
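The decide-first, locate-second idea can be sketched in plain NumPy. This is not PyNeurGen code — the weight matrices `w_cat` and `w_loc` are hypothetical stand-ins for the two heads — but it shows how the category decision gates the regression output instead of forcing every coordinate to learn a zero.

```python
import numpy as np

def two_stage_output(features, w_cat, w_loc, threshold=0.5):
    """Sketch of the two-stage idea: decide presence first, then location.

    `features` is the shared hidden representation; `w_cat` and `w_loc`
    are hypothetical weights for the category and location heads.
    """
    # Category head: probability that the stylus is present.
    logit = features @ w_cat
    presence = 1.0 / (1.0 + np.exp(-logit))   # sigmoid
    if presence < threshold:
        return presence, None                  # no stylus: no coordinates
    # Regression head: consulted only once presence is believed.
    location = features @ w_loc
    return presence, location
```

The regression head never has to emit an artificial zero for the "no stylus" case; absence is expressed by the category head alone.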
In a sense, the network should be viewed as two networks, category and regression, with sparse links between them.<< more >>
When trying to find the location of an object on an image, one method matches points of an object that you have found with a template of points that sufficiently define the object. This post is a short description of a method for matching called the Procrustes distance using Python.
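Before the story behind the name, here is a minimal NumPy sketch of the standard computation: both point sets are translated to the origin, scaled to unit size, and one is rotated (via an SVD) to best fit the other; the Procrustes distance is the misfit that remains.

```python
import numpy as np

def procrustes_distance(X, Y):
    """Procrustes distance between two point sets of matching shape (n, d).

    Translation, scale, and rotation are removed; what is left is the
    irreducible difference in shape between the two configurations.
    """
    X = X - X.mean(axis=0)             # remove translation
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)          # remove scale (unit Frobenius norm)
    Y = Y / np.linalg.norm(Y)
    U, s, Vt = np.linalg.svd(X.T @ Y)  # optimal rotation via SVD
    R = (U @ Vt).T
    return np.linalg.norm(X - Y @ R)   # residual misfit after alignment
```

Two configurations that differ only by translation, scaling, and rotation give a distance of zero; the worse the shape match, the larger the value.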
The Procrustes distance is named in honor of a story from Greek mythology. Procrustes was a son of Poseidon, a smith and bandit who lived in a cave. As a smith, he had made an iron bed, and he would invite travelers to stay with him and sleep in it.
However, Procrustes wanted the visitor to be the exact length of the bed. Fortunately, being a resourceful smith, he had made tools to stretch the hapless visitor's limbs if they were too short, or he would lop off the extra length of leg if they were too long.
Apparently, there was also a possibility that he actually had two beds, so that he could select the bed that was most ill-fitting. This does fly in the face of my OCD theory. In any event, Theseus eventually came along and put Procrustes out of his misery.<< more >>
An updated version of PyNeurGen, which is Python Neural Genetic Algorithms, has been uploaded. This software package implements a pure Python version of neural networks and a variant of genetic algorithms, grammatical evolution.
It has been a while since the last update. A major portion of the work entailed writing unit tests for the various modules. While some testing was done originally, in the intervening period I have developed a much greater appreciation for test functions, and so I went back and increased the coverage to somewhere around 90%. Some refactoring was necessary to break up functions into smaller, more easily testable chunks.
In addition, some functions have been improved. I reworked the stopping criteria for grammatical evolution. Normally, the decision of when to stop the evolutionary process revolves around the figure of merit, the fitness value. The fitness value, generated by the process for each genotype, must become sufficiently large or small, depending upon the objective. When that hurdle is reached, it is time to stop. The other typical reason to stop is that a maximum number of generations has been reached.
PyNeurGen now enables custom fitness functions that, among other things, can drive an evolutionary process toward a distribution of solutions. Suppose the fitness value is only a loose proxy for what you really want. In such cases, all of the fitness values in an upper range may signify genotypes that are suitable for the purpose. It then makes more sense to build a population where, for example, the upper quartile must be above a hurdle fitness value. With the custom functions, that can be done now. In fact, any distribution of fitness values can be used as the stopping criterion.
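The upper-quartile example can be sketched in a few lines of plain Python. This is not PyNeurGen's actual custom-fitness API, just an illustration of that style of stopping rule: instead of waiting for a single best genotype to clear the hurdle, evolution stops once the top quarter of the population does.

```python
def upper_quartile_stop(fitness_values, hurdle):
    """Stop when the upper quartile of the population clears the hurdle.

    Equivalently: the 75th-percentile fitness must reach `hurdle`, so the
    top 25% of genotypes are all at or above it (assuming higher is better).
    """
    ranked = sorted(fitness_values)
    q3_index = int(len(ranked) * 0.75)   # first member of the upper quartile
    return ranked[q3_index] >= hurdle
```

Any other distributional criterion — a median hurdle, a minimum spread, a full target histogram — follows the same pattern: rank the population's fitness values and test the shape you care about.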