After I posted this series of examples, I received a few comments from readers. Some of them were using a different version of the software and could not run the program successfully, some were happy with the simple code and had applied it to other applications, while a few readers asked for more explanation of how the code works.
Well, I believe I have answered the doubts about the version differences, but not the requests for a more detailed explanation. Here, I have tried to draft a few paragraphs to explain the concepts behind the code, and I hope this helps answer the readers who asked me about it through email.
1. Image Preprocessing
The image is first converted to grayscale and then thresholded, which turns it into a binary image. The binary image then goes through a connectivity test to find the largest connected component, which is the box of the form. After locating the box, the individual characters are cropped into separate sub-images, which are the raw data for the feature extraction routine that follows.
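These two steps can be sketched in plain Python. The function names, the fixed threshold of 128, and the 4-connected flood fill are my own illustrative choices, not the author's actual code:

```python
from collections import deque

def binarize(gray, thresh=128):
    # 1 = dark (ink), 0 = light (paper); a fixed threshold is an assumption,
    # a real implementation might pick it adaptively (e.g. Otsu's method)
    return [[1 if p < thresh else 0 for p in row] for row in gray]

def largest_component(binary):
    # 4-connected flood fill; returns the set of pixels in the biggest blob,
    # which in the form image corresponds to the box outline
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                comp, q = set(), deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    comp.add((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    return best
```

Once the largest component is found, its bounding box gives the location of the form box, and the character sub-images are cropped relative to it.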
The size of the sub-images is not fixed, since noise affects the cropping process and makes it vary from one character to another. This makes the network input non-standard and prevents the data from being fed through the network. To solve this problem, each sub-image is resized to 50 by 70; then, by taking the average value of each 10 by 10 block, the image is reduced to a 5 by 7 matrix of fuzzy values, which becomes the 35 inputs for the network. However, before resizing the sub-images, another step is needed to eliminate the white space in the boxes.
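The block-averaging step can be sketched like this; the function name is my own, and the image dimensions are assumed to be exact multiples of the block size:

```python
def block_average(img, block=10):
    # img: 2-D list of 0/1 pixels; averaging each block x block region
    # produces a "fuzzy" value in [0, 1] per block, shrinking e.g. a
    # 70-row by 50-column image down to a 7 by 5 matrix
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            s = sum(img[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block))
            row.append(s / (block * block))
        out.append(row)
    return out
```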
2. Feature Extraction
The sub-images have to be cropped tightly to the border of the character in order to standardize them. This standardization is done by finding the row and column with the most 1s and, starting from that peak, increasing and decreasing a counter until white space is met, i.e. a line of all 0s. This technique is shown in the figure below, where a character “S” is cropped and resized.
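The description above expands outward from the densest row and column until an all-zero line is met. When the character is a single blob, a simpler equivalent is to keep every row and column that contains at least one 1 — a plain bounding-box crop. This sketch (with a hypothetical function name) takes that shortcut:

```python
def crop_to_character(img):
    # Keep only the rows and columns that contain at least one ink pixel (1),
    # i.e. crop the image to the character's bounding box
    rows = [i for i, r in enumerate(img) if any(r)]
    cols = [j for j in range(len(img[0])) if any(r[j] for r in img)]
    if not rows:
        return img  # blank image: nothing to crop
    return [r[cols[0]:cols[-1] + 1] for r in img[rows[0]:rows[-1] + 1]]
```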
The image pre-processing is then followed by another resize to meet the network input requirement of a 5 by 7 matrix, where the value 1 is assigned to every cell whose 10 by 10 block is entirely filled with 1s, as shown below:
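This stricter rule (a cell becomes 1 only when its whole block is ink, rather than taking the fuzzy average) can be sketched as follows; the function name is my own:

```python
def block_filled(img, block=10):
    # A cell is 1 only where the entire block x block region is filled with 1s
    h, w = len(img), len(img[0])
    return [[int(all(img[y][x]
                     for y in range(by, by + block)
                     for x in range(bx, bx + block)))
             for bx in range(0, w, block)]
            for by in range(0, h, block)]
```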
Finally, the 5 by 7 matrix is concatenated into a stream so that it can be fed into the network’s 35 input neurons. The network input is actually the negative of the image, with values ranging from 0 to 1, where 0 means black, 1 means white, and values in between indicate the intensity of the corresponding pixel.
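Flattening and inverting the matrix is a one-liner; again the function name is just for illustration:

```python
def to_network_input(matrix):
    # Flatten the 5 by 7 fuzzy matrix row by row into 35 values and invert
    # them, so that 0 = black and 1 = white, as the network expects
    return [1.0 - v for row in matrix for v in row]
```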
3. Neural Network Training
Well, I recently met a lecturer in my country and we had a short discussion on using neural networks for classification. We both agree that when using a simple feed-forward back-propagation neural network (FF-BP-NN) for classification, the most important part is the pre-processing of the data. “Rubbish in, rubbish out” is always true… So if you use an NN after this process, it should be quite straightforward once you have the features, which are 5 by 7 = 35 values.
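To make the last step concrete, here is a bare-bones feed-forward back-propagation network in plain Python, just to illustrate the training loop on the 35-value feature vectors. The hidden layer size, learning rate, epoch count, and function names are my own choices for the sketch, not the author's setup:

```python
import math
import random

def train_mlp(samples, n_in=35, n_hidden=8, n_out=2, epochs=2000, lr=0.5):
    # samples: list of (input_vector, target_vector) pairs; trains a
    # one-hidden-layer sigmoid network with plain back-propagation
    random.seed(0)
    w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    for _ in range(epochs):
        for x, t in samples:
            xb = x + [1.0]                       # append bias input
            h = [sig(sum(w * xi for w, xi in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            o = [sig(sum(w * hi for w, hi in zip(row, hb))) for row in w2]
            # output deltas: squared-error loss times sigmoid derivative
            do = [(t[k] - o[k]) * o[k] * (1 - o[k]) for k in range(n_out)]
            # hidden deltas: error back-propagated through the output weights
            dh = [h[j] * (1 - h[j]) * sum(do[k] * w2[k][j] for k in range(n_out))
                  for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w2[k][j] += lr * do[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w1[j][i] += lr * dh[j] * xb[i]
    def predict(x):
        xb = x + [1.0]
        h = [sig(sum(w * xi for w, xi in zip(row, xb))) for row in w1] + [1.0]
        o = [sig(sum(w * hi for w, hi in zip(row, h))) for row in w2]
        return o.index(max(o))  # index of the most activated output neuron
    return predict
```

With clean 35-value features like the ones produced above, even a tiny network like this separates distinct character patterns quickly, which is exactly the point of the discussion: get the pre-processing right and the network part is the easy bit.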