OpenCV notes (text detection and recognition, part 2)

Continued from the previous post.

This time, a look at the "recognition" part of the module.
There are several options:

OCR using Beam Search algorithm (cv::text::OCRBeamSearchDecoder)
OCR using HMM (Hidden Markov Models) (cv::text::OCRHMMDecoder)
tesseract-ocr API (cv::text::OCRTesseract)

This post tries decoding with Beam Search.

Sample code

https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/cropped_word_recognition.cpp

Class Reference

OpenCV: cv::text::OCRBeamSearchDecoder Class Reference

Creating the decoder

The recognition targets are 62 characters: lowercase letters, uppercase letters, and digits.
Both the model and the transition probabilities are pretrained.

#include <string>
#include <opencv2/core.hpp>
#include <opencv2/text.hpp>

// Trained models (CNN character classifier)
auto classifier = cv::text::loadOCRBeamSearchClassifierCNN("OCRBeamSearch_CNN_model_data.xml.gz");

// Vocabulary (62 characters)
std::string voc = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Trained transition probabilities
cv::Mat transitions_prob;
cv::FileStorage fs("assets/OCRHMM_transitions_table.xml", cv::FileStorage::READ);
fs["transition_probabilities"] >> transitions_prob;
fs.release();

// Emission probabilities. Identity matrix.
cv::Mat emission_prob = cv::Mat::eye(62, 62, CV_64FC1);

// Size of the beam in Beam Search algorithm.
int beam_size = 50;

auto ocr = cv::text::OCRBeamSearchDecoder::create(classifier,
                                                  voc,
                                                  transitions_prob,
                                                  emission_prob,
                                                  cv::text::OCR_DECODER_VITERBI,
                                                  beam_size);

The first argument, classifier, appears to be CNN-based:

The character classifier consists in a Single Layer Convolutional Neural Network and a linear classifier.
It is applied to the input image in a sliding window fashion, providing a set of recognitions at each window location.

The dimensions of each parameter are as follows:

kernel      :  118row x   64col
M           :    1row x   64col
P           :   64row x   64col
weight      : 1062row x   62col
feature_min :    1row x 1062col
feature_max :    1row x 1062col

Next, recognize the text in the image provided as a sample (242x102 pixels).

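#include <opencv2/imgcodecs.hpp>  // cv::imread

cv::Mat cropImage = cv::imread("scenetext_word02.jpg");

std::string text;                  // recognized text for the whole image
std::vector<cv::Rect> boxes;       // bounding box of each word
std::vector<std::string> words;    // text of each word
std::vector<float> confidences;    // confidence of each word

ocr->run(cropImage,                   // Input binary image
         text,                        // output_text
         &boxes,                      // component_rects
         &words,                      // component_texts
         &confidences,                // component_confidences
         cv::text::OCR_LEVEL_WORD);   // component_level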

Recognition result: