How to

Build Tesseract
This page describes how to build Tesseract on Windows.


 * 1) Download the Tesseract setup and open it with VS2012. If you have a later version of Visual Studio, it will ask to upgrade the Tesseract project files; press Yes.
 * 2) You are now ready to build Tesseract: press Build and wait for the process to finish.
 * 3) Make sure that tesseract is the startup project in Solution Explorer.
 * 4) Go to Debug > Properties > Configuration Properties > Debugging and provide the project directory and command-line arguments.
 * 5) Press Debug.

That's it: you have compiled Tesseract successfully.

Thresholding
Call GetThresholdedImage and pixWrite after SetImage in baseapi.cpp, probably around line 914.

For example:

    PIX* thresholded = GetThresholdedImage();
    pixWrite("out2.png", thresholded, IFF_PNG);  // IFF_PNG is format code 3
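
To give a feel for what the thresholded image contains, the sketch below applies Otsu's method, which Tesseract's default thresholder is based on, to a plain array of 8-bit gray values. It is a simplified standalone illustration, not Tesseract's code: the names otsu_threshold and binarize are hypothetical, and the convention that 1 marks pixels brighter than the threshold is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the gray level t that maximizes the
// between-class variance when pixels are split into [0, t] and (t, 255].
int otsu_threshold(const std::vector<std::uint8_t>& pixels) {
  double hist[256] = {0.0};
  for (std::size_t i = 0; i < pixels.size(); ++i) hist[pixels[i]] += 1.0;

  const double total = static_cast<double>(pixels.size());
  double sum_all = 0.0;
  for (int v = 0; v < 256; ++v) sum_all += v * hist[v];

  double w0 = 0.0, sum0 = 0.0, best_var = -1.0;
  int best_t = 0;
  for (int t = 0; t < 256; ++t) {
    w0 += hist[t];                 // number of pixels with value <= t
    sum0 += t * hist[t];           // sum of those pixel values
    const double w1 = total - w0;  // number of pixels with value > t
    if (w0 == 0.0) continue;       // lower class still empty
    if (w1 == 0.0) break;          // upper class empty from here on
    const double diff = sum0 / w0 - (sum_all - sum0) / w1;
    const double var = w0 * w1 * diff * diff;
    if (var > best_var) {
      best_var = var;
      best_t = t;
    }
  }
  return best_t;
}

// Hypothetical helper: 1 for pixels brighter than t, 0 otherwise
// (an assumed convention for this sketch only).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& pixels, int t) {
  std::vector<std::uint8_t> out(pixels.size());
  for (std::size_t i = 0; i < pixels.size(); ++i) out[i] = pixels[i] > t ? 1 : 0;
  return out;
}
```

Calling binarize(pixels, otsu_threshold(pixels)) on a buffer with a dark group and a bright group of values separates the two groups, which is roughly what the image dumped by pixWrite shows for a scanned page.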

Get Box file
Call GetRegions and boxaWrite next to the Recognize call in baseapi.cpp, just before the line failed = Recognize(NULL) < 0;

For example: BOXA* layout = GetRegions(NULL); boxaWrite("test1.box", layout);

If you want the coordinates of each line, use BOXA* layout = GetTextlines(NULL, NULL); instead.

If you want the coordinates of each word, use BOXA* layout = GetWords(NULL); instead.
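
For orientation, the file written by boxaWrite is plain text. The sketch below reproduces the layout implied by the fscanf format strings used in the OpenCV section of this page: a version line, a box count, then one line per box. BoxaEntry and format_boxa are hypothetical names for illustration, the version number is supplied by the caller rather than taken from Leptonica, and the exact whitespace Leptonica emits may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one Leptonica box: top-left corner plus size.
struct BoxaEntry {
  int x, y, w, h;
};

// Formats boxes in the text layout that the page's fscanf calls read back.
std::string format_boxa(const std::vector<BoxaEntry>& boxes, int version) {
  std::ostringstream out;
  out << "\nBoxa Version " << version << "\n";
  out << "Number of boxes = " << boxes.size() << "\n";
  for (std::size_t i = 0; i < boxes.size(); ++i) {
    const BoxaEntry& b = boxes[i];
    out << "  Box[" << i << "]: x = " << b.x << ", y = " << b.y
        << ", w = " << b.w << ", h = " << b.h << "\n";
  }
  return out.str();
}
```

Each entry is a top-left corner (x, y) plus a width and height, so regions, lines, and words all end up in the same shape of file.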

Page layout analysis with tool
Use the command line "tesseract imagename outputbase -psm N hocr", where N is the page segmentation mode.

Note: if the output file created by Tesseract has the .hocr extension, simply rename it to .xml.

Run the downloaded tool, open the XML file generated by Tesseract, and browse to the image. You will see the page layout analysis. The tool can be found on the resources page.
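
The layout shown by the tool comes from the bounding boxes stored in the hOCR file: each element carries a title attribute containing "bbox x0 y0 x1 y1", optionally followed or preceded by other properties separated by semicolons. As a rough sketch of how such a value could be read, the code below pulls the four numbers out of one title string. HocrBox and parse_hocr_bbox are hypothetical names, and the sketch assumes the attribute value has already been extracted from the markup.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical holder for one hOCR bounding box: top-left and bottom-right corners.
struct HocrBox {
  int x0, y0, x1, y1;
};

// Extracts "bbox x0 y0 x1 y1" from the value of an hOCR title attribute.
// Returns false if no complete bbox property is present.
bool parse_hocr_bbox(const std::string& title, HocrBox* out) {
  const std::string::size_type pos = title.find("bbox ");
  if (pos == std::string::npos) return false;
  return std::sscanf(title.c_str() + pos, "bbox %d %d %d %d",
                     &out->x0, &out->y0, &out->x1, &out->y1) == 4;
}
```

Unlike the box file, hOCR stores two corners rather than a corner plus width and height, so the width is x1 - x0 and the height is y1 - y0.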

Page layout analysis with OpenCV
To do page layout analysis and draw boxes on the image, we first need the coordinates of the regions, lines, or words. To get them, follow the 'Get Box file' section above. The resulting box file contains coordinates that OpenCV functions can use to draw rectangles.

Use the following code just before return(0) in tesseractmain.cpp:

    // Read the box file header, then draw every box on the page image.
    FILE *fp;
    int x, y, w, h, i, n, version, ignore;
    fp = fopen("SampleNewspaper.words.box", "r");
    fscanf(fp, "\nBoxa Version %d\n", &version);
    fscanf(fp, "Number of boxes = %d\n", &n);

    IplImage *img1 = cvLoadImage("SampleNewspaper.png", CV_LOAD_IMAGE_UNCHANGED);
    for (i = 0; i < n; i++) {
      fscanf(fp, " Box[%d]: x = %d, y = %d, w = %d, h = %d\n", &ignore, &x, &y, &w, &h);
      //printf(" Box[%d]: x = %d, y = %d, w = %d, h = %d\n", ignore, x, y, w, h);
      // Corners are (x, y) and (x + w, y + h); line thickness 2, 8-connected.
      cvRectangle(img1, cvPoint(x, y), cvPoint(x + w, y + h), cvScalar(222, 0, 255), 2, 8, 0);
    }
    cvSaveImage("SampleNewspaper.words.png", img1);
    fclose(fp);
    cvReleaseImage(&img1);
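
The fscanf calls above assume the file opens and every line matches. As a more defensive sketch of the same parsing step, the code below reads the box text line by line from any input stream and reports whether the header and box count were consistent. TextBox and read_boxa are hypothetical names, and the sketch deliberately leaves OpenCV out so it can be compiled on its own.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one parsed box: top-left corner plus size.
struct TextBox {
  int x, y, w, h;
};

// Reads boxes from text in the layout the fscanf calls above expect.
// Returns false if the version line is missing or the box count does not match.
bool read_boxa(std::istream& in, std::vector<TextBox>* boxes) {
  std::string line;
  int version = 0, expected = -1, index = 0;
  bool have_version = false;
  TextBox b;
  while (std::getline(in, line)) {
    const char* s = line.c_str();
    if (std::sscanf(s, " Boxa Version %d", &version) == 1) {
      have_version = true;
      continue;
    }
    if (std::sscanf(s, " Number of boxes = %d", &expected) == 1) continue;
    if (std::sscanf(s, " Box[%d]: x = %d, y = %d, w = %d, h = %d",
                    &index, &b.x, &b.y, &b.w, &b.h) == 5) {
      boxes->push_back(b);
    }
  }
  return have_version && expected == static_cast<int>(boxes->size());
}
```

To use it with a file, open it with std::ifstream and pass the stream to read_boxa; each resulting box can then be drawn with corners (x, y) and (x + w, y + h), as in the loop above.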
