How to

Build Tesseract
This page describes how to build Tesseract on Windows.


 * 1) Download the Tesseract setup and open it with VS2012. If you have a newer version of Visual Studio, it will ask to upgrade the Tesseract project files. Press Yes.
 * 2) You are ready to build Tesseract. Press Build and wait for the process to finish.
 * 3) Make sure that tesseract is the startup project in Solution Explorer.
 * 4) Go to Debug > Properties > Configuration Properties > Debugging and provide the project directory and command-line arguments there.
 * 5) Press Debug.

That's it. You have compiled Tesseract successfully.

Thresholding
Call GetThresholdedImage and pixWrite after SetImage in baseapi.cpp, probably around line 914.

For example:

PIX* thresholded = GetThresholdedImage(); pixWrite("out2.png", thresholded, IFF_PNG);

IFF_PNG is Leptonica's format code 3, so the file is written as PNG.
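To make clear what kind of image this step produces, here is a minimal, self-contained sketch of global Otsu thresholding on an 8-bit grayscale buffer. It is an illustration only: the names OtsuThreshold and Binarize are hypothetical, and this is not Tesseract's own thresholding code, which operates on a PIX and is more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the Otsu threshold t for 8-bit gray values.
// Values <= t fall in the dark class, values > t in the light class.
int OtsuThreshold(const std::vector<std::uint8_t>& gray) {
  if (gray.empty()) return 0;

  // Histogram of gray levels.
  double hist[256] = {0.0};
  for (std::uint8_t v : gray) hist[v] += 1.0;

  const double total = static_cast<double>(gray.size());
  double sum_all = 0.0;
  for (int i = 0; i < 256; ++i) sum_all += i * hist[i];

  double w0 = 0.0;        // number of pixels in the dark class
  double sum0 = 0.0;      // sum of gray values in the dark class
  double best_var = -1.0; // best between-class variance seen so far
  int best_t = 0;
  for (int t = 0; t < 256; ++t) {
    w0 += hist[t];
    sum0 += t * hist[t];
    if (w0 == 0.0) continue;      // dark class still empty
    const double w1 = total - w0;
    if (w1 == 0.0) break;         // light class empty from here on
    const double m0 = sum0 / w0;
    const double m1 = (sum_all - sum0) / w1;
    const double between = w0 * w1 * (m0 - m1) * (m0 - m1);
    if (between > best_var) {
      best_var = between;
      best_t = t;
    }
  }
  return best_t;
}

// Hypothetical helper: 1 for pixels lighter than t, 0 otherwise.
std::vector<std::uint8_t> Binarize(const std::vector<std::uint8_t>& gray, int t) {
  std::vector<std::uint8_t> out(gray.size());
  for (std::size_t i = 0; i < gray.size(); ++i) out[i] = gray[i] > t ? 1 : 0;
  return out;
}
```

Calling Binarize(gray, OtsuThreshold(gray)) gives a two-level image comparable in spirit to the one dumped by pixWrite, which is useful for checking whether poor OCR results come from the binarization step.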

Get Box file
Call GetRegions and boxaWrite in baseapi.cpp, just before the line failed = Recognize(NULL) < 0;

For example: BOXA* layout = GetRegions(NULL); boxaWrite("test1.box", layout);

If you want the coordinates of each line, use BOXA* layout = GetTextlines(NULL, NULL); (GetTextlines takes two output arguments, pixa and blockids, so pass NULL for both).

If you want the coordinates of each word, use BOXA* layout = GetWords(NULL);

Page layout Analysis with tool
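Use the command line "tesseract imagename outputbase -psm N hocr", where imagename is the input image, outputbase is the output file name without extension, and N is the page segmentation mode.

Note: if the output file created by Tesseract has the extension .hocr, simply rename it to .xml.

Run the downloaded tool, open the XML file generated by Tesseract, and browse to the image. You will see the page layout analysis. The tool can be found on the resources page.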
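If you prefer to inspect the layout without the tool, the boxes can be read directly from the hOCR file, because hOCR stores each element's box in its title attribute as "bbox x0 y0 x1 y1". The following is a minimal sketch: HocrBox and ExtractHocrBoxes are hypothetical names, and the regular expression assumes the class attribute appears before the title attribute inside the same tag. It is not a full HTML parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical record for one hOCR element.
struct HocrBox {
  std::string cls;     // hOCR class, e.g. ocr_page, ocr_line, ocrx_word
  int x0, y0, x1, y1;  // top-left and bottom-right corners in pixels
};

// Hypothetical helper: collects every "bbox x0 y0 x1 y1" that follows a
// class attribute inside the same tag.
std::vector<HocrBox> ExtractHocrBoxes(const std::string& hocr) {
  static const std::regex kBoxRe(
      R"re(class=['"]([A-Za-z_]+)['"][^>]*?bbox (\d+) (\d+) (\d+) (\d+))re");
  std::vector<HocrBox> boxes;
  for (std::sregex_iterator it(hocr.begin(), hocr.end(), kBoxRe), end;
       it != end; ++it) {
    const std::smatch& m = *it;
    boxes.push_back({m[1].str(), std::stoi(m[2].str()), std::stoi(m[3].str()),
                     std::stoi(m[4].str()), std::stoi(m[5].str())});
  }
  return boxes;
}
```

Filtering the result on cls (for example keeping only ocr_line) gives the same kind of rectangles the layout tool displays.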

Page layout analysis with OpenCV
To do page layout analysis and draw boxes on the image, we first need the coordinates of regions, lines, or words. To get them, follow the 'Get Box file' section above. The box file contains the coordinates, which OpenCV functions can use to draw rectangles.

Use the following code just before return(0) in tesseractmain.cpp:

    FILE *fp;
    int x, y, w, h, i, n, version, ignore;
    /* Box file written by boxaWrite and the image it belongs to. */
    fp = fopen("SampleNewspaper.words.box", "r");
    IplImage *img1 = cvLoadImage("SampleNewspaper.png", CV_LOAD_IMAGE_UNCHANGED);
    if (fp != NULL && img1 != NULL) {
      fscanf(fp, "\nBoxa Version %d\n", &version);
      fscanf(fp, "Number of boxes = %d\n", &n);
      for (i = 0; i < n; i++) {
        fscanf(fp, " Box[%d]: x = %d, y = %d, w = %d, h = %d\n", &ignore, &x, &y, &w, &h);
        //printf(" Box[%d]: x = %d, y = %d, w = %d, h = %d\n", ignore, x, y, w, h);
        /* Draw one rectangle per box. */
        cvRectangle(img1, cvPoint(x, y), cvPoint(x + w, y + h), cvScalar(222, 0, 255, 0), 2, 8, 0);
      }
      cvSaveImage("SampleNewspaper.words.png", img1);
    }
    if (fp != NULL) fclose(fp);
    if (img1 != NULL) cvReleaseImage(&img1);
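The box-reading part can also be kept separate from OpenCV, which makes it easier to check for malformed files. The sketch below parses the same text layout that the fscanf format strings above expect. BoxRect and ParseBoxaText are hypothetical names, and the layout is assumed to be exactly the one shown in those format strings.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one box: top-left corner plus width and height.
struct BoxRect {
  int x, y, w, h;
};

// Hypothetical helper: parses the text of a box file into boxes.
// Returns true only if the "Number of boxes" header is present and
// matches the number of Box[...] lines that were read.
bool ParseBoxaText(const std::string& text, std::vector<BoxRect>* boxes) {
  std::istringstream in(text);
  std::string line;
  int declared = -1;
  while (std::getline(in, line)) {
    int idx, x, y, w, h, n;
    if (std::sscanf(line.c_str(), " Number of boxes = %d", &n) == 1) {
      declared = n;
    } else if (std::sscanf(line.c_str(),
                           " Box[%d]: x = %d, y = %d, w = %d, h = %d",
                           &idx, &x, &y, &w, &h) == 5) {
      boxes->push_back({x, y, w, h});
    }
  }
  return declared >= 0 &&
         static_cast<std::size_t>(declared) == boxes->size();
}
```

Read the box file into a std::string (for example with std::ifstream and a std::stringstream), call ParseBoxaText, and then draw each BoxRect from (x, y) to (x + w, y + h) as in the loop above.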
