Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
 Open Source OCR
therealbuba




Posted: Sat Oct 03, 2009 11:07 am   Post subject: Open Source OCR

Hey everyone.

I made this OCR program a long time ago and decided to post it. For those of you who do not know what OCR software does, it converts text in pictures to editable text, like in Notepad. This program only has a small character definition database, but it's enough to work with the picture I give you, which is standard Times New Roman, size 12. Any feedback is appreciated. I'm now working on a cursive OCR, which is a lot more complex but also a lot easier to complete, as it does not need character definitions for each type of font and size.

The file you should load is hf.jpg.

Then simply select a name.txt for a save destination.

You can select yes for definition changing; however, there is no need for it on this file.

I'm completely open if anyone wants to continue developing it, as long as they contact me.

I realize the code is not optimized in some areas, but I think the heavy computations are pretty good.



Attachment: OCR.rar (190.41 KB, downloaded 511 times)

therealbuba




Posted: Mon Oct 05, 2009 8:10 pm   Post subject: Re: Open Source OCR

Anyone got any feedback? What do you guys think?
SNIPERDUDE




Posted: Mon Oct 05, 2009 10:59 pm   Post subject: RE:Open Source OCR

Awesome job. Great speed (considering the language), and fairly smart recognition.

Areas to improve:
- idiot-proofing (making sure it in no way can crash, no matter how stupid the user)

- grammar mistake on line 287 (the word "you")

I have found that pressing the space bar (as instructed in the help) is very sensitive, and often skips a step. One way to resolve this sensitivity issue is using Input.Flush where appropriate.

Since it is open source, I may just repost it idiot-proofed, with those minor interface problems fixed.

Great work.
therealbuba




Posted: Tue Oct 06, 2009 3:42 pm   Post subject: Re: Open Source OCR

Thanks a lot. I actually want to start an open source project called Free Reader, and was thinking of moving the code over to C with some assembly for the computations. If anyone is interested, I'm open to it now. I see a lack of good free OCR software right now, so I believe this could fill the gap. The code will continue in Turing for a bit, though, as I need to refine the character library input method. Right now I have a separate program for it.

I have many ideas for how this program can progress.

If interested, post here. :D
Vermette




Posted: Tue Oct 06, 2009 3:48 pm   Post subject: Re: Open Source OCR

therealbuba @ October 6th 2009, 15:42 wrote:
Thanks a lot. I actually want to start an open source project called Free Reader, and was thinking of moving the code over to C with some assembly for the computations. If anyone is interested, I'm open to it now. I see a lack of good free OCR software right now, so I believe this could fill the gap. The code will continue in Turing for a bit, though, as I need to refine the character library input method. Right now I have a separate program for it.

I have many ideas for how this program can progress.

If interested, post here. :D


Were you aware of this: http://code.google.com/p/tesseract-ocr/
therealbuba




Posted: Tue Oct 06, 2009 4:25 pm   Post subject: Re: Open Source OCR

Yes, I am; however, that's an engine, not a program. I am talking about something user-oriented.

:D
andrew.




Posted: Tue Oct 06, 2009 8:12 pm   Post subject: RE:Open Source OCR

How long is it supposed to take to analyse hf.jpg? I left the program running for like 15 minutes and nothing. I am running on a Core2Duo 2.4 GHz. It's a decent CPU.
therealbuba




Posted: Tue Oct 06, 2009 8:19 pm   Post subject: Re: RE:Open Source OCR

andrew. @ Tue Oct 06, 2009 wrote:
How long is it supposed to take to analyse hf.jpg? I left the program running for like 15 minutes and nothing. I am running on a Core2Duo 2.4 GHz. It's a decent CPU.


Unfortunately, that is an error on your part, then. Check the instructions; it should not take more than 1/4 of a second to begin converting. In fact, it is not even noticeable.

Like the comment above said, the space bar is very sensitive, so be careful when pressing it and when clicking the mouse.

Hope it works for you this time.
therealbuba




Posted: Wed Oct 07, 2009 9:17 pm   Post subject: Re: Open Source OCR

Did you get it to work? :D
andrew.




Posted: Thu Oct 08, 2009 5:45 pm   Post subject: RE:Open Source OCR

Yeah, I didn't understand the instructions. It works pretty good for a Turing program.
bbi5291




Posted: Thu Oct 08, 2009 7:18 pm   Post subject: Re: Open Source OCR

therealbuba @ Mon Oct 05, 2009 8:10 pm wrote:
Anyone got any feedback? What do you guys think?

I'm on Linux, and it's too inconvenient to actually compile and run this (using Wine). But I looked at your source. I have a comment to make here.

One of the key strengths of open source software is its extensibility. Right now, you say it only detects Times New Roman, size 12. But is it really that difficult to detect any other font face and size? Of course not. The algorithm is the same, and as long as code and data are cleanly separated, other people should be able to create their own font face definitions for use with your software.

Your letterdef.txt doesn't seem to indicate the size of the font; I assume you hardcoded it into your program. Don't. Move all the data pertaining to the font definition into the data file(s). And the actual glyph specification should be in a form as easy to read as possible. (For example: width and height, separated by spaces, then an actual bitmap, a grid of dots and asterisks, where a dot represents a pixel that is not in the glyph and an asterisk represents a pixel that is in the glyph.)
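
To make the suggested layout concrete, here is a rough sketch of reading one glyph in that form. It is written in Python for brevity rather than Turing, and it is not how letterdef.txt is actually laid out; the function name parse_glyph and the sample 3x5 letter are made up for illustration.

```python
def parse_glyph(text):
    """Parse one glyph: a 'width height' header line, then rows of '.' and '*'.

    Hypothetical format, following the suggestion in the post above.
    Returns (width, height, bitmap), where bitmap[y][x] is True for an
    asterisk (pixel in the glyph) and False for a dot.
    """
    lines = text.strip().splitlines()
    width, height = (int(n) for n in lines[0].split())
    rows = [line.strip() for line in lines[1:1 + height]]
    if len(rows) != height:
        raise ValueError("expected %d rows, got %d" % (height, len(rows)))
    bitmap = []
    for row in rows:
        # Every row must be exactly 'width' characters of '.' or '*'.
        if len(row) != width or set(row) - {".", "*"}:
            raise ValueError("bad row: %r" % row)
        bitmap.append([ch == "*" for ch in row])
    return width, height, bitmap


# A made-up 3x5 capital "A", just to exercise the parser.
SAMPLE = """\
3 5
.*.
*.*
***
*.*
*.*
"""

width, height, bitmap = parse_glyph(SAMPLE)
```

Keeping the dimensions in the file like this means a new font or size is only a new data file, with no change to the program.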
therealbuba




Posted: Thu Oct 08, 2009 8:54 pm   Post subject: Re: Open Source OCR

bbi5291 @ Thu Oct 08, 2009 wrote:
therealbuba @ Mon Oct 05, 2009 8:10 pm wrote:
Anyone got any feedback? What do you guys think?

I'm on Linux, and it's too inconvenient to actually compile and run this (using Wine). But I looked at your source. I have a comment to make here.

One of the key strengths of open source software is its extensibility. Right now, you say it only detects Times New Roman, size 12. But is it really that difficult to detect any other font face and size? Of course not. The algorithm is the same, and as long as code and data are cleanly separated, other people should be able to create their own font face definitions for use with your software.

Your letterdef.txt doesn't seem to indicate the size of the font; I assume you hardcoded it into your program. Don't. Move all the data pertaining to the font definition into the data file(s). And the actual glyph specification should be in a form as easy to read as possible. (For example: width and height, separated by spaces, then an actual bitmap, a grid of dots and asterisks, where a dot represents a pixel that is not in the glyph and an asterisk represents a pixel that is in the glyph.)


Not sure exactly what you mean at the end, as it seems a bit different from my algorithm. However, yes, it is very easy to add other fonts and sizes and load them into memory. Without a program to turn the letters into letter definitions automatically (which I will get around to developing), though, the process is very time-consuming, as it requires capturing the letter definitions one letter at a time. It's a minimal problem, however; right now I am trying to refine the main algorithm to make it as quick as possible (in Turing). Also, as this is not cursive OCR, the algorithm does not work with widths etc., but only with sizes defined in a font file that includes all the possible sizes. If I were to decode the algorithm behind actually creating a font and its size (not sure if there is one), then the process would become somewhat more streamlined. I need to create a dynamic method that can take an unlimited number of font files, which I can keep adding by the day.

Also, a question a lot of people might ask is how the large number of fonts that could be added later will affect the speed of the program. The solution is very simple: the program will scan the first letter in a word and search for its definition in the entire database. Once it finds it, all following letters will be matched against the same font and size as the first, until a letter cannot be found, at which point it will go through the first process again. This process can also be made easier by the user specifying the font(s) being used in the document.
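
Roughly, the lookup order described above could be sketched like this. It is in Python rather than Turing, and it makes several simplifying assumptions: glyphs are treated as a flat sequence without word boundaries, each glyph is reduced to a key that either matches a stored definition exactly or does not, and the names recognize, fonts, and full_searches are made up for illustration. The real matching in the program is more involved.

```python
def recognize(glyphs, fonts):
    """Match each glyph, trying the most recently matched font first.

    glyphs: list of hashable glyph keys (stand-ins for scanned letters).
    fonts:  dict mapping a font/size name to a dict of glyph key -> character.
    Returns (text, full_searches); unknown glyphs become '?'.
    """
    current = None       # font/size that matched most recently
    full_searches = 0    # how many times the whole database was searched
    out = []
    for glyph in glyphs:
        # Fast path: look only in the font that matched last time.
        if current is not None and glyph in fonts[current]:
            out.append(fonts[current][glyph])
            continue
        # Slow path: search every font in the database.
        full_searches += 1
        for name, table in fonts.items():
            if glyph in table:
                current = name
                out.append(table[glyph])
                break
        else:
            out.append("?")
    return "".join(out), full_searches


# Made-up database: two fonts, with string keys standing in for glyph data.
fonts = {
    "times-12": {"a1": "a", "b1": "b"},
    "arial-10": {"a2": "a", "c2": "c"},
}
text, searches = recognize(["a1", "b1", "a2", "c2", "zz"], fonts)
```

With this ordering, the cost of a large database is only paid when the font changes or a letter is unknown, so long runs of text in one font stay fast.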

If I were to create a dynamic method for working through the large number of fonts and sizes, and a built-in method for automatically loading new fonts and sizes from text files, would people consider helping me continue the open source project and move it over to C?

bbi5291 @ Thu Oct 08, 2009 wrote:
Your letterdef.txt doesn't seem to indicate the size of the font; I assume you hardcoded it into your program.


It's not hardcoded. I simply did not specify the size because there was only one font and size; at that scale it did not feel necessary.
jdubzisyahweh




Posted: Mon Mar 01, 2010 12:24 pm   Post subject: RE:Open Source OCR

DON'T POST A RAR FILE, PLEASE!!! Post as .t or .zip. Free RAR openers are a pain to find.
USEC_OFFICER




Posted: Mon Mar 01, 2010 12:48 pm   Post subject: RE:Open Source OCR

I'm glad someone else agrees (ish). But we've had this conversation two or three times already since I've been here.
Euphoracle




Posted: Mon Mar 01, 2010 1:13 pm   Post subject: Re: RE:Open Source OCR

jdubzisyahweh @ Mon Mar 01, 2010 12:24 pm wrote:
DON'T POST A RAR FILE, PLEASE!!! Post as .t or .zip. Free RAR openers are a pain to find.


http://7zip.org
Page 1 of 2  [ 16 Posts ]