Today at work I introduced a new feature that is exceptionally cool. For a certain reason, we need to group database records that have no primary/foreign key relationships. The lack of restrictions gives the user more flexibility, but it comes at the expense of certainty. The goal was to develop a fuzzy match with an acceptable level of accuracy. In this particular case, the primary point of comparison is a person’s name (the only required field), and the requirement is to account for naturally occurring variations in spelling, including typos and outright guesses.
My solution uses metaphone, a phonetic algorithm developed by Lawrence Philips. In short, it is a phonetic comparison, implemented in Ruby with the "text" gem:
irb> Text::Metaphone.metaphone('Tony Targonski')
=> "TN TRKNSK"
What this means is that even a horribly misspelled variation of my name will match, as long as it reduces to the same phonetic key, TN TRKNSK:
irb> Text::Metaphone.metaphone('Tony Targonski') ==
Text::Metaphone.metaphone('Tonie Tergoonsky')
=> true
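A minimal sketch of the grouping step itself, assuming records are hashes with a :name field (a hypothetical shape, not the actual schema). The post uses Text::Metaphone.metaphone from the text gem; to keep this sketch dependency-free and runnable, it swaps in a hand-rolled American Soundex, a different and cruder phonetic algorithm, as a stand-in key. The key is computed per word and joined with spaces, mirroring the space-separated TN TRKNSK output above. With the gem installed, Text::Metaphone.metaphone could replace soundex in phonetic_key.

```ruby
# Stand-in phonetic key: American Soundex (NOT metaphone, which the post uses).
SOUNDEX_CODES = {
  'B' => '1', 'F' => '1', 'P' => '1', 'V' => '1',
  'C' => '2', 'G' => '2', 'J' => '2', 'K' => '2',
  'Q' => '2', 'S' => '2', 'X' => '2', 'Z' => '2',
  'D' => '3', 'T' => '3',
  'L' => '4',
  'M' => '5', 'N' => '5',
  'R' => '6'
}.freeze

# Encodes one word as its first letter plus three digits, e.g. "T625".
def soundex(word)
  letters = word.upcase.gsub(/[^A-Z]/, '')
  return '' if letters.empty?

  result = letters[0].dup
  prev = SOUNDEX_CODES[letters[0]]
  letters[1..-1].each_char do |ch|
    break if result.length == 4
    code = SOUNDEX_CODES[ch]
    if code
      # Adjacent consonants with the same code are written only once.
      result << code unless code == prev
      prev = code
    elsif ch != 'H' && ch != 'W'
      # Vowels (and Y) separate consonants, so a repeated code counts again;
      # H and W do not, so prev is left unchanged for them.
      prev = nil
    end
  end
  result.ljust(4, '0')
end

# Applies the key to each word and joins with spaces, like "TN TRKNSK" above.
def phonetic_key(name)
  name.split.map { |word| soundex(word) }.join(' ')
end

# Groups records (hypothetical hashes with a :name field) by phonetic key.
def group_by_sound(records)
  records.group_by { |record| phonetic_key(record[:name]) }
end

records = [
  { id: 1, name: 'Tony Targonski' },
  { id: 2, name: 'Tonie Tergoonsky' },
  { id: 3, name: 'Bashar Abdullah' }
]
groups = group_by_sound(records)
groups.each { |key, recs| puts "#{key}: #{recs.map { |r| r[:name] }.join(', ')}" }
# T500 T625: Tony Targonski, Tonie Tergoonsky
# B260 A134: Bashar Abdullah
```

The stand-in also shows why metaphone is the better choice here: Soundex keeps the first letter verbatim, so "Philips" (P412) and "Filips" (F412) land in different groups, while metaphone maps PH to F.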
Throw in some Levenshtein distance to sort the results by how far off they are from the original query, and we’ve got some very impressive functionality, developed by a co-op student. A round of applause, please, for very little new code. So what exactly did I do?
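A sketch of how that ranking step could look, using the textbook dynamic-programming Levenshtein distance. The text gem also provides Text::Levenshtein.distance, but this version has no dependencies. rank_by_distance is a hypothetical helper, and lowercasing both strings before comparing is an assumption about how names should be matched.

```ruby
# Classic edit distance: the minimum number of single-character insertions,
# deletions and substitutions needed to turn a into b.
def levenshtein(a, b)
  previous = (0..b.length).to_a # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    current = [i] # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical helper: closest candidates first, ties broken alphabetically.
def rank_by_distance(query, candidates)
  q = query.downcase
  candidates.sort_by { |name| [levenshtein(q, name.downcase), name] }
end

matches = ['Tonie Tergoonsky', 'Tony Targonsky', 'Toni Targonski']
p rank_by_distance('Tony Targonski', matches)
# => ["Toni Targonski", "Tony Targonsky", "Tonie Tergoonsky"]
```

Only two rows of the distance table are kept at a time, so memory stays proportional to the length of the shorter loop rather than the full table.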
A year-old IBM commercial comes to mind.
Look at the guy who invented sliced bread. He didn’t invent bread. He didn’t invent slicing. He just applied new technology to an age-old problem.
All right, so this isn’t exactly new technology or an age-old problem, but it does demonstrate the other side of Computer Science: the practical application of algorithms. You’ve memorized a bubble sort? Great! Now learn to apply it. Find out what real problems are solved by the concepts you are currently learning, and you will understand and retain them much better.
oooOOOOoo…
VERY nifty sounding.
Now I understand why you wanted to find said commercial so much.
Heh… I’m going to have to start calling you TarGOONski now, you know. :p
Unless you can demonstrate a Ruby interpreter running in your head, you shall address me by my proper name.
Please, please, PLEASE don’t ever use Bubble Sort, let alone memorise it.
Use Shell sort as a minimum…
The number of times I’ve heard “…it doesn’t matter, it’s only a small list, you’re worrying about nothing, I just have to get on and finish this…”, only to find months later that the software is running like a dog because the list is no longer small!
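For reference, a minimal Shell sort sketch in Ruby. It uses Shell’s original gap sequence (halving from n/2 down to 1), which is the simplest to write; other gap sequences generally perform better. Each pass is an insertion sort over elements gap positions apart, so out-of-place elements move long distances early and the final gap-1 pass has little left to do.

```ruby
# Shell sort with the simple n/2, n/4, ..., 1 gap sequence.
# Returns a new sorted array and leaves the input untouched.
def shell_sort(items)
  a = items.dup
  gap = a.length / 2
  while gap > 0
    # Gapped insertion sort: each slice a[k], a[k+gap], ... ends up ordered.
    (gap...a.length).each do |i|
      current = a[i]
      j = i
      while j >= gap && a[j - gap] > current
        a[j] = a[j - gap]
        j -= gap
      end
      a[j] = current
    end
    gap /= 2
  end
  a
end

p shell_sort([23, 5, 42, 8, 16, 4, 15])
# => [4, 5, 8, 15, 16, 23, 42]
```

In everyday Ruby code, the built-in Array#sort is usually the right choice over any hand-written sort.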
Neat, I can see a couple of cases where this will come in handy.
Isn’t AJAX some sort of sliced bread?
Good article!
@Mitch – heh, Bubble Sort is likely to be taught first, as it’s the simplest. In the linked article I talked about how students simply memorize the code. If they never implement it in their own projects, they’ll never learn how bad it really is. And you are right: Bubble Sort doesn’t belong in any production environment. It’s an algorithm for educational purposes only.
@Ilya – awesome, now I don’t feel as guilty for using all the code you post on your blog.
@Walter – you are absolutely correct: AJAX is the sliced bread of modern programming. JavaScript and XML have been around for a while, but using the two together the way we do now has spawned a whole new industry. Good catch!
Applying technology to problems it doesn’t initially seem to fit is, in my opinion, a form of creativity.
Never knew about metaphone. I knew at first glance you are not an easy nerd to deal with.
People always misspell my name as a female one, so testing this is useful:
bashar abdullah => BXRBTL
basher abdullah => BXRBTL
besher abdullah => BXRBTL
I wonder why you are getting a space in the result and I’m not:
Tony Targonski => TNTRKNSK
I am using the PHP metaphone, though.
Another thing: it doesn’t work with Arabic.
@Bashar – it does look like the PHP function strips out the spaces. I suppose you could apply the algorithm to each word individually and then join the results with spaces. Or switch over to the Ruby side.
And yes, it would not work with Arabic. Since the algorithm applies a set of phonetic rules, it is language-specific, namely to English. Any language written in the Latin alphabet should produce a result, but that result treats the foreign word as if it were pronounced in English, so the matches could be inaccurate.
You are one real Ruby fan. An off-topic question, if I may: I am looking for a good, detailed and neutral analysis and comparison of PHP Symfony and RoR. Speed, ease of development, drawbacks and all.
I have not used PHP Symfony, nor would I be able to give you a neutral comparison with Ruby on Rails. Sorry.
Thanks. That’s a shame. Both have potential, and I have touched base briefly with both.