I found a really interesting Q&A article today while trying to figure out how to get C# speech recognition to guess what the user meant to say, and then correct the input if needed.
My goal is to improve the program's natural language processing capabilities. I am trying to create a program with an interface like Jarvis from Iron Man. I have it working rather well now. It is hooked into the Wolfram Alpha API, so you can ask Jarvis questions like, "How many people live in New York City?" or "What does 3 minus 4 equal?" Well, enough about Jarvis. Information about my Jarvis project is coming in another post.
This post is for those of you who are wondering how to create a Google-ish "Did you mean" type of response to user queries, which is what I am currently looking into. This article should give you the keywords and jargon you're looking for, along with links to great pre-existing solutions for you to use.
Knowing what to search for is half the battle. :)
#################################################################################
The question asked on Stack Overflow: Click here for source site.
#################################################################################
I have the following requirement:
I have many (say 1 million) values (names). The user will type a search string.
I don't expect the user to spell the names correctly.
So, I want to make a kind of Google "Did you mean" feature. This will list all the possible values from my datastore. There is a similar, but not identical, question here. It did not answer my question.
My questions:
1) I think it is not advisable to store this data in an RDBMS, because then I won't be able to filter in the SQL queries and will have to do a full table scan. So, in this situation, how should the data be stored?
2) The second question is the same as this. But, just for the completeness of my question: how do I search through the large data set? Suppose there is a name Franky in the dataset. If a user types Phranky, how do I match it to Franky? Do I have to loop through all the names?
I came across Levenshtein distance, which would be a good technique for finding the possible strings. But again, my question is: do I have to operate on all 1 million values from my data store?
3) I know Google does it by watching users' behavior. But I want to do it without watching user behavior, i.e. by using, I don't know yet, say, distance algorithms, because the former method requires a large volume of searches to start with!
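A quick aside from me, not from the question: to make the Levenshtein idea concrete, here is a minimal Python sketch of the brute-force approach the asker is worried about. The function names (levenshtein, did_you_mean), the sample names, and the max_distance cutoff of 2 are all my own assumptions for illustration. It compares the query against every stored name, which is exactly the full scan that would get expensive at 1 million values.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def did_you_mean(query, names, max_distance=2):
    """Brute force: score every name, keep the close ones, nearest first."""
    q = query.lower()
    scored = [(levenshtein(q, name.lower()), name) for name in names]
    return [name for dist, name in sorted(scored) if dist <= max_distance]


# Hypothetical sample data standing in for the real datastore.
names = ["Franky", "Frank", "Francis", "Stanley"]
print(did_you_mean("Phranky", names))
```

This works fine for a handful of names, but every query costs one distance computation per stored value, so a real solution would need some kind of index to narrow the candidates first.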
There are two cases to consider:
- Users mistyping a word (an edit distance algorithm)
- Users not knowing a word and guessing (a phonetic match algorithm)
I am interested in both of these. They are really two separate things; e.g. Sean and Shawn sound the same but have an edit distance of 2, which is too high for such short names to be considered a typo.
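For the phonetic side, here is a small Python sketch of American Soundex, one well-known phonetic matching algorithm. This is my own illustration, not something from the question; the soundex function, the SOUNDEX_CODES table name, and the sample names are assumptions. The idea is that names that sound alike get the same short code, so you can look candidates up by code instead of scanning everything.

```python
from collections import defaultdict

# Consonant groups used by American Soundex; vowels, y, h, and w get no digit.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the 4-character Soundex code (first letter plus three digits)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    last_digit = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w are skipped and do not separate equal codes
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != last_digit:
            code += digit
        last_digit = digit  # a vowel resets this, so a repeat after a vowel counts
    return (code + "000")[:4]


# Hypothetical sample data; the index maps each code to the names that share it.
index = defaultdict(list)
for stored in ["Sean", "Franky", "Robert"]:
    index[soundex(stored)].append(stored)

print(soundex("Shawn"), index[soundex("Shawn")])
print(soundex("Phranky"), soundex("Franky"))
```

Sean and Shawn both come out as S500, so the lookup finds Sean without touching the other names. Phranky and Franky do not match, though, because Soundex always keeps the first letter (P652 vs. F652). That limitation is one reason people reach for Metaphone-style algorithms, which treat "ph" as an "f" sound.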
... Click the link above to see the answers to the question, or here:
Click here for source site.