
Cardbox Talk

 


fuzzy matching

Possible bug in fuzzy matching.



Charles Welling

23-Jan-2015 12:37

I noticed some weird behaviour on my Cardbox-driven website, where visitors can get suggestions for mistyped words. I use fuzzy matching for that, and I was testing it to improve the results. Then I noticed the following. I used the name of a famous cartoonist, "Raemaekers". Typing "Ramakers" or "Remarkers" etc. all resulted in the suggestion "Raemaekers". Both misspellings have two differences compared to "Raemaekers".

Then I tried "Reameakers", a common mistake visitors make. Fuzzy matching, using either 1 or 2 differences, came up with absolutely nothing. Finally it dawned upon me that the number of differences is zero. Some characters have been switched, but they are all the same as in "Raemaekers".
That seemed to fool Cardbox. Although the words are different, Cardbox doesn't notice it. According to the manual (and to logic!), Cardbox should return all words with differences UP TO the number given in the query, and that includes zero differences.
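The "zero differences" reading above compares which letters occur, not where they occur. A minimal sketch of the two ways of counting, using hypothetical helper names (`same_letters`, `positional_differences`) that are not part of Cardbox; it says nothing about how Cardbox itself counts:

```python
from collections import Counter

def same_letters(a, b):
    """True if both words use exactly the same letters, ignoring order and case."""
    return Counter(a.lower()) == Counter(b.lower())

def positional_differences(a, b):
    """Count positions where two equal-length words differ, ignoring case."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a.lower(), b.lower()))

print(same_letters("Reameakers", "Raemaekers"))
print(positional_differences("Reameakers", "Raemaekers"))
```

The first check treats the two spellings as identical, while the second sees a mismatch at every swapped position.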

Searching for "Raemaekers" (the correct name) in combination with fuzzy matching works fine. It's only when an exact match isn't found, and Cardbox compares words in the index with the query, that it seems to ignore 0 differences and only looks for 1, 2 or more differences.

bert

23-Jan-2015 19:09

In a field: Raemaekers
Then:
Select REAMEAKERS (fuzzy match 4)
Level 1: 1 records selected.
Indeed: 4 errors in search string.

If fuzzy match was 2: No Records Selected - because there were 4 errors in the search string.
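The count of 4 is what a standard edit distance gives for this pair. A minimal sketch, assuming the differences are counted like Levenshtein distance (single-character insertions, deletions and substitutions); the thread does not confirm that Cardbox uses exactly this measure, and `levenshtein` is just an illustrative name:

```python
def levenshtein(a, b):
    """Case-insensitive Levenshtein distance between two strings."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,         # delete ca
                           cur[j - 1] + 1,      # insert cb
                           prev[j - 1] + cost)) # substitute or match
        prev = cur
    return prev[-1]

for typo in ("Ramakers", "Remarkers", "Reameakers"):
    print(typo, levenshtein(typo, "Raemaekers"))
```

Under this measure each swapped pair of neighbouring letters costs two edits, so the two swaps in "Reameakers" add up to four.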

Perhaps fuzzy match works a little differently than expected?

Regards
Bert

Charles Welling

26-Jan-2015 08:48

I suppose you're right, but it still puzzles me. I tested this using a macro which increases the degree of fuzziness by 1 as long as nothing is found. So it starts using "fuzzy 1", and if nothing has been found it will use "fuzzy 2" etc.
It did come up with "Ria Beckers" who is also in the database.

Reameakers
Ria Beckers

That's 4 differences! e→i, space added, m→B, a→c.
But no Raemaekers, which also has 4 differences, but contains exactly the same characters.
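The macro's widening strategy can be sketched outside Cardbox to see what a plain edit distance would return. This is an illustration only: `widen_search`, the in-memory word list and the Levenshtein measure are all assumptions, not the Cardbox macro language or its actual matching rule.

```python
def levenshtein(a, b):
    """Case-insensitive Levenshtein distance (assumed measure, see lead-in)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def widen_search(query, index_words, max_fuzz=6):
    """Raise the fuzziness by 1 until something matches; return (level, hits)."""
    for fuzz in range(1, max_fuzz + 1):
        hits = [w for w in index_words if levenshtein(query, w) <= fuzz]
        if hits:
            return fuzz, hits
    return None, []

print(widen_search("Reameakers", ["Raemaekers", "Ria Beckers"]))
```

With plain Levenshtein distance both entries are 4 edits away from "Reameakers", so this sketch returns both at level 4. That differs from the result described above, which suggests Cardbox counts differences in some other way.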
And yes, I am aware of the fact that the client uses the sequence skip+fuzzy (e.g. 1+4), and in macros this is the other way around: fuzzy,skip (i.e. 4,1).
It's all a bit fuzzy.


 
© 2010 Cardbox Software Limited