Possible bug in fuzzy matching.

Charles Welling

23-Jan-2015 12:37

I noticed some weird behaviour on my Cardbox-driven website, where visitors can get suggestions for mistyped words. I use fuzzy matching for that, and I was testing ways to improve the results. Then I noticed the following. I used the name of a famous cartoonist, "Raemaekers". Typing "Ramakers" or "Remarkers" etc. resulted in the suggestion "Raemaekers". Both words have two differences compared to "Raemaekers".

Then I tried "Reameakers", a common mistake visitors make. Fuzzy matching, using either 1 or 2 differences, came up with absolutely nothing. Finally it dawned upon me that the number of differences is zero. Some characters have been switched, but they are all the same as in "Raemaekers".
That seemed to fool Cardbox. Although the words are different, Cardbox doesn't notice it. According to the manual (and to logic!), Cardbox should return all words with differences UP TO the number given in the query, and that includes zero differences.

Searching for "Raemaekers" (the correct name) in combination with fuzzy matching works fine. It's only when an exact match isn't found, and Cardbox compares words in the index with the query, that it seems to ignore 0 differences and only looks for 1, 2 or more differences.


23-Jan-2015 19:09

In a field: Raemaekers
Select REAMEAKERS (fuzzy match 4)
Level 1: 1 records selected.
Indeed: 4 errors in search string.

If fuzzy match was 2: No Records Selected - because there were 4 errors in the search string.

Perhaps fuzzy match works a little differently than expected?
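
To illustrate why swapped letters can count as 4 errors rather than 0, here is a minimal Python sketch. It assumes that a "difference" is one inserted, deleted or replaced character (plain Levenshtein distance). That is only a guess that happens to fit the numbers in this thread; it is not Cardbox's documented algorithm. The second function, osa, is a common variant that counts a swap of two adjacent letters as a single error, shown only for comparison. Both function names are made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def osa(a: str, b: str) -> int:
    """Same as above, but swapping two adjacent letters also costs 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ae" typed as "ea"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


target = "raemaekers"
for typed in ("ramakers", "remarkers", "reameakers"):
    print(typed, levenshtein(typed, target), osa(typed, target))
```

Under these assumptions "ramakers" and "remarkers" are each 2 edits away from "raemaekers", while "reameakers" is 4 edits away with plain Levenshtein and 2 if swaps are counted as one error. The first set of numbers agrees with the results above: found with fuzzy match 4, not found with fuzzy match 2.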


Charles Welling

26-Jan-2015 08:48

I suppose you're right, but it still puzzles me. I tested this using a macro which increases the degree of fuzziness by 1 as long as nothing is found. So it starts using "fuzzy 1", and if nothing has been found it will use "fuzzy 2" etc.
It did come up with "Ria Beckers" who is also in the database.

Ria Beckers

That's 4 differences! E=i, space added, m=b, a=c.
But no Raemaekers, which also has 4 differences, but contains exactly the same characters.
And yes, I am aware of the fact that the client uses the sequence skip+fuzzy (e.g. 1+4), and in macros this is the other way around: fuzzy,skip (i.e. 4,1).
It's all a bit fuzzy.
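
For anyone who wants to try the same approach outside Cardbox, the macro logic described above (start at fuzzy 1 and add 1 as long as nothing is found) can be sketched in Python roughly as follows. This is not the actual macro. The names escalate, toy_search and fake_index are hypothetical, and the toy search simply looks up a hand-entered distance (the 4 reported above for "Ria Beckers") instead of calling Cardbox.

```python
from typing import Callable, List, Tuple


def escalate(query: str,
             search: Callable[[str, int], List[str]],
             max_fuzz: int = 6) -> Tuple[int, List[str]]:
    """Retry the search with fuzziness 1, 2, ... until something is found."""
    for fuzz in range(1, max_fuzz + 1):
        hits = search(query, fuzz)
        if hits:
            return fuzz, hits
    return max_fuzz, []  # nothing found, even at the highest level tried


# Stand-in for the real search: hand-entered distance from the thread.
fake_index = {"Ria Beckers": 4}


def toy_search(query: str, fuzz: int) -> List[str]:
    # Ignores the query; returns every entry within the allowed distance.
    return [word for word, dist in fake_index.items() if dist <= fuzz]


print(escalate("Reameakers", toy_search))
```

With this stand-in data the loop stops at fuzziness 4 and returns "Ria Beckers", which mirrors what the macro did.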

