small bug in collating table
There's a small bug caused by the SPLIT line in collating tables that you should be aware of.
Example of a SPLIT with a hyphen:
"test-case" is indexed (as it should be) as:
"Female suffrage 1900 - 1920" however is indexed as:
A phrase search for "Female suffrage 1900 - 1920" results in 0 records.
If you change HYPHEN to SPLIT in Contacts.fil, it is rejected by Cardbox.
generates an error in Sample.fil:
Load Collating Sequence from File
(Line 46 is the last line of the file: ==00DF(ß) S S, and it has nothing to do with SPLIT.)
I could solve this error by adding "-" as an indexed character.
My solution is along these lines:
My workaround for the "bug" is changing this line:
However, when "-" is added to the indexed characters, SPLIT prevents it from being indexed. **That is also a bug.**
A nice test was "Female suffrage 1900-1920". The "-" is used for negative numbers, as a hyphen, in the index, and as a separator in dates...
Btw: I always make the SKIP and DELETE lines a little longer. This prevents numbers from being indexed as words when a comma, a quote, a € or £ etc. comes before the number. Forcing numbers into the Word index is not a nice function, but a pitfall.
First of all, when you use
This is what the index should look like when a database contains the title "Trafford Leigh-Mallory and the RAF 1940 - 1943."
with a SPLIT:
without a SPLIT:
The separate "-" being the hyphen between the years, NOT the hyphen between Leigh and Mallory.
By default, Noord-Holland is indexed (as it already was in the DOS version) as NOORD, HOLLAND and NOORDHOLLAND.
The Help tells us that SPLIT works like HYPHEN. The only difference: the SPLIT character is left between the terms.
That is how I understand the Help.
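To make that reading of the Help concrete, here is a minimal Python sketch. It is only a toy model of the behaviour described in this thread, not Cardbox's actual implementation: the function name `index_terms` and its parameters are made up for illustration, and the SPLIT result (the whole term kept with the character in place) is an assumption based on "the SPLIT character is left between the terms".

```python
def index_terms(word, mode, char="-"):
    """Toy model of HYPHEN and SPLIT as described in this thread (hypothetical)."""
    upper = word.upper()
    parts = [p for p in upper.split(char) if p]
    # A word without the character, or a lone "-" as in "1900 - 1920",
    # is kept as-is (the thread argues a lone "-" should be indexed).
    if char not in upper or not parts:
        return [upper]
    if mode == "HYPHEN":
        whole = "".join(parts)      # character removed from the combined term
    elif mode == "SPLIT":
        whole = char.join(parts)    # assumption: character kept between the terms
    else:
        raise ValueError("mode must be 'HYPHEN' or 'SPLIT'")
    return parts + [whole]


print(index_terms("Noord-Holland", "HYPHEN"))  # ['NOORD', 'HOLLAND', 'NOORDHOLLAND']
print(index_terms("test-case", "SPLIT"))       # ['TEST', 'CASE', 'TEST-CASE']
print(index_terms("-", "SPLIT"))               # ['-']
```

Under this model the two commands differ only in the last term, which is why the combined SPLIT term can only be stored if the character itself is indexable.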
I've never experienced any trouble with reading a collating table, either with or without SPLIT or HYPHEN, so it's unlikely that that's a bug in Cardbox.
Adding the SPLIT character to the indexed characters is necessary, not a workaround. I quote from the help, where there's an example of a SPLIT with a full stop (.):
And not indexing the SPLIT character, although it has been added to the indexable characters, must indeed be a bug. That's what I've been saying the whole time.
Pasted from help:
However, in another part of the Help, on another page, this is suddenly called ""inclusive" splitting". That page adds that a second line is needed, which is not mentioned in the main description of SPLIT. A pitfall.
And I agree, that does not work right.
It was all a little confusing because of your example "1900 - 1920". It seemed that you expected something from SPLIT in that situation, while SPLIT should do nothing here. Only "-" had to be indexed.
"I could never explain to anyone why Cardbox Noord-Holland indexed as Noord as well as Holland as well as NoordHolland."
I came across a useful application of the HYPHEN command. Keywords may be the containers of other keywords, e.g. the word "sweetshop" contains the word "shop". An index search on "shop" would have no results. But, when you use HYPHEN with, for instance, the middle dot (·), you can have both words indexed.
sweet·shop is indexed as
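As a rough illustration of why this helps index searches, here is a small Python sketch of an inverted index. It assumes the middle dot is declared as the HYPHEN character and behaves as described for Noord-Holland earlier in the thread (each part plus the joined word). The helper names `hyphen_terms` and `build_index` and the two sample records are hypothetical.

```python
from collections import defaultdict


def hyphen_terms(word, hyphen="·"):
    """Terms for one word: each part, plus the parts joined without the hyphen."""
    parts = [p for p in word.upper().split(hyphen) if p]
    if len(parts) < 2:
        return set(parts)
    return set(parts) | {"".join(parts)}


def build_index(records, hyphen="·"):
    """Map every index term to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.split():
            for term in hyphen_terms(word, hyphen):
                index[term].add(rec_id)
    return index


# Hypothetical records: one typed without the middle dot, one with it.
records = {1: "sweetshop", 2: "sweet·shop"}
index = build_index(records)

print(sorted(index.get("SHOP", set())))       # [2]
print(sorted(index.get("SWEETSHOP", set())))  # [1, 2]
```

Only the record typed with the middle dot is found by an index search on "shop", while both are still found under the full word.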
You are right: this also opens up nice possibilities. However, these tricks are fine for IT people who can implement that kind of thing (and many other Cardbox tricks) to build a plain, simple, easy-to-explain interface for front-end Cardbox users.
So, my intention is "what you write, you will find". Can it be any easier?
I just ran into another anomaly when using hyphens, and it's a serious one.
Now try this: use a hyphen as the first character of a word, without quotes, such as:
Cardbox will return (almost) the entire database in the search result. A preview is empty (as it should be!), but Cardbox seems to find a lot of records anyway. And of course they do not match the search.
I've modified the search form in the Internet version to deal with this, but when you use the Client you can't. There's no error message, just a hugely wrong search result.
I think you overlook boolean search.
Indeed, Bert, I overlooked the Boolean function of the hyphen. That is the trouble with using archaic things like "+" or "-" in a modern system. And then there's using the hyphen in dates and the same character being used as a minus sign in numbers. And combinations thereof. Who needs to use them as Booleans on top of all that?
I use boolean search every day.
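The Boolean reading of a leading hyphen discussed above can be sketched in a few lines of Python. This is a toy evaluator written for illustration only, not Cardbox's search engine: the `search` function, its rules (bare words must be present, words prefixed with "-" must be absent) and the sample records are all assumptions.

```python
def search(records, query):
    """Toy evaluator: bare words are required, words starting with '-' are excluded."""
    required, excluded = set(), set()
    for token in query.upper().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])   # leading hyphen read as Boolean NOT
        else:
            required.add(token)
    hits = []
    for rec_id, text in records.items():
        words = set(text.upper().split())
        if required <= words and not (excluded & words):
            hits.append(rec_id)
    return hits


# Hypothetical three-record database.
records = {1: "female suffrage", 2: "test case", 3: "noord holland"}

print(search(records, "suffrage"))   # [1]
print(search(records, "-suffrage"))  # [2, 3]
```

With nothing but an excluded word in the query, every record that lacks that word matches, which would explain a result containing almost the whole database.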
© 2010 Cardbox Software Limited