Cardbox Talk


How to convert to UTF-8

I wish to replace my existing BESPOKE ASCII character set (and Collating Sequences) with a UTF-8 based system.

Chris H

30-Aug-2017 10:29

I am currently using the 'spare' spaces in the standard 254 ASCII character set to give me 'special' characters such as chemical sub- & super-script numbers, some Greek symbols etc. (these work correctly both in display and printing). I have also adjusted the relevant collating sequences to treat these characters as their 'Latin' equivalents. I now wish to convert these features using a more universally acceptable approach.
Can anyone advise me on the broad steps necessary to achieve this, i.e. how to create a suitable collating sequence that includes my 'extra' characters, how to ensure proper display and printing of subscript and similar characters, etc.?
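
One broad step in a conversion like this is translating each bespoke 8-bit code into its real Unicode character. The following is only a minimal sketch of that idea, done outside Cardbox: the byte values in `BESPOKE_TO_UNICODE` are invented for illustration, and the function name `decode_bespoke` is hypothetical. The real table would have to be filled in from the actual custom character set.

```python
# Hypothetical mapping from bespoke 8-bit codes to Unicode characters.
# The byte values are invented for illustration; the real ones depend
# on how the custom character set was laid out.
BESPOKE_TO_UNICODE = {
    0x80: "\u2082",  # SUBSCRIPT TWO
    0x81: "\u00b2",  # SUPERSCRIPT TWO
    0x82: "\u03b1",  # GREEK SMALL LETTER ALPHA
}


def decode_bespoke(raw: bytes) -> str:
    """Decode bytes written in the bespoke set into a Unicode string."""
    out = []
    for b in raw:
        if b in BESPOKE_TO_UNICODE:
            out.append(BESPOKE_TO_UNICODE[b])
        elif b < 0x80:
            out.append(chr(b))  # plain ASCII passes through unchanged
        else:
            # Refuse to guess: every 'extra' code must be mapped explicitly.
            raise ValueError(f"unmapped byte 0x{b:02X}")
    return "".join(out)


text = decode_bespoke(b"H\x80O")     # water, with a real subscript two
utf8_bytes = text.encode("utf-8")    # what would be stored as UTF-8
```

Raising an error on unmapped bytes, rather than silently passing them through, makes it harder to lose one of the 'extra' characters during conversion.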


30-Aug-2017 11:45

Perhaps this is the wrong question, but you did not mention it: have you converted your database to Unicode?
I think that could be a solution for you.

Chris H

31-Aug-2017 06:17

Hi Bert, thanks for the quick response. No, I have not converted my databases / FMT files to UTF-8. My wish is only to change the actual character set used for data entry, moving away from my bespoke system in which several 'extra' characters have been added to the 'standard' ASCII set or have replaced unused characters in it. These are mostly sub- and superscript numbers and letters, and Greek letters. This has worked fine for all my purposes for many years.
I would now like to change the font and therefore need to add my 'extra' characters to the new character set of that font. I hoped that using some method of applying UTF-8 would let me do that as there are codes for all characters that I would need. Obviously I would need to tell Cardbox what my new font should be and edit the Collating Sequence according to the new character set with the 'extra' characters using the UTF-8 codes.
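
To see which codes the 'extra' characters already have in Unicode, Python's standard `unicodedata` module can be used. This is only an illustrative sketch: the helper name `describe` and the sample characters are assumptions, and it says nothing about how Cardbox itself stores the data.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, official Unicode name, UTF-8 bytes as hex)."""
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    utf8_hex = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return code_point, name, utf8_hex


# Sample characters of the kind mentioned above (chosen for illustration).
for ch in ["\u2082", "\u00b2", "\u03b1"]:
    print(ch, *describe(ch))
```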


31-Aug-2017 10:16

Interesting to read what Cardbox can do, even with this kind of trick.
However, if I read your question correctly, I think that by converting your database to Unicode you can use any available character in Cardbox without any change to your collating sequence. The TrueType font you use is then the only limitation for showing it on screen and in print (by the way, do not use Verdana). So you have to choose a Unicode font that supports the characters you need.
If you go this way, you can still change the collating sequence to suit your needs (searching, indexing, sorting), but big tricks are no longer necessary.
Typing the unusual characters is another issue. In the past there was a program on the Cardbox site for easily inserting Unicode characters, but I cannot find it any more. I wrote myself a program that can easily insert Unicode characters into Unicode databases. It is very useful for typing East European characters such as ş, Ş, etc.!
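
On the collating side, the idea of treating special characters as their plain 'Latin' equivalents can be illustrated with Unicode compatibility decomposition. This is a rough sketch using Python's standard library, not Cardbox's collating mechanism; the function name `collation_key` is hypothetical.

```python
import unicodedata


def collation_key(text: str) -> str:
    """Fold sub/superscripts and accented letters to plain equivalents."""
    # NFKD turns e.g. SUBSCRIPT TWO into "2" and splits accented letters
    # into a base letter followed by combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


names = ["Milošević", "Karadžić", "H\u2082O"]
print(sorted(names, key=collation_key))
```

Greek letters have no Latin decomposition, so they pass through unchanged and would still need an explicit rule if they are to sort with Latin letters.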

Charles Welling

4-Sep-2017 07:51

Talking about Cardbox, Unicode and Slavic characters: I just discovered a strange phenomenon in Cardbox. I had no trouble entering the name of Radovan Karadžić, but unfortunately I could not search it. Cardbox treated the "ž" (z with caron) as an invalid character. Other characters (r or s) with carons are accepted. Some other, more obscure characters such as U+0181 (it cannot be displayed here) are also treated as invalid in a search.
I've not been able to discover a pattern in which characters are accepted and which are refused. I even entered the "ž" in the collating table, telling Cardbox to treat it as an ordinary "Z", but that didn't make any difference.

Charles Welling

4-Sep-2017 07:56

And to illustrate this strange behaviour:

"Slobodan Milošević" can be searched.

"Radovan Karadžić" cannot.


4-Sep-2017 09:16

Interesting. In my Unicode database I had no problem searching for Radovan Karadžić when it is in a record.
Perhaps the chosen font has some effect (I do not believe so, but I did not test it).

Charles Welling

4-Sep-2017 09:54

Fonts have nothing to do with it, they only affect the display of characters. And of course I tested several Unicode fonts. But I'm glad you don't have the problem, because the cause then must be something that can be solved. Perhaps there's some difference in our collating tables that causes this. I'll keep you posted.

Charles Welling

4-Sep-2017 11:24

Found it. In the collating table there was a line starting with DELETE. I don't remember, but I think it's a leftover from Cardbox2 and now obsolete.
And look what was in the line (I added the **):

DELETE 00A4(¤) 00A8(¨) 00AF(¯) 00B2(²) 00B3(³) 00B4(´) 00B8(¸) 00B9(¹) ** 017D(Ž) 017E(ž) ** 02DC(˜) 20AC(€) 2122(™)
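
The effect of that line on the two names can be checked with a small script. This is a sketch under the assumption, suggested by the finding above, that any character listed on the DELETE line is rejected in a search. The function names are hypothetical, and the `**` markers have been left out of the copied line.

```python
import re

# The DELETE line from the collating table, without the added ** markers.
DELETE_LINE = (
    "DELETE 00A4(¤) 00A8(¨) 00AF(¯) 00B2(²) 00B3(³) 00B4(´) 00B8(¸) "
    "00B9(¹) 017D(Ž) 017E(ž) 02DC(˜) 20AC(€) 2122(™)"
)


def parse_delete_line(line: str) -> set:
    """Return the characters listed by 4-digit hex code on a DELETE line."""
    codes = re.findall(r"\b([0-9A-Fa-f]{4})\(", line)
    return {chr(int(code, 16)) for code in codes}


def blocked_chars(term: str, deleted: set) -> list:
    """List the characters of a search term that appear in the DELETE set."""
    return [c for c in term if c in deleted]


deleted = parse_delete_line(DELETE_LINE)
for name in ["Slobodan Milošević", "Radovan Karadžić"]:
    print(name, blocked_chars(name, deleted))
```

Only "ž" (U+017E) from the second name is on the list; "š" and "ć" are not, which matches the behaviour described above.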

© 2010 Cardbox Software Limited