Duplicate records on a Search
Searching on an indexed term and having duplicate records
When I do a search on a single indexed term I would expect only records with that term to be selected/filtered. However, a search will often produce multiple duplicates of the filtered records. Any suggestions as to how I can prevent this? Thanks.
Firstly I would turn on "Highlight Matches" under VIEW in the toolbar to see why you are getting these results. Then give us some idea of how you are searching and the terms you are using.
If you search a term, and Cardbox finds n records with that term, then you can be sure there are n records in your database containing the searched term.
In that case you will of course get duplicate records when searching: you added them!
To clean up duplicate records, please use Tools > Deduplicate.
I'd like to stress that Bert is right. You can NEVER find duplicate records if there are not duplicate records present in your database. It makes no difference what you search or how you search; a single record cannot be shown twice unless it has been entered twice.
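To illustrate the point, here is a toy model in Python. It is only a sketch of how an indexed search behaves in general, not a description of Cardbox's internals; the field names and the `build_index` and `search` helpers are made up for the example. Each stored record can match a term at most once, so two identical hits mean the data is stored twice.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercased word to the set of record positions containing it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for word in record["title"].lower().split():
            index[word].add(position)
    return index

def search(records, index, term):
    """Return matching records in stored order; a stored record appears at most once."""
    return [records[pos] for pos in sorted(index.get(term.lower(), set()))]

# Hypothetical data: record 1 has been entered twice.
records = [
    {"id": 1, "title": "Birds of Great Britain"},
    {"id": 2, "title": "Garden plants"},
    {"id": 1, "title": "Birds of Great Britain"},  # same data stored a second time
]
index = build_index(records)
hits = search(records, index, "birds")
```

Here `hits` holds two entries because the record really is stored at two positions; the search itself never repeats a position.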
There are two options to prevent the use of "save as new" or "duplicate record", which obviously is another cause of duplicate records.
Edit the native format, choose Tools > Toolbar > Editing Records.
1. Choose the toolbar button for "Save As New" and change its action to "do nothing".
2. Alternatively, you can assign a macro to the toolbar button. This is the better option, as you can make the macro issue a warning and offer you the option to save the record in the normal way. Just "do nothing" may be confusing.
Changing the behaviour of the Save and Save As New buttons will also change the behaviour of the corresponding menu items. This is an undocumented feature.
The above will make sure that you will never use Save As New again.
The usual way to use Duplicate Record is to press CTRL-D. Again, edit the native format and use Tools > Keyboard and add a new keystroke. In this case add CTRL-D which will override the built-in action of CTRL-D. Make it either "do nothing" or trigger another macro.
There is a way to completely prevent the use of Duplicate Records, but explaining that would make this post a bit long and perhaps complicated. Please let me know if you want to know how to do that.
If you need help with writing the above mentioned (very simple) macros, please add a post to the macros section.
To Bert, Charles and Jelly,
Thanks for taking the trouble to reply to my problem. Perhaps I should give a little more detail, as the suggestions don't seem to have fixed the problem I am having (it's almost certainly me, not Cardbox!). As I'm editing a journal I've set up the data base so that there are a number of fields I can index and search on. These include a unique number for the manuscript (entered manually), the author's name, the Decision and the title. If I then want to check that a new author has taken account of material we have previously published I first do a search on the Accept field. This reduces the number of records to about 10,000. If I then search on the relevant keywords in the Title field this throws up another search level. What I would then expect is a set of records containing the keywords but not duplicated (I save revised records as 'Save', never through 'Save as New'). However, what I am presented with are the records I want but also duplicates of titles rather than a unique set of titles.
If I now click on 'Deduplicate' and identify the indexed field 'Title ', I have the message 'There are no records in the current selection.'
So I am left having to check records manually to ensure that I haven't accidentally referred to the same record twice (even though it seems to appear only once in the data base as it is linked to a unique number).
I hope that's clear? I quite see that a single record cannot be shown twice unless it's been entered twice, but I don't seem to be entering it twice, yet it's showing up more than once. I inherited this data base from the previous editor, so it's quite likely that there's something going on in the background that I'm unaware of!
Thanks again for your suggestions.
You get the records you selected with your search commands.
If that is what you call "duplicates" (and as far as I understand, that is what you are trying to explain?), then these are not duplicates. It is simply a collection of 100 records that contain the same information in one field (in this example).
If you want to eliminate 100x Wall Street Journal in reports, the best you can do is to develop a script for this (as in many other databases).
But perhaps I do not yet understand precisely what you mean. In Cardbox there are no things going on in the background. You enter search terms, and you get what you asked for.
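As a rough idea of what such a script could look like, here is a sketch in Python rather than in Cardbox's own macro language. It assumes the selected records have already been exported and parsed into dictionaries; the field name `title` and the function `unique_titles` are hypothetical.

```python
from collections import Counter

def unique_titles(selection, field="title"):
    """List each distinct value once, in first-seen order, with its record count."""
    counts = Counter(record[field] for record in selection)
    # Counter keeps insertion order (Python 3.7+), so first-seen order is preserved.
    return list(counts.items())

# Hypothetical selection: many records sharing the same value in one field.
selection = [
    {"id": 10, "title": "Wall Street Journal"},
    {"id": 11, "title": "The Times"},
    {"id": 12, "title": "Wall Street Journal"},
    {"id": 13, "title": "Wall Street Journal"},
]
report = unique_titles(selection)
```

The records themselves stay untouched; only the report collapses repeated values into one line each.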
Peter, it's still a mystery to me.
"Birds of Great Britain."
A search for "birds" would return these three titles, but these titles are different. No question of duplicates.
Could you perhaps give us an example of what you call duplicate records? Two will do.
Dear Bert and Charles,
By 'duplicates' I mean that exactly the same record appears more than once, even though it exists on the data base just once. So to use Charles' example:
If I select in the Decision field the indexed term 'Accept' this filters out all but the records that have been accepted (and the display identifies that we have moved from Level 0 to Level 1).
If you want to select "Birds of Great Britain"
If a data search is too slow, then you can first select "Birds of Great Britain" by index search, and then perform the data search.
The same record is never presented more than once. Cardbox keeps it in your selection at any level if your search told it to. So I do NOT understand why this is frustrating. I am glad of this feature!
Please try this:
Do you still have 4 records?
To be on the safe side, export the contents of these records to a file. Use the internal format.
Delete one of the apparent duplicates. Then make the original selection once more.
After this, you might try another deduplication, but use the unique ID for deduplication, not any keywords. Don't forget to sort the database on the field "ID" before you start the deduplication.
Thanks for the suggestion. I did as you suggested: I did a search on a term which produced at level 1 an exact copy of the record, i.e. there were two identical records at level 1. I then deleted one of these and returned to level 0, but the record was missing (I reloaded it by clicking 'undo deletion'), so it looks as if there are no actual duplicates at level 0. And yet when I do a search they appear.
Extreme rebuild your database!
It seems as if we could go on searching for duplicates indefinitely, but at this stage I would advise you to cure this the hard way: pull the plug.
Here we go:
Be at level 0 and export your database to a file using the internal format (*.dmp).
This is an entirely safe procedure as you will keep a copy of the old database.
I was worried that you might suggest this eventually! Thanks for such clear instructions though. I've exported parts of the data base before (I have it on 2 PCs, so need to keep the second system up to date with the master one) without any trouble, but last night when trying to export the whole data base it kept freezing about half way through with an error message (I've the details on its logged report but won't bother you with them). Today I'll export it in blocks and then carry on with your suggestions and let you know what develops.
Thanks again for your trouble.
Use Extreme Rebuild - there you can choose to leave your damaged records blank. That makes it easy to find the damaged positions later.
The freezing may indicate that there is something seriously wrong with your database. Exporting it in blocks may or may not work, but there's no harm in that.
If I were you, I'd do another export in the external format (*.ext). It will give you a file which you can read with Notepad, Wordpad or Word and it will enable you to retrieve any data by copying it from this file.
When you do so, you'll have two files (*.dmp and *.ext). The DMP is the best file to use for an import, as it contains all the indexing information and even images if you have them. The EXT will contain your data in a form that can be easily read by you.
Then follow Bert's advice to rebuild your database, which is something you should do on a regular basis anyway. Try your two exports first, then rebuild your database and see if Cardbox reports any corrupted records. Tell Cardbox to keep corrupted records as blank records. The place where any blank records occur will give you an indication of which records were deleted. You may retrieve them from the external export file.
After having rebuilt your database, do another export, as above. Keep the export files separate.
If the rebuilding process did not report any errors and the second export went smoothly, then I'd use the internal file from the second export to import your records.
If the internal files for some reason cannot be read: use the external files. Any errors in these files can be corrected by hand.
And last but not least: you said you have your database on two PCs. If these PCs are connected, please consider installing the Cardbox server. It will save you the trouble of keeping both versions up-to-date. There's always a risk in keeping two versions that have to be synchronised.
Dear Bert and Charles,
It wasn't allowing me even to export a single file, let alone blocks, but I did as Bert has now suggested and the Extreme Rebuild of the data base produced no corrupted records and the system reported no problems, but interestingly it then did allow me to export the data to a .dmp file. Unfortunately it then repeats its 'Cardbox has encountered a fatal error and has had to close' message. So Charles is obviously right that there is something badly wrong, but perhaps it is with the format file?
However, I was then able to continue following Charles' advice to restart Cardbox and create a new database by first loading the renamed format file and then load the .dmp file. At this point the reloading of the .dmp failed and Cardbox repeated its error message.
I don't know if it's of any relevance but I run two databases in Cardbox for the journal. Database 1 captures information from authors, and this is the one causing the problems I've been sharing with you. Database 2 captures information about the academics who review and report on the authors' papers. This second database is rock solid, with no 'duplications', error messages, problems with exporting data etc. Obviously it has a different format file to the one causing problems.
Bearing in mind that I have a backup (in fact more than one!) on the master PC and having been working on the second PC's copy of Cardbox I wonder if, rather than do as Charles now suggests, I should delete the whole of Cardbox from my second system, reload it and then copy the format file from the first system and its data base? Pretty drastic! The only problem I can see with this approach is that by copying the format file and data base from PC 1 to PC2 I'll still have my problem of 'duplicates' in levels 1, 2.
I quite take your point, Charles, about having the same database on two systems. PC1 is the master which the administrator works on, PC2 the slave in that I often have to access the data base when I'm away from the office (to identify reviewers, check titles, etc.), but I never input data into the slave, just read it. We update the slave every week, sometimes more often.
There are still some options.
Did you export to a file in the external format? If so, you could browse through this file and look for errors. It will show you the records in much the same way as in Cardbox itself and any corruptions will be clearly visible.
You could also use this file to reload your database.
A second option is to create a new format file.
If you are going to read an external file (*.ext), then you should use the same field names as in the original database. An *.ext file contains the field names and Cardbox will read the *.ext and match the field names to the corresponding fields in your new format file.
If you want to read your *.dmp, you should make a new format file and give it some extra fields. Don't give the fields any meaningful names yet.
After you've read the *.dmp successfully (I hope), have a look at the data. E.g. field NONAME1 may contain your unique ID. Rename the fields to whatever names they should have and delete any unused (empty) fields. This is pretty drastic too, but it will eliminate any corruption from the old files.
Restoring your database from another PC is also an option, but don't forget that whatever went wrong, may already be present in those copies.
Did you also try "Rebuild Format"?
Dear Bert and Charles,
Sorry for the delay in getting back to you (have been away working).
Bert, I'm running Cardbox version 3.0 Professional Edition.
I thought I'd start again from the beginning, so checked that I was still getting 'duplicates' at Level 1 and following (which I am). I then exported the database again as a *.dmp file (which generates a Fatal Error message and shuts down Cardbox - do you think this error occurs when Cardbox meets a corrupted record, as it hasn't happened before when I update the slave system?), renamed the .fil and then restarted Cardbox. When I clicked File/New Database I wasn't shown the database file, but had to hunt for it and double-click it to open it. Everything loaded OK and when I ran a search the same 'duplication' problem occurred.
I ran Rebuild and also the Extreme Rebuild on the database, but no damaged records were identified.
I then exported in external format and read the files in Notepad. None appear to be corrupted, but to my surprise there are a number that seem to be duplicated or even triplicated. So I went back to the master database (the one on my master PC) and, although I've done nothing on it recently, there appear to be four sets of records (they are not an exact copy of the data base, but some appear more than once). Yet when I first contacted you I'd run a check for duplicate records and there were none on level 0 (unless I did a search and moved up levels).
So what I'm proposing to do now is:
What do you think?
By the way, when I've moved records from the master to the slave PC (say record 4000 to the end of the data base) I tag record 4000 and then click Export (via .dmp). I NEVER move from the slave to the master database so I am really puzzled as to why there should be duplicates now at Level 0 on the Master database.
I really am curious about all this.
If you mail me at 001meworldmail.nl, I would really like to help to try to solve your problem and find what caused it (free).
Well, this is quite a mess. The good part is that there's no mystery: there are duplicates as we suspected all along.
I'd leave both the master and the slave database alone. Make a new copy of the master and clean that copy up. I'm convinced you can use Deduplicate if you use it the right way. That means that you will have to sort your database in such a way that the duplicates will be adjacent. Then use the Deduplicate command and tell Cardbox what to look for. Read the manual if you need to.
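Why the sorting matters can be shown with a small Python sketch. It assumes, going by the advice above, that deduplication only compares each record with the one directly before it; that is an assumption for illustration, not a statement about how Cardbox implements the command. The function name and the `id` field are hypothetical.

```python
from operator import itemgetter

def drop_adjacent_duplicates(records, key="id"):
    """Keep a record unless it has the same key as the record kept just before it."""
    kept = []
    for record in records:
        if kept and kept[-1][key] == record[key]:
            continue  # same key as its neighbour: treat as a duplicate
        kept.append(record)
    return kept

# Hypothetical database in entry order: IDs 1 and 2 each occur twice, but not side by side.
records = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}, {"id": 2}]

unsorted_result = drop_adjacent_duplicates(records)
sorted_result = drop_adjacent_duplicates(sorted(records, key=itemgetter("id")))
```

Run on the unsorted list, nothing is removed because no duplicate sits next to its twin; after sorting on the unique ID, each ID survives exactly once.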
If this works, rebuild the clean copy because many records will have been deleted. Check it, check it once more and copy it back to the original master/slave. Keep a backup of those for a while just to make sure.
And try to figure out where these duplicates come from. Have a look at the format file. Your predecessor may have changed the keystroke CTRL-S to execute the command "Save As New" instead of "Save". That's just one of the possibilities.
Dear Charles and Bert,
Many thanks for working through this issue with me. I've now a clean data base (and in fact at my wife's suggestion have archived the full data base, but now keep current only records starting from the year 2000, a much smaller total of 1,500). My only worry now is what on earth created this problem in the first instance. I'll have a good look at the format file as you suggest, but my first check seems to show that nothing has been altered.
Bert, it is very good of you to offer to have a look at the data base. However, much of the material is confidential, so I cannot let you see the data. However, I could create a dummy data record or two and let you have that and its format file if you like.
Many thanks again for your time and trouble.
Me again, I'm afraid. The good news is that I have a clean data base on one of my two PCs. A further bit of good news is that I have identified how the duplicates were appearing on my slave PC. What happens is that I work on the data base on the master PC (let's say 20 records are updated, with a data base of 2,000) and so then I need to update the slave's data base.
I do this by deleting on the slave all records from 1,789 (to avoid duplication). I then move to the master data base, tag record 1,800, and then expect to be able to export in internal format 20 records. What in fact happens is the whole data base of 2,000 records is saved for exporting! THIS is the reason for the cursed duplicates I had problems with before.
So why is Cardbox not allowing me to save and then export only the 20 records I've updated?
I think that you forget to take the last necessary step, which is to select the tagged records. When you tag all records from 1800 up, your selection will still contain ALL records until you use Select > Tags.
Repeat your procedure and take a look at the number of records in your selection. Is this 20 or 2000?
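The difference between tagging and selecting can be pictured with a toy model in Python. This is only a sketch: the record numbers are illustrative (the last 20 of 2,000), and `export` simply stands in for "export whatever is in the current selection".

```python
def export(selection):
    """Stand-in for an export: it writes out the current selection, whatever that is."""
    return list(selection)

# Hypothetical master database of 2,000 records; the last 20 are tagged.
master = [{"id": n} for n in range(1, 2001)]
tagged = {r["id"] for r in master if r["id"] >= 1981}

# Tagging alone does not narrow the selection, so the export still holds everything.
selection = master
exported_without_select = export(selection)

# After selecting the tagged records, only those 20 are exported.
selection = [r for r in master if r["id"] in tagged]
exported_with_select = export(selection)

# The slave has had its last 20 records deleted before the import.
slave = [{"id": n} for n in range(1, 1981)]
slave_wrong = slave + exported_without_select  # 1,980 records now exist twice
slave_right = slave + exported_with_select     # every record exists once
```

Checking the size of the selection before exporting (20 versus 2,000) is the quickest way to catch the mistake.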
But as the Cardbox server is now free, why not install the server? It will make this whole procedure obsolete.
Thanks again for your advice, Charles. Yes, I forgot that step (and the maths was wrong too!) - age is catching up on me. I guess I'll have to think about the server now, especially as the database is as it should be.
By the way, I have the default setting for ordering each record (i.e. the most recent addition appears as the latest record). I also give each record a unique number which is indexed, and as you would expect the most recent record has the highest number (currently 4143). I cannot see from the manual how I might re-order some records so that they appear earlier than they currently do. That is, how would I move record 1300, for example, so that it is positioned within the data base as the 500th record, but retaining its unique number?
Many thanks again for your advice.
Moving a record to another position can be done.
Of course, you do not want to do this more than once.
Thank-you, Bert. Yes, I don't want to do this too often, but what you suggest worked perfectly.
Many thanks again. Have a good weekend.
© 2010 Cardbox Software Limited