Reverse Duplicate

Brian Element

Hi Sri,

How about trying something like this.

First, make sure you remove all the exact duplicates. Start by adding a virtual field called REC_NO (the record number) using the @precno function; this way, when we are doing the analysis, we can trace each record back to the original file.

Then perform Duplicate Key Detection using the ID and ID1 fields as the key.

In your example it will show that records 15, 17, 18 and 20 are duplicates and should be removed from the file.

You can remove them by doing a join with the main file as the primary and the duplicates file as the secondary. The match is on the record number, and you use the "Records with no secondary match" option.

So now we have a file with no duplicates.
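For anyone following the logic outside IDEA, this first stage amounts to something like the rough Python sketch below (just an illustration of the IDEA steps, not the actual tool; the CSV file name and field names are placeholders, not from your attachment):

```python
import csv

# Load the data; "transactions.csv" is a placeholder file name.
with open("transactions.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Equivalent of the @precno virtual field: tag each record with its
# record number so results can be traced back to the original file.
for recno, row in enumerate(rows, start=1):
    row["REC_NO"] = recno

# Duplicate Key Detection on (ID, ID1): later occurrences of a key
# already seen are the duplicate record numbers.
seen, dup_recnos = set(), set()
for row in rows:
    key = (row["ID"], row["ID1"])
    if key in seen:
        dup_recnos.add(row["REC_NO"])
    else:
        seen.add(key)

# Join on REC_NO with "Records with no secondary match": keep only
# the records whose number is not in the duplicates file.
deduped = [row for row in rows if row["REC_NO"] not in dup_recnos]
```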

Now comes the fun part. In the end I decided to create a virtual field, ID_ADD, adding the ID and ID1 fields together.

This gives a key that will be the same whenever one record's ID + ID1 equals another record's ID + ID1, regardless of the order of the two IDs.

I then did a summarization by ID_ADD; in the Fields section I selected REC_NO to be included and used the "Use fields from last occurrence" option. This gives me the record number of the second transaction whenever there is a match.

So the results show the REC_NO values we must remove from the main file.

In this case they are all duplicates; if there were any non-duplicates, we would have to do one additional step, extracting NO_OF_RECS = 2. We then join this file back to the main file using REC_NO as the match key and "Records with no secondary match" as the match option, and hopefully you will now have a file with no duplicates.
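Continuing the same rough sketch, this summarization stage would look something like the following (again only an illustration of the IDEA steps, with `deduped` being the output of the first stage above):

```python
# Virtual field ID_ADD: the two IDs added together. A reverse pair
# (A, B) / (B, A) gives the same sum -- though, as discussed further
# down the thread, two unrelated pairs can collide on the same total.
groups = {}
for row in deduped:
    id_add = int(row["ID"]) + int(row["ID1"])
    groups.setdefault(id_add, []).append(row["REC_NO"])

# Summarization by ID_ADD with "Use fields from last occurrence",
# extracting NO_OF_RECS = 2: the last REC_NO in each two-record
# group is the second transaction of the pair.
to_remove = {recnos[-1] for recnos in groups.values() if len(recnos) == 2}

# Join back to the main file on REC_NO, "Records with no secondary match".
final = [row for row in deduped if row["REC_NO"] not in to_remove]
```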

Let me know how this works out.

Brian

srivibish

Hi Brian,
I thought about this, but I am afraid that two different pairs of numbers might, by chance, add up to the same total. Considering the size of the database, we can't trace those back either.
Sri

Brian Element

I am thinking this is the way to go, but I understand the possible problem: the sums could be duplicated. So why not use an algorithm to change the numbers into something more likely to be unique? My suggestion would be to take the first ID, multiply each digit of that ID by another number, and concatenate the results together. In this example I used the ASCII code of the digit as the multiplier. So if you have the following:

ID 21771 ID1 21318

would be

(2 * 50) & (1 * 49) & (7 * 55) & (7 * 55) & (1 * 49) =

100 & 49 & 385 & 385 & 49 =

So your new ID would be 1004938538549; you do the same for ID1 and then add the two converted values together, so the combined ID would really be unique. Do you think that would work? If not, I think it comes down to creating a script that takes the first row, looks through the rest of the file for a match, saves the matching record number to another file, then does the second row, and so on. You then use that file in a join to remove all the duplicates.
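To make the arithmetic concrete, here is a small Python sketch of the idea (just an illustration; the function name is made up):

```python
def convert_id(value):
    """Multiply each digit by the ASCII code of that digit and
    concatenate the products, e.g. 21771 -> 1004938538549
    (2*50, 1*49, 7*55, 7*55, 1*49 joined together)."""
    return int("".join(str(int(d) * ord(d)) for d in str(value)))

# The worked example from above: convert both IDs, then add them.
combined = convert_id(21771) + convert_id(21318)
```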

srivibish

I think this would be a good way to overcome the same-total issue. But could you please tell me how to convert and multiply each digit of the ID by its ASCII code? Is there any function I could use for this?

Brian Element

Hi Sri,

I started to do this in the Equation Editor, but it would have been pretty messy, so I created a custom function for you. I have attached it, so try it out. It is called SRI_Convert.

Brian

srivibish

Hi Brian,
Thanks for the custom function. Unfortunately, after adding the converted IDs, I could see a lot of pairs ending up with the same total. I doubt we have any other options open to solve this problem.
Regards,
Sri

Brian Element

I think the final option is to create a script.  When I have a bit of time I will put something together.  Any chance you can give me a larger file to work with?

Thanks

Brian

srivibish

Sure. One thing: apart from the fields given in the attachment, there will be some other fields in my database. Will that affect the script by any chance?
Regards,
Sri

Brian Element

Thanks for the additional data.  No, I will copy over all the information from the records that are kept.

Brian Element

Hi Sri,

OK, you'll want to try this script out. What it does is display a dialog in which you select your file and the two ID columns. The script then adds a recno field to the main file and goes through the file, extracting the record numbers of duplicates to a new file. Once this is completed, it joins the main file with the duplicate record number file and removes all those record numbers.

Some limitations: I did this script fairly quickly, so there is not much in the way of error checking. It also assumes right now that the ID columns are numeric; if they are not, that can easily be fixed, but as it stands it will give an error. Also, depending on how large a file you have and the power of your computer, it might take some time to process.
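The attached script itself is IDEAScript; as a rough Python sketch of the described pass (one way to implement it, not the attached code, and using an order-insensitive key rather than a row-by-row search so the same-total problem cannot recur):

```python
import csv

def reverse_duplicate_recnos(rows, id_field="ID", id1_field="ID1"):
    """Return the record numbers of rows whose ID pair has already
    been seen in either order; the first occurrence is kept."""
    seen, to_remove = set(), []
    for recno, row in enumerate(rows, start=1):  # the added recno field
        # Sorting the pair makes (A, B) and (B, A) the same key,
        # with none of the collisions a simple ID + ID1 sum allows.
        key = tuple(sorted((row[id_field], row[id1_field])))
        if key in seen:
            to_remove.append(recno)
        else:
            seen.add(key)
    return to_remove

with open("transactions.csv", newline="") as f:  # placeholder file name
    rows = list(csv.DictReader(f))

dups = set(reverse_duplicate_recnos(rows))
# All other fields are carried over on the records that are kept.
kept = [row for i, row in enumerate(rows, start=1) if i not in dups]
print(f"Removed {len(dups)} reverse duplicates; kept {len(kept)} records")
```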

So let me know how it works for you.

Brian
