Reverse Duplicate

23 posts / 0 new
Last post
srivibish
Offline
Joined: 03/04/2017 - 03:27
Reverse Duplicate

Hi,
I have two field say column A and E with ID. The ID's were sort of duplicate like
In 1 st row Column A have 12345 and column E have 34344
in 2 nd row Column A have 34344 and Column E have 12345

Both the rows has same information but the ID alone reversed in the 1st and second row. I need to retain the first and remove the second row in the output. Can anyone help me to achieve this in IDEA?

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi srivibish,

I have a few ideas but before I go further can you give me a larger example of your file, can you give me 10 rows of before and what it should look like after.  Also add in records that don't have a problem so I can see the difference.  If you want to create a text or an Excel file with the info you can attach it to your reponse.

Thanks

Brian

srivibish
Offline
Joined: 03/04/2017 - 03:27

Hi Brian,
Thanks for responding. I have attached the excel worbook with sheet 1 having the sample data and sheet 2 with output after removing duplicate.
In that ID and ID1 are the two columns with reverse duplicate. Apart from reverse duplicate there is also one scenerio which has same ID and ID1 in the successive row which is also deleted in the output.
Thanks
Sri

Files: 
Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi Sri,

Thanks for the data, I will have a look at it tonight and get back to you or let you know if I have any other questions.

Brian

srivibish
Offline
Joined: 03/04/2017 - 03:27

Thank you for the response. Kindly let me know if you have any questions.

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi Sri,

Sorry for not getting back to you sooner, I have been doing some training the last few days so I have been pretty busy. Just a few questions on your spreadsheet. Are the duplicate entries also the next transactions in the database? In your example record 1 and record 2 are copies so you remove record 2, record 3 and 4 are copies so you remove record 4 and so on, is this how it is in the actual database or this is how it just turned out for your example?

Also for record 5 and 6 even though we are removing record 6 the Identifier is different along with the Concat field, is this still considered a duplicate?

Thanks

Brian

srivibish
Offline
Joined: 03/04/2017 - 03:27

Hi Brian,
Please accept my apologies for late response as I was traveling yesterday. Yes, as you mentioned correctly, i.e from records one and two, record two will be deleted which follows for other duplicate cases as well.
 
Regarding the 5 and 6, I am afraid that it was a typo error. For all the duplicates the concat field will be same. 
 
Regards,
Sri

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi Sri,

No apologies necessary, we all have lives outside of IDEA :-)

Ok, this is what I came up with.  This assumes that the duplicates are always grouped together, if this is not true I will have to think of something else.  I am using your file for the example.

I first created a new virtual field called Remove, this I am placing a Yes or No to indicate if the row should be removed. 

The equation I used is the following:

@if(@Recno()=1, "No", @if(ID = @GetPreviousValue("ID1") .AND. ID1 = @GetPreviousValue("ID"), "Yes", "No"))

and this is how it looks in the equation editor.

So I first check if it is the first record, if it is I automatically yes that I want to keep that record.  I then use the @getPrevoiusValue functions to check to see if ID is equal to ID1 (from previous records) and ID1 is equal to ID (from previous record), if they are then I say Yes indicating I want to remove this transaction, if not I put a No.

I then just do a direct extract using Remove == "No" to get the non-duplicates.

Hopefully this makes sense and works for your scenario.  If not let me know and we will keep working on it.

Brian

srivibish
Offline
Joined: 03/04/2017 - 03:27

Hi Brian,
This solves a part of the problem. But the catch is I have some instances where duplicate instances are not placed successively i.e., the duplicate pair placed after 10 rows.
Please refer the Data2 sheet in the attached excel. I have included the three fields alone namely ID,ID1 and Concat field in that sheet.

Files: 
Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi Sri,

Well I had hoped that wasn't the case but it just makes it more of a challenge lol.  Next question, I notice that you can have duplicate ID numbers like in records 18 and 20 where we have two ID 27860 with the second ID being 21637, is that possible in the main database?  Or are they also considered duplicates.  If they are duplicates I have a solution using joins.  So let me know.

Brian

srivibish
Offline
Joined: 03/04/2017 - 03:27

Hi Brian,

  • Yes, records 18 and 20 will be considered as duplicate. However, I have removed that in my main database using concat of ID. So in my final database the data falls under two scenerios. First one, where the reverse duplicate falls on the successive lines which can be solved based on your equation.The second scenerio which I have mentioned in Data2 sheet where the reverse duplicate doesn't fall in successive lines.

Regards,
Sri

Pages