Tuesday, May 28, 2013

Deleting duplicates in the DB2 with the IBM Control Center

Sometimes you have to do just a small task on the database and if you don't have/want to use a command line, you can use an interface like the Control Center.

If you have tables with duplicate values in the DB2 and the tables have primary keys, let say ID, it is easy to avoid the duplicates.

First copy the table. Use the

create table message_table_copy ...

query and check that all fields you want to copy have the same types. 

Than you can load the data into the message_table_copy. The table from which you are coping and the table to which you are coping, in this case  message_table and message_table_copy should belong to the same schema. Choose Load (right-click menu) -> Add data -> from cursor. Than you can choose the data you want to copy (Select * from message_table if you copy the whole table) and the folder where the status information have to be stored (usually on the server, so choose the home directory of your db2 instance on the server). All other tabs are default except of the tab where you have to choose to perform the task immediately or to add in the task manager.

Now you can check if all values were copied. You can just count the number of entries in both tables:

select count(*) from message_table 

If everything is ok, look for duplicates. Let say, you have message_id which should be used to define duplicates. Here is the SQL query to delete the duplicates from the message_table

delete from message_table where id in 
          (SELECT MIN(ID) FROM message_table
                          GROUP BY message_id 
                          HAVING COUNT(message_id) >1) )

The query searches for entries with same message_id (INT, SMALLINT, BIGINT) and chooses the id ( you can choose between MAX(id)/MIN(id)) that is used for deleting.