Skip to main content

Identify and Delete duplicate Records in SQL Server

     
Scenario: I have accumulated data into SQL Server database from three different servers (oracle, O2 and oracle) for a financial reporting. After data accumulated , we found more duplicate rows; business requirement was to remove all duplicate rows because of final data will be import to another system.


For this purpose, I want to write script to find out duplicate rows from my table. Finally, I want to delete only duplicate rows not original row.

Original row means if i have total 2 rows those same or duplicate, so I want to delete only one row that is duplicate.

Business Web Hosting


Let’s go............................

Step 1: Create a student Table below Script:

CREATE TABLE [Students](
    [ID] [int] NULL,
    [Name] [varchar](50) NULL,
    [address] [varchar](50) NULL,
    [contact] [varchar](50) NULL,
    [email] [varchar](50) NULL
)

Step 2: insert Record that shown below:
 
Insert Duplicate Records
  
Step 3: Selecting Duplicate Rows

WITH CTE AS
(
    SELECT ROW_NUMBER()
          OVER
          (
                  PARTITION BY [ID] , [Name],[address],[contact],[email]
                        ORDER BY ( [ID])
          ) AS RowID,
          * FROM Students
)

 SELECT * FROM CTE
        WHERE [ID] IN ( SELECT [ID] FROM CTE WHERE RowID >1 ) 


Identify Duplicate Records

Step 4: Deleting Duplicate Rows

WITH CTE AS
(
    SELECT    ROW_NUMBER()
    OVER
    (
            PARTITION BY [ID] , [Name],[address],[contact],[email]
            ORDER BY ( [ID])
    ) AS RowID,
    * FROM Students
)

Delete From CTE where RowID Not In
(
    SELECT MIN(RowID)
        FROM CTE
        WHERE [ID] IN ( SELECT [ID] FROM CTE where RowID >1)  
)

Result: 2 rows Deleted. 

 Again select duplicate rows by Step-3. Display No records.


Note: I will discuss ASAP about major keywords are WITH , ROW_NUMBER() and PARTITION BY in another post.

Comments

Unknown said…
Why not simply this? Only one scan of the data...

WITH CTE AS
(
SELECT ROW_NUMBER()
OVER(PARTITION BY [ID] , [Name],[address],[contact],[email] ORDER BY [ID]) AS RowID
FROM Students
)
DELETE
FROM CTE
WHERE [RowID] >= 2;

Popular posts from this blog

What are the difference between DDL, DML , DCL and TCL commands?

Database Command Types DDL Data Definition Language  (DDL) statements are used to define the database structure or schema.  -           CREATE - to create objects in the database -           ALTER - alters the structure of the database -           DROP - delete objects from the database -           TRUNCATE - remove all records from a table, including all spaces allocated for the records are removed -           COMMENT - add comments to the data dictionary -           RENAME - rename an object DML Data Manipulation Language  (DML) statements are used for managing data in database. DML commands are not auto-committed. It means changes made by DML command are not permanent to d...

Display Image Immediately from fileUpload Control

Client Side Programming is more efficient for user to very first interection Like Jquery . Now i provide some jquery script to display Image from FileUpload Control directly.. For this purpose we use reference . <script src="../../../JS/JQueryAutocomplete/jquery-1.7.2.js" type="text/javascript"></script>. You just download from jquery.com. Now code below... <script language="javascript" type="text/javascript"> /* Write FileUpload Change event .. */ $('#FileUpload1').bind('change', function() {                                 displayImage(this);                    }); //-------------------------// function displayImage(input)             {        ...

How to check dotnet SDK and runtime version installed

I need to install dot net SDK 5 but first, I need to know which versions of SDK installed on my system. So, use   dotnet --info .  This command will display the SDKs and also runtimes installed on your system with path together where they are installed. For example on my Windows 10 development machine,  dotnet --info   Microsoft Windows [Version 10.0.18363.1082] (c) 2019 Microsoft Corporation. All rights reserved. C:\WINDOWS\system32>dotnet --info .NET Core SDK (reflecting any global.json):  Version:   3.1.101  Commit:    b377529961 Runtime Environment:  OS Name:     Windows  OS Version:  10.0.18363  OS Platform: Windows  RID:         win10-x64  Base Path:   C:\Program Files\dotnet\sdk\3.1.101\ Host (useful for support):   Version: 3.1.8   Commit:  9c1330dedd .NET Core SDKs installed:   1.0.0-preview2-003131 [C:\Program Files\dotnet\sdk...