Scenario: I have accumulated data into SQL
Server database from three different servers (oracle, O2 and oracle) for a
financial reporting. After data accumulated , we found more duplicate rows;
business requirement was to remove all duplicate rows because of final data will
be import to another system.
For this purpose, I want to write script to
find out duplicate rows from my table. Finally, I want to delete only duplicate
rows not original row.
Original row means if i have total 2
rows those same or duplicate, so I want to delete only one row that is
duplicate.
Let’s go............................
Step 1: Create a student Table below Script:
CREATE TABLE [Students](
    [ID] [int] NULL,
    [Name] [varchar](50) NULL,
    [address] [varchar](50) NULL,
    [contact] [varchar](50) NULL,
    [email] [varchar](50) NULL
) 
Step 2: insert
Record that shown below:
Step 3: Selecting Duplicate Rows
WITH CTE AS
(
    SELECT ROW_NUMBER() 
          OVER
          (
                  PARTITION BY [ID] , [Name],[address],[contact],[email] 
                        ORDER BY ( [ID])
          ) AS RowID,
          * FROM Students
)
 SELECT * FROM CTE
        WHERE [ID] IN ( SELECT [ID] FROM CTE WHERE RowID >1 ) 
|  | 
| Identify Duplicate Records | 
Step 4:
Deleting Duplicate Rows
WITH CTE AS
(
    SELECT    ROW_NUMBER() 
    OVER
    (
            PARTITION BY [ID] , [Name],[address],[contact],[email] 
            ORDER BY ( [ID])
    ) AS RowID,
    * FROM Students
)
Delete From CTE where RowID Not In
(
    SELECT MIN(RowID)
        FROM
CTE
        WHERE
[ID] IN ( SELECT [ID] FROM CTE where RowID >1)   
)
Result: 2 rows Deleted. 
 Again select duplicate rows by Step-3. Display
No records.
Note: I will discuss ASAP about major keywords
are WITH , ROW_NUMBER()
and PARTITION BY in another post.


Comments
WITH CTE AS
(
SELECT ROW_NUMBER()
OVER(PARTITION BY [ID] , [Name],[address],[contact],[email] ORDER BY [ID]) AS RowID
FROM Students
)
DELETE
FROM CTE
WHERE [RowID] >= 2;
Post a Comment