Skip to main content

What is an Index? Explain Custered Index and Non-Clustered Index

Index

Index is a database object, which can be created on one or more columns (16 Max column combination). When creating the index will read the column(s) and forms a relevant data structure to minimize the number of data comparisons. The index will improve the performance of data retrieval and adds some overhead on data modification such as create, delete and modify. So it depends on how much data retrieval can be performed on table versus how much of DML (Insert, Delete and Update) operations.

clustered index

A clustered index is something that reorganizes the way records in the table are physically stored. Therefore a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages, by which I mean the key-value pair in the clustered index has the index key and the actual data value. Also remember, a clustered index will be created on a table by default the moment a primary key is created on the table. A clustered index is something like your train ticket B4/24, you know that you need to board coach B4 and sit on seat number 24. So this index physically leads you to your actual seat.


You should be using indexes to help SQL server performance. Usually that implies that columns that are used to find rows in a table are indexed.Clustered indexes makes SQL server order the rows on disk according to the index order. This implies that if you access data in the order of a clustered index, then the data will be present on disk in the correct order. However if the column(s) that have a clustered index is frequently changed, then the row(s) will move around on disk, causing overhead - which generally is not a good idea.
Now let’s build a clustered index on the ID column of the Sales table.


CREATE CLUSTERED INDEX ClusteredIndexNake ON TABLE(COLUMN); 
CREATE CLUSTERED INDEX CL_ID ON SALES(ID); 
 
Some things to remember when using clustered indexes:
  • The reordering of the index occurs every time the index changes (ie: on Updates, Inserts, Deletes).
  • Affects the physical order of data so there can only one clustered index.
  • Keeps the rows in order within a page (8k) of data. The pages are not physically ordered except after an ordered load or re-index based on that cluster.
  • Re-orders the way records in the table are physically stored.
  • Choose this index wisely as there can only be one. Rule of thumb: Apply to a unique, somewhat ordered, and commonly queried column.
  • Like the pages of content in a book. Each page is a collection of data. The order (page numbers 1, 2, 3, etc.) that the data is stored in is controlled by the clustered index.
There a few things to keep in mind when changing the default clustered index in a table:
  1. Lookups from non-clustered indexes must look up the query pointer in the clustered index to get the pointer to the actual data records instead of going directly to the data on disk (usually this performance hit is negligble).
  2. Inserts will be slower because the insert must be added in the exact right place in the clustered index. (NOTE: This does not re-order the data pages. It just inserts the record in the correct order in the page that it corresponds to. Data pages are stored as doubly-linked lists so each page is pointed to by the previous and next. Therefore, it is not important to reorder the pages, just their pointers and that is only in the case where the newly inserted row causes a new data page to be created.)
Best Practices for Clustered Indexes
  • Large amount of selects on a table, create a clustered index on the primary key of the table. Then create non-clustered indexes for all other columns used in selects and searches. Put non-clustered indexes on foreign key/primary key columns that are used in joins.

Non-clustered Indexes

Non-clustered indexes are not copies of the table but a sorting of the columns you specify that "point" back to the data pages in the clustered index. This is why the clustered index you choose is so important because if effects all other indexes.
There are 2 modes for non-clustered indexes, Non-unique and unique. Non-Unique means that the index does not act as a constraint on the table and does not prevent identical rows from being inserted. Unique constraints mean that the index prevents any identical rows from being inserted.
Now let’s build a clustered index on the ID column of the Sales table.
 
CREATE NONCLUSTERED INDEX NCL_ID  ON SALES (ID); 
  • Does not re-order the actual table data.
  • Sometimes called a "heap table" for tables lacking clustered indexes because it points to the actual data pages that are essentially unordered and non-indexed.
  • If no clustered index, non-clustered indexes point to the actual data in the table.
  • If clustered index present, non-clustered index point to clustered index.
  • Logical order of the index does not match the physical stored order of the rows on disk.
  • Similar to an index in the back of a book. The actual data is stored in the pages of the book but the index reorders and stores a pointer to each data value.
Best Practices for Non-clustered Indexes
  • Add non-clustered indexes for queries that return smaller result sets. Large results will have to read more table pages anyway so they will not benefit as much from a non-clustered index.
  • Add to columns used in WHERE clauses that return exact matches.
  • If a clustered index is not used on these columns, add an index for collections of distinct values that are commonly queried such as a first and last name column group.
  • Add for all columns grouped together for a given query that is expensive or very common on a large data table.
  • Add to foreign-key columns where joins are common that are not covered by the clustered index.
Indexes are a lot of "trial and error" depending on your database design, SQL queries, and database size.


Comments

Popular posts from this blog

What is RDBMS?

In our crucial programming world, RDBMS is stand for Relational Database Management Systems is based on the Relational Model that maintain data records and index in tables. Generally, the term "Relations" are maintained in RDMS.  In a relational database,store the data into collection of tables, which might be related by common fields (database table columns). RDBMS also provide relational operators to manipulate the data stored into the database tables. Most RDBMS use SQL as database query language. There are several objects are included in RDBMS like ,Indexes,Synonyms,Tables,Views Clusters,Constraints,Database links,Database triggers. Most popular RDBMS  are Oracle, MS-SqlServer, MySql,DB2. etc.

What are the difference between DDL, DML , DCL and TCL commands?

Database Command Types DDL Data Definition Language  (DDL) statements are used to define the database structure or schema.  -           CREATE - to create objects in the database -           ALTER - alters the structure of the database -           DROP - delete objects from the database -           TRUNCATE - remove all records from a table, including all spaces allocated for the records are removed -           COMMENT - add comments to the data dictionary -           RENAME - rename an object DML Data Manipulation Language  (DML) statements are used for managing data in database. DML commands are not auto-committed. It means changes made by DML command are not permanent to database, it can be rolled back. -           SELECT – retrieve data from the a database -           INSERT – insert data into a table -           UPDATE – updates existing data within a table -           DELETE – deletes all records from a table, the space for the records rema

Identify and Delete duplicate Records in SQL Server

      S cenario: I have accumulated data into SQL Server database from three different servers (oracle, O2 and oracle) for a financial reporting. After data accumulated , we found more duplicate rows; business requirement was to remove all duplicate rows because of final data will be import to another system. For this purpose, I want to write script to find out duplicate rows from my table. Finally, I want to delete only duplicate rows not original row. Original row means if i have total 2 rows those same or duplicate, so I want to delete only one row that is duplicate. Let’s go............................ Step 1: Create a student Table below Script: CREATE TABLE [Students] (     [ID] [int] NULL,     [Name] [varchar] ( 50 ) NULL,     [address] [varchar] ( 50 ) NULL,     [contact] [varchar] ( 50 ) NULL,     [email] [varchar] ( 50 ) NULL ) Step 2: insert Record that shown below:   Insert Duplicate Records     Step 3: Sel