Wednesday, 18 March 2015

Column Store Index in MS SQL 2012 / 2014

Introduction

Column store index is the unique feature of Microsoft SQL Server starting from Microsoft SQL 2012 and improving with Microsoft SQL Server 2014. Here in this article I am going to discuss about the architecture of Column store Index.
Please note that this Column store Index architecture is suitable only for OLAP or BI environment not in OTLP environment, others the performance down.

Architecture of Column Store Index
Before going to the depth of the architecture of column store index in MS SQL 2012, we assume that the we all well known about the row based index and how it load in the memory to fulfill the request of a query.

First of all we look at the figure to understand the column store index properly



Segment Concept:
Each column in the column store index is it own segment. A Segment contains the value from one column only. Each and every column in a table that include in the column store index forms individual segment. Which allows the column data access individually. Each segment can from by multiple pages.

A segment has the limitation of 1 millions of rows. So for a big table it is not be possible to fit single entire columns within a single segment, so single column of a table may be fit with multiple segments.

Data is transferred from the disk to memory by segment, not by page. A segment is a highly compressed Large Object (LOB) that can contain up to one million rows. The data within each column’s segment matches row-by-row so that the rows can always be assembled correctly.

In our example the disk data came from three pages that is Page -1, Page-2 and Page-3. If we crate column store index in the column named “Name” then the segment is created from Page-1, Page-2 and Page-3.

Group Concept:
When a column is not fit in a single segment as the data of columns exceed the one million then columns takes multiple segments to store data. In such cases, multiple segments are created for each column and grouped into multiple row groups, one for each set of segments.

When a column store index is broken into multiple row groups, each row group contains a set of complete rows.  Please look at the bellow figure to understand it properly.



Dictionary Concept:
Besides column segments a column store index consists of another data storage element: dictionaries. Dictionaries are widely used in columnar storage as a means to efficiently encode large data types, like strings. The values stores in the column segments will be just entry numbers in the dictionary, and the actual values are stored in the dictionary. This technique can yield very good compression for repeated values, but yields bad results if the values are all distinct (the required storage actually increases). This is what makes large columns (strings) with distinct values very poor candidates for column store indexes. Column store indexes contain separate dictionaries for each column and string columns contain two types of dictionaries.

Primary Dictionary
This is a global dictionary used by all segments of a column.
Secondary Dictionary
This is an overflow dictionary for entries that did not fit in the primary dictionaries. It can be shared by several segments of a column: the relation between dictionaries and column segments is one-to-many.

Data Compression:
Column store indexes are based on xVelocity (formerly known as VertiPaq), an advanced storage and compression technology that originated with Power Pivot and Analysis Services but has been adapted to SQL Server 2012 databases.
New feature of Column Store in SQL 2014
  • We can create only one non-clustered column store index on a table although we can include all the columns of the table in the single index.

  • SQL Server 2014 enhances it to add support for creating Clustered Column store Index.
  • When we create a column store index and it makes the table read only.

  • With SQL Server 2014, we can create a column store index without having much impact on write-ability on the table. This means we can issue some INSERT, UPDATE, DELETE statements with a table with clustered column store index. No more tedious workaround is required for writing data to a table with column store index in this release like the previous release.




Hope you like it.





Posted by: MR. JOYDEEP DAS