Azure SQL Database query ideas - table partitions

Azure SQL Database query ideas - table partitions

Knowing more about table partitions in Azure SQL Database...

·

7 min read

This post is part of the blog series created primarily for database engineers, with the hope that it might help you get a head start on your development tasks. With that being said, let us dive in...


Benefits of Partitioning

When tables have a large data volume, it can be challenging to perform certain operations efficiently. For instance, moving data from one table to another, maintaining or deleting data, or even querying data and performing joins can become time consuming and resource intensive. In such scenarios, partitioning tables (in a right manner) can make a huge difference. When situationally appropriate, you can target specific partitions and the operations will take place on that subset of the overall table data, and it will be a much faster and efficient operation. The query optimizer considers partitions on the tables involved in joins, thus optimizing the operation.

Key Terminology

  1. Partition Function - defines how rows of the table will be partitioned based on partitioning column (datatype), number of partitions and boundaries of each partition.

  2. Partition Scheme - maps partitions to one or more filegroups. Note - Azure SQL Database supports only PRIMARY filegroup

  3. Partioning Column - the column that the partition function uses to partition the table (or index)

Creating a Partition Function and Partition Scheme

Let us look at a simple example demonstrating creation of a partition function, partition scheme and a table that partitions data using this partition scheme.

if not exists (
select *
from sys.partition_functions
where name = 'date_partition_function'
)

BEGIN
    IF OBJECT_ID('[dbo].[orders]', 'U') IS NULL
    CREATE PARTITION FUNCTION [date_partition_function](date) AS RANGE RIGHT
FOR VALUES (
    CONVERT(date, DATEADD(day, -7, GETDATE()), 23),
    CONVERT(date, DATEADD(day, -6, GETDATE()), 23),
    CONVERT(date, DATEADD(day, -5, GETDATE()), 23),
    CONVERT(date, DATEADD(day, -4, GETDATE()), 23),
    CONVERT(date, DATEADD(day, -3, GETDATE()), 23),
    CONVERT(date, DATEADD(day, -2, GETDATE()), 23),
    CONVERT(date, DATEADD(day, -1, GETDATE()), 23),
    CONVERT(date, GETDATE(), 23),    
    CONVERT(date, DATEADD(day, 1, GETDATE()), 23),
    CONVERT(date, DATEADD(day, 2, GETDATE()), 23),
    CONVERT(date, DATEADD(day, 3, GETDATE()), 23),
    CONVERT(date, DATEADD(day, 4, GETDATE()), 23),
    CONVERT(date, DATEADD(day, 5, GETDATE()), 23));

END
GO

This creates a partition function named date_partition_function. Range right indicates that the boundary value (e.g. current date) belongs to the right side of the boundary value interval (current date belongs to the interval current date + 1 day).

A partition scheme for this can be created as shown below.

if not exists (
select *
from sys.partition_schemes
where name = 'date_partition_scheme'
)

BEGIN

    CREATE PARTITION SCHEME date_partition_scheme
as PARTITION date_partition_function ALL TO ( [PRIMARY] );

END
GO

An orders table that records all the orders and partitions data by order date can be defined as shown below:

DROP TABLE IF EXISTS [dbo].[orders];

CREATE TABLE [dbo].[orders]
(
    order_date date,
    order_time time(0),
    order_time_zone_offset numeric(4, 0),
    order_time_zone NVARCHAR(4),
    order_division nvarchar(4),
    order_sub_division nvarchar(6), 
    create_date_time datetime2,
    item_code nvarchar(20) NOT NULL,
    item_department nvarchar(20),
    item_price nvarchar(60),
    item_quantity nvarchar(60),
) on date_partition_scheme(order_date)

Building query to generate several partitions

In the partition function code snippet shown above, we have mentioned the partition boundary values individually. This looks acceptable for a smaller number of partitions, but when we need to create tens or hundreds of partitions, there is an alternate way of doing this programatically as shown below.

if not exists (
    select *
    from sys.partition_functions
    where name = 'date_partition_function'
)
BEGIN
    DECLARE @DatePartitionFunction nvarchar(max) = cast(
        'CREATE PARTITION FUNCTION date_partition_function (date)
        AS RANGE RIGHT FOR VALUES (' as nvarchar(max)
        );
    declare @i int;
    set @i = -180;
    WHILE @i < 5 
    BEGIN
        SET @DatePartitionFunction += cast('CONVERT(date, DATEADD(day, ' as nvarchar(max)) + cast(@i as nvarchar(max)) + cast(', GETDATE()), 23), ' as nvarchar(max));
        SET @i += 1;
    END
    SET @DatePartitionFunction += 'CONVERT(date, DATEADD(day, ' + cast(@i as nvarchar) + ', GETDATE()), 23)' + N');';
    EXEC sp_executesql @DatePartitionFunction;
END
GO

Simply create the sql command programatically by concatenating substrings.

Maintaining date partitions and deleting older data using partitions

In above example, the orders table is partitioned by order_date. Consider a scenario where the requirement is to delete all order data that is older than 180 days. In such cases, we can schedule a stored procedure that maintains date partitions i.e. creates partitions for future dates and deletes data from partitions of older dates and merges those old partitions.

declare @partition1value DATE;
declare @partitionmaxvalue  DATE;
set @partition1value = (
    select cast(value as date) from sys.partition_range_values
    where function_id = (
        select function_id from sys.partition_functions
        where name = 'date_partition_function'
    )
      and boundary_id = 1
);
set @partitionmaxvalue = (
    select max(cast(value as date)) from sys.partition_range_values
    where function_id = (select function_id from sys.partition_functions
                         where name = 'date_partition_function'));

if @partition1value <=  cast(DATEADD(day, -181, GETDATE()) as date)
begin
    while @partition1value <= cast(DATEADD(day, -181, GETDATE()) as date) -- while1
begin
-- truncate partition
truncate table dbo.orders with (
                    PARTITIONS(1)
                    );
-- merge partition
ALTER PARTITION FUNCTION date_partition_function()
                            MERGE RANGE(@partition1value);
set @partition1value = (
    select cast(value as date) from sys.partition_range_values
    where function_id = (
        select function_id from sys.partition_functions
        where name = 'date_partition_function'
    )
      and boundary_id = 1
);
end -- end of while1
end  -- end of if statement


-- Split range for new partition
declare @next_date CHAR(10);
set @next_date = (select CONVERT(date, DATEADD(day, 6, GETDATE()), 23));
set @partitionmaxvalue = (select CONVERT(date, DATEADD(day, 1, @partitionmaxvalue), 23));
-- while loop to continue till max partition is >= nextdate
if @partitionmaxvalue <=  @next_date
begin
    while @partitionmaxvalue <= @next_date -- while3
begin
ALTER PARTITION SCHEME [date_partition_scheme] NEXT USED [PRIMARY]
                    ALTER PARTITION FUNCTION [date_partition_function]()
                        SPLIT RANGE(@partitionmaxvalue)
-- set new value for @partitionmaxvalue
set @partitionmaxvalue = (select CONVERT(date, DATEADD(day, 1, @partitionmaxvalue), 23));
end -- end of while3
end -- end of if

There is a likelihood of a deadlock when attempting to alter partition scheme to merge range or split range. One way to resolve this issue is by implementing a wait period followed by a retry.

WHILE @RetryCount < =  300 AND @Success = 0 --while4
BEGIN
    BEGIN TRY
        SET DEADLOCK_PRIORITY HIGH;
                -- Print 'Attempt : ' + CAST (@RetryCount AS VARCHAR(5)) + ' : ';
        ALTER PARTITION SCHEME [date_partition_scheme] NEXT USED [PRIMARY]
        ALTER PARTITION FUNCTION [date_partition_function]()
            SPLIT RANGE(@partitionmaxvalue)
                -- print 'Success!'
        SELECT @Success = 1 -- To exit the loop
    END TRY

    BEGIN CATCH
        --                print cast(ERROR_NUMBER() as varchar(10)) + ' ' + error_message();
        -- Now we check the error number to only use retry logic on these errors
        IF ERROR_NUMBER() IN (  1204, -- SqlOutOfLocks
                                1205, -- SqlDeadlockVictim
                                1222 -- SqlLockRequestTimeout
            )
            BEGIN
                SET @RetryCount = @RetryCount + 1
                -- This delay is to give the blocking transaction time to finish.
                WAITFOR DELAY '00:00:01'
            END
        ELSE
            BEGIN
                -- If we don't have a handler for current error then we throw an exception and abort the loop
                THROW;
            END
    END CATCH
END  -- end of while4 loop

Checking partition details of a table for ad-hoc analysis

If you need to check the left-most or smallest partition value for a partition function, use this query.

select cast(value as date) from sys.partition_range_values
where function_id = (
    select function_id from sys.partition_functions
    where name = 'date_partition_function'
    )
and boundary_id = 1

If you need to check the right-most or largest partition value for a partition function, this is the query.

select max(cast(value as date)) from sys.partition_range_values
        where function_id = (select function_id from sys.partition_functions
            where name = 'date_partition_function');

If you need to check the partitions of a table to make sure they are as expected, you could use a query such as below. This query also helps understand the record count across all the partitions.

-- see partition details of a specific table
SELECT PartitionScheme AS [Partition Scheme Name],
    PartitionFunction AS [Partition Function Name],
    FileGroupName AS [File Group Name],
    rows AS [Record Count],
   -- CAST(SUM(CAST(sf.size AS BIGINT))/131072.0 AS DECIMAL(19,2)) AS [Size GB],
    PartitionFunctionValue AS [Partition Function Value],
    BoundaryId AS [Boundary ID]
FROM
    (select distinct
        ps.Name AS PartitionScheme,
        pf.name AS PartitionFunction,
        fg.name AS FileGroupName,
        p.rows,
        prv.value AS PartitionFunctionValue,
        prv.boundary_id as BoundaryId,
        fg.data_space_id
    from sys.indexes i
        join sys.partitions p ON i.object_id=p.object_id AND i.index_id=p.index_id
        join sys.partition_schemes ps on ps.data_space_id = i.data_space_id
        join sys.partition_functions pf on pf.function_id = ps.function_id
        left join sys.partition_range_values prv on prv.function_id = pf.function_id AND prv.boundary_id = p.partition_number
        join sys.allocation_units au ON au.container_id = p.hobt_id
        join sys.filegroups fg ON fg.data_space_id = au.data_space_id
    where i.object_id = object_id('<Enter your Table name here>')) a
    join sys.sysfiles sf ON a.data_space_id=sf.groupid
GROUP BY PartitionScheme,PartitionFunction,FileGroupName,rows,PartitionFunctionValue, BoundaryId
order by PartitionFunctionValue

If you need to manually remove older data for more than one partitions, you could use a query such as shown below:

-- truncate specific partitions of a table
truncate table "<The table that needs partition truncation>" with (
                    PARTITIONS(1,2,3)
                    );

Performance considerations

For optimal performance when querying on a table with partitions, developers should filter on the partitioning column. More detailed performance considerations are outlined in the documentation.


References

Below are some useful references for concepts covered in this blog post.

https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver16

https://learn.microsoft.com/en-us/sql/t-sql/statements/create-partition-function-transact-sql?view=sql-server-ver16#f-creating-partitions-for-multiple-years

https://learn.microsoft.com/en-us/sql/t-sql/language-elements/string-concatenation-transact-sql?view=sql-server-ver16

https://www.sqlshack.com/sql-server-concatenate-operations-with-sql-plus-and-sql-concat-functions/

https://seniordba.wordpress.com/2016/05/03/using-n-before-strings-in-sql-server/#:~:text=As%20you%20may%20know%2C%20the,different%20national%20languages%20still%20work.


In the series of posts titled "Azure SQL Database query ideas", I will cover some of the topics that I usually encounter as part of my projects.

Did you find this article valuable?

Support Sneh Bhatt by becoming a sponsor. Any amount is appreciated!