Azure SQL Database query ideas - table partitions
Knowing more about table partitions in Azure SQL Database...
Table of contents
This post is part of the blog series created primarily for database engineers, with the hope that it might help you get a head start on your development tasks. With that being said, let us dive in...
Benefits of Partitioning
When tables have a large data volume, it can be challenging to perform certain operations efficiently. For instance, moving data from one table to another, maintaining or deleting data, or even querying data and performing joins can become time consuming and resource intensive. In such scenarios, partitioning tables (in a right manner) can make a huge difference. When situationally appropriate, you can target specific partitions and the operations will take place on that subset of the overall table data, and it will be a much faster and efficient operation. The query optimizer considers partitions on the tables involved in joins, thus optimizing the operation.
Key Terminology
Partition Function - defines how rows of the table will be partitioned based on partitioning column (datatype), number of partitions and boundaries of each partition.
Partition Scheme - maps partitions to one or more filegroups. Note - Azure SQL Database supports only PRIMARY filegroup
Partioning Column - the column that the partition function uses to partition the table (or index)
Creating a Partition Function and Partition Scheme
Let us look at a simple example demonstrating creation of a partition function, partition scheme and a table that partitions data using this partition scheme.
if not exists (
select *
from sys.partition_functions
where name = 'date_partition_function'
)
BEGIN
IF OBJECT_ID('[dbo].[orders]', 'U') IS NULL
CREATE PARTITION FUNCTION [date_partition_function](date) AS RANGE RIGHT
FOR VALUES (
CONVERT(date, DATEADD(day, -7, GETDATE()), 23),
CONVERT(date, DATEADD(day, -6, GETDATE()), 23),
CONVERT(date, DATEADD(day, -5, GETDATE()), 23),
CONVERT(date, DATEADD(day, -4, GETDATE()), 23),
CONVERT(date, DATEADD(day, -3, GETDATE()), 23),
CONVERT(date, DATEADD(day, -2, GETDATE()), 23),
CONVERT(date, DATEADD(day, -1, GETDATE()), 23),
CONVERT(date, GETDATE(), 23),
CONVERT(date, DATEADD(day, 1, GETDATE()), 23),
CONVERT(date, DATEADD(day, 2, GETDATE()), 23),
CONVERT(date, DATEADD(day, 3, GETDATE()), 23),
CONVERT(date, DATEADD(day, 4, GETDATE()), 23),
CONVERT(date, DATEADD(day, 5, GETDATE()), 23));
END
GO
This creates a partition function named date_partition_function. Range right indicates that the boundary value (e.g. current date) belongs to the right side of the boundary value interval (current date belongs to the interval current date + 1 day).
A partition scheme for this can be created as shown below.
if not exists (
select *
from sys.partition_schemes
where name = 'date_partition_scheme'
)
BEGIN
CREATE PARTITION SCHEME date_partition_scheme
as PARTITION date_partition_function ALL TO ( [PRIMARY] );
END
GO
An orders table that records all the orders and partitions data by order date can be defined as shown below:
DROP TABLE IF EXISTS [dbo].[orders];
CREATE TABLE [dbo].[orders]
(
order_date date,
order_time time(0),
order_time_zone_offset numeric(4, 0),
order_time_zone NVARCHAR(4),
order_division nvarchar(4),
order_sub_division nvarchar(6),
create_date_time datetime2,
item_code nvarchar(20) NOT NULL,
item_department nvarchar(20),
item_price nvarchar(60),
item_quantity nvarchar(60),
) on date_partition_scheme(order_date)
Building query to generate several partitions
In the partition function code snippet shown above, we have mentioned the partition boundary values individually. This looks acceptable for a smaller number of partitions, but when we need to create tens or hundreds of partitions, there is an alternate way of doing this programatically as shown below.
if not exists (
select *
from sys.partition_functions
where name = 'date_partition_function'
)
BEGIN
DECLARE @DatePartitionFunction nvarchar(max) = cast(
'CREATE PARTITION FUNCTION date_partition_function (date)
AS RANGE RIGHT FOR VALUES (' as nvarchar(max)
);
declare @i int;
set @i = -180;
WHILE @i < 5
BEGIN
SET @DatePartitionFunction += cast('CONVERT(date, DATEADD(day, ' as nvarchar(max)) + cast(@i as nvarchar(max)) + cast(', GETDATE()), 23), ' as nvarchar(max));
SET @i += 1;
END
SET @DatePartitionFunction += 'CONVERT(date, DATEADD(day, ' + cast(@i as nvarchar) + ', GETDATE()), 23)' + N');';
EXEC sp_executesql @DatePartitionFunction;
END
GO
Simply create the sql command programatically by concatenating substrings.
Maintaining date partitions and deleting older data using partitions
In above example, the orders table is partitioned by order_date. Consider a scenario where the requirement is to delete all order data that is older than 180 days. In such cases, we can schedule a stored procedure that maintains date partitions i.e. creates partitions for future dates and deletes data from partitions of older dates and merges those old partitions.
declare @partition1value DATE;
declare @partitionmaxvalue DATE;
set @partition1value = (
select cast(value as date) from sys.partition_range_values
where function_id = (
select function_id from sys.partition_functions
where name = 'date_partition_function'
)
and boundary_id = 1
);
set @partitionmaxvalue = (
select max(cast(value as date)) from sys.partition_range_values
where function_id = (select function_id from sys.partition_functions
where name = 'date_partition_function'));
if @partition1value <= cast(DATEADD(day, -181, GETDATE()) as date)
begin
while @partition1value <= cast(DATEADD(day, -181, GETDATE()) as date) -- while1
begin
-- truncate partition
truncate table dbo.orders with (
PARTITIONS(1)
);
-- merge partition
ALTER PARTITION FUNCTION date_partition_function()
MERGE RANGE(@partition1value);
set @partition1value = (
select cast(value as date) from sys.partition_range_values
where function_id = (
select function_id from sys.partition_functions
where name = 'date_partition_function'
)
and boundary_id = 1
);
end -- end of while1
end -- end of if statement
-- Split range for new partition
declare @next_date CHAR(10);
set @next_date = (select CONVERT(date, DATEADD(day, 6, GETDATE()), 23));
set @partitionmaxvalue = (select CONVERT(date, DATEADD(day, 1, @partitionmaxvalue), 23));
-- while loop to continue till max partition is >= nextdate
if @partitionmaxvalue <= @next_date
begin
while @partitionmaxvalue <= @next_date -- while3
begin
ALTER PARTITION SCHEME [date_partition_scheme] NEXT USED [PRIMARY]
ALTER PARTITION FUNCTION [date_partition_function]()
SPLIT RANGE(@partitionmaxvalue)
-- set new value for @partitionmaxvalue
set @partitionmaxvalue = (select CONVERT(date, DATEADD(day, 1, @partitionmaxvalue), 23));
end -- end of while3
end -- end of if
There is a likelihood of a deadlock when attempting to alter partition scheme to merge range or split range. One way to resolve this issue is by implementing a wait period followed by a retry.
WHILE @RetryCount < = 300 AND @Success = 0 --while4
BEGIN
BEGIN TRY
SET DEADLOCK_PRIORITY HIGH;
-- Print 'Attempt : ' + CAST (@RetryCount AS VARCHAR(5)) + ' : ';
ALTER PARTITION SCHEME [date_partition_scheme] NEXT USED [PRIMARY]
ALTER PARTITION FUNCTION [date_partition_function]()
SPLIT RANGE(@partitionmaxvalue)
-- print 'Success!'
SELECT @Success = 1 -- To exit the loop
END TRY
BEGIN CATCH
-- print cast(ERROR_NUMBER() as varchar(10)) + ' ' + error_message();
-- Now we check the error number to only use retry logic on these errors
IF ERROR_NUMBER() IN ( 1204, -- SqlOutOfLocks
1205, -- SqlDeadlockVictim
1222 -- SqlLockRequestTimeout
)
BEGIN
SET @RetryCount = @RetryCount + 1
-- This delay is to give the blocking transaction time to finish.
WAITFOR DELAY '00:00:01'
END
ELSE
BEGIN
-- If we don't have a handler for current error then we throw an exception and abort the loop
THROW;
END
END CATCH
END -- end of while4 loop
Checking partition details of a table for ad-hoc analysis
If you need to check the left-most or smallest partition value for a partition function, use this query.
select cast(value as date) from sys.partition_range_values
where function_id = (
select function_id from sys.partition_functions
where name = 'date_partition_function'
)
and boundary_id = 1
If you need to check the right-most or largest partition value for a partition function, this is the query.
select max(cast(value as date)) from sys.partition_range_values
where function_id = (select function_id from sys.partition_functions
where name = 'date_partition_function');
If you need to check the partitions of a table to make sure they are as expected, you could use a query such as below. This query also helps understand the record count across all the partitions.
-- see partition details of a specific table
SELECT PartitionScheme AS [Partition Scheme Name],
PartitionFunction AS [Partition Function Name],
FileGroupName AS [File Group Name],
rows AS [Record Count],
-- CAST(SUM(CAST(sf.size AS BIGINT))/131072.0 AS DECIMAL(19,2)) AS [Size GB],
PartitionFunctionValue AS [Partition Function Value],
BoundaryId AS [Boundary ID]
FROM
(select distinct
ps.Name AS PartitionScheme,
pf.name AS PartitionFunction,
fg.name AS FileGroupName,
p.rows,
prv.value AS PartitionFunctionValue,
prv.boundary_id as BoundaryId,
fg.data_space_id
from sys.indexes i
join sys.partitions p ON i.object_id=p.object_id AND i.index_id=p.index_id
join sys.partition_schemes ps on ps.data_space_id = i.data_space_id
join sys.partition_functions pf on pf.function_id = ps.function_id
left join sys.partition_range_values prv on prv.function_id = pf.function_id AND prv.boundary_id = p.partition_number
join sys.allocation_units au ON au.container_id = p.hobt_id
join sys.filegroups fg ON fg.data_space_id = au.data_space_id
where i.object_id = object_id('<Enter your Table name here>')) a
join sys.sysfiles sf ON a.data_space_id=sf.groupid
GROUP BY PartitionScheme,PartitionFunction,FileGroupName,rows,PartitionFunctionValue, BoundaryId
order by PartitionFunctionValue
If you need to manually remove older data for more than one partitions, you could use a query such as shown below:
-- truncate specific partitions of a table
truncate table "<The table that needs partition truncation>" with (
PARTITIONS(1,2,3)
);
Performance considerations
For optimal performance when querying on a table with partitions, developers should filter on the partitioning column. More detailed performance considerations are outlined in the documentation.
References
Below are some useful references for concepts covered in this blog post.
https://www.sqlshack.com/sql-server-concatenate-operations-with-sql-plus-and-sql-concat-functions/
In the series of posts titled "Azure SQL Database query ideas", I will cover some of the topics that I usually encounter as part of my projects.