In this tip series we have been discussing various techniques that can be used to optimize your SQL Server Analysis Services (SSAS) environment. In this segment we look at best practices for performance optimization for your cube design and development.
In Part 1 of this tip series, I talked about a typical SSAS application architecture and different components of the SSAS internal architecture. I also talked about roles for different components and areas for performance improvement. In Part 2, I talked about the source system design and network considerations to improve performance.
In this tip, I am going to share with you best practices and performance optimization techniques for cube design and development:
Include only those columns in dimension which are required by the business
We might be tempted to include all the columns in the cube dimension from the database dimension, although there is no harm to include them all if all the columns are required for analysis, but if not then don't include them. Including unnecessary columns puts extra overhead on SSAS for managing/storage of these columns and takes longer for processing and querying.
Define attribute relationships or cascading attribute relationships
By default all attributes are related to the key attribute, so define attribute relationships wherever applicable. For example, days roll up into months, months roll up into quarters, quarters roll up into years, etc... This makes queries faster, since it has aggregated 4 quarters or 12 months of data to arrive at yearly figures instead of having to aggregate 365 days. Make sure you don't create redundant attribute relationships, for example "days roll up into month" and "months roll up into quarter" and also "days roll up into quarter" because this would add extra overhead.
Specify the appropriate attribute relationship type
By default an attribute relationship is considered Flexible, but wherever applicable make it Rigid for better performance. If you make it rigid, SSAS doesn't bother updating members of a dimension on subsequent processing and hence improves the performance. Please make sure you are changing relationships to rigid only in cases where it does not change or else you may get exceptions during processing.
Turn Off the Attribute Hierarchy and Use Member Properties
Set AttributeHierarchyEnabled to False for all those attributes ( like Address or List Price etc.) for which you don't need aggregation to be calculated and want them to access it as member properties. Setting the AttributeHierarchyEnabled property improves the processing performance and also reduces the overall cube size as those attributes will not be considered in aggregation and for index creation. This makes sense for all those attributes which have high cardinality or one to one relationships with a key attribute and which are not used for slicing and dicing; for example Address, Phone Numbers, etc...
Appropriately set KeyColumns property
Ensure that the Keycolumns property is set to identify unique values; for example, a month value of 1 is insufficient if the dimension contains more than a single year...so in this case combine Year and Month columns together to make them unique or key columns.
Setting AttributeHierarchyOptimizedState property to Not Optimized
During processing of the primary key attribute, bitmap indexes are created for every related attribute. Building the bitmap indexes for the primary key can take time if it has one or more related attributes with high cardinality (for example Address or Phone number or List price). At query time, the bitmap indexes for these attributes are not useful in speeding up retrieval, since the storage engine still must sift through a large number of distinct values to reach the desired values. Unwanted bitmap indexes increase processing time, increase the cube size as well as they may have a negative impact on query response time. To avoid spending time building unnecessary bitmap indexes during processing set the AttributeHierarchyOptimizedState property to Not Optimized.
Creating user defined hierarchies
You should consider creating user defined hierarchies whenever you have a chain of related attributes in a dimension as that would be a navigation path for end users. You should create at least one user defined hierarchy in a dimension which does not contain a parent-child hierarchy. Please make sure your lower level attribute contains more members than the members of the attribute above it, if this is not a case then your level might be in the wrong order.
AttributeHierarchyVisible property of an attribute
Although it does not impact performance, it's recommended to set AttributeHierarchyVisible to FALSE for all those attributes which have been included in user defined hierarchies, this removes the ambiguous (duplicity) experience to end users.
Defining default member
By default "All member" is considered as a default member for an attribute and hence its recommended to define a default member for an attribute especially in the case where the attribute cannot be aggregated.
Measure Group Design and Optimization
Partitioning the measure groups
Apply a partitioning strategy for all the measure groups (especially those which are quite large in size) and partition them by one or more dimensions as per usage. This will greatly improve the cube processing as well as query performance of the cube.
The processing and query performance improves because of the fact that multiple threads can work together on multiple partitions of a measure group in parallel for processing or for serving query response. You can even define a different aggregation strategy for each partition. For example, you might have a higher percentage aggregation for all those older partitions which are less likely to change whereas a lower percentage of aggregations for those recent partitions which are more likely to change.
Define the aggregation prudently for the measure groups as aggregations reduce the number of values that SSAS has to scan from the disk to generate the response. While having more (all required) aggregations improves the query performance it will be too slow during cube processing whereas if you have too few aggregations it slows down the query performance, but increases the processing performance. Ideally you should start with 20%-30% query performance improvement and can then use the Usage Based Optimization wizard to define more aggregations as discussed below. If you have created partitions on measure groups, you might consider having a higher percentage of aggregation for all those older partitions which are less likely to change whereas lower percentage of aggregations for those recent partitions which are more likely to change. You should not create aggregations that are larger than one-third of the size of the fact data.
You can define the fact table source record count in the EstimatedRows property of each measure group, and you can define attribute member counts in the EstimatedCount property of each attribute. This way you can ensure your metadata is up-to-date which will improve the effectiveness of your aggregation design and creation.
Usage Based Optimization Wizard - Aggregation redefined
Generally we create aggregations to gain 20%-30% performance in the beginning and the later use the Usage Based Optimization wizard to create more aggregations for all the queries being run against the cube. The idea is you enable logging for queries being run against your cube and then you use the collected information as an input to the Usage Based Optimization wizard for creating aggregations for all or long running queries. To learn more about this click here.
AggregationUsage is a property of an attribute which is used by SSAS to determine if the attribute is an aggregation candidate or not. By default SSAS considers only key attributes and attributes in natural hierarchies for inclusion in aggregations. If you find any other attribute which might be used for slicing and dicing then you should consider setting AggregationUsage to Unrestricted for including it in the aggregation design. Avoid setting AggregationUsage property to FULL for an attribute that has many members. You should not create an aggregation that contains several attributes from the same attribute relationship because the implied attribute's value can be calculated from the first attribute of the attribute relationship chain.
IgnoreUnrelatedDimensions property usage
IgnoreUnrelatedDimensions is a property of the measure group which has a default value of TRUE in which case the measure group displays the current amount even for the dimensions which are not related, which might eventually lead to false interpretation. You should consider setting it to FALSE, so a measure group does not ignore an unrelated dimension and to also not show the current amount.
Distinct count measures
Its recommended to have each distinct count measure in a separate measure group for improving performance.
Referenced relationship of dimension and measure group
You should consider materializing the reference dimension if both dimensions and the measure group are from the same cube for improving performance.
When we talk of processing a cube, there are two parts to it, processing data which rebuilds dimensions with attribute store, hierarchy store and fact data store and processing indexes which creates bitmap indexes and defined aggregation. You can execute a single command (ProcessFull) to perform these two operations together or execute separate commands (ProcessData and ProcessIndexes) for each of these operations, this way you can identify how much time each operation is taking.
You might choose to do the full process each time or you might do the full process followed by subsequent incremental processes. No matter what approach you use, SSAS uses job based architecture (creates a controller jobs and many other jobs depending on number of attributes, hierarchies, partitions etc.) for processing dimensions and facts.
Cube processing requires exclusive locks on the objects which are being committed, it means that the object will be unavailable to users during the commit. It also means long running queries against SSAS prevents taking exclusive locks on the objects therefore processing may take longer to complete. To prevent processing and querying interfering with each other you can use a different strategy. You can have a cube (also called processing cube) which gets processed (refreshed with latest set of data from the source) and then another cube (also called querying cube) which gets synchronized with the first cube. The second cube is what users will be accessing. There are several ways to synchronize the second (querying) cube and one of the options is built into the cube synchronization feature.
Cube Synchronization (SSAS database synchronization) synchronizes the destination cube with the source cube with the latest metadata and data. When destination cube is getting synchronized, users can still query destination cube because during synchronization SSAS maintains two copies, one of them gets updated while another one is available for usage. After synchronization SSAS automatically switches the users to the new refreshed copy and drops the outdated one. To learn more about cube synchronization best practices click here.
If you remember from the SSAS architecture, about which I talked about in Part 1 of this tip series, the Query Processor Cache/Formula Engine Cache caches the calculation results whereas the Storage Engine Cache caches aggregated/fact data being queried. This caching technique helps in improving the performance of queries if executed subsequently or if the response of the other queries can be served from the caches. Now the question is, do we really need to wait for first query to complete or can we run the query on its own (pre-execute) and make the cache ready? Yes we can pre-execute one or more frequently used queries or run the CREATE CACHE statement (this one generally runs faster as it does not include cell values) to load the cache and this is what is called Cache Warming.
As a precautionary note, you should not consider that once a query result is cached it will remain there forever; it might be pushed out by other query results if you don't have enough space for additional query result caching.
To clear the formula engine cache and storage engine you can execute this XMLA command:
I have created XMLA to process cube in incremental way. It is using type "ProcessUpdate" for dimensions and "ProcessAdd" for measurement partitions. I am facing one issue on distinct count. Let me take one example
If we browse cube, SSAS shows sum of orders as 1500.00, and distinct customer count for all orders as 5. Now adding new fact record for cancelling one order, e.g.
After incremental processing, it shows sum of orders as 1200.00 which is correct one. But the distinct customer count for all orders keeps same and shows 5 which is incorrect.
I can understand that the rows are getting append on incremental process which works for sum operation, but fails computing distinct count. I want to know if there is any way that can remove order #3 from all aggregate operation while processing in incremental way.