Tips and Tricks to Growing Your SQL Server Estate from SMB to Enterprise Scale
Thursday, June 28, 2018 - click here to learn more
When I process my neural network data mining model in SQL Server Analysis Services (SSAS), I get the following messages in my Process Progress window.
When I click on one of the informational messages and then click on View Details, I see the message as shown below. The message says "Informational (Data mining): Cardinality reduction has been applied on column, '' of model, 'NeuralNetwork' due to the large number of states in that column. Set MAXIMUM_STATES to increase the number of states considered by the algorithm."
The message is not clear as to which columns are causing the issue. How can I figure out which columns are causing the problem?
Every mining model in SSAS has Algorithm Parameters. To get to these parameters, click on the Mining Models tab and then right-click on the name of the model. In the image below, I have just right-clicked on the term "NeuralNetwork".
In the Algorithm Parameters window, we can see the MAXIMUM_STATES parameter is set to its Default value of 100. When the Value box is blank, then the Default value is used.
We have a few options here. First, we can run a T-SQL query to select the count of distinct values for all of the varchar datatype columns. This might take a while, but it will give us the columns that have a count of distinct values greater than MAXIMUM_STATES. We can then raise the MAXIMUM_STATES value to exceed the largest distinct count. In this example, I have five varchar columns where the count of distinct values exceeds the MAXIMUM_STATES value, which happens to correspond to the number of informational messages displayed in the Process Progress window.
Second, we can choose to Ignore a column. The Resource column is actually a foreign key identifier, so we will ignore it.
Third, we can see if we have stored our data in the wrong datatype. In our example, latitude and longitude are stored as varchar.
However, we can store these values for latitude and longitude as decimals since they are essentially decimals in our data. The neural network handles continuous values extremely well, so we need to use numeric datatypes as much as possible.
We can change our datatype in our source database using the ALTER TABLE command.
alter table [dbo].[tblExample] alter column latitude decimal(9,6) alter table [dbo].[tblExample] alter column longitude decimal(9,6)
We also need to make sure we update our data source view in Visual Studio. Our data source still thinks the latitude and longitude are of datatype varchar(50).
After replacing the table, we can see that our data source view recognizes the latitude and longitude columns as decimals.
Now we need to go to the mining structure and change the Content and Type for the data types.
Change the Type to Double and then change the Content to Continuous as shown below.
After making the above changes, the informational message goes away when processing the neural network mining model.
As we see in this tip, we have options to eliminating these warning messages and improving our neural network accuracy.
Check out these other tips on data mining in SQL Server Analysis Services.
- SQL Server 2012 Analysis Services Association Rules Data Mining Example
- Explaining the Calculations of Probability and Importance for Complex Association Rules in SQL Server 2012 Analysis Services
- Classic Machine Learning Example In SQL Server Analysis Services
- Microsoft Na´ve Bayes Data Mining Model in SQL Server Analysis Services
- Data Mining Clustering Example in SQL Server Analysis Services SSAS
- SQL Server Analysis Services Glossary
Last Update: 2015-11-06
About the author
View all my tips