Planning an Analytical Model
The next logical step in architecting a business intelligence solution is deciding whether an analytical model is necessary. Prior to the release of SQL Server 2012, making that decision was a much simpler process. However, considering the features added and enhanced in that release, as well as in the latest release as of this writing, making that determination is now a little more complicated. Why, and what has complicated the process?
In most cases, the driving factor behind developing an analytical model is improving query performance. Developers have come up with workarounds that circumvent the need for an analytical model, such as building aggregate tables, using indexed views, or adding more hardware. Although these solutions work, they usually fix the problem only temporarily. As data needs grow, the return on any of the aforementioned solutions becomes too expensive, or the approach simply stops working. Recognizing this, Microsoft included a new column-based index, the columnstore index, within the database engine. By implementing this index type, you can improve query performance by several orders of magnitude, possibly eliminating the need to develop an analytical model entirely.
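For example, a nonclustered columnstore index can be added to an existing fact table with a single statement. The following is a minimal sketch; the table and column names are hypothetical:

-- Minimal sketch (SQL Server 2012 or later): add a nonclustered columnstore
-- index to a hypothetical fact table. The selected columns are stored in a
-- compressed, columnar format, which speeds up the large scan-and-aggregate
-- queries that are typical of reporting workloads.
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_Columnstore
ON dbo.FactSales (OrderDateKey, ProductKey, CustomerKey, SalesAmount, OrderQuantity);

Keep in mind that in SQL Server 2012 a table with a nonclustered columnstore index becomes read-only until the index is dropped or disabled; later releases relax this restriction with updatable columnstore indexes.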
Although the columnstore index may improve query performance, when it comes to addressing analytical needs such as complex time-based analysis, key performance indicators, hierarchies, unary operators, and other capabilities that are beyond the scope of a relational model, the only choice is to develop an analytical model. Prior to SQL Server 2008 R2, you had only a single model type to choose from when developing a business intelligence solution based on the Microsoft stack. With later releases of SQL Server, however, you have three choices: Power Pivot, tabular, and multidimensional. Chapter 5 provides a detailed explanation that will assist you in choosing the right one. Regardless of which model you select, however, the planning process is very similar across all three.
Prior to building the model, you should consider the following:
● Make a concerted effort to ensure that the data warehouse schema is almost 100 percent complete, including the data governance and load process. Model development could actually begin without the data; however, little can be done with regard to validating the code without data.
● Ensure that the model requirements have been scoped. This may seem like an obvious step, but often developers start building the model without consulting any stakeholders or end users. Typically, the result is something irrelevant or wrong.
● Decide on data latency. In other words, can the data in the model be 15 minutes behind, 1 hour behind, 1 day behind, or 1 week behind the source? This decision is ultimately based on end-user needs.
If a decision was made not to develop a data warehouse and instead to obtain the data directly from the source, it is possible to have almost real-time access to data by implementing either a DirectQuery tabular model or a Relational Online Analytical Processing (ROLAP) multidimensional model. Note that this capability is not available when developing a Power Pivot model. These methods can also be implemented when a data warehouse is in place, but the data will be only as fresh as the data in the warehouse. For example, if the data is loaded into the warehouse nightly, you'll have up to a 24-hour difference between the data in the source and what you see in the model.
On the other hand, a period of latency may be acceptable. If that is the case, you should add a step after the data warehouse is loaded and prior to anyone accessing the data. This step loads, or processes, the data into the model. For a tabular model this means the in-memory storage mode, and for a multidimensional model it means Multidimensional Online Analytical Processing (MOLAP) storage. Adding this step also opens up the possibility of developing a Power Pivot model, which was not an option in the near-real-time scenario.
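As a rough sketch of what that processing step might look like, the following T-SQL sends an XMLA ProcessFull command to the Analysis Services instance through a linked server. The linked server and model database names are assumptions; in practice this step is more commonly scheduled as an Analysis Services Processing task in the SSIS package that loads the warehouse, or as a SQL Server Agent job step.

-- Hypothetical example: after the nightly warehouse load completes, fully
-- process the analytical model so that it reflects the newly loaded data.
-- [SSAS_SERVER] is assumed to be a linked server created with the MSOLAP
-- provider that points at the Analysis Services instance hosting the model.
EXEC (N'
  <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Object>
      <DatabaseID>SalesModel</DatabaseID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>') AT [SSAS_SERVER];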
One more thing to consider is a hybrid scenario in which some data can tolerate latency and some cannot. In that case, one solution is a Hybrid Online Analytical Processing (HOLAP) model, in which some objects access data in real time and others require the data to be processed.
Planning the Business Intelligence Delivery Solution
With everything in progress or close to completion, the time has come to decide how to deliver everything to end users. More importantly, you must ensure that the selected topology or solution is performant, secure, and available. Implementing a solution that encompasses all three of these characteristics can be time-consuming and costly, which leads to the question: are all three required? As a best practice, yes; in practice, typically no. The goal of most deployments is to improve performance and to present data in ways that resonate with consumers. Security and availability are usually an afterthought, often not addressed until it's too late. Fortunately, when building a Microsoft business intelligence delivery solution, you have several options for economically implementing a solution that addresses all three.
When implementing Microsoft business intelligence in an enterprise-wide manner, you should consider using SharePoint as the preferred method for delivery due to its collaborative nature and centralized delivery methodology. Its ability to consume data from an array of heterogeneous sources, coupled with the plethora of visualization capabilities, makes it a complete solution that addresses a variety of needs. On the other hand, architecting a solution that performs in a manner that exceeds expectations often requires a subject matter expert. Moreover, ensuring that the deliverables (dashboards, reports, spreadsheets, and so on) are always available and secure can potentially introduce other challenges and requirements.
Considering Performance
As mentioned earlier, performance is a driving factor behind a large percentage of business intelligence projects. Most of the time the concern relates to the time it takes to access data and produce reports. While some, myself included, would argue that a business intelligence project should solve a business problem, ultimately the end goal must be granting end users access to data in a way that is both effective and efficient. Satisfying this goal is accomplished by architecting a server topology that can scale and grow, and this is where the Microsoft business intelligence stack lends itself perfectly. Figure 2.4 illustrates a sample SharePoint deployment.
Figure 2.4 Sample Microsoft business intelligence server topology
This configuration is scaled out to support a high number of end users and a growing number of requests. You could reduce the number of servers in this configuration by consolidating the web server and application server, but this could adversely affect performance depending on the number of requests sent to the web server. Therefore, as a best practice, always separate the two in anticipation of some type of growth. In addition, as the end-user base grows, SharePoint provides the flexibility of adding more web and application servers to accommodate the increasing population. With multiple servers in place, not only has scale been introduced, but now SharePoint can use its internal load-balancing features to provide optimal performance for end users.
Considering Availability
The topology presented in Figure 2.4 does offer options to improve scale and support a high level of performance. However, it lacks a key factor that should not be overlooked when implementing the solution: availability. In most cases no one considers availability until a disaster occurs. For example, what happens if the SQL Server that hosts the SharePoint databases shuts down? How much downtime will accumulate before the system comes back online? Therefore, in addition to the web and application servers installed and configured for scale, more servers should be added to ensure that a high level of availability is maintained for every part of the configuration.
In Figure 2.4, the Domain Controller, the SQL Server, and the Analysis Services servers are each a single machine, which typically introduces an availability problem. Ensuring that these servers are protected in the event of a failure or disaster means implementing some type of High Availability (HA) or Disaster Recovery (DR) solution.
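For the SQL Server tier, for example, one common HA approach is an AlwaysOn Availability Group. The following is a minimal sketch that assumes the Windows Server Failover Cluster and the database mirroring endpoints are already configured; the server, domain, and database names are hypothetical.

-- Minimal sketch: protect a SharePoint content database with a synchronous,
-- automatic-failover secondary replica on a second node. Assumes the failover
-- cluster and the mirroring endpoints (port 5022) already exist.
CREATE AVAILABILITY GROUP [SharePointAG]
FOR DATABASE [WSS_Content]
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL      = N'TCP://SQLNODE1.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL      = N'TCP://SQLNODE2.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE     = AUTOMATIC);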