Meta Analysis Introduction
The SIOS iQ Meta Analysis feature, which appears under Performance Root Cause Analysis, adds Deep Learning to strengthen iQ’s overall Performance Root Cause Analysis. Deep Learning is a Machine Learning approach that helped AlphaGo master the game of Go. Now, the incarnation of Deep Learning in SIOS iQ helps to identify the root causes of performance problems across very large datasets of events (behaviors, topologies, anomalies, and patterns over time) in very dynamic virtualization and cloud environments. Meta Analysis reduces problem identification to a very small number of recurring anomalous behavior patterns and their root cause(s). IT admins can now manage even the largest and “noisiest” environments and gain insights quickly, eliminating hours or days of trial-and-error guesswork when trying to understand and mitigate a problem affecting infrastructure operations and application service delivery.
Problem vs Issue
- An Issue is an incident identified by the Performance Root Cause Analysis feature, powered by patented Topological Behavior Analysis (TBA), that takes place at a particular time in the environment.
- A Problem is a holistic view of the performance issues (incidents) over time that better reveals their root cause and recommendations to address them.
Following these definitions, Performance Root Cause Analysis identifies individual performance Issues (incidents), while Performance Meta Analysis identifies the related performance Problems, analyzes their root causes, and provides resolution(s) for them.
How does Performance Meta Analysis work?
As issues (incidents) are identified in the environment, they are gathered and analyzed by the Meta Analysis feature across behaviors, topologies, anomalies, and patterns over time. As a result, the root cause and recommendations provided are no longer based on individual issues (incidents) but on an overall problem that repeats itself in the environment across the topologies of the objects. Currently, Meta Analysis is performed based on the centers of contention (i.e., hosts and datastores).
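To make the Issue/Problem distinction concrete, here is a minimal sketch of how recurring issues could be rolled up into problems by their center of contention. It is not the SIOS iQ implementation (which the text attributes to Deep Learning); the `Issue` and `Problem` classes, their fields, and the `min_occurrences` threshold are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Issue:
    """One incident at a point in time (hypothetical shape, not the SIOS iQ schema)."""
    issue_id: str
    timestamp: int    # epoch seconds
    root_cause: str   # center of contention, e.g. a host or datastore name
    impacted: tuple   # names of the impacted objects


@dataclass
class Problem:
    """A recurring pattern: all issues that share one center of contention."""
    root_cause: str
    issues: list = field(default_factory=list)

    @property
    def impacted_objects(self):
        # Union of impacted objects across every issue in the problem.
        return sorted({name for issue in self.issues for name in issue.impacted})


def group_into_problems(issues, min_occurrences=2):
    """Group issues by root cause and keep only the ones that recur."""
    by_root = defaultdict(list)
    for issue in issues:
        by_root[issue.root_cause].append(issue)
    problems = [
        Problem(root, sorted(group, key=lambda i: i.timestamp))
        for root, group in by_root.items()
        if len(group) >= min_occurrences  # assumed threshold for "recurring"
    ]
    # Most frequently recurring problems first.
    problems.sort(key=lambda p: len(p.issues), reverse=True)
    return problems


issues = [
    Issue("i1", 100, "datastore-A", ("vm-1", "vm-2")),
    Issue("i2", 200, "host-1", ("vm-3",)),
    Issue("i3", 300, "datastore-A", ("vm-2", "vm-4")),
    Issue("i4", 400, "datastore-A", ("vm-1",)),
]
for problem in group_into_problems(issues):
    print(problem.root_cause, len(problem.issues), problem.impacted_objects)
```

In this toy data, the three issues on `datastore-A` collapse into a single problem, while the one-off issue on `host-1` is left out because it does not recur.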
How to use Performance Meta Analysis
There are two simple workflows for the Performance Root Cause Analysis enabled by Meta Analysis.
PERC Topology Dashboard and PERC Dashboard
All performance-related problems identified by Meta Analysis surface through the PERC Topology and PERC dashboards. Selecting one of the severities takes the user to a list of corresponding problems on the Performance Root Cause dashboard (discussed later in more detail), as illustrated in Figure 1.
Performance Root Cause Dashboard
Another convenient way to access performance-related problems identified by SIOS iQ is through the Performance Root Cause dashboard directly.
Once there, the user has access to the performance problems that match the current filters (default: 24 hours, In-Progress), as shown in Figure 2.
Meta Analysis breaks down the problem into two sections: a visual representation on the left and a detailed description along with the root cause and recommendation(s) on the right.
Let’s take a look at each section individually. The Meta Analysis graph on the left displays the problem by breaking down the objects by Root Cause (1), Impacted (2), and Associated (3) (Figure 3).
In addition, the graph presents the type of problem (4), a button to close the Meta Analysis event once the user has resolved it (5) (Set User Resolved in the Details section), and the time frame over which the problem was analyzed (6) (selected in the Details section). While a number of Root Cause objects may be identified over time as a result of the analysis, the relationship edges (7) and highlight(s) (9) reveal that only a subset of the Root Cause objects are actually causing most of the “damage” resulting from the Performance problem. Green “healthy” edges (8) complete the picture of topological relationships, indicating that the connected Associated Objects are not involved in the problem. In addition to the visualization, the Root Cause objects and the recommendations are explicitly listed in the Details on the right (Figure 4).
The Details section and navigation bar captured in Figure 4 provide functions familiar to the user as well as some new features. A dropdown menu (1) adjusts the time window of the problems identified; problems may be filtered (2) via the Show/Hide selector; and an additional menu (3) lets the user browse through the list of problems in Meta Analysis. The dropdown menu at (5) controls the scope of Meta Analysis, i.e., the number of individual incidents included in the analysis. As mentioned above, the Set User Resolved button (6) allows the user to inform the system that the reported and analyzed problem was resolved, so that any further analysis proceeds accordingly. Finally, an additional button (4) provides the ability to “play back” individual similar incidents. The remainder of the Details section mimics that of P/E/R/C issues existing elsewhere in the SIOS iQ product.
How to use the “Playback” feature of the Performance Root Cause Analysis
Selecting button (4), shown in Figure 4, opens the playback feature, which lets you investigate the individual incidents that were incorporated into the Meta Analysis of the problem.
SIOS iQ learns the behavior of each individual object across different metrics in the environment, leveraging the principles of machine learning and topological behavior analysis. SIOS iQ identifies the anomalies in behavior that potentially cause performance issues for the application, correlates the anomalies to derive the relationships and determine the root cause of the problem (such as an object or event), and recommends a solution to address it. SIOS iQ then presents any infrastructure component events that may affect performance through the Performance Root Cause Analysis Dashboard (not available in the SIOS iQ free edition).
The Impact Analysis tab provides information regarding the impacted and associated object(s) for each Issue in the PERC Issue and Performance Root Cause Lists. In List View, these objects can be sorted by Name, Type, or Impacted status. Properties and Impact data (for impacted objects) for a specific object can be accessed by selecting it in the list and clicking the Properties or Impact button, respectively. Topology View (the default) provides a comprehensive, interactive graphical representation of the relationships among the root cause, impacted, and associated objects or events, each indicated as shown in the legend. Selecting any object in the graph provides access to its Name, Type, Properties, and Impact Details, as shown below.
Symptom Graphs and Learned Behavior
SIOS iQ machine learning develops behavior patterns that appear in the Symptoms graph and the Impact Analysis graph, which show the learned behavior versus the anomalous behavior. The highlighted Learned Behavior region (Best Practices region, when iQ is still in a learning state) represents the expected behavior of the symptom being displayed. Depending upon the Sensitivity setting selected by the user, the learned behavior and its underlying statistical features are combined to determine a decision region; any data point lying outside this region is identified as an anomaly. Below is a sample image of a symptom graph and a summary of all of its individual parts.
- The Issue Type
- The object displaying the symptom
- The anomalous metric identified as having the most impact to the impacted object
- The history of the given metric
- The red highlighted section shows the duration of the selected event
- The blue highlighted section shows the learned expected behavior
- The observed values of the given metric
- The legend for the symptom graph
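The idea of a decision region derived from learned behavior and a Sensitivity setting can be illustrated with a deliberately simple statistical sketch. This is not the SIOS iQ algorithm: it assumes the learned behavior can be summarized by a mean and standard deviation, and the `SENSITIVITY_WIDTH` mapping and both function names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical mapping from a sensitivity setting to the band width, in
# standard deviations. Higher sensitivity means a narrower decision region.
SENSITIVITY_WIDTH = {"low": 4.0, "medium": 3.0, "high": 2.0}


def decision_region(history, sensitivity="medium"):
    """Return (low, high) bounds of the expected range for a metric."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation
    width = SENSITIVITY_WIDTH[sensitivity]
    return center - width * spread, center + width * spread


def find_anomalies(observed, region):
    """Return (index, value) pairs for points outside the decision region."""
    low, high = region
    return [(i, v) for i, v in enumerate(observed) if v < low or v > high]


history = [20, 22, 21, 19, 23, 20, 22, 21]   # e.g. past latency samples
observed = [21, 24, 35, 20]                  # newly observed values

for setting in ("medium", "high"):
    region = decision_region(history, setting)
    print(setting, find_anomalies(observed, region))
```

With these sample values, the medium setting flags only the spike to 35, while the high setting also flags the milder excursion to 24, which mirrors how a more sensitive setting shrinks the region that counts as expected behavior.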
Infrastructure Event Correlation
In Performance Root Cause analysis, performance issues may be identified whose true root cause(s) consist of virtualization and infrastructure related events (such as VM migration and VM provisioning). Such events will be correlated and will appear in the list of Root Cause Objects as well as in the Symptom Graph as illustrated below.
|Infrastructure Event Type|Description|
|VM Migration event|Migrated VMs have the potential to introduce a greater workload (CPU/memory usage, IOPS, etc.) on underlying resource layers (Compute, Storage, or Network) and may eventually cause a negative impact on the performance of the related objects (host, datastore, VM, etc.).|
|Newly Provisioned VM|Provisioning a new VM has the potential to introduce a greater workload (CPU/memory usage, IOPS, etc.) on underlying resource layers (Compute, Storage, or Network) and may eventually cause a negative impact on the performance of the related objects (host, datastore, VM, etc.).|
Utilizing the topological relationship of corresponding objects, SIOS iQ correlates VM migration and provisioning events with identified Performance Root Cause Issues and identifies whether each event constitutes a true root cause of a related performance issue.
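As a rough illustration of this kind of correlation, the sketch below selects infrastructure events that landed on an issue's root cause object shortly before the issue began. It only produces candidates; how SIOS iQ decides whether an event is a true root cause is not described here. The `InfraEvent` and `PerfIssue` classes, the event labels, and the one-hour `lookback_seconds` window are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfraEvent:
    kind: str       # "vm_migration" or "vm_provisioned" (hypothetical labels)
    vm: str
    target: str     # host or datastore that received the VM
    timestamp: int  # epoch seconds


@dataclass(frozen=True)
class PerfIssue:
    issue_id: str
    root_cause: str  # object the issue was attributed to
    start: int       # epoch seconds


def candidate_events(issue, events, lookback_seconds=3600):
    """Events on the issue's root cause object within the lookback window."""
    return [
        event
        for event in events
        if event.target == issue.root_cause                       # topological match
        and 0 <= issue.start - event.timestamp <= lookback_seconds  # recent, not after
    ]


issue = PerfIssue("i1", "host-1", start=10_000)
events = [
    InfraEvent("vm_migration", "vm-7", "host-1", 9_400),    # same host, 10 min before
    InfraEvent("vm_provisioned", "vm-9", "host-2", 9_800),  # different host
    InfraEvent("vm_migration", "vm-3", "host-1", 2_000),    # same host, too old
]
matches = candidate_events(issue, events)
print([event.vm for event in matches])  # prints ['vm-7']
```

Matching on both topology and time keeps unrelated events (a different host, or a migration hours earlier) out of the candidate list.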
Performance Impact Graph & Symptoms
The Performance Impact graph provides charts and symptom metrics for the Impacted and Root Cause object(s) of each issue in the PERC Issue and Performance Root Cause lists. The Performance Impact data for Impacted and Root Cause objects can be accessed by selecting the Root Cause Object link on the Details tab, or by selecting the object on the Impact Analysis tab and clicking the Impact button.
What does that mean? And what should I do with this information?
For detailed information about each possible Root Cause event, please see the description in the Specific Issue Details topic.