Performance Issues

Storage Contention

One of the most common performance issues affecting applications in cloud and virtualization environments is related to storage, and more specifically to datastore contention. A datastore is an object shared by VMs on the same host and/or on different hosts within an environment. Datastore contention can be caused by many different events within the environment and is revealed by an abnormal increase in IO latency. Although IO latency can potentially affect all of the applications on that datastore, in a Storage Contention issue the observed impact is isolated to the datastore alone.

Severity | Root Cause | Type | Layer | Symptoms | Impacted Objects | Associated Objects
Warning | Datastore | Storage Contention | Storage | Latency increase | N/A | Related Host(s) and VM(s)
Information | Datastore | Storage Contention | Storage | IOPS increase | N/A | Related Host(s) and VM(s)

What does that mean? And what should I do with this information?

Abnormal behavior in datastore latency generates a Warning issue and may indicate that the datastore, or the infrastructure that supports it, is experiencing performance degradation. An Information issue suggests an increase in the workload. Regardless of severity, these issues indicate that the VMs are not affected at this time. Either issue should be investigated to determine the potential for problems to develop in the future (for example, it may be the first sign of a hardware failure or of a problem in the infrastructure that supports the datastore).

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, and if latency is involved it may progress and impact other workloads, causing storage contention.

In addition, other recommendations may be attached based on other issues occurring in your system. The following may be seen:

  • Recommendation
    • Move Num candidate workloads to Flash technology identified in the Storage Acceleration Dashboard to improve performance of datastore and remediate contention issue.

Application Impact

Application Impact indicates that an anomaly has been discovered and is currently isolated to the VM(s) and supported application(s) only. Types of anomalies that fall into this category are: supported Application CPU% utilization, VM CPU% utilization, VM virtual memory utilization, VM/CPU ready time, VM virtual disk latency and/or IOPS, and VM network usage.

Severity | Root Cause | Type | Layer | Symptoms | Impacted Objects | Associated Objects
 | VM(s) | Application Impact | Storage | Latency increase | N/A | Associated VM(s), Host(s), and Datastore
 | VM(s) | Application Impact | Storage | Latency increase | VM(s) | Host(s) and Datastore
 | VM(s) | Application Impact | Compute | CPU Ready | N/A | Associated VM(s) and Host(s)
 | Supported applications or VM(s) | Application Impact | Compute | CPU and/or virtual memory utilization increase | N/A | Associated VM(s) and Host(s)
 | VM(s) | Application Impact | Storage | IOPS increase | N/A | Associated VM(s), Host(s), and Datastore
 | VM(s) | Application Impact | Network | Increasing and/or Decreasing Network Workload | VM(s) | Associated VM(s) and Host

What does that mean? And what should I do with this information?

The Critical and Information severity issues above indicate that abnormal behavior has been identified (in metrics such as CPU Ready, IOPS, or Network Workload). The Information severity rating suggests that application behavior has changed significantly enough to be considered an anomaly and a possible application problem in cases where the Root Cause is the supported application. The elevated severity of Critical signals a high probability that the application running on the identified VM(s) is being impacted.

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, and if latency is involved it may progress and impact other workloads, causing storage contention.
    • Continue observing this issue, as vMemory and/or CPU utilization may progress and impact other workload(s).
    • Continue observing this issue, as changes in Network workload may progress and impact other workload(s).

Application Storage Contention

Application Storage Contention indicates that SIOS iQ has identified a performance issue caused by one of the following:

  • An excessive load (IOPS) placed on the datastore by one or more VM(s). The root cause of this type of issue (known in the IT industry as the “noisy neighbor”) points to the VMs identified as responsible for the load.
  • Anomalous latencies on the datastore that also impact the VM(s) on the datastore. In this case the behavior of the VM(s) is considered normal (expected). This issue indicates that the datastore, or the infrastructure that supports it, is experiencing some unexpected behavior (such as an overloaded datastore). Additional capacity-related adjustments may be required, such as creating a new datastore and relocating VM(s) to offload the affected datastore.

Severity | Root Cause | Type | Layer | Symptoms | Impacted Objects | Associated Objects
Critical | Datastore | Application Storage Contention | Storage | Latency increase | VM(s) | Associated VM(s) and Host(s)
Critical | VM(s) | Application Storage Contention | Storage | Latency and IOPS increase | Datastore and possibly VM(s) | Associated VM(s) and Host(s)
Warning | VM(s) | Application Storage Contention | Storage | IOPS increase | Datastore (Latency) | Associated VM(s) and Host(s)
Information | VM(s) | Application Storage Contention | Storage | IOPS increase | Datastore (IOPS) | Associated VM(s) and Host(s)

What does that mean? And what should I do with this information?

The Critical severity issue when the root cause is a datastore indicates that anomalous latencies were identified on the datastore and the related VM(s). No anomalous behavior (such as IOPS) has been identified in the VMs (no “noisy neighbors”). To determine whether any objects in the configuration related to the root cause are impacted, navigate to the Impact Analysis tab of the selected issue. To observe the impact, click on the Performance Impact button after selecting the object. This issue may indicate that the datastore, or the infrastructure that supports the datastore, is experiencing some performance degradation.

  • Recommendations:
    • Verify that the datastore identified as the Root Cause object is not experiencing any hardware issues, as the degradation may indicate the first sign of hardware-related problems.
    • Rebalance VM(s) on the datastore identified as the Root Cause object(s) to a different datastore.

The Critical severity issue when the root cause is a VM(s) indicates that a VM(s) has been identified as a “noisy neighbor”. This problem impacts and degrades the performance of the datastore and can potentially impact related VMs. To determine whether any objects in the configuration related to the root cause are impacted, navigate to the Impact Analysis tab of the selected issue. To observe the impact, click on the Performance Impact button after selecting the object.

  • Recommendations:
    • Rebalance VM(s) identified as the Root Cause object(s) to a different datastore.

The Warning severity issue when the root cause is a VM(s) indicates that the datastore is experiencing increased latency. To determine the impact on the datastore click on the Performance Impact button on the datastore selected in the Impact Analysis. This issue may indicate that the application behavior has changed significantly enough to be identified as an anomaly.

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, as the latency from the datastore may impact the workload(s).

For issues in which anomalous workload behavior (increased IOPS) has been identified on the VM(s) and the datastore, the severity of the issue is downgraded to Informational.

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, and if IOPS continues to increase, workload(s) may be impacted.
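The four cases described above can be read as a small decision table. The following Python sketch is only an illustration of that reading and is not how SIOS iQ is implemented: the function name, the boolean inputs, and the order of the checks are all assumptions made for this example.

```python
from enum import Enum
from typing import Optional, Tuple


class Severity(Enum):
    INFORMATION = "Information"
    WARNING = "Warning"
    CRITICAL = "Critical"


def classify_storage_contention(
    vm_iops_anomaly: bool,
    vm_latency_anomaly: bool,
    datastore_latency_anomaly: bool,
    datastore_iops_anomaly: bool,
) -> Optional[Tuple[str, Severity]]:
    """Return (root cause, severity) for the cases in this section, or None.

    The inputs are hypothetical flags saying whether an anomaly was
    observed for each metric; they are not SIOS iQ API fields.
    """
    if not vm_iops_anomaly:
        # No "noisy neighbor": latency on the datastore and on its VMs
        # points at the datastore or its supporting infrastructure.
        if datastore_latency_anomaly and vm_latency_anomaly:
            return ("Datastore", Severity.CRITICAL)
        return None
    # From here on, at least one VM shows an anomalous IOPS increase.
    if datastore_latency_anomaly and vm_latency_anomaly:
        # Noisy neighbor degrading the datastore and possibly other VMs.
        return ("VM", Severity.CRITICAL)
    if datastore_latency_anomaly:
        # Datastore latency has increased, but VMs are not yet impacted.
        return ("VM", Severity.WARNING)
    if datastore_iops_anomaly:
        # Only workload (IOPS) changes on the VM(s) and the datastore.
        return ("VM", Severity.INFORMATION)
    return None
```

For example, `classify_storage_contention(True, False, True, False)` corresponds to the Warning case, in which a VM's IOPS increase coincides with increased datastore latency.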

Application Network Contention

Like other shared virtualized resources, a Host’s network resources are shared among its guest VMs. Generally speaking, the more VMs are hosted on one Host, the fewer network resources are available to each VM. Also, if one VM consumes a larger share of the Host’s network resources, other VMs on the same Host will have fewer network resources to use.

Fluctuations in actual network usage can indicate problems in balancing, resource levels, or hardware. A drop in network usage for a Host can indicate a hardware or infrastructure failure. A drop in network usage for a VM can indicate that either the Host is over-provisioned or that another VM is using too much of the Host’s resources. An increase in network usage for a VM can indicate that other VMs on the same Host will not have enough network resources available to them.

Application Network Contention indicates SIOS iQ has identified a performance problem caused by increased or decreased Network Workload on one or more VMs and their associated Host. When dealing with Network Workload, it is important to remember that both higher-than-expected values AND lower-than-expected values can indicate a problem in both VM(s) and Hosts.
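To illustrate why both directions matter, here is a minimal Python sketch that labels a series of network usage samples relative to an expected range. The function name, the fixed `expected_low` and `expected_high` bounds, and the labels are assumptions for this example; SIOS iQ learns expected behavior itself and does not necessarily use a fixed band.

```python
from typing import Sequence


def network_workload_direction(
    samples: Sequence[float], expected_low: float, expected_high: float
) -> str:
    """Label samples as increasing, decreasing, both, or normal.

    A sample above expected_high counts as an increase; a sample below
    expected_low counts as a decrease. Both can occur in one window.
    """
    above = any(s > expected_high for s in samples)
    below = any(s < expected_low for s in samples)
    if above and below:
        return "increasing and decreasing"
    if above:
        return "increasing"
    if below:
        return "decreasing"
    return "normal"
```

A check that only looked for values above `expected_high` would miss the "decreasing" case entirely, which is the case that can point to a hardware or infrastructure failure.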

Severity | Root Cause | Type | Layer | Symptoms | Impacted Objects | Associated Objects
 | Host | Application Network Contention | Network | Increasing and Decreasing Network Workload | VM(s) | Associated VM(s)
 | VM(s) | Application Network Contention | Network | Increasing and/or Decreasing Network Workload | VM(s) & Host | Associated VM(s)
 | Host | Application Network Contention | Network | Increasing Network Workload | VM(s) | Associated VM(s)
 | VM(s) | Application Network Contention | Network | Increasing AND Decreasing Network Workload | Host | Associated VM(s)
 | Host | Application Network Contention | Network | Decreasing Network Workload | VM(s) | Associated VM(s)

What does that mean? And what should I do with this information?

In the first issue type listed, a Host has experienced increased and decreased Network Workload, and this anomalous behavior has spread to one or more of the Host’s associated VM(s).

  • Recommendations:
    • Verify whether there was a network hardware/infrastructure failure, as the observed degradation may indicate the first signs of hardware-related problems.
    • Rebalance VM(s) identified as the Root Cause object(s) across the available hosts.

In the second issue type listed, at least one VM has experienced increased and/or decreased Network workload, and this anomalous behavior has spread to both the Host and one or more other VMs associated with the root cause VM.

  • Recommendations:
    • Verify that changes in Network workload are not the result of malicious activity.
    • Rebalance VM(s) identified as the Root Cause object(s) across the available hosts.

In the third issue type listed, a Host has experienced increased Network Workload, and the anomalous behavior has spread to one or more of the Host’s associated VMs.

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, as changes in Network workload may progress and impact other workload(s).

In the fourth issue type listed, at least one VM has experienced both increased AND decreased Network Workload during a short period of time. The anomalous behavior has spread to the associated Host of the VM(s) affected.

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, as changes in Network workload may progress and impact other workload(s).

In the fifth issue type listed, a Host has experienced decreased Network Workload, and the anomalous behavior has spread to one or more VMs. Note that in this case, no VM or Host has experienced increased network workload.

  • Recommendations:
    • Verify that changes in Network workload are not the result of malicious activity.
    • Continue observing this issue, as changes in Network workload may progress and impact other workload(s).

Compute Contention

One of the many benefits of virtualization is the ability to host multiple applications on a single physical host, reducing capital and operational expenses. The goal is to place as many applications on the host as possible in order to take full advantage of the virtualization technology, so it is important at all times to understand whether the physical host is capable of sustaining the load. Keeping the balance is challenging, especially when workloads are dynamic and infrastructure failures can reduce the available compute capacity at any time. As a result, another common set of issues (in addition to storage) is related to contention for compute (CPU and/or memory) resources.

Typically, cloud/virtualization platforms partition a physical host (ESXi in the case of VMware) to provide compute resources to the hosted VM(s). Any type of compute (CPU and/or memory) contention starts at the host, which attempts to mitigate the risk of affecting the application by leveraging available mechanisms (such as memory ballooning). If contention is too severe, it impacts the applications that reside on the VM(s). It is important to analyze compute resources at the application, VM, and host levels in order to understand whether the host is simply under pressure with no applications affected (which is acceptable), or whether it is experiencing severe symptoms of contention (such as memory swapping) that affect the performance of the applications.

Below is the list of issue types, with corresponding severities, that SIOS iQ will report when Compute Contention is observed.

Severity | Root Cause | Type | Layer | Symptoms | Impacted Objects | Associated Objects
Information | Host | Compute Contention | Compute | Increase in utilization of CPU and/or virtual memory | N/A | Related VM(s)
Warning | Host | Compute Contention | Compute | Increase in CPU Ready Time, Memory Swapping and/or Memory Ballooning | N/A | Related VM(s)

What does that mean? And what should I do with this information?

The Information severity issue indicates that an anomalous workload increase is being observed in the following host counters: CPU and/or Virtual Memory. There is currently no application impact identified. To determine the impact to the host, click on the host name in the Root Cause Objects section of the issue Details tab. This type of issue may occur if new workload(s) (application(s) installed on the VM(s), or the VM(s) themselves) were added to the host, or if existing application(s) changed their workload(s).

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.

The Warning severity issue indicates that an anomalous workload increase is being observed in the following host counters: CPU Ready Time, Memory Swapping, and/or Memory Ballooning. There is currently no application impact identified. To determine the impact to the host, click on the host name in the Root Cause Objects section of the issue Details tab. This type of issue may occur if new workload(s) (application(s) installed on the VM(s), or the VM(s) themselves) were added to the host, or if existing application(s) changed their workload(s).

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, as Memory Ballooning may increase, impacting other workload(s).
    • Continue observing this issue, as CPU Ready Time may increase, impacting other workload(s).
    • Continue observing this issue, as Memory Swapping may increase, impacting other workload(s).

In addition, other recommendations may be attached based on other issues occurring in your system. The following may be seen:

  • Recommendation
    • Resize Num identified Oversized VMs to reduce the compute resource contention on the host.

Application Compute Contention

Application Compute Contention indicates that SIOS iQ has identified an anomalous compute (CPU and/or memory related) workload increase that is potentially impacting a host and related VM(s) or application(s). The Root Cause Objects section presents the list of the object(s) (such as VMs or applications) that are experiencing the anomalous workload increase and causing excessive pressure on the host. The Severity of the issue specifies its criticality.

Severity | Root Cause | Type | Layer | Symptoms | Impacted Objects | Associated Objects
Information | VM(s) or supported application(s) | Application Compute Contention | Compute | CPU and/or virtual memory utilization increase | Host | Related VM(s) and supported application(s)
Warning | VM(s) or supported application(s) | Application Compute Contention | Compute | Memory Ballooning and/or Memory Swapping on Host | Host | Related VM(s) and supported application(s)
Information | VM(s), supported application(s), and Host | Application Compute Contention | Compute | VM(s) and/or application(s) and Host having different (possibly unrelated) increase in CPU and/or virtual memory utilization | None | Related VM(s) and supported application(s)
Warning | VM(s), supported application(s), and Host | Application Compute Contention | Compute | Memory Ballooning and/or Memory Swapping on Host | None | Related VM(s) and supported application(s)
Warning | VM(s), supported application(s), and/or Host | Application Compute Contention | Compute | Increase in CPU Ready Time on Host only | Host, related VM(s), and/or supported application(s) | Related VM(s) and supported application(s)
Critical | VM(s), supported application(s), and/or Host | Application Compute Contention | Compute | Increase in CPU Ready Time on VM(s) | Host, related VM(s), and/or supported application(s) | Related VM(s) and supported application(s)

What does that mean? And what should I do with this information?

The first two issues listed in the table above indicate that the VM(s) or supported application(s) listed in the Root Cause Objects section are experiencing an anomalous workload increase; however, no impact was observed on the application(s) and related VM(s). Although these issues are generally Informational, when the host is experiencing a Memory Ballooning and/or Memory Swapping increase, the issue severity is elevated to a Warning. To determine the impact on the host, click on the host name under the Impact Analysis tab. To evaluate the workload increase observed by the VM(s), inspect each individual VM and application listed in the Root Cause Objects section under the Details tab for the issue. These types of issue may occur if new workload(s) (application(s) on the VM(s), or the VM(s) themselves) were added to the host, or if existing application(s) changed their workload(s).

In the third and fourth cases listed above, SIOS iQ has identified anomalous workload increases on the host, VM(s), and supported application(s) that are likely unrelated. For example, the VM(s) may be experiencing an increase in CPU utilization while the host is experiencing an increase in virtual memory utilization, or vice versa. This type of issue may occur if new workload(s) (application(s) on the VM(s), or the VM(s) themselves) were added to the host, or if existing application(s) changed their workload(s). Although these issues are generally Informational, when the host is experiencing a Memory Ballooning and/or Memory Swapping increase, the issue severity is elevated to a Warning.

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, as Memory Ballooning may increase, impacting other workload(s).
    • Continue observing this issue, as Memory Swapping may increase, impacting other workload(s).

The fifth case in the list is a Warning severity issue. It indicates that listed VM(s) and/or supported application(s) are experiencing an anomalous workload increase, accompanied by an increase in CPU Ready Time on their Host. There is a future possibility of application impact in this case that may eventually require attention. To determine the impact click on the impacted objects under the Impact Analysis tab. To evaluate the VM(s), supported application(s) or Host that caused the issue, inspect each individual object listed in the Root Cause Objects section under the Details tab for the issue.

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Continue observing this issue, as CPU Ready Time may increase impacting other workload(s).

The last issue listed above is Critical and indicates that the identified VM(s) are experiencing an anomalous increase in CPU Ready Time. VM(s) (and possibly host) symptoms indicate that there is a high risk of application impact. To determine the impact click on the impacted objects under the Impact Analysis tab. To evaluate the VM(s) and/or host that caused the issue, inspect each individual object listed in the Root Cause Objects section under the Details tab for the selected issue.

  • Recommendations:
    • Verify that the increase in workload is not the result of malicious activity.
    • Rebalance VM(s) identified as the Root Cause object(s) across the available hosts, as the host is overprovisioned.

Efficiency Issues

Storage Acceleration Candidates

Read-cache is a valuable feature used to accelerate workload performance. The challenge is to understand the profile of the workload that is most suitable for caching and the required configuration parameters. SIOS iQ provides the ability to identify the VMs that are candidates for acceleration through a caching technology, provides recommendations on the necessary configuration parameters, and forecasts the expected improvement once cache is applied.

Severity | Root Cause | Type | Layer | Symptoms
Information | VM disk usage | Storage Performance Optimization | Storage | Certain disk usage patterns

What does that mean? And what should I do with this information?

This information issue indicates that the given VM is suited for Storage Acceleration. For more details navigate to the Storage Acceleration dashboard by selecting the “View in Dashboard” button.

  • Recommendation:
    • Based on the learned behavior of the application workload, SIOS iQ has identified the Root Cause VM as a candidate for Storage Acceleration to improve application performance and reduce infrastructure workload.

Undersized VM

Severity | Root Cause | Type | Layer | Symptoms
Warning | Undersized VM | Compute Efficiency Optimization | Compute | CPU and/or Memory Utilization above given thresholds

This Warning issue (Undersized VM) indicates that a VM has been determined to be undersized in terms of vCPU, vMemory, or both.

  • Recommendation:
    • Add x vCPU to avoid the observed over utilization of the identified Root Cause object VM – if applicable.
    • Add x GB of vMemory to avoid the over utilization of the identified Root Cause object VM – if applicable.

Oversized VM

This issue indicates that a particular VM has consistently been underutilizing its allocated resources, failing to achieve specified vCPU or vMemory thresholds (or both).

Severity | Root Cause | Type | Layer | Symptoms
Information | Oversized VM | Compute Efficiency Optimization | Compute | CPU and/or Memory Utilization below given thresholds

What does that mean? And what should I do with this information?

This informational issue indicates that the reported VM should be considered for reclamation of Compute resources as suggested in the accompanying recommendation. For more details navigate to the Oversized VMs dashboard by selecting the View in Dashboard button.

  • Recommendation:
    • Based on the behavior of the workload serviced by the VM identified as a Root Cause Object reduce vCPU from x to y in order to improve the efficiency of your environment – if applicable.
    • Based on the behavior of the workload serviced by the VM identified as a Root Cause Object reduce vMemory from x to y GB in order to improve the efficiency of your environment – if applicable.
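The Undersized VM and Oversized VM checks both compare utilization against thresholds. The sketch below shows one way such a comparison could look in Python. It is an assumption-laden illustration, not the SIOS iQ algorithm: the function name, the 20% and 90% default thresholds, and the choice of mean and peak as the statistics are all hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def classify_vm_sizing(
    cpu_util_pct: Sequence[float],
    mem_util_pct: Sequence[float],
    low_pct: float = 20.0,   # hypothetical "oversized" threshold
    high_pct: float = 90.0,  # hypothetical "undersized" threshold
) -> Dict[str, str]:
    """Classify vCPU and vMemory separately from utilization samples."""

    def classify(samples: Sequence[float]) -> str:
        if mean(samples) > high_pct:
            # Utilization is above the high threshold on average.
            return "undersized"
        if max(samples) < low_pct:
            # Utilization never reaches the low threshold.
            return "oversized"
        return "right-sized"

    return {
        "vcpu": classify(cpu_util_pct),
        "vmemory": classify(mem_util_pct),
    }
```

Classifying each resource separately reflects that a VM can be undersized or oversized in vCPU, vMemory, or both, which is why the recommendations for each resource apply only "if applicable".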

VM Sprawl

It is not uncommon for VMs to be provisioned for single purposes and for these VMs to become fragmented, scattered, and forgotten, continuing to consume resources that could be better reallocated.

Severity | Root Cause | Type | Layer | Symptoms
Information | Idle VM(s) | Compute Efficiency Optimization | Compute | CPU, Disk, and Network Utilization below given thresholds

What does that mean? And what should I do with this information?

This informational issue indicates that the reported VM should be considered for powering off or deletion to more efficiently use Compute resources. For more details navigate to the Idle VMs dashboard by selecting the View in Dashboard button.

  • Recommendation:
    • Based on CPU, IO, and Network utilization, SIOS iQ has identified that the VM identified as a Root Cause object should be considered for powering off or deletion.

Snapshot Sprawl

Just as it is simple to take a snapshot of a VM, it is easy to forget that a snapshot was taken, or for a snapshot tool to fail and leave snapshots behind. Either way, leftover snapshots can quickly consume disk space.

Severity | Root Cause | Type | Layer | Symptoms
Information | VM Snapshots | Storage Efficiency Optimization | Storage | VM Snapshots

What does that mean? And what should I do with this information?

This informational issue reports that a VM Snapshot was found for a particular VM. For more details navigate to the VM Snapshots dashboard by selecting the View in Dashboard button.

  • Recommendation:
    • To improve storage efficiency, SIOS iQ recommends merging or deleting the identified snapshots.

Rogue VM Instances

This type of issue indicates that one or more “rogue” virtual machines were found. A rogue virtual machine instance occurs when the virtual disk files for a virtual machine are found on a datastore without a known virtual machine in the virtual environment. This can happen, for example, if a virtual machine is removed but the user doesn’t select the “remove from disk” option, thus leaving the virtual machine’s files behind on the datastore. For more details, including the VM Instance Path to the data files, navigate to the Rogue VM Instances dashboard by selecting the View in Dashboard button.

Severity | Root Cause | Type | Layer
Information | Rogue VM Instances | Storage Efficiency Optimization | Storage

What does that mean? And what should I do with this information?

This informational issue reports that a rogue VM was found that may be wasting space on a datastore.

  • Recommendation:
    • To improve storage efficiency, SIOS iQ recommends deleting the rogue VM instance’s files.
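Conceptually, finding rogue instances is a set difference: virtual disk files present on a datastore minus those belonging to VMs known to the environment. The Python sketch below illustrates that idea under stated assumptions. The function name, the path layout, and the rule "one directory per VM" are hypothetical, and only the `.vmdk` virtual disk extension is checked.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def find_rogue_vm_dirs(
    datastore_files: Iterable[str], registered_vm_dirs: Set[str]
) -> List[str]:
    """Return directories holding virtual disks with no known VM.

    datastore_files: file paths found on the datastore.
    registered_vm_dirs: directories of VMs known to the inventory.
    """
    rogue = set()
    for path in datastore_files:
        p = PurePosixPath(path)
        is_virtual_disk = p.suffix.lower() == ".vmdk"
        if is_virtual_disk and str(p.parent) not in registered_vm_dirs:
            rogue.add(str(p.parent))
    return sorted(rogue)


files = [
    "ds1/web01/web01.vmdk",
    "ds1/old-test/old-test.vmdk",
    "ds1/old-test/old-test-flat.vmdk",
    "ds1/iso/installer.iso",
]
print(find_rogue_vm_dirs(files, {"ds1/web01"}))
```

Here `ds1/old-test` is reported because its disk files remain on the datastore while no registered VM points at that directory, as happens when a VM is removed without deleting its files from disk.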

Reliability Issues

Compute Reliability Issue

Compute Reliability Issues indicate that there is a condition which either degrades the compute power of the environment, or may put it at risk of degrading.

Severity | Root Cause | Type | Layer
 | Admission Control Disabled | Compute Reliability Issue | Compute
 | Host HA Monitoring Disabled | Compute Reliability Issue | Compute
 | VM HA Monitoring Disabled | Compute Reliability Issue | Compute
 | Insufficient Host Resources to Tolerate Failure | Compute Reliability Issue | Compute

Admission Control Disabled

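In a vSphere cluster, Admission Control ensures that an HA-enabled cluster keeps enough resources in reserve to restart the VMs of a failed host on the remaining hosts. It does so by rejecting operations, such as powering on a VM, that would violate the cluster’s availability constraints. With this setting disabled, such operations are allowed, so after a Host failure there may not be enough spare capacity left and the failed Host’s VMs may fail to come online.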

Severity | Root Cause | Type | Layer
 | Admission Control Disabled | Compute Reliability Issue | Compute

What does that mean? And what should I do with this information?

This means that Admission Control is disabled in the cluster indicated as the Root Cause of this issue. Enable Admission Control in your vSphere Cluster settings to help ensure that VMs will be automatically brought back online after a Host failure.

  • Recommendation:
    • The ‘Admission Control’ of the cluster mentioned as the Root Cause object is currently disabled, which will prevent the failover of the VMs in case of a Host failure. Enable ‘Admission Control’ of the cluster in order to sustain the host failure.

Host HA Monitoring Disabled

In a vSphere cluster, the Host Monitoring setting determines if the cluster’s master host should respond to host failures within the cluster. If this is disabled, then the master host will not respond to host failures and automatic failover of that host’s VMs will not occur.

Severity | Root Cause | Type | Layer
 | Host HA Monitoring Disabled | Compute Reliability Issue | Compute

What does that mean? And what should I do with this information?

This means that Host Monitoring is disabled in the cluster indicated as the Root Cause of this issue. Enable Host Monitoring in your vSphere Cluster settings to help ensure that VMs will be automatically brought back online after a Host failure.

  • Recommendation:
    • The ‘Host HA Monitoring’ of the cluster mentioned as the Root Cause object is currently disabled which will prevent the failover of the VMs in case of a Host failure. Enable ‘Host HA Monitoring’ of the cluster in order to sustain the host failure.

VM HA Monitoring Disabled

In a vSphere cluster, the VM Monitoring setting determines whether individual VMs are monitored through keep-alive heartbeats. Should a VM not send sufficient heartbeats, that VM will be restarted to try to restore its functionality.

Severity | Root Cause | Type | Layer
 | VM HA Monitoring Disabled | Compute Reliability Issue | Compute

What does that mean? And what should I do with this information?

This means that VM Monitoring is disabled in the cluster indicated as the Root Cause of this issue. Enable VM Monitoring in your vSphere Cluster settings to help ensure that unresponsive VMs will be restored to normal operation.

  • Recommendation:
    • The ‘VM HA Monitoring’ of the cluster mentioned as the Root Cause object is currently disabled which will prevent the failover of the VMs in case of VM level failures. Enable ‘VM HA Monitoring’ of the cluster in order to sustain the VM level failures.

Insufficient Host Resources to Tolerate Failure

In a vSphere cluster, a Host Failure Tolerance can be specified. This setting indicates how many hosts the user would like to be able to lose from the cluster and still have sufficient resources to continue running the VMs in the cluster.

Severity | Root Cause | Type | Layer
 | Insufficient Host Resources to Tolerate Failure | Compute Reliability Issue | Compute

What does that mean? And what should I do with this information?

This means that SIOS iQ has determined that the cluster indicated as the Root Cause of this issue does not have enough resources to tolerate the desired number of host failures, as indicated by the cluster’s Host Failure Tolerance setting. The issue recommendation indicates how many hosts should be added to the cluster to allow the indicated number of host failures.

  • Recommendation:
    • The Cluster mentioned as the Root Cause object is not able to sustain the specified number of host failures. Please add ‘#’ hosts to satisfy the specified requirement.

In addition, other recommendations may be attached based on other issues occurring in your system. Any of the following may be seen:

  • Recommendation
    • Power off, delete, or remove Num identified Idle VMs in the cluster to increase the resource available in the case of a host failure.
    • Resize Num identified Oversized VMs to increase the resources available in the case of a host failure.
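As a rough illustration of where a "hosts to add" figure could come from, the Python sketch below works through the arithmetic for a deliberately simplified cluster. It is not the SIOS iQ calculation: it assumes identical hosts, a single resource dimension (memory in GB), and hypothetical parameter names.

```python
import math


def hosts_to_add(
    host_count: int,
    host_capacity_gb: float,
    vm_demand_gb: float,
    failures_to_tolerate: int,
) -> int:
    """Hosts to add so the cluster survives the given number of failures.

    Assumes all hosts are identical. After losing failures_to_tolerate
    hosts, the survivors must still cover the total VM demand:
        (host_count + added - failures_to_tolerate) * capacity >= demand
    """
    hosts_required_for_demand = math.ceil(vm_demand_gb / host_capacity_gb)
    shortfall = hosts_required_for_demand + failures_to_tolerate - host_count
    return max(0, shortfall)


# 4 hosts of 256 GB running 900 GB of VMs, tolerating 1 host failure:
# 3 surviving hosts give 768 GB, which is short, so 1 more host is needed.
print(hosts_to_add(4, 256, 900, 1))  # 1
```

The same arithmetic shows why the additional recommendations help: powering off idle VMs or resizing oversized VMs lowers `vm_demand_gb`, which can bring the result down to zero without adding hardware.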

Capacity Issues

Storage Capacity Issue

Advance warning is often desirable, or even necessary, when the storage capacity of one or more Datastores is approaching a specified limit. Lack of oversight can lead to unexpected application reliability issues in the infrastructure, and rapid growth of capacity usage can be indicative of other issues, including those involving performance and security.

The Capacity Forecasting feature of SIOS iQ provides this oversight via trend analysis of Datastore capacity usage over time. Recent trends in capacity usage are used to predict the future date on which a given Datastore will reach a specified capacity usage percentage (relative to its total capacity). When the predicted time frame falls within one of the thresholds specified in the Capacity Forecasting Policy, the resulting Storage Capacity issue provides a unique Symptom Graph that presents the capacity usage data along with the estimated trend (represented in the graph as a dark dashed line).

Severity | Root Cause | Type | Layer
Critical | Capacity nearing Critical threshold | Storage Capacity Issue | Storage
Warning | Capacity nearing Warning threshold | Storage Capacity Issue | Storage
Information | Capacity nearing Information threshold | Storage Capacity Issue | Storage

What does that mean? And what should I do with this information?

This means that the SIOS iQ Capacity Forecasting framework has predicted that the indicated Datastore will reach the capacity usage percentage specified in the Capacity Forecasting Policy within one of the defined Critical, Warning, or Informational time thresholds. The issue recommendation indicates the amount of time available before the Datastore reaches that capacity usage.

  • Recommendation:
    • The Datastore mentioned as a root cause object has n days until it will reach percent % of capacity. Please add additional storage to avoid storage reliability issues.

In addition, other recommendations may be attached based on other issues occurring in your system. Any of the following may be seen:

  • Recommendation
    • Delete Num identified idle VMs in the cluster to recover datastore capacity.
    • Recover a total of Num GB of storage by merging or deleting identified Snapshots from this datastore.
    • Recover a total of Num GB of storage by deleting identified Rogue VM instances from this datastore.
    • Move Num candidate workloads to Flash technology identified in the Storage Acceleration Dashboard to improve performance of the datastore and to recover storage capacity.
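The idea behind capacity forecasting can be sketched with a straight-line trend. The Python example below fits a least-squares line to recent usage samples, estimates the days remaining until a usage percentage is reached, and maps that to a severity. It is a simplified sketch only: the actual SIOS iQ trend model is not described here, and the function names and the 7/30/90-day policy values are assumptions.

```python
from typing import Dict, Optional, Sequence


def days_until_threshold(
    days: Sequence[float], used_pct: Sequence[float], threshold_pct: float
) -> Optional[float]:
    """Days from the last sample until usage reaches threshold_pct.

    Fits used_pct = slope * day + intercept by ordinary least squares.
    Returns None when the trend is flat or decreasing.
    """
    n = len(days)
    if n < 2 or n != len(used_pct):
        raise ValueError("need at least two matching samples")
    if used_pct[-1] >= threshold_pct:
        return 0.0  # threshold already reached
    mean_x = sum(days) / n
    mean_y = sum(used_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_pct))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_pct - intercept) / slope
    return max(crossing_day - days[-1], 0.0)


def severity_for(days_left: Optional[float], policy: Dict[str, float]) -> Optional[str]:
    """Map days left to the first matching severity in a hypothetical policy."""
    if days_left is None:
        return None
    for severity in ("Critical", "Warning", "Information"):
        if days_left <= policy[severity]:
            return severity
    return None


# Usage grows 2% per day from 50%; 90% is reached on day 20, 16 days after day 4.
left = days_until_threshold([0, 1, 2, 3, 4], [50, 52, 54, 56, 58], 90)
print(left, severity_for(left, {"Critical": 7, "Warning": 30, "Information": 90}))
```

With the hypothetical policy above, 16 days remaining falls outside the 7-day Critical window but inside the 30-day Warning window, so the issue would be raised as a Warning.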
