Self-Health Metrics

Assure1 pollers, collectors, and threshold engines record self-health metrics about the performance of each individual application. Administrators can use these metrics to monitor overall performance and as an early indicator of potential issues, such as overloaded poll cycles, unbalanced polling clusters, database contention, and latency. The metrics allow administrators to make informed decisions about the overall Assure1 installation and to maintain application efficiency and performance, for example by adding pollers to a cluster to spread the polling load.

Some of the self-health metrics include:

Average Db Time - The average time taken to insert data into the database per poll cycle.
Average Poll Time - The average amount of time taken to poll a single device for data per poll cycle.
Database Queue Length - The number of metrics in the DB queue after the poll cycle ends.
Poll Duration - The time taken to complete each poll cycle.
Poll Queue Length - The number of devices in the queue before the next poll period starts.
Polled Devices - The number of devices polled by the application. For threshold engines, this is the number of metrics processed.
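To make these definitions concrete, the following is a minimal sketch that derives one cycle's self-health values from raw timings. It is not Assure1 code: the `CycleSample` and `SelfHealth` types and the `summarize` function are hypothetical, and it assumes that Average Db Time averages over individual database inserts within the cycle.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CycleSample:
    """Hypothetical raw measurements from one poll cycle (not an Assure1 API)."""
    device_poll_times: list[float]  # seconds spent polling each device
    db_insert_times: list[float]    # seconds spent on each database insert (assumed unit of averaging)
    cycle_seconds: float            # wall-clock time from cycle start to cycle end
    db_queue_after: int             # metrics still waiting in the DB queue when the cycle ends
    poll_queue_before_next: int     # devices still queued when the next poll period starts


@dataclass
class SelfHealth:
    """The six self-health values described above, for one poll cycle."""
    average_db_time: float
    average_poll_time: float
    database_queue_length: int
    poll_duration: float
    poll_queue_length: int
    polled_devices: int


def summarize(sample: CycleSample) -> SelfHealth:
    """Derive one cycle's self-health values; empty lists average to 0.0."""
    return SelfHealth(
        average_db_time=mean(sample.db_insert_times) if sample.db_insert_times else 0.0,
        average_poll_time=mean(sample.device_poll_times) if sample.device_poll_times else 0.0,
        database_queue_length=sample.db_queue_after,
        poll_duration=sample.cycle_seconds,
        poll_queue_length=sample.poll_queue_before_next,
        # For a threshold engine this count would be metrics processed rather than devices.
        polled_devices=len(sample.device_poll_times),
    )
```

For example, `summarize(CycleSample([0.5, 1.5, 1.0], [0.25, 0.75], 30.0, 12, 0))` reports an Average Poll Time of 1.0 second across 3 polled devices.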

Viewing Self-Health Metrics

The self-health metrics are viewable from the metric tabs in the collector's Device portal. The Assure1 tab displays the performance metrics collected for each Assure1 application running on the server.

Like the other metric tabs, it lets you drill down to an individual metric graph by clicking a row.

Troubleshooting

Average Db Time - If this value is consistently extremely high, there may be high network latency, other networking issues, or high database contention for inserts.
Average Poll Time - If this value is extremely high, there may be latency or other network issues when communicating with the polled devices.
Database Queue Length - This number may vary, but it should stay relatively steady or increase slightly over time as additional devices or metrics are polled. Large spikes or a constant increase may point to a database connectivity issue or to too few database threads to handle the number of metrics being inserted.
Poll Duration - Generally, if this value is greater than half of the configured Poll Time, the poller is overloaded. Give it more threads, split it into a cluster, or add pollers to the cluster.
Poll Queue Length - This value should be 0 or stay consistent. If it increases over time, the application may be falling behind (starting a new poll before the previous one has finished); check the Poll Duration metric above. Pollers that fall behind and are left unchecked can cause collection issues and delay metric insertion.
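The rules of thumb above can be expressed as a simple check over recent poll cycles. This is a minimal sketch, not an Assure1 feature: the `CycleMetrics` container and the `diagnose`, `is_constantly_increasing`, and `has_spike` functions are hypothetical, as are the choice of three cycles as the minimum for a "constant increase" and the 3x factor for a "large spike". Only the half-of-Poll-Time threshold comes from the guidance above.

```python
from dataclasses import dataclass


@dataclass
class CycleMetrics:
    """One poll cycle's self-health values (hypothetical container, not an Assure1 API)."""
    poll_duration: float         # seconds
    poll_queue_length: int
    database_queue_length: int


def is_constantly_increasing(values: list[int], min_points: int = 3) -> bool:
    """True when at least `min_points` values each exceed the one before (assumed proxy for 'constant increase')."""
    return len(values) >= min_points and all(b > a for a, b in zip(values, values[1:]))


def has_spike(values: list[int], factor: float = 3.0) -> bool:
    """True when the newest value exceeds `factor` times the mean of earlier values (assumed threshold)."""
    if len(values) < 2:
        return False
    baseline = sum(values[:-1]) / (len(values) - 1)
    # Treat a zero baseline as 1 so a jump from an empty queue still counts.
    return values[-1] > factor * max(baseline, 1.0)


def diagnose(history: list[CycleMetrics], poll_time: float) -> list[str]:
    """Apply the troubleshooting rules of thumb to recent cycles, ordered oldest first."""
    findings: list[str] = []
    if not history:
        return findings
    # Poll Duration above half the configured Poll Time indicates an overloaded poller.
    if history[-1].poll_duration > poll_time / 2:
        findings.append("overloaded poller: add threads, split into a cluster, or add pollers")
    # A Poll Queue Length that keeps growing suggests the application is falling behind.
    if is_constantly_increasing([m.poll_queue_length for m in history]):
        findings.append("poll queue growing: check Poll Duration")
    # Spikes or a constant increase in the DB queue may mean connectivity problems or too few DB threads.
    db_queue = [m.database_queue_length for m in history]
    if is_constantly_increasing(db_queue) or has_spike(db_queue):
        findings.append("database queue growing or spiking: check DB connectivity and DB thread count")
    return findings
```

For example, with a 60-second Poll Time, a latest Poll Duration of 40 seconds exceeds the 30-second threshold and is reported as an overloaded poller. Average Db Time and Average Poll Time are left out because the guidance gives no numeric threshold for "extremely high"; judging those would need a baseline specific to each installation.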