Performance Monitoring - Metrics¶
Assure1 Performance Monitoring provides a complete set of tools capable of gathering any metric, from any device, using any technology, at the granularity required for near real-time data collection. The gathered data is stored using the "Big Data" model and enhanced with proactive analysis, monitoring, and reporting, which provide forewarning of anomalies before they become outages. Integration with ticketing systems allows for rapid turnaround, high visibility, and better tracking of troubleshooting issues. Through the included Knowledgebase system, detailed historical and current information and troubleshooting documents are available in one place, so repeat issues can be quickly found and resolved. Coupled with ad hoc reports, scheduled reporting, and dashboards, this provides a powerful tool for reducing overhead costs and minimizing downtime.
The following sections cover configuring and using performance monitoring in Assure1.
Objectives¶
- Gather Key Performance Indicator (KPI) metrics for devices
  - Set up Ping Polling for devices
  - Configure SNMP CDM Polling for devices (CPU, Disk, Memory)
  - Poll bandwidth information from devices using the Network Interface Poller
- Configure metric thresholds
- Set up a polling policy
- Set up a consolidation policy
- Metric reporting
Example¶
In this example, ping and SNMP polling are set up for devices, thresholds are added and configured, and a polling policy is created.
Metric Collection¶
Configuring Ping Polling¶
The following section covers Ping Polling of devices for Latency and Packet Loss metrics, using Assure1's Default Ping Poller Template.
-
Navigate to Configuration -> Metrics -> Metric Types. From this UI, you can add, edit and remove metric types, which are used in Assure1 to define how a metric is visualized/displayed.
-
In this example, the Latency and Packet Loss metric types are enabled for TopN viewing (i.e. TopN Scope set to Value), so that these metric types become available in the TopN Overview.
Note
This is an optional step. Setting the TopN Scope to "Disabled" does not affect the gathering of metrics; it only affects metric availability and visibility in the TopN Overview.
- To enable these metrics in the TopN Overview, click on them to open for editing, change the "TopN Scope" value to "Value", and click "Submit" to save the changes.
-
Navigate to Configuration -> Metrics -> Poller Templates. Note that the "Default Ping" template includes the Latency, Packet Loss, Ping Jitter and Ping Jitter Utilization metric types.
-
From this UI, you can create, edit and delete Poller Templates.
-
These templates consist of groups of Metric Types, and specify what metrics will be created and stored for the devices and instances that are polled using the template.
-
-
Navigate to Configuration -> Metrics -> Polling Assignments. From this UI, you configure polling on a device instance for metric data.
-
In the example below, ping polling is set up for a selected number of devices.
-
For Method, select "NA".
-
For Poller Template, select "Default Ping".
-
For Threshold Group, select "Default Ping".
- Thresholds will be covered in detail in a later section.
-
For Poll Time, enter 300.
- Poll Time is how often the poller polls for these metrics. It is measured in seconds, so entering 300 means that these metrics are polled every 5 minutes.
-
-
In the Devices section, select the devices you want to poll metrics from and use the "Add" or "Add All" buttons to select them for polling.
- If the list of devices is very large, you can use the filter button to filter specific devices by Device Name, Device ID, Device Group and/or Device Zone.
-
Using the filter button, filter for the "Device" instances and use the "Add All" button to select the instances.
-
Click "Submit" to create the metrics for polling.
-
Navigate to Configuration -> Broker Control -> Services.
-
Click on the "Metric Ping Latency Poller" service to open the "Service (Edit)" form to the right of the UI.
-
In the edit form, change the Status value to "Enabled". Click the "Submit" button to save the changes. The Ping Latency Poller service is now set to enabled.
-
Select the service again, then click on the Start button to start the ping poller immediately. The Ping Latency Poller will then poll devices for Latency, Packet Loss, Ping Jitter and Ping Jitter Utilization metrics every 5 minutes (assuming the default Assure1 application configuration is used). These metrics will be viewable from a number of UI pages, such as the "Device Overview" dashboard and the "All Metrics Overview".
Note
If not manually started, the application will be automatically started within the next minute by the broker.
-
Navigate to Devices in the navigational bar, and then click on the "Metrics" icon for a device to view the data. This interface will display a list of all metrics that are being polled from that device. You can click on the filter icon in the top right of the UI to open the filter bar, in order to filter the list of metrics. Clicking on a metric from the list will open a performance graph for that metric.
Configuring SNMP Polling¶
This section covers the configuration of SNMP-based polling of devices. The first example uses Assure1's "Default CDM" (CPU, Disk, Memory) poller template. The second demonstrates how to configure custom rules-based SNMP polling of devices.
Assure1 Default CDM (CPU, Disk, Memory)
Note
The "Polling Assignments" UI (Configuration -> Metrics -> Polling Assignments) is not needed for SNMP Polling. The Generic SNMP Poller is 100% rules based, and will not take any of the configured items from the "Polling Assignments" UI into consideration while running. Configuring the SNMP Poller through "Polling Assignments" may generate incorrect metrics. The only time SNMP-based metrics should be used in the "Polling Assignments" UI is specifically for applying non-rules based thresholds to already defined metrics (thresholds will be covered in a subsequent section).
-
Navigate to Configuration -> Metrics -> Metric Types.
-
As described in the "Configuring Ping Polling" section, in this example, "Memory Used", "CPU Utilization" and "Disk Used" can be enabled for TopN Overview (TopN Scope: Utilization).
Note
This is an optional step and is purely for the purpose of showing these metrics in the TopN Overview. This will have no effect on the polling of these metrics.
-
Navigate to Configuration -> Broker Control -> Services.
-
Click on the "Metric Generic SNMP Poller" service to open the "Service (Edit)" form to the right of the UI.
-
In the edit form, change the Status value to "Enabled". Click the "Submit" button to save the changes. The Generic SNMP Poller service is now set to enabled.
-
Select the service again, then click on the Start button to start the SNMP poller immediately. The Generic SNMP Poller will then poll devices for various metrics every 5 minutes (assuming the default Assure1 application configuration is used), depending on the rules that are available for that device. These metrics will be viewable from a number of UI pages, such as the "Device Overview" dashboard and the "All Metrics Overview".
Note
If not manually started, the application will be automatically started within the next minute by the broker.
-
Navigate to Devices in the navigational bar, and then click on the "Metrics" icon for a device to view the data. This interface will display a list of all metrics that are being polled from that device. You can click on the filter icon in the top right of the UI to open the filter bar, in order to filter the list of metrics. Clicking on a metric from the list will open a performance graph for that metric.
Custom Metrics - UPS Metrics
This second example demonstrates how to write your own rules files for polling custom SNMP metrics. In this example, rules files are written to poll UPS data from a device. The metrics polled include:
-
Battery temperature
-
Battery runtime
-
Battery capacity
-
Input and output voltage
-
Output load (%)
Note
The custom UPS rules shown here are provided only as an example of how to write your own custom metrics rules files. A default installation includes a set of rules files for polling a variety of devices from many vendors. The following documentation has information regarding supported devices and other useful information:
Please contact Federos if there are devices that are not polled by the out-of-the-box Foundation rules.
-
Navigate to Configuration -> Metrics -> Metric Types.
-
Add new metric types to Assure1, giving them the values shown in the table below.
Note
In this example, the TopN Scope is set to "Disabled". To make these metrics available in the TopN Overview, set TopN Scope to "Utilization" or "Value", as appropriate for the metric type.
| Name | Metric Group | Format | Unit Name | Abbreviation Name | Value Type | Unit Division | Direction | TopN Type | TopN Scope |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UPS Battery Capacity | None | Float | Capacity | % | Utilization | SI (1000) | Descending (Normal) | Both | Disabled |
| UPS Battery Runtime | None | Integer | Seconds | s | Raw | Time | Descending (Normal) | Both | Disabled |
| UPS Battery Temperature | None | Float | Celsius | C | Raw | SI (1000) | Descending (Normal) | Both | Disabled |
| UPS Input Voltage | None | Float | Volts | V | Raw | None | Descending (Normal) | Both | Disabled |
| UPS Output Voltage | None | Float | Volts | V | Raw | None | Descending (Normal) | Both | Disabled |
| UPS Output Load % | None | Float | Percentage | % | Raw | SI (1000) | Descending (Normal) | Both | Disabled |
-
Navigate to Configuration -> Rules.
-
The UI contains a list of rules directories and subdirectories.
-
Click on the "right arrow" symbol to the immediate left of a folder icon to expand that directory. Clicking on the "down arrow" symbol will collapse the directory.
-
-
Click to expand "Core Rules (core) -> Default read-write branch (default) -> collection -> metric -> snmp".
-
Click to select the "snmp" folder, then click the "Add" button, and click "Add File" to add a new rules file (the form will appear to the right of the UI).
-
Enter an appropriate name for the rules in the File Name field (e.g. ups-snmp.rules).
-
The rules logic (Perl syntax) is entered in the text area underneath the File Name field. The following is example code for the UPS rules file.
    my $DeviceID     = $DeviceHash->{DeviceID};
    my $DeviceInfo   = $DeviceHash->{DeviceID} . ':' . $DeviceHash->{DNS} . ':' . $DeviceHash->{IP};
    my $PollInterval = $PollerConfig->{'PollTime'};
    my $PolledTime   = $DeviceHash->{PollTime};

    $Log->Message("INFO","ups-snmp.rules -> [$DeviceInfo] -> Entering ups-snmp.rules");
Here you specify the OIDs you wish to poll for data. The OIDs used in this example were taken from the "PowerNet-MIB" MIB file.

    # OIDs to be polled
    my %OIDs = (
        'upsAdvBatteryCapacity'         => '1.3.6.1.4.1.318.1.1.1.2.2.1.0',  # Battery Capacity (%)
        'upsAdvBatteryRunTimeRemaining' => '1.3.6.1.4.1.318.1.1.1.2.2.3.0',  # Battery run time remaining
        'upsAdvBatteryTemperature'      => '1.3.6.1.4.1.318.1.1.1.2.2.2.0',  # Temperature in Celsius
        'upsAdvInputLineVoltage'        => '1.3.6.1.4.1.318.1.1.1.3.2.1.0',  # Input Voltage
        'upsAdvOutputVoltage'           => '1.3.6.1.4.1.318.1.1.1.4.2.1.0',  # Output Voltage
        'upsAdvOutputLoad'              => '1.3.6.1.4.1.318.1.1.1.4.2.3.0'   # Output Load (%)
    );

    # Reverse lookup: OID => metric name
    my %metricNames = reverse %OIDs;
Next, match the OIDs to be polled with their corresponding MetricTypeIDs in Assure1 (created in steps 1 and 2). NOTE: The MetricTypeIDs shown in this rules file example will probably differ from the IDs of the MetricTypes that you create.

    # Matching MetricType IDs in Assure1 with OIDs to poll
    my %MetricTypeIDs = (
        'upsAdvBatteryCapacity'         => '1015',
        'upsAdvBatteryRunTimeRemaining' => '1016',
        'upsAdvBatteryTemperature'      => '1017',
        'upsAdvInputLineVoltage'        => '1018',
        'upsAdvOutputVoltage'           => '1019',
        'upsAdvOutputLoad'              => '1020',
    );

    # This tells the SNMP client not to translate TimeTicks into a friendly time string.
    # Dividing $result by 100 will then give the time in seconds.
    $Session->translate([ -timeticks => 0 ]);
Next, grab the available metrics from the device for polling. This is done via $Session->get_request:
    # Grab available metrics from device for polling
    my $DeviceData = $Session->get_request(
        -varbindlist => [
            $OIDs{'upsAdvBatteryCapacity'},
            $OIDs{'upsAdvBatteryRunTimeRemaining'},
            $OIDs{'upsAdvBatteryTemperature'},
            $OIDs{'upsAdvInputLineVoltage'},
            $OIDs{'upsAdvOutputVoltage'},
            $OIDs{'upsAdvOutputLoad'},
        ]
    );
Finally, iterate through the polled metrics and update their values in Assure1:
    # Iterate through polled metrics and update each one in Assure1
    foreach my $thisOID (keys(%{$DeviceData})) {
        my $result       = $DeviceData->{$thisOID};
        my $metricName   = $metricNames{$thisOID};
        my $MetricTypeID = $MetricTypeIDs{$metricName};
        $Log->Message("DEBUG", "UPS rules -> [$metricName], oid: [$thisOID] value: [$result] type: [$MetricTypeID]");

        $InstanceID = 0;
        #$Log->Message('DEBUG', "UPS rules -> Searching for InstanceID for [$InstanceName] on DeviceID[$DeviceID]");
        #($InstanceID, $Error) = FindInstanceID($RulesDBH, $MetricHash, $Log, $DeviceID, $InstanceName, 1); # Not necessary, as InstanceID is already specified (0)
        #$Log->Message('DEBUG', "UPS rules -> Found InstanceID: $InstanceID for [$InstanceName]");

        my ($MetricID, $Error) = FindMetricID($RulesDBH, $MetricHash, $Log, $DeviceID, $InstanceID, $MetricTypeID, $Factor, $max, $PollInterval);
        $Log->Message('DEBUG', "UPS rules -> created/updated metric [$MetricID] for [$InstanceID]");

        # Convert the Battery Runtime metric from TimeTicks (hundredths of a second) to seconds
        # (with translation enabled, the runtime value would look like this example: [1 hour, 20:00.00])
        if ($thisOID eq '1.3.6.1.4.1.318.1.1.1.2.2.3.0') {
            my $convertedTime = $result / 100;
            $Log->Message("DEBUG", "UPS rules -> [$DeviceID]DataQueue params: metricid[$MetricID], value[$convertedTime], status [$Status], polltime[$PolledTime]");
            $DataQueue->enqueue($MetricID . ':' . $convertedTime . ':' . $Status . ':' . $PolledTime);
            $Log->Message("DEBUG", "UPS rules -> Finished with oid [$metricName]");
        }
        else {
            $Log->Message("DEBUG", "UPS rules -> [$DeviceID]DataQueue params: metricid[$MetricID], value[$result], status [$Status], polltime[$PolledTime]");
            $DataQueue->enqueue($MetricID . ':' . $result . ':' . $Status . ':' . $PolledTime);
            $Log->Message("DEBUG", "UPS rules -> Finished with oid [$metricName]");
        }
    }

    $Log->Message("INFO", "Exiting ups-snmp.rules");
It is good practice to include log messages in your rules file, to aid in the debugging process, should anything not work as intended.
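The reverse lookup and the TimeTicks conversion used in the rules file can be tried outside Assure1. The following is a minimal standalone sketch, not part of the product: the `$DeviceData` hash is a hypothetical stand-in for the OID-to-value hash reference returned by `get_request`, the sample values are invented, and `timeticks_to_seconds` is a helper name introduced here for illustration.

```perl
use strict;
use warnings;

# Two of the OIDs from the rules file, keyed by metric name.
my %OIDs = (
    'upsAdvBatteryCapacity'         => '1.3.6.1.4.1.318.1.1.1.2.2.1.0',
    'upsAdvBatteryRunTimeRemaining' => '1.3.6.1.4.1.318.1.1.1.2.2.3.0',
);

# Inverting the hash gives an OID => metric name lookup. This is only
# safe because every OID (value) in %OIDs is unique.
my %metricNames = reverse %OIDs;

# Hypothetical stand-in for the polled data: OID => raw value.
# The runtime is in TimeTicks, i.e. hundredths of a second.
my $DeviceData = {
    '1.3.6.1.4.1.318.1.1.1.2.2.1.0' => 100,
    '1.3.6.1.4.1.318.1.1.1.2.2.3.0' => 480000,
};

# Convert raw TimeTicks to whole seconds.
sub timeticks_to_seconds {
    my ($ticks) = @_;
    return int($ticks / 100);
}

my %values;
foreach my $oid (sort keys %{$DeviceData}) {
    my $name  = $metricNames{$oid};
    my $value = $DeviceData->{$oid};
    $value = timeticks_to_seconds($value)
        if $name eq 'upsAdvBatteryRunTimeRemaining';
    $values{$name} = $value;
    print "$name = $value\n";
}
```

Here 480000 TimeTicks become 4800 seconds, which corresponds to the "1 hour, 20:00.00" form that the SNMP client would otherwise produce when translation is left enabled.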
You will also need to update the 'base.rules' file and 'base.includes' file, to include the new rules file you just created:
base.rules
    elsif($SysObjectID =~ '1.3.6.1.4.1.318') {
        # UPS
        $Log->Message("WARN","Base Rules -> [$DeviceInfo] -> Polling using ups-snmp rules");
        UPSsnmpRules();
    }
base.includes
UPSsnmpRules,metricStdPoller/snmp/ups-snmp.rules
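In the base.rules condition, the string is used as a regular expression, so each dot matches any character and the pattern can match anywhere in the value. If a stricter test is wanted, one option is an anchored, escaped prefix match. This is a standalone sketch under assumptions: `in_enterprise_subtree` is a hypothetical helper name, and whether `$SysObjectID` arrives with a leading dot is not stated in this document, so the pattern accepts both forms.

```perl
use strict;
use warnings;

# Returns 1 when $sys_object_id lies inside the subtree rooted at $prefix.
# \Q...\E escapes the dots, ^\.? tolerates an optional leading dot, and
# (?:\.|$) stops "318" from also matching "3180".
sub in_enterprise_subtree {
    my ($sys_object_id, $prefix) = @_;
    return $sys_object_id =~ /^\.?\Q$prefix\E(?:\.|$)/ ? 1 : 0;
}

my $ups_prefix = '1.3.6.1.4.1.318';    # prefix used in base.rules

# The sample sysObjectID values below are invented for illustration.
print in_enterprise_subtree('1.3.6.1.4.1.318.1.3.2.7',  $ups_prefix), "\n";  # 1
print in_enterprise_subtree('.1.3.6.1.4.1.318.1.3.2.7', $ups_prefix), "\n";  # 1
print in_enterprise_subtree('1.3.6.1.4.1.3180.1',       $ups_prefix), "\n";  # 0
print in_enterprise_subtree('1.3.6.1.4.1.9.1.318',      $ups_prefix), "\n";  # 0
```

The unanchored form in base.rules would also accept the third value, because "3180" contains "318"; the anchored form rejects it.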
-
Navigate to Configuration -> Broker Control -> Services.
-
Click to select the "Metric Generic SNMP Poller" service and ensure that its application configuration has "LogLevel" set to DEBUG.
-
Click the restart button to restart the service.
- When the service is restarted, the new UPS rules file and new log level will be taken into account. The poller will now poll for UPS metrics using the UPS rules file.
-
Navigate to Logs.
-
Use the filter bar to enter the following, replacing the number in parentheses with the value for the poller from the "Services" UI. This will filter the log file using the keyword "ups":

    event.dataset:"GenericSNMPPollerd(35)" and message : "ups"
-
Navigate to Devices in the navigational bar, and then click on the "Metrics" icon for a UPS device to view the data. This interface will display a list of all metrics that are being polled from that device. You can click on the filter icon in the top right of the UI to open the filter bar, in order to filter the list of metrics. Clicking on a metric from the list will open a performance graph for that metric.
Thresholds¶
This section covers the configuration of Thresholds, based on the UPS Metrics from the previous section.
Thresholds are used to detect, and give early warning of, problems in the metric data being collected. The Threshold Engine analyzes the threshold definitions (defined in the Thresholds UI), checks the status in the metric database, and creates a notification or fault if a defined limit is breached. Several notification platforms are available for threshold alerting. For example, an alarm can be sent to the Event Engine, an email can be sent to an administrator, or a Syslog message can be generated.
-
Navigate to Configuration -> Metrics -> Thresholds -> Thresholds.
-
From this Thresholds UI you can define thresholds for metrics that will trigger an event or notification if the threshold value is breached.
-
For this example, thresholds will be defined for the UPS metrics set up previously.
-
-
Click on the "Add" button.
-
Fill in the form (to the right of the UI) to create a threshold that triggers a warning when the battery temperature reaches 50 degrees Celsius or higher, and a critical violation at 70 degrees Celsius or higher.
- Name => UPS High Battery Temp
- Type => Standard
- Measurement => UPS Battery Temperature
- Metric Field => value
- Time Range => 15m
- Warning => (Checked)
  - Warning Operator => >=
  - Warning Value => 50
  - Warning Severity => Major
- Critical => (Checked)
  - Critical Operator => >=
  - Critical Value => 70
  - Critical Severity => Critical
- Message => Performance threshold violation: UPS High Battery Temp
- Check Location => Threshold Engine
- Status => Enabled
-
-
Add thresholds for the rest of the UPS metrics using the Threshold UI. The following are some examples that could be configured:
-
UPS Battery Runtime
-
UPS Output Load %
-
UPS Output Voltage Surge
-
UPS Input Voltage Surge
-
-
Navigate to Configuration -> Metrics -> Thresholds -> Threshold Groups.
- From the Threshold Groups UI, you can group individual thresholds together to form a threshold group for polling assignments.
-
Click the "Add" button to add a new threshold group. Call the group "Default UPS", and add the UPS thresholds to the group.
-
Click the "Submit" button to save the group.
-
Now that the threshold group for the UPS metrics has been set up, the Polling Assignments UI can be used to add the new thresholds to metrics. Navigate to Configuration -> Metrics -> Polling Assignments.
-
Add the UPS devices for polling, using the "Default UPS" threshold group.
-
Method => SNMP
-
Poller Template => Default UPS
-
Threshold Group => Default UPS
-
Poll Time => 300
-
Devices => Select any UPS devices on your system, but limited to the "Device" instance for each device
-
-
Click "Submit" to add the thresholds.
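The warning and critical levels of the "UPS High Battery Temp" threshold can be pictured with a small standalone sketch. It is not the Threshold Engine's implementation: the `%threshold` structure and the `evaluate_threshold` helper are hypothetical, and the sketch checks a single value, whereas the real threshold is evaluated over the configured Time Range (15m).

```perl
use strict;
use warnings;

# Hypothetical representation of the "UPS High Battery Temp" threshold.
my %threshold = (
    name     => 'UPS High Battery Temp',
    warning  => { operator => '>=', value => 50, severity => 'Major' },
    critical => { operator => '>=', value => 70, severity => 'Critical' },
);

# Comparison operators, dispatched by their symbol.
my %compare = (
    '>=' => sub { $_[0] >= $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
);

# Returns the severity breached by $value, or undef when no level is
# breached. Critical is checked first so the most severe level wins.
sub evaluate_threshold {
    my ($threshold, $value) = @_;
    foreach my $level (qw(critical warning)) {
        my $limit = $threshold->{$level} or next;
        my $op    = $compare{ $limit->{operator} } or next;
        return $limit->{severity} if $op->($value, $limit->{value});
    }
    return;
}

foreach my $temp (42, 50, 69.9, 70) {
    my $severity = evaluate_threshold(\%threshold, $temp) // 'OK';
    print "$temp C -> $severity\n";
}
```

Because both operators are ">=", a reading of exactly 50 already raises the Major warning, and a reading of exactly 70 is reported as Critical rather than Major.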
Polling Policies¶
The "Poller Discovery" scheduled job uses Polling Policy settings to search for devices to process, creates the types of Metrics to poll the devices for based on the selected Poller Template, and then assigns Thresholds based on the selected Threshold Group. Essentially, this is a simple, automated, and dynamic way to create and maintain Metrics and Threshold settings for certain devices and instances, rather than manually creating them using the "Polling Assignments" interface.
The following quick example is a Network Interface polling policy for Routers:
-
Navigate to Configuration -> Metrics -> Polling Policies.
-
Click the "Add" button to add a new polling policy.
-
Enter the following in these form fields (the other fields can be left as is):
- Name => Router Network Interface
- Description => Network Interface Metric Polling Policy
- Policy Status => Enabled
- Match
  - IP Range => (optional)
    Note
    Here you can specify a range of IP addresses to search. If left blank, the scheduled job will search every device. For this example, an IP range of 192.168.10.* is used.
  - Device Category => Router
    Note
    The scheduled job will only use this polling policy on Router devices.
  - Instance
    Note
    The scheduled job will only process instances that match the provided name(s).
    - Match => LIKE
    - Name => eth
- Assign
  - Method => SNMP
  - Poller Template => Default Network Interface
  - Threshold Group => Default Network Interface
-
-
-
Click "Submit" to save the new policy
-
Navigate to Configuration -> Broker Control -> Jobs.
-
Ensure the "Metric Poller Discovery" scheduled job is set to Enabled.
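To make the Match settings of the example policy concrete, the following standalone sketch shows one way such a selection could work. It does not describe how the Poller Discovery job is implemented: the `%policy` structure and both helper functions are hypothetical, the `*` wildcard is assumed to stand for one octet, and LIKE is assumed to behave as a case-insensitive substring match.

```perl
use strict;
use warnings;

# Hypothetical representation of the "Router Network Interface" policy.
my %policy = (
    ip_range        => '192.168.10.*',
    device_category => 'Router',
    instance_like   => 'eth',
);

# Turn a wildcard pattern such as 192.168.10.* into an anchored regex.
sub ip_pattern_to_regex {
    my ($pattern) = @_;
    my $re = join '\.',
        map { $_ eq '*' ? '\d{1,3}' : quotemeta($_) }
        split /\./, $pattern;
    return qr/^$re$/;
}

# Returns 1 when the device/instance pair is selected by the policy.
sub policy_matches {
    my ($policy, $device, $instance_name) = @_;

    # An empty IP range means every device is searched.
    if (defined $policy->{ip_range} && length $policy->{ip_range}) {
        my $ip_re = ip_pattern_to_regex($policy->{ip_range});
        return 0 unless $device->{ip} =~ $ip_re;
    }
    return 0 unless $device->{category} eq $policy->{device_category};

    # LIKE is treated here as a case-insensitive substring match.
    return index(lc($instance_name), lc($policy->{instance_like})) >= 0 ? 1 : 0;
}

# Invented sample device for illustration.
my %router = (ip => '192.168.10.7', category => 'Router');
foreach my $instance ('eth0', 'Ethernet1/1', 'Loopback0') {
    my $selected = policy_matches(\%policy, \%router, $instance) ? 'yes' : 'no';
    print "$instance: $selected\n";
}
```

Under these assumptions "eth0" and "Ethernet1/1" are selected on the sample router and "Loopback0" is not; a device outside 192.168.10.* or in another category is skipped regardless of its instance names.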
Calculation Policies¶
The "Metric Post-Collection Calculation Engine (PCCE)" allows for individual metrics to be combined to create a meta-metric. This meta-metric can then be used for thresholding, SLM monitoring, etc.
The Calculations interface (Configuration -> Metrics -> Calculations) is used to define the handling of the meta-metrics defined in the "Collections" interface (Configuration -> Metrics -> Collections). These calculation policies use Perl-syntax code to do special processing on the metric data. An example of a meta-metric is a total inbound bandwidth metric, where the metric data for all inbound interfaces is summed and saved as a separate metric.
The form for adding/editing a calculation policy has the following fields:
-
Name - The name of the calculation policy.
-
Description - The description of the calculation policy.
-
State - The state of the calculation policy.
-
Collections - The collections selected here will have the metrics retrieved and be processed by the calculation policy.
-
Policy - The Perl based processing logic of the calculation policy.
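The total inbound bandwidth example mentioned above comes down to summing the member metrics of a collection. The following standalone sketch shows only that arithmetic. It is not a working calculation policy: the variables that Assure1 makes available inside the Policy field are not described in this document, so the `%inbound_bps` hash, its values and the `total_inbound` helper are all hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical latest values (bits per second) for the inbound
# interface metrics that belong to one collection.
my %inbound_bps = (
    'eth0' => 1_200_000,
    'eth1' =>   350_000,
    'eth2' =>         0,
);

# The meta-metric is the sum across all member metrics. sum0 returns 0
# for an empty list, so an empty collection yields 0 rather than undef.
sub total_inbound {
    my ($metrics) = @_;
    return sum0(values %{$metrics});
}

my $total = total_inbound(\%inbound_bps);
print "Total inbound bandwidth: $total bps\n";
```

With the sample values the total is 1550000 bps, which would then be stored as its own metric and could be given thresholds like any other.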
Exercise 4 - Performance Monitoring¶
Using the previous examples, set up and configure Performance Monitoring on your Assure1 system using your lab environment.