Troubleshooting and Support
Getting Support
This section helps you work with Federos Technical Support. The role of the support team is to provide you with the information and tools you require to quickly resolve an incident. Federos works with you to ensure you get the support required to work efficiently with your implementation of Assure1.
Contacting Support
Customers with valid support contracts may contact support via the following methods from 8 AM to 5 PM Central U.S., Monday through Friday.
- Web Submission via MOS.
- Phone: (972) 532-7387.
- Customers with 24x7 contracts will be provided with additional contact information.
Unlimited technical support is provided via the methods described above to Maintenance contract holders for the life of the contract.
Level Description
Federos prioritizes incidents as follows:
- Application Outage: system completely non-functional.
- Application Error without total loss of functionality; business impacting.
- Application Error without total loss of functionality; not business impacting.
- Application Inaccuracy.
Important Steps To Take Before You Contact Support
To get the most out of support, we recommend that you adhere to the following steps:
- Review on-line product material.
- Review Product Release Notes.
- Check the support site for updates and upgrades.
- Validate that the most recent versions are installed.
- Review the training material.
- Perform troubleshooting:
  - Isolate the problem.
  - Establish possible causes.
  - Test possible causes.
  - Document all troubleshooting activity with DEBUG logs, highlighting errors, and provide a screenshot.
Important Information To Gather Before Contacting MOS
- Verify your Support Contract. Is your Maintenance directly with Federos or with one of our Partners? Customers with Partner-provided Maintenance will have tickets reassigned to the appropriate Partner.
- List Product Versions and Platforms:
  - Product Versions
  - Hardware Platforms
  - Operating Systems and Versions
- List Changes. Provide a list of any recent changes (patches, upgrades, hardware changes, hostname or DNS changes, network configuration changes, etc.) made to the systems themselves or to the devices/systems being monitored.
- Identify the Ticket Type:
  - Question
  - Issue
  - Support Account Request
  - Feature Request
  - Documentation Request
- Categorize the Issue:
  - Application Outage: system completely non-functional.
  - Application Error without total loss of functionality; business impacting.
  - Application Error without total loss of functionality; not business impacting.
  - Application Inaccuracy.
- Describe the Issue:
  - Logs. Collect all relevant log files or excerpts. Whenever possible, set the application log level to DEBUG and capture the behavior at this level before collecting the logs. Provide screenshots of the behavior and/or error messages. Screenshots are often helpful because they show the complete error message as well as the context in which the error occurred.
  - Occurrence. Explain when the behavior first occurred.
  - Frequency. Where appropriate, state how often the behavior occurs.
  - Reproducibility. If the problem is reproducible, list the steps required to cause it. Also, if it is not set already, set the log level of the application to DEBUG and perform the steps necessary to reproduce the problem.
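The hardware platform and operating system details in the checklist above can be collected with a short script. The following is a minimal Python sketch using only the standard library; the function name `platform_summary` and the key names are illustrative assumptions, and it does not collect Assure1 product versions, which must be gathered separately.

```python
import platform

def platform_summary():
    """Collect hardware and operating system details for a support ticket."""
    # Hypothetical helper: the key names are chosen for illustration only.
    return {
        "hostname": platform.node(),
        "hardware_platform": platform.machine(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
    }

# Print one "key: value" line per detail, ready to paste into a ticket.
for key, value in platform_summary().items():
    print(f"{key}: {value}")
```

Run the script on each affected server and attach the output to the ticket along with the product versions.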
FAQs
This section provides answers to frequently asked questions.
How Do I Fix the Text File Busy Errors During Update?
While updating a package, you might get an error saying there is a problem with a post-commit file and that a text file is busy. Use the following steps to resolve the issue:
- Manually delete the file that is referenced in the error (or move it out of the way if the file cannot be deleted). For this example, the correct file to delete or move is $A1BASEDIR/var/repos/dashboards/hooks/post-commit.
- Copy the $A1BASEDIR/distrib/hooks/post-commit file to $A1BASEDIR/var/repos/dashboards/hooks/post-commit.
- Finally, reinstall the package.
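The first two steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: the function name `restore_post_commit` and the `.bak` backup name are hypothetical, and the sketch moves the file aside rather than deleting it. Reinstalling the package is not covered and is still done with your usual package tooling.

```python
import shutil
from pathlib import Path

def restore_post_commit(base_dir):
    """Move the busy hook aside, then copy the distributed hook into place."""
    base = Path(base_dir)
    busy_hook = base / "var/repos/dashboards/hooks/post-commit"
    distributed_hook = base / "distrib/hooks/post-commit"
    backup = busy_hook.with_name("post-commit.bak")  # hypothetical backup name

    # Step 1: move the file named in the error out of the way.
    if busy_hook.exists():
        shutil.move(str(busy_hook), str(backup))

    # Step 2: copy the distributed hook into place, keeping its permissions.
    shutil.copy2(distributed_hook, busy_hook)
    return busy_hook

# Example call, assuming A1BASEDIR is exported in the environment:
#   import os
#   restore_post_commit(os.environ["A1BASEDIR"])
```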
How Do I Represent Time Values as Metric Data?
The recommended way to store time values is to set the Unit Division to Time and store every value in seconds. This allows the internal Unit Division calculations to scale different values of time appropriately. For example, a value stored as 0.001 displays as 1ms, and a value stored as 300 displays as 5min.
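The arithmetic behind this convention can be illustrated with a small Python sketch. It is not the Assure1 Unit Division implementation; the helper names `to_seconds` and `display` and the unit table are assumptions made for the example.

```python
# Hypothetical illustration only: not Assure1's Unit Division code.

# Conversion factors from each unit to the 1-second base.
TO_SECONDS = {"ms": 0.001, "s": 1, "min": 60, "h": 3600}

def to_seconds(value, unit):
    """Convert a reading to seconds so it can be stored relative to 1 second."""
    return value * TO_SECONDS[unit]

def display(seconds):
    """Render a stored value (in seconds) using the largest unit that fits."""
    for unit, factor in (("h", 3600), ("min", 60), ("s", 1)):
        if seconds >= factor:
            return f"{seconds / factor:g}{unit}"
    return f"{seconds * 1000:g}ms"

print(display(0.001))              # 1ms
print(display(300))                # 5min
print(display(to_seconds(2, "h")))  # 2h
```

Storing everything against one base unit means a collector only needs to convert once, at write time, and display scaling can be decided later.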
What Are the Limitations of the Out-of-the-box Ping Device Tool?
The event list includes a Ping Device tool that can ping a device and show the results. However, the ping is sent from the presentation server. This works well in many environments, but where network segregation prevents the presentation server from reaching the devices, the Ping Device tool will not show a valid result.
How Do I Safely Override Default Vendor Configurations?
For guidance on overriding default vendor configurations, see the Override Default Vendor Configuration guide.
How Do I Change Default User Session Timeout?
The default user session times out 8 hours after the last API call made by the user. This includes refresh calls made by interfaces that auto-refresh, such as Dashboards or the EventList. The timeout can be extended for all users by setting a PHP value in a custom Apache include file:
- Add the file base-phptimeout.conf under $A1BASEDIR/etc/apache.
- Add the following to the conf file to set a session timeout of 86400 seconds, or one day:
  php_value session.gc_maxlifetime 86400
- Restart the web application:
  systemctl restart assure1-web
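Because the value is expressed in seconds, it can be convenient to compute it from a more readable duration. The following Python sketch does only that arithmetic and prints the line to place in the include file; the function name `gc_maxlifetime_directive` is a hypothetical helper, not part of Assure1.

```python
from datetime import timedelta

def gc_maxlifetime_directive(timeout):
    """Return the Apache include line for a session timeout given as a timedelta."""
    seconds = int(timeout.total_seconds())
    return f"php_value session.gc_maxlifetime {seconds}"

print(gc_maxlifetime_directive(timedelta(days=1)))
# php_value session.gc_maxlifetime 86400
print(gc_maxlifetime_directive(timedelta(hours=8)))
# php_value session.gc_maxlifetime 28800
```

The second call shows the value that corresponds to the 8-hour default described above.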
How Do I Configure DNS Caching for Optimal Performance?
DNS caching on Linux is disabled by default, which can cause performance issues (increased load on DNS servers, increased response latency, etc.) in large-scale implementations. For optimal performance, it is recommended that a DNS caching agent be configured on all Assure1 servers.
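One rough way to gauge whether lookups on a server are being served from a cache is to time repeated resolutions of the same name. The Python sketch below is a diagnostic illustration only, not an Assure1 tool; the function name `time_lookups` is an assumption, and the timings it reports depend on the local resolver configuration.

```python
import socket
import time

def time_lookups(hostname, repeats=5):
    """Return the elapsed time in milliseconds for each repeated lookup."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        socket.getaddrinfo(hostname, None)  # resolves through the system resolver
        timings.append((time.perf_counter() - start) * 1000)
    return timings

# Example call (replace the name with a device your servers resolve often):
#   for ms in time_lookups("device.example.com"):
#       print(f"{ms:.2f} ms")
```

If later lookups take about as long as the first, that may suggest that no caching agent is answering them locally.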
Recommended DNS Caching agents:
What is $A1BASEDIR?
$A1BASEDIR is the variable that holds the base directory of your Assure1 installation. Each deployment may use a different base directory.
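Scripts that refer to paths such as those used elsewhere in this section can resolve them from the variable instead of hard-coding a directory. The following Python sketch assumes A1BASEDIR is exported as an environment variable in the shell running the script; the function name `a1_path` is hypothetical.

```python
import os
from pathlib import Path

def a1_path(relative, env=os.environ):
    """Resolve a path under the Assure1 base directory taken from A1BASEDIR."""
    base = env.get("A1BASEDIR")
    if not base:
        # Assumption: the variable is expected to be set by the Assure1 environment.
        raise RuntimeError("A1BASEDIR is not set in this environment")
    return Path(base) / relative

# Example call:
#   print(a1_path("etc/apache"))
```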
Glossary
The following list defines the words and abbreviations that are used in this documentation:
- AAA - AAA refers to Authentication, Authorization and Accounting. It is used to refer to a family of protocols that mediate network access.
- ACD - Assure1 Cognitive Engine.
- Beats - Lightweight shippers for Elasticsearch & Logstash. Beats are a collection of lightweight (resource efficient, no dependencies, small) and open source log shippers that act as agents installed on the different servers in your infrastructure for collecting logs or metrics. These can be log files (Filebeat), network data (Packetbeat), server metrics (Metricbeat), or any other type of data that can be collected by the growing number of Beats being developed by both Elastic and the community. Once collected, the data is sent either directly into Elasticsearch or to Logstash for additional processing.
- AD - Active Directory is a directory service developed by Microsoft for Windows domain networks. It is included in Windows Server operating systems as a set of processes and services. Initially, Active Directory was only in charge of centralized domain management.
- Aggregators - Aggregators are used as a method for event collection, processing the received messages into normalized events that are then passed into the Real-time Event engine, where they are de-duplicated. The most common events collected in this manner are syslogs and SNMP traps, which contain various kinds of logging messages.
- CAPE - Custom Action Policy Engine.
- CentOS - CentOS is an enterprise-class Linux distribution.
- Cronjob - The software utility cron is a time-based job scheduler in Unix-like computer operating systems. Users that set up and maintain software environments use cron to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals. It typically automates system maintenance or administration, though its general-purpose nature makes it useful for things like downloading files from the Internet and downloading email at regular intervals. Cron is most suitable for scheduling repetitive tasks. Scheduling one-time tasks can be accomplished using the associated at utility.
- DNS - The Domain Name System maintains a directory of domain names and translates them to Internet Protocol (IP) addresses. This is necessary because, although domain names are easy for people to remember, computers access websites based on IP addresses.
- Elasticsearch - Elasticsearch is an open source, RESTful search engine built on top of Apache Lucene and released under an Apache license. It is Java-based and can search and index document files in diverse formats.
- FIFO Aggregator - The Assure1 Event FIFO (Flat File) Aggregator is a generic integration that tails a file, then parses any lines written to the file with customizable rules and creates de-duplicated events within Assure1.
- Filebeat - Filebeat forwards and centralizes logs and files. Filebeat comes with internal modules (auditd, Apache, NGINX, System, MySQL, and more) that simplify the collection, parsing, and visualization of common log formats down to a single command. This is achieved by combining automatic default paths based on your operating system, with Elasticsearch Ingest Node pipeline definitions, and with Kibana dashboards. Plus, a few Filebeat modules ship with pre-configured machine learning jobs. Filebeat is part of the Elastic Stack.
- FQDN - Fully Qualified Domain Name. The domain name that specifies its exact location in the tree hierarchy of the Domain Name System (DNS).
- Grafana - Grafana is an open source metric analytics & visualization suite. It is most commonly used for visualizing time series data for infrastructure and application analytics.
- InfluxDB - InfluxDB is an open-source time series database developed by InfluxData. It is written in Go and optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.
- LDAP - Lightweight Directory Access Protocol is a software protocol for enabling anyone to locate organizations, individuals, and other resources such as files and devices in a network, whether on the public Internet or on a corporate intranet.
- Kibana - Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster. Users can create bar, line and scatter plots, or pie charts and maps on top of large volumes of data.
- NFS - Network File System.
- Nmap - Nmap is an open-source network scanner. It is used to discover hosts and services on a computer network by sending packets and analyzing the responses.
- OrientDB - OrientDB is an open source NoSQL database management system written in Java. It is a multi-model database, supporting graph, document, key/value, and object models, but the relationships are managed as in graph databases with direct connections between records. OrientDB is the world’s fastest graph database.
- RabbitMQ - RabbitMQ is an open-source message-broker software that originally implemented the Advanced Message Queuing Protocol (AMQP) and has since been extended with a plug-in architecture to support Streaming Text Oriented Messaging Protocol, Message Queuing Telemetry Transport, and other protocols. RabbitMQ is the most widely deployed message broker.
- RADIUS - Remote Authentication Dial-In User Service is a networking protocol, operating on port 1812, that provides centralized authentication, authorization, and accounting management for users who connect to and use a network service.
- RBAC - Role-Based Access Control.
- RESTful - RESTful web services are built to work best on the Web. Representational State Transfer (REST) is an architectural style that specifies constraints, such as the uniform interface, that if applied to a web service induce desirable properties, such as performance, scalability, and modifiability, that enable services to work best on the Web. In the REST architectural style, data and functionality are considered resources and are accessed using Uniform Resource Identifiers (URIs), typically links on the Web. The resources are acted upon by using a set of simple, well-defined operations. The REST architectural style constrains an architecture to a client/server architecture and is designed to use a stateless communication protocol, typically HTTP. In the REST architecture style, clients and servers exchange representations of resources by using a standardized interface and protocol.
- RPM - RPM Package Manager is a free and open-source package management system. The .rpm file format is the baseline package format of the Linux Standard Base.
- SAN - A Storage Area Network is a dedicated high-speed network or subnetwork that interconnects and presents shared pools of storage devices to multiple servers.
- SNMP - Simple Network Management Protocol is an internet standard protocol for collecting and organizing information about managed devices on IP networks and for modifying that information to change device behavior. Devices that typically support SNMP include cable modems, routers, switches, servers, workstations, and printers.
- SOAP - Simple Object Access Protocol is a protocol for implementing Web services. SOAP features guidelines that allow communication via the Internet between two programs, even if they run on different platforms, use different technologies and are written in different programming languages.
- SSL - Secure Sockets Layer (succeeded by TLS, Transport Layer Security).
- Syslog - Syslog is a standard for message logging. It allows separation of the software that generates messages, the system that stores them, and the software that reports and analyzes them. Each message is labeled with a facility code, indicating the software type generating the message, and assigned a severity level.
- Telegraf - Telegraf is a plugin-driven server agent for collecting and sending metrics and events from databases, systems, and IoT sensors. Telegraf is written in Go and compiles into a single binary with no external dependencies, and requires a very minimal memory footprint.
- TLS - Transport Layer Security is a protocol that provides authentication, privacy, and data integrity between two communicating computer applications.