System monitoring notifications

ASMS tracks various system metrics that trigger notifications when thresholds are exceeded. These notifications can be triggered as syslog messages or as events in the Issues Center.

AFA admins can modify the thresholds for each metric and the types of notifications triggered.

For more details, see Sending outgoing syslog messages and Manage ASMS issues.

Configure system notifications

This procedure describes how to configure the JSON file that determines which AFA system notifications are sent, and how.

Do the following:

  1. Open a terminal and log in as user afa.

  2. Browse to and open the /data/algosec-ms/config/watchdog_configuration.json file for editing.

    The watchdog_configuration.json file includes the following properties:

    metrics

    An array that specifies AFA metrics.

    For more details, see Metric element.

    actions

    An array of possible actions to take upon a metric status change.

    Supported actions include:

    • publish_syslog
    • publish_issues_center

    Note: While all metrics can trigger syslog messages, only some can trigger events in the AFA Issues Center.

    For more details, see System notifications enabled by default.

    metricsActions

    An array of objects that each define when a specific status change triggers an action.

    For more details, see MetricsAction.

  3. Modify the JSON file as needed, and save your changes.
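If you prefer to script step 3 rather than edit the file by hand, the following Python sketch loads the file, toggles a metric's enabled flag, and writes the result back. It assumes the file is a single JSON object with top-level metrics, actions, and metricsActions keys, and that each metric has the name and enabled properties described below. The helper names are hypothetical and are not part of ASMS. Back up the original file before saving changes.

```python
import json
from pathlib import Path

# Path from step 2 above; adjust if your installation differs.
CONFIG_PATH = Path("/data/algosec-ms/config/watchdog_configuration.json")


def load_config(path: Path) -> dict:
    """Read the watchdog configuration JSON into a dict."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def set_metric_enabled(config: dict, metric_name: str, enabled: bool) -> bool:
    """Set 'enabled' on the named metric; return False if no metric has that name."""
    for metric in config.get("metrics", []):
        if metric.get("name") == metric_name:
            metric["enabled"] = enabled
            return True
    return False


def save_config(config: dict, path: Path) -> None:
    """Write the configuration back as indented, human-readable JSON."""
    with path.open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
        f.write("\n")


if __name__ == "__main__":
    cfg = load_config(CONFIG_PATH)
    for m in cfg.get("metrics", []):
        print(m.get("name"), "enabled" if m.get("enabled") else "disabled")
```

Run as user afa so the file keeps its expected ownership.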

Metric element

The Metric element in the watchdog_configuration.json file has the following properties:

Property

Description

enabled

Boolean. Determines whether the metric is enabled.

name

String. Read-only. A unique name for the metric.

For details, see System notifications enabled by default.

description

String. A description of the metric.

For details, see System notifications enabled by default.

frequency

A frequency object, which specifies the frequency for checking the metric.

Each frequency object includes the following properties:

  • value. Integer. Determines how often the metric is checked.

    0 = the metric is checked every time the collection service runs.

  • unit. String. One of the following time units:

    SECOND

    MINUTE

    HOUR

    DAY

Default: 10 seconds.

hostTypes

Array. List of appliances that check the metric.

Each entry is one of the following:

  • MASTER
  • SLAVE
  • REMOTE_MANAGER

If you do not have a distributed architecture, this is always defined as [MASTER].

thresholdPolicy

An object whose options array specifies the metric's thresholds.

Each entry in the options array is an object that specifies the threshold for a specific status.

For more details, see Options object and Threshold sample configuration.
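To make the Metric element concrete, here is a hypothetical metric element expressed as a Python dict, together with a small validator that checks it against the values listed in the table above and converts a frequency object to seconds. The field values are illustrative rather than copied from a shipped watchdog_configuration.json, and the validator is a sketch, not an ASMS tool.

```python
# Allowed values, as listed in the Metric element table.
TIME_UNITS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600, "DAY": 86400}
HOST_TYPES = {"MASTER", "SLAVE", "REMOTE_MANAGER"}

# Hypothetical metric element for illustration only.
example_metric = {
    "enabled": True,
    "name": "suite_cpu_usage",
    "description": "CPU usage",
    "frequency": {"value": 10, "unit": "SECOND"},
    "hostTypes": ["MASTER"],
    "thresholdPolicy": {"options": []},
}


def frequency_seconds(frequency: dict) -> int:
    """Convert a frequency object to seconds (0 means every collection run)."""
    return frequency["value"] * TIME_UNITS[frequency["unit"]]


def validate_metric(metric: dict) -> list[str]:
    """Return a list of problems found in a metric element (empty if none)."""
    problems = []
    if not isinstance(metric.get("enabled"), bool):
        problems.append("enabled must be a boolean")
    freq = metric.get("frequency", {})
    if freq.get("unit") not in TIME_UNITS:
        problems.append(f"unknown frequency unit: {freq.get('unit')!r}")
    if not isinstance(freq.get("value"), int) or freq["value"] < 0:
        problems.append("frequency value must be a non-negative integer")
    unknown = set(metric.get("hostTypes", [])) - HOST_TYPES
    if unknown:
        problems.append(f"unknown hostTypes: {sorted(unknown)}")
    if not isinstance(metric.get("thresholdPolicy", {}).get("options"), list):
        problems.append("thresholdPolicy.options must be an array")
    return problems
```

For example, frequency_seconds({"value": 5, "unit": "MINUTE"}) returns 300.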

Options object

Each options object includes the following properties:

status

String. Determines the status of the metric if the threshold is met.

One of the following:

  • PASS
  • FAIL
  • WARNING

type

String. Determines the type of result returned by the metric collection.

One of the following:

  • STRING
  • INTEGER
  • FLOAT
  • BOOLEAN

condition

String. The comparison operator to use on the metric collection result.

One of the following:

  • EQ (=)
  • LT (<)
  • LTE (<=)
  • GT (>)
  • GTE (>=)
  • NOT (!=)

value

The data type specified in the type property.

The value to compare to the metric collection result.

To change the status as soon as the threshold is met even once, set the timeCondition value to zero (0).

timeCondition

A timeCondition object, which determines a time period for which the threshold must be met in order for the metric status to change.

The timeCondition object includes the following properties:

  • value. Integer. The length of the time period, in the specified units, for which the threshold must be met.

    0 = the status changes as soon as the threshold is met once.

  • unit. String. One of the following time units:

    SECOND

    MINUTE

    HOUR

    DAY
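As a rough sketch of how a single options entry could be evaluated, the Python below maps each condition to a comparison operator and converts both sides using the type property. How ASMS actually parses raw collection results is not documented here, so the type conversions (in particular the BOOLEAN parsing) are assumptions.

```python
import operator

# condition property -> Python comparison operator.
CONDITIONS = {
    "EQ": operator.eq,
    "LT": operator.lt,
    "LTE": operator.le,
    "GT": operator.gt,
    "GTE": operator.ge,
    "NOT": operator.ne,
}

# type property -> conversion applied to both the result and the threshold value.
# Assumption: BOOLEAN results may arrive as strings such as "true"/"false".
TYPES = {
    "STRING": str,
    "INTEGER": int,
    "FLOAT": float,
    "BOOLEAN": lambda v: v if isinstance(v, bool) else str(v).strip().lower() == "true",
}


def option_matches(option: dict, result) -> bool:
    """Return True if one collection result meets this option's threshold."""
    convert = TYPES[option["type"]]
    compare = CONDITIONS[option["condition"]]
    return compare(convert(result), convert(option["value"]))
```

This checks a single reading only; the timeCondition period is applied on top of it.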

Threshold sample configuration

The example below defines thresholds for the PASS and FAIL statuses:

  • The metric status will change to PASS if the result is OK for more than 1 minute.
  • The metric status will change to FAIL if the result is not OK even once.
"thresholdPolicy": {
  "options": [
    {
      "status": "PASS",
      "type": "STRING",
      "condition": "EQ",
      "value": "OK",
      "timeCondition": {
        "value": 1,
        "unit": "MINUTE"
      }
    },
    {
      "status": "FAIL",
      "type": "STRING",
      "condition": "NOT",
      "value": "OK",
      "timeCondition": {
        "value": 0,
        "unit": "MINUTE"
      }
    }
  ]
}
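The following Python sketch replays a series of timestamped results against the sample policy above to show how timeCondition interacts with condition. It assumes a status is adopted once its condition has held on consecutive readings for at least the timeCondition period, with 0 meaning the first matching reading, and it supports only the EQ and NOT operators and STRING values that the sample uses. It illustrates the documented behavior; it is not ASMS's collection logic.

```python
from typing import Optional

UNIT_SECONDS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600, "DAY": 86400}

# The sample thresholdPolicy, as a Python dict.
SAMPLE_POLICY = {
    "options": [
        {"status": "PASS", "type": "STRING", "condition": "EQ", "value": "OK",
         "timeCondition": {"value": 1, "unit": "MINUTE"}},
        {"status": "FAIL", "type": "STRING", "condition": "NOT", "value": "OK",
         "timeCondition": {"value": 0, "unit": "MINUTE"}},
    ]
}

# Only the operators the sample uses.
CONDITIONS = {"EQ": lambda a, b: a == b, "NOT": lambda a, b: a != b}


def evaluate(policy: dict, readings: list, current: Optional[str] = None) -> Optional[str]:
    """Replay (timestamp_in_seconds, result) readings and return the final status."""
    status = current
    streak_start = {}  # status -> timestamp at which its current matching streak began
    for ts, result in readings:
        for opt in policy["options"]:
            name = opt["status"]
            if CONDITIONS[opt["condition"]](result, opt["value"]):
                start = streak_start.setdefault(name, ts)
                tc = opt["timeCondition"]
                if ts - start >= tc["value"] * UNIT_SECONDS[tc["unit"]]:
                    status = name
            else:
                streak_start.pop(name, None)  # streak broken; restart the clock
    return status
```

With readings every 30 seconds, three consecutive "OK" results (at 0, 30, and 60 seconds) move the metric to PASS, while a single non-OK result moves it to FAIL immediately.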

MetricsAction

The metricsActions element is an array of objects. Each object maps a metric to an action and specifies which status changes (pass, warning, fail) trigger that action.

For example, the sample threshold configuration above defines thresholds for the PASS and FAIL statuses, but not for the WARNING status. In this scenario, set warning to false for that metric in the metricsActions array.

Each object in the metricsActions array includes the following properties:

Property

Description

metric

String. The name of the metric, as defined in the metric's object in the metrics array.

action

String. The name of the action, as listed in the actions array.

pass

Boolean. Determines whether the action should be triggered when the metric's status changes to pass.

warning

Boolean. Determines whether the action should be triggered when the metric's status changes to warning.

fail

Boolean. Determines whether the action should be triggered when the metric's status changes to fail.
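A minimal Python sketch of how metricsActions entries could be resolved into actions when a metric's status changes. The two entries are hypothetical. Both use publish_syslog, which every metric supports, and the first mirrors the scenario above by setting warning to false for a metric that only has PASS and FAIL thresholds.

```python
# Hypothetical metricsActions entries, for illustration only.
METRICS_ACTIONS = [
    {"metric": "suite_httpd_service", "action": "publish_syslog",
     "pass": True, "warning": False, "fail": True},
    {"metric": "suite_cpu_usage", "action": "publish_syslog",
     "pass": False, "warning": True, "fail": False},
]


def actions_for(metrics_actions: list, metric: str, new_status: str) -> list:
    """Return the actions to trigger when `metric` changes to `new_status`.

    new_status is PASS, WARNING, or FAIL; the matching flag is the lowercase key.
    Missing flags are treated as false (an assumption).
    """
    key = new_status.lower()
    return [entry["action"] for entry in metrics_actions
            if entry["metric"] == metric and entry.get(key, False)]
```

For example, actions_for(METRICS_ACTIONS, "suite_httpd_service", "FAIL") returns ["publish_syslog"], while the same call with "WARNING" returns an empty list.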


System notifications enabled by default

Some AFA messages can be triggered as syslog or Issues Center messages, and others can be triggered as syslog messages only.

The following table lists the notifications enabled in AFA by default:

Metric names

Description

Syslog

Issues Center

suite_disk_space_available

Available disk space in root partition

Notifications triggered:

  • Fail if < 5%
  • Warning if >= 5% and < 10%
  • Pass if > 10%

suite_nas_disk_space_available

Available disk space in NAS partition

Notifications triggered:

  • Fail if < 5%
  • Warning if >= 5% and < 10%
  • Pass if > 10%

suite_data_disk_space_available

Available disk space in data partition

Notifications triggered:

  • Fail if < 5%
  • Warning if >= 5% and < 10%
  • Pass if > 10%

suite_open_file_descriptors

Open file descriptors

Notifications triggered: Warning if more than 4000 for the last 5 minutes.

suite_memory_available

Available memory

Notifications triggered: Warning if less than 10% for the last 3 hours.

suite_cpu_usage

CPU usage

Notifications triggered: Warning if 90% or more for the last 16 hours.

The following:

  • suite_logstash_service
  • suite_crond_service
  • suite_elasticsearch_service
  • suite_httpd_service
  • suite_kibana_service
  • suite_metro_service
  • suite_mongo_service
  • suite_postgresql_service
  • suite_tomcat_service

Essential Linux daemons

Notifications triggered:

  • Fail if down
  • Pass if up

The following:

  • afa_shallow_health_check
  • abf_shallow_health_check
  • aff_shallow_health_check

Java process health checks (shallow)

Notifications triggered:

  • Fail if not working for 20 seconds
  • Pass if working for 30 seconds

The following:

  • afa_deep_health_check
  • abf_deep_health_check
  • aff_deep_health_check

Java process health checks (deep)

Notifications triggered:

  • Fail if at least one item fails for 10 minutes
  • Pass (immediately) if everything works

hadr_db_replication_health

Database replication health check between primary and secondary nodes in a cluster

Relevant only when HA/DR and/or distributed architecture is enabled.

Notifications triggered:

  • Fail if replication failed
  • Pass if replication succeeded

dfs_connectivity_health_check

Distributed file system health check

Notifications triggered:

  • Fail if down
  • Pass if up

suite_dist_elements_connection_health

Connection health check between Central Manager and Remote Agents or Load Units in a distributed architecture

Relevant only when HA/DR and/or distributed architecture is enabled.

Notifications triggered:

  • Fail if down for 2 minutes
  • Pass if up for 1 minute

suite_cyberark_aim_service

Status of the CyberArk AIM service running on the ASMS host

Notifications triggered:

  • Fail if down
  • Pass if up

cyberark_connectivity_health_check

Connection health check between ASMS and CyberArk vault

Notifications triggered:

  • Fail if check failed
  • Pass if check succeeded

Analysis

Analysis results

Notifications triggered:

  • Fail if a device analysis failed
  • Pass if a device analysis succeeded

Note: Always retrieved, even if this metric is disabled in the configuration file.

Monitor

Monitoring results

Notifications triggered:

  • Fail if a device monitoring cycle failed
  • Pass if a device monitoring cycle succeeded

Note: Always retrieved, even if this metric is disabled in the configuration file.

Log Collection

Traffic log collection results

Notifications triggered:

  • Fail if a device traffic log collection failed
  • Pass if a device traffic log collection succeeded

Note: Always retrieved, even if this metric is disabled in the configuration file.

suite_traffic_logs_folder_size

Size of the traffic log collection folder

Notifications triggered:

  • Pass if the /home/afa/.fa/syslog folder size is less than or equal to 4000 MB
  • Warning if the /home/afa/.fa/syslog folder size is greater than 4000 MB
  • Fail if the /home/afa/.fa/syslog folder size is greater than 8000 MB

Audit logs

Audit log collection results

Notifications triggered:

  • Fail if a device audit log collection failed
  • Pass if a device audit log collection succeeded

Note: Always retrieved, even if this metric is disabled in the configuration file.

Scheduled Backup

System backup service

Notifications triggered:

  • Fail if a scheduled backup failed
  • Pass if a scheduled backup succeeded

Note: Always retrieved, even if this metric is disabled in the configuration file.
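The default disk space and traffic log folder thresholds can be approximated with the Python sketch below, for example to check a host by hand before tuning values. It uses shutil.disk_usage for free space and os.walk to total the folder size. Treating exactly 10% free space as PASS, and 1 MB as 1024 * 1024 bytes, are assumptions that the table does not settle; the function names are hypothetical.

```python
import os
import shutil


def classify_disk_percent(free_percent: float) -> str:
    """Fail < 5%, Warning 5% to < 10%, otherwise Pass (exactly 10% assumed PASS)."""
    if free_percent < 5:
        return "FAIL"
    if free_percent < 10:
        return "WARNING"
    return "PASS"


def disk_status(path: str = "/") -> str:
    """Classify free space on the partition that holds `path`."""
    usage = shutil.disk_usage(path)
    return classify_disk_percent(100 * usage.free / usage.total)


def folder_size_mb(path: str) -> float:
    """Total size of the files under `path`, in MB, skipping symbolic links."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total / (1024 * 1024)


def classify_traffic_logs_mb(size_mb: float) -> str:
    """Check FAIL first, because the > 4000 MB and > 8000 MB ranges overlap."""
    if size_mb > 8000:
        return "FAIL"
    if size_mb > 4000:
        return "WARNING"
    return "PASS"
```

For example, classify_traffic_logs_mb(folder_size_mb("/home/afa/.fa/syslog")) reports the status for the traffic log folder on an ASMS host, and disk_status("/") does the same for the root partition.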


SNMP platform monitoring

SNMP platform monitoring can be achieved using the Linux snmpd service. Configuring SNMP is the customer's responsibility.