Upgrade/migration troubleshooting

This topic discusses troubleshooting errors and warnings during the upgrade/migration to A32.00 with CentOS 7.

Viewing migration status

Migration status and reports sync progress can be viewed in the CLI session that triggered the process.

If the CLI session that triggered the process is no longer available, you can rejoin it by re-entering the menu from algosec_conf to view the status.


Migration logs

If any error messages appear (other than prerequisite check failures), go to the relevant log and resolve the issue. The logs include:

  • CM upgrade with migration

    /var/log/algosec-software-upgrade.log

    /var/log/migration/migration_cli.log

    /var/log/migration/algosec-migration-dr.log

    /var/log/algosec_hadr/ms-hadr.log

    /var/log/algosec_toolbox/algosec_conf.log

  • Device relocation

    /var/log/migration/device-relocation.log

    /home/afa/.fa-history

  • Sync reports

    /var/log/algosec_hadr/ms-hadr.log

    /var/log/migration/algosec-migration-dr.log (progress indication)
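To scan these logs in one pass, a small helper along the following lines may help. This is a minimal sketch: the log paths are the ones listed above, but the function name scan_logs and the search pattern error|fail are assumptions, because the exact wording of logged errors is not specified here.

```shell
#!/bin/sh
# Log paths taken from the list above.
MIGRATION_LOGS="/var/log/algosec-software-upgrade.log
/var/log/migration/migration_cli.log
/var/log/migration/algosec-migration-dr.log
/var/log/algosec_hadr/ms-hadr.log
/var/log/algosec_toolbox/algosec_conf.log
/var/log/migration/device-relocation.log"

# scan_logs FILE...: print the last 20 lines matching PATTERN in each readable file.
# PATTERN is an assumption; adjust it to the messages you are looking for.
scan_logs() {
  pattern="${PATTERN:-error|fail}"
  for log in "$@"; do
    if [ -r "$log" ]; then
      echo "== $log"
      grep -inE "$pattern" "$log" | tail -n 20
    else
      echo "== $log (missing or unreadable)"
    fi
  done
}

# Word splitting on the newline-separated list is intentional.
scan_logs $MIGRATION_LOGS
```

Matching lines are printed with their line numbers, so you can open the log at the right place. To narrow the search, set PATTERN before calling the function, for example PATTERN='postgres' scan_logs /var/log/migration/migration_cli.log.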


Estimated time of the upgrade/migration of the source server

Note: Time estimates are rough. They are based on the bandwidth between source and target and on the amount of data on the source machine.

Run the first part of the upgrade routine (algosec_conf option 8 - Upgrade software) to get automatic migration time estimations for the upgrade/migration process. The time estimate is based on whether or not reports have already been synced.


Resolve automatic system prerequisites checks issues

Upgrade prerequisites issues:

Text in CLI

Description

How to resolve

(log data about prerequisite checks are found in /var/log/algosec-software-upgrade.log)

Machine [machine IP] does not meet the minimal hardware requirements.

Checks system machine appliance specs: cores, memory.

Make sure the machine meets the system requirements. See Upgrade/migration prerequisites.

For details, see Checking cores and memory on [machine IP] in the log.

There is less than xx MB free disk space in OS partition on node [machine IP].

 

Insufficient disk space. xxxMB found for installation (Less than the required 5000 MB in the OS partition on node[machine IP])

 

Partition (/data) on local node must have at least <required> MB free space. This includes the amount of space needed to sync the monitor data directory, plus an additional 5 GB. You currently only have <avail> MB free space. 

Checks disk space on system machine.
See Disk space requirements.

Run auto-remove to free up disk space or delete old run files.

To run auto-remove, in AFA Administration, go to the Options tab, Storage sub-tab, and click Clean-up now.

If the issue persists after running Clean-up now, contact AlgoSec support.
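Before and after cleaning up, you can measure the free space yourself. The following is a minimal sketch using df; the function names are hypothetical, and treating / as the OS partition is an assumption about your layout. The 5000 MB figure is the one quoted in the CLI message.

```shell
#!/bin/sh
# free_mb PATH: print the free space, in MB, of the filesystem that holds PATH.
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# has_free_mb PATH REQUIRED_MB: succeed if PATH has at least REQUIRED_MB free.
has_free_mb() {
  [ "$(free_mb "$1")" -ge "$2" ]
}

# Assumption: the OS partition is mounted at /.
if has_free_mb / 5000; then
  echo "/: at least 5000 MB free"
else
  echo "/: less than 5000 MB free"
fi
```

For the /data partition, call has_free_mb /data with the required value reported in the CLI message.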

Insufficient disk speed. 

Checks source node disk speed.

We recommend disk write speed of at least 300MB/s. Minimum allowable is 80MB/s.

Contact your IT department to determine and adjust, if necessary, your node disk speed.

Tip:

Use the following command to check disk speed:

dd if=/dev/zero of=/data/test-big-file.bin bs=786432000 count=1 oflag=dsync 2>&1 ; rm -f /data/test-big-file.bin

An example of the output is:

786432000 bytes (786 MB) copied, 0.624098 s, 1.3 GB/s
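To compare a dd result against the thresholds above, the summary line can be parsed. This is a sketch only: the function names are hypothetical, and it assumes the GNU dd summary format shown in the example (byte count first, elapsed seconds third from the end).

```shell
#!/bin/sh
# dd_speed_mbps: read dd's summary line on stdin and print the write speed in MB/s,
# computed from the byte count and elapsed seconds rather than dd's rounded figure.
dd_speed_mbps() {
  awk '/copied/ { printf "%.0f\n", $1 / $(NF - 3) / 1000000 }'
}

# classify_speed MBPS: compare against the 80 MB/s minimum and 300 MB/s recommendation.
classify_speed() {
  if [ "$1" -lt 80 ]; then
    echo "below minimum"
  elif [ "$1" -lt 300 ]; then
    echo "below recommended"
  else
    echo "ok"
  fi
}

# The sample output shown above works out to 1260 MB/s.
sample='786432000 bytes (786 MB) copied, 0.624098 s, 1.3 GB/s'
classify_speed "$(printf '%s\n' "$sample" | dd_speed_mbps)"
```

Because the dd command redirects stderr to stdout, its output can be piped straight into dd_speed_mbps on a real system.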

Tip: If your source machine is an AlgoSec VM, make sure you are following VM best practices. See Best practices for your AlgoSec VMware Deployment. If you make changes, check your disk speed again to see if it has improved.

Tip: If your source machine is an AlgoSec AMI, make sure the instance is from the Amazon EC2 General Purpose M4 family (compatible with CentOS 6).

Distribution nodes machine time prerequisite check failed.

Compares time between system machine and distribution nodes (Remote Agent and LDUs).

The machines can be in different time zones, but their clocks must agree relative to UTC:

  1. Compare the time and date between the CM and the distribution node by running this command on every node mentioned in the message:

    date +%s

    The results should differ by no more than 180 seconds (3 minutes). If a machine exceeds this limit:

    1. Configure an NTP server. Use algosec_conf option 2 on the machine to be updated.

    2. Run this command as root user to force time sync:

      ntpdate -u $(awk '$1 == "server" {print $2}' /etc/ntp.conf)

    3. Reboot the machine.

    4. To verify, rerun on the updated node:

      date +%s
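The comparison of two date +%s values can be scripted. The following is a minimal sketch with hypothetical function names; the 180-second limit is the one stated for this check.

```shell
#!/bin/sh
MAX_SKEW=180  # seconds, the 3-minute limit stated for this check

# skew_seconds A B: absolute difference between two `date +%s` values.
skew_seconds() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  echo "$diff"
}

# check_skew LOCAL REMOTE: report whether the two clocks are within MAX_SKEW.
check_skew() {
  skew=$(skew_seconds "$1" "$2")
  if [ "$skew" -le "$MAX_SKEW" ]; then
    echo "OK: clocks differ by ${skew}s"
  else
    echo "FAIL: clocks differ by ${skew}s (limit ${MAX_SKEW}s)"
    return 1
  fi
}

# Comparing the local clock with itself always passes; replace the second
# argument with the value printed by `date +%s` on the distribution node.
now=$(date +%s)
check_skew "$now" "$now"
```

If root SSH access between the nodes is available (an assumption), the remote value can be collected in the same call: check_skew "$(date +%s)" "$(ssh root@<node IP> date +%s)".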
NAS is configured, but directories are not mounted.

 

NAS mount is disabled due to fault detected.

Checks NAS status on Central Manager and LDUs.

Open the algosec_conf menu on the node with the NAS issue. Run option 11 - Configure NAS, then run option 3 - Re-enable NAS mount.

If the issue persists on an LDU, in the algosec_conf menu, run option 15 - Distributed Architecture configuration.

If the problem persists, contact AlgoSec support.

NAS is suspended

Open the algosec_conf menu on the node with the NAS issue. Run option 11 - Configure NAS, then run option 3 - Re-enable NAS mount.

If the issue persists on an LDU, in the algosec_conf menu, run option 15 - Distributed Architecture configuration.

If the problem persists, contact AlgoSec support.

The services listed below are not OK.

Checks status of services.

First, try to restart the services. Run for each service:

algosec_test_service -n <SERVICE NAME> -f

For example: algosec_test_service -n postgresql -f

If services do not restart, contact AlgoSec support.
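When several services are listed, the restart command can be run in a loop. This is a sketch under a stated assumption: it treats a non-zero exit status from algosec_test_service as a failed restart, which is not confirmed here, so check the CLI output as well. The function name restart_services and the SERVICE_CMD variable are hypothetical.

```shell
#!/bin/sh
# SERVICE_CMD defaults to the documented tool; it is overridable only so that the
# loop can be dry-run (for example SERVICE_CMD=true) on a machine without ASMS.
SERVICE_CMD="${SERVICE_CMD:-algosec_test_service}"

# restart_services NAME...: run the documented restart for each service and
# list the ones whose command did not succeed.
# Assumption: the tool exits non-zero when the restart fails.
restart_services() {
  failed=""
  for svc in "$@"; do
    if ! "$SERVICE_CMD" -n "$svc" -f; then
      failed="$failed $svc"
    fi
  done
  if [ -n "$failed" ]; then
    echo "Still failing:$failed"
    return 1
  fi
  echo "All requested services restarted"
}
```

Call it with the service names from the CLI message, for example restart_services postgresql.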

Node: 10.20.8.95
* The path /home/afa/algosec should be non-broken symlink

Checks essential redirect links.

Contact AlgoSec support.

Validation of upgrade files xxx failed. The files may be corrupted. Download the files again.

Checks for corrupted run files.

Download the run files again.

Distribution Architecture is not configured properly.

Checks for improperly configured distribution nodes.

In the algosec_conf menu, run option 15 - Distributed Architecture configuration.

[product] version earlier than [version #] found on this machine.

Checks for product versions earlier than two versions before the version you want to upgrade to.

Remove the product run file /root/Algosec_Upgrade/<product run file> or upgrade the product to a version not earlier than two versions before the version you want to upgrade to.

PostgreSQL is not synced between Cluster machine ([machine IP]) and the Primary machine ([machine IP]).

Checks PostgreSQL sync status between cluster machine and Primary.

In the algosec_conf menu, go to option 13 - HA/DR Setup. Select 1 - View cluster status details.

Inconsistencies found between the devices list and database records.

Checks for database inconsistencies.

To fix the inconsistency, see procedure in the knowledge base article: www.algosec.com/r/a32.00/42845777.

FireFlow configuration discrepancy. The FireFlow_configured parameter is set to 'no' but a FireFlow installation .run file was found.

Checks for FireFlow installation inconsistencies.

  • To enable FireFlow: Set the value of the FireFlow_configured parameter to 'yes' in /home/afa/.fa/config. Make sure that the FireFlow and AFA run file build numbers match.

  • To disable FireFlow: Delete the FireFlow installation .run file in the /root/AlgoSec_Upgrade directory.

Migration prerequisites issues:

Text in CLI

Description

How to resolve

(log data about migration prerequisite checks are found in:

CentOS 7 Migration: /var/log/algosec-software-upgrade.log

For CM migration: /var/log/algosec_toolbox/algosec_conf.log)

Version incompatible between nodes

Verifies that all installed products have the same version and build on both the source and the migration target.

Update the source with A32.00 build files that match the build versions installed on the migration target.

For details, see raw version data and values for comparison in the log.

All ASMS services on source node ([source node IP]) must be OK prior to migration 
The following services are not OK:

Checks status of services on migration target.

First, try to restart the services. Run for each service:

algosec_test_service -n <SERVICE NAME> -f

For example: algosec_test_service -n postgresql -f

If services do not restart, contact AlgoSec support.

The following nodes are unreachable from migrated server

Checks if all remote nodes are reachable from migration target.

Open ports. See Required port connections.

For details, see Failed to check connectivity between migration IP {} and dist node {} in the log.

Remote node must be a standalone machine

Verifies that migration target machine is standalone.

The target machine must be a clean machine deployed directly from AlgoSec image installation.

License prerequisite check failed with:
License file is not installed

Verifies license on migration target IP.

Install a license on the target machine. See Migration target license and ASMS licensing.

License prerequisite check failed with:
The target license includes fewer components 
than the source license in the following modules
Verifies license on migration target IP.

Contact AlgoSec to add missing components to the target license. Insufficient license may cause problems.

For details, see Localhost license modules: and Migration [migration IP] license modules: in the log.

Firewall(s) detected filtering traffic between source (x.x.x.x) and target (x.x.x.x) for ports: [ports]

Verifies that traffic on ports 443 (https), 5432 (postgresql), and 9595 (HA/DR) is not filtered between source & migration target.

Check with your IT department that firewalls allow SSH traffic on these ports (bi-directional).

Tip: You can check whether the filter has been removed:

  1. From source to target: run this command in SSH on the source:

    nmap -n -p <port> <ip target> 2>&1 | grep -q filtered && echo "Traffic is blocked" || echo "Traffic is allowed"

  2. From target to source: run this command in SSH on the target:

    nmap -n -p <port> <ip source> 2>&1 | grep -q filtered && echo "Traffic is blocked" || echo "Traffic is allowed"

This only works when this prerequisite finds an issue with the port.
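To probe all three ports in one run, the nmap check can be wrapped in a loop. This is a minimal sketch: the port list comes from the check description (443, 5432, 9595), and the function names are hypothetical. It applies the same rule as the one-line command, so any output that lacks the word filtered (including an error from a missing nmap) is reported as allowed.

```shell
#!/bin/sh
# Ports named in the check description: https, postgresql, HA/DR.
PORTS="443 5432 9595"

# nmap_verdict: read nmap output on stdin and print the same verdict as the
# one-line check ("filtered" anywhere in the output means blocked).
nmap_verdict() {
  if grep -q filtered; then
    echo "Traffic is blocked"
  else
    echo "Traffic is allowed"
  fi
}

# check_ports PEER_IP: run the nmap probe once per port.
check_ports() {
  for port in $PORTS; do
    printf '%s: ' "$port"
    nmap -n -p "$port" "$1" 2>&1 | nmap_verdict
  done
}
```

Run check_ports <ip target> on the source and check_ports <ip source> on the target to cover both directions.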

Traffic from target (x.x.x.x) to source (x.x.x.x) is NATed (x.x.x.x) 
or
 Traffic from source (x.x.x.x) to target (x.x.x.x) is NATed (x.x.x.x) 

Checks for NATed traffic between source and target.

Contact your IT department to disable any NAT configuration between source and target machines.

 

Low bandwidth speed found between local node and [machine IP]: (%s) Mbit/s.
Average speed: 160 Mbit/s
Minimum required: 64 Mbit/s

Checks bandwidth between source and target.

We recommend bandwidth of at least 1Gbit/s (125MB/s). Minimum allowable is 64Mbit/s (8MB/s).

Contact your IT department to determine and adjust, if necessary, your bandwidth between the source and the target machines.

Tip:

Use the following commands to check bandwidth:

  1. Prepare the source machine by entering:

    dd if=/dev/zero of=/data/test-file bs=157286400 count=1 2>&1
    IP=[IP of the target]

  2. Run the test command (if prompted, type the password and press <Enter>):

    scp -o StrictHostKeyChecking=no /data/test-file root@${IP}:/data/test-file

    An example of the output is:

    test-file                          100%  150MB 219.9MB/s   00:00

    The network bandwidth in the example is 219.9MB/s.

  3. Clean up the test files by entering:

    rm -f /data/test-file ; ssh root@${IP} rm -f /data/test-file
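scp reports the rate in megabytes per second, while the thresholds for this check are given in Mbit/s, so a conversion (multiply by 8) is needed before comparing. The following sketch does that arithmetic; the function names are hypothetical, and it assumes the rate is printed in MB/s as in the example output.

```shell
#!/bin/sh
# scp_rate_mbytes: read scp's progress line on stdin and print the MB/s figure.
# Assumes the rate is reported in MB/s, as in the example output.
scp_rate_mbytes() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /MB\/s$/) { sub(/MB\/s$/, "", $i); print $i; exit } }'
}

# mbytes_to_mbits MBPS: convert MB/s to Mbit/s (1 byte = 8 bits).
mbytes_to_mbits() {
  awk -v r="$1" 'BEGIN { printf "%.1f\n", r * 8 }'
}

# classify_bandwidth MBITS: compare against the 64 Mbit/s minimum and the
# 1 Gbit/s (1000 Mbit/s) recommendation.
classify_bandwidth() {
  awk -v b="$1" 'BEGIN {
    if (b < 64) print "below minimum"
    else if (b < 1000) print "below recommended"
    else print "ok"
  }'
}

line='test-file                          100%  150MB 219.9MB/s   00:00'
rate=$(printf '%s\n' "$line" | scp_rate_mbytes)
classify_bandwidth "$(mbytes_to_mbits "$rate")"
```

scp shows its progress meter only when writing to a terminal, so copy the final progress line by hand rather than piping scp into these functions. The example rate of 219.9MB/s corresponds to 1759.2 Mbit/s.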
Not all files in the /var/lib/pgsql/data/ directory are owned by user postgres.

Checks PostgreSQL files.

Reassign ownership of all files in the /var/lib/pgsql/data/ directory to user postgres. Run the following commands:

algosec_test_service -n postgresql -k
chown -R postgres:postgres /var/lib/pgsql/data/
algosec_test_service -n postgresql -s
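To see which files are affected before changing ownership, or to confirm the result afterwards, find can list everything not owned by a given user. This is a minimal sketch with hypothetical function names; it is not part of the prerequisite check itself.

```shell
#!/bin/sh
# list_not_owned DIR USER: print every path under DIR that is not owned by USER.
list_not_owned() {
  find "$1" ! -user "$2" -print
}

# count_not_owned DIR USER: number of such paths (0 means all are owned by USER).
count_not_owned() {
  list_not_owned "$1" "$2" | wc -l | tr -d ' '
}
```

For this check, run list_not_owned /var/lib/pgsql/data/ postgres as root. An empty result after the chown indicates that ownership is now consistent.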
Target machine [IP] could not connect to NAS server [IP] on mount path [/data/...] via NFS4.

Checks NAS server connectivity.

Check connectivity and make sure the NAS is accepting connections from the target machine.

Target machine [IP] has insufficient permission level to NAS server [IP] on mount path [/data/...] via NFS4.

Checks the target machine's permission level on the NAS server.

Configure the NAS server to allow write permissions for the target machine.

NAS server is defined on target but not on source node.

Checks target machine connection to NAS when no NAS is defined for the source.

Disconnect the target node from the NAS.

The mail server [IP] on port [PORT] is unreachable from the target node [IP]

 
Checks connectivity from target node to mail server.

Check with your IT department that traffic is allowed from the target machine to the defined mail server in ASMS.

Tip: To check the connectivity, run the following:

echo 'exit' | timeout --signal=9 5 telnet <IP> <PORT> 2>&1 | grep -q 'Connected' && echo "Traffic to mail server is allowed" || echo "Traffic to mail server is blocked"

Partition (/data) on the remote node [IP] must have at least [amount] MB free space.

This includes the amount of space used on the primary node, plus an additional [amount]%.

You currently only have [amount] MB free space

 

Insufficient disk space on partition (/data) on the remote node [IP].

You currently have [amount] MB free space.

Minimum required: at least [amount] MB free space.

Checks disk space on target node.

Target node must have at least the same disk space as the source node.

Increase disk space as required.

For VMs, see Increase disk space of a new AlgoSec VM.

Machine {machine IP} does not meet the minimal hardware requirements.
Minimum disk speed required: 80 MB/s; Detected: [X] MB/s

Checks target node disk speed.

We recommend disk write speed of at least 300MB/s. Minimum allowable is 80MB/s.

Contact your IT department to determine and adjust, if necessary, your node disk speed.

Tip: Use the following command to check disk speed:

dd if=/dev/zero of=/data/test-big-file.bin bs=786432000 count=1 oflag=dsync 2>&1 ; rm -f /data/test-big-file.bin

An example of the output is:

786432000 bytes (786 MB) copied, 0.624098 s, 1.3 GB/s

Tip: If your target machine is an AlgoSec VM, make sure you are following VM best practices. See Best practices for your AlgoSec VMware Deployment . If you make changes, check your disk speed again to see if it has improved.

Tip: If your target machine is an AlgoSec AMI, make sure you are using the recommended deployment. See Deploy ASMS on AWS.

Detected SSH rate limit through firewall(s) on connection between source (x.x.x.x) and target (x.x.x.x)
Checks SSH traffic between source and target machine.

Ask your IT department to analyze the firewall event logs to determine which firewall(s) reported an SSH brute-force attack.

Disable SSH rate limit on these firewall(s) during the migration procedure.

Postgresql service is not running on source node [IP]. Reports sync cannot continue.
Checks reports sync utility readiness.

Restart postgresql service on source and try syncing reports again.

Tip: To restart the postgresql service, run:

systemctl restart postgresql.service 


Resolve upgrade/migration failures

If the upgrade/migration to A32.00 does not complete successfully, first fix problems as shown in the logs and get help, if required, from AlgoSec support. After this, re-run the process and it will continue from where it left off.

Only consider reverting to the previous version as a last resort. See Revert to previous version when migration is not successful.

Upgrading PostgreSQL 12 failure

If during migration/upgrade you receive the message:

Upgrading PostgreSQL 12 (elapsed: 00:25:47)

******************************************

*** Upgrade failed. Data not migrated. ***

******************************************

Follow the steps in the following AlgoPedia article: Migration issues related to Upgrade to postgresql 12 on A32.0.

Revert to previous version when migration is not successful

If you have fixed problems as shown in the logs, contacted AlgoSec for help, and tried to re-run the upgrade/migration with no success, as a last resort:

  1. Revert your Central Manager (or standalone server) to the previous version and restore your data. See Restore your system data.

  2. Revert any nodes that have been upgraded to A32.00 with CentOS 6 to the same previous version and build as the Central Manager.

Specific steps are based on the appliance type:

  • Virtual and host-based appliances: Revert the machine using the snapshot or deploy the machine using the OVF for the version you created a backup from.

  • Physical appliances:

    • For AlgoSec hardware appliances 2063, 2203, and 2403: Reset the appliance to factory defaults. See Reset the appliance to factory defaults. Then, upgrade to the version and build of the backup you created prior to the upgrade to A32.00. Contact AlgoSec support to retrieve older versions of ASMS.
    • For AlgoSec hardware appliances 2062, 2162, 2322, and 2xx1: Contact AlgoSec support.


Resolve device relocation failures

If during migration of Remote Agents the device relocation fails and you receive the message:

Some devices were defined with an incorrect syslog host.

Do the following:

  1. In ASMS, in the toolbar, select Administration.

  2. In the Administration area, click the DEVICES SETUP tab.

  3. Select the device in the tree and click Edit.

  4. In the Log Collection and Monitoring area, make sure that the Syslog-ng server is configured to the correct localhost.
