Infoworks High Availability Tool

The Infoworks High Availability (HA) tool provides an active-passive setup for high availability of the Infoworks product. With Infoworks installed on two different nodes, and a configuration declaring the installation details, the HA tool continuously monitors the active installation and continuously syncs metadata to the passive installation. If any service on the active node stops for any reason, the HA tool first tries to restart the service directly on the active node. If multiple restart attempts fail, the tool initiates failover to the passive setup, switching the roles of the two installations.
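
The retry-then-failover behavior described above can be sketched as a simple monitoring loop (a minimal illustration with hypothetical helper names and threshold; not the actual Infoworks implementation):

```python
import time

MAX_RESTART_ATTEMPTS = 3  # assumed limit; the real threshold is internal to the tool

def run_ha_loop(check_health, restart_service, failover, interval_sec=10):
    """Poll the active node; retry in-place restarts before failing over."""
    failures = 0
    while True:
        if check_health():          # all services healthy on the active node?
            failures = 0
        else:
            failures += 1
            if failures <= MAX_RESTART_ATTEMPTS:
                restart_service()   # first, try restarting the failed service in place
            else:
                failover()          # then switch roles: passive becomes active
                return
        time.sleep(interval_sec)
```

The callbacks stand in for the tool's health checks, service restarts, and role switch; the loop exits after `failover()` because the roles of the two installations have changed.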

IMPORTANT

Failover implies a complete transition of operation from the active to the passive node, not just of the failed service. Since the passive node is in continuous sync with the active node, all metadata from the active node exists on the passive node with a latency of a few seconds. However, any jobs that were running on the active node at the time of failover are abandoned. These jobs can be restarted manually on the new active node.

The HA tool itself can be installed anywhere (on the active node, on the passive node, or on a completely different node). See the Configuration section below for more details.

Infoworks Metadata (MongoDB) HA

Infoworks achieves metadata HA using a MongoDB replica set. Replication provides redundancy and increases data availability: with multiple copies of the data on different database servers, a replica set provides a level of fault tolerance against the loss of a single database server.

When a primary does not communicate with the other members of the set for more than the configured electionTimeoutMillis period (10 seconds by default), an eligible secondary calls for an election to nominate itself as the new primary. The cluster attempts to complete the election of a new primary and resume normal operations.

A replica set can have up to 50 members, but only 7 voting members. If the replica set already has 7 voting members, additional members must be non-voting members.

Fault Tolerance

No. of Members | Majority Required to Elect New Primary | Fault Tolerance
3 | 2 | 1
4 | 3 | 1
5 | 3 | 2
6 | 4 | 2
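
The table follows from MongoDB's majority rule: a strict majority of voting members is required to elect a primary, so fault tolerance is the number of members that can be lost while a majority remains:

```python
def majority(n):
    """Voting members required to elect a new primary."""
    return n // 2 + 1

def fault_tolerance(n):
    """Members that can fail while elections remain possible."""
    return n - majority(n)

for n in (3, 4, 5, 6):
    print(n, majority(n), fault_tolerance(n))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
```

Note that adding a fourth or sixth member does not increase fault tolerance, which is why odd-sized replica sets are the common choice.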

Setting Up Mongo HA

Prerequisites

  • IW_USER must be present on remote machines (secondary nodes).
  • SSH_USER must have su permissions to IW_USER.
  • Any running instances of Mongo on the remote node must be stopped.

Procedure

  • Navigate to the <IW_HOME>/bin folder.
  • Run the following command: mongo-ha-setup.sh

The following occurs:

Pre-execution

  • A backup of the conf.properties file will be created at <IW_HOME>/temp/conf.properties.YYYY-MM-DD-HH-MM.
  • All Infoworks services will be stopped.
  • (Optional) A tar file of the Mongo directory will be created in the <IW_HOME>/temp/<DATE>-pre directory.
  • A prompt will be displayed to enter the SSH user, the path to the PEM file, and the Mongo port (defaults to 27017).

User Creation

  • An oplogger user will be created with read access to the local database.
  • An iw-ha admin user will be created, if it does not exist.

Replica-Set Creation

NOTE: Ensure that the replica set configurations, replSet and keyFile, exist with correct data in the Mongo configuration files on the secondary and tertiary nodes while the HA setup is in progress.

  • A backup tar of the Mongo directory will be created in the <IW_HOME>/temp/<DATE> directory.
  • A prompt will be displayed to enter the IPs/hostnames of the other two nodes.
  • The backup tar will be copied from the primary node to the secondary nodes, and Mongo will be started on the secondary nodes.
  • Once Mongo is up and running on the remote nodes, the replica set is initiated and the remote nodes are added to it.

Validation

  • On successful installation, the replica set will be online.
  • Remove the backup in the <IW_HOME>/temp/<DATE> folder.
  • Restart Infoworks services using the following steps:
      • Navigate to <IW_HOME>/bin.
      • Run the following commands:
          • ./start.sh all
          • ./start.sh orchestrator

NOTE: If the Mongo HA node is in an unrecoverable state, perform the procedure mentioned in Resync Member of Replica Set.

Starting/Stopping/Monitoring Mongo HA

Ensure that the following files are available in the <IW_HOME>/bin directory before setting up Mongo HA:

NOTE: Ensure that the latest versions of the mongoha_start.sh and mongo-ha-start-stop.sh files are downloaded.

  • mongo-ha-reset.sh – used to reset a Mongo HA node to a non-HA node. This script does not impact the other remote machines.
  • mongo-ha-setup.sh, mongoha_start.sh – used to setup the Mongo HA.
  • mongo-ha-start-stop.sh – used to start/stop Mongo remotely from the edge node.
  • Usage: mongo-ha-start-stop.sh {host} {stop/start}
  • mongo_start.sh, mongo_stop.sh – used to start/stop Mongo locally. Available in the <IW_HOME>/resources/mongodb/bin directory.
  • status.sh – used to monitor the status of all Infoworks services including Mongo replica. Check for the MongoDB Replica parameter.

Infoworks Platform Services HA

A load balancer, Nginx, has been introduced for setting up HA for platform services. Earlier, communication with the platform services was performed directly, without a load balancer. Nginx now routes requests for the platform services across multiple instances. This ensures that the platform services are independent across multiple edge nodes.
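
The kind of routing involved can be illustrated with a minimal Nginx sketch (host names and ports are hypothetical; the actual configuration is generated by the Infoworks HA setup):

```nginx
upstream platform_services {
    # platform service instances on two edge nodes (hypothetical addresses)
    server edge-node-1:3001;
    server edge-node-2:3001;
}

server {
    listen 8080;
    location / {
        # route each request to one of the platform service instances
        proxy_pass http://platform_services;
    }
}
```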

Infoworks RabbitMQ HA

RabbitMQ can now be deployed in active-active clustering mode. This ensures continuous accessibility of the Infoworks services that use RabbitMQ.

Infoworks Postgres FailOver

Postgres can now be deployed in hot-standby mode with asynchronous replication. This ensures that the Postgres data is continuously available on the standby host.
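
A hot-standby setup of this kind typically involves settings like the following (illustrative only; the actual configuration is handled by the Infoworks HA setup, and exact file locations vary by Postgres version):

```
# postgresql.conf on the master: allow WAL streaming to the standby
wal_level = replica
max_wal_senders = 3

# postgresql.conf on the standby: serve read-only queries while replaying WAL
hot_standby = on

# on the standby: connection back to the master for asynchronous replication
primary_conninfo = 'host=master-node port=5432 user=replicator'
```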

If the master node goes down, promote the standby host to the master node using the following steps:

  • Navigate to the <IW_HOME>/bin/infoworks-ha-ansible folder.
  • Run the following command: ./postgres-failover.sh
  • Postgres will then run in standalone mode.
  • Set up Postgres HA again with the current (promoted) master node. The setup procedure is described below.

Setting Up RabbitMQ, Platform Services HA and Postgres FailOver

Prerequisites

Ensure the following:

  • IW_USER is present on remote machines (secondary nodes).
  • The remote machine is identical to the existing edge node (must be part of the cluster).
  • SSH_USER has su permissions to IW_USER.
  • Infoworks Home directory is created and write permissions are provided for IW_USER.

Procedure

  • Navigate to the <IW_HOME>/bin/infoworks-ha-ansible folder.
  • Run the following command: ./setup-iw-ha.sh
  • Set HA for RabbitMQ, Postgres, and Platform services to true, as required.
  • Enter the host details for the selected HA services.
  • Enter the ssh details for the hosts.
  • Copy the required files to the remote hosts using Infoworks tool.
  • All Infoworks services are stopped.
  • Enter the su password; if no password is set, press Enter.
  • The installation procedure starts.
  • After successful installation, HA is set up for the selected services and the Infoworks services are started.

High Availability Tool

Setting Up the HA Tool

To set up the HA tool, follow these steps:

  • Set up configuration in <IW_HOME>/conf/ha.conf (see below for configurations).
  • Start Infoworks services on the two nodes in active and passive modes.
  • Start the HA process (refer to Starting the HA Process).
  • Check logs to ensure operation.

Configuring the HA Tool

The HA configurations are stored in IW_HOME/conf/ha.conf.

The following shows a sample HA configuration:

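A minimal illustrative sketch, using only parameters referenced elsewhere in this document (all values are hypothetical; consult the shipped ha.conf for the full set of options):

```
# Role of this installation: active or passive
host_type=active

# Health check interval; files sync every
# health_status_check_interval_sec * file_sync_counter seconds
health_status_check_interval_sec=30
file_sync_counter=10

# Extra paths to mirror from active to passive (recursive overwrite)
file_sync_whitelist=<IW_HOME>/lib/extras

# Options passed to rsync during file synchronization
rsync_options=-az

# INI/CFG files to sync key-by-key, and keys to exclude from the sync
conf_file_sync_whitelist=<IW_HOME>/conf/conf.properties
conf_file_keys_blacklist=<node-specific keys, e.g. private IPs>
```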

Starting HA Process

Before continuing, ensure that the tool is set up as described in Setting Up the HA Tool.

Run the following commands:

  • cd IW_HOME/bin
  • ./start.sh ha

Checking Status

Run the following commands:

  • cd IW_HOME/bin
  • ./status.sh

Stopping HA Process

Run the following commands:

  • cd IW_HOME/bin
  • ./stop.sh ha

Starting/Stopping Infoworks on Active Node

To manually stop and start Infoworks on the active node, run the following commands:

  • cd IW_HOME/bin
  • ./stop.sh all orchestrator
  • ./start.sh all orchestrator

On the node running the HA tool, edit ha.conf, set host_type to the correct value, and restart the HA tool.

General Notes on HA

  • Rsync utility must be installed on the machine where HA is set up for file synchronization.
  • Files and jars must be modified only on the active host. Synchronization is a one-way process from active to passive.
  • File synchronization will not be performed while a failover is in progress. Synchronization occurs as long as the active host is healthy. The synchronization interval is health_status_check_interval_sec * file_sync_counter seconds, configured in the ha.conf file; for example, with health_status_check_interval_sec=30 and file_sync_counter=10, files are synchronized every 300 seconds.
  • The file_sync_whitelist parameter overwrites the complete content from active to passive host based on the path specified. If the path is a folder, a recursive overwrite of all folders and files within the folder will occur.
  • Extra files in the passive host will not be removed by default. This can be controlled using the rsync_options parameter.
  • INI or CFG files can also be synchronized using the conf_file_sync_whitelist parameter. This option reads the content of the file and excludes the keys provided in the conf_file_keys_blacklist parameter. By default, $IW_HOME/conf/conf.properties is synchronized in this mode, as it contains edge node private IPs, which are excluded during synchronization.
  • Any extra keys added to the conf_file_keys_blacklist parameter will not be synchronized; their synchronization must be ensured manually.
  • Any running Infoworks jobs will be stopped during a failover and must be restarted manually on the new active host.
  • By default, the logs will not be replicated. If required, the folder can be added to the file_sync_whitelist parameter.
  • Postgres replication is not available.
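
The blacklist-aware config synchronization described above can be sketched as follows (a minimal illustration with hypothetical key names; not the actual Infoworks implementation):

```python
# Sketch: copy key=value pairs from the active host's conf file to the
# passive host's conf file, but keep the passive host's own values for
# blacklisted keys so node-specific settings (e.g. private IPs) survive.

def sync_conf(active_lines, passive_lines, key_blacklist):
    """Merge active conf into passive conf, preserving blacklisted keys."""
    passive = {}
    for line in passive_lines:
        if "=" in line and not line.lstrip().startswith("#"):
            k, v = line.split("=", 1)
            passive[k.strip()] = v.strip()
    merged = []
    for line in active_lines:
        if "=" in line and not line.lstrip().startswith("#"):
            k, v = line.split("=", 1)
            k = k.strip()
            if k in key_blacklist:
                # keep the passive host's own value for blacklisted keys
                merged.append(f"{k}={passive.get(k, v.strip())}")
                continue
        merged.append(line)
    return merged

active = ["iw_port=2999", "edge_node_ip=10.0.0.1"]    # hypothetical keys
passive = ["iw_port=2999", "edge_node_ip=10.0.0.2"]
print(sync_conf(active, passive, {"edge_node_ip"}))
# → ['iw_port=2999', 'edge_node_ip=10.0.0.2']
```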

File Synchronization

Infoworks File Synchronisation utility ensures automatic synchronization of files between the active and passive host in the HA configuration. This utility does not require any manual change to the configuration files or custom jars on the passive host after a failover.

Synchronization of configuration files, custom jars, and post-hook scripts is handled automatically. Other configuration files or jars, if any, can be specified in the ha.conf file.

Setting Up File Synchronization

In the ha.conf file, uncomment the following content:

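An illustrative sketch of the kind of entries to uncomment, using only parameter names from this document (paths and values are hypothetical placeholders):

```
# file_sync_whitelist=<IW_HOME>/conf/custom,<IW_HOME>/lib/custom-jars
# conf_file_sync_whitelist=<IW_HOME>/conf/conf.properties
# conf_file_keys_blacklist=<node-specific keys, e.g. private IPs>
```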