Prerequisites v4
Before configuring a Failover Manager cluster, you must satisfy these prerequisites.
Install Java 11 (or later)
Before using Failover Manager, you must install Java (version 11 or later). Failover Manager is tested with OpenJDK, and we strongly recommend using that distribution of Java. Installation instructions for Java are platform specific.
Note
There's a temporary issue with OpenJDK version 11 on RHEL and its derivatives. When starting Failover Manager, you might see an error like the following:
java.lang.Error: java.io.FileNotFoundException: /usr/lib/jvm/java-11-openjdk-11.0.20.0.8-2.el8.x86_64/lib/tzdb.dat (No such file or directory)
If you see this message, the workaround is to manually install the missing package using the command sudo dnf install tzdata-java.
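As a minimal sketch of how the Java 11 minimum could be checked before installing, the following script parses the first line of java -version output. The function names are hypothetical, and the parsing assumes the usual version "x.y.z" banner that OpenJDK prints.

```shell
#!/usr/bin/env bash

# Extract the major version from the first line of `java -version` output.
# Handles both the legacy "1.8.0_392" scheme and the modern "11.0.20" scheme.
java_major_version() {
    local version_line="$1"
    local version
    # Pull out the quoted version string, e.g. 11.0.20
    version=$(printf '%s\n' "$version_line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        1.*) version=${version#1.} ;;   # legacy scheme: 1.8.0_392 -> 8.0_392
    esac
    # Keep only the digits before the first '.', '_' or '-'
    printf '%s\n' "${version%%[._-]*}"
}

# Succeeds when the major version meets the Java 11 minimum.
java_is_supported() {
    local major
    major=$(java_major_version "$1")
    [ -n "$major" ] && [ "$major" -ge 11 ]
}

# Check the local JVM, if one is on the PATH (java -version writes to stderr).
if command -v java >/dev/null 2>&1; then
    first_line=$(java -version 2>&1 | head -n 1)
    if java_is_supported "$first_line"; then
        echo "Java $(java_major_version "$first_line") is new enough"
    else
        echo "Java 11 or later is required; found: $first_line"
    fi
fi
```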
Provide an SMTP server
You can receive notifications from Failover Manager as specified by a user-defined notification script, by email, or both.
- If you're using email notifications, an SMTP server must be running on each node of the Failover Manager scenario.
- If you provide a value in the script.notification property, you can leave the user.email field blank. An SMTP server isn't required.
If an event occurs, Failover Manager invokes the script (if provided) and can also send a notification email to any email addresses specified in the user.email parameter of the cluster properties file. For more information about using an SMTP server, see the Red Hat deployment guide.
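The rule above (an SMTP server is needed only when user.email is set) can be sketched as a small check against a cluster properties file. This is an illustration under stated assumptions: the helper names are hypothetical, the file path in the usage comment is a placeholder, and the parser handles only simple key=value lines.

```shell
#!/usr/bin/env bash

# Print the value of a key from a properties file made of key=value lines.
# Simplified: ignores line continuations and the "key: value" form.
# If the key appears more than once, the last occurrence wins.
prop_value() {
    local file="$1" key="$2"
    awk -v key="$key" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            k = substr(line, 1, eq - 1)
            sub(/[ \t]+$/, "", k)
            if (k == key) {
                v = substr(line, eq + 1)
                sub(/^[ \t]+/, "", v)
                sub(/[ \t]+$/, "", v)
                value = v
            }
        }
        END { print value }
    ' "$file"
}

# Succeeds when user.email is non-empty, that is, when email notifications
# are in use and an SMTP server is therefore needed on the node.
smtp_required() {
    [ -n "$(prop_value "$1" user.email)" ]
}

# Usage (placeholder path; substitute your own cluster properties file):
#   if smtp_required /path/to/efm.properties; then
#       echo "user.email is set: make sure an SMTP server is running"
#   fi
```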
Configure streaming replication
Failover Manager requires that PostgreSQL streaming replication be configured between the primary node and the standby nodes. Failover Manager doesn't support other types of replication.
On database versions 11 or earlier, unless specified with the -sourcenode option, a recovery.conf file is copied from a random standby node to the stopped primary during switchover. Ensure that the paths in the recovery.conf file on your standby nodes are consistent before performing a switchover. For more information about the -sourcenode option, see Promoting a Failover Manager node.
On database version 12 or later, the primary_conninfo and restore_command properties are copied from a random standby node to the stopped primary during switchover unless otherwise specified with the -sourcenode option.
Modify pg_hba.conf
You must modify pg_hba.conf on the primary and standby nodes, adding entries that allow communication between all of the nodes in the cluster. This example shows entries you might make to the pg_hba.conf file on the primary node:
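As a minimal sketch, the following script prints entries of that form for the loopback address and two node addresses. The node addresses (taken from the 192.0.2.0/24 documentation range), the scram-sha-256 method, and the helper name are assumptions for illustration; substitute the addresses and authentication method your cluster actually uses.

```shell
#!/usr/bin/env bash

# Print one host entry in pg_hba.conf column order:
# TYPE  DATABASE  USER  ADDRESS  METHOD
hba_entry() {
    local database="$1" user="$2" address="$3" method="$4"
    printf 'host\t%s\t%s\t%s\t%s\n' "$database" "$user" "$address" "$method"
}

# Loopback plus two hypothetical node addresses; replace with your own.
for address in 127.0.0.1/32 192.0.2.11/32 192.0.2.12/32; do
    hba_entry fmdb efm "$address" scram-sha-256
done
```

The output can be reviewed and then appended to pg_hba.conf on each node.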
Where:
efm specifies the name of a valid database user.
fmdb specifies the name of a database to which the efm user can connect.
By default, the pg_hba.conf file resides in the data directory under your Postgres installation. After modifying the pg_hba.conf file, you must reload the configuration file on each node for the changes to take effect. You can use the following command:
# systemctl reload edb-as-<x>
Where x specifies the Postgres version.
Using autostart for the database servers
If a primary node restarts, Failover Manager might detect that the database is down on the primary node and promote a standby node to the role of primary. If this happens, the Failover Manager agent on the restarted primary node doesn't get a chance to write the recovery.conf file that would prevent the database server from starting. In this case, the rebooted primary node returns to the cluster as a second primary node.
To prevent this condition, ensure that the Failover Manager agent starts automatically before the database server. The agent starts in idle mode and checks whether there's already a primary in the cluster. If there's a primary node, the agent verifies that a recovery.conf or standby.signal file exists. If neither file exists, the agent creates the recovery.conf file.
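The file check described above can be sketched as follows. This only illustrates the logic and is not the agent's actual implementation; the function name and the data directory path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash

# Succeeds when the data directory contains neither recovery.conf nor
# standby.signal, which is the case where the agent would create recovery.conf.
missing_standby_marker() {
    local data_dir="$1"
    [ ! -e "$data_dir/recovery.conf" ] && [ ! -e "$data_dir/standby.signal" ]
}

# Usage (placeholder path; substitute your own data directory):
#   if missing_standby_marker /path/to/data; then
#       echo "No recovery.conf or standby.signal found"
#   fi
```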
Ensure communication through firewalls
If a Linux firewall (that is, iptables) is enabled on the host of a Failover Manager node, you might need to add rules to the firewall configuration that allow TCP communication between the Failover Manager processes in the cluster. For example:
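A rule of this kind might take the following shape. The sketch only prints the iptables command so that it can be reviewed before being run as root; the helper name is hypothetical, and the port should match the one in your cluster properties file.

```shell
#!/usr/bin/env bash

# Print (rather than run) the iptables command that accepts inbound TCP
# traffic on the given port.
iptables_accept_cmd() {
    local port="$1"
    printf 'iptables -I INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
}

iptables_accept_cmd 7800
```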
This command opens port 7800. Failover Manager connects through the port specified in the cluster properties file.
Ensure that the database user has sufficient privileges
The database user specified by the db.user property in the efm.properties file must have sufficient privileges to invoke the following functions on behalf of Failover Manager:
pg_current_wal_lsn()
pg_last_wal_replay_lsn()
pg_wal_replay_resume()
pg_wal_replay_pause()
If the reconfigure.num.sync or reconfigure.sync.primary property is set to true, then:
- For database versions 11 and later, the db.user requires the pg_read_all_stats privilege and permission to run pg_reload_conf().
For detailed information about each of these functions, see the PostgreSQL core documentation.
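As a rough sketch, a superuser could generate the corresponding GRANT statements with a script like the one below. It prints SQL only and doesn't connect to a database. The helper name and the postgres superuser in the usage comment are assumptions, efm stands in for your db.user, and some of these functions may already be executable by default in your Postgres version.

```shell
#!/usr/bin/env bash

# Print GRANT statements for a role; nothing is executed against a database.
# pg_reload_conf() and pg_read_all_stats matter only when one of the
# reconfigure.* properties mentioned above is set to true.
efm_function_grants_sql() {
    local role="$1" fn
    for fn in pg_current_wal_lsn pg_last_wal_replay_lsn \
              pg_wal_replay_resume pg_wal_replay_pause pg_reload_conf; do
        printf 'GRANT EXECUTE ON FUNCTION %s() TO "%s";\n' "$fn" "$role"
    done
    printf 'GRANT pg_read_all_stats TO "%s";\n' "$role"
}

# Review the statements, then pipe them to psql as a superuser, for example:
#   efm_function_grants_sql efm | psql -U postgres -d postgres
efm_function_grants_sql efm
```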
If the update.physical.slots.period property is used, then the db.user requires the REPLICATION privilege. A database superuser can provide the permissions needed:
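One way to do this, sketched under the assumption that the db.user is named efm, is to generate an ALTER ROLE statement and run it as a superuser. The helper name and the postgres superuser in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash

# Print the SQL that gives a role the REPLICATION attribute.
replication_sql() {
    local role="$1"
    printf 'ALTER ROLE "%s" WITH REPLICATION;\n' "$role"
}

# Review the statement, then pipe it to psql as a superuser, for example:
#   replication_sql efm | psql -U postgres -d postgres
replication_sql efm
```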
The user must also have permissions to read the values of configuration variables. A database superuser can use the PostgreSQL GRANT command to provide the permissions needed:
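A minimal sketch of such a grant, again assuming the db.user is named efm, prints a GRANT of the pg_read_all_settings role. The helper name and the postgres superuser in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash

# Print the SQL that lets a role read all configuration variables.
read_settings_sql() {
    local role="$1"
    printf 'GRANT pg_read_all_settings TO "%s";\n' "$role"
}

# Review the statement, then pipe it to psql as a superuser, for example:
#   read_settings_sql efm | psql -U postgres -d postgres
read_settings_sql efm
```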
For more information about pg_read_all_settings, see the PostgreSQL core documentation.