Introduction

Multi-node Squirro deployments include the Squirro Cluster service, which elects a master node and, if the master node is lost, ensures that the newly elected master has the latest persisted data from the previous master.

If the Squirro Cluster service fails to do this for MySQL/MariaDB during a failover, follow the operations guide below to repair the broken replication manually.
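To confirm that a node's replication is actually broken before starting, you can inspect `SHOW SLAVE STATUS\G` on that node. The helper below is a hypothetical sketch (the name `check_replication` is not part of Squirro). It assumes the MariaDB field names `Slave_IO_Running` and `Slave_SQL_Running` and reports whether both replication threads are running. An empty result usually means the node is not configured as a slave at all, which is what you would expect on the master.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Squirro): reads the output of
# `SHOW SLAVE STATUS\G` on stdin and reports the replication state.
# Exit codes: 0 = both threads running, 1 = broken, 2 = no slave configured.
# Example on a node (prompts for the MariaDB password):
#   mysql -u root -p -e 'SHOW SLAVE STATUS\G' | check_replication
check_replication() {
    awk '
        $1 == "Slave_IO_Running:"  { io = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END {
            if (io == "" && sql == "") { print "no replication configured"; exit 2 }
            if (io == "Yes" && sql == "Yes") { print "replication OK"; exit 0 }
            printf "replication BROKEN (IO=%s, SQL=%s)\n", io, sql
            exit 1
        }'
}
```

If the output is "replication BROKEN", the `Last_IO_Error` and `Last_SQL_Error` fields of the same `SHOW SLAVE STATUS\G` output usually explain why.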

Replication repair steps

The guide assumes that you have SSH access to the Squirro servers and the password to access MariaDB.

...

Identify the current master

For each broken slave, stop the Squirro Cluster service:

Code Block
sudo systemctl stop sqclusterd
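If you prefer to run this from a single admin host, a loop over SSH can stop the service on every broken slave and confirm that it is no longer active. This is a hypothetical sketch: `stop_cluster_service`, `SSH_USER`, `DRY_RUN` and the host addresses are placeholders you supply, not Squirro tooling. With `DRY_RUN=1` it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch: stop sqclusterd on each broken slave given as an
# argument, then verify that systemd no longer reports it as active.
# SSH_USER is a placeholder; set DRY_RUN=1 to print commands instead.
stop_cluster_service() {
    for host in "$@"; do
        # `! systemctl is-active --quiet` succeeds only if the unit is not active.
        cmd="sudo systemctl stop sqclusterd && ! systemctl is-active --quiet sqclusterd"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh -t ${SSH_USER}@${host} '${cmd}'"
        else
            # -t allocates a terminal so sudo can prompt for a password.
            ssh -t "${SSH_USER}@${host}" "${cmd}" || {
                echo "failed to stop sqclusterd on ${host}" >&2
                return 1
            }
        fi
    done
}
```

For example, `DRY_RUN=1 stop_cluster_service 10.0.0.2 10.0.0.3` lists the two SSH commands so you can review them before running them for real.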

...

Ensure you have a running system with one cluster node.

...

On the master:

Code Block
$ mysql -u root -p
mysql> RESET MASTER;
Query OK, 0 rows affected (0.14 sec)

mysql> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |    12268 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

# Note: ending this session releases the global read lock taken above.
mysql> exit

$ mkdir -p /var/lib/squirro/cluster/mysql/<date>
$ mysqldump --host=127.0.0.1 --port=3306 --user=cluster --password="${PASSWORD}" --all-databases --master-data --single-transaction --result-file=/var/lib/squirro/cluster/mysql/<date>/dump.db

$ mysql -u root -p
# Has no effect if the session holding the lock has already ended.
mysql> UNLOCK TABLES;
mysql> exit

$ cd /var/lib/squirro/cluster/mysql/<date>/
$ gzip dump.db
$ scp dump.db.gz "${SSH_USER}@${IP_SLAVE1}:/tmp/"
$ scp dump.db.gz "${SSH_USER}@${IP_SLAVE2}:/tmp/"
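Before restoring the dump on a slave, it can be worth checking that the copied archive is intact and noting which binlog coordinates it carries. Because the dump was taken with `--master-data`, mysqldump writes a `CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...;` statement into it. On a quiet master these coordinates usually match the `SHOW MASTER STATUS` output above; where they differ, the ones recorded in the dump are consistent with its data. The helper below is a hypothetical sketch (the name `dump_coordinates` is illustrative only) that uses `gzip -t` for the integrity check and `sed` to extract the file and position.

```shell
#!/bin/sh
# Hypothetical helper: verify a gzipped mysqldump archive and print the
# "<binlog file> <position>" recorded by --master-data.
dump_coordinates() {
    dump="$1"
    # gzip -t tests the archive's integrity without extracting it.
    gzip -t "$dump" || { echo "corrupt or missing archive: $dump" >&2; return 1; }
    # Take the first line that names MASTER_LOG_FILE/MASTER_LOG_POS; a leading
    # "-- " (as written by --master-data=2) is tolerated by the ".*" prefix.
    gzip -dc "$dump" \
        | sed -n "s/.*MASTER_LOG_FILE='\([^']*\)', *MASTER_LOG_POS=\([0-9]*\).*/\1 \2/p" \
        | head -n 1
}
```

For example, running `dump_coordinates /tmp/dump.db.gz` on a slave after the copy should print something like `mysql-bin.000001 12268`.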

On each slave:

...

This page can now be found at Fixing MySQL/MariaDB Replication on the Squirro Docs site.