MXS-3800: Explain lost_slave events

Currently, state change explanations are only added to mariadbmon. They are less relevant for Galera clusters, which explain their own state changes, but they should still be added there to make those changes easier to analyze. The most commonly encountered event that lacks an explanation is the loss of the Slave status, which usually happens because either the IO thread or the SQL thread has stopped. Printing the states of both threads along with the latest error should hint at what caused the outage. In 2.5, where monitors can add extra information to the server JSON, the same information can also be added to the REST API.
2021-10-18 08:59:14 +03:00
parent 136d0271df
commit 0bf5641d80
8 changed files with 84 additions and 16 deletions
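As a rough illustration of the idea in the commit message, the sketch below shows how a monitor might override an `annotate_state_change` hook to report the replication thread states and the latest error. Only the names `annotate_state_change` and `log_state_change` come from this commit. Everything else (`ReplicationStatus`, the simplified `MonitorServer` and `Monitor`, `ReplicationMonitor`, `describe_state_change`, and the message format) is a hypothetical stand-in and not the actual MaxScale implementation.

```cpp
#include <string>

// Hypothetical replication state, e.g. as read from SHOW SLAVE STATUS.
// The field names are assumptions made for this sketch.
struct ReplicationStatus
{
    bool        io_running = true;
    bool        sql_running = true;
    std::string last_error;
};

// Hypothetical, heavily simplified stand-in for mxs::MonitorServer.
struct MonitorServer
{
    std::string       name;
    ReplicationStatus repl;
};

// Simplified stand-in for the monitor base class.
class Monitor
{
public:
    virtual ~Monitor() = default;

    // Builds the text that a log_state_change(reason) call could log.
    std::string describe_state_change(MonitorServer* server)
    {
        std::string msg = "Server changed state: " + server->name;
        std::string reason = annotate_state_change(server);

        if (!reason.empty())
        {
            msg += " (" + reason + ")";
        }

        return msg;
    }

protected:
    // Default hook: no extra information is available.
    virtual std::string annotate_state_change(MonitorServer*)
    {
        return "";
    }
};

// Hypothetical monitor that explains the loss of the Slave status.
class ReplicationMonitor : public Monitor
{
protected:
    std::string annotate_state_change(MonitorServer* server) override
    {
        const ReplicationStatus& r = server->repl;

        if (r.io_running && r.sql_running)
        {
            return "";      // Both threads are running, nothing to explain
        }

        std::string reason = "IO thread: ";
        reason += r.io_running ? "running" : "stopped";
        reason += ", SQL thread: ";
        reason += r.sql_running ? "running" : "stopped";

        if (!r.last_error.empty())
        {
            reason += ", last error: " + r.last_error;
        }

        return reason;
    }
};
```

Under these assumptions, calling `describe_state_change` on a `ReplicationMonitor` for a server whose IO thread has stopped appends the thread states and the last error in parentheses. The base `Monitor` returns the message without any annotation, which mirrors the empty-string default of the hook added in this commit.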
--- a/include/maxscale/monitor.hh
+++ b/include/maxscale/monitor.hh
@@ -212,7 +212,7 @@ public:
     */
    mxs_monitor_event_t get_event_type() const;

-    void log_state_change();
+    void log_state_change(const std::string& reason);

    /**
     * Is this server ok to update disk space status. Only checks if the server knows of valid disk space
@@ -516,6 +516,19 @@ protected:

    bool server_status_request_waiting() const;

+    /**
+     * Returns the human-readable reason why the server changed state
+     *
+     * @param server The server that changed state
+     *
+     * @return The human-readable reason why the state change occurred or
+     *         an empty string if no information is available
+     */
+    virtual std::string annotate_state_change(mxs::MonitorServer* server)
+    {
+        return "";
+    }
+
    /**
     * Contains monitor base class settings. Since monitors are stopped before a setting change,
     * the items cannot be modified while a monitor is running. No locking required.