MXS-3800: Explain lost_slave events
Currently the state change explanations are only added to mariadbmon. They are less relevant for Galera clusters as they themselves explain why they change their states but should still be added to make them easier to analyze.

The event that isn't explained and is most often encountered is the loss of a Slave status. Most often the loss of a Slave status happens because either the IO thread or the SQL thread has stopped. Printing the states of the threads as well as the latest error should hint at what caused the outage.

The information can be added to the REST API in 2.5 where the monitors can add extra information to the server JSON.
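The idea described above, comparing the slave connections seen in the previous monitor loop with the current ones and reporting thread states plus the latest error, could be sketched roughly as follows. This is not the code from this commit: `SlaveConn`, its fields and `explain_changed_connections()` are hypothetical, simplified stand-ins, and the thread states are reduced to booleans even though the real `Slave_IO_Running` column can hold more values than Yes and No.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one row of SHOW ALL SLAVES STATUS.
struct SlaveConn
{
    std::string name;        // Connection name
    bool        io_running;  // Simplified: true when Slave_IO_Running is "Yes"
    bool        sql_running; // Simplified: true when Slave_SQL_Running is "Yes"
    std::string last_error;  // Latest IO or SQL error text, empty if none
};

static const char* yes_no(bool running)
{
    return running ? "Yes" : "No";
}

// Describes every connection whose IO or SQL thread state differs from the
// previous loop. Returns an empty string when nothing changed.
std::string explain_changed_connections(const std::vector<SlaveConn>& old_conns,
                                        const std::vector<SlaveConn>& new_conns)
{
    std::string rval;
    for (const auto& cur : new_conns)
    {
        for (const auto& old : old_conns)
        {
            bool changed = old.io_running != cur.io_running
                || old.sql_running != cur.sql_running;
            if (old.name == cur.name && changed)
            {
                if (!rval.empty())
                {
                    rval += "; ";
                }
                rval += "'" + cur.name + "': Slave_IO_Running: " + yes_no(cur.io_running)
                    + ", Slave_SQL_Running: " + yes_no(cur.sql_running);
                if (!cur.last_error.empty())
                {
                    rval += ", Last_Error: " + cur.last_error;
                }
            }
        }
    }
    return rval;
}
```

A monitor could append the returned string to the log message it writes when a server loses its Slave status, and skip it when the string is empty.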
@@ -136,6 +136,7 @@ public:
    GtidList m_gtid_current_pos; /* Gtid of latest event. */
    GtidList m_gtid_binlog_pos; /* Gtid of latest event written to binlog. */
    SlaveStatusArray m_slave_status; /* Data returned from SHOW (ALL) SLAVE(S) STATUS */
    SlaveStatusArray m_old_slave_status; /* Data from the previous loop */
    NodeData m_node; /* Replication topology data */

    /* Replication lag of the server. Used during calculation so that the actual SERVER struct is
@@ -169,6 +170,8 @@ public:
     */
    std::string diagnostics() const;

    std::string print_changed_slave_connections();

    void update_server(bool time_to_update_disk_space,
                       const mxs::MonitorServer::ConnectionSettings& conn_settings);