MXS-2141: Retry query on master if it times out on slave

With causal_reads enabled, the query would return with an error if the
slave was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result is
always returned to the client.
Markus Mäkelä
2018-11-06 12:25:57 +02:00
parent c661f5e838
commit e56372b153
4 changed files with 34 additions and 6 deletions

@@ -477,11 +477,11 @@ SELECT * FROM test.t1 WHERE id = 1;
 The `SET` command will synchronize the slave to a certain logical point in
 the replication stream (see
 [MASTER_GTID_WAIT](https://mariadb.com/kb/en/library/master_gtid_wait/)
-for more details). If the slave has not caught up to the master within the
-configured time, an error will be returned. To the client side
-application, this will appear as an error on the statement that they were
-performing. This is caused by the fact that the synchronization command is
-executed with the original command as a multi-statement command.
+for more details).
+If the slave has not caught up to the master within the configured time, it will
+be retried on the master. In MaxScale 2.3.0 an error was returned to the client
+when the slave timed out.
 ### `causal_reads_timeout`
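
The documentation text describes the synchronization command being sent together with the original statement as one multi-statement command. A minimal sketch of how such a combined statement could be assembled with `snprintf`, in the spirit of `add_prefix_wait_gtid`. The format string `kGtidWaitStmt` and the function `make_causal_read` are hypothetical names for illustration; the actual `gtid_wait_stmt` used by MaxScale is not part of this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical format string: the real gtid_wait_stmt is not shown in this diff.
static const char kGtidWaitStmt[] = "SELECT %s('%s', %s); ";

// Build "<wait statement><original query>" as a single multi-statement string.
std::string make_causal_read(const std::string& wait_func,
                             const std::string& gtid_position,
                             const std::string& timeout,
                             const std::string& query)
{
    // First pass: ask snprintf how many characters the prefix needs.
    int len = std::snprintf(nullptr, 0, kGtidWaitStmt,
                            wait_func.c_str(), gtid_position.c_str(), timeout.c_str());
    assert(len >= 0);

    // Second pass: format into a buffer with room for the terminating NUL.
    std::string prefix(static_cast<std::size_t>(len) + 1, '\0');
    std::snprintf(&prefix[0], prefix.size(), kGtidWaitStmt,
                  wait_func.c_str(), gtid_position.c_str(), timeout.c_str());
    prefix.resize(static_cast<std::size_t>(len));

    return prefix + query;
}
```

Calling `make_causal_read("MASTER_GTID_WAIT", "0-1-100", "10", "SELECT * FROM test.t1 WHERE id = 1")` yields the wait call followed by the original query. The sketch only shows the concatenation; it does not show how a timed-out wait is turned into an error that the router can detect.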

@@ -1023,6 +1023,9 @@ GWBUF* RWSplitSession::add_prefix_wait_gtid(SERVER* server, GWBUF* origin)
 snprintf(prefix_sql, prefix_len, gtid_wait_stmt, wait_func, gtid_position, gtid_wait_timeout);
 GWBUF* prefix_buff = modutil_create_query(prefix_sql);
+// Copy the original query in case it fails on the slave
+m_current_query.copy_from(origin);
 /* Trim origin to sql, Append origin buffer to the prefix buffer */
 uint8_t header[MYSQL_HEADER_LEN];
 gwbuf_copy_data(origin, 0, MYSQL_HEADER_LEN, header);
@@ -1075,6 +1078,9 @@ bool RWSplitSession::handle_got_target(GWBUF* querybuf, SRWBackend& target, bool
     // Perform the causal read only when the query is routed to a slave
     send_buf = add_prefix_wait_gtid(target->server(), send_buf);
     m_wait_gtid = WAITING_FOR_HEADER;
+    // The storage for causal reads is done inside add_prefix_wait_gtid
+    store = false;
 }
 if (m_qc.load_data_state() != QueryClassifier::LOAD_DATA_ACTIVE
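
The "Trim origin to sql, Append origin buffer to the prefix buffer" step in `add_prefix_wait_gtid` can be pictured as follows. This is a sketch on plain byte vectors, assuming the standard MySQL wire format (3-byte little-endian payload length, 1-byte sequence number, then the `COM_QUERY` command byte `0x03`). The helpers `make_query_packet` and `prepend_sql` are hypothetical and do not use the `GWBUF` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const std::size_t kHeaderLen = 4;   // 3 bytes payload length + 1 byte sequence
const std::uint8_t kComQuery = 0x03;

// Build a COM_QUERY packet: header, command byte, SQL text.
std::vector<std::uint8_t> make_query_packet(const std::string& sql)
{
    std::size_t payload = 1 + sql.size();   // command byte + SQL
    assert(payload < 0xffffff);             // single-packet queries only

    std::vector<std::uint8_t> pkt;
    pkt.push_back(static_cast<std::uint8_t>(payload & 0xff));
    pkt.push_back(static_cast<std::uint8_t>((payload >> 8) & 0xff));
    pkt.push_back(static_cast<std::uint8_t>((payload >> 16) & 0xff));
    pkt.push_back(0);                       // sequence number of a new command
    pkt.push_back(kComQuery);
    pkt.insert(pkt.end(), sql.begin(), sql.end());
    return pkt;
}

// Strip header and command byte from the original packet, then emit one new
// packet whose SQL is the prefix followed by the original SQL.
std::vector<std::uint8_t> prepend_sql(const std::vector<std::uint8_t>& origin,
                                      const std::string& prefix_sql)
{
    assert(origin.size() > kHeaderLen && origin[kHeaderLen] == kComQuery);
    std::string sql(origin.begin() + kHeaderLen + 1, origin.end());
    return make_query_packet(prefix_sql + sql);
}
```

Because the payload length lives in the header, the combined packet needs a freshly computed header; the original one cannot be reused unchanged.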

@@ -275,7 +275,7 @@ GWBUF* RWSplitSession::discard_master_wait_gtid_result(GWBUF* buffer)
 else if (MYSQL_GET_COMMAND(header_and_command) == MYSQL_REPLY_ERR)
 {
     // The MASTER_WAIT_GTID command failed and no further packets will come
-    m_wait_gtid = NONE;
+    m_wait_gtid = RETRYING_ON_MASTER;
 }
 return buffer;
@@ -524,6 +524,10 @@ void RWSplitSession::manage_transactions(SRWBackend& backend, GWBUF* writebuf)
 }
 }
 }
+else if (m_wait_gtid == RETRYING_ON_MASTER)
+{
+    // We're retrying the query on the master and we need to keep the current query
+}
 else
 {
     /** Normal response, reset the currently active query. This is done before
@@ -575,6 +579,23 @@ void RWSplitSession::clientReply(GWBUF* writebuf, DCB* backend_dcb)
 mxb_assert(backend->get_reply_state() == REPLY_STATE_DONE);
 MXS_INFO("Reply complete, last reply from %s", backend->name());
+if (m_wait_gtid == RETRYING_ON_MASTER)
+{
+    m_wait_gtid = NONE;
+    // Discard the error
+    gwbuf_free(writebuf);
+    writebuf = NULL;
+    // Retry the query on the master
+    GWBUF* buf = m_current_query.release();
+    buf->hint = hint_create_route(buf->hint, HINT_ROUTE_TO_MASTER, NULL);
+    retry_query(buf, 0);
+    // Stop the response processing early
+    return;
+}
 ResponseStat& stat = backend->response_stat();
 stat.query_ended();
 if (stat.is_valid() && (stat.sync_time_reached()
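
Taken together, the hunks in this file form a small state machine: an error on the GTID wait moves the session to `RETRYING_ON_MASTER`, the stored query is kept alive, and once the slave's reply is complete the error is dropped and the query is sent to the master. A self-contained sketch of that flow follows. `SketchSession` and its members are hypothetical stand-ins, not the real `RWSplitSession`, and the success path is simplified to a direct return to `NONE`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Mirrors the enum values listed in the header hunk of this commit.
enum class WaitGtid { NONE, WAITING_FOR_HEADER, RETRYING_ON_MASTER, UPDATING_PACKETS };

// Hypothetical stand-in for the router session.
struct SketchSession
{
    WaitGtid wait_gtid = WaitGtid::NONE;
    std::string current_query;                 // copy kept in case the slave fails
    std::vector<std::string> sent_to_master;   // queries retried on the master
    std::vector<std::string> sent_to_client;   // replies forwarded to the client

    // A causal read is routed to a slave: remember the query, wait for the result.
    void route_to_slave(const std::string& query)
    {
        current_query = query;
        wait_gtid = WaitGtid::WAITING_FOR_HEADER;
    }

    // The first packet of the slave's response tells whether the wait failed.
    void on_wait_result(bool is_error)
    {
        assert(wait_gtid == WaitGtid::WAITING_FOR_HEADER);
        // Simplification: the real success path is not shown in this diff.
        wait_gtid = is_error ? WaitGtid::RETRYING_ON_MASTER : WaitGtid::NONE;
    }

    // The slave's reply is complete.
    void on_reply_complete(const std::string& reply)
    {
        if (wait_gtid == WaitGtid::RETRYING_ON_MASTER)
        {
            wait_gtid = WaitGtid::NONE;
            // Discard the error and retry the stored query on the master.
            sent_to_master.push_back(std::move(current_query));
            current_query.clear();
            return;   // stop response processing early
        }
        // Normal response: reset the active query and forward the reply.
        current_query.clear();
        sent_to_client.push_back(reply);
    }
};
```

The client never sees the slave's error in this sketch: only the reply to a query that did not time out reaches `sent_to_client`.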

@@ -73,6 +73,7 @@ public:
 {
     NONE,
     WAITING_FOR_HEADER,
+    RETRYING_ON_MASTER,
     UPDATING_PACKETS
 };