If transaction safety was disabled and a large event sent in multiple SQL
packets was received, the distribution of that event to the slaves would fail.
The empty packet sent after a large event which fits into exactly one packet
was written to disk and the writing of no bytes caused it to be treated as
an error.
The router->last_written is used to store the position where the last event was
written. The replication header is also stored in a separate structure in
the router which is used later when the last packet of a multi-packet event
arrives.
In blr_slave_callback the bits of slave->cstate are reset and
set as one transaction. Earlier they were reset in one and
set in another, leading to a situation where slave->cstate did
not contain a sensible value for a short period of time.
Further, it is now explicitly checked in blr_distribute_binlog_record
that slave->cstate indeed contains a meaningful value.
The unsafe slave position is no longer an error and will be treated the
same way if no events are available i.e. the slaves are no longer disconnected.
The log messages now have more information such as the current committed
transaction event being processed and the number of events sent by the
current thread.
Changed burst_size to long instead of unsigned long.
This way check burst_size > 0 is now effective.
Setting "burstsize" option in router_options may be required.
i.e.: burstsize=10M
Slave request for a log_pos behind binlog file size may result in a
disconnection or replication error:
if binlog file is latest one slave get disconnected otherwise an error
message is returned and replication stops
The binlog file is now always opened when it is needed and closed
when we are finished with it. That will remove any potential
file concurrency issues between different threads dealing with
the same slave.
In blr_open_binlog the refcnt increase of file which is already
open is protected by router->fileslock. In blr_close_binlog the
decrease of the refcnt was protected by file->lock.
This lead to a situation where it was possible that a file was
closed and the file instance freed, even though it just had been
taken into use by somebody else.
This is now fixed by solely using the router->fileslock for protecting
the increase and decrease of the refcnt.
In blr_slave.c under certain conditions, two locks were not released.
That was fixed in another change, and with this change a notice will be
logged if that branch is entered. That way it will be possible to find
out whether this may have been the cause of earlier lock-ups.
Since localtime is not thread-safe it should not be used in multithreaded
contexts. For this reason all calls to localtime were changed to localtime_r
in code where concurrency issues were possible.
Internal tests were left unchanged because they aren't multithreaded.
Claiming that the loading of maxscale.cnf failed in case of any
error was misleading. Maxscale may not succeed in opening it,
reading it or processing it.
Use of skygw_log_write() in ss_dassert and ss_info_dassert replaced
with the use of MXS_ERROR(). In addition, ss_dassert and ss_info_dassert
are now expressions that require a trailing ;.