The event distribution error messages were not truly errors but only rare
corner cases that usually don't happen. This doesn't mean that they should
be treated as errors and treating them as warnings is more helpful.
When a slave transitions from catchup mode to up-to-date mode, it is possible
that events are being distributed and the slave appears to be lagging behind.
The added comment explain why it happens and how it is handled to make the code
clearer.
When a slave transitions from catchup mode to up-to-date mode, an error
message is logged because the slave is at an unexpected position. This
error message should not be logged because it is a possible and an expected
situation.
The check for rotate event conditions was wrong which led to false error
message about unexpected binlog file and position combinations.
The position of the last event was reset every time a file was opened which
caused problems when the binlog file was rotated. The slave's current positions
were compared to the position where the last event started and because the
last_event_pos variable didn't point to the rotate event of the previous binlog,
the slave's never got the rotate event.
The name of the binlog file was added to the log message where a slave
is behind the master but the same file is used. This makes debugging the problem
a bit easier.
The decision to send an event to a slave can now only be made in one place.
This will force all events to pass the same checks before they are sent to
the slaves.
The position of the next event to be written was used as the position
of the current event. This caused the checks for the position of the current
safe event to fail and the non-transaction safe version was used.
This only happened with events that are not done inside a transaction i.e.
DDL statements.
The message is logged when a DDL statement is executed. It should not be
logged if trx_safe is on since the current_safe_event should always point
at the event we are sending. The current_safe_event is set to the wrong value
which causes this message to be logged.
Due to the false positives caused by this, the message is removed.
The message now states the location where it was called from and the amount
of events received from the master. In addition to this, new logging was
added when unsafe events are sent to slaves when transaction safety is enabled.
The duplicate event error message now logs the length of the slave's
write queue. This will tell how much data is still buffered inside MaxScale
when duplicate events are detected.
If a duplicate event is detected the state of the slave is set
to BLRS_ERRORED and the connection is closed. That way the
duplicate event will not break the slave, and it will pick
up its state when it reconnects.
When an event is sent to a slave, we store information about the
event and who sent it, so that we can detect if the same event is
sent twice. If a duplicate event is detected, we log information
about it.
With the linker flags "-Wl,-z,defs", all symbols used by a library
are resolved at link-time. Otherwise they will be resolved at runtime.
The use of these flags ensures that missing symbols are found as
early as possible.
Case in point, the binlog router test-cases failed, because the loading
of the binlog router failed due to missing symbols my_uuid_init and
my_uuid. The reason was that when maxscale no longer was linked with
the embedded library, those symbols were not available.
Now we know that the loading of the binlog router will not fail due
to missing symbols.
Binlog router uses my_uuid_init and my_uuid, which are non-public
functions available in the embedded library. Consequently, blr
must currently be linked with the embedded library.
A custom implementation of these functions should be provided, in
order to break that dependency.
If transaction safety was disabled and a large event sent in multiple SQL
packets was received, the distribution of that event to the slaves would fail.
The empty packet sent after a large event which fits into exactly one packet
was written to disk and the writing of no bytes caused it to be treated as
an error.
The router->last_written is used to store the position where the last event was
written. The replication header is also stored in a separate structure in
the router which is used later when the last packet of a multi-packet event
arrives.
The include directories previously used by MaxScale were from the embedded
library. All parts of MaxScale apart from the query classifier now use
the client libraries.