This tutorial is a short introduction to the
[Avrorouter](../Routers/Avrorouter.md), how to set it up and how it interacts
with the binlogrouter.

The avrorouter can also be deployed directly on the master server, which removes
the need to use the binlogrouter. This does require a lot more disk space on the
master server as both the binlogs and the Avro format files are stored there. It
is recommended to deploy the avrorouter and the binlogrouter on a remote server
so that the data streaming process has a minimal effect on performance.

The first part configures the services and sets them up for the binary log to
Avro file conversion. The second part of this tutorial uses the client listener
interface for the avrorouter and shows how to communicate with the service
over the network.

# Preparing the master server

The master server where we will be replicating from needs to have binary logging
enabled, `binlog_format` set to `row` and `binlog_row_image` set to
`full`. These can be enabled by adding the two following lines to the _my.cnf_
file of the master.

```
binlog_format=row
binlog_row_image=full
```

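After restarting the master, a quick way to verify that the settings are in
effect is to query them from a client:

```
SELECT @@global.binlog_format, @@global.binlog_row_image;
```

The query should return `ROW` and `FULL`.
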
# Configuring MaxScale

The two services needed for the conversion are defined in the MaxScale
configuration file: the _replication-service_ reads the binary logs from the
master with the binlogrouter and the _avro-service_ converts them into Avro
files.

```
# The Replication Proxy service
[replication-service]
type=service
router=binlogrouter
# (other binlogrouter parameters, such as binlogdir, elided)
user=maxuser
passwd=maxpwd

# The Avro conversion service
[avro-service]
type=service
router=avrorouter
source=replication-service
filestem=binlog
start_index=15

# The listener for the replication-service
[replication-listener]
type=listener
service=replication-service
protocol=MySQLClient
port=3306

# The client listener for the avro-service
[avro-listener]
type=listener
service=avro-service
protocol=CDC
port=4001

# The MaxAdmin listener for the avro-service
[maxadmin-listener]
type=listener
service=avro-service
protocol=maxscaled
socket=default
```

The `source` parameter in the _avro-service_ points to the _replication-service_
we defined before. This service will be the data source for the avrorouter. The
_filestem_ is the prefix in the binlog files and _start_index_ is the binlog
number to start from. With these parameters, the avrorouter will start reading
events from binlog `binlog.000015`.

After the services were defined, we added the listeners for the
_replication-service_ and the _avro-service_. The _CDC_ protocol is a new
protocol added with the avrorouter and is the only protocol the avrorouter
supports.

Note that the _filestem_ and _start_index_ must point to the file that is the
first binlog that the binlogrouter will replicate. For example, if the first
file you are replicating is `my-binlog-file.001234`, set the parameters to
`filestem=my-binlog-file` and `start_index=1234`.

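The binlog file name is simply the _filestem_ followed by a dot and the
zero-padded six-digit index. A tiny illustration in Python (a hypothetical
helper, not part of MaxScale):

```python
def first_binlog(filestem, start_index):
    # Binlog file names are <filestem>.<6-digit, zero-padded index>.
    return "%s.%06d" % (filestem, start_index)

assert first_binlog("binlog", 15) == "binlog.000015"
assert first_binlog("my-binlog-file", 1234) == "my-binlog-file.001234"
```
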
For more information on the avrorouter options, read the
[Avrorouter Documentation](../Routers/Avrorouter.md).

# Preparing the data in the master server

The avrorouter builds the Avro schemas from the _CREATE TABLE_ statements it
sees in the replication stream, so the table creation statements need to be in
the binary logs before the conversion process is started.

If the binary logs contain data modification events for tables that aren't
created in the binary logs, the Avro schema of the table needs to be manually
created. There are multiple ways to do this:

- Dump the database to a slave, configure it to replicate from the master and
point MaxScale to this slave (this is the recommended method as it requires no
extra steps)
- Manually create the schema
- Use the [_cdc_schema_ Go utility](../Routers/Avrorouter.md#avro-schema-generator)
and copy the generated .avsc files to the _avrodir_
- Use the [Python version of the schema generator](../../server/modules/protocol/examples/cdc_schema.py)
and copy the generated .avsc files to the _avrodir_

All Avro file schemas follow the same general idea. They are in JSON and use
the following format:

```
{
    "namespace": "MaxScaleChangeDataSchema.avro",
    "type": "record",
    "name": "ChangeRecord",
    "fields":
    [
        {
            "name": "name",
            "type": "string",
            "real_type": "varchar",
            "length": 200
        },
        {
            "name": "address",
            "type": "string",
            "real_type": "varchar",
            "length": 200
        },
        {
            "name": "age",
            "type": "int",
            "real_type": "int",
            "length": -1
        }
    ]
}
```

The avrorouter uses the schema file to identify the columns, their names and
what type they are. The _name_ field contains the name of the column and the
_type_ contains the Avro type. Read the
[Avro specification](https://avro.apache.org/docs/1.8.1/spec.html)
for details on the layout of the schema files.

If you used the schema generator scripts, all Avro schema files for tables that
are not created in the binary logs need to be in the location pointed to by the
_avrodir_ parameter. The files use the following naming:
`<database>.<table>.<schema_version>.avsc`. For example, the schema file name of
the _test.t1_ table would be `test.t1.0000001.avsc`.

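If you create the schema by hand, a short script saves some typing. The
following is a minimal sketch that writes a schema file following the layout
and naming convention described above; the column list is supplied by hand and
the script is a hypothetical example, not the bundled _cdc_schema.py_ utility.

```python
#!/usr/bin/env python3
"""Write a MaxScale-compatible Avro schema (.avsc) file for one table."""
import json

def write_schema(database, table, columns, version=1):
    # The fields follow the MaxScaleChangeDataSchema.avro layout shown above.
    schema = {
        "namespace": "MaxScaleChangeDataSchema.avro",
        "type": "record",
        "name": "ChangeRecord",
        "fields": columns,
    }
    # Naming convention: <database>.<table>.<schema_version>.avsc
    filename = "%s.%s.%07d.avsc" % (database, table, version)
    with open(filename, "w") as f:
        json.dump(schema, f, indent=4)
    return filename

if __name__ == "__main__":
    # test.t1 from this tutorial has a single INT column named "id".
    columns = [{"name": "id", "type": "int", "real_type": "int", "length": -1}]
    print(write_schema("test", "t1", columns))
```

Running it produces `test.t1.0000001.avsc`, which can then be copied to the
_avrodir_.
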
# Starting MariaDB MaxScale

The next step is to start MariaDB MaxScale and configure the binlogrouter. The
binlogrouter is set up by connecting to its listener with a MySQL client and
executing a few commands.

```
CHANGE MASTER TO MASTER_HOST='172.18.0.1',
       MASTER_PORT=3000,
       MASTER_LOG_FILE='binlog.000015',
       MASTER_LOG_POS=4,
       MASTER_USER='maxuser',
       MASTER_PASSWORD='maxpwd';

START SLAVE;
```

**NOTE:** GTID replication is not currently supported and file-and-position
replication must be used.

This will start the replication of binary logs from the master server at
172.18.0.1 listening on port 3000. The first file that the binlogrouter
replicates is `binlog.000015` and `MASTER_LOG_POS=4` points at the first event
in the file, right after the binlog file header. This is the same file that was
configured as the starting file in the avrorouter.

For more details about the SQL commands, refer to the
[Binlogrouter](../Routers/Binlogrouter.md) documentation.

After the binary log streaming has started, the avrorouter will automatically
start processing the binlogs.

# Creating and Processing Data

Next, create a simple test table and populate it with some data by executing
the following statements.

```
CREATE TABLE test.t1 (id INT);
INSERT INTO test.t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
```

This table will be replicated through MaxScale and it will be converted into an
Avro file, which you can inspect by using the _maxavrocheck_ utility program.

```
[markusjm@localhost avrodata]$ ../bin/maxavrocheck test.t1.000001.avro
File sync marker: caaed7778bbe58e701eec1f96d7719a
/home/markusjm/build/avrodata/test.t1.000001.avro: 1 blocks, 1 records and 12 bytes
```

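The generated files are standard Avro object container files, so any
Avro-capable tool can read them as well. For example, with the third-party
_fastavro_ Python package (an illustration, not part of MaxScale):

```python
# Requires the third-party fastavro package: pip install fastavro
from fastavro import reader

# Print every change record stored in the converted Avro file.
with open("test.t1.000001.avro", "rb") as avro_file:
    for record in reader(avro_file):
        print(record)
```
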
To use the _cdc.py_ command line client to connect to the CDC service, we must first
create a user. This can be done via maxadmin by executing the following command.

```
maxadmin call command cdc add_user avro-service maxuser maxpwd
```

This will create the _maxuser:maxpwd_ credentials which can then be used to
request a JSON data stream of the `test.t1` table that was created earlier.

```
cdc.py -u maxuser -p maxpwd -h 127.0.0.1 -P 4001 test.t1
```

The output is a stream of JSON events describing the changes done to the
database.

```
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "name": "ChangeRecord", "fields": [{"name": "domain", "type": "int"}, {"name": "server_id", "type": "int"}, {"name": "sequence", "type": "int"}, {"name": "event_number", "type": "int"}, {"name": "timestamp", "type": "int"}, {"name": "event_type", "type": {"type": "enum", "name": "EVENT_TYPES", "symbols": ["insert", "update_before", "update_after", "delete"]}}, {"name": "id", "type": "int", "real_type": "int", "length": -1}]}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 1, "timestamp": 1537429419, "event_type": "insert", "id": 1}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 2, "timestamp": 1537429419, "event_type": "insert", "id": 2}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 3, "timestamp": 1537429419, "event_type": "insert", "id": 3}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 4, "timestamp": 1537429419, "event_type": "insert", "id": 4}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 5, "timestamp": 1537429419, "event_type": "insert", "id": 5}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 6, "timestamp": 1537429419, "event_type": "insert", "id": 6}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 7, "timestamp": 1537429419, "event_type": "insert", "id": 7}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 8, "timestamp": 1537429419, "event_type": "insert", "id": 8}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 9, "timestamp": 1537429419, "event_type": "insert", "id": 9}
{"domain": 0, "server_id": 3000, "sequence": 11, "event_number": 10, "timestamp": 1537429419, "event_type": "insert", "id": 10}
```

The first record is always the JSON format schema for the table describing the
types and names of the fields. All records that follow it represent the changes
that have happened on the database.
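
Because each line of the output is a self-contained JSON object, the stream is
easy to consume programmatically. The following is a minimal sketch that reads
a captured stream on standard input, treating the first record as the schema
and the rest as change events (a hypothetical consumer, not part of MaxScale):

```python
#!/usr/bin/env python3
"""Minimal consumer for a captured MaxScale CDC stream on stdin."""
import json
import sys

schema = None

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    record = json.loads(line)
    if schema is None:
        # The first record is always the JSON schema of the table.
        schema = record
        print("columns:", ", ".join(f["name"] for f in schema["fields"]))
    else:
        # Every following record is a change event on the table.
        print("%s #%d: id=%s" % (record["event_type"],
                                 record["event_number"],
                                 record.get("id")))
```

For example, assuming the sketch is saved as `consume_cdc.py`:
`cdc.py -u maxuser -p maxpwd -h 127.0.0.1 -P 4001 test.t1 | python3 consume_cdc.py`.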