Update Avrorouter tutorial

The tutorial now displays a single configuration example which should be easier
to understand.

Added a sub-section about configuring the master server with proper replication
settings and explained the need for the CREATE TABLE statements in the binary
logs.

Display an example Avro JSON schema and provide links to the schema definition
documents. Mention the cdc_schema utility as the second option for creating the
schema files.
This commit is contained in:
Markus Makela
2016-08-12 11:23:20 +03:00
committed by Johan Wikman
parent 54c50c5a52
commit 33fb8effa1


@@ -1,7 +1,8 @@
# Avrorouter Tutorial # Avrorouter Tutorial
This tutorial is a short introduction to the [Avrorouter](../Routers/Avrorouter.md), how to set it up and This tutorial is a short introduction to the
how it interacts with the binlogrouter. [Avrorouter](../Routers/Avrorouter.md), how to set it up and how it interacts
with the binlogrouter.
The avrorouter can also be deployed directly on the master server which removes The avrorouter can also be deployed directly on the master server which removes
the need to use the binlogrouter. This does require a lot more disk space on the need to use the binlogrouter. This does require a lot more disk space on
@@ -16,49 +17,55 @@ over the network.
# Configuration # Configuration
We start by adding two new services into the configuration file. The first ## Preparing the master server
service is the binlogrouter service which will read the binary logs from
the master server. The master server where we will be replicating from needs to have binary logging
enabled, the binary log format set to row based replication and the binary log
row image needs to contain all the changes. These can be enabled by adding the
two following lines to the _my.cnf_ file of the master.
``` ```
binlog_format=row
binlog_row_image=full
```
_You can find out more about replication formats from the [MariaDB Knowledge Base](https://mariadb.com/kb/en/mariadb/binary-log-formats/)_
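As a quick sanity check of the settings above, the following minimal sketch scans the text of a _my.cnf_ file for the two required values. It is an illustration only: the helper name `missing_binlog_settings` is hypothetical, and the assumption that the settings live in the `[mysqld]` section is ours, not something this tutorial states.

```python
import configparser

# The two settings this tutorial asks for on the master.
REQUIRED = {"binlog_format": "row", "binlog_row_image": "full"}


def missing_binlog_settings(my_cnf_text: str, section: str = "mysqld") -> dict[str, str]:
    """Return the required settings that are absent or set to another value.

    Assumption: the options are in the [mysqld] section. This sketch does not
    follow !include or !includedir directives.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False
    )
    parser.read_string(my_cnf_text)

    found = {}
    if parser.has_section(section):
        for key, value in parser.items(section):
            # Option names may be written with dashes or underscores.
            found[key.replace("-", "_")] = (value or "").strip().lower()

    return {name: want for name, want in REQUIRED.items() if found.get(name) != want}


if __name__ == "__main__":
    example = "[mysqld]\nlog-bin=binlog\nbinlog_format=row\nbinlog_row_image=full\n"
    print(missing_binlog_settings(example))
```

An empty result means both settings were found with the expected values. The check only reads the file, so it cannot tell whether the running server has picked the values up.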
## Configuring MaxScale
We start by adding two new services into the configuration file. The first
service is the binlogrouter service which will read the binary logs from the
master server. The second service will read the binlogs as they are streamed
from the master and convert them into Avro format files.
```
# The Replication Proxy service
[replication-service] [replication-service]
type=service type=service
router=binlogrouter router=binlogrouter
router_options=server-id=4000, router_options=server-id=4000,
master-id=3000, master-id=3000,
binlogdir=/home/markusjm/binlogs, binlogdir=/var/lib/maxscale/binlog/,
mariadb10-compatibility=1, mariadb10-compatibility=1,
filestem=binlog
user=maxuser user=maxuser
passwd=maxpwd passwd=maxpwd
```
The second service will read the binlogs as they are streamed from the master
and convert them into Avro format files.
```
# The Avro conversion service # The Avro conversion service
[avro-service] [avro-service]
type=service type=service
router=avrorouter router=avrorouter
source=replication-service source=replication-service
``` router_options=avrodir=/var/lib/maxscale/avro/,
filestem=binlog
You can see that the `source` parameter points to the service we defined before. # The listener for the replication-service
This service will be the data source for the avrorouter.
After the services have been defined, we add the listeners for the _replication-service_
and the _avro-service_.
```
# The listener for the Binlog Server
[replication-listener] [replication-listener]
type=listener type=listener
service=replication-service service=replication-service
protocol=MySQLClient protocol=MySQLClient
port=4000 port=4000
# The client listener for the Avro conversion service # The client listener for the avro-service
[avro-listener] [avro-listener]
type=listener type=listener
service=avro-service service=avro-service
@@ -66,19 +73,70 @@ protocol=CDC
port=4001 port=4001
``` ```
The _CDC_ protocol is a new protocol added with the avrorouter and, at the time You can see that the `source` parameter in the _avro-service_ points to the
of writing, it is the only supported protocol for the avrorouter. _replication-service_ we defined before. This service will be the data source
for the avrorouter. The _filestem_ is the prefix in the binlog files and the
additional _avrodir_ router_option is where the converted Avro files are stored.
For more information on the avrorouter options, read the [Avrorouter Documentation](../Routers/Avrorouter.md).
The _binlogdir_ is the location where the binary logs are stored and After the services were defined, we added the listeners for the
where the avrorouter will read them. The _filestem_ is the name prefix of _replication-service_ and the _avro-service_. The _CDC_ protocol is a new
the binary logs and this should be the same as the `log-bin` value in the master protocol added with the avrorouter and, at the time of writing, it is the only
server. These parameters should be the same for both services. The _avrodir_ is supported protocol for the avrorouter.
where the converted Avro files are stored.
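Because the avrorouter finds its data source by name and each listener names the service it belongs to, a mistyped service name is easy to introduce. The sketch below checks that every `service` and `source` parameter refers to a section of `type=service`. It is not a MaxScale feature: the function `dangling_references` is hypothetical, and Python's `configparser` only approximates how MaxScale reads its configuration file.

```python
import configparser

# A trimmed-down version of the configuration discussed in this tutorial.
EXAMPLE = """
[replication-service]
type=service
router=binlogrouter

[avro-service]
type=service
router=avrorouter
source=replication-service

[replication-listener]
type=listener
service=replication-service
protocol=MySQLClient
port=4000

[avro-listener]
type=listener
service=avro-service
protocol=CDC
port=4001
"""


def dangling_references(config_text: str) -> list[str]:
    """List `service=` and `source=` values that name no service section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # Every section declared with type=service is a valid reference target.
    services = {
        name
        for name in parser.sections()
        if parser.get(name, "type", fallback="") == "service"
    }

    problems = []
    for name in parser.sections():
        for key in ("service", "source"):
            target = parser.get(name, key, fallback=None)
            if target is not None and target not in services:
                problems.append(
                    f"[{name}] {key}={target} does not match any service section"
                )
    return problems


if __name__ == "__main__":
    for problem in dangling_references(EXAMPLE):
        print(problem)
```

For the example configuration the function returns an empty list. Changing a listener's `service` value to a name that has no matching section makes it report that listener.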
# Preparing the data in the master server
Before starting the MaxScale process, we need to make sure that the binary logs
of the master server contain the DDL statements that define the table
layouts. What this means is that the `CREATE TABLE` statements need to be in the
binary logs before the conversion process is started.
If the binary logs contain data modification events for tables that aren't
created in the binary logs, the Avro schema of the table needs to be manually
created. There are two ways to do this:
- Manually create the schema
- Use the [_cdc_schema_ Go utility](../Routers/Avrorouter.md#avro-schema-generator)
All Avro file schemas follow the same general idea. They are in JSON and follow
the following format:
```
{
    "namespace": "MaxScaleChangeDataSchema.avro",
    "type": "record",
    "name": "ChangeRecord",
    "fields":
    [
        {
            "name": "name",
            "type": "string"
        },
        {
            "name": "address",
            "type": "string"
        },
        {
            "name": "age",
            "type": "int"
        }
    ]
}
```
The avrorouter uses the schema file to identify the columns, their names and
their types. The `name` field contains the name of the column and the `type`
field contains the Avro type. Read the [Avro specification](https://avro.apache.org/docs/1.8.1/spec.html)
for details on the layout of the schema files.
All Avro schema files for tables that are not created in the binary logs need to
be in the location pointed to by the _avrodir_ router_option and must use the following naming: `<database>.<table>.<schema_version>.avsc`. For example, the schema file name of the _test.t1_ table would be `test.t1.000001.avsc`.
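For the manual route, a schema file can be written with a few lines of Python. The following is a minimal sketch, not the behaviour of the _cdc_schema_ utility: the helper names are hypothetical, the zero-padding width of the version number is an assumption taken from the `test.t1.000001.avro` data file name used in this tutorial, and the keys are lowercase as in the Avro specification.

```python
import json
from pathlib import Path


def schema_file_name(database: str, table: str, version: int = 1, width: int = 6) -> str:
    """Build `<database>.<table>.<schema_version>.avsc`.

    The padding width is an assumption; match the files already in your avrodir.
    """
    return f"{database}.{table}.{version:0{width}d}.avsc"


def build_schema(columns: dict[str, str]) -> dict:
    """Return an Avro record schema with one field per column.

    `columns` maps column names to Avro type names, for example {"id": "int"}.
    """
    return {
        "namespace": "MaxScaleChangeDataSchema.avro",
        "type": "record",
        "name": "ChangeRecord",
        "fields": [{"name": name, "type": avro_type} for name, avro_type in columns.items()],
    }


def write_schema(avrodir: str, database: str, table: str, columns: dict[str, str]) -> Path:
    """Write the schema file into avrodir and return its path."""
    path = Path(avrodir) / schema_file_name(database, table)
    path.write_text(json.dumps(build_schema(columns), indent=4))
    return path


if __name__ == "__main__":
    # Print the schema for a table with a single INT column named id.
    print(schema_file_name("test", "t1"))
    print(json.dumps(build_schema({"id": "int"}), indent=4))
```

Calling `write_schema` with the directory configured as _avrodir_ places the file where the avrorouter looks for it. Whether a hand-written schema needs fields beyond the table columns is not covered here, so compare the result against a schema generated by MaxScale before relying on it.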
# Starting MariaDB MaxScale # Starting MariaDB MaxScale
The next step is to start MariaDB MaxScale and set up the binlogrouter. We do that by connecting The next step is to start MariaDB MaxScale and set up the binlogrouter. We do
to the MySQL listener of the _replication-service_ service and executing a few commands. that by connecting to the MySQL listener of the _replication-service_ service and
executing a few commands.
``` ```
CHANGE MASTER TO MASTER_HOST='172.18.0.1', CHANGE MASTER TO MASTER_HOST='172.18.0.1',
@@ -96,8 +154,18 @@ This will start the replication of binary logs from the master server at
to the [Binlogrouter](../Routers/Binlogrouter.md) documentation. to the [Binlogrouter](../Routers/Binlogrouter.md) documentation.
After the binary log streaming has started, the avrorouter will automatically After the binary log streaming has started, the avrorouter will automatically
start converting the binlogs into Avro files. You can inspect the Avro files start converting the binlogs into Avro files.
by using the _maxavrocheck_ utility program.
For the purpose of this tutorial, create a simple test table using the following
statement and populate it with some data.
```
CREATE TABLE test.t1 (id INT);
INSERT INTO test.t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
```
This table will be replicated through MaxScale and it will be converted into an
Avro file, which you can inspect by using the _maxavrocheck_ utility program.
``` ```
[markusjm@localhost avrodata]$ ../bin/maxavrocheck test.t1.000001.avro [markusjm@localhost avrodata]$ ../bin/maxavrocheck test.t1.000001.avro