			The avrorouter now uses the parameters from the source service. This removes the need for redundant parameter definition in the avrorouter service when they are defined in the binlogrouter service as parameters. Added some missing configuration sanity checks and updated the tutorial to reflect the new configuration method introduced in 2.1.
		
			
				
	
	
		
			209 lines
		
	
	
		
			6.7 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
| # Avrorouter Tutorial
 | |
| 
 | |
| This tutorial is a short introduction to the
 | |
| [Avrorouter](../Routers/Avrorouter.md), how to set it up and how it interacts
 | |
| with the binlogrouter.
 | |
| 
 | |
| The avrorouter can also be deployed directly on the master server, which removes
 | |
| the need to use the binlogrouter. This does require a lot more disk space on the
 | |
| master server as both the binlogs and the Avro format files are stored there. It
 | |
| is recommended to deploy the avrorouter and the binlogrouter on a remote server
 | |
| so that the data streaming process has a minimal effect on performance.
 | |
| 
 | |
| The first part configures the services and sets them up for the binary log to Avro
 | |
| file conversion. The second part of this tutorial uses the client listener
 | |
| interface for the avrorouter and shows how to communicate with the service
 | |
| over the network.
 | |
| 
 | |
| 
 | |
| 
 | |
| # Configuration
 | |
| 
 | |
| ## Preparing the master server
 | |
| 
 | |
| The master server where we will be replicating from needs to have binary logging
 | |
| enabled, the binary log format set to row based replication and the binary log
 | |
| row image needs to contain all the changes. These can be enabled by adding the
 | |
| two following lines to the _my.cnf_ file of the master.
 | |
| 
 | |
| ```
 | |
| binlog_format=row
 | |
| binlog_row_image=full
 | |
| ```
 | |
| 
 | |
| _You can find out more about replication formats from the
 | |
| [MariaDB Knowledge Base](https://mariadb.com/kb/en/mariadb/binary-log-formats/)_
 | |
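The two settings can be verified with a small script before replication is started. The following is a minimal sketch and not part of MaxScale or MariaDB: `missing_binlog_settings` is a hypothetical helper that scans the text of a _my.cnf_ file line by line and reports which of the two required settings are absent or have a different value. It does not handle trailing comments on a setting line.

```python
# Hypothetical helper for illustration; not shipped with MaxScale or MariaDB.
REQUIRED = {"binlog_format": "row", "binlog_row_image": "full"}


def missing_binlog_settings(cnf_text):
    """Return the required settings that are absent or set to another value."""
    found = {}
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and section headers such as [mysqld].
        if not line or line[0] in "#;[" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Option files accept dashes and underscores interchangeably in names.
        found[key.strip().replace("-", "_").lower()] = value.strip().lower()
    return sorted(k for k, v in REQUIRED.items() if found.get(k) != v)


sample = "[mysqld]\nlog_bin=binlog\nbinlog_format=row\nbinlog_row_image=full\n"
print(missing_binlog_settings(sample))
```

An empty list means both settings are present with the values this tutorial expects.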
| 
 | |
| ## Configuring MaxScale
 | |
| 
 | |
| We start by adding two new services into the configuration file. The first
 | |
| service is the binlogrouter service which will read the binary logs from the
 | |
| master server. The second service will read the binlogs as they are streamed
 | |
| from the master and convert them into Avro format files.
 | |
| 
 | |
| ```
 | |
| # The Replication Proxy service
 | |
| [replication-service]
 | |
| type=service
 | |
| router=binlogrouter
 | |
| server_id=4000
 | |
| master_id=3000
 | |
| filestem=binlog
 | |
| user=maxuser
 | |
| passwd=maxpwd
 | |
| 
 | |
| # The Avro conversion service
 | |
| [avro-service]
 | |
| type=service
 | |
| router=avrorouter
 | |
| source=replication-service
 | |
| 
 | |
| # The listener for the replication-service
 | |
| [replication-listener]
 | |
| type=listener
 | |
| service=replication-service
 | |
| protocol=MySQLClient
 | |
| port=3306
 | |
| 
 | |
| # The client listener for the avro-service
 | |
| [avro-listener]
 | |
| type=listener
 | |
| service=avro-service
 | |
| protocol=CDC
 | |
| port=4001
 | |
| 
 | |
| # The MaxAdmin service and listener for MaxScale administration
 | |
| [CLI]
 | |
| type=service
 | |
| router=cli
 | |
| 
 | |
| [CLI Listener]
 | |
| type=listener
 | |
| service=CLI
 | |
| protocol=maxscaled
 | |
| socket=default
 | |
| ```
 | |
| 
 | |
| You can see that the `source` parameter in the _avro-service_ points to the
 | |
| _replication-service_ we defined before.  This service will be the data source
 | |
| for the avrorouter. The _filestem_ is the prefix in the binlog files.  For more
 | |
| information on the avrorouter options, read the
 | |
| [Avrorouter Documentation](../Routers/Avrorouter.md).
 | |
| 
 | |
| After the services were defined, we added the listeners for the
 | |
| _replication-service_ and the _avro-service_. The _CDC_ protocol is a new
 | |
| protocol added with the avrorouter and it is the only supported protocol for the
 | |
| avrorouter.
 | |
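The relationship between the two services and the listener can be checked mechanically. The sketch below is an illustration only and is not how MaxScale validates its configuration: `check_avro_config` is a hypothetical function that uses Python's standard `configparser` module to confirm that the `source` of every avrorouter service names a binlogrouter service and that listeners of an avrorouter service use the _CDC_ protocol. It assumes the configuration parses as plain INI text, as the example above does.

```python
import configparser

# A shortened copy of the configuration used in this tutorial.
CONFIG = """
[replication-service]
type=service
router=binlogrouter
server_id=4000
master_id=3000
filestem=binlog

[avro-service]
type=service
router=avrorouter
source=replication-service

[avro-listener]
type=listener
service=avro-service
protocol=CDC
port=4001
"""


def check_avro_config(text):
    """Return a list of problems found in the avrorouter setup (hypothetical check)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    problems = []
    for name in cfg.sections():
        sec = cfg[name]
        if sec.get("type") == "service" and sec.get("router") == "avrorouter":
            source = sec.get("source")
            # Without a source there is nothing to cross-check.
            if source is not None:
                if source not in cfg:
                    problems.append(f"{name}: source '{source}' is not defined")
                elif cfg[source].get("router") != "binlogrouter":
                    problems.append(f"{name}: source '{source}' is not a binlogrouter service")
        if sec.get("type") == "listener":
            service = sec.get("service")
            if service and service in cfg \
                    and cfg[service].get("router") == "avrorouter" \
                    and sec.get("protocol") != "CDC":
                problems.append(f"{name}: listeners of '{service}' must use the CDC protocol")
    return problems


print(check_avro_config(CONFIG))
```

An empty list means the cross-references in the sample configuration are consistent.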
| 
 | |
| # Preparing the data in the master server
 | |
| 
 | |
| Before starting the MaxScale process, we need to make sure that the binary logs
 | |
| of the master server contain the DDL statements that define the table
 | |
| layouts. What this means is that the `CREATE TABLE` statements need to be in the
 | |
| binary logs before the conversion process is started.
 | |
| 
 | |
| If the binary logs contain data modification events for tables that aren't
 | |
| created in the binary logs, the Avro schema of the table needs to be manually
 | |
| created. There are three ways to do this:
 | |
| 
 | |
| - Manually create the schema
 | |
| - Use the [_cdc_schema_ Go utility](../Routers/Avrorouter.md#avro-schema-generator)
 | |
| - Use the [Python version of the schema generator](../../server/modules/protocol/examples/cdc_schema.py)
 | |
| 
 | |
| All Avro file schemas follow the same general idea. They are in JSON and follow
 | |
| the following format:
 | |
| 
 | |
| ```
 | |
| {
 | |
|     "namespace": "MaxScaleChangeDataSchema.avro",
 | |
|     "type": "record",
 | |
|     "name": "ChangeRecord",
 | |
|     "fields":
 | |
|     [
 | |
|         {
 | |
|             "name": "name",
 | |
|             "type": "string",
 | |
|             "real_type": "varchar",
 | |
|             "length": 200
 | |
|         },
 | |
|         {
 | |
|             "name":"address",
 | |
|             "type":"string",
 | |
|             "real_type": "varchar",
 | |
|             "length": 200
 | |
|         },
 | |
|         {
 | |
|             "name":"age",
 | |
|             "type":"int",
 | |
|             "real_type": "int",
 | |
|             "length": -1
 | |
|         }
 | |
|     ]
 | |
| }
 | |
| ```
 | |
| 
 | |
| The avrorouter uses the schema file to identify the columns, their names and
 | |
| what type they are. The _name_ field contains the name of the column and the
 | |
| _type_ contains the Avro type. Read the [Avro specification](https://avro.apache.org/docs/1.8.1/spec.html)
 | |
| for details on the layout of the schema files.
 | |
| 
 | |
| All Avro schema files for tables that are not created in the binary logs need to
 | |
| be in the location pointed to by the _avrodir_ parameter and must use the
 | |
| following naming: `<database>.<table>.<schema_version>.avsc`. For example, the
 | |
| schema file name of the _test.t1_ table would be `test.t1.000001.avsc`.
 | |
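For the manual route, a schema in the format shown above can be produced with a few lines of Python. This is a minimal sketch and not a replacement for the _cdc_schema_ tools: `build_schema` and `schema_path` are hypothetical helpers, the column list is the one from the example schema, and the version is passed in as the already zero-padded string that should appear in the file name.

```python
import json
import os


def build_schema(columns):
    """Build a schema dict from (name, avro_type, real_type, length) tuples."""
    return {
        "namespace": "MaxScaleChangeDataSchema.avro",
        "type": "record",
        "name": "ChangeRecord",
        "fields": [
            {"name": name, "type": avro_type, "real_type": real_type, "length": length}
            for name, avro_type, real_type, length in columns
        ],
    }


def schema_path(avrodir, database, table, version):
    """Compose <avrodir>/<database>.<table>.<schema_version>.avsc.

    `version` is used verbatim, so pass it zero-padded as in the naming rule.
    """
    return os.path.join(avrodir, f"{database}.{table}.{version}.avsc")


# Columns taken from the example schema in this tutorial.
columns = [
    ("name", "string", "varchar", 200),
    ("address", "string", "varchar", 200),
    ("age", "int", "int", -1),
]
schema = build_schema(columns)
print(json.dumps(schema, indent=4))
```

Writing the result of `json.dumps` to the path returned by `schema_path` places the schema where the avrorouter looks for it.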
| 
 | |
| # Starting MariaDB MaxScale
 | |
| 
 | |
| The next step is to start MariaDB MaxScale and set up the binlogrouter. We do
 | |
| that by connecting to the MySQL listener of the _replication-service_ and
 | |
| executing a few commands.
 | |
| 
 | |
| ```
 | |
| CHANGE MASTER TO MASTER_HOST='172.18.0.1',
 | |
|        MASTER_PORT=3000,
 | |
|        MASTER_LOG_FILE='binlog.000001',
 | |
|        MASTER_LOG_POS=4,
 | |
|        MASTER_USER='maxuser',
 | |
|        MASTER_PASSWORD='maxpwd';
 | |
| 
 | |
| START SLAVE;
 | |
| ```
 | |
| 
 | |
| This will start the replication of binary logs from the master server at
 | |
| 172.18.0.1:3000. For more details about these commands, refer
 | |
| to the [Binlogrouter](../Routers/Binlogrouter.md) documentation.
 | |
| 
 | |
| After the binary log streaming has started, the avrorouter will automatically
 | |
| start converting the binlogs into Avro files.
 | |
| 
 | |
| For the purpose of this tutorial, create a simple test table using the following
 | |
| statement and populate it with some data.
 | |
| 
 | |
| ```
 | |
| CREATE TABLE test.t1 (id INT);
 | |
| INSERT INTO test.t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
 | |
| ```
 | |
| 
 | |
| This table will be replicated through MaxScale and it will be converted into an
 | |
| Avro file, which you can inspect by using the _maxavrocheck_ utility program.
 | |
| 
 | |
| ```
 | |
| [markusjm@localhost avrodata]$ ../bin/maxavrocheck test.t1.000001.avro
 | |
| File sync marker: caaed7778bbe58e701eec1f96d7719a
 | |
| /home/markusjm/build/avrodata/test.t1.000001.avro: 1 blocks, 1 records and 12 bytes
 | |
| ```
 | |
| 
 | |
| To use the _cdc.py_ command line client to connect to the CDC service, we must first
 | |
| create a user. This can be done via maxadmin by executing the following command.
 | |
| 
 | |
| ```
 | |
| maxadmin call command cdc add_user avro-service maxuser maxpwd
 | |
| ```
 | |
| 
 | |
| This will create the _maxuser:maxpwd_ credentials which can then be used to
 | |
| request a data stream of the `test.t1` table that was created earlier.
 | |
| 
 | |
| ```
 | |
| cdc.py -u maxuser -p maxpwd -h 127.0.0.1 -P 4001 test.t1
 | |
| ```
 |
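The output of the command above can be piped into a script for further processing. The sketch below assumes that _cdc.py_ prints one JSON object per line, which is an assumption about its output format rather than something this tutorial states. `parse_records` is a hypothetical helper and the sample lines are made up for illustration; real records may carry additional fields.

```python
import json


def parse_records(lines):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample input; only the `id` column of test.t1 is shown.
sample = ['{"id": 1}', "", '{"id": 2}']
ids = [record["id"] for record in parse_records(sample)]
print(ids)
```

To consume a live stream, pass `sys.stdin` to `parse_records` and pipe the output of _cdc.py_ into the script.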