The purpose is to recognize e.g. /_utf8mb4 0xD091D092D093/ as
a valid string. The rule actually accepts /id integer/, but in
case the statement is something else but an '_' immediately
followed by a character set, followed by a hex number, it will
be rejected by the server so no harm done.
With this change, a parenthesized top-level SELECT, such as
"(SELECT f FROM t)" will be fully parsed. Before this change,
the statement was classified as invalid and would thus have
been sent to the master.
With this change also statements like
(SELECT f FROM t1) UNION (SELECT f FROM t2)
will be correctly classified, although only partially parsed.
With these changes
SET @saved_cs_client= @@character_set_client;
will be classified as QUERY_TYPE_USERVAR_WRITE and
SELECT 1 AS c1 FROM t1 ORDER BY ( SELECT 1 AS c2 FROM
t1 GROUP BY GREATEST(LAST_INSERT_ID(), t1.a) ORDER BY
GREATEST(LAST_INSERT_ID(), t1.a) LIMIT 1);
will be classified as QUERY_TYPE_READ|QUERY_TYPE_MASTER_READ
When a statement like 'DESCRIBE tbl' is classified, the table
name will now be available so that a router can check whether the
table is a temporary one. In that case, the statement must be sent
to the master.
Recognize the XA keyword and classify the statement as write.
Needs to be dealt with explicitly as sqlite3 assumes there are
no keywords starting with the letter X.
A non version specific executable comment, such as "/*! SELECT 1; */"
is during classification handled as if it would not be a comment. That
is, the contained statement will *always* be parsed.
A version specific executable comment, such as "/*!99999 CREATE PROCEDURE
bypass BEGIN */ SELECT ... " is during classification handled as it would
be a general comment. That is, the contained statement will *never* be
parsed.
In addition, in the latter case the parse result will never be better than
QC_QUERY_PARTIALLY_PARSED. The rationale is that since the comment is version
specific, we cannot know how the server will actually interpret the statement.
This will have an impact on the masking filter and the database firewall that
now will reject statements containing _version specific_ executable comments.
A statement like
SELECT ... INTO OUTFILE|DUMPFILE ...
is now classified as a QUERY_TYPE_WRITE, instead of as
QUERY_TYPE_GSYSVAR_WRITE so that it will be sent only to the
master.
SELECT...FOR UPDATE locks the rows for update, but only if
autocommit==0 or a transaction is active, so in principle even if
it were classified as READ it'd still be sent to master when it
actually matters.
However, even if autocommit==1 and/or no transaction is active, a
slave in read only mode will reject the statement if the user is
subject to the read only restriction (a user with super privileges
is not), which might be considered a server bug. By classifying the
statement as a write, it'll be sent to master and always succeed.
sqlite does not treat # as the start of a to-end-of-line
comment. It cannot trivially be treated as such because at
startup sqlite parses statements containing the #-character.
Thus, only after sqlite has been initialized can it be treated
the same way as --.
The two operations return different types of results and need to be
treated differently in order for them to be handled correctly in 2.2.
This fixes the unexpected internal state errors that happened in all 2.2
versions due to a wrong assumption made by readwritesplit. This fix is not
necessary for newer versions as the LOAD DATA LOCAL INFILE processing is
done with a simpler, and more robust, method.
Earlier only "SELECT NEXT VALUE FOR SEQ" was parsed
properly, while "SELECT PREVIOUS VALUE FOR SEQ" was not.
Now the latter statement is also parsed properly.
ENGINE is a keyword but not a reserved word, so it must
silently convert into an identifier if it is used in a
context where it cannot be used as a keyword.
"INTERVAL N <unit>" is now handled as an expression in itself and
as asuch will cause both statements such as
"SELECT '2008-12-31 23:59:59' + INTERVAL 1 SECOND;"
and
"select id from db2.t1 where DATE_ADD("2017-06-15", INTERVAL 10 DAY) < "2017-06-15";"
to be handled correctly. The compare test program contains some
heuristic checking, as the the embedded parser will in all cases
report date manipulation as the use of the add_date_interval()
function.
Basically it would be trivial to report far more operations
explicitly, but for the fact that the values in qc_query_op_t
currently, quite unnecessarily, form a bitmask.
In 2.2 that is no longer the case, so other operations will be
added there.
In qc_sqlite the fields that a particular function refers to are now
collected and reported. Qc_mysqlembedded needs to be updated accordingly
and also the compare utility. For subsequent commits.
With sqlite3 3110100, which is used in MaxScale, the the generation
of the used op-codes could sometime generate code that did not define
all opcodes. That resulted then in a compilation error like:
.../sqlite-bld-3110100/sqlite3.c: In function 'sqlite3VdbeExec':
.../sqlite-bld-3110100/sqlite3.c:75427:6: error: 'OP_Real' undeclared
(first use in this function)
case OP_Real: { /* same as TK_FLOAT, out2 */
^
The reason seems to be that if a particular op-code was not used, the
generation stopped at that point:
#define OP_Explain 160
#define OP_NotUsed_161 161
With mkopcodeh.tcl from sqlite3 version 3200000, the generated code
looks like
#define OP_NotUsed_161 161
#define OP_Real 162 /* same as TK_FLOAT,
synopsis: r[P2]=P4 */
and the code compiles.
Thus, mkopcodeh.tsl is updated from the newer sqlite3 version.
The sqlite3 initialization is done a bit more properly now.
It is also ensured that issues are logged at most once, even
if a statement is parsed twice.
MXS-1070
Now both qc_mysqlembedded and qc_sqlite return the same stuff
for the same statement, and both include also operators in
addition to pure functions. Whether that is the right approach,
is still subject to debate.
However, if we want to make it possible to disable e.g. the
use of concat as in "select concat(a) from t", where a is a string,
to prevent the bypassing of the masking filter, then conceptually
it should be possible to prevent "select a+0 from t", where a is an
int, as well.
When a backslash is encountered, the backslash should not be
copied but only the character after that.
For the sake of completeness, a few more characters would have
to be handled explicitly, but as the content of a string will not
affect the statement's classification there is not much point in
doing that.
Together with the field names, now qc_get_field_info also returns
field usage information, that is, in what context a field is used.
This allows, for instance, the cache to take action if a a particular
field is selected (SELECT a FROM ...), but not if it is used in a
GROUP BY clause (...GROUP BY a).
This caused a significant modifications of qc_mysqlembedded that
earlier did not walk the parse-tree, but instead looped over of a
list of st_select_lex instances that, the name notwithstanding,
also contain information about other things but SELECTs. The former
approach lost all contextual information, so it was not possible
to know where a particular field was used.
Now the parse tree is walked, which means that the contextual
information is known, and thus the field usage can be updated.
When a DESCRIBE <table> or a SHOW COLUMNS IN <table> query is done, the
actual query is performed on tables in the information_schema
database. This might be what actually happens on the backend server but
this information is not really useful when we need to know which database
the query targets.
By passing the actual table names instead of the underlying table names,
the schemarouter is able to detect where these statements should be
routed.
Now more information about a transaction start is provided. When
a transaction start statement is parsed, the type of the statement
with be QUERY_TYPE_BEGIN_TRX anded with QUERY_TYPE_READ or
QUERY_TYPE_WRITE if the transaction was explicitly started as READ
ONLY or READ WRITE.
Now also BEGIN WORK and [COMMIT|ROLLBACK] WORK are recognized.
"AND CHAIN" will still need to be recognized.