MXS-2732 Recognize character set names

The tokenizer now recognizes the character set names of MariaDB
and returns a specific token for them. However, where a character
set name is not expected, the token is automatically treated as
an identifier.

Note that when the character set name is explicitly specified
for a literal string, the name must be prefixed with an underscore.
That is, if the character set name is "latin1", then a literal
string that specifies it is written as "_latin1 'a'".
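
The classification described above can be sketched roughly as follows.
This is only an illustration, not the code of this commit: the token
codes, the function name classify_word and the short list of character
set names are all hypothetical, and it assumes the tokenizer sees the
underscore-prefixed word as a single unit.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; the real ones are generated from the grammar. */
enum { TOK_ID = 1, TOK_CHARSET_NAME_KW = 2 };

/* Illustrative subset of MariaDB character set names (assumption). */
static const char* const charset_names[] = {
    "ascii", "binary", "latin1", "utf8", "utf8mb4"
};

/* Case-insensitive comparison of n bytes; returns 1 if equal. */
static int equal_nocase(const char* a, const char* b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i])) {
            return 0;
        }
    }
    return 1;
}

/* Classify the word z of length n. An underscore followed by a known
 * character set name yields TOK_CHARSET_NAME_KW, anything else TOK_ID. */
static int classify_word(const char* z, size_t n)
{
    if (n < 2 || z[0] != '_') {
        return TOK_ID;
    }
    for (size_t i = 0; i < sizeof(charset_names) / sizeof(charset_names[0]); i++) {
        const char* name = charset_names[i];
        if (strlen(name) == n - 1 && equal_nocase(z + 1, name, n - 1)) {
            return TOK_CHARSET_NAME_KW;
        }
    }
    return TOK_ID;
}
```

With this sketch, classify_word("_latin1", 7) gives TOK_CHARSET_NAME_KW,
while classify_word("latin1", 6) gives TOK_ID. The fallback to an
identifier in places where a character set name is not expected is a
matter for the grammar and is not shown here.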

Note that this does not fix the sqlite3 bug causing a leak, but
since the statement will now correctly be parsed, the leak will
not manifest itself.
Johan Wikman
2019-10-31 12:46:49 +02:00
parent 177d95c3bc
commit 6cba7e8201
2 changed files with 104 additions and 2 deletions

@@ -620,7 +620,7 @@ columnid(A) ::= nm(X). {
// TODO: BINARY is a reserved word and should not automatically convert into an identifer.
// TODO: However, if not here then rules such as CAST need to be modified.
BINARY
-/*CASCADE*/ CAST CLOSE COLUMNKW COLUMNS COMMENT CONCURRENT /*CONFLICT*/ CONNECTION
+/*CASCADE*/ CAST CHARSET_NAME_KW CLOSE COLUMNKW COLUMNS COMMENT CONCURRENT /*CONFLICT*/ CONNECTION
DATA DATABASE DEALLOCATE DEFERRED /*DESC*/ /*DETACH*/ DUMPFILE
/*EACH*/ END ENGINE ENUM EXCLUSIVE /*EXPLAIN*/ EXTENDED
FIELDS FIRST FLUSH /*FOR*/ FORMAT
@@ -1907,6 +1907,7 @@ expr(A) ::= nm(X) DOT nm(Y) DOT nm(Z). {
}
term(A) ::= INTEGER|FLOAT|BLOB(X). {spanExpr(&A, pParse, @X, &X);}
term(A) ::= STRING(X). {spanExpr(&A, pParse, @X, &X);}
+term(A) ::= CHARSET_NAME_KW(X) STRING(Y). {spanExpr(&A, pParse, @X, &Y);}
expr(A) ::= VARIABLE(X). {
if( X.n>=2 && X.z[0]=='#' && sqlite3Isdigit(X.z[1]) ){
/* When doing a nested parse, one can include terms in an expression
@@ -1926,7 +1927,7 @@ expr(A) ::= VARIABLE(X). {
spanSet(&A, &X, &X);
}
%ifdef MAXSCALE
-expr(A) ::= id(X) INTEGER(Y). {
+expr(A) ::= CHARSET_NAME_KW(X) INTEGER(Y). {
// The sole purpose of this is to interpret something like '_utf8mb4 0xD091D092D093'
// as a string. It does not matter that any identifier followed by an integer will
// be interpreted as a string, as invalid usage will be caught by the server.