cache: Update documentation and add rule handling

The concept of 'allowed_references' was removed from the documentation and the code. Now that COM_INIT_DB is tracked, we will always know what the default database is, and hence we can create a cache key that distinguishes between identical queries targeting different default databases (that is not yet implemented in this change). The rules for the cache are expressed using a JSON object. There are two decisions to be made: when to store data to the cache and when to use data from the cache. The latter obviously depends on the former. In this change, the 'store' handling is implemented; 'use' handling will follow in a subsequent change.
2016-09-27 15:56:15 +03:00
parent 20b57b1577
commit 7f24f12cfc
6 changed files with 1246 additions and 163 deletions
--- a/Documentation/Filters/Cache.md
+++ b/Documentation/Filters/Cache.md
@@ -1,4 +1,4 @@
-#Cache
+# Cache

 ## Overview
 The cache filter is capable of caching the result of SELECTs, so that subsequent identical
@@ -16,6 +16,8 @@ module=cache
 ttl=5
 storage=...
 storage_options=...
+rules=...
+debug=...

 [Cached Routing Service]
 type=service
@@ -57,36 +59,6 @@ depend upon the specific module. For instance,
 storage_options=storage_specific_option1=value1,storage_specific_option2=value2
 ```

-#### `allowed_references`
-
-Specifies whether any or only fully qualified references are allowed in
-queries stored to the cache.
-```
-allowed_references=[qualified|any]
-```
-The default is `qualified`, which means that only queries where
-the database name is included in the table name are subject to caching.
-```
-select col from db.tbl;
-```
-If `any` is specified, then also queries where the table name is not
-fully qualified are subject to caching.
-```
-select col from tbl;
-```
-Care should be excersized before this setting is changed, because, for
-instance, the following is likely to produce unexpected results.
-```
-use db1;
-select col from tbl;
-...
-use db2;
-select col from tbl;
-```
-The setting can be changed to `any`, provided fully qualified names
-are always used or if the names of tables in different databases are
-different.
-
 #### `max_resultset_rows`

 Specifies the maximum number of rows a resultset can have in order to be
@@ -119,6 +91,181 @@ If nothing is specified, the default _ttl_ value is 10.
 ttl=60
 ```

-#Storage
+#### `rules`
+
+Specifies the path of the file where the caching rules are stored. A relative
+path is interpreted relative to the _data directory_ of MariaDB MaxScale.
+
+```
+rules=/path/to/rules-file
+```
+
+#### `debug`
+
+An integer value that controls the level of debug logging made by the cache.
+The value is a bitfield in which each bit enables a different kind of
+logging.
+
+   * `0` (`0b0000`) No logging is made.
+   * `1` (`0b0001`) A matching rule is logged.
+   * `2` (`0b0010`) A non-matching rule is logged.
+   * `4` (`0b0100`) A decision to use data from the cache is logged.
+   * `8` (`0b1000`) A decision not to use data from the cache is logged.
+
+The default is `0`. To log everything, set `debug` to `15`.
+
+```
+debug=2
+```
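+
+Since the value is a bitfield, several kinds of logging can be enabled at once
+by adding the individual values. For instance, to log both matching and
+non-matching rules (`1` + `2`):
+
+```
+debug=3
+```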
+
+# Rules
+
+The caching rules are expressed as a JSON object.
+
+Two decisions have to be made regarding caching: in what circumstances
+data should be stored to the cache, and in what circumstances the data in
+the cache should be used.
+
+In the JSON object this is visible as follows:
+
+```
+{
+    "store": [ ... ],
+    "use": [ ... ]
+}
+```
+
+The `store` field specifies in what circumstances data should be stored to
+the cache and the `use` field specifies in what circumstances the data in
+the cache should be used. In both cases, the value is a JSON array containing
+objects.
+
+## When to Store
+
+By default, if no rules file has been provided or if the `store` field is
+missing from the object, the results of all queries will be stored to the
+cache, subject to the `max_resultset_rows` and `max_resultset_size` cache
+filter parameters.
+
+By providing a `store` field in the JSON object, the decision whether to
+store the result of a particular query to the cache can be controlled in
+a more detailed manner. The decision to cache the results of a query can
+depend upon
+
+   * the database,
+   * the table,
+   * the column, or
+   * the query itself.
+
+Each entry in the `store` array is an object containing three fields,
+
+```
+{
+    "attribute": <string>,
+    "op": <string>,
+    "value": <string>
+}
+```
+
+where
+
+   * the _attribute_ can be `database`, `table`, `column` or `query`,
+   * the _op_ can be `=`, `!=`, `like` or `unlike`, and
+   * the _value_ is a string.
+
+If _op_ is `=` or `!=` then _value_ is used verbatim; if it is `like`
+or `unlike`, then _value_ is interpreted as a _pcre2_ regular expression.
+
+The objects in the `store` array are processed in order. If the result
+of a comparison is _true_, no further processing will be made and the
+result of the query in question will be stored to the cache.
+
+If the result of the comparison is _false_, then the next object is
+processed. The process continues until the array is exhausted. If there
+is no match, then the result of the query is not stored to the cache.
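+
+As an illustrative sketch of this ordering (the database name and the pattern
+are placeholders), consider a `store` array with two objects.
+```
+{
+    "store": [
+        {
+            "attribute": "database",
+            "op": "=",
+            "value": "db1"
+        },
+        {
+            "attribute": "query",
+            "op": "like",
+            "value": ".*WHERE.*"
+        }
+    ]
+}
+```
+A query targeting database _db1_ matches the first object, so its result is
+stored and the second object is never consulted. A query targeting another
+database is stored only if it matches the regular expression in the second
+object. The result of any other query is not stored.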
+
+Note that the query itself is used as the key. Hence, although the following
+queries
+```
+select * from db1.tbl
+```
+and
+```
+use db1;
+select * from tbl
+```
+target the same table and produce the same results, they will be cached
+separately. The same holds for queries like
+```
+select * from tbl where a = 2 and b = 3;
+```
+and
+```
+select * from tbl where b = 3 and a = 2;
+```
+as well. Although they are conceptually identical, there will be two
+cache entries.
+
+### Examples
+
+Cache all queries targeting a particular database.
+```
+{
+    "store": [
+        {
+            "attribute": "database",
+            "op": "=",
+            "value": "db1"
+        }
+    ]
+}
+```
+
+Cache all queries _not_ targeting a particular table.
+```
+{
+    "store": [
+        {
+            "attribute": "table",
+            "op": "!=",
+            "value": "tbl1"
+        }
+    ]
+}
+```
+
+That will exclude queries targeting table _tbl1_ irrespective of which
+database it is in. To exclude a table in a particular database, specify
+the table name using a qualified name.
+```
+{
+    "store": [
+        {
+            "attribute": "table",
+            "op": "!=",
+            "value": "db1.tbl1"
+        }
+    ]
+}
+```
+
+Cache all queries containing a WHERE clause.
+```
+{
+    "store": [
+        {
+            "attribute": "query",
+            "op": "like",
+            "value": ".*WHERE.*"
+        }
+    ]
+}
+```
+
+Note that this will actually cause every query that contains WHERE anywhere
+in its text to be cached.
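+
+Cache all queries whose text does _not_ match a regular expression. The
+pattern below is only an illustration and assumes, as in the previous
+example, that the expression is matched against the text of the query.
+```
+{
+    "store": [
+        {
+            "attribute": "query",
+            "op": "unlike",
+            "value": ".*information_schema.*"
+        }
+    ]
+}
+```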
+
+## When to Use
+
+# Storage

 ## Storage RocksDB