pick from master #35873
Update the TPC-DS tools: use table `customer_demographics`'s primary key column as its
bucket column, to avoid a performance issue caused by data skew.
Update the TPC-DS sf1000 bucket number from 64 to 32, to work around an FDB issue under the storage-compute separation architecture.
Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
### how to get profile.png
1. execute a SQL file and draw its profile:
```shell
python3 profile_viewer.py -f [path to sql file] -t [query title]
```
2. draw a given profile by query id:
```shell
python3 profile_viewer.py -qid [query_id] -t [query title]
```
graphviz (https://graphviz.org/) is required:
- on Linux: `apt install graphviz`
- on macOS: `brew install graphviz`
### related changes
Reimplement the REST API `/profile/json/{query_id}` to return the profile in JSON format. Currently, the JSON profile contains only two counters: `RowsReturned` and `TotalTime`.
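A client can read the two counters out of the returned JSON. This is a minimal sketch; the exact field layout of the response is an assumption for illustration, and only the two counter names come from the change description:

```python
import json

def extract_counters(profile_json: str) -> dict:
    # Hypothetical flat layout: the real response from
    # /profile/json/{query_id} may nest these differently.
    profile = json.loads(profile_json)
    return {
        "RowsReturned": profile.get("RowsReturned"),
        "TotalTime": profile.get("TotalTime"),
    }

sample = '{"RowsReturned": 1024, "TotalTime": "3s450ms"}'
print(extract_counters(sample))  # → {'RowsReturned': 1024, 'TotalTime': '3s450ms'}
```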
Update the TPC-H tools:
1) extend the data scales to sf1/sf100/sf1000/sf10000
2) add table schemas, SQL, and opt configs for each scale
3) refine the result output
`CreateRowPolicyCommand` is implemented by overriding the `run()` method.
So when `create row policy` is executed on a non-master FE and forwarded to the master FE,
it calls the `execute(TUniqueId queryId)` method and goes through `executeByNereids()`.
Because that path does not invoke `run()`, it does nothing and returns OK,
so a subsequent `show row policy` returns an empty result.
This PR fixes it by implementing the `run()` method to throw an exception, so that
execution falls back to the old planner, which creates the row policy normally.
The full implementation of `run()` should be added later; this is just a temporary fix.
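The fallback mechanism above can be sketched abstractly. The names below are illustrative Python, not Doris's actual FE code (which is Java); they only model the control flow "throw from `run()` so the caller retries with the old planner":

```python
class FallbackToOldPlannerError(Exception):
    """Raised by a Nereids command whose run() is not yet implemented."""

class CreateRowPolicyCommand:
    def run(self):
        # Temporary fix: instead of silently returning OK (leaving no
        # policy created), throw so the caller falls back.
        raise FallbackToOldPlannerError("create row policy: not yet supported")

def execute_by_nereids(command, old_planner):
    try:
        return command.run()
    except FallbackToOldPlannerError:
        # The old planner actually creates the row policy.
        return old_planner()

result = execute_by_nereids(CreateRowPolicyCommand(), lambda: "OK")
print(result)  # → OK
```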
* run-tpch-query shell: add analyze database with sync, and calculate the total time
Refine the TPC-DS test tools: split the 99 queries into separate files, and refine the 100g schema to use range-partition format.
---------
Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
1. In the past, we used a BE table named `analysis_jobs` to persist the status of analyze jobs/tasks. This had many flaws: for example, if a BE crashed, the analyze job/task would fail but its status could never be updated.
2. Support `DROP ANALYZE JOB [job_id]` to delete an analyze job.
3. Support `SHOW ANALYZE TASK STATUS [job_id]` to get the task status of a specific job.
4. Restrict when auto analyze may execute: an auto analyze job can run again only if its last execution finished a while ago.
5. Support analyzing a whole database.
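The throttle condition in item 4 can be sketched as a simple time check. The interval constant and function names below are assumed placeholders for illustration, not Doris's real configuration:

```python
import time

# Assumed cooldown between auto analyze runs; the actual value would
# come from FE configuration, not this constant.
MIN_INTERVAL_SECONDS = 300

def can_run_auto_analyze(last_finished_at, now=None):
    """Allow a new auto analyze run only if the previous one finished
    at least MIN_INTERVAL_SECONDS ago (or never ran)."""
    if last_finished_at is None:
        return True
    now = time.time() if now is None else now
    return now - last_finished_at >= MIN_INTERVAL_SECONDS

print(can_run_auto_analyze(100, now=500))  # 400s elapsed → True
print(can_run_auto_analyze(100, now=200))  # 100s elapsed → False
```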