doris

Author	SHA1	Message	Date
AKIRA	10792ca0f7	[fix](nereids) Mistaken stats when analyzing table incrementally and partition number less than 512 #23507 Fix a bug that produced mistaken stats when analyzing a table incrementally with fewer than 512 partitions. Fix a bug where the cron expression was lost during analyzing. Mark a system job as running once it is registered to AnalysisManager, to avoid submitting the same job again if the previous one takes a long time.	2023-08-28 17:31:36 +08:00
TengJianPing	1312c12236	Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963)" (#23462) * Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963)" This reverts commit 55a6649da962fb170ddb40fea8ef26bdc552a51a. * Manual revert of "fix in strict mode, return error for insert if datatype convert fails (#20378)" This manually reverts commit 1b94b6368f5e871c9a0fe53dd7c64409079a4c9d. * fix case failure	2023-08-25 16:47:14 +08:00
AKIRA	35d0c9e71e	[refactor](nereids) Refactor stats collection framework (#22963) * remove auto analyze grammar * refactor ResultRow	2023-08-23 10:05:57 +08:00
AKIRA	9f92861c91	[fix](stats) Load partition stats unexpectedly (#22589) After support for loading partition stats was added, the syncLoadColStats method still invoked a stale method to deserialize column stats.	2023-08-04 18:50:38 +08:00
AKIRA	ed410034c6	[enhancement](nereids) Sync stats across FE cluster after analyze #21482 Before this PR, if a user connected to a follower and analyzed a table, the stats would not get cached in the follower FE, because the ANALYZE statement was forwarded to the master and the follower still loaded its cache lazily. After this PR, once analyze finishes on the master, the master syncs stats to all followers and updates each follower's stats cache. Load partition stats to col stats.	2023-07-11 20:09:02 +08:00
AKIRA	acba8648a5	[enhancement](nereids) Add log for stats (#21164) 1. Log the SQL when analyze fails 2. Return directly from the analyze_test suite when there is more than one frontend 3. Set query_timeout for tpcds suites to avoid unnecessary failures caused by analyze sync	2023-06-27 19:17:22 +08:00
TengJianPing	55a6649da9	[fix](testcase) fix test case failure of insert null value into not null column (#20963)	2023-06-20 20:46:07 +08:00
AKIRA	e63739e729	[fix](nereids) add regression test for stats analyze and fix some bugs (#20865) 1. Add regression test case for analyze to make sure show/drop/analyze stats work as expected 2. Remove useless code that would block the cleanup of expired stats 3. Fix a bug in DropStats: before this PR, dropping the whole table's stats would cause an NPE when parsing the statement	2023-06-16 16:43:49 +08:00
AKIRA	cc47ee480c	[feat](stats) delete data size stat and make task timeout configurable (#20090) 1. Delete the data size stats, since collecting them costs too much time and they are not useful 2. Make the task timeout configurable, since it is common to analyze a huge table for which the default 10 min is not suitable	2023-05-29 16:40:59 +08:00
ElvinWei	d6998723e8	Comment stats unstable cases (#20034)	2023-05-25 21:08:00 +08:00
ElvinWei	a713c225a5	[regressiontest](statistics) Collate and supplement statistics regression test (#19901) This PR mainly supplements the statistics regression tests, including the following: analyze stats p0 tests: 1. Universal analysis; analyze stats p1 tests: 1. Universal analysis 2. Sampled analysis 3. Incremental analysis 4. Automatic analysis 5. Periodic analysis; manage stats p0 tests: 1. Alter table stats 2. Show table stats 3. Alter column stats 4. Show column stats and histogram 5. Drop column stats 6. Drop expired stats. TODO: 1. Supplement related documents 2. Optimize for unstable cases encountered during testing 3. Add other cases. PRs related to statistics should ensure that all of these cases pass!	2023-05-24 20:17:28 +08:00
AKIRA	3e0b661267	[fix](test) Comment unstable stats test #19729	2023-05-19 08:55:28 +08:00
ElvinWei	c37d781942	[enhancement](statistics) manually inject table level statistics (#19495) Allows users to manually inject table-level statistics. Table stats type: - row_count. Modify table or partition statistics: `ALTER TABLE table_name SET STATS ('k1' = 'v1', ...)` TODO: - support other table stats types if necessary - update statistics cache if necessary	2023-05-12 17:03:12 +08:00
ElvinWei	fae2e5fd22	[enhancement](statistics) implement automatic analysis of statistics and support table level statistics #19420 Add table-level statistics and support the SHOW TABLE STATS statement to show them. Implement automatic analysis of statistics and support the ANALYZE ... WITH AUTO ... statement. TODO: collate relevant p0 tests; add the design description to README.md. Issue Number: close #xxx	2023-05-10 11:47:34 +08:00
ElvinWei	3f6e5118e6	[enhancement](statistics) support periodic collection of statistics (#19247) This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following: support periodic collection of statistics; change the Date type in statistics p0 to DateV2 (see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for local testing; complement cases (remove Chinese characters, optimize code, etc.) and improve stability; support setting whether to keep records of statistics synchronization job info, which is convenient for p0 testing. The statistics job table was modified, and some auxiliary checks were added so that users do not perceive the modification. This function will be removed once the table schema is stable.	2023-05-06 14:53:06 +08:00
ElvinWei	718297d3c1	[test](statistics) add p0 test of sampling statistics (#19176) 1. Add p0 test for sampled statistics collection 2. Modify the uniqueKeys of table analysis_jobs for deletion based on relevant conditions 3. Fix the instability of the incremental statistics p0 test	2023-04-28 15:50:05 +08:00
ElvinWei	484612a0af	[opt](statistics) optimize incremental statistics collection and statistics cleaning (#18971) This PR mainly optimizes the following items: - the collection of statistics: clear invalid historical statistics before collecting, so that they do not affect the final table statistics. - the incremental collection of statistics: in the incremental case, only the statistics of the corresponding partitions need to be collected. TODO: Support incremental collection of materialized view statistics.	2023-04-27 11:51:47 +08:00
AKIRA	d3a0b94602	[feature](stats) Support to kill analyze #18901 1. Report an error if analyze jobs are submitted when the stats table is not available 2. Support killing analyze 3. Support cancelling sync analyze	2023-04-26 14:23:44 +08:00
AKIRA	a4a85f2476	[feat](stats) Return job id for async analyze stmt (#18800) 1. Return job id from async analysis 2. Sync analysis jobs are no longer saved to analysis_jobs	2023-04-25 14:43:54 +08:00
ElvinWei	1a6401d682	[enhancement](statistics) support sampling collection of statistics (#18880) 1. Support sampling to collect statistics 2. Improve syntax for collecting statistics 3. Support specifying the number of histogram buckets 4. Tweak some code structure --- The syntax supports WITH and PROPERTIES, using the same syntax as before. Column statistics collection syntax: `ANALYZE [ SYNC ] TABLE table_name [ (column_name [, ...]) ] [ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] ] [ PROPERTIES ('key' = 'value', ...) ];` Column histogram collection syntax: `ANALYZE [ SYNC ] TABLE table_name [ (column_name [, ...]) ] UPDATE HISTOGRAM [ [ WITH SYNC ][ WITH INCREMENTAL ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] ] [ PROPERTIES ('key' = 'value', ...) ];` Notes: - sync: Collect statistics synchronously and return after collection finishes. - incremental: Collect statistics incrementally. Incremental collection of histogram statistics is not supported. - sample percent | rows: Collect statistics by sampling, either a percentage or a number of rows. - buckets: Specify the maximum number of buckets generated when collecting histogram statistics. - table_name: The target table for collecting statistics. Can be of the form `db_name.table_name`. - column_name: Each specified column must exist in `table_name`; separate multiple column names with commas. - properties: Properties used to configure statistics tasks. Currently only the following are supported (equivalent to the WITH clauses): - 'sync' = 'true' - 'incremental' = 'true' - 'sample.percent' = '50' - 'sample.rows' = '1000' - 'num.buckets' = 10 --- TODO: - Supplement the complete p0 test - For `Incremental` statistics, see #18653	2023-04-21 13:11:43 +08:00
AKIRA	031d35d4a1	[fix](stats) Stats still in cache after user dropped it (#18720) 1. Evict the dropped stats from cache 2. Remove code for partition-level stats collection 3. Disable analyzing a whole database directly 4. Fix a potential infinite loop in the stats cleaner 5. Sleep the thread in each loop iteration when scanning the stats table, to avoid excessive IO usage by this task.	2023-04-18 16:41:10 +08:00
AKIRA	362b5a34ae	[feat](stats) Support to delete expired stats periodically (#18614) Support deleting expired stats periodically and manually. The default cleaner running interval is 2 days. Syntax for manual cleaning: `DROP EXPIRED STATS` TODO: 1. process external catalog's stats 2. run drop at the appointed time 3. sleep a short time after dropping each batch	2023-04-14 17:32:51 +08:00
AKIRA	db44970685	[feature](stats) Support sync analyze (#18567) Grammar: `ANALYZE [SYNC] TABLE ....` Add this feature so that we can test and tune the stats framework conveniently.	2023-04-12 17:49:30 +08:00
ElvinWei	5dfdacd278	[enhancement](histogram) add histogram syntax and persist histogram statistics (#15490) Histogram statistics are more expensive to collect, so we collect and persist them separately. This PR does the following: 1. Add histogram syntax and the keyword `TABLE` 2. Add the task of collecting histogram statistics 3. Persist histogram statistics 4. Replace fastjson with gson 5. Add unit tests... Relevant syntax examples (following databases such as MySQL, the keyword `TABLE` is added): `ANALYZE TABLE statistics_test;` collects column statistics; `ANALYZE TABLE statistics_test UPDATE HISTOGRAM ON col1,col2;` collects histogram statistics. Based on #15317	2023-01-07 00:55:42 +08:00
Mingyu Chen	ed96442b85	[fix](multi-catalog) fix persist issue about jdbc catalog and class loader issue #14794 Fix a bug: JDBC catalog/database/table should be added to GsonUtil. Fix a class loader issue that sometimes caused ClassNotFoundException. Fix regression tests to use different catalog names. Comment out 2 regression tests, which need to be fixed later: regression-test/suites/query_p0/system/test_query_sys.groovy and regression-test/suites/statistics/alter_col_stats.groovy	2022-12-05 09:05:13 +08:00
Kikyou1997	33ad616839	[fix](statistics) Fix potential NPE in ShowStatisticsStmt #14679 When the required cache entry hasn't been loaded yet, the cache returns ColumnStatistics.DEFAULT, which does not define the max/min literal expr; add a check for that.	2022-11-30 08:38:20 +08:00
Kikyou1997	a3062c662c	[feature-wip](statistics) support statistics injection and show statistics (#14201) 1. Reduce the configuration options for the statistics framework, and add comments for the remaining ones. 2. Move the analysis job creation logic to `StatisticsRepository`, which defines all the functions used to interact with the internal statistics table 3. Move AnalysisJobScheduler to the statistics package 4. Support displaying and manually injecting statistics	2022-11-15 11:29:51 +08:00

27 Commits