### How to get profile.png
1. Execute a SQL file and draw its profile:
   `python3 profile_viewer.py -f [path to sql file] -t [query title]`
2. Draw a given profile by its query id:
   `python3 profile_viewer.py -qid [query_id] -t [query title]`

Graphviz is required (https://graphviz.org/):
- on Linux: `apt install graphviz`
- on macOS: `brew install graphviz`
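For reference, a minimal sketch of how a profile image can be produced with the graphviz Python bindings; the node labels are assumptions for illustration, and this is not necessarily how profile_viewer.py builds its graph:

```python
# pip install graphviz  (also needs the Graphviz binaries installed, as above)
from graphviz import Digraph

g = Digraph("profile", format="png")
# hypothetical plan-node labels, for illustration only
g.node("scan", "OLAP_SCAN_NODE\\nTotalTime: 12ms")
g.node("agg", "AGGREGATION_NODE\\nTotalTime: 3ms")
g.edge("agg", "scan")
g.render("profile", cleanup=True)  # writes profile.png next to the script
```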
### Related changes
Reimplement the REST API `/profile/json/{query_id}` to return the profile in JSON format. Currently, the JSON profile contains only two counters: `RowsReturned` and `TotalTime`.
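A minimal sketch of consuming this endpoint from Python; the FE address, query id, and flat field layout are assumptions, and authentication is omitted:

```python
import json
import urllib.request

FE = "http://127.0.0.1:8030"  # hypothetical FE HTTP address
query_id = "11002"            # hypothetical query id

with urllib.request.urlopen(f"{FE}/profile/json/{query_id}") as resp:
    profile = json.load(resp)

# the JSON profile currently carries only these two counters;
# a flat layout is assumed here for illustration
print(profile.get("RowsReturned"), profile.get("TotalTime"))
```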
Update the TPC-H tools:
1) extend the data scales to sf1/sf100/sf1000/sf10000
2) add table schemas, SQL, and optimizer configs for each scale
3) refine the result output
The `CreateRowPolicyCommand` is meant to be implemented by overriding the `run()` method.
When `CREATE ROW POLICY` is executed on a non-master FE and forwarded to the master FE,
the master calls the `execute(TUniqueId queryId)` method and goes through `executeByNereids()`.
Because no `run()` method was implemented, the command did nothing and returned OK,
so a subsequent `SHOW ROW POLICY` returned an empty result.
This PR fixes it by implementing the `run()` method to throw an exception, so that
execution falls back to the old planner, which creates the row policy normally.
A full implementation of `run()` should be added later; this is just a temporary fix.
* The run-tpch-query shell script now analyzes the database with sync and calculates the total time.
Refine the TPC-DS test tools: split the 99 cases into separate files, and refine the 100G schema to use a range-partition format.
---------
Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
1. Previously, we used a BE table named `analysis_jobs` to persist the status of analyze jobs/tasks. This approach had several flaws: for example, if a BE crashed, the analyze job/task would fail but its status could never be updated.
2. Support `DROP ANALYZE JOB [job_id]` to delete an analyze job
3. Support `SHOW ANALYZE TASK STATUS [job_id]` to get the task status of a specific job (see the sketch after this list)
4. Restrict when auto analyze may execute: a job can only run again once its previous execution finished a sufficient time ago
5. Support analyzing a whole database
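Since Doris speaks the MySQL protocol, the new statements can be issued from any MySQL client. A minimal sketch with pymysql; the connection details and job id are placeholders:

```python
import pymysql

# placeholder connection details, for illustration only
conn = pymysql.connect(host="127.0.0.1", port=9030, user="root", password="")
try:
    with conn.cursor() as cur:
        cur.execute("SHOW ANALYZE TASK STATUS 12345")  # 12345: hypothetical job id
        for row in cur.fetchall():
            print(row)
        cur.execute("DROP ANALYZE JOB 12345")          # delete the job afterwards
finally:
    conn.close()
```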
Main changes:
1. If FQDN is enabled in the configuration file, then when FE starts, localAddr obtains the FQDN instead of the IP, and `priority_networks` no longer takes effect
2. The IP and hostname of Backend and Frontend are combined into a single field, host: when FQDN is enabled it holds the hostname, otherwise it holds the IP address
3. Communication inside the cluster uses FQDN directly, and the various connection pools add authentication so that nodes do not mis-connect when the IP behind a domain name changes (see the sketch after this list)
4. Polling to check whether an IP has changed is no longer required; fqdnManager is removed
5. Change how FEs verify each other's legitimacy: instead of reading the client IP, the sending node presents its own identity in the HTTP request header or in the Thrift message body
6. When processing a heartbeat, if a BE finds that its locally stored host differs from the host stored by the master, it verifies the host's legitimacy and then updates its own host instead of directly reporting an error
7. Simplify the FE name generation logic
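To illustrate point 3: a resolution done at connect time always sees the current IP behind an FQDN, whereas a cached resolution can go stale. A minimal sketch with a hypothetical hostname:

```python
import socket

def current_address(host: str) -> str:
    """Resolve host freshly on every call; if it is an FQDN whose IP
    changed (e.g. after a pod restart), this returns the live address,
    while a value cached at startup would be stale."""
    return socket.gethostbyname(host)

fe_host = "fe-0.doris.example.internal"  # hypothetical node name
try:
    print(current_address(fe_host))
except socket.gaierror:
    print(f"{fe_host} does not resolve in this environment")
```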
Scope of influence:
1. Establishing communication connections within the cluster
2. Determining whether two records refer to the same node via attributes such as IP
3. Log printing
4. Information display
5. Address concatenation
6. k8s deployment
7. Upgrade compatibility
Test plan:
1. Change the IP addresses of FE and BE while keeping their FQDNs unchanged, and verify that the cluster can still read and write data normally
2. Generate metadata with the master code, then run this PR on that metadata to verify compatibility with the old version (upgrading is no longer supported if FQDN was already enabled)
3. Deploy FE and BE clusters with k8s and verify that the cluster can read and write data normally
4. Upgrade an old cluster following https://doris.apache.org/zh-CN/docs/dev/admin-manual/cluster-management/fqdn?_highlight=fqdn#%E6%97%A7%E9%9B%86%E7%BE%A4%E5%90%AF%E7%94%A8fqdn
5. Use Stream Load, targeting the FQDNs of FE and BE separately, to import data
6. Start transactions as different users and write data with INSERT statements
1. A time string in the profile can look like "xx s xx ms"; the framework should extract times with the `re` package to support such compound strings (see the sketch after this list)
2. Add stats for `sortNode` and `AggNode` in `withChildren`
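A sketch of the kind of extraction meant in point 1, using Python's `re`; the unit set and conversion table are assumptions, since real profiles may use other units:

```python
import re

# assumed unit -> milliseconds table; extend as real profiles require
_UNITS = {"h": 3_600_000.0, "m": 60_000.0, "s": 1_000.0, "ms": 1.0, "us": 0.001}
# try "ms"/"us" before "m"/"s" so the longer units win
_TIME_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(ms|us|h|m|s)(?![A-Za-z])")

def parse_time_ms(text: str) -> float:
    """Sum every '<number><unit>' fragment, e.g. '2s 450ms' -> 2450.0."""
    return sum(float(v) * _UNITS[u] for v, u in _TIME_RE.findall(text))

assert parse_time_ms("2s 450ms") == 2450.0
assert parse_time_ms("1m2s") == 62_000.0
```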
1. Add an HTTP interface for querying q-error (see the sketch after this list)
2. Fix the selectivity calculation for inner joins; previously it was always 0 when there was only one join condition
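For context: q-error is commonly defined as the larger of the two ratios between the estimated and the actual cardinality, so 1.0 is a perfect estimate. A minimal sketch of that textbook definition (not code taken from this PR):

```python
def q_error(estimated: float, actual: float) -> float:
    """max(est/actual, actual/est); clamp to 1 row to guard against zeros."""
    est, act = max(estimated, 1.0), max(actual, 1.0)
    return max(est / act, act / est)

assert q_error(100, 100) == 1.0     # perfect estimate
assert q_error(10, 1000) == 100.0   # off by two orders of magnitude
```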
build-tpcds-tools.sh
gen-tpcds-data.sh
gen-tpcds-queries.sh
create-tpcds-tables.sh
load-tpcds-data.sh
run-tpcds-queries.sh
Data and query generation support specifying SCALE.
The create-table statements may need to be edited by hand to specify BUCKETS, or to change int to bigint, if SCALE is very large.
---------
Co-authored-by: stephen <hello_stephen@qq.com>
When we used the multi-fe tool to start a cluster of multiple FEs and loaded data via Stream Load,
the request failed.
The issue was caused by the Netty libraries: there were multiple Netty libraries on the classpath,
and the FE picked up the newer version, which produced these errors.
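One quick way to spot such a conflict is to scan the FE lib directory for artifacts present in more than one version; this is a hypothetical helper, not part of the PR, and the lib path is an assumption:

```python
import re
from collections import defaultdict
from pathlib import Path

def find_duplicate_jars(lib_dir: str) -> dict:
    """Group jars by artifact name and report names with several versions."""
    versions = defaultdict(set)
    for jar in Path(lib_dir).glob("*.jar"):
        m = re.match(r"(.+?)-(\d[\w.]*)\.jar$", jar.name)
        if m:
            versions[m.group(1)].add(m.group(2))
    return {name: v for name, v in versions.items() if len(v) > 1}

# e.g. {'netty-all': {'4.1.25.Final', '4.1.42.Final'}} would explain the failure
print(find_duplicate_jars("doris/output/fe/lib"))
```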
Introduce a tool to start multiple FEs on a single node.
Use case:
```
$ ./multi-fe
./multi-fe start|stop|clean [OPTIONS ...]
start -n <NUM> -l <LIBRARY_PATH> -p <BASE_PORT>
Start the FE cluster.
-n The number of FEs.
-l The FE library path (default: doris/output/fe/lib)
-p The base port to generate all needed ports (default: 9030).
stop Stop the FE cluster.
clean Clean the data (rm -rf "$(pwd)"/fe*).
```