doris

Author	SHA1	Message	Date
Gabriel	e21ffac419	[Improvement](dateformat) Improve efficiency for function `date_format` (#12811 )	2022-09-21 22:38:16 +08:00
yuanyuan8983	35f07ede26	[typo](docs)Changing the Jump Address of BrokerLoad in SparkLoad (#12735 ) * [typo](docs)Changing the Jump Address of BrokerLoad in SparkLoad Changing the Jump Address of BrokerLoad in SparkLoad * Update spark-load-manual.md	2022-09-21 22:03:28 +08:00
zy-kkk	b09cc95701	[typo](docs) fix get-starting doc err (#12777 )	2022-09-21 21:58:41 +08:00
morrySnow	1c98c3a8f0	[fix](Nereids) GroupExpression never be optimize if it run with exploration job (#12815 ) Exploration job only do explore, but never call optimize. So the GroupExpression explored by exploration only job will never do implementation.	2022-09-21 21:03:37 +08:00
Jibing-Li	fbdebe2424	[feature-wip](new-scan)Add load counter for VFileScanner (#12812 ) The new scanner (VFileScanner) need a counter to record two values in load job. 1. The number of rows unselected by pre-filter, and 2. The number of rows filtered by unmatched schema or other error. This pr is to implement the counter.	2022-09-21 20:59:13 +08:00
Xinyi Zou	c55d08fa2f	[fix](memtracker) Refactor load channel mem tracker to improve accuracy (#12791 ) The mem hook record tracker cannot guarantee that the final consumption is 0, nor can it guarantee that the memory alloc and free are recorded in a one-to-one correspondence. In the life cycle of a memtable from insert to flush, the memory free of hook is more than that of alloc, resulting in tracker consumption less than 0. In order to avoid the cumulative error of the upper load channel tracker, the memtable tracker consumption is reset to zero on destructor.	2022-09-21 20:16:19 +08:00
Xinyi Zou	b41eaa5ac0	[fix](memtracker) Introduce orphan mem tracker to verify memory tracking accuracy (#12794 ) The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker. In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption. Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.	2022-09-21 15:47:10 +08:00
Jerry Hu	8f4bb0f804	[improvement](agg) iterate aggregation data in memory written order (#12704 ) Following the iteration order of the hash table will result in out-of-order access to aggregate states, which is very inefficient. Traversing aggregate states in memory write order can significantly improve memory read efficiency. Test hash table items count: 3.35M Before this optimization: insert keys into column takes 500ms With this optimization only takes 80ms	2022-09-21 14:58:50 +08:00
zhannngchen	27f7ae258d	[Enhancement](load) optimize flush policy to avoid small segments #12706 In current policy, if mem-limit exceeded, load channel will pick tablets that consume most memory, but mem_consumption contains memory in flush, if some delta writer flushing a full memtable(default 200MB), the current memtable might be very small, we should avoid flush such memtable, which can generate a very small segment.	2022-09-21 14:33:05 +08:00
Jibing-Li	ec2b3bf220	[feature-wip](new-scan)Refactor VFileScanner, support broker load, remove unused functions in VScanner base class. (#12793 ) Refactor of scanners. Support broker load. This pr is part of the refactor scanner tasks. It provide support for borker load using new VFileScanner. Work still in progress.	2022-09-21 12:49:56 +08:00
morrySnow	7b46e2400f	[enhancement](Nereids) add all necessary PhysicalDistribute on Join's child to ensure get correct cost (#12483 ) In an earlier PR #11976 , we add shuffle join and bucket shuffle support. But if join's right child's distribution spec satisfied join's require, we do not add distribute on right child. Instead of, do it in plan translator. It is hard to calculate accurate cost in this way, since we some distribute cost do not calculated. In this PR, we introduce a new shuffle type BUCKET, and change the way of add enforce to ensure all necessary distribute will be added in cost and enforcer job.	2022-09-21 12:18:37 +08:00
ChPi	a7993755ae	[typo](docs)rename doc file name (#12783 ) Co-authored-by: chenjie <chenjie@cecdat.com>	2022-09-21 11:25:38 +08:00
jakevin	52a0da1f5c	[improve](Nereids): add check validator during post. (#12702 )	2022-09-21 11:25:04 +08:00
luozenglin	b6e20db997	[fix](outfile) select OBJECT and HLL columns into outfile as null. (#12734 )	2022-09-21 11:24:31 +08:00
Gabriel	632867c1c1	[Bug](datetimev2) Fix lost precision for datetimev2 (#12723 )	2022-09-21 11:15:02 +08:00
Gabriel	3cfaae0031	[Improvement](sort) Use heap sort to optimize sort node (#12700 )	2022-09-21 10:01:52 +08:00
Xin Liao	a5643822de	[feature-wip](unique-key-merge-on-write) fix calculate delete bitmap when has sequence column (#12789 ) when the rowset has multiple segments with sequence column, we should compare sequence id with previous segment.	2022-09-21 09:21:07 +08:00
Xinyi Zou	bd4bfa8f00	[fix](memtracker) Fix thread mem tracker try consume accuracy #12782	2022-09-21 09:20:41 +08:00
AlexYue	c72a19f410	[BugFix](VExprContext) capture error status to prevent incorrect func call which causes coredump #12779	2022-09-21 09:20:16 +08:00
AlexYue	f1539761e8	[Bugfix](string_functions) rearrange code to avoid global buffer overflow in FindInSetOp::execute (#12677 )	2022-09-21 09:19:38 +08:00
camby	c5b6056b7a	[fix](lateral_view) fix lateral view explode_split with temp table (#12643 ) Problem describe: follow SQL return wrong result: WITH example1 AS ( select 6 AS k1 ,'a,b,c' AS k2) select k1, e1 from example1 lateral view explode_split(k2, ',') tmp as e1; Wrong result: +------+------+ \| k1 \| e1 \| +------+------+ \| 0 \| a \| \| 0 \| b \| \| 0 \| c \| +------+------+ Correct result should be: +------+------+ \| k1 \| e1 \| +------+------+ \| 6 \| a \| \| 6 \| b \| \| 6 \| c \| +------+------+ Why? TableFunctionNode::outputSlotIds do not include column k1. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-21 09:19:18 +08:00
lsy3993	b0b876f640	[typo](docs) vectorization needs to be turned off to use native udf #12739	2022-09-21 09:13:48 +08:00
Zhengguo Yang	11e0151445	[chore](build) add an option to disable strip thridparty libs (#12772 )	2022-09-21 09:11:25 +08:00
Jerry Hu	7dfbb7c639	[chore](regression-test) add order by column in tpch_sf1_p1/tpch_sf1/nereids/q11.groovy (#12770 )	2022-09-20 22:26:24 +08:00
Gabriel	d5486726de	[Bug](date) Fix wrong result produced by date function (#12720 )	2022-09-20 21:09:26 +08:00
Gabriel	cc072d35b7	[Bug](date) Fix wrong type in TimestampArithmeticExpr (#12727 )	2022-09-20 21:08:48 +08:00
pengxiangyu	b550985df6	fix thirdparty builder (#12768 )	2022-09-20 19:41:00 +08:00
Zhengguo Yang	e70c298e0c	[Bugfix](mem) Fix memory limit check may overflow (#12776 ) This bug is because the result of subtracting signed and unsigned numbers may overflow if it is negative.	2022-09-20 18:18:23 +08:00
caiconghui	bb7206d461	[refactor](SimpleScheduler) refactor code for getting available backend in SimpleScheduler (#12710 )	2022-09-20 18:08:29 +08:00
Ashin Gau	b837b2eb95	[feature-wip](parquet-reader) filter rows by page index (#12664 ) # Proposed changes [Parquet v1.11+ supports page skipping](https://github.com/apache/parquet-format/blob/master/PageIndex.md), which helps the scanner reduce the amount of data scanned, decompressed, decoded, and insertion. According to the performance FlameGraph, decompression takes up 20% cpu time. If a page can be filtered as a whole, the page can not be decompressed. However, the row numbers between pages are not aligned. Columns containing predicates can be filtered by page granularity, but other columns need to be skipped within pages, so non predicate columns can only save the decoding and insertion time. Array column needs the repetition level to align with other columns, so the array column can only save the decoding and insertion time. ## Explore `OffsetIndex` in the column metadata can locate the page position. Theoretically, a page can be completely skipped, including the time of reading from HDFS. However, the average size of a page is around 500KB. Skipping a page requires calling the `skip`. The performance of `skip` is low when it is called frequently, and may not be better than continuous reading of large blocks of data (such as 4MB). If multiple consecutive pages are filtered, `skip` reading can be performed according to`OffsetIndex`. However, for the convenience of programming and readability, the data of all pages are loaded and filtered in turn.	2022-09-20 15:55:19 +08:00
mch_ucchi	47797ad7e8	[feature](Nereids) Push down not slot references expression of on clause (#11805 ) pushdown not slotreferences expr of on clause. select * from t1 join t2 on t1.a + 1 = t2.b + 2 and t1.a + 1 > 2 project() +---join(t1.a + 1 = t2.b + 2 && t1.a + 1 > 2) \|---scan(t1) +---scan(t2) transform to project() +---join(c = d && c > 2) \|---project(t1.a -> t1.a + 1) \| +---scan(t1) +---project(t2.b -> t2.b + 2) +---scan(t2)	2022-09-20 13:41:54 +08:00
minghong	d83eb13ac5	[enhancement](nereids) use Literal promotion to avoid unnecessary cast (#12663 ) Instead of add a cast function on literal, we directly change the literal type. This change could save cast execution time and memory. For example: In SQL: "CASE WHEN l_orderkey > 0 THEN ...", 0 is a TinyIntLiteral. Before this PR: "CASE WHEN l_orderkey > CAST (TinyIntLiteral(0) AS INT)` With this PR: "CASE WHEN l_orderkey > IntegerLiteral(0)"	2022-09-20 11:15:47 +08:00
morrySnow	954c44db39	[enhancement](Nereids) compare LogicalProperties with output set instead of output list (#12743 ) We used output list to compare two LogicalProperties before. Since join reorder will change the children order of a join plan and caused output list changed. the two join plan will not equals anymore in memo although they should be. So we must add a project on the new join to keep the LogicalProperties the same. This PR changes the equals and hashCode funtions of LogicalProperties. use a set of output to compare two LogicalProperties. Then we do not need add the top peoject anymore. This help us keep memo simple and efficient.	2022-09-20 10:55:29 +08:00
slothever	d435f0de41	[feature-wip](parquet-reader) add page index row range (#12652 ) Add some utils and provide the candidate row range (generated with skipped row range of each column) to read for page index filter this version support binary operator filter todo: - use context instead of structures in close() - process complex type filter - use this instead of row group minmax filter - refactor _eval_binary() for row group filter and page index filter	2022-09-20 10:36:19 +08:00
starocean999	ca3e52a0bb	[fix](agg)the output of window function's nullability should be consistent with output slot (#12607 ) FE may force window function to output a nullable value in some case, be should follow this and change the nullability accordingly.	2022-09-20 09:29:44 +08:00
starocean999	4f27692898	[fix](inlineview)the inlineview's slots' nullability property is not set correctly (#12681 ) The output slots of inline view may come from an outer join nullable side table. So it's should be nullable.	2022-09-20 09:29:15 +08:00
Xin Liao	41cf94498d	[feature-wip](unique-key-merge-on-write) fix that incremental clone may lead to loss of delete bitmap (#12721 )	2022-09-20 09:08:06 +08:00
Jerry Hu	f5f6a852fe	[chore](regression-test) add order by in test_rollup_agg_date.groovy (#12737 )	2022-09-20 09:06:13 +08:00
ElvinWei	e1d2f82d8e	[feature](statistics) template for building internal query SQL statements (#12714 ) Template for building internal query SQL statements，it mainly used for statistics module. After the template is defined, the executable statement will be built after the given parameters. For example, template and parameters: - template: `SELECT ${col} FROM ${table} WHERE id = ${id};`, - parameters: `{col=colName, table=tableName, id=1}` - result sql: `SELECT colName FROM tableName WHERE id = 1;` usage: ``` String template = "SELECT * FROM ${table} WHERE id = ${id};"; Map<String, String> params = new HashMap<>(); params.put("table", "table0"); params.put("id", "123"); // result: SELECT * FROM table0 WHERE id = 123; String result = InternalSqlTemplate.processTemplate(template, params); ```	2022-09-19 22:10:28 +08:00
luozenglin	43d6be8c4d	[docs](function) add a series of date function documents (#12713 ) * [docs](function) add a series of date function documents add docs for `hours_add`, `hours_sub`, `minutes_add`, `minutes_sub`, `seconds_add`, `seconds_sub`, `years_sub`, `years_add`, `months_add`, `months_sub`, `days_add`, `days_add`, `weeks_add`, `weeks_sub` functions.	2022-09-19 21:42:35 +08:00
TaoZex	a5d11dce3b	[typo](docs) Add docs of math function (#12532 ) * docs of math function	2022-09-19 21:41:59 +08:00
mch_ucchi	94d73abf2a	[test](Nereids) runtime filter unit cases not rely on NereidPlanner to generate PhysicalPlan anymore (#12740 ) This PR: 1. add rewrite and implement method to PlanChecker 2. improve unit tests of runtime filter	2022-09-19 19:53:55 +08:00
ElvinWei	1339eef33c	[fix](statistics) remove statistical task multiple times in one loop cycle (#12741 ) There is a problem with StatisticsTaskScheduler. The peek() method obtains a reference to the same task object, but the for-loop executes multiple removes.	2022-09-19 19:28:51 +08:00
jakevin	4b5cc62348	[refactor](Nereids) rename transform to applyExploration UT helper class PlanChecker (#12725 )	2022-09-19 16:49:56 +08:00
ElvinWei	08a71236a9	[feature](statistics) Internal-query, execute SQL query statement internally in FE (#9983 ) Execute SQL query statements internally(in FE). Internal-query mainly used for statistics module, FE obtains statistics by SQL from BE, such as column maximum value, minimum value, etc. This is a tool module as statistics, it will not affect the original code, also will not affect the use of users. The simple usage process is as follows(the following code does no exception handling): ``` String dbName = "test"; String sql = "SELECT * FROM table0"; InternalQuery query = new InternalQuery(dbName, sql); InternalQueryResult result = query.query(); List<ResultRow> resultRows = result.getResultRows(); for (ResultRow resultRow : resultRows) { List<String> columns = resultRow.getColumns(); for (int i = 0; i < resultRow.getColumns().size(); i++) { resultRow.getColumnIndex(columns.get(i)); resultRow.getColumnName(i); resultRow.getColumnType(columns.get(i)); resultRow.getColumnType(i); resultRow.getColumnValue(columns.get(i)); resultRow.getColumnValue(i); } } ```	2022-09-19 16:26:54 +08:00
jakevin	399af4572a	[improve](Nereids) improve join cost model (#12657 )	2022-09-19 16:25:30 +08:00
Jibing-Li	5978fd9647	[refactor](file scanner)Refactor file scanner. (#12602 ) Refactor the scanners for hms external catalog, work in progress. Use VFileScanner, will remove NewFileParquetScanner, NewFileOrcScanner and NewFileTextScanner after fully tested. Query for parquet file has been tested, still need to add readers for orc file, text file and load logic as well.	2022-09-19 15:23:51 +08:00
luozenglin	d68b8cce1a	[fix](intersect) fix intersect query failed in row storage code (#12712 )	2022-09-19 11:47:50 +08:00
jakevin	75d7de89a5	[improve](Nereids) Add all slots used by onClause to project when reorder and fix reorder mark (#12701 ) 1. Add all slots used by onClause in project ``` (A & B) & C like join(hash conjuncts: C.t2 = A.t2) \|---project(A.t2) \| +---join(hash conjuncts: A.t1 = B.t1) \| +---A \| +---B +---C transform to (A & C) & B join(hash conjuncts: A.t1 = B.t1) \|---project(A.t2) \| +---join(hash conjuncts: C.t2 = A.t2) \| +---A \| +---C +---B ``` But projection just include `A.t2`, can't find `A.t1`, we should add slots used by onClause when projection exist. 2. fix join reorder mark Add mark `LAsscom` when apply `LAsscom` 3. remove slotReference use `Slot` instead of `SlotReference` to avoid cast.	2022-09-19 11:01:25 +08:00
yiguolei	415721ef20	[enhancement](pred column) improve predicate column insert performance (#12690 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-09-19 10:53:48 +08:00

... 3 4 5 6 7 ...

6608 Commits