7106 Commits

Author SHA1 Message Date
1ef85ae1f2 [Improvement](join) Support nested loop outer join (#13965) 2022-11-10 19:50:46 +08:00
6c13126e5c [enhancement](Nereids) analyze check input slots must in child's output (#14107) 2022-11-10 19:28:01 +08:00
ae4f2aead7 [fix](nereids) column stats min/max missing (#14091)
In the result of SHOW COLUMN STATS tbl, the min/max values are not displayed.
2022-11-10 17:08:44 +08:00
6bd5378f66 [feature-wip](multi-catalog) lazy read for ParquetReader (#13917)
Read the predicate columns first, and use VExprContext (the pushed-down predicates)
to generate the select vector, which is then applied when reading the non-predicate columns.
Data in the non-predicate columns that the select vector filters out can be skipped, which reduces value-decode time.
If a whole page can be skipped, decompression time is reduced as well.
2022-11-10 16:56:14 +08:00
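The lazy-read idea described in 6bd5378f66 can be sketched in a few lines. This is an illustrative Python model only, not the ParquetReader code: `lazy_read`, the column dictionaries, and the `decode` callback are hypothetical names, and real Parquet pages, encodings, and page-level skipping are not modelled.

```python
from typing import Callable, Dict, List, Sequence


def lazy_read(
    predicate_cols: Dict[str, Sequence[int]],
    lazy_cols: Dict[str, Sequence[bytes]],
    predicate: Callable[[Dict[str, int]], bool],
    decode: Callable[[bytes], str],
) -> Dict[str, List]:
    """Hypothetical model of lazy reading: filter first, decode later."""
    num_rows = len(next(iter(predicate_cols.values())))

    # Step 1: evaluate the pushed-down predicate on the predicate columns only.
    select_vector = [
        predicate({name: col[i] for name, col in predicate_cols.items()})
        for i in range(num_rows)
    ]

    # Step 2: keep the surviving rows of the predicate columns.
    result: Dict[str, List] = {
        name: [v for v, keep in zip(col, select_vector) if keep]
        for name, col in predicate_cols.items()
    }

    # Step 3: decode non-predicate values only where the select vector is set.
    for name, raw in lazy_cols.items():
        result[name] = [decode(v) for v, keep in zip(raw, select_vector) if keep]
    return result


decoded = []  # records which raw values were actually decoded


def counting_decode(raw: bytes) -> str:
    decoded.append(raw)
    return raw.decode("utf-8")


rows = lazy_read(
    predicate_cols={"status": [200, 404, 200, 500]},
    lazy_cols={"request": [b"/a", b"/b", b"/c", b"/d"]},
    predicate=lambda row: row["status"] == 200,
    decode=counting_decode,
)
```

In this toy run only two of the four `request` values are decoded, which is the saving the commit message refers to as reduced value-decode time.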
724cf1cdb8 [chore][build] add instructions to build version string (#14067) 2022-11-10 16:23:34 +08:00
9b5b411112 [fix](schemeChange) fe oom because replicas too many when schema change (#12850) 2022-11-10 16:17:25 +08:00
151a72d158 [feature](Nereids) support circle graph (#14082) 2022-11-10 15:54:21 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
2022-11-10 15:48:46 +08:00
a73f4dfdc1 [fix](memtracker) Fix scanner thread ending after fragment thread causing mem tracker null pointer #14143 2022-11-10 15:42:53 +08:00
4cde9c4765 [enhance](Nereids): add missing hypergraph rule. (#14087) 2022-11-10 15:23:31 +08:00
90bfd87660 [feature](function) add new function uuid() (#14092) 2022-11-10 14:55:41 +08:00
0dfdbe4508 [feature](Nereids): InnerJoinLeftAssociate, InnerJoinRightAssociate and JoinExchange. (#14051) 2022-11-10 12:21:06 +08:00
8c5c6d9d7f [fix](ctas) fix wrong string column length after executing ctas from external table (#14090) 2022-11-10 11:36:56 +08:00
17867e446f [feature](nereids) let user define right deep tree penalty by session variable (#14040)
It is hard to find one penalty factor that suits all queries, so the factor is exposed as a session variable.
The default is 0.7.
2022-11-10 11:25:02 +08:00
57225d69f3 [Fix] add hll param for if function (#12366)
* [Fix] add hll param for if function

* add ut

Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>
2022-11-10 11:20:58 +08:00
84b969a25c [fix](grouping)the grouping expr should check col name from base table first, then alias (#14077)
* [fix](grouping)the grouping expr should check col name from base table first, then alias

* fix fe ut, the behavior would be same as mysql
2022-11-10 11:10:42 +08:00
994d563f52 [fix](nereids) cannot collect decimal column stats (#13961)
When executing ANALYZE TABLE, Doris fails on decimal columns.
The root cause is that the scale in decimalV2 is 9, but the scale in the schema is 2.
There is no need to check the scale for decimalV2, since it is not a floating-point type.
2022-11-10 11:06:38 +08:00
184cee2d2b [Bug](outfile) Fix wrong decimal format for ORC (#14124) 2022-11-10 11:01:30 +08:00
43eb946543 [feature](table-valued-function)S3 table valued function supports parquet/orc/json file format #14130
S3 table valued function supports parquet/orc/json file format.
For example: parquet format
2022-11-10 10:33:12 +08:00
10df61b5bf [improvement](join) Share hash table in fragments for broadcast join (#13921) 2022-11-10 09:48:34 +08:00
df622d8b7d [Bug](udf) fix java-udaf process string type error and add some tests (#14106) 2022-11-10 09:30:57 +08:00
55cae6202f [typo](docs)add udf doc and optimize udf regression test (#14000) 2022-11-10 09:24:45 +08:00
3690c4dbe7 [fix](load) fix that load channel failed to be released in time (#14119) 2022-11-09 22:38:08 +08:00
Pxl
794a551b0f [Enhancement][fix](profile)() modify some profiles (#14074)
1. add RemainedDownPredicates
2. fix core dump when _scan_ranges is empty
3. fix invalid memory access on vLiteral's debug_string()
4. enlarge mv test wait time
2022-11-09 21:59:28 +08:00
322ac5cf89 [refractor](array) refractor DataTypeArray from_string (#13905)
Refactor DataTypeArray from_string to make it clearer:
support ',' and ']' inside a string element, for example: ['hello,,,', 'world][]']
support empty elements, such as [,] ==> [0,0]
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-09 16:58:08 +08:00
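The parsing behaviour described in 322ac5cf89 (separators inside quoted elements, and empty elements) can be illustrated with a small quote-aware splitter. This is a hedged Python sketch, not the DataTypeArray implementation: `split_array_elements` and `parse_int_array` are hypothetical names, and escape sequences and nested arrays are deliberately not handled.

```python
from typing import List


def split_array_elements(text: str) -> List[str]:
    """Split the text of an array literal into raw element strings.

    Hypothetical sketch: commas and brackets inside single or double
    quotes are treated as ordinary characters.
    """
    text = text.strip()
    if len(text) < 2 or text[0] != "[" or text[-1] != "]":
        raise ValueError(f"not an array literal: {text!r}")
    body = text[1:-1]
    if body.strip() == "":
        return []  # "[]" has no elements

    elements: List[str] = []
    current: List[str] = []
    quote = ""  # the active quote character, or "" when outside quotes
    for ch in body:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = ""
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ",":
            elements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote:
        raise ValueError("unterminated quoted element")
    elements.append("".join(current).strip())
    return elements


def parse_int_array(text: str) -> List[int]:
    """Parse an integer array; an empty element becomes 0, as in the commit's example."""
    return [int(e) if e else 0 for e in split_array_elements(text)]
```

With this sketch, `split_array_elements("['hello,,,', 'world][]']")` yields two elements and `parse_int_array("[,]")` yields `[0, 0]`, matching the two examples in the commit message.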
3117ac9289 [enhancement](Nereids) use post-order to generate runtime filter in RuntimeFilterGenerator (#13949)
Change the runtime filter generator from pre-order to post-order traversal. This may change the number of generated runtime filters,
so the unit tests are updated accordingly.
2022-11-09 14:28:49 +08:00
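For readers unfamiliar with the two traversal orders mentioned in 3117ac9289, the difference can be shown on a toy plan tree. This is a generic Python illustration, not the RuntimeFilterGenerator code: `PlanNode` and the node names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan node: a name plus child nodes."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def pre_order(node: PlanNode) -> List[str]:
    # Visit the node itself before any of its children.
    out = [node.name]
    for child in node.children:
        out.extend(pre_order(child))
    return out


def post_order(node: PlanNode) -> List[str]:
    # Visit all children before the node itself.
    out: List[str] = []
    for child in node.children:
        out.extend(post_order(child))
    out.append(node.name)
    return out


plan = PlanNode(
    "join1",
    [
        PlanNode("join2", [PlanNode("scan_a"), PlanNode("scan_b")]),
        PlanNode("scan_c"),
    ],
)
```

In post-order, `join2` is visited before `join1`, so anything generated for the lower join is available when the upper join is processed. Whether that is the reason the filter count changes in Doris is not stated in the commit message.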
b74d0a4747 [feature](table-valued-function) Support desc from s3() and modify the syntax of tvf (#14047)
This PR does two things:

1. Support desc function s3()
2. Modify the syntax of tvf
2022-11-09 14:12:43 +08:00
f912d4e392 [fix](compile) fix compile error #14103
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-09 14:10:06 +08:00
e692636b4f [performance-wip] (vectorization) Opt HashJoin Performance (#12390) 2022-11-09 14:07:49 +08:00
84bb82acc0 [fix](Nereids) aggregate disassemble generate error output list on GLOBAL phase aggregate (#14079)
We must use localAggregateFunction as the key of globalOutputSMap, because disassembleDistinct uses the local output expressions to generate the global output.
2022-11-09 13:43:12 +08:00
b144d2b4f4 [improve](Nereids): remove redundant code, add annotation in Memo. (#14083) 2022-11-09 13:39:20 +08:00
aff62655c4 [feature](Nereids) binding slot in order by that not show in project (#14042)
1. Bind slots in ORDER BY that do not appear in the projection, such as:
SELECT c1 FROM t WHERE c2 > 0 ORDER BY c3

2. Do not check for unbound slots when binding slot references. Instead, do it in the analysis check.
2022-11-09 13:25:41 +08:00
7362460525 [docs](array-type) update the docs to specify how to use array function when import data (#13995)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-11-09 12:21:26 +08:00
a3c5fa8c01 [Compile](join) Boost compiling and linking (#14081) 2022-11-09 11:27:46 +08:00
55ca810445 [fix](Vectorized)fix json_object and json_array function return wrong result on vectorized engine (#13775)
Issue Number: close #13598
2022-11-09 11:26:55 +08:00
aec214b4b0 [bug](ColumnDecimal)call set_decimalv2_type when cloning ColumnDecimal (#14061)
* call set_decimalv2_type when cloning ColumnDecimal

* clang format
2022-11-09 11:23:43 +08:00
572f491756 [fix](ctas) text column type len = 1 when create table as select (#13906)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-09 09:09:34 +08:00
291fa499e9 [fix](JSON) Fail to parse JSONPath (libc++) (#13941) 2022-11-09 08:58:01 +08:00
287c3893b9 [typo](docs)update array type doc #14057 2022-11-09 08:40:38 +08:00
6a1c7fac9d [enhancement](load) shrink reserved buffer for page builder (#14012) (#14014)
* [enhancement](load) shrink reserved buffer for page builder (#14012)

For a table with hundreds of text-type columns, flushing its memtable may consume a huge amount of memory.
This memory is consumed when initializing the page builder, as it reserves 1MB for each column,
so memory consumption grows in proportion to the number of columns. Shrinking the reservation may
substantially reduce memory use in the load process.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* response to the review

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* Update binary_plain_page.h

* Update binary_dict_page.cpp

* Update binary_plain_page.h

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-11-09 08:40:07 +08:00
a0f136a0bc [docs](odbc) fix docs for sqlserver odbc table (#14017)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-09 08:39:39 +08:00
cd8f0713ea [refactor](new-scan) remove old vectorized scan node (#14029) 2022-11-09 08:39:20 +08:00
75b6b267ea [opt](ssb) Add query hint for the SSB queries (#14089) 2022-11-09 08:37:31 +08:00
151842a1fe [feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430)
Introduce a SQL syntax for creating inverted index and related metadata changes.

```
-- create table with INVERTED index 

CREATE TABLE httplogs (
  ts datetime,
  clientip varchar(20),
  request string,
  status smallint,
  size int,
  INDEX idx_size (size) USING INVERTED,
  INDEX idx_status (status) USING INVERTED,
  INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none")
)
DUPLICATE KEY(ts)
DISTRIBUTED BY RANDOM BUCKETS 10

-- add an INVERTED index  to a table

CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english");
```
2022-11-08 23:46:53 +08:00
826cfdaf93 [feature](information_schema) add backends information_schema table (#13086) 2022-11-08 22:15:10 +08:00
Pxl
ae3c513d74 use extern template to date_time_add (#13970) 2022-11-08 22:11:41 +08:00
115c6bd411 [fix](keyranges) fix the split error of keyranges (#14049)
2022-11-08 22:09:16 +08:00
3f3f2eb098 [Nereids][Improve] infer predicate after push down predicate (#12996)
This PR implements predicate inference.

For example:

``` sql
select * from student left join score on student.id = score.sid where score.sid > 1
```
transformed logical plan tree:

                left join
              /           \
   filter(sid > 1)     filter(id > 1) <---- inferred predicate
         |                  |
        scan               scan

See `InferPredicatesTest` for more cases.

The logic is as follows:
  1. pull up the bottom predicates, then infer additional predicates
    for example:
    select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id
    1. pull up the bottom predicate
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1
    2. infer
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 and t2.id = 1
    finally transformed sql:
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t2.id = 1
  2. put these predicates into `otherJoinConjuncts`; they are processed in the next
    round of predicate push-down


Currently only inference of `ComparisonPredicate` is supported.

TODO: We should determine whether `expression` satisfies the condition for replacement,
             e.g. check whether `expression` is non-deterministic
2022-11-08 21:36:17 +08:00
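The inference step described in 3f3f2eb098 amounts to substituting one side of an equality for the other inside a comparison. The following Python sketch shows that idea under simplifying assumptions: `infer_predicates` and the tuple encodings are hypothetical, join types are ignored (the real rule has to respect outer-join semantics), and only slot-versus-literal comparisons are handled.

```python
from typing import List, Set, Tuple

Equality = Tuple[str, str]         # e.g. ("t.id", "t2.id") for t.id = t2.id
Comparison = Tuple[str, str, int]  # e.g. ("t.id", "=", 1) for t.id = 1


def infer_predicates(
    equalities: List[Equality], comparisons: List[Comparison]
) -> Set[Comparison]:
    """Return the comparisons that follow from substituting equal slots.

    Hypothetical sketch: substitution is repeated until no new
    predicate appears, so chains of equalities are followed too.
    """
    known: Set[Comparison] = set(comparisons)
    changed = True
    while changed:
        changed = False
        for left, right in equalities:
            for slot, op, literal in list(known):
                # Substitute in both directions across the equality.
                for src, dst in ((left, right), (right, left)):
                    if slot == src and (dst, op, literal) not in known:
                        known.add((dst, op, literal))
                        changed = True
    # Report only the newly inferred predicates.
    return known - set(comparisons)
```

Applied to the commit's second example, `infer_predicates([("t.id", "t2.id")], [("t.id", "=", 1)])` returns `{("t2.id", "=", 1)}`, which corresponds to the `t2.id = 1` conjunct in the final transformed SQL.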
b6f91b6eff [improvement](profile) support ordinary user to get query profile via http api (#14016) 2022-11-08 20:39:01 +08:00
ecfdf0320d [fix](statistics) ColumnStatistics was changed unexpectedly when show stats (#14068)
The show stats logic unexpectedly changed the internally collected ColumnStat, which caused inaccurate cost estimates and inefficient plans.
2022-11-08 20:26:37 +08:00