7257 Commits

Author SHA1 Message Date
034aa20b0a [fix](regression)when using regression-conf-custom.groovy, properties in regression-conf.groovy are missing #14458 2022-11-22 08:44:50 +08:00
ca486cdfbc [Enhancement](storage) optimize segment compaction log (#14448) (#14449)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-11-22 08:43:51 +08:00
74f694753b Fix the en docs of benchmark (#14459) 2022-11-22 08:40:51 +08:00
e3d764aac5 [test](jdbc) add new jdbc case in other source (#14443) 2022-11-21 21:33:06 +08:00
7624c80d83 [Feature](Kafka) Add kerberos support for kafka (#14431)
Compile librdkafka with Kerberos SASL GSSAPI support.
2022-11-21 20:45:50 +08:00
730cd1a0c1 [Feature](Nereids) Simplify range of predicate (#14113)
Simplify range of predicate

for example:
1. `a > 1 or a > 2` => `a > 1`
2. `a in (1,2,3) or a in (3,4,5)` => `a in (1,2,3,4,5)`
2022-11-21 20:24:03 +08:00
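The two rewrites listed for 730cd1a0c1 can be illustrated with a minimal sketch. This is not the Nereids implementation; the helper names `simplify_or_greater_than` and `merge_in_lists` are hypothetical and only mirror the two examples in the commit message.

```python
def simplify_or_greater_than(bounds):
    """OR of `a > b1`, `a > b2`, ... is equivalent to `a > min(b1, b2, ...)`."""
    return min(bounds)


def merge_in_lists(*value_lists):
    """OR of several `a in (...)` predicates becomes one IN list over the union.

    Values are kept in first-seen order and duplicates are dropped.
    """
    merged = []
    for values in value_lists:
        for value in values:
            if value not in merged:
                merged.append(value)
    return merged


print(simplify_or_greater_than([1, 2]))      # 1, i.e. `a > 1`
print(merge_in_lists([1, 2, 3], [3, 4, 5]))  # [1, 2, 3, 4, 5]
```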
91bd76a902 [enhancement](FE) use forEach() to replace stream().forEach() (#14039) 2022-11-21 15:40:43 +08:00
a91fe11b4d [feature](Nereids) Add random test framework (#14388) 2022-11-21 15:16:03 +08:00
b36f3d7e61 [typo](docs) fix typo in schema-change.md (#14311) 2022-11-21 13:38:47 +08:00
Pxl
bcd641877f [Enhancement](scan) disable build key range and filters when push down agg work (#14248)
disable build key range and filters when push down agg work
2022-11-21 12:47:57 +08:00
ff197b0fa5 [chore](macOS) Fix linker errors (#14410) 2022-11-21 10:38:36 +08:00
ce489cf723 [Feature](JDBC)support clickhouse jdbc external table (#14244) 2022-11-21 10:33:53 +08:00
41dae8b6bb [improvement](load) add a log when close OlapTableSink with error (#14257) 2022-11-21 10:33:37 +08:00
a9a6fdd8c3 [fix](insert) fix insert into table which contains column name prefix mv_ (#14361) 2022-11-21 10:31:01 +08:00
0613ccda74 [feature](tools)profile viewer (#14429)
Reading a profile is painful, especially when there are multiple parallel instances.
This tool helps us grasp the main information of a profile in a graphical view.

The profile is represented as a tree.
Each SQL operation node contains its operation type (join, scan, ...), its node id, and its fragment id. The number on an arrow is the number of rows output by the child node. The tool sums the output rows of the same node across parallel instances: if there are 4 parallel instances and each ScanNode on the lineitem table outputs 10 rows, the label on the arrow starting at ScanNode(lineitem) is 40.

Here is a demo for TPC-H Q2:
[Image: tpch q2 profile viewer]

Issue Number: close #xxx
2022-11-21 10:29:54 +08:00
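The row aggregation described for the profile viewer in 0613ccda74 can be sketched as follows. This is an illustration under assumptions, not the tool's code: the function name `sum_output_rows` and the `(node label, rows)` input shape are hypothetical.

```python
from collections import defaultdict


def sum_output_rows(instance_rows):
    """Sum output rows per node across all parallel instances.

    `instance_rows` is an iterable of (node_label, rows) pairs, one pair
    per node per instance (a hypothetical input shape).
    """
    totals = defaultdict(int)
    for node_label, rows in instance_rows:
        totals[node_label] += rows
    return dict(totals)


# Four parallel instances, each ScanNode(lineitem) outputs 10 rows.
rows = [("ScanNode(lineitem)", 10)] * 4
print(sum_output_rows(rows))  # {'ScanNode(lineitem)': 40}
```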
4976021bf7 [Enhancement] Doris broker support aliyun-oss #13665 (#14305) 2022-11-21 10:29:14 +08:00
Pxl
c18a471303 [Optimize](predicate) update inplace on VcompoundPred (#14402)
select count(*) from lineorder where lo_orderkey<100000000 OR lo_orderkey>100000000 AND lo_orderkey<200000000 OR lo_orderkey >200000000;

0.6s -> 0.5s
2022-11-21 09:12:30 +08:00
3f29e3bff6 [bug](test) fix regression test of jdbc postgresql table core (#14417) 2022-11-20 23:03:14 +08:00
98cea90950 [typo](docs)benchmark doc fix number (#14427) 2022-11-20 22:51:42 +08:00
c29975d347 [Docs](function) Add some function do not in sidebars (#14426) 2022-11-20 22:50:52 +08:00
71e80e8957 [typo](docs)Performance test documentation update (#14147)
* Performance test documentation update
2022-11-20 09:40:57 +08:00
2ccb5209a0 (improvement)[doc] add document version tag instruction (#14406) 2022-11-20 00:05:53 +08:00
3489f4826c [fix](test) sync conf used in pipeline and in repository (#14414) 2022-11-20 00:05:08 +08:00
3e1e8db173 [fix](exec) fix thread token shutdown (#14418)
Fix the "Thread pool token was shut down" error.
This happens because when a query has more than one fragment on a single BE, the thread token may be
reset incorrectly, causing the thread token to shut down too early.
cherry-pick from master
Introduced from #13021
2022-11-20 00:04:48 +08:00
5dfe5ef965 [test](hive catalog)add hive catalog test case (#14217) 2022-11-19 17:26:18 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
1482ab32b6 [tools](tpch)fix invalid download url (#14329) 2022-11-19 13:29:33 +08:00
1f2c06dd6e [enhancement](rewrite) Remove unused wide common factors to improve scan performance in ExtractCommonFactorsRule (#14381)
* [enhancement](sql) Remove unused wide common factors to improve scan performance in ExtractCommonFactorsRule

* fix regression test

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-11-19 13:23:49 +08:00
f5f2e84e31 [refactor](planner) remove the limit return rows of order by (#12478)
Originally, ORDER BY queries returned at most 65535 rows by default,
but many workloads no longer fit within this limit.
Users had to append a larger LIMIT to the query to retrieve the full result,
which is extremely inconvenient, so this behavior has been adjusted.

This change also adds the variable DEFAULT_ORDER_BY_LIMIT to SessionVariable,
with a default value of -1. If the user does not use the LIMIT keyword or the LIMIT value is a negative integer,
the default number of returned rows is Long.MAX_VALUE. If a maximum is set,
the number of rows returned follows that maximum or the value given after the
LIMIT keyword.
2022-11-19 12:45:44 +08:00
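One possible reading of the limit resolution described in f5f2e84e31 is sketched below. The commit message does not say which value wins when both a session default and an explicit LIMIT are present, so the precedence here (explicit non-negative LIMIT first, then the session default) is an assumption, and `effective_limit` is a hypothetical name.

```python
LONG_MAX_VALUE = 2**63 - 1  # value of Java's Long.MAX_VALUE


def effective_limit(explicit_limit=None, default_order_by_limit=-1):
    """Return the number of rows an ORDER BY query may return.

    Assumed precedence (not confirmed by the commit message):
    1. a non-negative explicit LIMIT,
    2. a non-negative session default,
    3. otherwise Long.MAX_VALUE.
    """
    if explicit_limit is not None and explicit_limit >= 0:
        return explicit_limit
    if default_order_by_limit >= 0:
        return default_order_by_limit
    return LONG_MAX_VALUE


print(effective_limit())                             # 9223372036854775807
print(effective_limit(explicit_limit=10))            # 10
print(effective_limit(default_order_by_limit=1000))  # 1000
```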
1b6e872a8a [improvement](common) table name length exceeds limit error message (#14368)
The table name check reports both a regex mismatch and an over-length name with the same message: "Incorrect table name 'xxx'. Table name regex is 'xxx'".
That message does not make clear which kind of error occurred,
so the two error messages are now separated.
2022-11-19 11:36:08 +08:00
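The idea behind 1b6e872a8a, reporting length and regex failures separately, can be sketched like this. The length limit, the regex, and the wording of the length message are hypothetical placeholders, not Doris's actual values; only the regex message follows the text quoted in the commit.

```python
import re

MAX_TABLE_NAME_LENGTH = 64                     # hypothetical limit
TABLE_NAME_REGEX = r"^[a-zA-Z][a-zA-Z0-9_]*$"  # hypothetical pattern


def check_table_name(name):
    """Return an error message for an invalid table name, or None if it is valid."""
    # Check the length first so an over-length name gets its own message.
    if len(name) > MAX_TABLE_NAME_LENGTH:
        return (f"Table name '{name}' is too long; "
                f"the limit is {MAX_TABLE_NAME_LENGTH} characters")
    if not re.match(TABLE_NAME_REGEX, name):
        return (f"Incorrect table name '{name}'. "
                f"Table name regex is '{TABLE_NAME_REGEX}'")
    return None


print(check_table_name("orders"))  # None
print(check_table_name("1bad"))
print(check_table_name("t" * 65))
```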
512b787559 [fix](parquet-reader) fix stack-use-after-return error (#14411) 2022-11-19 10:52:50 +08:00
b4aef889f2 [feature-array](array-function) add array constructor function array() (#14250)
* [feature-array](array-function) add array constructor function `array()`

```
mysql>  select array(qid, creationDate) from nested_c_2  limit 10;
+------------------------------+
| array(`qid`, `creationDate`) |
+------------------------------+
| [1000038, 20090616074056]    |
| [1000069, 20090616075005]    |
| [1000130, 20090616080918]    |
| [1000145, 20090616081545]    |
+------------------------------+
10 rows in set (0.01 sec)
```
2022-11-19 10:49:50 +08:00
02372ca2ea [test](jdbc external table) add new jdbc mysql external table (#14323) 2022-11-19 09:46:48 +08:00
eb76160b48 [chore](third-party) Use GNU official mirror to boost the download speed (#14358)
According to the description in https://www.gnu.org/server/mirror.html, using the address http://ftpmirror.gnu.org/ to download GNU packages is recommended. It can boost the download speed worldwide.
2022-11-19 00:04:52 +08:00
63a2344e68 [Enhancement](Nereids) Refactor AggregateFunction and support explain plan (#14380)
# Proposed changes

- Refactor AggregateFunction
    1. AggregateFunction implements ComputeSignature
    2. Add a CustomSignature to compute the signature dynamically; we can check the input type and compute the implicit cast type in the `customSignature` method
    3. Add PartialAggType to record some type information before disassembling the aggregate
    4. Refine and create a custom catalog function when translating AggregateFunction, without `finalizeForNereids`
- Support explain plan
    1. explain parsed plan select ...
    2. explain analyzed plan select ...
    3. explain rewritten/logical plan select ...
    4. explain optimized/physical plan select ...
    5. explain all plan select ...
2022-11-18 23:40:33 +08:00
c4bade71c8 [refactor](nereids) remove ColumnStatistics.UNKNOWN from StatsDerive (#14343)
ColumnStatistics.UNKNOWN can be replaced by ColumnStatistics.DEFAULT
2022-11-18 23:40:00 +08:00
a82896f420 [fix](broker-load) fix that broker load does not set be exec version and limit node channel memory (#14399) 2022-11-18 23:38:37 +08:00
21416f9947 [enhancement](memory) Support Jemalloc metrics and default allocator changed to Jemalloc (#14384) 2022-11-18 21:02:54 +08:00
68da6bccb7 [fix](type) fix DECIMAL scale when cast function on fe (#12877)
before:
MySQL [test]> select cast('135.759999999' as DECIMAL(10,3));
+----------------------------------------+
| CAST('135.759999999' AS DECIMAL(10,3)) |
+----------------------------------------+
| 135.759999999                          |
+----------------------------------------+
1 row in set (0.00 sec)

now:
MySQL [stage]> select cast('135.759999999' as DECIMAL(10,3));
+----------------------------------------+
| CAST('135.759999999' AS DECIMAL(10,3)) |
+----------------------------------------+
| 135.759                                |
+----------------------------------------+
1 row in set (0.01 sec)
2022-11-18 19:36:14 +08:00
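The "now" output for 68da6bccb7 shows the value cut to three fractional digits (135.759, not 135.760). A minimal sketch that reproduces this specific output with Python's `decimal` module follows. Truncation is inferred from the displayed result only; the commit does not state Doris's rounding rule, and `cast_to_scale` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_DOWN


def cast_to_scale(text, scale):
    """Parse a decimal string and truncate it to `scale` fractional digits."""
    quantum = Decimal(1).scaleb(-scale)  # scale=3 gives Decimal('0.001')
    return Decimal(text).quantize(quantum, rounding=ROUND_DOWN)


print(cast_to_scale("135.759999999", 3))  # 135.759
```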
eab0af7afe [optimization](array-type) optimize the export precision of floating point numbers (#14261)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-11-18 18:24:11 +08:00
bd5882d08a [fix](datax)doris writer write error (#14276)
* doris writer write error
2022-11-18 18:20:13 +08:00
Pxl
734525de86 [Bug](runtime filter) fix minmax filter not copy rightly on shared hash join (#14367)
Fix the minmax filter not being copied correctly on shared hash join.
2022-11-18 17:52:45 +08:00
2c4236fd24 [improvement](ctas) use string type for varchar/char/string (#14382)
When executing a CREATE TABLE AS SELECT statement,
varchar/char/string columns in the created table are unified to the string type.

This is because when selecting from an external table (MySQL, PostgreSQL, etc.), the length of a varchar in the external database
is measured in characters, not bytes.
So a varchar(10) column in the external table becomes a varchar(10) column
in the created table, but the byte length of the data in the external table may exceed 10, causing the CTAS to fail.

Changing to string does not affect performance or disk storage capacity.
Note that if a string column is the first column, it is changed to varchar(65535),
because a string column is not allowed as a sort key column.
2022-11-18 14:20:13 +08:00
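The character-versus-byte mismatch behind 2c4236fd24 is easy to reproduce. The sketch below assumes UTF-8 encoding and a byte-counted varchar(n) on the target side; `fits_varchar_bytes` is a hypothetical helper, not a Doris function.

```python
def fits_varchar_bytes(value, n):
    """True if `value` fits in a varchar(n) whose length is counted in bytes (UTF-8 assumed)."""
    return len(value.encode("utf-8")) <= n


text = "数据库" * 3 + "表"  # 10 characters, 3 bytes each in UTF-8
print(len(text), len(text.encode("utf-8")))  # 10 30
print(fits_varchar_bytes(text, 10))          # False
print(fits_varchar_bytes("abcdefghij", 10))  # True
```

A value that satisfies a character-counted varchar(10) in the source database can therefore overflow a byte-counted varchar(10) in the created table.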
a1d02f36ac [feature](table-valued-function) support hdfs() tvf (#14213)
This pr does two things:
1. support `hdfs()` table valued function.
2. add regression test
2022-11-18 14:17:02 +08:00
1f326fc0d6 [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node (#14321)
* [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node

* use only one chunk memory when serializing keys in agg node
2022-11-18 12:31:52 +08:00
7952bce03f [compatibility](Nereids) process escape in string literal (#14294) 2022-11-18 11:24:00 +08:00
9e25aa8d3e [feature](Nereids): Add subgraph enumerator #14291
Add subgraph enumerator to find the best plan

For DPHyp, we need an enumerator for all csg-cmp pairs to find the best plan
2022-11-18 10:33:30 +08:00
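To make the term "csg-cmp pair" from 9e25aa8d3e concrete, here is a brute-force sketch for a simple join graph: two disjoint, individually connected node sets with at least one edge between them. This is not DPHyp and not the Nereids enumerator; it checks every subset, handles only simple (non-hyper) edges, and all function names are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Check whether `nodes` induces a connected subgraph (depth-first search)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == current and y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == nodes


def has_edge_between(left, right, edges):
    """True if some edge has one endpoint in `left` and the other in `right`."""
    return any((a in left and b in right) or (a in right and b in left)
               for a, b in edges)


def csg_cmp_pairs(num_nodes, edges):
    """List every unordered csg-cmp pair of a graph with nodes 0..num_nodes-1."""
    connected = [frozenset(c)
                 for size in range(1, num_nodes + 1)
                 for c in combinations(range(num_nodes), size)
                 if is_connected(c, edges)]
    pairs = []
    for s1 in connected:
        for s2 in connected:
            # min(s1) < min(s2) keeps exactly one ordering of each disjoint pair.
            if min(s1) < min(s2) and not (s1 & s2) and has_edge_between(s1, s2, edges):
                pairs.append((s1, s2))
    return pairs


chain = [(0, 1), (1, 2)]  # join graph 0 - 1 - 2
print(len(csg_cmp_pairs(3, chain)))  # 4
```

Brute force is exponential in the number of relations; an enumerator such as the one this commit adds exists to visit only the valid pairs.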
2b6f85ab96 [chore](macOS) Fix BE UT (#14307)
#13195 left some unresolved issues. One of them is that some BE unit tests fail.
This PR fixes this issue. Now, we can run the command ./run-be-ut.sh --run successfully on macOS.
2022-11-18 10:13:38 +08:00
da0b09caea [fix](Nereids) DateTimeType migrate to DateType is wrong when hour, minute and second all zero (#14327)
1. Fix incorrect DateTimeType to DateType migration when hour, minute, and second are all zero
2. Add TPC-H regression test with DATEV2 type
2022-11-18 01:38:03 +08:00
bd5a593403 [enhancement](memtracker) Use proc/meminfo MemAvailable to control memory and optimize MemTracker log printing (#14335) 2022-11-17 22:46:07 +08:00