Commit Graph

7066 Commits

Author SHA1 Message Date
a0f136a0bc [docs](odbc) fix docs for sqlserver odbc table (#14017)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-09 08:39:39 +08:00
cd8f0713ea [refactor](new-scan) remove old vectorized scan node (#14029) 2022-11-09 08:39:20 +08:00
75b6b267ea [opt](ssb) Add query hint for the SSB queries (#14089) 2022-11-09 08:37:31 +08:00
151842a1fe [feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430)
Introduce a SQL syntax for creating inverted index and related metadata changes.

```
-- create table with INVERTED index 

CREATE TABLE httplogs (
  ts datetime,
  clientip varchar(20),
  request string,
  status smallint,
  size int,
  INDEX idx_size (size) USING INVERTED,
  INDEX idx_status (status) USING INVERTED,
  INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none")
)
DUPLICATE KEY(ts)
DISTRIBUTED BY RANDOM BUCKETS 10

-- add an INVERTED index  to a table

CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english");
```
2022-11-08 23:46:53 +08:00
826cfdaf93 [feature](information_schema) add backends information_schema table (#13086) 2022-11-08 22:15:10 +08:00
Pxl
ae3c513d74 use extern template to date_time_add (#13970) 2022-11-08 22:11:41 +08:00
115c6bd411 [fix](keyranges) fix the split error of keyranges (#14049)
fix the split error of keyranges
2022-11-08 22:09:16 +08:00
3f3f2eb098 [Nereids][Improve] infer predicate after push down predicate (#12996)
This PR implements the function of predicate inference

For example:

``` sql
select * from student left join score on student.id = score.sid where score.sid > 1
```
transformed logical plan tree:

                    left join
             /                    \
       filter(sid >1)     filter(id > 1) <---- inferred predicate
         |                           |
      scan                      scan  

See `InferPredicatesTest`  for more cases

The logic is as follows:
  1. pull up the bottom predicate, then infer additional predicates
    for example:
    select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id
    1. pull up the bottom predicate
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1
    2. infer
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 and t2.id = 1
    finally transformed sql:
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t2.id = 1
  2. put these predicates into `otherJoinConjuncts`; these predicates are processed in the next
    round of predicate push-down


For now, only `ComparisonPredicate` inference is supported.

TODO: We should determine whether `expression` satisfies the condition for replacement,
             for example, whether `expression` is non-deterministic
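
The inference step above can be sketched in a few lines of C++. This is a minimal illustration only, not the Nereids implementation: the `Comparison` and `Equality` types and the `infer_predicates` function are hypothetical names, predicates are limited to a column compared with an integer literal, and the sketch does a single pass without computing a transitive closure over equalities.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified predicate: a column compared with an integer literal.
struct Comparison {
    std::string column;  // e.g. "t.id"
    std::string op;      // e.g. "=", ">", "<"
    int literal;

    // Ordering so that Comparison can be stored in a std::set.
    bool operator<(const Comparison& other) const {
        return std::tie(column, op, literal) <
               std::tie(other.column, other.op, other.literal);
    }
};

// An equality between two columns, such as the join condition t.id = t2.id.
using Equality = std::pair<std::string, std::string>;

// For every comparison on one side of an equality, derive the same comparison
// on the other side. Returns only predicates that were not already present.
std::set<Comparison> infer_predicates(const std::vector<Equality>& equalities,
                                      const std::vector<Comparison>& predicates) {
    const std::set<Comparison> known(predicates.begin(), predicates.end());
    std::set<Comparison> inferred;
    for (const auto& [left, right] : equalities) {
        for (const auto& p : predicates) {
            if (p.column == left) {
                Comparison c{right, p.op, p.literal};
                if (known.count(c) == 0) inferred.insert(c);
            } else if (p.column == right) {
                Comparison c{left, p.op, p.literal};
                if (known.count(c) == 0) inferred.insert(c);
            }
        }
    }
    return inferred;
}
```

Given the equality `t.id = t2.id` and the pulled-up predicate `t.id = 1`, the sketch yields `t2.id = 1`, which matches the example in the commit message.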
2022-11-08 21:36:17 +08:00
b6f91b6eff [improvement](profile) support ordinary user to get query profile via http api (#14016) 2022-11-08 20:39:01 +08:00
ecfdf0320d [fix](statistics) ColumnStatistics was changed unexpectedly when show stats (#14068)
The logic of `show stats` unexpectedly changed the internally collected ColumnStat, which caused inaccurate costs and inefficient plans.
2022-11-08 20:26:37 +08:00
a58ac48a6e [chore](bin) do not set heap limit for tcmalloc until doris does not allocates large unused memory (#13761)
We set a heap limit for tcmalloc to avoid OOM caused by tcmalloc allocating memory for its cache even when the machine has little free memory. However, Doris allocates large amounts of unused memory in some cases, so tcmalloc would throw an OOM exception even when there is a lot of free memory on the machine.

We can set the limit again after we fix the problem.
2022-11-08 19:26:30 +08:00
cdc635610b [enhancement](Nereids) tpch q21 anti and semi join reorder (#14037)
The estimation for anti and semi joins needs rework. For now we just let TPC-H q21 pass.
2022-11-08 17:21:50 +08:00
54c07f8782 [regression](Nereids) add back tpch regression test cases (#13826)
1. add back TPC-H regression test cases
2. fix decimal problem on aggregate function sum and agg introduced by #13764 
3. fix memo merge group NPE introduced by #13900
2022-11-08 16:40:46 +08:00
Pxl
df89e46761 [fix](build) fix compile fail on Segment::open (#14058) 2022-11-08 14:38:40 +08:00
f7ecb6d79f [Bug](Bitmap) fix sub_bitmap calculate wrong result to return null (#13978)
fix sub_bitmap calculate wrong result to return null
2022-11-08 14:10:12 +08:00
1c07a01038 [feature](multi-catalog) Support data on s3-compatible oss and support aliyun DLF (#13994)
Support Aliyun DLF.
Support data on s3-compatible object storage, such as Aliyun OSS.
Refactor some catalog interfaces to make them tidier.
Fix the bug that the default text format field delimiter of Hive should be \x01.
Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.
2022-11-08 14:02:41 +08:00
61d4974ba1 [fix](Nereids) Use simple cost to calculate benefit and avoid unuseless calculation (#14056)
In GraphSimplifier, we can use simple cost to calculate the benefit.
And only when the best neighbor of the apply step is the processing edge, we need to update recursively.
2022-11-08 13:11:38 +08:00
c2a01e84b4 [feature-wip](multi-catalog) fix page index filter bug (#14015)
Fix the page index filter not taking effect when there are multiple columns
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-11-08 12:10:12 +08:00
63ea233ae2 [thirdpart](lib) Add lock free queue of concurrentqueue (#14045) 2022-11-08 11:34:23 +08:00
e6b12ce8e8 [feature](Nereids) support query that group by use alias generated in aggregate output (#14030)
Support queries that use an alias in the GROUP BY list, such as:
SELECT c1 AS a, SUM(c2) FROM t GROUP BY a;
2022-11-08 11:02:42 +08:00
Pxl
9d8b4bc176 [Enhancement](Dictionary-codec) update dict once on same segment (#13936)
update dict once on same segment
2022-11-08 10:59:35 +08:00
b09e5ced97 [fix](priv) fix meta replay bug when upgrading from 1.1.x to 1.2.x (#14046) 2022-11-08 10:43:33 +08:00
0b945fe361 [enhancement](memtracker) Refactor mem tracker hierarchy (#13585)
The mem tracker can be logically divided into 4 layers: 1) process 2) type 3) query/load/compaction task etc. 4) exec node etc.

type includes

enum Type {
    GLOBAL = 0,        // Life cycle is the same as the process, e.g. Cache and default Orphan
    QUERY = 1,         // Count the memory consumption of all Query tasks.
    LOAD = 2,          // Count the memory consumption of all Load tasks.
    COMPACTION = 3,    // Count the memory consumption of all Base and Cumulative tasks.
    SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks.
    CLONE = 5,         // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots.
    BATCHLOAD = 6,     // Count the memory consumption of all EngineBatchLoadTask.
    CONSISTENCY = 7    // Count the memory consumption of all EngineChecksumTask.
}
Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated.

other fix:

In "[fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528", I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. In actual tests, however, the accuracy of this part of the memory could not be guaranteed, so it is put back into the orphan mem tracker.
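
A rough sketch of the periodic aggregation idea, under stated assumptions: `TaskTracker`, `Snapshot`, and `aggregate` are hypothetical names, only a subset of the types is modeled, and the process value is assumed here to be the sum of the per-type values. It is not the actual Doris mem tracker code; it only illustrates how totals can be computed by a periodic pass instead of through parent pointers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// A subset of the Type enum from the commit message.
enum class Type { GLOBAL = 0, QUERY = 1, LOAD = 2, COMPACTION = 3 };
constexpr std::size_t kTypeCount = 4;

// Hypothetical task-level tracker: it records only its type label and its own
// consumption, and holds no pointer to a parent tracker.
struct TaskTracker {
    Type type;
    std::int64_t consumption_bytes;
};

// Result of one aggregation pass.
struct Snapshot {
    std::array<std::int64_t, kTypeCount> per_type{};  // zero-initialized
    std::int64_t process_total = 0;
};

// One aggregation pass, meant to be called periodically (for example from a
// background thread). Assumption: the process value is the sum of all types.
Snapshot aggregate(const std::vector<TaskTracker>& tasks) {
    Snapshot snap;
    for (const auto& task : tasks) {
        snap.per_type[static_cast<std::size_t>(task.type)] += task.consumption_bytes;
    }
    for (std::int64_t bytes : snap.per_type) {
        snap.process_total += bytes;
    }
    return snap;
}
```

Because the totals are recomputed on each pass, a task tracker can be destroyed without touching any upper layer; the trade-off is that the per-type and process values lag behind by up to one aggregation interval.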
2022-11-08 09:52:33 +08:00
e1654bc6ef [Enhancement](function) add to_bitmap() function with int type (#13973)
The to_bitmap function previously supported only string parameters. Add a to_bitmap() function that takes an int type, which avoids converting the int to a string and then converting the string back to an int.
2022-11-08 09:15:26 +08:00
34f43ac781 [bug](like function)fix like '' (empty string) get wrong result with all rows #14035 2022-11-08 08:51:39 +08:00
6ed443c7e8 [enhancement](profile) add instanceNum, tableIds to profile. (#13985) 2022-11-08 08:49:16 +08:00
95591ce49a [refactor](cv)wait on condition variable more gently (#12620) 2022-11-08 08:40:31 +08:00
17a4746a08 [enhancement](Nereids) support otherJoinConjuncts in cascades join reorder (#13681) 2022-11-08 00:08:44 +08:00
d1cbaa1de8 [fix](load) fix a bug that reduce memory work on hard limit might be triggered twice (#13967)
When the load mem hard limit is reached, all load channels should wait on the lock of LoadChannelMgr until the current reduce-memory work finishes. In the current implementation, a bug may cause some threads to be woken up before the reduce-memory work finishes:

1. Thread A finds that the soft limit is reached, picks a load channel, and waits for the reduce-memory work to finish.
2. The memory keeps increasing.
3. Thread B finds that the hard limit is reached (either the load mem hard limit or the process soft limit), picks a load channel to reduce memory, and sets the variable _should_wait_flush to true.
4. Thread C finds that _should_wait_flush is true and waits on _wait_flush_cond.
5. Thread A finishes its reduce-memory work, finds that _should_wait_flush is true, sets it to false, and notifies all threads.
6. Thread C is woken up and picks a load channel to do the reduce-memory work, while thread B's work is not yet finished.

As a result, 2 threads do reduce-memory work when the hard limit is reached, which is quite confusing.
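
The interleaving can be replayed deterministically in a small single-threaded model. This is only an illustration of the sequence described above: `FlushState` and `replay_interleaving` are hypothetical names, the lock and the condition variable are left out, and the model tracks nothing except the shared flag and the number of threads reducing memory because of the hard limit.

```cpp
#include <algorithm>

// Hypothetical model of the shared state; the real LoadChannelMgr uses a lock
// and a condition variable, which this single-threaded replay leaves out.
struct FlushState {
    bool should_wait_flush = false;   // stands in for _should_wait_flush
    int hard_limit_reducers = 0;      // threads reducing memory due to the hard limit
    int max_hard_limit_reducers = 0;  // highest value observed so far

    void begin_reduce(bool hard_limit) {
        if (!hard_limit) return;  // a soft-limit reducer does not set the flag
        should_wait_flush = true;
        ++hard_limit_reducers;
        max_hard_limit_reducers = std::max(max_hard_limit_reducers, hard_limit_reducers);
    }

    // Mirrors the described behavior: whoever finishes clears the flag,
    // even if a different thread set it.
    void finish_reduce(bool hard_limit) {
        if (hard_limit) --hard_limit_reducers;
        should_wait_flush = false;
    }
};

// Replays the interleaving of threads A, B and C in order and returns the
// peak number of concurrent hard-limit reducers.
int replay_interleaving() {
    FlushState state;
    state.begin_reduce(false);                     // A: soft limit reached
    state.begin_reduce(true);                      // B: hard limit reached, sets the flag
    const bool c_waits = state.should_wait_flush;  // C: sees the flag and waits
    state.finish_reduce(false);                    // A: finishes and clears B's flag
    if (c_waits && !state.should_wait_flush) {
        state.begin_reduce(true);                  // C: woken up while B is still working
    }
    return state.max_hard_limit_reducers;
}
```

In this model `replay_interleaving()` reports a peak of two hard-limit reducers, because the flag set by thread B is cleared by thread A before B's work is done.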
2022-11-08 00:07:52 +08:00
241801ca17 [typo](doc) fix get_start doc (#14001) 2022-11-07 21:28:45 +08:00
1c2532b9dc [Bug](udf) Make UDF's type always nullable (#14002) 2022-11-07 20:51:31 +08:00
4ea1b39cb2 [enhancement](Nereids) remove unnecessary decimal cast (#13745) 2022-11-07 19:24:10 +08:00
f2978fb6ff [feat](Nereids) add graph simplifier (#14007) 2022-11-07 18:45:45 +08:00
22b4c6af20 [feature](Nereids) support statement having aggregate function in order by list (#13976)
1. support statements that have an aggregate function in the ORDER BY list, such as:
    SELECT COUNT(*) FROM t GROUP BY c1 ORDER BY COUNT(*) DESC;
2. add clickbench analyze unit tests
2022-11-07 17:01:31 +08:00
0031304015 [typo](docs)fix config doc #14010 2022-11-07 17:00:16 +08:00
bb9182d602 [fix](repeat)remove unmaterialized expr from repeat node (#13953) 2022-11-07 14:13:05 +08:00
7254999f02 [typo](docs) fix docs,delete redundant words #13849 2022-11-07 13:51:10 +08:00
3c8524b9d8 [security](fe jar) upgrade commons-codec:commons-codec to 1.13 #13951 2022-11-07 13:50:07 +08:00
32fea672b0 [chore](gutil) remove some gutil macros and solve some macro conflict with brpc (#13954)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-07 13:39:52 +08:00
e8d2fb6778 [feature](function)add search functions: multi_search_all_positions & multi_match_any (#13763)
Co-authored-by: yiliang qiu <yiliang.qiu@qq.com>
2022-11-07 11:50:55 +08:00
7ffe88b579 [feature-array](array-type) Add array function array_popback (#13641)
Remove the last element from array.

```
mysql> select array_popback(['test', NULL, 'value']);
+---------------------------------------------+
| array_popback(ARRAY('test', NULL, 'value')) |
+---------------------------------------------+
| [test, NULL]                                |
+---------------------------------------------+
```
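
The semantics shown above can be sketched in C++ as follows. This is not the Doris implementation: the `Element` alias uses `std::nullopt` to stand in for SQL NULL, and returning an empty array for an empty input is an assumption, since the commit message does not say how that case is handled.

```cpp
#include <optional>
#include <string>
#include <vector>

// std::nullopt stands in for a SQL NULL element.
using Element = std::optional<std::string>;

// Returns a copy of the array without its last element.
// Assumption: an empty input yields an empty result.
std::vector<Element> array_popback(std::vector<Element> arr) {
    if (!arr.empty()) {
        arr.pop_back();
    }
    return arr;
}
```

Applied to `['test', NULL, 'value']`, the sketch returns `['test', NULL]`, the same result as the query above.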
2022-11-07 10:48:16 +08:00
c7b2b90504 [fix](memtracker) Fix DCHECK !std::count(_consumer_tracker_stack.begin(), _consumer_tracker_stack.end(), tracker) 2022-11-06 16:41:03 +08:00
27549564a7 [feature](table-valued-function) Support S3 tvf (#13959)
This PR does three things:

1. Modifies the framework of table-valued functions (tvf).
2. Adds BE support for the `fetch_table_schema` RPC.
3. Implements the `S3(path, AK, SK, format)` table-valued function.
2022-11-06 11:04:26 +08:00
fb5a3e118a [feature-wip](dlf) prepare to support aliyun dlf (#13969)
[What is DLF](https://www.alibabacloud.com/product/datalake-formation)

This PR prepares for DLF support, with some changes to multi-catalog

1. Add RuntimeException for most Hive metastore and ES client access operations.
2. Add DLF related dependencies.
3. Move the checks of ES catalog properties to the analysis phase of creating an ES catalog

TODO(in next PR):

1. Refactor the `getSplit` method to support not only HDFS but also s3-compatible object storage.
2. Finish the implementation of DLF support
2022-11-06 10:01:57 +08:00
f29e43fee9 [fix](storage) rm unacessary check (#13986) (#13988)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-11-05 23:46:30 +08:00
1724faf9a5 [test](java-udf)add java udf RegressionTest about the currently supported data types #13972 2022-11-05 19:25:58 +08:00
d01f7c546a [refactor](iceberg-hudi) disable iceberg and hudi table by default (#13932) 2022-11-05 19:22:27 +08:00
wxy
620a137bd7 [enhancement](test) support tablet repair and balance process in ut (#13940) 2022-11-05 19:20:23 +08:00
380395a61f [doc](routineload)Common mistakes in adding routine load #13975 2022-11-05 19:17:33 +08:00
087488db3b [typo](doc) fixed spelling errors (#13974) 2022-11-05 15:40:55 +08:00