Commit Graph

8276 Commits

Author SHA1 Message Date
0637c339b1 [fix](array-type) support to insert the largeint in array (#11868)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-18 14:41:07 +08:00
b300b4faa0 [enhancement](memtracker) Optimize readability of mem exceed limit error message #11877 2022-08-18 14:39:41 +08:00
4c3f72d019 [improvement](meta) sort result by tablename when show table status like 'show data' (#11885) 2022-08-18 14:23:45 +08:00
d505d1a5ae [Vectorized](compaction) filter delete data in base compaction (#11721)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-08-18 14:22:59 +08:00
3eeaa8e65b [typo](fix) Fix spelling errors in comments (#11810) 2022-08-18 13:55:41 +08:00
0903dd61f3 [Enhancement](Planner) Improve error message when columns order and keys orders don't match. (#11724)
When creating a table like this:
```
CREATE TABLE `test`.`test_key_order` (
  `k1` tinyint(4) NULL COMMENT "",
  `k2` smallint(6) NULL COMMENT "",
  `k3` int(11) NULL COMMENT "",
  `v1` double MAX NULL COMMENT "",
  `v2` float SUM NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`, `k3`, `k2`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 5
PROPERTIES (
"replication_num" = "1"
);
```

Previously, the error message was:
```
Key columns should be a ordered prefix of the schema.
```

With this PR, the error message is:
```
Key columns should be a ordered prefix of the schema. KeyColumns[1] (starts from zero) is k3, but corresponding column is k2 in the previous columns declaration.
```
2022-08-18 13:28:54 +08:00
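The check behind the improved message in 0903dd61f3 can be sketched as below. This is a minimal Python illustration, not the planner's actual code: `check_key_order` and its arguments are hypothetical names, and only the message wording is taken from the commit.

```python
def check_key_order(schema_columns, key_columns):
    """Return None if key_columns is an ordered prefix of schema_columns,
    otherwise an error message naming the first mismatching position.

    Hypothetical helper for illustration; not part of Doris.
    """
    for i, key in enumerate(key_columns):
        # A key beyond the end of the schema has no corresponding column.
        expected = schema_columns[i] if i < len(schema_columns) else None
        if key != expected:
            return (
                "Key columns should be a ordered prefix of the schema. "
                f"KeyColumns[{i}] (starts from zero) is {key}, "
                f"but corresponding column is {expected} "
                "in the previous columns declaration."
            )
    return None


# Columns in declaration order, and the keys from AGGREGATE KEY(k1, k3, k2).
schema = ["k1", "k2", "k3", "v1", "v2"]
print(check_key_order(schema, ["k1", "k3", "k2"]))
```

Reporting the index and both column names lets the user see which key is out of order without comparing the two lists by hand.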
ca77824857 [typo](doc)Add the actual hive bitmap udf documentation (#11883)
Add the missing Hive bitmap UDF documentation.
2022-08-18 12:20:24 +08:00
e1a1a04c2f [Enhancement](Doe) BE queries ES using the DSL generated by FE. (#11840) 2022-08-18 10:31:17 +08:00
cfb90b39c7 (vec-stream-load-json) simdjson throw exception lead to core dump (#11880)
When config::enable_simdjson_parser=true in vec stream load, a core dump may occur when the JSON input is an invalid string such as '{ "a', or when all the fields are null, as in '{}'. In these cases the simdjson library may throw an unhandled exception such as `Objects and arrays can only be iterated when they are first encountered`. We should handle these cases.

Signed-off-by: eldenmoon <15605149486@163.com>
2022-08-18 10:27:34 +08:00
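The general idea in cfb90b39c7, turning a parser exception into a per-row error instead of letting it escape, can be sketched as below. This is only an illustration under stated assumptions: it uses Python's standard `json` module in place of simdjson, and `parse_row` and its return shape are hypothetical, not the BE code.

```python
import json


def parse_row(line, columns):
    """Parse one JSON line into a {column: value} row.

    Returns (row, None) on success or (None, error_message) on failure,
    so a malformed line never raises out of the loader.
    Hypothetical helper for illustration only.
    """
    try:
        doc = json.loads(line)
    except json.JSONDecodeError as exc:
        # Invalid input such as '{ "a' ends up here.
        return None, f"invalid json: {exc.msg}"
    if not isinstance(doc, dict):
        return None, "expected a json object"
    # Missing fields become None, so '{}' yields an all-null row.
    return {col: doc.get(col) for col in columns}, None


print(parse_row('{ "a', ["a"]))
print(parse_row("{}", ["a", "b"]))
```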
6c66bdbf30 [fix](orderby)remove useless null literal in order by (#11821) 2022-08-18 10:10:25 +08:00
881670566c [fix]Fix the coredump when an IOError occurs in be (#11857) 2022-08-18 09:13:41 +08:00
8b10a1a3f7 [enhancement](VSlotRef) enhance column_id check in execute function during runtime (#11862)
The previous column id check in the VSlotRef::execute function was too strict for the fuzzy test to keep producing random queries. Temporarily loosen the check logic.
Moreover, there are some careless calls to VExpr::get_const_col: it may return a nullptr, but not every caller checks whether the result is valid. This is an underlying problem.
2022-08-18 09:12:26 +08:00
582be130dd [Feature] (ODBC) support read/write emoji of utf16 via odbc table (#11863)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-08-18 09:09:02 +08:00
ff1971f916 [improvement](test) add dryRun option and group all cases into either p0 or p1 (#11576)
1. add dryRun option to list tests
2. group all cases into p0 p1 p2
2022-08-17 22:45:53 +08:00
4cdf9f2a23 [Enhancement](Nereids) Refine nereids parser. (#11839)
1. Use ParseException in nereids parser.
2. Add check utils in the parser test.
3. Distinguish matchesFromRoot and matches when checking plans.
2022-08-17 20:17:26 +08:00
11dc5cad83 [feature-wip](unique-key-merge-on-write) add min/max key in segment (#11830)
some feature:
1. add min max key in segment footer to speed up get_row_ranges_by_keys
2. do not load pk bloom filter in query

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-17 18:11:39 +08:00
000253b6aa [doc] Fix typos (#11852)
fix a typo in get-starting doc
2022-08-17 17:52:56 +08:00
50ef6e35be [enhancement](RowDescriptor) enhance tuple_idx check during runtime (#11835) 2022-08-17 17:50:48 +08:00
4a4d3b273d fix reorder error (#11854)
Join reorder throws an unexpected exception when the join type is neither cross nor inner.
2022-08-17 17:26:24 +08:00
dc4eb1e155 [docs](typo) fix some typo in bitmap docs (#11850)
fix some typo in bitmap docs
2022-08-17 16:58:55 +08:00
790a1d681f [Bug](external iceberg table)Fix iceberg on ha-hdfs unknown hostname bug. #11844 2022-08-17 16:21:30 +08:00
98243e99ae [feature-wip](unique-key-merge-on-write) unique key table with MOW supports delete sign column (#11672) 2022-08-17 15:12:11 +08:00
7df8c6f493 [vectorized](improvement) improve agg function of bitmap_union with f… (#11822)
* [vectorized](improvement) improve agg function of bitmap_union with fastunion
2022-08-17 14:13:01 +08:00
18b84b2dfe [Bug](compile) fix compiling problem (#11851)
fix compiling problem
2022-08-17 13:44:57 +08:00
b7e22f72c9 fix-doc (#11756)
Document typo update
2022-08-17 11:49:48 +08:00
4d00271bd2 [docs] Change JDBC error port (#11809)
Change JDBC error port
2022-08-17 11:48:33 +08:00
5bd7ec0d29 [doc](flink-connector) update flink connector 1.15 support (#11824)
update flink connector 1.15 support
2022-08-17 11:48:02 +08:00
ba3e0b3f96 [feature](compaction) allow to set disable_auto_compaction for tables (#11743) 2022-08-17 11:05:47 +08:00
12c4d1f4dd [feature-wip](unique-key-merge-on-write) unique key table with MOW supports sequence column (#11808) 2022-08-17 10:56:14 +08:00
c3e6a841c1 [feature-wip](unique-key-merge-on-write) fix that sort segments by segment id in descending order (#11811) 2022-08-17 10:54:30 +08:00
3a49156e30 [performance] (vectorization)optimize In Expr (#11826)
Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-08-17 10:46:37 +08:00
c715209a7e [refactor](dpp) remove original dpp writer (#11838)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-17 10:42:29 +08:00
3e13b7d2c2 [Bugfix](light-shema-change) fix _finish_clone dead lock (#11823)
In engine_clone_task.cpp, tablet->tablet_schema() is used to create the rowset, but that method needs a lock that is already held at engine_clone_task.cpp:514. The code originally used cloned_tablet_meta->tablet_schema(), but this was changed in #11131. Revert to using cloned_tablet_meta->tablet_schema().
2022-08-17 09:10:08 +08:00
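The deadlock pattern described in 3e13b7d2c2 (a getter that takes a lock the caller already holds) can be sketched as below. This is a minimal Python illustration, not the C++ code: the `Tablet` class, the two `finish_clone_*` functions, and the timeout are hypothetical. The timeout exists only so the sketch reports the problem instead of hanging.

```python
import threading


class Tablet:
    """Hypothetical stand-in for a tablet guarded by a non-reentrant lock."""

    def __init__(self, schema):
        self.lock = threading.Lock()  # non-reentrant, like a plain mutex
        self._schema = schema

    def tablet_schema(self, timeout=0.1):
        # The getter takes the lock itself. A real mutex would block forever
        # if the caller already holds it; the timeout makes that visible.
        if not self.lock.acquire(timeout=timeout):
            raise RuntimeError("would deadlock: lock already held")
        try:
            return self._schema
        finally:
            self.lock.release()


def finish_clone_buggy(tablet):
    with tablet.lock:
        # Re-acquires tablet.lock inside the getter: self-deadlock.
        return tablet.tablet_schema()


def finish_clone_fixed(tablet, cloned_schema):
    with tablet.lock:
        # Uses the schema carried by the cloned meta; no second lock needed.
        return cloned_schema


tablet = Tablet(schema="local-schema")
try:
    finish_clone_buggy(tablet)
except RuntimeError as err:
    print(err)
print(finish_clone_fixed(tablet, "cloned-schema"))
```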
a07e153419 [Feature](nereids)support view and nested view (#11589)
Support views in queries and add a rewrite rule that merges consecutive projects.
The rule merges consecutive projects into a single project to improve efficiency.
2022-08-16 19:24:01 +08:00
fadc78c6cf [fix](str_to_date) str_to_date support format without leading zero (#11817)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-16 18:23:16 +08:00
30175010c7 Fix nullptr in perform_remote_tablet_gc (#11820) 2022-08-16 16:50:21 +08:00
f39f57636b [feature-wip](parquet-reader) update column read model and add page index (#11601) 2022-08-16 15:04:07 +08:00
01383c3217 [Enhancement](stream-load-json) using simdjson to parse json (#11665)
Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as simdjson. simdjson has a parsing front-end called simdjson::ondemand, which parses JSON as fields are accessed and can strip the field token from the original document. This allows three optimizations:
1. Reduce the cost of string copies. We convert everything to a string literal in _write_data_to_column using sprintf, and the flame graph shows a hotspot in this function. simdjson::to_json_string strips the token (a string piece) as a std::string_view, which is exactly what we need.
2. In _set_column_value, iterate through the JSON document with for (auto field: object_val) {xxx}, which is much faster than looking up a field by its name, as in objectValue.FindMember("k1").
3. Use the at_pointer interface that simdjson provides, which gets a JSON field directly from the original document.
2022-08-16 14:49:50 +08:00
f2292a3b1d [Enhancement](array-type) enable_array_type flag update (#11785)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-16 14:41:57 +08:00
fecfdd78bf [enhancement](status) Fix Status related macros to enable RVO or move ctor (#11753) 2022-08-16 14:40:35 +08:00
3e10be4ba7 [remote-udaf](sample) add some java demo (#11752) 2022-08-16 14:37:34 +08:00
cf45151a66 [Fix](audit) do not use duplicate query id in fe.audit.log (#11746)
The original logic in ConnectProcessor.java might result in duplicate query id for different query statement in fe.audit.log as follows.
2022-08-16 14:37:16 +08:00
c124470408 [enhancement](memory) Fix too much cache leads to less memory available for queries (#11751)
Disable the Chunk Allocator in the vectorized Allocator to reduce the amount of cached memory.

For highly concurrent queries, using the Chunk Allocator with the vectorized Allocator can reduce the impact of the gperftools tcmalloc central lock.

Jemalloc and google tcmalloc have a core cache, so the Chunk Allocator may no longer be needed after gperftools tcmalloc is replaced.
2022-08-16 14:35:57 +08:00
1e61fdf242 [remote-udaf](sample) add some c++ demo (#11761) 2022-08-16 14:32:04 +08:00
4be6e70f1c [fix](query) fix orderby keys limit return less or no result (#11757)
The bug is caused by using _num_rows_read for the limit check. _num_rows_read counts the rows read from storage, but some of those rows may then be removed by filter_block for the WHERE predicate.

Add _num_rows_return, which counts the rows remaining after filter_block applies the WHERE predicate, that is, the rows actually returned.
2022-08-16 14:31:47 +08:00
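The difference between the two counters in 4be6e70f1c can be sketched as below. This is a simplified Python model rather than the reader's actual code: `scan_with_limit`, its parameters, and the sample blocks are hypothetical, and only the counter names mirror the commit.

```python
def scan_with_limit(blocks, predicate, limit, count_filtered_rows):
    """Read blocks until the limit check fires, filtering each block.

    count_filtered_rows=False models the bug (limit checked against rows
    read from storage); True models the fix (rows left after the filter).
    Hypothetical model for illustration only.
    """
    returned = []
    num_rows_read = 0    # rows read from storage, before filtering
    num_rows_return = 0  # rows that survive the WHERE predicate
    for block in blocks:
        num_rows_read += len(block)
        kept = [row for row in block if predicate(row)]
        num_rows_return += len(kept)
        returned.extend(kept)
        counter = num_rows_return if count_filtered_rows else num_rows_read
        if counter >= limit:
            break
    return returned[:limit]


blocks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
where = lambda row: row > 6

print(scan_with_limit(blocks, where, limit=3, count_filtered_rows=False))
print(scan_with_limit(blocks, where, limit=3, count_filtered_rows=True))
```

In the buggy mode the scan stops after the first block because four rows were read, even though none of them passed the predicate, so the query returns no rows.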
7d836cf0c7 [fix](memtracker) Fix flush memtable to reduce load channel mem not executed (#11771)
The memory value automatically tracked by the tcmalloc hook in the DeltaWriter is smaller than the value recorded manually in the memtable, because the first 4096-byte Chunk requested by each MemPool when the memtable is initialized is not tracked to the DeltaWriter by the hook.

The two values are not equal, so the mem_consumption() == _mem_table->memory_usage branch condition fails.
2022-08-16 14:30:45 +08:00
340ee6af6a (fix)[regression-test] add qt_having1 (#11800) 2022-08-16 14:29:37 +08:00
dc18def456 [regression-test](bitmap) Add regression case for some bitmap functions (#11783)
Co-authored-by: smallhibiscus <844981280>
2022-08-16 14:23:59 +08:00
2a1803c646 [enhancement](memtracker) Optimize query memory accuracy (#11740)
Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When allocated memory is not fully used, the recorded virtual memory is larger than the physical memory.

This is mainly because PODArray does not memset the memory to 0 when allocating it, and blocks allocated through PODArray in places such as VOlapScanNode::_free_blocks are usually kept for memory reuse and are not fully used.
2022-08-16 14:23:28 +08:00
573588693c [bugfix](load) get max version in read lock (#11806)
Introduced by #11195. In multi-threaded load, the max version should be read from the tablet meta while holding the read lock.
2022-08-16 12:25:29 +08:00