5732 Commits

Author SHA1 Message Date
b3d476eebb [fix](ui)source map files not included in production builds (#11612)
Co-authored-by: wangyf0555 <wangyongfeng@flywheels.com>
2022-08-10 08:19:07 +08:00
ae90d45594 [Bug](show data skew)fix show data skew logic (#11616)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-08-10 08:18:39 +08:00
aaaf6915e4 [feature-wip](unique-key-merge-on-write) fix rowid conversion ut that may create a directory under an incorrect path (#11628) 2022-08-10 08:17:47 +08:00
601f28dd90 [fix](regexpr)regexpr functions' contexts should be THREAD_LOCAL (#11595) 2022-08-10 06:58:24 +08:00
01e4522612 [fix]collect_list/collect_set without GROUP BY for NOT NULL column (#11529)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-09 20:49:37 +08:00
df47b6941d [feature-wip](array-type) support the array type in reverse function (#11213)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-09 20:49:09 +08:00
169996d8e4 [feature](information_schema) add rowsets table into information_s… (#11266)
* [feature](information_schema) add 'segments' table into information_schema
2022-08-09 18:15:54 +08:00
7b67661262 add plan checker (#11619)
This PR proposes to add a plan checker to facilitate plan checking in unit tests.

Usage of plan checker is like below:
```java
new PlanChecker()
  .plan(myPlan)
  .applyBottomUp(myRule)
  .matches(expectedPattern);
```
2022-08-09 17:19:30 +08:00
583b44dfa8 [enhancement](broker) Improve the availability of broker load (#10699) 2022-08-09 17:00:48 +08:00
cc6c92935a [minor](log) add a warn log to observer invalid query profile (#11588)
I tried to fix the bug in #10095. The error occurred when I first created an empty table and queried it,
but I could not reproduce it again.
So I added a warn log here to observe it.
2022-08-09 14:10:03 +08:00
2cadf85988 [improvement](alter) modify table's default replica if table is unpartitioned (#11550)
Before, if a table was unpartitioned, when executing the following alter stmt:
```
alter table tbl1 set ("replication_num" = "1");
```
Only the tbl1 partition's replication_num was changed
(an unpartitioned table also has a single partition with the same name as the table),
but the table's default replication_num was left unchanged.
So when executing `show create table tbl1`, you would find that the replication_num was still the original value.

This CL mainly changes:
1. For an unpartitioned table, if the user changes its replication num, both the table's and the partition's replication_num will be changed.
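
The behavior described above can be modeled with a small Python sketch. The `UnpartitionedTable` class and its field names are hypothetical, chosen only for illustration, and are not the FE implementation. The sketch assumes only what the description states: an unpartitioned table holds a single partition named after the table.

```python
class UnpartitionedTable:
    """Toy model of an unpartitioned table (hypothetical, for illustration)."""

    def __init__(self, name, replication_num):
        self.name = name
        # Value reported as the table-level default.
        self.default_replication_num = replication_num
        # An unpartitioned table still has one partition, named like the table.
        self.partition_replication_num = {name: replication_num}

    def set_replication_num(self, value):
        # Old behavior: only the partition entry was updated.
        self.partition_replication_num[self.name] = value
        # Behavior after this change: the table default is updated as well.
        self.default_replication_num = value


tbl1 = UnpartitionedTable("tbl1", 3)
tbl1.set_replication_num(1)
print(tbl1.default_replication_num, tbl1.partition_replication_num)
```

With both assignments in place, the table default and the single partition always report the same value.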
2022-08-09 14:09:38 +08:00
85e67b04e2 fix-doc3 (#11587)
bloomFilter   fix-doc
2022-08-09 13:35:32 +08:00
fcf767b2e4 [fix](doc)Modify the installation document, the description of disk space limit (#11609)
Modify the installation document, the description of disk space limit
2022-08-09 13:33:32 +08:00
a4f9628576 [improvement](datax) improvement json import and support csv writing
1. At present, read_json_by_line and fuzzy_parse are used for json format writing, which lowers stream load write performance. Writing now uses strip_outer_array and fuzzy_parse instead, which is about 3 times faster.

2. Add csv writing. The column separator is set to \x01 and the row separator to \x02; performance is about 5 times higher than before.
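
To make the three payload shapes concrete, here is a minimal Python sketch. The function names are hypothetical and this is not the DataX writer code; it only illustrates line-delimited JSON, a single outer JSON array, and the separator-based csv layout named in the message.

```python
import json

COLUMN_SEP = "\x01"  # column separator named in the commit message
ROW_SEP = "\x02"     # row separator named in the commit message


def as_json_lines(records):
    """One JSON object per line (the shape read with read_json_by_line)."""
    return "\n".join(json.dumps(r) for r in records)


def as_json_array(records):
    """A single outer JSON array (the shape read with strip_outer_array)."""
    return json.dumps(records)


def as_csv(rows):
    """Join columns with COLUMN_SEP and rows with ROW_SEP; no quoting is applied."""
    return ROW_SEP.join(COLUMN_SEP.join(str(v) for v in row) for row in rows)


records = [{"k": 1}, {"k": 2}]
print(as_json_lines(records))
print(as_json_array(records))
print(repr(as_csv([[1, "a,b"], [2, "c"]])))
```

Because \x01 and \x02 rarely occur in ordinary text, values containing commas or newlines need no quoting in this layout.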
2022-08-09 11:50:24 +08:00
436ee0dd1d [feature-wip](statistics) step4.1: manually inject statistics for a table or column (#11030)
This PR mainly supplements the syntax of the previous PR (#8861).
It allows users to manually inject statistics, including table, partition, and column statistics.

table/partition stats type:
- row_count
- data_size

column stats type:
- ndv
- avg_size
- max_size
- num_nulls
- min_value
- max_value

Modify table or partition statistics:
```
ALTER TABLE table_name 
SET STATS ('k1' = 'v1', ...) [PARTITIONS(p_name1, p_name2...)]
```

Modify column statistics:
```
ALTER TABLE table_name MODIFY COLUMN columnName 
SET STATS ('k1' = 'v1', ...) [PARTITIONS(p_name1, p_name2...)]
```

Some notes:
- Statistics can only be injected into olap type tables.
- Injecting statistics into temporary partitions is not supported.
- When injecting statistics into a partitioned table, users need to specify a partition name.
- If multiple partitions are specified, the same stats will be injected into each of them.
- The current code also has mock statistics @zhengshij
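
As an illustration of the syntax above, the following Python sketch assembles the two statement forms from a dictionary of stats. `build_set_stats` is a hypothetical helper, not part of Doris. It does no quoting or escaping of identifiers and values, and it assumes the stat names are limited to the two lists given in this message.

```python
TABLE_STATS = {"row_count", "data_size"}
COLUMN_STATS = {"ndv", "avg_size", "max_size", "num_nulls", "min_value", "max_value"}


def build_set_stats(table, stats, column=None, partitions=None):
    """Build an ALTER TABLE ... SET STATS statement (hypothetical helper)."""
    # Column statements accept column stats; otherwise table/partition stats.
    allowed = COLUMN_STATS if column else TABLE_STATS
    unknown = set(stats) - allowed
    if unknown:
        raise ValueError(f"unsupported stats: {sorted(unknown)}")
    pairs = ", ".join(f"'{k}' = '{v}'" for k, v in stats.items())
    sql = f"ALTER TABLE {table}"
    if column:
        sql += f" MODIFY COLUMN {column}"
    sql += f" SET STATS ({pairs})"
    if partitions:
        sql += f" PARTITIONS({', '.join(partitions)})"
    return sql


print(build_set_stats("t1", {"row_count": 100}, partitions=["p1", "p2"]))
print(build_set_stats("t1", {"ndv": 5}, column="c1"))
```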
2022-08-09 11:24:23 +08:00
970a35d658 [fix](docs) Fix some errors related to privilege and grant in the docs (#11377)
Fix some errors related to privilege and grant in the docs
2022-08-09 11:02:47 +08:00
2b918eaccd [fix](Doris On ES) Fix es not support aliases error (#11547)
1. Fix es not support aliases error
2. Fix multicatalog query es error
3. add ut
2022-08-09 09:36:05 +08:00
f9b151744d optimize topn query if order by columns is prefix of sort keys of table (#10694)
* [feature](planner): push limit to olapscan when meet sort.

* if olap_scan_node's sort_info is set, push sort_limit, read_orderby_key
and read_orderby_key_reverse for olap scanner

* There is a common query pattern to find the latest time series data.
 e.g. SELECT * from t_log WHERE t>t1 AND t<t2 ORDER BY t DESC LIMIT 100

If the ORDER BY columns are a prefix of the sort key of the table, the query can
be greatly optimized to read much less data instead of reading all data
between t1 and t2.

Because the ORDER BY columns and the sort key of the table share the same order,
it is enough to read LIMIT N rows from each related segment and merge them into N rows.
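
The idea can be sketched in a few lines of Python. This is a simplified model, not the BE implementation: each segment is assumed to be a list of keys already sorted ascending, and `topn_desc` is a hypothetical name. It reads each segment backward, stops after LIMIT matching rows, and merges the per-segment results.

```python
import heapq
from itertools import islice


def topn_desc(segments, t1, t2, limit):
    """Return the `limit` largest keys with t1 < t < t2 across sorted segments."""
    per_segment = []
    for seg in segments:
        # Read backward so rows arrive in descending order.
        matching = (t for t in reversed(seg) if t1 < t < t2)
        # At most `limit` rows from one segment can reach the final result.
        per_segment.append(list(islice(matching, limit)))
    # Merge the descending runs and keep only the first `limit` rows.
    merged = heapq.merge(*per_segment, reverse=True)
    return list(islice(merged, limit))


segments = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
print(topn_desc(segments, 2, 12, 4))
```

The sketch still filters by scanning from the end of each segment; a real reader would first seek to the key range and then read backward.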

1. set read_orderby_key to true for read_params and _reader_context
   if olap_scan_node's sort info is set.
2. set read_orderby_key_reverse to true for read_params and _reader_context
   if is_asc_order is false.
3. rowset reader force merge read segments if read_orderby_key is true.
4. block reader and tablet reader force merge read rowsets if read_orderby_key is true.

5. for ORDER BY DESC, read and compare in reverse order
5.1 segment iterator read backward using a new BackwardBitmapRangeIterator and
    reverse the result block before return to caller.
5.2 VCollectIterator::LevelIteratorComparator, VMergeIteratorContext return
    opposite result for _is_reverse order in its compare function.

Co-authored-by: jackwener <jakevingoo@gmail.com>
2022-08-09 09:08:44 +08:00
b44c47fc10 [fix] (remote storage) fix bug for storage policy (#11597) 2022-08-09 09:05:48 +08:00
b9f7f63c81 [Fix](planner) Fix wrong planner with count(*) optmizer for cross join optimization (#11569) 2022-08-09 09:01:25 +08:00
7c950c7cd5 [feature](Nereids) support cross join in Nereids (#11502)
support cross join in Nereids

1. add PhysicalNestedLoopJoin
2. Translate PhysicalNestedLoopJoin to CrossJoinNode in PhysicalPlanTranslator
2022-08-08 22:14:27 +08:00
1701ffa7c0 [fix](planner)push constant expr in predicate to outer join's other conjuncts by mistake (#11527)
constant expr in predicate should not be pushed to outer join's other conjuncts
2022-08-08 20:56:08 +08:00
4f60b37402 [feature](Nereids):refactor and add outer join LAsscom. (#11531)
refactor and add outer join LAsscom.
Extract the common function to LAsscomHelper.
2022-08-08 20:08:12 +08:00
647b6e843a [feature](nereids)add InPredicate in expressions (#11264)
1. Add InPredicate expression parser and translator
2. Add regression-test for In predicate (in nereids_syntax)
3. Support NOT EqualTo and NOT InPredicate in ExpressionTranslator#visitNot()
2022-08-08 19:59:54 +08:00
25a6be850d [doc](fix)Export doc fix (#11584)
export doc fix
2022-08-08 19:27:57 +08:00
ed7f7dead9 [Refactor](push-down predicate) Derive push-down predicate from vconjuncts (#11468)
* [Refactor](push-down predicate) Derive push-down predicate from vconjuncts
2022-08-08 19:19:26 +08:00
0a5fd99d02 [feature-wip](unique-key-merge-on-write) speed up publish_txn (#11557)
In our original design, we calculate the delete bitmap in publish_txn. This operation
costs too much time because it loads segment data and looks up row keys in previous
rowsets and segments. Publish version tasks must also run in order, so this leads
to timeouts in publish_txn.

In this PR, we separate the delete_bitmap calculation into two parts. One part is
done when flushing the mem table, so this work can run in parallel. We then calculate the final
delete_bitmap in publish_txn: get the set of rowset_ids that should be included and
remove rowsets that have been compacted. The rowset difference between memtable_flush
and publish_txn is really small, so publish_txn becomes very fast. In our test,
publish_txn costs about 10ms.
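
The two-phase split can be pictured with a toy Python model. All names here are hypothetical, and rowsets are reduced to lists of row keys; this is an illustrative sketch under those assumptions, not the actual BE data structures.

```python
def rows_to_delete(new_keys, rowsets, rowset_ids):
    """Mark (rowset_id, row_index) for rows whose key is overwritten by new_keys."""
    marks = set()
    for rid in rowset_ids:
        for idx, key in enumerate(rowsets[rid]):
            if key in new_keys:
                marks.add((rid, idx))
    return marks


def flush_phase(new_keys, rowsets):
    """Expensive part, done at memtable flush and safe to run in parallel."""
    seen = set(rowsets)
    return seen, rows_to_delete(new_keys, rowsets, seen)


def publish_phase(new_keys, rowsets, seen, bitmap):
    """Cheap part, done in publish_txn: only handle the rowset difference."""
    current = set(rowsets)
    # Drop marks that point at rowsets removed by compaction.
    bitmap = {(rid, idx) for rid, idx in bitmap if rid in current}
    # Scan only the rowsets that appeared after the flush.
    return bitmap | rows_to_delete(new_keys, rowsets, current - seen)


rowsets = {"r1": ["a", "b"], "r2": ["c", "d"]}
new_keys = {"b", "c", "e"}
seen, bitmap = flush_phase(new_keys, rowsets)
del rowsets["r1"]            # r1 disappears between flush and publish
rowsets["r4"] = ["e", "f"]   # r4 appears between flush and publish
print(sorted(publish_phase(new_keys, rowsets, seen, bitmap)))
```

In this model the publish step touches only `r4`, which is why it stays cheap when few rowsets change between flush and publish.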

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-08 18:57:55 +08:00
c1c635e944 [Refactor](Nereids) Fix expression constant and improve SlotExtractor (#11513)
1. Fix expression constant and add unit test.
2. Improve logic in SlotExtractor and remove useless class IterationVisitor.
2022-08-08 17:36:21 +08:00
9349746987 [Fix](stream-load-json) fix VJsonReader::_write_data_to_column invalid column type cast when meet null (#11564)
column_ptr will be a non-nullable column pointer after `column_ptr = &nullable_column->get_nested_column()`,
so we should not cast column_ptr to ColumnNullable any more
2022-08-08 15:57:39 +08:00
6c065d3d59 [script](start_fe) support "--version" to show fe build info (#11563) 2022-08-08 15:55:01 +08:00
87f56914e9 [Improvement](debug message) add necessary info to DCHECK message (#11586) 2022-08-08 15:54:09 +08:00
411254c128 [Enhancement](hdfs) Support loading hdfs config from hdfs-site.xml (#11571) 2022-08-08 14:18:28 +08:00
37d1180cca [feature-wip](parquet-reader)decode parquet data (#11536) 2022-08-08 12:44:06 +08:00
2cd3bf80dc [bugfix](schema change)fix core dump on vectorized_alter_table (#11538) 2022-08-08 10:45:28 +08:00
b93860902f [doc](tablet-health) modify content about tablet state (#11086)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-08-08 10:43:13 +08:00
1e6a3610a7 [feature-wip](unique-key-merge-on-write) optimize rowid conversion and add ut (#11541) 2022-08-08 10:41:44 +08:00
e8a344b683 [feature-wip](parquet-reader) add predicate filter and column reader (#11488) 2022-08-08 10:21:24 +08:00
40b50400b2 [fix](doc) remove docs for direct compiling on Centos (#11575)
I tried to compile doris on Centos directly according to docs, however
it does not work. It is very difficult to find tools needed by doris
compilation on Centos.
2022-08-08 09:56:47 +08:00
6ea3465264 [improvement](doc)Description of bitmap type query result is null (#11506)
Description of bitmap type query result is null
2022-08-08 09:51:45 +08:00
4f5db35990 [fix](date) fix the value may be changed during the parsing of date and datetime types (#11573)
* [fix](date) fix the value may be changed during the parsing of date and datetime types
2022-08-08 08:58:30 +08:00
8802a41918 fix profile may cause query slow (#11386)
Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-08-07 20:52:52 +08:00
8b9d299472 [improvement](thirdparty) Build re2 with release mode (#11578) 2022-08-07 20:50:07 +08:00
7deebf7086 [doc](asf) update .asf.ymal to stop sending notification to dev@doris (#11574) 2022-08-07 20:31:24 +08:00
bd4048f8fb [enhancement](compaction) add idle schedule and max_size limit for base compaction (#11542)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-07 16:21:57 +08:00
ee4d9d4347 [improvement](test) group some cases and group a case to p0 if it is not grouped (#11548) 2022-08-06 15:12:08 +08:00
683a1261c6 [Enhancement](vectorized) Runtime Filter support equivalent slot of outer join (#11530) 2022-08-06 08:10:28 +08:00
57b7a416d2 [chore](build) add apache snapshot maven repo to repositories (#11549) 2022-08-06 07:15:28 +08:00
3070318f95 [Enhancement](IdGenerator) Use IdGeneratorBuffer to get better performance for creating tablet in fe when do alter table job (#11524)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-08-05 23:27:29 +08:00
574332bd6c [fix](nereids) revert tpch regession test (#11551)
The Nereids tpch regression test is faulty, so roll it back first and add a more stable test later.
2022-08-05 15:55:59 +08:00
52290fed90 [tools](tpch)update queries for better performance (#11523) 2022-08-05 14:04:26 +08:00