Commit Graph

2533 Commits

Author SHA1 Message Date
f944bf4d44 [Compile][Bug] Fix FE compilation bug (#4979)
[Bug] Fix the compile failure "cannot find symbol" for variable scanRangeLength, introduced by #4914 and #4912.
2020-11-28 16:19:54 +08:00
4c63dc0027 [Metric] Add metrics for compaction permits and log for compaction merge (#4893)
1. Add metrics for `used permits` and `waiting permits` for compaction.
It is useful to monitor the `permits` held by all executing compaction tasks and by waiting compaction tasks.

2. Add a log, enabled by config, for merging rowsets.
It helps track the progress of rowset merging for compaction tasks that run for a long time.
2020-11-28 10:00:08 +08:00
f1248cb10e [BUG] Fix colocate balance bug when there is decommissioned be (#4955)
We should ignore decommissioned BEs when selecting BEs to balance the group bucketSeq.
2020-11-28 09:59:25 +08:00
2e9c8dda04 [Doris On ES][Bug-Fix] fix problem for selecting random be (#4972)
1. `Random().nextInt()` may return a negative value, which would result in `java.lang.ArrayIndexOutOfBoundsException`.
Passing a positive bound avoids this problem.

```
int seed = new Random().nextInt(Short.MAX_VALUE) % nodesInfo.size();
```

2. `EsNodeInfo[] nodeInfos = (EsNodeInfo[]) nodesInfo.values().toArray()` may lead to `java.lang.ClassCastException` in some JDK versions: `[Ljava.lang.Object; cannot be cast to [Lorg.apache.doris.external.elasticsearch.EsNodeInfo`. Passing an array of the original class type resolves this.

```
EsNodeInfo[] nodeInfos = nodesInfo.values().toArray(new EsNodeInfo[0]);
```
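
Both fixes can be exercised in isolation. The sketch below is illustrative only: `NodeInfo` and `RandomNodePicker` are hypothetical stand-ins rather than Doris classes, and it bounds `nextInt` by the collection size directly instead of using the `Short.MAX_VALUE` modulo form from the patch.

```java
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for org.apache.doris.external.elasticsearch.EsNodeInfo.
class NodeInfo {
    final String host;

    NodeInfo(String host) {
        this.host = host;
    }
}

// Hypothetical helper, not part of Doris.
class RandomNodePicker {
    // nextInt(bound) returns a value in [0, bound), so the index is never negative.
    static int pickIndex(Random random, int size) {
        return random.nextInt(size);
    }

    // Passing a typed array makes toArray return NodeInfo[] rather than Object[],
    // so no cast (and no ClassCastException) is involved.
    static NodeInfo[] toTypedArray(Map<String, NodeInfo> nodes) {
        return nodes.values().toArray(new NodeInfo[0]);
    }
}
```

The unbounded `nextInt()` can return any `int`, including negative ones, which is why a modulo on its result can produce a negative index.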
2020-11-28 09:57:44 +08:00
2331ce10f1 [Bug]Parquet map/list/struct structure recognize (#4968)
When a parquet file contains a `Map/List/Struct` structure, Doris cannot recognize the column correctly
and throws the exception 'Invalid column: xxxx', which means Doris cannot find the column.
The `Map` structure will be recognized as two columns: `key and value`.
The following is the schema of a parquet file recognized by Doris. This patch tries to solve this problem.
2020-11-28 09:56:29 +08:00
cb749ce51d [Improvement] Add parquet file name to the error message (#4954)
When a user tries to load parquet files into Doris from a path such as `hdfs://hadoop/user/data/date=20201024/*`,
but the path actually contains some non-parquet files, the following error is thrown:
`Couldn't deserialize thrift: No more data to read.\\nDeserializing page header failed.`.
If the error message includes the file name, we can quickly locate the error.
Therefore, this patch adds the file name to the error message.
2020-11-28 09:54:18 +08:00
c6bc30e375 [Bug] Fix httpv2 append extra useless information in get_small_file api (#4953) 2020-11-28 09:52:52 +08:00
55ce88da34 [Schema change] Support More column type in schema change (#4938)
1. Support modifying column type CHAR to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DATE,
and converting TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE to a wider range of numeric types (#4937)

2. Use templates to refactor the code of types.h and schema_change.cpp and remove redundant code.
2020-11-28 09:52:28 +08:00
3b56b601fb Show fe commit hash on proc (#4943)
Show FE's commit hash in SHOW PROC "/frontends" result.
2020-11-28 09:50:48 +08:00
0493eb172f [Optimize] optimize host selection strategy (#4914)
When a tablet selects which replica's host executes the scan operation,
it uses a `round-robin` strategy for load balancing. `minAssignedBytes` is the current load of one host.
If a backend is momentarily not alive, one of the other replicas is chosen at random,
but the dead backend's `minAssignedBytes` is not decreased and the new choice's `minAssignedBytes`
is not increased. As a result, the recorded load of the backends is incorrect.
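
One way to keep the per-host accounting consistent is to charge the scan bytes only to the host that ends up running the scan. The following is a minimal sketch under that assumption, not the actual FE code: `ScanHostSelector`, its methods, and the least-loaded choice among alive replicas are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical selector, not the Doris implementation.
class ScanHostSelector {
    // Bytes currently assigned to each host.
    private final Map<String, Long> assignedBytes = new HashMap<>();

    // Picks the least-loaded alive replica and charges the scan to that host only.
    String select(List<String> replicaHosts, Set<String> aliveHosts, long scanBytes) {
        String best = null;
        long bestLoad = Long.MAX_VALUE;
        for (String host : replicaHosts) {
            if (!aliveHosts.contains(host)) {
                continue; // a dead host is never charged
            }
            long load = assignedBytes.getOrDefault(host, 0L);
            if (load < bestLoad) {
                best = host;
                bestLoad = load;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no alive replica");
        }
        assignedBytes.merge(best, scanBytes, Long::sum);
        return best;
    }

    long load(String host) {
        return assignedBytes.getOrDefault(host, 0L);
    }
}
```

Because the counter is updated after the final choice is made, a dead backend keeps its previous load and the replacement host is charged instead.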
2020-11-28 09:48:13 +08:00
68db176013 [Refactor]Modify code write error (#4950)
* fix typo in udf: replace function

Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2020-11-27 12:16:45 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
2682712349 [Bug] Fix be ut compile failed and core in delta_writer_test when ulimit < 60000. (#4941) 2020-11-24 22:21:19 +08:00
b7b1d5eb38 [Refactor] Short circuit return to avoid meaningless loop (#4933) 2020-11-24 13:46:50 +08:00
37a6731244 [BUG] Fix Colocate table balance bug (#4936)
Fix bug that colocation group is always in unstable status.
2020-11-22 21:22:44 +08:00
584b33f95b [Bug] Fix the bug of NULL do not show in CTE statement. (#4932)
All columns created in an inlineView are set to `allowNull = false`, which causes `NULL` data in a CTE to be ignored during processing.
So we should set the columns in an inlineView to allow NULL to ensure the query result is correct.
2020-11-22 20:58:03 +08:00
8e9bbfb3ba [Script] Check and create if the log directory not existed before outputing message to the log file. (#4929)
This is a minor issue when starting FE after a fresh installation:
an error about the missing log directory occurs because the log directory does not exist yet
when some environment check messages are written to the log file.
The log directory creation code in bin/start_fe.sh is in the wrong place;
it only needs to be moved to the beginning.
2020-11-22 20:52:32 +08:00
c28769c512 [Bug] Avoid partition prune if predicate is not with SlotRef (#4833) (#4921) 2020-11-22 20:49:20 +08:00
4f7c6da1f5 [Refactor] Refactor function getScanRangeLength (#4912)
getScanRangeLength always returns 1, so there is no need to maintain a function like this.
2020-11-22 20:44:11 +08:00
fb7f4c8791 [Bug] fix bug that be thrift client cannot connect to fe thrift server when fe thrift server use TThreadedSelectorServer model (#4908)
Fix bug that be thrift client cannot connect to fe thrift server when fe thrift server use TThreadedSelectorServer model
2020-11-22 20:40:33 +08:00
f1b57c4418 [Optimize] Avoid repeated sending of common components in Fragments (#4904)
This CL mainly changes:

1. Avoid repeated sending of common components in Fragments

    In the previous implementation, a query may generate multiple Fragments,
and these Fragments contain some common information, such as DescriptorTable.
Fragments are sent to BE in a certain order, so this common information is sent repeatedly
and generated repeatedly on the BE side.

    In some complex SQL, this common information may be very large,
thereby increasing the execution time of the Fragments.

    So I improved this. For multiple Fragments sent to the same BE, only the first Fragment carries
this common information, which is cached on the BE side, and subsequent Fragments
no longer need to carry it.

    In the local test, the execution time of some complex SQL can be reduced from 3 seconds to 1 second.

2. Add the time-consuming part of FE logic in Profile

    Including SQL analysis, planning, Fragment scheduling and sending on the FE side, and the time to fetch data.
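
The "first Fragment carries the common information" rule from item 1 can be sketched on the sending side as below. This is only an illustration under stated assumptions: `CommonInfoSender` and `shouldAttachCommonInfo` are hypothetical names, backends are identified by plain strings, and the BE-side cache is not modeled.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-query helper, not the Doris Coordinator.
class CommonInfoSender {
    // Backends that have already received the common information for this query.
    private final Set<String> backendsWithCommonInfo = new HashSet<>();

    // Returns true only for the first Fragment sent to a given backend;
    // later Fragments for the same backend can omit the common part.
    boolean shouldAttachCommonInfo(String backend) {
        return backendsWithCommonInfo.add(backend);
    }
}
```

`Set.add` returns `true` only when the element was not already present, which gives the first-send check and the bookkeeping in a single call.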
2020-11-22 20:38:05 +08:00
e507fcc3b3 [Enhancement] Improve list comparing performance (#4880)
The function equalSets is currently not efficient enough: its time complexity is O(n^2).
To improve the performance of comparing two lists, this patch uses a hash map structure
to reduce the time complexity to O(n).
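
A hash-map based comparison can be sketched as follows. This is a minimal illustration, not the code from the patch: `ListCompare` is a hypothetical class, and the sketch assumes that duplicates are significant (each element must occur the same number of times in both lists) and that elements implement `equals` and `hashCode` consistently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the Doris implementation of equalSets.
class ListCompare {
    // Expected O(n): one pass to count the elements of l1, one pass to consume them with l2.
    static <T> boolean equalSets(List<T> l1, List<T> l2) {
        if (l1.size() != l2.size()) {
            return false;
        }
        Map<T, Integer> counts = new HashMap<>();
        for (T item : l1) {
            counts.merge(item, 1, Integer::sum);
        }
        for (T item : l2) {
            Integer count = counts.get(item);
            if (count == null) {
                return false; // item missing from l1, or already used up
            }
            if (count == 1) {
                counts.remove(item);
            } else {
                counts.put(item, count - 1);
            }
        }
        return counts.isEmpty();
    }
}
```

A nested-loop comparison checks every element of one list against the other, which is where the O(n^2) cost comes from; the map replaces the inner loop with a constant-time lookup on average.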
2020-11-22 20:35:12 +08:00
f445ed5b8a Disable the creation of segment v1 table (#4913) 2020-11-20 09:51:14 +08:00
234e9b532f [Doc] Fixed example content in bitmap_union.md (#4919) 2020-11-20 09:49:31 +08:00
64b219f04d Fix typo (#4923) 2020-11-20 09:48:27 +08:00
d1a7f1d2c6 Fix column_reader_writer_test UT (#4924) 2020-11-20 09:47:01 +08:00
0eda5270c8 [Docs] Add doc of be_config.md and change some default value of BE config (#4906)
Add doc of be_config.md and change some default value of BE config
2020-11-18 21:56:58 +08:00
6101155679 [CodeStyle]Replace tab with spaces (#4909) 2020-11-18 21:56:07 +08:00
ec9da30c9c [New Feature]Support udf when loading data (#4863)
Users often want to apply the UDFs they have developed to transform the data
when loading it into Doris.
But currently, broker load does not support using UDFs.
As a UDF belongs to a database, the load needs to check whether the user has the SELECT permission on that database.
This patch tries to solve this problem.
2020-11-18 21:51:59 +08:00
6247408689 [Compact]Take tablet scan frequency into consider when selecting tablet for compaction (#4837)
A large number of small segment files will lead to low efficiency for scan operations.
Multiple small files can be merged into a large file by compaction operation.
So we can take the tablet scan frequency into consideration when selecting a tablet for compaction
and preferentially compact the tablets that have been scanned frequently during
the most recent period.

Using the compaction strategy of Kudu for reference, the scan frequency of a tablet can be calculated
over the most recent period and taken into consideration when calculating the compaction score.
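
The idea can be illustrated with a weighted score. The sketch below is an assumption about how the two inputs might be combined, not the formula used by the patch: `CompactionScore`, both methods, and the two weights are hypothetical.

```java
// Hypothetical scoring helper, not the Doris BE implementation.
class CompactionScore {
    // Scans per second observed since the last check.
    static double scanFrequency(long scanCountNow, long scanCountAtLastCheck, long elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            return 0.0;
        }
        return (double) (scanCountNow - scanCountAtLastCheck) / elapsedSeconds;
    }

    // Weighted sum (assumed form): frequently scanned tablets rank higher
    // among tablets with the same base compaction score.
    static double tabletScore(double compactionScore, double scanFrequency,
                              double compactionWeight, double scanWeight) {
        return compactionWeight * compactionScore + scanWeight * scanFrequency;
    }
}
```

With equal base scores, the tablet that was scanned more often in the last period gets the higher total and is compacted first.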
2020-11-18 21:51:12 +08:00
dcca3bbe5b Avoid duplicate column when adding slot in empty tuple (#4901)
Fixed #4900  
When the supplementary column already exists in the tuple, this column is directly materialized instead of adding a new slot.
2020-11-17 15:52:36 +08:00
bba85fc352 Update routine-load-manual.md (#4911)
add key word for routine load
2020-11-17 10:21:53 +08:00
b48c768dc7 [ComplexType] Restructure storage type to support complex types expending (#4905)
This CL includes:
* Change the column metadata to a tree structure.
* Refactor the segment_v2.ColumnReader and segment_v2.ColumnWriter to support complex types.
* Implement the reading and writing of the array type.
2020-11-16 21:59:41 +08:00
448df42fb0 [Compatibility] Add table_privileges, schema_privileges and user_privileges tables(#4899)
Add privileges tables in information_schema database
2020-11-16 21:58:30 +08:00
55080ba888 [BUG] Fix colocate join memory limit problem (#4894)
In colocate join, the memory limit of each instance is usually less than the value of exec_mem_limit,
which could lead to query failure (Memory exceed limit).
Since the purpose of resetting colocate-join memory limit
(/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java) is unclear to me,
I just changed the default value of query_colocate_join_memory_limit_penalty_factor from 8 to 1 as a hotfix.
2020-11-16 21:57:00 +08:00
c5e435146d [Refactor] Remove break label for readability (#4890)
Co-authored-by: tanhao <tanhao.0902@bytedance.com>
2020-11-16 21:56:10 +08:00
5aefd701cb [Improve]modify isDecommissioned be capacity calculate rule (#4889)
I use a containerized deployment of BE nodes, both using the same distributed disk.
When doing data migration, the current logic leads to errors.
For example, if my distributed disk has 10T in total and other services have used 9T of it,
the current logic assumes that all 9T of data is used by BE nodes.
2020-11-16 21:55:35 +08:00
2af4bc294f [Bug] Java Version BitmapValue deserialized failed when only has 32-bit bitmap (#4884) 2020-11-16 21:54:07 +08:00
e706a6bca4 [Doc] Running Profile document add HASH_JOIN_NODE, etc. (#4878)
- Add `HASH_JOIN_NODE`, `CROSS_JOIN_NODE`, `UNION_NODE`, `ANALYTIC_EVAL_NODE` to the Running Profile document.
- Add the `MaterializeExprsEvaluateTime` profile to `UNION_NODE`.
2020-11-16 21:53:25 +08:00
18a22bd347 [BUG] Fix field error in information_schema.columns (#4858) 2020-11-15 22:01:32 +08:00
aca9b2da82 [Bug] Fix bug introduced by split RowsDelFiltered profile (#4881)
The bug introduced by PR #4825 causes `schema_change` to report an error:
```
schema_change.cpp:1271] fail to check row num! source_rows=1, merged_rows=0, filtered_rows=0, new_index_rows=0
schema_change.cpp:1921] failed to process the version. version=2-2
schema_change.cpp:1615] failed to alter tablet. base_tablet=44643.1383650721.b140317f6662c1e0-65bcbc87db8d22bc, drop new_tablet=45680.1530531459.474e41f3dd538fb6-9284085daac24f83
```
2020-11-13 16:16:10 +08:00
69c422e31e [Bug] Fix bug #4886 and #4586 by refactoring code of method 'getDbs' (#4887)
fix issue #4886
2020-11-13 11:55:10 +08:00
e9923100f2 [Profile][UT] Fix UT and remove useless profile (#4879)
Fix the UT failure caused by #4825 and remove useless profile
2020-11-12 16:28:57 +08:00
97867364e7 Revert "[FEATURE]Check date type to avoid scan all partitions (#4756)" (#4877)
This reverts commit c8df76a807b4856f71bcb6a3a023849f3bf294d7.

This commit has a problem when handling predicates like:
`k1 = "2020-10-10 10:00:00.000"`

This is a valid predicate, but FE Datetime does not support milliseconds or microseconds, so it treats the value as an invalid datetime.

So we revert it, and may find a better solution later.
2020-11-12 13:52:10 +08:00
796f44beac [Bug] Fix bug that routine load blocked with TOO_MANY_TASKS error (#4861)
When receiving an empty message from Kafka, the load process quits abnormally.
Fix #4860
2020-11-12 10:05:10 +08:00
1810f10497 [Bug] Fix bug that failed to create view with complex select stmt (#4840)
Fix bug that failed to create view with complex select stmt.
Fix #4839
2020-11-12 10:04:00 +08:00
a1ae399737 [Refactor] Refactor storage medium migration task process (#4475)
This CL refactors the storage medium migration task process in BE.
I did not modify the execution logic. Just extract part of the logic
in the migration task and put it in task_work_pool.

In this way, the migration task is only used to process the migration
from the specified tablet to the specified data dir.

Later, we can use this task to migrate tablets between different disks. #4476
2020-11-12 10:00:43 +08:00
dd70653c91 [DOCS] Fix some docs typo (#4873) 2020-11-11 21:24:19 +08:00
1151a0063c [Bug] Make 'LastStartTime' in backends list as the actual BE start time (#4872)
We use 'LastStartTime' in the backends list to check whether a BE has restarted
unexpectedly, but it is reset to the BE's first heartbeat time after FE
restarts. It would be better to set it to the BE's actual start time.
2020-11-11 21:24:06 +08:00
4ccd7b84ad [Bug] Rename table logic error (#4870)
1. The rename table operation fails to drop the table with the old name in Catalog.
2. The rename table operation forgets to check rollup names.
2020-11-11 21:22:10 +08:00