Commit Graph

2157 Commits

Author SHA1 Message Date
abeb25d2a9 Fix large int literal (#4168) 2020-07-30 00:53:50 +08:00
0e79f6908b [CodeRefactor] Modify FE modules (#4146)
This CL mainly changes:

1. Add 2 new FE modules

    1. fe-common

        Holds all common classes shared by other modules; currently only `jmockit`.
        
    2. spark-dpp

        The Spark DPP application for Spark Load. All DPP-related classes, including unit tests, have been moved into this module.
        
2. Change the `build.sh`

    Add a new param `--spark-dpp` to compile the `spark-dpp` module alone. `--fe` compiles all FE modules.

    The output of the `spark-dpp` module is `spark-dpp-1.0.0-jar-with-dependencies.jar`, which is installed to `output/fe/spark-dpp/`.

3. Fix some bugs in Spark Load
2020-07-29 16:18:05 +08:00
1b3af783e6 [Plugin] Add properties grammar in InstallPluginStmt (#4173)
This PR adds support for the following grammar: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)]
Users can set md5sum="xxxxxxx", so a separate MD5 URI no longer has to be provided.
2020-07-29 15:02:31 +08:00
83a751497e [Bug][Socket Leak] Fix bug that Mysql NIO server is leaking sockets (#4192)
When using mysql nio server, if the mysql handshake protocol fails,
we need to actively close the channel to prevent socket leakage.
2020-07-29 15:01:27 +08:00
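The fix described in #4192 follows a common pattern: when the handshake does not complete, the server closes the channel itself. A minimal Java sketch of that pattern follows. `HandshakeGuard`, `Handshake`, and `accept` are hypothetical names chosen for illustration; they are not taken from the Doris code base.

```java
import java.io.Closeable;
import java.io.IOException;

final class HandshakeGuard {
    /** Hypothetical stand-in for the MySQL handshake exchange. */
    interface Handshake {
        boolean negotiate() throws IOException;
    }

    /**
     * Runs the handshake and closes the channel on any failure,
     * so a failed connection attempt cannot leave the socket open.
     */
    static boolean accept(Closeable channel, Handshake handshake) {
        boolean ok = false;
        try {
            ok = handshake.negotiate();
        } catch (IOException e) {
            // An I/O error during the handshake counts as a failed handshake.
        } finally {
            if (!ok) {
                closeQuietly(channel);
            }
        }
        return ok;
    }

    private static void closeQuietly(Closeable channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // Nothing more can be done for a channel that fails to close.
        }
    }
}
```

Putting the close in a `finally` block means that an unexpected runtime exception from the handshake also releases the socket.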
59676a1117 [BUG] fix 4149, add sessionVariable to choose broadcastjoin first when cardinality cannot be estimated (#4150) 2020-07-29 12:28:52 +08:00
79b4f92cb7 Rewrite GroupByClause.oriGroupingExprs (#4197) 2020-07-29 12:27:15 +08:00
f292d80266 [Bug][SchemaChange]Fix alter schema add key column bug in agg model (#4143)
Fix the bug in adding a key column to an aggregate-model table through a schema change when the LinkedSchemaChange policy is used.
See #4142 for a detailed description.
2020-07-28 20:53:04 +08:00
841f9cd07b [Bug][SparkLoad] Divide the upload in spark repository into two steps (#4195)
When the FE uploads the Spark archive, the broker may fail to write the file,
resulting in a bad file being uploaded to the repository.

Therefore, in order to prevent spark from reading bad files, we need to
divide the upload into two steps.
The first step is to upload the file, and the second step is to rename the file with MD5 value.
2020-07-28 16:24:07 +08:00
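The two-step upload in #4195 can be sketched in Java as follows. This is only an illustration under stated assumptions: the local filesystem stands in for the broker and HDFS, the `__tmp_` prefix and the class and method names are hypothetical, the verification between the two steps is an added assumption, and the `__lib_<md5>_<name>` pattern is borrowed from the repository layout shown in #4163.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class TwoStepUpload {
    /** Hex-encoded MD5 of a file's contents. */
    static String md5Hex(Path file) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(Files.readAllBytes(file));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static Path upload(Path local, Path repoDir, String name) {
        try {
            // Step 1: upload under a temporary name (hypothetical prefix)
            // that readers of the repository never look for.
            Path tmp = repoDir.resolve("__tmp_" + name);
            Files.copy(local, tmp, StandardCopyOption.REPLACE_EXISTING);

            // Assumption: verify the copy before publishing it.
            String md5 = md5Hex(local);
            if (!md5.equals(md5Hex(tmp))) {
                Files.delete(tmp);
                throw new IOException("uploaded file is corrupt: " + tmp);
            }

            // Step 2: rename so that the MD5 becomes part of the final name.
            Path published = repoDir.resolve("__lib_" + md5 + "_" + name);
            return Files.move(tmp, published, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Uploads a small sample file and returns the published file name. */
    static String demo() {
        try {
            Path repo = Files.createTempDirectory("repo");
            Path src = Files.createTempFile("spark-dpp", ".jar");
            Files.write(src, "hello".getBytes(StandardCharsets.UTF_8));
            return upload(src, repo, "spark-dpp.jar").getFileName().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because a file only receives its final name after step 1 has finished, a reader that looks for MD5-named files never sees a partially written upload.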
150f8e0e2b Support check committed txns before catalog drop meta, like db, table, partition etc (#4029)
This PR ensures that a dropped db, table, or partition is in a normal state after being recovered by the user. Committed txns cannot be aborted, because the partitions' committed versions have already changed and some tablets may already have new visible versions. If the user simply no longer wants the meta (db, table, or partition), use drop force instead of drop to skip the committed txn check.
2020-07-28 15:18:52 +08:00
90eaa514ba [SQL][JCUP]Reduce conflict of sql_parser.cup (#4177)
Fix the Shift/Reduce conflict in cup file #4176
2020-07-28 10:04:50 +08:00
a2b53b8ddd [Profile] Add transfer destinations detail to profile (#4161)
Add transfer destinations detail to profile
2020-07-27 23:37:50 +08:00
50e6a2c8a0 [SQL][Function] Fix from/to_base64 may return incorrect value (#4183)
from/to_base64 may return an incorrect value when the input is null (#4130).
Remove the duplicated base64 code.
Fix the wrong length of the base64-encoded string, which could cause a memory error.
2020-07-27 22:55:05 +08:00
9e5ca697f3 [Doc] Fix typo for stream load content in basic-usage.md (#4185) 2020-07-27 16:50:15 +08:00
94ac0f43dc Use LongAdder or volatile long to replace AtomicLong in some scenarios (#4131)
This PR is to use LongAdder or volatile long to replace AtomicLong in some scenarios.
In statistical summation scenarios, LongAdder (introduced in JDK 1.8) performs better than AtomicLong under highly concurrent updates. If only the get and set operations on a variable need to be atomic, declaring the variable volatile is enough; AtomicLong is heavier than necessary.
NOTE: LongAdder is usually preferable to AtomicLong when multiple threads update a common sum that is used for purposes such as collecting statistics, not for fine-grained synchronization control, such as auto-incremental ids.
2020-07-27 15:48:35 +08:00
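The three choices discussed in #4131 can be put side by side in a short Java sketch. The class, field, and method names below are hypothetical and only illustrate the trade-off; they are not the fields changed by the PR.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

final class CounterChoices {
    // Statistics that many threads add to: LongAdder spreads contention.
    static final LongAdder totalRows = new LongAdder();

    // A value that is only read and overwritten: volatile makes each
    // get and set atomic and visible, but "lastUpdateMs++" would NOT be atomic.
    static volatile long lastUpdateMs;

    // Auto-incremental ids need an atomic read-modify-write: keep AtomicLong.
    static final AtomicLong nextId = new AtomicLong();

    /** Increments a LongAdder from several threads and returns the final sum. */
    static long countConcurrently(int threads, int incrementsPerThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    adder.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // sum() is exact here because every writer has already finished.
        return adder.sum();
    }
}
```

While updates are still in flight, `LongAdder.sum()` is not an atomic snapshot. This is why it suits statistics and not id generation.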
f2c9e1e534 [Spark Load]Create spark load's repository in HDFS for dependencies (#4163)
### Summary
When users use Spark Load, they have to upload the dependent jars to HDFS every time.
This CL adds a self-generated repository under the working_dir folder in HDFS for saving the dependencies of the Spark DPP program and the Spark platform.
Note that the dependencies uploaded to the repository include:
1. `spark-dpp.jar`
2. `spark2x.zip`

Item 1 is the DPP library built from the spark-dpp submodule. See PR #4146 for details about the spark-dpp submodule.
Item 2 is the Spark 2.x.x platform library, which contains all jars in $SPARK_HOME/jars.

**The repository structure** will be like this:

```
__spark_repository__/
    |-__archive_1_0_0/
    |        |-__lib_990325d2c0d1d5e45bf675e54e44fb16_spark-dpp.jar
    |        |-__lib_7670c29daf535efe3c9b923f778f61fc_spark-2x.zip
    |-__archive_2_2_0/
    |        |-__lib_64d5696f99c379af2bee28c1c84271d5_spark-dpp.jar
    |        |-__lib_1bbb74bb6b264a270bc7fca3e964160f_spark-2x.zip
    |-__archive_3_2_0/
    |        |-...
```

The following conditions will force the FE to upload dependencies:
1. The FE finds that its dppVersion is absent from the repository.
2. The MD5 value of the remote file does not match that of the local file.

Before the FE uploads the dependencies, it creates an archive directory named `__archive_{dppVersion}` under the repository.
2020-07-27 01:48:41 +00:00
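The naming scheme and the two upload conditions from #4163 can be summarized in a small Java sketch. `SparkRepoNaming` and its methods are hypothetical helpers, and the sketch assumes that dppVersion is already in the underscore form seen in the layout (for example `1_0_0`).

```java
import java.util.Set;

final class SparkRepoNaming {
    /** Archive directory name, e.g. "__archive_1_0_0" for dppVersion "1_0_0". */
    static String archiveDir(String dppVersion) {
        return "__archive_" + dppVersion;
    }

    /** Library file name with the MD5 embedded, as in the layout above. */
    static String libName(String md5, String fileName) {
        return "__lib_" + md5 + "_" + fileName;
    }

    /**
     * Upload is needed when the archive for this dppVersion is absent,
     * or when the remote MD5 differs from the local one.
     * A null remoteMd5 (no remote file) also triggers an upload.
     */
    static boolean needsUpload(Set<String> remoteArchives, String dppVersion,
                               String remoteMd5, String localMd5) {
        if (!remoteArchives.contains(archiveDir(dppVersion))) {
            return true;
        }
        return !localMd5.equals(remoteMd5);
    }
}
```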
ed8cb6a002 [Feature][Meta]Update/Read/Write VisibleVersionTime for Partition#4076 (#4086)
#4076 
1. The visibleVersionTime is updated when data is inserted into a partition.
2. GlobalTransactionMgr calls partition.updateVisibleVersionAndVersionHash(version, versionHash) when the FE is restarted.
3. If the FE restarts, VisibleVersionTime may change, but the new value is newer than the old one.
2020-07-26 21:20:55 +08:00
1f7009354a [Bug] Add db read lock when processing unfinished publish task (#4178)
This bug was introduced by PR #4053; a db read lock should be taken
when processing unfinished publish tasks.
2020-07-26 20:15:14 +08:00
911eb04594 [Bug][UpdateDataQuota] Skip update used data quota for information_schema db and fix bug for wrong time interval for UpdateDbUsedDataQuotaDaemon (#4175)
This PR skips updating the used data quota for the information_schema db
and fixes the wrong time interval in UpdateDbUsedDataQuotaDaemon.
2020-07-26 20:14:03 +08:00
4d828d2411 Fix recover database not in "show databases" (#4170) 2020-07-25 10:04:35 +08:00
b32500bda0 [Script] Restore build parallel config (#4166) 2020-07-24 21:30:56 +08:00
4608f9786e Support checking database used data quota when data load job begin a new txn (#3955)
Currently, the database's used data quota is checked only when creating or altering a table, or in some old types of load job, but not for routine load and stream load jobs. This PR provides a uniform solution that checks the db's used data quota when a data load job begins a new txn.
2020-07-24 10:03:43 +08:00
28f4d30542 Optimize the logic of processing unfinishedTask when transaction is publishTimeout (#4053)
This PR optimizes the logic for processing unfinishedTask when a transaction hits publishTimeout: the errorReplica is found by following the
"db -> table -> partition -> index -> tablet(backendId) -> replica" path.
2020-07-24 09:59:01 +08:00
a01d1aec56 [Compaction] track RowsetReader's mem & add metric (#4068)
Ref https://github.com/apache/incubator-doris/issues/3624#issuecomment-655933244
Only RowsetReaders used in compaction are tracked.
Other RowsetReaders are not affected, because their parent_tracker is nullptr.
2020-07-24 07:58:09 +08:00
443b8f100b [Feature][ThreadPool]Add Web Page to display thread's stats (#4110)
This CL mainly includes:
- Add some methods in env to get thread stats from Linux system files.
- Support getting thread stats over HTTP.
- Register a page handler in BE to show thread stats, helping developers
locate thread-related problems.
2020-07-23 21:08:36 +08:00
2334f5d997 Fix some problem related with publish version task (#4089)
This PR mainly does the following three things:
1. Add the thread name to the FE log to make problems easier to trace.
2. Add the agent_task_resend_wait_time_ms config to avoid sending duplicate agent tasks to BE.
3. Skip updating the replica version in FE when the new version is lower than the replica version.
2020-07-23 20:06:02 +08:00
d66609de85 [Code Structure] Move the code file to the right place (#4154)
IsNullPredicateTest.java is not in the right place.
2020-07-23 15:49:52 +08:00
75ebe2b363 [Bug] Compaction row number cannot be matched between input rowsets and output rowsets. (#4139)
A Unique Key table may receive duplicate rows from different loads.
If duplicate rows exist between loads, compaction merges them.
The row statistics should take this merged-row count into account.
Previously the merged count was ignored, so compaction reported an error.
2020-07-23 10:28:56 +08:00
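The row-count invariant behind #4139 can be illustrated with a toy merge in Java. This is a sketch under simplifying assumptions: rows are reduced to their integer keys, and `CompactionRowCheck` is a hypothetical name; the real BE check is written in C++ and may account for more than merged rows.

```java
import java.util.List;
import java.util.TreeSet;

final class CompactionRowCheck {
    /** Merges key-only rowsets and returns {inputRows, outputRows, mergedRows}. */
    static long[] compact(List<int[]> rowsets) {
        TreeSet<Integer> output = new TreeSet<>();
        long inputRows = 0;
        long mergedRows = 0;
        for (int[] rowset : rowsets) {
            for (int key : rowset) {
                inputRows++;
                // A key that is already present is merged into the existing row.
                if (!output.add(key)) {
                    mergedRows++;
                }
            }
        }
        return new long[] {inputRows, output.size(), mergedRows};
    }

    /** The check only balances when merged rows are counted. */
    static boolean rowsMatch(long[] stats) {
        return stats[0] == stats[1] + stats[2];
    }
}
```

For the rowsets `{1, 2, 3}` and `{2, 3, 4}`, six input rows produce four output rows with two merged rows. Comparing input against output alone would report a mismatch.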
e4f5a2936b [TabletRepair] Delete bad replicas when no BE can be used to create new replica
When there is no available BE for relocating replicas, delete the bad replica first.
2020-07-22 22:42:31 +08:00
31a6c43a69 [Bug][Alter] Fix boolean support (#4123)
Fixes #4122
 *  Add a type check when adding a bloom filter index on a boolean column.
 *  Support adding a boolean column.
2020-07-22 22:38:55 +08:00
cc7f04de2c [Log] Add compaction point log record (#4128)
Add a log to record changes to the compaction point.
When OLAP_ERR_BE_SEGMENTS_OVERLAPPING happens,
the log can be used to track down the bug. See issue #4134.
2020-07-22 22:35:49 +08:00
5c4bba107e [Bug] Fix isnull(null) analyze error (#4094) 2020-07-22 20:04:39 +08:00
46c8c250a6 [Bug] fix use-after-poison bug in ut schema_change_test (#4118)
Creating a HyperLogLog from slice->data invokes HyperLogLog(Slice(const char*)), and Slice(const char*) uses strlen(data) to compute the size. The slice in this unit test is not a C string, so the Slice itself must be passed instead.
2020-07-22 09:33:41 +08:00
ad17afef91 [CodeRefactor] #4098 Make FE multi module (#4099)
This PR changes the FE code structure to a Maven multi-module structure.
See issue #4098 for more info, such as how to resolve conflicts.
2020-07-21 12:42:42 +08:00
2de4f2471b [MV] Add framework of mv selector (#4014)
This commit mainly supports creating bitmap_union, hll_union, and count materialized views.
* The main changes are as follows:
1. When creating a materialized view, Doris performs semantic analysis on the newly supported aggregate functions.
Only bitmap_union(to_bitmap(column)), hll_union(hll_hash(column)) and count(column) are supported.

2. Match the correct materialized view when querying.
After the user sends a query, if it could match a materialized view, the query is rewritten first.
    Such as:
    Table: k1 int, k2 int
    MV: k1 int, mv_bitmap_union_k2 bitmap mv_bitmap_union
        mv_bitmap_union = to_bitmap(k2)
    Query: select k1, count(distinct k2) from Table
    Because the materialized view column matches the query column, the query is rewritten as:
    Rewritten query: select k1, bitmap_union_count(mv_bitmap_union_k2) from table

Then, during materialized view selection, the rewritten query can be matched against the materialized view table.
Sometimes the rewritten query does not match any materialized view, which means that the rewrite failed. In that case the query needs to be re-parsed and executed again.
2020-07-20 17:26:40 +08:00
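The rewrite in #4014 rests on the fact that counting the bits of a pre-aggregated bitmap gives the same answer as `count(distinct k2)`. The Java sketch below demonstrates that equivalence with `java.util.BitSet` as a stand-in for the Doris bitmap type. The names are hypothetical, rows are `{k1, k2}` pairs, k2 is assumed to be non-negative, and grouping by k1 is assumed.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class BitmapUnionSketch {
    /** Builds the "materialized view": one bitmap of k2 values per k1. */
    static Map<Integer, BitSet> buildMv(int[][] rows) {
        Map<Integer, BitSet> mv = new HashMap<>();
        for (int[] row : rows) {
            // Analogue of to_bitmap(k2) followed by bitmap_union per k1.
            mv.computeIfAbsent(row[0], k -> new BitSet()).set(row[1]);
        }
        return mv;
    }

    /** Analogue of bitmap_union_count(mv_bitmap_union_k2) for one k1. */
    static int bitmapUnionCount(Map<Integer, BitSet> mv, int k1) {
        BitSet bitmap = mv.get(k1);
        return bitmap == null ? 0 : bitmap.cardinality();
    }

    /** Analogue of count(distinct k2) computed from the base table. */
    static int countDistinct(int[][] rows, int k1) {
        Set<Integer> seen = new HashSet<>();
        for (int[] row : rows) {
            if (row[0] == k1) {
                seen.add(row[1]);
            }
        }
        return seen.size();
    }
}
```

Both paths return the same count for every k1, but the bitmap path reads one pre-aggregated value per k1 and does not scan the base rows.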
fbf7bd6a1d [Bug] Change get load state interface (#4081)
Currently, PathTrie may match the wrong interface, because it cannot distinguish between
/api/{db}/{table} and /api/{db}/{label}
2020-07-20 15:51:27 +08:00
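The ambiguity described in #4081 can be reproduced with a small Java sketch. It uses regular expressions as a stand-in for the trie, so it is not the PathTrie implementation. The class and method names are hypothetical, and the `_state` segment in the usage note is an invented example, not the path chosen by the PR.

```java
import java.util.List;
import java.util.regex.Pattern;

final class TemplateAmbiguity {
    /**
     * Turns a path template into a regex by replacing each {param}
     * with "one path segment". Literal parts are assumed to contain
     * no regex metacharacters.
     */
    static String toRegex(String template) {
        return template.replaceAll("\\{[^/}]+\\}", "[^/]+");
    }

    /** Returns the first registered template that matches the path, or null. */
    static String firstMatch(List<String> templates, String path) {
        for (String template : templates) {
            if (Pattern.matches(toRegex(template), path)) {
                return template;
            }
        }
        return null;
    }
}
```

Both `/api/{db}/{table}` and `/api/{db}/{label}` reduce to the same pattern, so a request for `/api/db1/label1` is routed to whichever template was registered first. Adding a literal segment, for example the hypothetical `/api/{db}/{label}/_state`, makes the two templates distinguishable.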
03cf9b2a24 [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
Related issue #4017, main changes as follows:
1. Add expired_snapshot_rs_version_map and _expired_snapshot_rs_metas
2. Add VersionedRowsetTracker to record compacted path versions
3. Record the path version when rowsets are compacted
4. In the GC process, add expired snapshot rowsets to the unused set for removal.
2020-07-19 22:03:59 +08:00
15d9e10a8b [Bug] Fix bug that tablet meta lock twice (#4112)
* [Bug] Fix bug that tablet meta lock twice

The tablet meta lock may already be held before calling
generate_tablet_meta_copy(), so we need to provide an unlocked
version of generate_tablet_meta_copy()

* fix typo

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2020-07-19 21:27:24 +08:00
bb35de2ccb [Bug][Alter] Cancel the alter job if database has been dropped (#4088)
Cancel the alter job in WAITING_TXN state if database has been dropped
Fix: #4087
2020-07-19 21:26:57 +08:00
8500d8b695 [metrics] Use atomic instead of SpinLock for integer metric (#4036) 2020-07-17 11:01:33 +08:00
de3c4b198e Support materialized view extend column in load and insert (#3677)
This commit mainly supports loading data into bitmap_union, hll_union, and count materialized views.

The main changes are as follows:
1. The insert stmt supports loading extended columns
2. The load stmt supports loading extended columns

Issue : #3344

Co-authored-by: HangyuanLiu <460660956@qq.com>
2020-07-17 10:43:36 +08:00
d07a23ece3 [webserver] Introduce mustache to simplify BE's website render (#4062)
cpp-mustache is a C++ implementation of a Mustache template engine
with support for RapidJSON, and in order to simplify RapidJSON object
building, we introduce class EasyJson from Apache Kudu.
2020-07-16 22:39:51 +08:00
db50c19aad [Thread Resource Leak] Fix thread resource leak after checkpoint catalog destroyed (#4049)
This PR mainly fixes a thread resource leak and adds a note on
using the newDaemonScheduledThreadPool API in ThreadPoolManager.
2020-07-16 22:38:39 +08:00
3a4a38c2fc [Bug] Fix orc decimal (#4097)
The result may be wrong when ORC loads a negative decimal value.

When loading a negative decimal with leading zeros, the result is wrong.
e.g. for -0.0014, the ORC result is -14 (precision ... 0)
2020-07-16 22:36:52 +08:00
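One way a value such as -0.0014 can lose its sign is shown in the Java sketch below. It illustrates the general class of bug and is not the code fixed in #4097. It assumes the decimal arrives as an unscaled integer (-14) with a scale of 4, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;

final class NegativeDecimalSketch {
    /** Naive conversion: the sign is carried only by the integer part. */
    static String naive(long unscaled, int scale) {
        long pow = pow10(scale);
        long intPart = unscaled / pow;             // -14 / 10000 == 0, so the sign is lost
        long fracPart = Math.abs(unscaled % pow);  // 14
        return intPart + "." + String.format("%0" + scale + "d", fracPart);
    }

    /** Keeps the sign with the whole value by using the unscaled form directly. */
    static String correct(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale).toPlainString();
    }

    private static long pow10(int n) {
        long result = 1;
        for (int i = 0; i < n; i++) {
            result *= 10;
        }
        return result;
    }
}
```

With these inputs, `naive(-14, 4)` yields `0.0014` while `correct(-14, 4)` yields `-0.0014`. Values with a non-zero integer part, such as -1.0014, come out right either way, so the bug only shows up for negative decimals whose integer part is zero.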
1aec46b215 [Bug] Do not choose decommissioned BE in colocate balance For #4102 (#4103)
When `disable_colocate_balance` is set to false and some BEs are being decommissioned,
`ColocateTableBalancer#balanceGroup` may choose a decommissioned BE to place tablets on,
which is incorrect.

Fix #4102
2020-07-16 22:34:51 +08:00
a0c19df18c [Website] Redesign the home page of document website (master) (#4069) 2020-07-16 11:36:24 +08:00
5032b7fe7a Support materialized view schema change in bitmap hll and count field [#3739] (#3873)
+ Building the materialized view function for schema_change here based on defineExpr.
+ This is a trick because the current storage layer does not support expression evaluation.
+ count distinct materialized view will set mv_expr with to_bitmap or hll_hash.
+ count materialized view will set mv_expr with count.
+ Support regenerating historical data when a new materialized view is created in BE.
    + Support to_bitmap function
    + Support hll_hash function
    + Support count(field) function
For #3344
2020-07-16 10:45:15 +08:00
14ac49dde5 Fix be may core dump when linked schema change (#4079)
* fix a core

* Update be/src/olap/rowset/segment_group.cpp
2020-07-15 10:14:42 +08:00
9b0ad66b78 [runtime] Replace the thread pool in FragmentMgr (#4057) 2020-07-15 10:03:48 +08:00
c00326bd85 [Doc] Create CODE_OF_CONDUCT.md (#4070) 2020-07-14 22:28:38 +08:00
5e555bfafb [GithubTemplate] Fix PR template (#4092)
Move the PR template to the .github root.
2020-07-14 10:50:35 +08:00