doris

Author	SHA1	Message	Date
satanson	664e6a5898	[Storage] "align_tag_path" and ALIGN_TAG_PREFIX is needless (#4410 )	2020-08-26 10:47:21 +08:00
Stalary	ca5e224594	[Bug] Fix the bug that replication_num in show create table is incorrect (#4393 )	2020-08-26 10:43:59 +08:00
Mingyu Chen	763a42c9af	[MySQL Compatibility 2/4][Bug] Fix bug and improve compatibility with mysql protocol (#4362 ) 1. select database() will only return database name, without cluster name. 2. select user() will return the IP which user connected in.	2020-08-26 10:40:42 +08:00
weizuo93	613c44e889	[Optimize]Optimize the disk selection strategy on BE for tablet creation (#4373 ) When creating a tablet, it is necessary to select a disk from all disks that meet the requirements on the BE node to store the tablet. In Doris, the current disk selection strategy is to randomly select a disk from all disks that meet the requirements for tablet creation. After the cluster has been running for a long time, we found that the distribution of the number of tablets on different disks in a BE node is unbalanced. In order to solve this problem, we introduced the algorithm of "two random choices" for disk selection when creating the tablet: (1) Select two disks from all disks that meet the requirements on the BE node randomly； (2) Choose the disk with a smaller number of tablet from the two disks selected in (1) for tablet creation.	2020-08-26 10:35:33 +08:00
Mingyu Chen	0040153c51	[MySQL Compatibility 1/4][Bug] Fix bug that set sql_mode with concat() function failed (#4359 ) Support `set sql_mode = concat(@@sql_mode, "STRICT_TRANS_TABLES");`	2020-08-26 10:28:25 +08:00
HangyuanLiu	b1c7841c20	[SQL] Fix TupleIsNull miss in SelectStmt resultExpr (#4279 )	2020-08-26 10:27:50 +08:00
Lijia Liu	d5a0a738f4	[SQL] Rewrite count(distinct if(bool, bitmap, null)) to bitmap_union_count (#4201 ) Add IF(BOOL, BITMAP, BITMAP) function.	2020-08-26 10:26:40 +08:00
wyb	691227922e	[SQL Plan]Fix explicit broadcast join bug (#4424 ) Use broadcast join when users specify explicitly [BROADCAST] in queries.	2020-08-25 22:06:45 +08:00
ZhangYu0123	c201cf6e4f	Support batch delete[part 2] (#4425 ) support batch delete for read compaction	2020-08-25 14:05:04 +08:00
caiconghui	1410d4e623	[Doc] Add in predicate support content in delete-manual.md (#4404 ) Add in predicate support content in delete-manual.md	2020-08-24 21:52:28 +08:00
Mingyu Chen	67b842ce04	[License] Organize and modify the license of the code (#4371 ) 1. Disable the MySQL client and LZO library by default when building the Doris. MySQL client library is used for MySQL external table feature. This feature will be replaced by the new ODBC external table soon. LZO library is used to compress/decompress data of some old data format of Doris, which is no longer used anymore. 2. Add missing license to some files. 3. For all non-Apache-License code, all are explained in NOTICE file and the corresponding license is declared. 4. Remove the js source code from webroot, it will be downloaded as thirdparty	2020-08-24 21:51:55 +08:00
Mingyu Chen	976820ba20	[SegmentV2] Change the default storage format to SegmentV2 (#4387 ) Since the Segment V2 has been released for a long time, we should make it as default storage format for newly created table. This CL mainly changes: 1. For all newly created tables, their default storage format is Segment V2. 2. For all already exist tablets, their storage format remain unchanged. 3. Fix bugs described in Fix #4384 and Fix #4385	2020-08-24 21:51:17 +08:00
HappenLee	5fc79561d7	[MemTracker][Bug-Fix] Fix core in DECHECK in memory tracker (#4421 ) Fix DECHECK failed in mem_tracker, issue #4420	2020-08-23 22:41:02 +08:00
Zhengguo Yang	af2b749a87	make some readFields Deprecated (#4399 ) We have changed most of our serialization methods to json. In order to be compatible with previous data, these classes still retain the readFields method. Some prs that involve modifying metadata often modify the readFields method. To avoid this, we should Mark these methods as Deprecated #4398	2020-08-21 22:58:08 +08:00
Zhengguo Yang	d61c10b761	[Delete] Support batch delete [part 1] (#4310 ) * Implements the grammar of the batch delete #4051 * Process create, alter table when table has delete sign column * Support the syntax for enabling the delete column * Automatically filtered deleted data in the select statement. * Automatically add delete sign when create rollup table TODO: * Optimize the reading and compaction logic on the be side, so that the data marked as deleted will be completely deleted during base compaction	2020-08-21 22:57:16 +08:00
HappenLee	984006adf9	[ODBC SCAN NODE] 2/4 Add Thrift Interface of odbc_scan_node (#4389 ) issue:#4376	2020-08-21 21:26:48 +08:00
HappenLee	a8fe54b7b9	[ODBC SCAN NODE] 1/4 Add unix odbc library. (#4377 )	2020-08-21 21:26:14 +08:00
lichaoyong	5976395bb6	[BUG] Remove the deduplication of LEFT SEMI/ANTI JOIN with not equal predicate (#4417 ) ``` SELECT * FROM (SELECT cs_order_number, cs_warehouse_sk FROM catalog_sales WHERE cs_order_number = 125005 AND cs_warehouse_sk = 4) cs1 LEFT SEMI JOIN (SELECT cs_order_number, cs_warehouse_sk FROM catalog_sales WHERE cs_order_number = 125005) cs2 ON cs1.cs_order_number = cs2.cs_order_number AND cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk; ``` The above query has an equal predicate and a not equal predicate. If there exists not equal preidcate, the build table should be remained as it is. So the deduplication should be removed.	2020-08-21 19:55:09 +08:00
EmmyMiao87	76a04de6c4	[MV] Input correct keys type of index meta when `Add Partition` (#4408 ) Define Expr will not serialized in Column `toThrift`. 1. When adding partition, different indexes should use their own keys type instead of using the keys type of base table uniformly. ` 2. There are two kinds of define expr in Column , one is analyzed, and the other is not analyzed. Currently, analyzed define expr is only used when creating materialized views, so the define expr in RollupJob must be analyzed. In other cases, such as define expr in `MaterializedIndexMeta`, it may not be analyzed after being relayed. When executing the load, the analyzed define expr (such as to_bitmap(cast(k1, varchar))) will not be analyzed again. Only a cast function will be added to the inner layer(such as to_bitmap(cast(cast(k1 ,int), varchar))) which is analyzed too. The define expr that has not been analyzed (such as cast(k1, varchar)) will be analyzed when executing the load.	2020-08-21 10:42:41 +08:00
ZhangYu0123	a7422ee142	[UT][Bug-Fix] Resolve UT memory leak problem (#4406 ) Fix ut memory leak on Fix #4164	2020-08-21 10:41:54 +08:00
EmmyMiao87	09b1965499	[MV] Fix errors when alter materialized view which based on dup table (#4375 ) 1. Input the correct keys type when mv is updated. The keys type of mv should be used in schema change job rather then keys type of base table. Otherwise, the be will core and thrown exception "Create replicas failed". 2. Forbidden add non-key column on agg mv directly when base table is duplicate model If a dup table has a agg mv, user will not add a non-key column on mv. The non-key column can only be added to dup index.	2020-08-21 10:36:03 +08:00
bigdataplumber	0715c54004	Fix mispelling (#4407 ) Centers to centos	2020-08-21 09:14:21 +08:00
EmmyMiao87	04a75b7c28	[Doc] Fix spelling errors in dynamic partition docs (#4395 ) Change-Id: I84de1602b99c6b89b59ccc5869c96516c40a181d	2020-08-20 09:31:33 +08:00
Janko	008149a7a5	[Doc] Increase mailing list subscription method. (#4391 )	2020-08-20 09:30:54 +08:00
EmmyMiao87	6bb111b42c	Modify mv rewrite rule on 'Count distinct' (#4382 ) The rewrite rule named `CountToSum` does not distinguish between `Count` and `Count distinct` which causes `Count distinct` is rewritten as `Sum` incorrectly. So this commit modified matching rule. When the function is `Count distinct`, the rewrite rule will not take effect. Fixed #4381	2020-08-20 09:30:35 +08:00
Mingyu Chen	b6859f1bd4	[JsonLoad] Fix bug that row num stat is not correct when loading json (#4379 ) When all fields are null, the row is invalid, it should be filtered	2020-08-20 09:30:19 +08:00
ZhangYu0123	60d9d31ec1	[Optimize] Optimize coding bit operation in BE (#4366 ) Optimize bit operation in variable length coding. Remove unnecessary bit operation.	2020-08-20 09:29:53 +08:00
xinghuayu007	bfb39a2826	[SQL][Function] Add replace() function (#4347 ) replace is an user defined function, which is to replace all old substrings with a new substring in a string, as follow: mysql> select replace("http://www.baidu.com:9090", "9090", ""); +------------------------------------------------------+ \| replace('http://www.baidu.com:9090', '9090', '') \| +------------------------------------------------------+ \| http://www.baidu.com: \| +------------------------------------------------------+	2020-08-20 09:28:53 +08:00
Mingyu Chen	4c571cb6f5	Revert "[Metrics] Support tablet level metrics (#4327 )" (#4397 ) This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630. Co-authored-by: morningman <chenmingyu@baidu.com>	2020-08-19 22:37:52 +08:00
Mingyu Chen	ea6d7c281d	[Bug] Remove RECOVER_TABLET worker pool to make ASAN compile happy (#4392 ) Co-authored-by: morningman <chenmingyu@baidu.com>	2020-08-19 17:39:46 +08:00
zhbinbin	f92428248f	Support udaf_orthogonal_bitmap (#4198 ) The original Doris bitmap aggregation function has poor performance on the intersection and union set of bitmap cardinality of more than one billion. There are two reasons for this. The first is that when the bitmap cardinality is large, if the data size exceeds 1g, the network / disk IO time consumption will increase; The second point is that all the sink data of the back-end be instance are transferred to the top node for intersection and union calculation, which leads to the pressure on the top single node and becomes the bottleneck. My solution is to create a fixed schema table based on the Doris fragmentation rule, and hash fragment the ID range based on the bitmap, that is, cut the ID range vertically to form a small cube. Such bitmap blocks will become smaller and evenly distributed on all back-end be instances. Based on the schema table, some new high-performance udaf aggregation functions are developed. All Scan nodes participate in intersection and union calculation, and top nodes only summarize The design goal is that the base number of bitmap is more than 10 billion, and the response time of cross union set calculation of 100 dimensional granularity is within 5 s. There are three udaf functions in this commit: orthogonal_bitmap_intersect_count, orthogonal_bitmap_union_count, orthogonal_bitmap_intersect.	2020-08-19 10:29:13 +08:00
ZhangYu0123	dc3ed1c525	[Compaction]Compaction rules optimization (#4212 ) Compaction rules optimization, the detail problem description and design to see #4164. This pr commits 2 functions: (1) add the cumulative policy configable, and implement original policy. (2) implement universal policy, the optimization version in #4164.	2020-08-19 09:34:13 +08:00
Yingchun Lai	56260a65c8	[Metrics] Support tablet level metrics (#4327 ) Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet, but we have no insight about tablets in the cluster. This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request, and not return tablet level metrics by default.	2020-08-18 16:56:12 +08:00
Mingyu Chen	e25108097d	[Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345 ) After PR: #4135, If a mem tracker has parent, it should be created by 'CreateTracker'. So I removed other unused constructors. And also fix the bug described in #4344	2020-08-18 16:54:55 +08:00
Mingyu Chen	38a2a7a269	[Bug] Fix bug that modification of global variable can not be persisted. (#4324 ) When setting global variables, such as `set global default_rowset_type=beta`, the operation is not correctly persisted. This CL change the fe meta version to 90. --------------- The main reason for this problem is that for the modification of global variable, we directly use Java's reflection mechanism to modify static member variables in `GlobalVariable` class. But in the persistence method of the `set` operation, we only persist the value stored in the `globalSessionVariable` variable, and this variable does not contain Global Variable. So I added a new OperationType: `OP_GLOBAL_VARIABLE_V2`, and added a `GlobalVarPersistInfo` class to record all changes.	2020-08-18 16:54:35 +08:00
ZhangYu0123	8a3eaeecf1	Update support batch delete storage design document (#4234 ) * Update delete index design document	2020-08-18 15:37:14 +08:00
Mingyu Chen	3359467b9a	[Tablet][Recovery] Support using empty tablet to repair the damaged or missing tablet (#4255 ) In some very special circumstances, such as code bugs, or human misoperation, etc., all replicas of some tablets may be lost. In this case, the data has been substantially lost. However, in some scenarios, the business still hopes to ensure that the query will not report errors even if there is data loss, and reduce the perception of the user layer. At this point, we can use the blank Tablet to fill the missing replica to ensure that the query can be executed normally. Add a new FE config `recover_with_empty_tablet`. default is false. true means to use empty tablet to fill the missing one. Also fix a bug in Fix #4274	2020-08-18 06:13:53 +00:00
caoyang10	53d00d92cc	[Doris On ES][Bug-Fix] ES queries always route at same 3 BE nodes (#4351 ) (#4352 ) resolve the problem of querying ES table always route at same 3 BE nodes because of random strategy	2020-08-18 10:36:18 +08:00
lichaoyong	d5e456a3c3	[BUG] Fix except wrong answer bug (#4369 ) Doris use HashTable to implement except. If user send A except B except C, first do A except B and then except C. After A except B, HashTable will be rebuild. There is a bug here to throw some rows.	2020-08-18 09:23:48 +08:00
Stalary	26fe510011	[Doc] modify the document error (#4357 )	2020-08-17 23:06:23 +08:00
WangCong	391d534ae7	[Bug]Fix bug that BE crash when load ORC file (#4350 )	2020-08-17 22:55:29 +08:00
xueyan.li	e69496feaf	[MysqlCompatibility] Support collate field option in expr (#4365 ) Support SQL like: ``` select collation_name, character_set_name, is_default collate utf8_general_ci = 'Yes' as is_default from information_schema.collations ```	2020-08-17 22:52:57 +08:00
EmmyMiao87	38921d4343	[MV]Forbidden aggregated partition key column on mv (#4343 ) The partition column of table also must be the key in materialized view. If not, when user wants to add partition of table, the be will core. The materialized view could not create partition correctly when partition column has been aggregated.	2020-08-15 11:38:50 +08:00
ZhangYu0123	d6028863f3	[Compaction] Manually trigger compaction RESTapi interface (#4312 ) Add restapi to be which do compaction task by manual trigger. The detail design in #4311 .	2020-08-13 23:41:46 +08:00
HangyuanLiu	4fa35c9f39	[Bug][RoutineLoad] Fix routine load timezone property invalid (#4339 )	2020-08-13 23:40:54 +08:00
Mingyu Chen	05fa55047e	[Doc][Json Load] Improve json data format load documents (#4337 ) And some detail explaination of JsonPath and Columns parameter	2020-08-13 23:39:57 +08:00
ZhangYu0123	1d9b3aeee7	[Doc] Repair document format (#4336 ) The error format '##keyword' in a lot of docs. This pr is to repair document format. #4335	2020-08-13 23:39:41 +08:00
xueyan.li	ac9c7741e9	[SQL]Support datagrip show database information (#4332 ) Support show schema()	2020-08-13 23:39:05 +08:00
ZhangYu0123	11ec7bbe24	[Bug]Add LargeInt cast to Date and Datatime, add timezone to stale_version_path_json_doc (#4321 ) (1) Add LargeInt cast to date and datatime, see #3864 LargeInt can cast to date and datatime. Fix this error: Unable to find _ZN5doris13CastFunctions16cast_to_date_valEPN9doris_udf15FunctionContextERKNS1_11LargeIntValE (2) Add local timezone info to stale_version_path_json_doc rest api Add timezone to "last create time" field. { "path id": "1", "last create time": "1970-01-01 10:46:40 +0800", "path list": "1 -> [2-3] -> [4-5]" }, and add timezone to the test unix, see #4121 .	2020-08-13 23:38:30 +08:00
wangbo	790779fb6f	[SparkLoad]remove unncessary convert from dataframe to rdd (#4304 )	2020-08-13 23:37:38 +08:00

... 319 320 321 322 323 ...

18263 Commits