doris

Author	SHA1	Message	Date
yangzhg	502fa2eb50	[GroupingSet] Fix core when using grouping sets in large data (#2858 ) dst_tuples memory size to Allocate is wrong	2020-02-07 21:40:29 +08:00
kangkaisen	feb02ab27a	Make intersect_count function accept any expression that returns bitmap (#2850 )	2020-02-07 09:56:54 +08:00
kangkaisen	e7817053cc	[Utils] ParseUtil::parse_mem_spec supports K and T suffixes (#2854 )	2020-02-07 09:31:35 +08:00
Yunfeng,Wu	b35e8153c0	[Doris on Es] Fix lte and gte error expression (#2851 ) LE should be LTE and GE should be GTE	2020-02-06 20:52:14 +08:00
Mingyu Chen	f77cfcdb61	[Compaction] Avoid unnecessary compaction (#2839 ) It is not necessary to perform compaction in the following cases: 1. A tablet has only 2 rowsets, with versions [0-1] and [2-x]. In this case, there is no need to perform base compaction because the [0-1] version is an empty version. Some tables are partitioned by day, and each partition loads only one batch of data per day, so a large number of tablets with rowsets [0-1][2-2] will appear. These tablets do not need base compaction. 2. The initial value of the `last successful execution time of compaction` is 0, so the first time the interval since the last successful compaction is checked, it always meets the condition to trigger cumulative compaction.	2020-02-06 16:40:38 +08:00
caiconghui	d549c40fcd	Fix spelling mistakes for load metrics description (#2840 )	2020-02-06 10:18:30 +08:00
LingBin	14c772013b	Fix removing tablet bug from partition_map in TabletManager (#2842 ) When using an iterator of _tablet_map.tablet_arr(`std::list`) to remove a tablet, we should first remove tablet from _partition_map to avoid the iterator becoming invalid.	2020-02-06 09:57:12 +08:00
LingBin	e991b1300f	[Code Refactor] Refactor AgentServer to make it less error-prone and more readable (#2831 ) In `AgentServer`, each task type needs to be processed separately, which leads to very long code that is hard to read and in which errors are hard to detect (for example, the processing of some task type may be missed, or the correspondence may be wrong). Fortunately, the code for each task_type is very similar, so this is a good case for using a `MACRO`, which greatly reduces the repeated code and solves the above problems. This patch also fixes two small bugs: 1. The `_topic_subscriber` member is not released in the dtor. 2. In `submit_tasks()`, the `status_code` is not reset before each task is processed, resulting in wrong judgment. No functional changes in this patch.	2020-02-06 09:56:00 +08:00
ZHAO Chun	25a6d6abbe	Make cmake and maven configurable (#2837 )	2020-02-05 23:04:29 +08:00
yangzhg	581c771ff3	[Doc] Add create index usage document (#2832 )	2020-02-05 15:35:41 +08:00
LingBin	ee5323a6a0	[Code Refactor] Improve initialization flow of Schema (#2833 ) When constructing `Schema` objects, two similar `init` functions need to be called, and the call order is implicitly required, which makes them easy to misuse. In addition, some of the existing comments are missing or out of date, which can be misleading. This patch unifies the initialization logic of `Schema`. No functional changes in this patch.	2020-02-05 11:48:54 +08:00
kangpinghuang	a27e89065b	Add file cache for v2 (#2782 ) Add file descriptor cache for segment v2 to solve too many open file problems	2020-02-04 00:16:01 +08:00
Mingyu Chen	bb4a7381ae	[UnitTest] Support starting mocked FE and BE process in unit test (#2826 ) This CL implements a simulated FE process and a simulated BE service. You can view their specific usage methods at `fe/src/test/java/org/apache/doris/utframe/DemoTest.java` At the same time, I modified the configuration of the maven-surefire-plugin plugin, so that each unit test runs in a separate JVM, which can avoid conflicts caused by various singleton classes in FE. Starting a separate jvm for each unit test will bring about 30% extra time overhead. However, you can control the number of concurrency of unit tests by setting the `forkCount` configuration of the maven-surefire-plugin plugin in `fe/pom.xml`. The default configuration is still 1 for easy viewing of the output log. If set to 3, the entire FE unit test run time is about 4 minutes.	2020-02-03 21:17:57 +08:00
Mingyu Chen	bb00f7e656	[Load] Fix bug of wrong file group aggregation when handling broker load job (#2824 ) Describe the bug First, in the broker load, we allow users to add multiple data descriptions. Each data description describes a file (or set of files), including the file path, delimiter, table and partitions to be loaded, and other information. When the user specifies multiple data descriptions, Doris currently aggregates the data descriptions belonging to the same table and generates a unified load task. The problem here is that although different data descriptions point to the same table, they may specify different partitions. Therefore, the aggregation of data descriptions should consider not only the table level, but also the partition level. Examples are as follows: data description 1 is: ``` DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1") INTO TABLE `tbl1` PARTITION (p1, p2) ``` data description 2 is: ``` DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2") INTO TABLE `tbl1` PARTITION (p3, p4) ``` What the user expects is to load file1 into partitions p1 and p2 of tbl1, and load file2 into partitions p3 and p4 of the same table. But currently, they are aggregated together, which results in loading file1 and file2 into all partitions p1, p2, p3 and p4. Second, the following 2 data descriptions are not allowed: ``` DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1") INTO TABLE `tbl1` PARTITION (p1, p2) DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2") INTO TABLE `tbl1` PARTITION (p2, p3) ``` They have an overlapping partition (p2), which is not supported yet, and we should throw an Exception to cancel this load job. Third, there is a problem with the code implementation. In the constructor of `OlapTableSink.java`, we pass in a string of partition names separated by commas. But at the `OlapTableSink` level, we should be able to pass in a list of partition ids directly, instead of names. ISSUE: #2823	2020-02-03 20:15:13 +08:00
Lijia Liu	99ad56d1bf	Support bitmap index for more type (#2630 ) For #2589 1. date(uint24_t)/datetime(int64_t)/largeint(int128_t) use frame of reference code as dict. 2. decimal(decimal12_t) also uses frame of reference code as dict. 3. float/double use bitshuffle code as dict.	2020-01-31 21:09:29 +08:00
Lishi	89c7234c1c	Support starts_with (str, prefix) function (#2813 ) Support starts_with function	2020-01-21 14:09:08 +08:00
yangzhg	7099fcf2d3	Remove unused file (#2819 ) This file was replaced by thirdparty/patches/incubator-brpc-0.9.5.patch in PR #2798 , but it was not removed in #2798	2020-01-21 13:43:48 +08:00
HangyuanLiu	64e99f29e6	Fix parquet arrow read batch bug (#2812 ) Fix parquet arrow read batch bug #2811 The original code determined the number of rows in a batch from the number of rows in the parquet RowGroup. But now a batch takes 65535 rows, so when the parquet row count is greater than 65535, the number of batches does not match the number of RowGroups. The code uses the field "_current_line_of_group" as an array position, which can cause an out-of-bounds array access and a crash	2020-01-21 10:57:56 +08:00
yangzhg	5dc80dc05d	[Maven] Fix some mistake in fe/pom.xml (#2818 )	2020-01-21 10:38:46 +08:00
xy720	2a30ac2ba5	[SQL] Return NullLiteral in castTo method instead of throwing an exception (#2799 )	2020-01-21 10:20:31 +08:00
WingC	7760495744	[Doc]Update Docker Env to env-1.2 (#2817 )	2020-01-20 22:58:09 +08:00
caiconghui	9dc9051930	Remove unused code for ShowPartitionsStmtTest and add apache license header (#2808 )	2020-01-20 22:51:26 +08:00
yangzhg	acc89411dc	Fix docs sequence error (#2814 )	2020-01-20 22:35:40 +08:00
wenbronk	010f6cd1c1	Update installing/compilation.md (#2816 ) Fix docker images version	2020-01-20 22:27:22 +08:00
caiconghui	58ff952837	[Stmt] Support new show functions syntax to make searching for functions more convenient (#2800 ) SHOW [FULL] [BUILTIN] FUNCTIONS [IN|FROM db] [LIKE 'function_pattern'];	2020-01-20 14:14:42 +08:00
yangzhg	0f829ca4c4	Add arm compatible patches (#2798 )	2020-01-20 00:21:47 +08:00
caiconghui	47a7df17ec	Add notes in java stream load sample to avoid wrong use of stream load (#2802 )	2020-01-19 23:22:17 +08:00
yangzhg	634928e4d0	Fix typo and remove tmp file in ut (#2789 )	2020-01-19 21:33:48 +08:00
LingBin	7c4149cf27	Improve comparison and printing of Version (#2796 ) * Improve comparison and printing of Version There are two members in `Version`: `first` and `second`. There are many places where we need to print one `Version` object and compare two `Version` objects, but in the current code, these two members are accessed directly, which makes the code very tedious. This patch mainly does the following: 1. Adds an overload of `operator<<()` for `Version`, so we can directly print a Version object; 2. Adds the `contains()` method to determine whether one `Version` contains another; 3. Uses `operator==()` to determine if two `Version` objects are equal. Because there are too many places that need to be modified, some direct member accesses are still left, which will be modified later. This patch also removes some unnecessary header file references. No functional changes in this patch.	2020-01-19 18:04:28 +08:00
WingC	92d8f6ae78	[Alter] Allow submitting alter jobs when table is unstable Alter job will wait for the table to be stable before running.	2020-01-18 22:56:37 +08:00
caiconghui	ae018043b0	[Alter] Support replication_num setting for table level (#2737 ) Support replication_num setting at the table level, so there is no need for users to set replication_num for every alter table add partition statement. e.g.: `alter table tbl set ("default.replication_num" = "2");`	2020-01-18 21:17:22 +08:00
Youngwb	1550401d4b	Support param exec_mem_limit for spark-doris-connector (#2775 )	2020-01-18 00:14:39 +08:00
LingBin	c71eefa2ac	Add path util (#2747 ) Note that the methods in path_util are only related to path processing, and do not involve any file and IO operations The upcoming patch will use these util methods, used to extract operations such as concatenation of directory strings from processing logic.	2020-01-18 00:05:00 +08:00
Dayue Gao	a3789ab2af	Refine .clang-format (#2791 )	2020-01-18 00:00:49 +08:00
worker24h	23f472903a	[Routine Load] Fix a bug that `show routine load` will throw Unknown Exception If we connect to a non-master FE and execute `show routine load;`, it may sometimes throw Unknown Exception, because some fields in the thrift result are not set.	2020-01-17 20:46:00 +08:00
jmk1011	6365a7d559	[FE Maven] Change maven repository url from http to https (#2786 ) From January 15th, 2020, Requests to http://repo1.maven.org/maven2/ return a 501 HTTPS Required status. So switch central repository url from http to https	2020-01-17 16:45:04 +08:00
yangzhg	fc55423032	[SQL] Support Grouping Sets, Rollup and Cube to extend group by statement Support the GROUPING SETS syntax ``` SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) ); ``` and CUBE or ROLLUP like ``` SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a,b,c) ``` [ADD] support grouping functions in expr like grouping(a) + grouping(b) (#2039) [FIX] fix analyzer error in window function(#2039)	2020-01-17 16:24:02 +08:00
Dayue Gao	3b24287251	Support 64 bits integers for BITMAP type (#2772 ) Fixes #2771 Main changes in this CL * RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h * leveraging Roaring64Map to support unsigned BIGINT for BITMAP type * introduces two new formats (SINGLE64 and BITMAP64) for BITMAP type So far we have three storage formats for BITMAP type ``` EMPTY := TypeCode(0x00) SINGLE32 := TypeCode(0x01), UInt32LittleEndian BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/) ``` In order to support BIGINT elements and keep backward compatibility, we introduce two new formats ``` SINGLE64 := TypeCode(0x03), UInt64LittleEndian BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64 ``` Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as the user didn't write an element larger than UINT32_MAX into a bitmap column. Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are 1. RoaringBitmap doesn't define a standard for the binary format of 64-bit bitmaps. As a result, different implementations of Roaring64Map use different formats. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](`35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)`). Even for CRoaring, the format may change in future releases. However, Doris requires the serialized format to be stable across versions. Forking is a safe way to achieve this. 2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private members of Roaring64Map. Another example is that we want to further customize and optimize the format for the BITMAP64 case, such as using vint64 instead of uint64 for map size.	2020-01-17 14:13:38 +08:00
xy720	463c0e87ec	Replace PowerMock/EasyMock by Jmockit (4/4) (#2784 ) This commit replaces the PowerMock/EasyMock in our unit tests. (All)	2020-01-17 14:09:00 +08:00
WingC	8df63bc191	[Doc] Add en doc for dynamic partition feature (#2764 )	2020-01-16 21:54:26 +08:00
LingBin	d0e2fc3305	Remove resource_info related members from TaskWorkerPool (#2704 ) The `TResourceInfo` was used to help `cgroups` isolate resources, but it is no longer used. In fact, the `TResourceInfo` information is no longer carried in the requests from FE to BE.	2020-01-16 14:39:08 +08:00
xy720	753a7dd73a	Replace PowerMock/EasyMock by Jmockit (3/4)	2020-01-16 13:24:43 +08:00
HangyuanLiu	0ddca59d36	Add timestampadd/timestampdiff function (#2725 )	2020-01-15 21:47:07 +08:00
vinson0526	8ea5907252	Update arrow's version to 0.15.1 and shaded it in spark-doris-connector (#2769 )	2020-01-15 21:08:34 +08:00
xy720	9bc306d17c	Replace PowerMock/EasyMock by Jmockit (2/4) (#2749 )	2020-01-15 20:31:30 +08:00
Mingyu Chen	4496ebb632	[Alter View] Fix bug that alter view operation lost when replaying from image (#2773 ) When "replay" something, we should use Catalog.getCurrentCatalog() instead of Catalog.getInstance(), otherwise, we may get wrong Catalog instance.	2020-01-15 20:04:09 +08:00
kangpinghuang	7fe6431ac7	Fix delete handler init when schema change (#2767 ) Delete handler init failed because some versions were missing. Schema change should return failure when getting the version fails.	2020-01-15 15:42:56 +08:00
kangkaisen	54952a24ad	Remove and comment some FE code (#2766 )	2020-01-15 15:14:52 +08:00
Mingyu Chen	9e54751098	[Snapshot] Modify the prefer snapshot version (#2748 ) In this CL, prefer snapshot version in snapshot request is defined in thrift. So that both FE and BE can use this version value.	2020-01-15 15:10:14 +08:00
DanyBin	7768629f08	Add bitmap_contains and bitmap_has_any functions (#2752 )	2020-01-15 14:31:44 +08:00
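
Note on commit 3b24287251 (64-bit BITMAP support): the format-selection rule described in that message, where the smaller format is chosen at serialization time, might look roughly like the sketch below. This is not Doris code. The helper names `choose_type_code` and `serialize_small` are hypothetical, and two points are assumptions rather than facts from the commit: that a one-element bitmap uses a SINGLE* format, and that the type code is written as a single leading byte. Only the type codes and the little-endian integer layouts come from the commit message.

```python
import struct

UINT32_MAX = 0xFFFFFFFF
UINT64_MAX = 0xFFFFFFFFFFFFFFFF

# Type codes as listed in the commit message.
EMPTY, SINGLE32, BITMAP32, SINGLE64, BITMAP64 = 0x00, 0x01, 0x02, 0x03, 0x04


def choose_type_code(values):
    """Pick the smallest format that can hold every element (hypothetical helper)."""
    elements = set(values)
    if any(v < 0 or v > UINT64_MAX for v in elements):
        raise ValueError("elements must fit in an unsigned 64-bit integer")
    if not elements:
        return EMPTY
    fits_32 = max(elements) <= UINT32_MAX
    if len(elements) == 1:
        # Assumption: a one-element bitmap is stored in a SINGLE* format.
        return SINGLE32 if fits_32 else SINGLE64
    # From the commit: BITMAP32 is preferred when the maximum element <= UINT32_MAX.
    return BITMAP32 if fits_32 else BITMAP64


def serialize_small(values):
    """Serialize only the EMPTY and SINGLE* cases; roaring payloads are not sketched."""
    elements = set(values)
    code = choose_type_code(elements)
    if code == EMPTY:
        return bytes([code])  # assumption: type code is one leading byte
    if code == SINGLE32:
        return bytes([code]) + struct.pack("<I", next(iter(elements)))
    if code == SINGLE64:
        return bytes([code]) + struct.pack("<Q", next(iter(elements)))
    raise NotImplementedError("BITMAP32/BITMAP64 payloads are out of scope here")
```

Under these assumptions, a column that never receives an element above UINT32_MAX only ever produces the three older type codes, which is the property the commit message relies on for BE rollback.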

13073 Commits