This patch mainly contains the following modifications:
1. Use `std::unique_ptr` to replace some raw pointers (see the sketch below)
2. Convert some member methods into local static functions
3. Change some methods that do not need to be public to private
4. Some formatting changes, such as wrapping overly long lines
5. Remove some unused variables
6. Add or revise some comments for easier understanding
No functional changes in this patch.
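For illustration, a minimal sketch of the kind of change item 1 describes (`RowsetReader` is a hypothetical stand-in, not one of the classes actually touched):
```
#include <memory>

struct RowsetReader { /* ... */ };  // hypothetical stand-in class

void process() {
    // Before: raw pointer with manual delete; easy to leak on early return.
    //   RowsetReader* reader = new RowsetReader();
    //   ...
    //   delete reader;

    // After: std::unique_ptr owns the object and frees it automatically.
    auto reader = std::make_unique<RowsetReader>();
    // ... use reader-> ...
}  // reader is destroyed here, even on early return or exception
```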
Fix a bug where using grouping sets with a grouping set item that does not contain all of the columns produces wrong values.
Fix the grouping function check, which did not work in the GROUP BY clause.
When we need to ensure that **a newly-created file** is fully
synchronized back to disk, we should call `fsync()` not only on the
file itself but also on its parent directory, that is, the directory
containing the newly-created file.
Unfortunately, Doris currently never fsyncs directories in any scenario.
This patch first adds a `sync_dir()` interface, laying the groundwork
for future fixes.
This patch also removes the unneeded private method `dir_exists()`.
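For illustration, a minimal POSIX sketch of what such a `sync_dir()` helper could look like (an assumption for clarity, not necessarily the exact Doris implementation):
```
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Fsync a directory so that a newly-created entry inside it is durable.
// Returns 0 on success, -1 (with errno set) on failure.
int sync_dir(const std::string& dir_path) {
    int fd = ::open(dir_path.c_str(), O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        return -1;
    }
    int ret = ::fsync(fd);  // flush the directory entries themselves
    int saved_errno = errno;
    ::close(fd);
    errno = saved_errno;
    return ret;
}
```
After creating and fsyncing a new file, calling this on the file's parent directory makes the new directory entry itself durable.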
Currently, reports from BE to FE are done in background threads of
`AgentServer` (`report_tablet_thread` and `report_disk_stat_thread`).
These two threads sleep in a standby state after each report; if a
report is needed immediately, they are notified and wake up right away
to report.
For example, when the background thread (`disk_monitor_thread`) in
`StorageEngine` finds that some tablets have been deleted, it notifies
`AgentServer` to trigger a report immediately.
In the current implementation, in order to report as soon as possible,
a variable (`_is_drop_tables`) and two other flags record whether
reporting is needed, and `StorageEngine::disk_monitor_thread` checks
this variable on every run to decide whether reporting needs to be
triggered. This is actually superfluous, and it may result in untimely
notifications, as shown below:
```
(thread_1)        (thread_2)
disk-monitor      disk-stat-reporter
    |                   |
    |               reporting
    |                   |
 notify_1               |
    |                   |
    |             wait_for_notify (will wait until timeout or next notification)
    |                   |
    V                   V
```
If `StorageEngine::disk_monitor_thread` triggers a notification before
`report_tablet_thread` has started waiting, the notification will not
be received by `report_tablet_thread`, so the BE does not report to
the FE until the wait times out or the next round of
`disk_monitor_thread` detection.
This change restructures the triggering implementation and solves the
above problem.
It also changes some methods (that do not need to be public) to private.
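A common shape for such a trigger is a condition variable paired with a pending flag checked under the mutex, so a notification sent before the reporter starts waiting is not lost. The sketch below illustrates the idea with hypothetical names; it is not the actual Doris code:
```
#include <chrono>
#include <condition_variable>
#include <mutex>

class ReportTrigger {
public:
    // Called by the notifier (e.g. a disk monitor). Even if the reporter
    // is not waiting yet, the flag records that a report is pending.
    void notify() {
        std::lock_guard<std::mutex> lock(_mu);
        _pending = true;
        _cv.notify_one();
    }

    // Called by the reporter thread between reports. Returns immediately
    // if a notification arrived while it was busy reporting.
    void wait(std::chrono::seconds timeout) {
        std::unique_lock<std::mutex> lock(_mu);
        _cv.wait_for(lock, timeout, [this] { return _pending; });
        _pending = false;
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    bool _pending = false;
};
```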
In `StorageEngine`, the variable `_min_percentage_of_error_disk` was not
initialized (so it defaulted to 0), which caused the process to exit
whenever a single disk failed.
The expected behavior is to exit the process only when the number of
failed disks reaches a certain percentage.
Also, this variable should mean the maximum percentage of error disks
allowed, not the minimum, so the configuration is renamed to
`max_percentage_of_error_disk`.
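The intended check is roughly the following sketch (names follow the description above; the real code may differ):
```
#include <cstdint>
#include <cstdlib>

// Exit only when the share of failed disks reaches the configured ceiling.
void check_disk_errors(int32_t error_disks, int32_t total_disks,
                       int32_t max_percentage_of_error_disk) {
    if (total_disks > 0 &&
        error_disks * 100 >= total_disks * max_percentage_of_error_disk) {
        // With an uninitialized threshold of 0, a single failed disk
        // would trip this condition, which is the bug described above.
        exit(1);
    }
}
```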
`TabletsChannel` may be written to after cancellation, leading to a core dump in `DeltaWriter::write`. We should check the state of `TabletsChannel` at the beginning of each operation.
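The fix amounts to a state check at the top of each operation, roughly as in this simplified sketch (types and names are illustrative):
```
#include <mutex>

enum class Status { OK, CANCELLED };  // simplified status type

class TabletsChannel {
public:
    Status add_batch(/* batch args */) {
        std::lock_guard<std::mutex> lock(_lock);
        if (_state == State::kCancelled) {
            // Refuse writes after cancellation instead of touching
            // already-released DeltaWriters.
            return Status::CANCELLED;
        }
        // ... forward the batch to the DeltaWriters ...
        return Status::OK;
    }

    void cancel() {
        std::lock_guard<std::mutex> lock(_lock);
        _state = State::kCancelled;
        // ... release writers ...
    }

private:
    enum class State { kOpen, kCancelled };
    std::mutex _lock;
    State _state = State::kOpen;
};
```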
It is not necessary to perform compaction in the following cases:
1. A tablet has only 2 rowsets, with versions [0-1] and [2-x]. In this
case, there is no need to perform base compaction because version [0-1]
is an empty version.
Some tables are partitioned by day, and each partition then loads only
one batch of data per day, so a large number of tablets with rowsets
[0-1][2-2] appear, and these tablets do not need base compaction.
2. The initial value of the last successful compaction time is 0, so
the first check of the interval since the last successful compaction
always satisfies the condition to trigger cumulative compaction (see
the sketch below).
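For case 2, the interval check has roughly this shape (a sketch, not the actual code), which shows why an initial value of 0 always passes:
```
#include <ctime>

// last_success_ts == 0 means compaction has never succeeded. Treating 0
// as a real timestamp makes (now - 0) huge, so the very first check
// always satisfies the trigger condition.
bool should_trigger_cumulative_compaction(time_t last_success_ts,
                                          time_t interval_sec) {
    time_t now = time(nullptr);
    return now - last_success_ts >= interval_sec;
}
```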
When using an iterator over `_tablet_map.tablet_arr` (a `std::list`) to
remove a tablet, we should first remove the tablet from `_partition_map`
to avoid the iterator becoming invalid.
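As a general illustration of the hazard (simplified; not the actual Doris structures): erasing a `std::list` element invalidates iterators pointing to it, so any bookkeeping that needs the element must happen before the erase, and the erase should use the returned iterator:
```
#include <list>

void drop_tablets(std::list<int>& tablets, int victim) {
    for (auto it = tablets.begin(); it != tablets.end(); ) {
        if (*it == victim) {
            // Do any side bookkeeping that needs *it BEFORE erasing,
            // e.g. removing it from a secondary index.
            it = tablets.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
}
```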
In `AgentServer`, each task type is processed separately, which leads
to very long code that is hard to read and makes errors easy to miss
(for example, the handling of some task type may be omitted, or the
mapping between task types and handlers may be wrong).
Fortunately, the code for each task type is very similar, so this is a
good case for a macro, which greatly reduces the repeated code and
solves the above problems.
This patch also fixes two small bugs:
1. The `_topic_subscriber` member was not released in the destructor.
2. In `submit_tasks()`, `status_code` was not reset before each task
was processed, resulting in wrong judgments.
No functional changes in this patch.
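The pattern is roughly the following (a sketch with hypothetical task types and handlers, not the actual Doris macro):
```
#include <string>

// One handler per task type; in real code these would start workers.
void handle_create_tablet(const std::string& task) { /* ... */ }
void handle_drop_tablet(const std::string& task)   { /* ... */ }

// The macro stamps out one identical case per task type, so adding a
// type is one line and the type/handler mapping cannot drift.
#define HANDLE_TASK(TYPE, HANDLER)  \
    case TaskType::TYPE:            \
        HANDLER(task);              \
        break;

enum class TaskType { CREATE_TABLET, DROP_TABLET };

void submit_task(TaskType type, const std::string& task) {
    switch (type) {
        HANDLE_TASK(CREATE_TABLET, handle_create_tablet)
        HANDLE_TASK(DROP_TABLET, handle_drop_tablet)
    }
}
```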
When constructing `Schema` objects, two similar `init` functions need
to be called, and the required call order is only implicit, which makes
them easy to misuse. In addition, some of the existing comments are
missing or out of date, which can be misleading.
This patch unifies the initialization logic of `Schema`.
No functional changes in this patch.
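A sketch of the direction, with hypothetical signatures: fold the two-step initialization into a single entry point so callers cannot violate the order:
```
#include <vector>

struct ColumnInfo { /* ... */ };  // hypothetical column descriptor

class Schema {
public:
    // Single public entry point: the constructor performs both init
    // steps in the required order, so callers cannot get it wrong.
    explicit Schema(const std::vector<ColumnInfo>& cols) {
        init_columns(cols);  // step 1 must precede step 2
        init_indexes();
    }

private:
    void init_columns(const std::vector<ColumnInfo>& cols) { /* ... */ }
    void init_indexes() { /* ... */ }
};
```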
This CL implements a simulated FE process and a simulated BE service.
You can see how to use them in
`fe/src/test/java/org/apache/doris/utframe/DemoTest.java`.
At the same time, I modified the configuration of the
maven-surefire-plugin so that each unit test runs in a separate JVM,
which avoids conflicts caused by the various singleton classes in FE.
Starting a separate JVM for each unit test adds about 30% extra time
overhead. However, you can control the concurrency of the unit tests
through the `forkCount` setting of the maven-surefire-plugin in
`fe/pom.xml`. The default is still 1 for easy viewing of the output
log. If set to 3, the entire FE unit test run takes about 4 minutes.
**Describe the bug**
**First**, in broker load, we allow users to add multiple data
descriptions. Each data description describes a file (or set of files),
including the file path, delimiter, the table and partitions to be
loaded, and other information.
When the user specifies multiple data descriptions, Doris currently
aggregates the data descriptions belonging to the same table and
generates a unified load task.
The problem is that although different data descriptions point to the
same table, they may specify different partitions. Therefore, the
aggregation of data descriptions should consider not only the table
level but also the partition level.
Examples are as follows:
data description 1 is:
```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1")
INTO TABLE `tbl1`
PARTITION (p1, p2)
```
data description 2 is:
```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2")
INTO TABLE `tbl1`
PARTITION (p3, p4)
```
What the user expects is to load file1 into partitions p1 and p2 of
tbl1, and file2 into partitions p3 and p4 of the same table. But
currently they are aggregated together, which results in loading file1
and file2 into all of the partitions p1, p2, p3, and p4.
**Second**, the following 2 data descriptions are not allowed:
```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1")
INTO TABLE `tbl1`
PARTITION (p1, p2)
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2")
INTO TABLE `tbl1`
PARTITION (p2, p3)
```
They have an overlapping partition (p2), which is not supported yet, so
we should throw an exception to cancel this load job.
**Third**, there is a problem with the code implementation. In the
constructor of `OlapTableSink.java`, we pass in a string of partition
names separated by commas. At the `OlapTableSink` level, we should be
able to pass in a list of partition ids directly, instead of names.
ISSUE: #2823
For #2589
1. date (uint24_t), datetime (int64_t), and largeint (int128_t) use frame-of-reference coding as dict.
2. decimal (decimal12_t) also uses frame-of-reference coding as dict.
3. float/double use bitshuffle coding as dict.
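As a toy illustration of frame-of-reference coding (not the actual Doris codec): store the block's minimum once as the reference, then encode each value as a small non-negative offset from it, which is cheap to bit-pack:
```
#include <algorithm>
#include <cstdint>
#include <vector>

struct ForBlock {
    int64_t reference;             // block minimum
    std::vector<uint64_t> deltas;  // value - reference, small and compressible
};

// Assumes a non-empty input block.
ForBlock for_encode(const std::vector<int64_t>& values) {
    ForBlock block;
    block.reference = *std::min_element(values.begin(), values.end());
    block.deltas.reserve(values.size());
    for (int64_t v : values) {
        block.deltas.push_back(static_cast<uint64_t>(v - block.reference));
    }
    return block;
}
```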
Fix parquet arrow read batch bug
#2811
The original code determined the number of rows in a batch from the number of rows in the Parquet RowGroup, but now each batch takes 65535 rows. So when a RowGroup has more than 65535 rows, the number of batches does not match the number of RowGroups. Using the field `_current_line_of_group` as an array index can then go out of the array's bounds and crash the BE.
* Improve comparison and printing of Version
There are two members in `Version`: `first` and `second`.
There are many places where we need to print a `Version` object or
compare two `Version` objects, but in the current code these two
members are accessed directly, which makes the code very tedious.
This patch mainly does the following:
1. Adds an overloaded `operator<<()` for `Version`, so we can
directly print a `Version` object;
2. Adds a `contains()` method to determine whether one version range
contains another;
3. Uses `operator==()` to determine whether two `Version` objects are
equal.
Because there are too many places that need to be modified, some direct
member accesses are left, which will be cleaned up later.
This patch also removes some unnecessary header file includes.
No functional changes in this patch.
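A minimal sketch of what these helpers might look like, assuming `Version` holds `first` and `second` (actual signatures in Doris may differ):
```
#include <cstdint>
#include <ostream>

struct Version {
    int64_t first;
    int64_t second;

    // True if this version range fully contains the other.
    bool contains(const Version& other) const {
        return first <= other.first && second >= other.second;
    }

    bool operator==(const Version& other) const {
        return first == other.first && second == other.second;
    }
};

// Allows `LOG(INFO) << version` style printing instead of formatting
// first/second by hand at every call site.
std::ostream& operator<<(std::ostream& os, const Version& v) {
    return os << "[" << v.first << "-" << v.second << "]";
}
```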
Support setting `replication_num` at the table level, so users no longer
need to set `replication_num` in every `ALTER TABLE ADD PARTITION`
statement.
e.g.:
`alter table tbl set ("default.replication_num" = "2");`