doris

Author	SHA1	Message	Date
Dayue Gao	4e2f01a9fa	[Compaction] Fix a bug that CumulativeCompaction compares time of different precision (#2693 ) time(NULL) returns second-resolution timestamp, however all compaction related time in Tablet are in millis-resolution. Therefore should use UnixMillis() instead.	2020-01-07 21:31:36 +08:00
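The precision mismatch described in the commit above can be sketched as follows. This is a minimal illustration, not the Doris code: `unix_millis()` and `interval_elapsed()` are hypothetical stand-ins (the commit only names `UnixMillis()` and `time(NULL)`), and the timestamps in the usage note are made up.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a UnixMillis()-style helper:
// wall-clock time in milliseconds since the Unix epoch.
inline int64_t unix_millis() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
}

// Returns true when at least interval_ms milliseconds separate last_ms and now_ms.
// Both timestamps must use the same unit (milliseconds) for the result to be meaningful.
inline bool interval_elapsed(int64_t now_ms, int64_t last_ms, int64_t interval_ms) {
    assert(interval_ms >= 0);
    return now_ms - last_ms >= interval_ms;
}
```

Passing a second-resolution value as `now_ms` (what `time(NULL)` would give) makes the difference hugely negative, so the check never fires: `interval_elapsed(1578403896, 1578403890000, 5000)` is false, while the millisecond value `1578403896000` gives true.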
LingBin	844ccaafc9	Remove boost filesystem exception in FileUtils (#2692 ) If `error_code` is provided, then the `boost::filesystem` functions will not throw an exception, so we do not need to catch it.	2020-01-07 07:29:05 -06:00
kangkaisen	7d2610d091	Change bitmap functions return type to BITMAP (#2690 )	2020-01-07 19:27:21 +08:00
yangzhg	852046de29	Fix incompatibility with arm architecture in olap #2645 (#2682 )	2020-01-07 19:16:10 +08:00
HangyuanLiu	2326b478b6	Support load orc format in Apache Doris (#2554 ) Support load orc format in Apache Doris	2020-01-07 14:22:43 +08:00
yangzhg	de4d1778c6	Fix incompatibility with arm architecture in util and gutil (#2650 ) 1. Upgrade gutil code from Impala to the new version, including `cpuinfo`, `spinlock` and `linux_syscall_support` 2. Implement an ARM version of the UTF-8 check code 3. Remove incompatible code from stopwatch	2020-01-06 18:39:31 +08:00
WingC	7f148c188e	[Build]Make set target arch universal (#2660 )	2020-01-06 14:46:07 +08:00
ZHAO Chun	87a50070c4	Fix bug: parquet scanner doesn't seek (#2661 )	2020-01-06 13:55:40 +08:00
WingC	220ed8436c	[Unit Test]Fix Schema Change Test Case (#2659 )	2020-01-05 20:08:23 +08:00
ZHAO Chun	1648226927	Adapt arrow 0.15 API (#2657 ) This CL supports arrow's zero copy read interface, which makes the code comply with arrow 0.15. The schema change unit test has some problems, so it is disabled in run-ut.sh.	2020-01-04 15:54:29 +08:00
kangkaisen	5dff936243	Fix HLL_UNION_AGG AnalyticFn result in BE core by adding hll_get_value (#2653 )	2020-01-03 19:23:56 +08:00
yangzhg	c098178f7a	[Index] Implements create drop show index syntax for bitmap index [#2487 ] (#2573 ) ### create table with index ``` CREATE TABLE table1 ( siteid INT DEFAULT '10', citycode SMALLINT, username VARCHAR(32) DEFAULT '', pv BIGINT SUM DEFAULT '0', INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala' ) AGGREGATE KEY(siteid, citycode, username) DISTRIBUTED BY HASH(siteid) BUCKETS 10 PROPERTIES("replication_num" = "1"); ``` ### create index ``` CREATE INDEX index_name ON table1 (siteid, citycode) [USING BITMAP] COMMENT 'balabala'; or ALTER TABLE table1 ADD INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala'; ``` ### drop index ``` DROP INDEX index_name ON table1; or ALTER TABLE table1 DROP INDEX index_name ``` ### show index ``` SHOW INDEX[ES] FROM table1 ``` output ``` +---------+-------------+-----------------+------------+---------+ \| Table \| Index_name \| Column_name \| Index_type \| Comment \| +---------+-------------+-----------------+------------+---------+ \| table1 \| index_name \| siteid,citycode \| BITMAP \| balabala\| +---------+-------------+-----------------+------------+---------+ ```	2020-01-03 17:41:26 +08:00
kangpinghuang	7951e15208	Fix estimate_segment_size problem #2643 (#2644 )	2020-01-03 11:11:34 +08:00
Mingyu Chen	9c90b09a3f	[Alter Table] No need to check whether table is stable when doing some kinds of alter operation (#2617 ) * [Alter Table] No need to check whether table is stable when doing some kinds of alter operation. Not all alter table operation require table to be stable. Such as rename, modify meta data.	2020-01-02 20:51:23 +08:00
令狐少侠	d05768ffd4	Fix core when es_scanner_node exit (#2634 )	2020-01-02 16:30:11 +08:00
Mingyu Chen	6cab929d6d	[Compaction] Limit the max concurrency of running compaction tasks (#2635 ) A compaction task may sometimes consume a lot of memory and result in OOM. Currently there is no good way to predict the memory consumption of a compaction task, so I add a new BE config, max_compaction_concurrency, to manually limit the max concurrency of running compaction tasks.	2020-01-02 14:47:54 +08:00
Mingyu Chen	cc924c9e6a	[Rowset Reader] Improve the merge read efficiency of alpha rowsets (#2632 ) When merge-reading from one rowset with multiple overlapping segments, I introduce a priority queue (a min-heap) for the multipath merge sort, to replace the old N*M time complexity algorithm. This can significantly improve read efficiency when merging a large amount of overlapping data. In my test: 1. Compaction with 187 segments drops from 75 seconds to 42 seconds 2. Compaction with 3574 segments costs 43 seconds, and with the old version I killed the process after waiting more than 10 minutes... This CL only changes the reads of alpha rowsets. Beta rowsets will be changed in another CL. ISSUE: #2631	2020-01-02 14:10:05 +08:00
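The priority-queue merge mentioned in the commit above can be sketched in a few lines. This is a generic k-way merge over sorted integer runs, assuming nothing about the actual rowset reader; `merge_sorted_runs` and its types are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merge already-sorted runs into one sorted vector using a min-heap.
// With N total elements and M runs this costs O(N log M), because each
// element is pushed and popped once and the heap never holds more than M entries.
std::vector<int> merge_sorted_runs(const std::vector<std::vector<int>>& runs) {
    // Heap entry: (value, index of the run, position inside that run).
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // Seed the heap with the first element of every non-empty run.
    for (std::size_t i = 0; i < runs.size(); ++i) {
        if (!runs[i].empty()) {
            heap.emplace(runs[i][0], i, std::size_t{0});
        }
    }

    std::vector<int> out;
    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        const std::size_t run = std::get<1>(top);
        const std::size_t pos = std::get<2>(top);
        out.push_back(std::get<0>(top));
        // Refill the heap from the run that just yielded the smallest value.
        if (pos + 1 < runs[run].size()) {
            heap.emplace(runs[run][pos + 1], run, pos + 1);
        }
    }
    return out;
}
```

For example, merging the runs `{1, 4, 7}`, `{2, 5}` and `{3, 6, 8}` yields `1 2 3 4 5 6 7 8`. A naive merge that scans every run head for each output element costs O(N*M) instead, which is the behaviour the commit replaces.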
yangzhg	2a8e77d9cb	Support arm atomicops (#2626 ) (#2627 )	2019-12-31 22:39:22 +08:00
lichaoyong	4c5b0b6dc9	Remove VersionHash used to comparison in BE (#2622 )	2019-12-31 19:38:45 +08:00
LingBin	13733d91e3	Fix the missing sync in SegmentWriter (#2623 ) In the default configuration, `WritableFile` does not sync when close file. We need to do it manually to ensure durability.	2019-12-31 18:34:40 +08:00
wkhappy1	9783fb7221	Fix: UDF version `GLIBCXX_3.4.21' not found (#2629 )	2019-12-31 18:32:42 +08:00
kangpinghuang	5229ea24da	Fix bloom filter statistics bug (#2609 )	2019-12-30 23:23:39 +08:00
lichaoyong	da2838e5fe	Set AGG_KEYS upon upgrade from tablet if has_keys_type() is false (#2620 ) Doris supports three storage models: AGG_KEYS, UNIQUE_KEYS and DUP_KEYS. Of these, UNIQUE_KEYS and DUP_KEYS were added after AGG_KEYS. For a historical tablet, the keys_type field that indicates the storage model may be missing for AGG_KEYS. So when upgrading from a historical tablet, this situation should be taken into consideration and the model set to AGG_KEYS.	2019-12-30 23:17:16 +08:00
Youngwb	feda66f99f	Spark return error to users when spark on doris query failed (#2531 )	2019-12-30 21:58:13 +08:00
Dayue Gao	da8c9b4429	[Segment V2] refactor SegmentReaderWriterTest and add UT for lazy materialization (#2614 )	2019-12-30 21:07:58 +08:00
kangpinghuang	368bbfd426	Fix linked schema change bug #2610 (#2613 )	2019-12-30 15:48:52 +08:00
kangkaisen	db698978da	Make from_unixtime and date_format function support grayscale upgrade (#2612 )	2019-12-30 13:55:23 +08:00
LingBin	ffea3f8825	[env] Add CREATE_OR_OPEN and rename existing open modes (#2604 ) The upcoming patch will use CREATE_OR_OPEN mode. This patch also moves virtual dtors to the cpp file. * Move the dtors back to env.h Generally, placing the dtor in an `.h` file (inline) or in a `cpp` file depends on the trade-off between code expansion and function call overhead. The code expansion rate is closely related to the number of class members and the inheritance level. The classes here, `Env`, `ReadableFile`, and `WritableFile`, have no members and are the top level of the inheritance hierarchy. For now I have no obvious evidence that making their dtors inline will cause serious code expansion and more instruction cache misses, even if thousands of `ReadableFile` objects keep being created and released at runtime.	2019-12-30 13:51:38 +08:00
LingBin	7afbda803a	Fix memory leak when compression fails in ColumnWriter (#2606 ) Only the Pages in the linked-list can be destructed in the ColumnWriter dtor, but if we meet something wrong, we will return directly, which causes a memory leak	2019-12-27 22:31:02 +08:00
LingBin	379619dfbd	Unify the names of methods in `TabletManager` which do not require locks (#2525 ) * Unify the names of methods in `TabletManager` which do not require locks Currently, there are several naming patterns in the `TabletManager` class for methods (mainly private methods) that need to be executed inside the lock: 1. `xxx_with_no_lock()`: The "with_no_lock" suffix has two meanings: one is that no lock is needed, and the other is that a lock has been taken externally; 2. `xxx_unlock()`: "unlock" is a verb and may be mistaken for the need to unlock a mutex in this method. 3. `xxx_unlocked()`: Note that "unlocked" is an adjective that means the operation in this method is not locked. 4. `xxx_locked()`: "locked" is also an adjective, meaning that the method is locked. This is also more likely to be misunderstood: one reading is already locked externally; the other is locked internally by the method. Actually what we really want is `xxx_already_locked`, but this way the name is a little longer. 5. There is no identification in the method name: the reader cannot intuitively know whether the method needs to be locked. This patch unifies all the above patterns to be `xxx_unlocked()`, and adjusts some indentation in code style. Additionally, this patch also removes an unused `add_tablet()` method, because a new version has already been used. This patch doesn't contain any functional modifications.	2019-12-27 02:34:35 -06:00
Yunfeng,Wu	e41aef54f2	[Doris-On-Elasticsearch] Accelerate first scroll search (#2575 ) Add terminate_after for the first scroll to avoid decompressing all postings lists	2019-12-27 14:05:21 +08:00
kangkaisen	5fd7133e69	Fix bitmap, hll, segment v2 DefaultValue bug (#2570 ) 1. Change the bitmap and HLL default value to empty bitmap and empty bitmap HLL 2. Fix DefaultValueColumnIterator bug 3. Fix uint24.h ostream bug	2019-12-27 14:01:45 +08:00
kangkaisen	50d89f548b	Remove meaningless warn log for NullValueReader (#2586 )	2019-12-27 14:00:01 +08:00
Seaven	4ed87964fe	Add zip util(#2348 ) (#2441 ) Support .zip file extract by minizip	2019-12-27 10:10:21 +08:00
Mingyu Chen	1421a9be41	[Compaction] Support compact only one rowset (#2558 ) Support compaction operation to compact only one rowset. After the modification, the last rowset of the tablet will also be compacted. At the same time, we added a `segments_overlap_pb` field to the rowset meta. Used to describe whether the segment data in the rowset overlaps. This field is set by `rowset_writer`. Initially UNKNOWN for compatibility with existing data. In addition, the version hash of the rowset generated after compaction is directly set to the version hash of last rowset participating in compaction, to ensure that the tablet's version hash remains unchanged after compaction.	2019-12-27 10:08:41 +08:00
caiconghui	043a9528f7	Support decompressing csv file with deflate format in hdfs broker load (#2583 )	2019-12-27 08:06:22 +08:00
WingC	f7032b07f3	Support more schema change from VARCHAR type (#2501 )	2019-12-26 22:38:53 +08:00
Dayue Gao	11f8d542db	[Segment V2] Support lazy-materialization-read (#2547 ) Current read path of SegmentIterator ---- 1. apply short key index and various column indexes to get the row ranges (ordinals of rows) to scan 2. read all return columns according to the row ranges 3. evaluate column predicates on the RowBlockV2 to further prune rows Problem ---- When the column predicates at step 3 could filter a large proportion of rows in RowBlockV2, most values of non-predicate columns we read at step 2 are thrown away, i.e we did lots of useless work and I/O at step 2. Lazy materialization read ---- With lazy materialization, the read path changes to 1. apply short key index and various column indexes to get the row ranges (ordinals of rows) to scan (unchanged) 2. read only predicate columns according to the row ranges 3. evaluate column predicates on the RowBlockV2 to further prune rows, a selection vector is maintained to indicate the selected rows 4. read the remaining columns based on the selection vector of RowBlockV2 In this way, we could avoid reading values of non-predicate columns of all rows that can't pass the predicates. Example ---- ``` function: seek(ordinal), read(block_offset, count) (step 1) row ranges: [0,2),[4,8),[10,11),[15,20) (step 1) row ordinals: [0 1 4 5 6 7 10 15 16 17 18 19] (step 2) read of predicate columns: seek(0),read(0,2),seek(4),read(2,4),seek(10),read(6,1),seek(15),read(7,5) (step 3) selection vector: [3 4 5 6] (step 3) selected ordinals: [5 6 7 10] (step 4) read of remaining columns: seek(5),read(3,3),seek(10),read(6,1) ``` Performance evaluation ---- Lazy materialization is particularly useful when column predicates could filter many rows and lots of big metrics (e.g., hll and bitmap type columns) are queried. In our internal test cases on bitmap columns, queries run 20%~120% faster when using lazy materialization.	2019-12-26 22:00:16 +08:00
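Step 4 of the lazy materialization read path above can be illustrated with a small planner that reproduces the commit's own example. This is a sketch under assumptions: `ReadOp`, `expand_ranges` and `plan_lazy_reads` are hypothetical names, and the real SegmentIterator may organize this differently.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One seek(ordinal) followed by read(block_offset, count), as in the commit's example.
struct ReadOp {
    uint32_t seek_ordinal;
    uint32_t block_offset;
    uint32_t count;
};

// Step 1: expand half-open row ranges [from, to) into the row ordinals held by the block.
std::vector<uint32_t> expand_ranges(const std::vector<std::pair<uint32_t, uint32_t>>& ranges) {
    std::vector<uint32_t> ordinals;
    for (const auto& r : ranges) {
        assert(r.first <= r.second);
        for (uint32_t o = r.first; o < r.second; ++o) ordinals.push_back(o);
    }
    return ordinals;
}

// Step 4: turn the selection vector (ascending block offsets of rows that passed the
// predicates) into reads for the remaining columns. Adjacent selected rows are merged
// into one read when both their ordinals and their block offsets are consecutive.
std::vector<ReadOp> plan_lazy_reads(const std::vector<uint32_t>& block_ordinals,
                                    const std::vector<uint32_t>& selection) {
    std::vector<ReadOp> ops;
    for (uint32_t idx : selection) {
        assert(idx < block_ordinals.size());
        const uint32_t ordinal = block_ordinals[idx];
        if (!ops.empty()) {
            ReadOp& last = ops.back();
            if (last.seek_ordinal + last.count == ordinal &&
                last.block_offset + last.count == idx) {
                ++last.count;  // extend the current contiguous read
                continue;
            }
        }
        ops.push_back(ReadOp{ordinal, idx, 1});
    }
    return ops;
}
```

With the row ranges `[0,2),[4,8),[10,11),[15,20)` and selection vector `[3 4 5 6]`, the planner yields `seek(5),read(3,3)` and `seek(10),read(6,1)`, matching the commit's example.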
lichaoyong	3e3cdd8f2e	Add log to indicate version upon scan failed (#2582 )	2019-12-26 20:09:14 +08:00
kangpinghuang	ee64ab55db	Fix segment size (#2549 )	2019-12-26 11:51:53 +08:00
HangyuanLiu	6444187908	Fix Bug : Load parquet data during the upgrade may result in data errors (#2556 )	2019-12-24 23:27:33 +08:00
kangpinghuang	7f48bd3c5a	Support bloom filter index for large int type (#2550 )	2019-12-24 19:04:03 +08:00
kangpinghuang	f9685372a1	Fix bloom filter bug #2526 (#2532 )	2019-12-24 07:45:11 +08:00
Mingyu Chen	a511042397	[Export] Forget to set timeout for export job (#2516 )	2019-12-23 18:14:41 +08:00
yangzhg	5ff5bf20c9	Fix core dump when using datetime in window function (#2482 )	2019-12-23 09:38:37 +08:00
kangpinghuang	b4d935ab37	Fix compaction with delete rowset bug (#2523 ) [STORAGE][SEGMENTV2] When base compaction processes rowsets that include a delete rowset with more than two conditions, the rows_del_filtered stat is wrong and compaction fails the line check.	2019-12-21 12:13:46 +08:00
HangyuanLiu	5b9b0a84d5	Add curdate function (#2521 )	2019-12-20 21:23:16 +08:00
kangkaisen	6815979ba5	Fix BE core caused by invalid to_bitmap input (#2510 )	2019-12-19 21:28:00 +08:00
Mingyu Chen	5111f8cfe8	[Export] Fix bug that NPE may be thrown when executing "show export;" (#2509 ) Some export jobs from old versions of Doris may not have the timeout property, which causes an NPE. 2 more changes: 1. Change the default BE config "max_runnings_transactions" to 2000. 2. Add a new metric to FE to show the master ip:port.	2019-12-19 19:09:25 +08:00
EmmyMiao87	49b8097495	Fix the core of get_next in exchange node (#2505 ) The _input_batch is not initialized in the exchange node, so the BE may try to read the capacity of _input_batch before initializing it, which is undefined behavior. The issue is #2504	2019-12-19 16:40:33 +08:00

697 Commits