doris

Author	SHA1	Message	Date
lichaoyong	da2838e5fe	Set AGG_KEYS upon upgrade from tablet if has_keys_type() is false (#2620 ) Doris supports three storage models: AGG_KEYS, UNIQUE_KEYS, and DUP_KEYS. Of these, UNIQUE_KEYS and DUP_KEYS were added after AGG_KEYS. For historical tablets, the keys_type field that indicates the storage model may be missing for AGG_KEYS. When upgrading from a historical tablet, this case must be taken into account and keys_type set to AGG_KEYS.	2019-12-30 23:17:16 +08:00
Youngwb	feda66f99f	Spark return error to users when spark on doris query failed (#2531 )	2019-12-30 21:58:13 +08:00
Dayue Gao	da8c9b4429	[Segment V2] refactor SegmentReaderWriterTest and add UT for lazy materialization (#2614 )	2019-12-30 21:07:58 +08:00
kangpinghuang	368bbfd426	Fix linked schema change bug #2610 (#2613 )	2019-12-30 15:48:52 +08:00
kangkaisen	db698978da	Make from_unixtime and date_format function support grayscale upgrade (#2612 )	2019-12-30 13:55:23 +08:00
LingBin	ffea3f8825	[env] Add CREATE_OR_OPEN and rename existing open modes (#2604 ) The upcoming patch will use CREATE_OR_OPEN mode. This patch also moves virtual dtors to the cpp file. * Move the dtors back to env.h Generally, placing the dtor in an `.h` file (inline) or in a `cpp` file depends on the trade-off between code expansion and function call overhead. The code expansion rate is closely related to the number of class members and the inheritance level. The classes here (`Env`, `ReadableFile`, and `WritableFile`) have no members and are at the top level of the inheritance hierarchy. For now I have no clear evidence that making their dtors inline will cause serious code expansion and more instruction cache misses, even if thousands of `ReadableFile` objects keep being created and released at runtime.	2019-12-30 13:51:38 +08:00
LingBin	7afbda803a	Fix memory leak when compression fails in ColumnWriter (#2606 ) Only the Pages in the linked-list can be destructed in the ColumnWriter dtor, but if we meet something wrong, we will return directly, which causes a memory leak	2019-12-27 22:31:02 +08:00
LingBin	379619dfbd	Unify the names of methods in `TabletManager` which do not require locks (#2525 ) Currently, there are several naming patterns in the `TabletManager` class for methods (mainly private methods) that need to be executed inside the lock: 1. `xxx_with_no_lock()`: The "with_no_lock" suffix has two possible meanings: one is that no lock is needed, and the other is that a lock has already been taken externally; 2. `xxx_unlock()`: "unlock" is a verb and may be mistaken to mean that the method unlocks a mutex; 3. `xxx_unlocked()`: "unlocked" is an adjective, meaning that the operation in this method is not locked; 4. `xxx_locked()`: "locked" is also an adjective, meaning that the method is locked. This is even more likely to be misunderstood: either the lock is already held externally, or the method takes the lock internally. What we really want is `xxx_already_locked`, but that name is a little long; 5. No marker in the method name at all: the reader cannot tell at a glance whether the method needs to be locked. This patch unifies all the above patterns into `xxx_unlocked()` and adjusts some indentation. Additionally, this patch removes an unused `add_tablet()` method, because a new version is already in use. This patch doesn't contain any functional modifications.	2019-12-27 02:34:35 -06:00
Yunfeng,Wu	e41aef54f2	[Doris-On-Elasticsearch] Accelerate first scroll search (#2575 ) Add terminate_after for the first scroll to avoid decompressing all postings lists	2019-12-27 14:05:21 +08:00
kangkaisen	5fd7133e69	Fix bitmap, hll, segment v2 DefaultValue bug (#2570 ) 1. Change the bitmap and HLL default value to empty bitmap and empty bitmap HLL 2. Fix DefaultValueColumnIterator bug 3. Fix uint24.h ostream bug	2019-12-27 14:01:45 +08:00
kangkaisen	50d89f548b	Remove meaningless warn log for NullValueReader (#2586 )	2019-12-27 14:00:01 +08:00
Seaven	4ed87964fe	Add zip util (#2348 ) (#2441 ) Support extracting .zip files with minizip	2019-12-27 10:10:21 +08:00
Mingyu Chen	1421a9be41	[Compaction] Support compact only one rowset (#2558 ) Support compaction operation to compact only one rowset. After the modification, the last rowset of the tablet will also be compacted. At the same time, we added a `segments_overlap_pb` field to the rowset meta, which describes whether the segment data in the rowset overlaps. This field is set by `rowset_writer` and is initially UNKNOWN for compatibility with existing data. In addition, the version hash of the rowset generated after compaction is set directly to the version hash of the last rowset participating in compaction, to ensure that the tablet's version hash remains unchanged after compaction.	2019-12-27 10:08:41 +08:00
caiconghui	043a9528f7	Support decompressing csv file with deflate format in hdfs broker load (#2583 )	2019-12-27 08:06:22 +08:00
WingC	f7032b07f3	Support more schema change from VARCHAR type (#2501 )	2019-12-26 22:38:53 +08:00
Dayue Gao	11f8d542db	[Segment V2] Support lazy-materialization-read (#2547 ) Current read path of SegmentIterator ---- 1. apply short key index and various column indexes to get the row ranges (ordinals of rows) to scan 2. read all return columns according to the row ranges 3. evaluate column predicates on the RowBlockV2 to further prune rows Problem ---- When the column predicates at step 3 could filter a large proportion of rows in RowBlockV2, most values of non-predicate columns we read at step 2 are thrown away, i.e we did lots of useless work and I/O at step 2. Lazy materialization read ---- With lazy materialization, the read path changes to 1. apply short key index and various column indexes to get the row ranges (ordinals of rows) to scan (unchanged) 2. read only predicate columns according to the row ranges 3. evaluate column predicates on the RowBlockV2 to further prune rows, a selection vector is maintained to indicate the selected rows 4. read the remaining columns based on the selection vector of RowBlockV2 In this way, we could avoid reading values of non-predicate columns of all rows that can't pass the predicates. Example ---- ``` function: seek(ordinal), read(block_offset, count) (step 1) row ranges: [0,2),[4,8),[10,11),[15,20) (step 1) row ordinals: [0 1 4 5 6 7 10 15 16 17 18 19] (step 2) read of predicate columns: seek(0),read(0,2),seek(4),read(2,4),seek(10),read(6,1),seek(15),read(7,5) (step 3) selection vector: [3 4 5 6] (step 3) selected ordinals: [5 6 7 10] (step 4) read of remaining columns: seek(5),read(3,3),seek(10),read(6,1) ``` Performance evaluation ---- Lazy materialization is particularly useful when column predicates could filter many rows and lots of big metrics (e.g., hll and bitmap type columns) are queried. In our internal test cases on bitmap columns, queries run 20%~120% faster when using lazy materialization.	2019-12-26 22:00:16 +08:00
lichaoyong	3e3cdd8f2e	Add log to indicate version upon scan failed (#2582 )	2019-12-26 20:09:14 +08:00
kangpinghuang	ee64ab55db	Fix segment size (#2549 )	2019-12-26 11:51:53 +08:00
HangyuanLiu	6444187908	Fix bug: loading Parquet data during an upgrade may result in data errors (#2556 )	2019-12-24 23:27:33 +08:00
kangpinghuang	7f48bd3c5a	Support bloom filter index for large int type (#2550 )	2019-12-24 19:04:03 +08:00
kangpinghuang	f9685372a1	Fix bloom filter bug #2526 (#2532 )	2019-12-24 07:45:11 +08:00
Mingyu Chen	a511042397	[Export] Forget to set timeout for export job (#2516 )	2019-12-23 18:14:41 +08:00
yangzhg	5ff5bf20c9	Fix core dump when using datetime in window function (#2482 )	2019-12-23 09:38:37 +08:00
kangpinghuang	b4d935ab37	Fix compaction with delete rowset bug (#2523 ) [STORAGE][SEGMENTV2] When base compaction processes rowsets that include a delete rowset with more than two conditions, the rows_del_filtered stat is wrong and compaction fails the line check.	2019-12-21 12:13:46 +08:00
HangyuanLiu	5b9b0a84d5	Add curdate function (#2521 )	2019-12-20 21:23:16 +08:00
kangkaisen	6815979ba5	Fix BE core dump caused by invalid to_bitmap input (#2510 )	2019-12-19 21:28:00 +08:00
Mingyu Chen	5111f8cfe8	[Export] Fix bug that NPE may be thrown when executing "show export;" (#2509 ) Some export jobs from old versions of Doris may not have a timeout property, which causes an NPE. 2 more changes: 1. Change the default BE config "max_runnings_transactions" to 2000. 2. Add a new metric to FE to show the master ip:port.	2019-12-19 19:09:25 +08:00
EmmyMiao87	49b8097495	Fix the core of get_next in exchange node (#2505 ) The _input_batch isn't initialized in the exchange node, so the BE may try to read the capacity of _input_batch before initializing it, which is undefined behavior. The issue is #2504	2019-12-19 16:40:33 +08:00
kangpinghuang	63ea05f9c7	Add convert tablet rowset type (#2294 ) Solves issue #2246. The scheme is as follows: add an optional preferred_rowset_type in TabletMeta for V2-format rollup index tablets; add a boolean session variable use_v2_rollup which, if set to true, makes the query use the V2 storage format rollup index. Test queries will be sent to the online service to verify the correctness of segment-v2, by sending the same queries to the FE with and without use_v2_rollup set and checking whether the returned results are the same.	2019-12-18 18:49:47 +08:00
Youngwb	48f559600f	Fix bug when Spark on Doris runs for a long time (#2485 )	2019-12-18 13:08:21 +08:00
Mingyu Chen	222f8390c7	[Compaction] Fix the bug that cumulative point grows unreasonably (#2490 ) When there are too many segments in one rowset, more than the BE config 'max_cumulative_compaction_num_singleton_deltas', the cumulative compaction will not work and will just increase the cumulative point, because only one rowset is selected. So when selecting rowsets for cumulative compaction, 2 requirements must be met before the selection logic finishes: 1. the compaction score is larger than 'max_cumulative_compaction_num_singleton_deltas'; 2. at least 2 rowsets are selected.	2019-12-18 12:59:17 +08:00
WingC	c81b1db406	Support convert VARCHAR type to DATE type (#2489 )	2019-12-18 12:58:47 +08:00
kangpinghuang	d31f774852	Add block split bloom filter (#2471 ) [STORAGE][SEGMENTV2] use block split bloom filter; build bloom filter against data page; add distinct value to bloom filter; add ordinal index to bloom filter index	2019-12-18 12:57:44 +08:00
WingC	89003b774b	Support Convert Varchar to INT (#2481 )	2019-12-17 22:02:28 +08:00
Mingyu Chen	e1ba0efbc7	Optimize compaction strategy of tablet on BE (#2473 ) The current compaction selection strategy and cumulative point update logic can cause cumulative compaction to stop working, so that all compaction tasks are completed by base compaction alone. This can cause a large number of data versions to pile up. In the current cumulative point update logic, when a cumulative compaction cannot select enough rowsets, it directly increases the cumulative point. Therefore, when data versions are generated at the same rate as cumulative compaction polls, the cumulative point increases continuously without ever triggering cumulative compaction. The new strategy mainly modifies the update logic of the cumulative point to ensure that the above problems do not occur. At the same time, the new strategy also accounts for the problem that compaction cannot be performed if the cumulative point stagnates for a long time: the cumulative point is forced to increase through threshold settings so that compaction has a chance to execute. Also add a new HTTP API to view the compaction status of a specified tablet. See `compaction-action.md` for details.	2019-12-17 10:30:43 +08:00
kangkaisen	d00c5e3066	Fix base_compaction minor log error (#2461 )	2019-12-16 13:45:19 +08:00
Seaven	e4cc17599f	Add plugin definition (#2351 )	2019-12-13 21:38:17 +08:00
kangkaisen	cf6d705df9	Add intersect_count UDAF (#2418 ) 1. Because we don't currently support the array type, I use variable arguments instead. 2. intersect_count directly returns the final count, not a bitmap like bitmap_union, because returning a bitmap from intersect_count is more complex and needs more serialization. If we really need a bitmap result from intersect_count, we could add that in another PR, which won't cause compatibility problems.	2019-12-13 16:12:05 +08:00
lichaoyong	14293b39f3	Fix RLE encoding/decoding bug upon large negative number. (#2448 ) Doris uses RLE to encode and decode integers. The RLE algorithm comprises four encoding types. Short Repeat: used for short repeating integer sequences. Direct: used for integer sequences whose values have a relatively constant bit width. Patched Base: used for integer sequences whose bit widths vary a lot. Delta: used for monotonically increasing or decreasing sequences. This bug occurs in the Patched Base type for large negative numbers. In Patched Base, the base value is stored in 1 to 8 bytes, and that width is encoded as 0 ~ 7. If the base value takes 8 bytes, the encoded base width should be 7, but it was being encoded as 8. As a result, the data read back is inconsistent with the loaded data. In extreme cases, the BE process core dumps because of an illegal address.	2019-12-13 08:51:05 +08:00
Lijia Liu	4d958ec7a1	Fix BE do_tablet_meta_checkpoint retain _meta_lock for a long time (#2430 ) Add a flag in RowsetMeta to record whether it has been deleted from rowset meta. Before this PR, 37156 rowsets cost 1642 s. With this PR, 37319 rowsets cost just 1 s.	2019-12-12 23:21:43 +08:00
Dayue Gao	94d60122a3	encoding of ColumnMetaPB should not be DEFAULT_ENCODING (#2451 ) [Storage][V2 Format] Currently all columns use DEFAULT_ENCODING as ColumnMetaPB.encoding. However we may change the default encoding type for a data type in the future, therefore concrete encoding type such as PLAIN_ENCODING/BIT_SHUFFLE should be stored in column meta in order to support encoding evolution.	2019-12-12 23:01:41 +08:00
Mingyu Chen	c39d35df4c	Add tablet compaction score metrics (#2427 ) [Metric] Add tablet compaction score metrics Backend: Add metric "tablet_max_compaction_score" to monitor the current max compaction score of tablets on this Backend. This metric is updated each time the compaction thread picks tablets to compact. Frontend: Add metric "tablet_max_compaction_score" for each Backend. These metrics are updated when backends report tablets. Also add a calculated metric "max_tablet_compaction_core" to monitor the max compaction score of tablets on all Backends.	2019-12-12 17:46:59 +08:00
kangkaisen	a5f52f80df	Add bitmap_hash function (#2439 ) Add a bitmap_hash function. Add a murmur_hash3_32 hash function.	2019-12-12 16:55:07 +08:00
kangpinghuang	c07f37d78c	[Segment V2] Add a control framework between FE and BE through heartbeat #2247 (#2364 ) The control framework is implemented through heartbeat message. Use uint64_t as flags to control different functions. Now add a flag to set the default rowset type to beta.	2019-12-12 12:18:32 +08:00
LingBin	913792ce2b	Add copy_object() method for HLL columns when loading (#2422 ) Currently, special treatment is used for HLL types (and OBJECT types). When loading data, because there is no need to serialize HLL content (the upper layer has already done so), we directly save the pointer to the `HyperLogLog` object in `Slice->data` (at the corresponding `Cell` in each `Row`) and set `Slice->size` to 0. This logic differs from reading the HLL column, where we need to deserialize the HLL object from the `Slice` object. As a result, we have different implementations of `copy_row()` for loading and reading. In the optimization (commit: 177fec8917304e399aa7f3facc4cc4804e72ce8b), `copy_row()` was added before a row can be added into the `MemTable`, but the current `copy_row()` treats the HLL column `Cell` as a normal Slice object (i.e. it will memcpy its data according to its size). So this change adds a `copy_object()` method to `TypeInfo`, which is used to copy the HLL column when loading data. Note: The way of copying rows should be unified in the future. At that time, we can delete the `copy_object()` method.	2019-12-11 22:07:51 +08:00
Dayue Gao	5312e840d2	Fix heap-use-after-free in TxnManager::force_rollback_tablet_related_txns (#2435 )	2019-12-11 21:49:26 +08:00
Dayue Gao	c42b6c34cd	Fix alloc-dealloc-mismatch in OrdinalPageIndex (#2437 )	2019-12-11 21:39:48 +08:00
Dayue Gao	83b5455be5	[Load] Fix several races in stream load that could cause BE crash (#2414 ) This CL fixes the following problems: 1. check whether TabletsChannel has been closed/cancelled in `reduce_mem_usage` to avoid using a closed DeltaWriter; 2. make `FlushHandle.wait` wait for all submitted tasks to finish so that the memtable is deallocated before its delta writer; 3. make `~MemTracker()` release its consumption bytes to accommodate situations in aggregate_func.h where bitmap and hll call `MemTracker::consume` without a corresponding `MemTracker::release`, which causes the consumption of the root tracker to never drop to zero	2019-12-10 21:59:05 +08:00
WingC	af3d901a06	Convert INT type to DATE type (#2393 )	2019-12-07 21:56:52 +08:00
令狐少侠	afd6784dbb	Fix bug of sleep (#2409 )	2019-12-07 21:49:16 +08:00
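
The lazy-materialization read path described in commit 11f8d542db can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Doris implementation: the function names `expand_ranges`, `plan_predicate_reads`, and `plan_remaining_reads` are hypothetical, and the reads are modeled as plain tuples following the `seek(ordinal)`, `read(block_offset, count)` notation used in that commit's example.

```python
def expand_ranges(row_ranges):
    """Step 1: turn half-open row ranges into the list of row ordinals to scan."""
    return [o for start, end in row_ranges for o in range(start, end)]


def plan_predicate_reads(row_ranges):
    """Step 2: one seek + read per row range, filling the block sequentially."""
    ops = []
    block_offset = 0
    for start, end in row_ranges:
        count = end - start
        ops.append(("seek", start))
        ops.append(("read", block_offset, count))
        block_offset += count
    return ops


def plan_remaining_reads(ordinals, selection):
    """Step 4: read non-predicate columns only for the selected rows.

    `selection` holds block indexes that passed the predicates (step 3).
    Consecutive block indexes whose ordinals are also consecutive are
    merged into a single seek + read.
    """
    ops = []
    i = 0
    while i < len(selection):
        j = i
        while (j + 1 < len(selection)
               and selection[j + 1] == selection[j] + 1
               and ordinals[selection[j + 1]] == ordinals[selection[j]] + 1):
            j += 1
        ops.append(("seek", ordinals[selection[i]]))
        ops.append(("read", selection[i], j - i + 1))
        i = j + 1
    return ops


# Values taken from the example in the commit message.
row_ranges = [(0, 2), (4, 8), (10, 11), (15, 20)]
ordinals = expand_ranges(row_ranges)
predicate_ops = plan_predicate_reads(row_ranges)
remaining_ops = plan_remaining_reads(ordinals, [3, 4, 5, 6])
```

With the commit's inputs, `remaining_ops` touches only four rows of the non-predicate columns instead of all twelve scanned rows, which is where the saving comes from.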

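The base-width rule behind the RLE fix in commit 14293b39f3 (a base value stored in 1 to 8 bytes, with the width encoded as 0 to 7) can be illustrated as follows. This is a hedged sketch, not Doris code: the function names are hypothetical, and the sign-magnitude layout used in `base_bytes_needed` (one sign bit plus the magnitude bits) is an assumption that the commit message does not state.

```python
def base_bytes_needed(base):
    """Bytes needed for `base`, ASSUMING a sign-magnitude layout
    (one sign bit plus the bits of the absolute value)."""
    bits = abs(base).bit_length() + 1
    return max(1, (bits + 7) // 8)


def encode_base_width(num_bytes):
    """Map a byte width of 1..8 to the code 0..7 (width minus one)."""
    if not 1 <= num_bytes <= 8:
        raise ValueError("base width must be 1..8 bytes")
    return num_bytes - 1


def decode_base_width(code):
    """Inverse mapping: code 0..7 back to a byte width of 1..8."""
    if not 0 <= code <= 7:
        raise ValueError("base width code must be 0..7")
    return code + 1


# A large negative base that needs the full 8 bytes under the assumption above.
large_negative = -(2 ** 62)
code = encode_base_width(base_bytes_needed(large_negative))
```

The point of the sketch is the off-by-one: an 8-byte base must map to code 7, so that every valid code stays within the range 0 to 7 and decodes back to the original width.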

649 Commits