doris

Author	SHA1	Message	Date
zhengyu	a7180c5ad8	[fix](segcompaction) fix segcompaction failed for newly created segment (#15022 ) (#15023 ) Currently, newly created segment could be chosen to be compaction candidate, which is prone to bugs and segment file open failures. We should skip last (maybe active) segment while doing segcompaction.	2022-12-19 14:17:58 +08:00
Pxl	219489ca0e	[Bug](s2geo) avoid some core dump on s2geo && enable ut of s2geo #15068	2022-12-16 10:56:02 +08:00
AlexYue	f17b138cbd	[BugFix](regression) don't use sf1DataPath when stream load (#15060 ) don't use sf1DataPath when stream load	2022-12-14 12:39:56 +08:00
Gabriel	1200b22fd2	[function](round) compute accurate round value by decimal (#14946 )	2022-12-13 09:53:43 +08:00
liqing-coder	38570312dd	[feature](split_by_string)support split by string function (#13741 )	2022-12-12 15:22:30 +08:00
Yulei-Yang	33349c3419	[feature](function)Support negative index for function split_part (#13914 )	2022-12-12 09:56:09 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable custom err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
yixiutt	6a26435e8d	[bugfix](compaction) fix promotion size bug (#14836 )	2022-12-07 18:54:30 +08:00
camby	e279c90965	[fix](ColumnVector) ColumnVector::insert_date_column crashed #14839 ColumnVector::insert_date_column makes BE crash with large data (>512 rows). Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-12-06 09:06:57 +08:00
HappenLee	12304bc0ee	[Pipeline](exec) Support pipeline exec engine (#14736 ) Co-authored-by: Lijia Liu <liutang123@yeah.net> Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: shee <13843187+qzsee@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> ## Problem Summary: ### 1. Design DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine ### 2. How to use: Set the environment variable `set enable_pipeline_engine = true; `	2022-12-02 17:11:34 +08:00
yixiutt	3dde97bff1	(compaction) opt compaction task producer and quick compaction (#13495 ) (#14535 ) 1.remove quick_compaction's rowset pick policy, call cu compaction when trigger quick compaction 2. skip tablet's compaction task when compaction score is too small Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-12-02 10:07:44 +08:00
yixiutt	94a6ffb906	[feature](compaction) support vertical_compaction & ordered_data_compaction (#14524 )	2022-12-01 22:15:41 +08:00
Kang	fe95b84c34	[fix](jsonb)fix CAST String to JSONB nullable problem (#14626 ) fix CAST String to JSONB nullable problem in DEBUG mode.	2022-11-29 16:22:22 +08:00
Kang	ed92a8f81e	[feature](jsonb function)change jsonb_extract_string behavior and doc (#14619 ) 1. change jsonb_extract_string behavior: convert to string instead of NULL if the type of json path is not string 2. move jsonb tutorial doc to JSONB data type	2022-11-28 11:36:54 +08:00
Kang	52c6ba051e	[feature](jsonb type)refactor JSONB type using column and add testcase (#13778 ) 1. Refactor JSONB type using ColumnString instead making a copy. 2. Add regression testcase for JSONB load and functions.	2022-11-26 10:06:15 +08:00
zy-kkk	7ae7830c50	[improvement](function)add size function alias array_size (#14594 ) * add size function alias * fix	2022-11-25 22:29:48 +08:00
yiguolei	fd3af489a4	[memory](chunkallocator) disable chunkallocator when reserved bytes == 0 (#14494 ) disable chunkallocator when reserved bytes == 0 disable chunkallocator by default	2022-11-23 17:12:53 +08:00
Adonis Ling	249b688663	[chore](github) Add a workflow to check BE UT on macOS (#14506 )	2022-11-23 08:38:28 +08:00
Pxl	bcd641877f	[Enhancement](scan) disable build key range and filters when push down agg work (#14248 ) disable build key range and filters when push down agg work	2022-11-21 12:47:57 +08:00
Gabriel	2c42f0a905	[refactor](decimalv3) Refine code for DecimalV3 (#14394 )	2022-11-19 16:57:17 +08:00
carlvinhust2012	eab0af7afe	[optimization](array-type) optimize the export precision of floating point numbers (#14261 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-11-18 18:24:11 +08:00
Adonis Ling	2b6f85ab96	[chore](macOS) Fix BE UT (#14307 ) #13195 left some unresolved issues. One of them is that some BE unit tests fail. This PR fixes this issue. Now, we can run the command ./run-be-ut.sh --run successfully on macOS.	2022-11-18 10:13:38 +08:00
slothever	6da2948283	[feature-wip](multi-catalog) support iceberg v2(step 1) (#13867 ) Support position delete(part of).	2022-11-17 17:56:48 +08:00
Ashin Gau	20634ab7e3	[feature-wip](multi-catalog) support partition&missing columns in parquet lazy read (#14264 ) PR https://github.com/apache/doris/pull/13917 has supported lazy read for non-predicate columns in ParquetReader, but can't trigger lazy read when predicate columns are partition or missing columns. This PR supports such cases and fills partition and missing columns in `FileReader`.	2022-11-16 08:43:11 +08:00
zhengyu	24b51b9035	[fix](compaction) segcompaction coredump if the rowset starts with a big segment (#14174 ) (#14176 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com> Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-14 09:54:08 +08:00
Xinyi Zou	dd11d5c0a5	[enhancement](memory) Support try catch bad alloc (#14135 )	2022-11-13 11:22:56 +08:00
luozenglin	376b4fda9f	[fix](scankey) fix extended scan key errors. (#14200 ) Issue Number: close #14199	2022-11-12 20:44:09 +08:00
xy720	035657c5a1	[typo](comment) Fix a lot of spell errors in be comments (#14208 ) fix typos.	2022-11-12 16:06:15 +08:00
Yixi Zhang	0ba13af8ff	[feature](running_difference) support running_difference function (#13737 )	2022-11-11 21:22:56 +08:00
HappenLee	74a1e28af3	[Opt](exec) prevent the scan key split whole range (#14088 ) prevent the scan key split whole range	2022-11-11 15:46:00 +08:00
abmdocrt	b6ba654f5b	[Feature](Sequence) Support sequence_match and sequence_count functions (#13785 )	2022-11-11 13:38:45 +08:00
Ashin Gau	6bd5378f66	[feature-wip](multi-catalog) lazy read for ParquetReader (#13917 ) Read predicate columns firstly, and use VExprContext(push-down predicates) to generate the select vector, which is then applied to read the non-predicate columns. The data in non-predicate columns may be skipped by select vector, so the value-decode-time can be reduced. If a whole page can be skipped, the decompress-time can also be reduced.	2022-11-10 16:56:14 +08:00
Pxl	0e26f28bf2	[Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581 ) enlarge runtime filter in predicate threshold	2022-11-10 15:48:46 +08:00
Kang	aec214b4b0	[bug](ColumnDecimal)call set_decimalv2_type when cloning ColumnDecimal (#14061 ) * call set_decimalv2_type when cloning ColumnDecimal * clang format	2022-11-09 11:23:43 +08:00
luozenglin	115c6bd411	[fix](keyranges) fix the split error of keyranges (#14049 ) fix the split error of keyranges	2022-11-08 22:09:16 +08:00
Pxl	9d8b4bc176	[Enhancement](Dictionary-codec) update dict once on same segment (#13936 ) update dict once on same segment	2022-11-08 10:59:35 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compaction task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.	2022-11-08 09:52:33 +08:00
Kang	34f43ac781	[bug](like function)fix like '' (empty string) get wrong result with all rows #14035	2022-11-08 08:51:39 +08:00
yiguolei	32fea672b0	[chore](gutil) remove some gutil macros and solve some macro conflict with brpc (#13954 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-07 13:39:52 +08:00
TengJianPing	04830af039	[fix](tablet sink) fallback to non-vectorized interface in tablet_sink if is in progress of upgrading from 1.1-lts to 1.2-lts (#13966 )	2022-11-05 10:19:51 +08:00
zhengyu	554f566217	[enhancement](compaction) introduce segment compaction (#12609 ) (#12866 ) ## Design ### Trigger Every time a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job runs for a single rowset at a time, to avoid recursion and queuing problems. ### Target Selection We collect segments during every trigger. We skip big segments whose row num > M (e.g. 10000) because compacting them yields little benefit relative to the effort. Hence, we only pick the "Longest Consecutive Small" segment group to do actual compaction. ### Compaction Process A new thread pool is introduced to help do the job. We submit the above-mentioned "Longest Consecutive Small" segment group to the pool. Then the worker thread does the following: - build a MergeIterator from the target segments - create a new segment writer - for each block read from MergeIterator, the Writer appends it ### SegID handling SegID must remain consecutive after segment compaction. If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4: - we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3 - delete seg_0, seg_1, seg_2 and seg_3 - rename seg_0-3 to seg_0 - rename seg_4 to seg_1 It is worth noting that we should wait for in-flight segment compaction tasks to finish before building rowset meta and committing this txn.	2022-11-04 14:12:51 +08:00
Xinyi Zou	32a029d9dc	[enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795 )	2022-11-03 09:47:12 +08:00
qiye	b83744d2f6	[feature](function)add regexp functions: regexp_replace_one, regexp_extract_all (#13766 )	2022-11-02 23:15:57 +08:00
zhangstar333	374303186c	[Vectorized](function) support topn_array function (#13869 )	2022-11-02 19:49:23 +08:00
Mingyu Chen	942611c185	Revert "[enhancement](compaction) opt compaction task producer and quick compaction (#13495 )" (#13833 ) This reverts commit 4f2ea0776ca3fe5315ab5ef7e00eefabfb5771a0.	2022-11-01 14:22:12 +08:00
Kang	7ae60a0ad2	[feature](function)add url functions: domain and protocol (#13662 )	2022-10-31 19:13:08 +08:00
yixiutt	4f2ea0776c	[enhancement](compaction) opt compaction task producer and quick compaction (#13495 ) 1.remove quick_compaction's rowset pick policy, call cu compaction when trigger quick compaction 2. skip tablet's compaction task when compaction score is too small Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-10-31 12:24:05 +08:00
Pxl	711dad28fb	[Chore](unused) remove QSorter #13769	2022-10-31 08:44:39 +08:00
Ashin Gau	e0667b297f	[feature-wip](multi-catalog) reuse hdfsFs and decode parquet values in batch (#13688 ) PR(https://github.com/apache/doris/pull/13404) introduced that ParquetReader will break up batch insertion when encountering null values, which leads to worse performance than OrcReader. This PR pushes the null map into the decode function, reducing the time spent on virtual function calls when encountering null values. Furthermore, it reuses hdfsFS among file readers to reduce the time spent building connections to HDFS.	2022-10-28 15:52:52 +08:00
pengxiangyu	eab8876abc	[Feature](remote) Using heavy schema change if the table is not enable light weight schema change (#13487 )	2022-10-28 15:48:22 +08:00
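
The target selection and SegID handling described in commit 554f566217 (segment compaction) can be illustrated with a minimal sketch. This is not the Doris BE implementation: the function names `longest_consecutive_small` and `plan_segment_ids`, the tie-breaking rule (earliest run wins), and the list-of-row-counts input are all assumptions made for illustration. Only the "skip segments with row num > M" rule and the seg_0..seg_4 renaming example come from the commit message.

```python
from typing import Dict, List, Tuple


def longest_consecutive_small(row_counts: List[int], max_rows: int) -> Tuple[int, int]:
    """Return the half-open range [start, end) of the longest run of
    consecutive segments whose row count is <= max_rows.

    Hypothetical helper: ties go to the earliest run; (0, 0) means no run.
    """
    best = (0, 0)
    run_start = None
    # A sentinel "big" segment at the end closes any run still open.
    for i, rows in enumerate(row_counts + [max_rows + 1]):
        if rows <= max_rows:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best


def plan_segment_ids(num_segments: int, start: int, end: int) -> Dict[int, int]:
    """Map old segment ids to new ids after [start, end) is merged into one
    segment, keeping ids consecutive (hypothetical helper)."""
    if end <= start:
        # Nothing selected: ids stay unchanged.
        return {old: old for old in range(num_segments)}
    merged = end - start
    mapping = {}
    for old in range(num_segments):
        if old < start:
            mapping[old] = old                # before the merged group
        elif old < end:
            mapping[old] = start              # folded into the compacted segment
        else:
            mapping[old] = old - merged + 1   # shifted down to close the gap
    return mapping


# Example from the commit message: seg_0..seg_3 are small, seg_4 is big.
rows = [100, 200, 50, 300, 50000]
selected = longest_consecutive_small(rows, max_rows=10000)
id_plan = plan_segment_ids(len(rows), *selected)
```

With these inputs the selected range covers seg_0 through seg_3, and the plan maps them all to seg_0 while seg_4 becomes seg_1, matching the renaming steps listed in the commit message.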

888 Commits