The mem tracker can be logically divided into 4 layers: 1) process, 2) type, 3) query/load/compaction task etc., 4) exec node etc.
The types include:
```cpp
enum Type {
    GLOBAL = 0,        // Life cycle is the same as the process, e.g. Cache and the default Orphan.
    QUERY = 1,         // Counts the memory consumption of all Query tasks.
    LOAD = 2,          // Counts the memory consumption of all Load tasks.
    COMPACTION = 3,    // Counts the memory consumption of all Base and Cumulative tasks.
    SCHEMA_CHANGE = 4, // Counts the memory consumption of all SchemaChange tasks.
    CLONE = 5,         // Counts the memory consumption of all EngineCloneTask. Note: does not include the memory of make/release snapshots.
    BATCHLOAD = 6,     // Counts the memory consumption of all EngineBatchLoadTask.
    CONSISTENCY = 7    // Counts the memory consumption of all EngineChecksumTask.
};
```
Object pointers are no longer saved between layers; the process-level and per-type values are aggregated periodically.
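A minimal sketch of what this periodic aggregation could look like, assuming hypothetical names (`FlatMemTracker`, `refresh_global_counters`) that only illustrate summing flat per-tracker counters instead of walking pointer links:

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical flat tracker: each tracker only counts its own consumption;
// no parent/child object pointers are kept between the layers.
struct FlatMemTracker {
    int type;                            // one of the Type enum values above
    std::atomic<int64_t> consumption{0}; // bytes tracked by this tracker
};

// Periodically aggregate per-type totals and the process total from the
// flat list of trackers instead of walking a pointer tree.
void refresh_global_counters(const std::vector<FlatMemTracker*>& trackers,
                             std::vector<int64_t>* type_consumption, // indexed by Type
                             int64_t* process_consumption) {
    std::fill(type_consumption->begin(), type_consumption->end(), 0);
    for (const FlatMemTracker* t : trackers) {
        (*type_consumption)[t->type] += t->consumption.load(std::memory_order_relaxed);
    }
    *process_consumption = 0;
    for (int64_t v : *type_consumption) *process_consumption += v;
}
```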
Other fix:
In #13528 ([fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe), I tried to separate the memory that is manually abandoned in the query from the orphan mem tracker. But in actual tests the accuracy of this part of the memory could not be guaranteed, so it is put back into the orphan mem tracker.
## Design
### Trigger
Every time a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job runs for a single rowset at a time, to avoid a recursing/queuing nightmare; a sketch of the trigger check follows.
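As a rough illustration (the names `SegcompactionState` and `should_trigger_segcompaction` are assumptions, not the real rowset writer API), the trigger check boils down to:

```cpp
// Illustrative per-rowset-writer state; the names are assumptions.
struct SegcompactionState {
    int num_uncompacted_segments = 0;   // segments produced since the last compaction
    bool is_doing_segcompaction = false;
};

// Returns true if a segment compaction job should be submitted now.
// Only one job per rowset may run at a time, so an in-flight job blocks a
// new trigger even if more small segments pile up in the meantime.
bool should_trigger_segcompaction(const SegcompactionState& st,
                                  int segment_threshold /* "N", e.g. 10 */) {
    return !st.is_doing_segcompaction &&
           st.num_uncompacted_segments > segment_threshold;
}
```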
### Target Selection
We collect segments on every trigger. We skip big segments whose row count > M (e.g. 10000) because compacting them gains little compared to the effort. Hence, we only pick the "Longest Consecutive Small" segment group for actual compaction; a sketch of the selection follows.
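A simplified sketch of the selection, with hypothetical `SegmentInfo`/`pick_longest_small_run` names standing in for the real code:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// One candidate segment: its id and row count (illustrative names).
struct SegmentInfo {
    int seg_id;
    int64_t num_rows;
};

// Pick the longest run of consecutive "small" segments (row count <= M).
// Returns a [begin, end) index range into `segments`; a run shorter than
// two segments gives no benefit and would be skipped by the caller.
std::pair<size_t, size_t> pick_longest_small_run(const std::vector<SegmentInfo>& segments,
                                                 int64_t small_row_limit /* "M" */) {
    size_t best_begin = 0, best_end = 0, run_begin = 0;
    for (size_t i = 0; i <= segments.size(); ++i) {
        bool small = i < segments.size() && segments[i].num_rows <= small_row_limit;
        if (!small) {
            // Close the current run and keep it if it is the longest so far.
            if (i - run_begin > best_end - best_begin) {
                best_begin = run_begin;
                best_end = i;
            }
            run_begin = i + 1;
        }
    }
    return {best_begin, best_end};
}
```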
### Compaction Process
A new thread pool is introduced to do the job. We submit the above-mentioned "Longest Consecutive Small" segment group to the pool, and the worker thread then does the following (a toy sketch is given after this list):
- build a MergeIterator from the target segments
- create a new segment writer
- for each block read from the MergeIterator, append it with the writer
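A toy sketch of the worker-thread body under these assumptions (the `*Sketch` types are stand-ins, not the real Doris classes):

```cpp
#include <vector>

// Toy stand-ins for the real MergeIterator / SegmentWriter / Block types.
struct Block { std::vector<int> rows; };

struct MergeIteratorSketch {
    // Returns false at end of data; otherwise fills `block` with merged rows.
    virtual bool next_block(Block* block) = 0;
    virtual ~MergeIteratorSketch() = default;
};

struct SegmentWriterSketch {
    virtual void append_block(const Block& block) = 0;
    virtual void finalize() = 0;  // flush footer/index of the new segment
    virtual ~SegmentWriterSketch() = default;
};

// Worker-thread body: merge-read from the target segments and append
// every block to the freshly created segment writer.
void do_segcompaction(MergeIteratorSketch& merge_iter, SegmentWriterSketch& writer) {
    Block block;
    while (merge_iter.next_block(&block)) {
        writer.append_block(block);
        block.rows.clear();
    }
    writer.finalize();
}
```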
### SegID handling
SegID must remain consecutive after segment compaction.
If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4:
- we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3
- delete seg_0, seg_1, seg_2 and seg_3
- rename seg_0-3 to seg_0
- rename seg_4 to seg_1
It is worth noting that we must wait for in-flight segment compaction tasks to finish before building the rowset meta and committing this txn.
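For the example above, a toy rename pass could look like this (purely illustrative, using std::filesystem in place of the real file-system layer):

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Purely illustrative: replace seg_0..seg_3 with the compacted output
// "seg_0-3", then shift the remaining big segment down so that segment
// ids stay consecutive (seg_4 -> seg_1).
void rename_after_segcompaction(const fs::path& rowset_dir) {
    for (int i = 0; i <= 3; ++i) {
        fs::remove(rowset_dir / ("seg_" + std::to_string(i)));
    }
    fs::rename(rowset_dir / "seg_0-3", rowset_dir / "seg_0");
    fs::rename(rowset_dir / "seg_4", rowset_dir / "seg_1");
}
```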
1. Remove quick_compaction's rowset pick policy; call cumulative compaction when quick compaction is triggered.
2. Skip a tablet's compaction task when its compaction score is too small.
Co-authored-by: yixiutt <yixiu@selectdb.com>
PR https://github.com/apache/doris/pull/13404 made ParquetReader break up batch insertion when encountering null values, which leads to bad performance compared to OrcReader.
So this PR pushes the null map down into the decode function, reducing the number of virtual function calls when null values are encountered; a hedged sketch of the idea follows.
Furthermore, the hdfsFS handle is reused among file readers to reduce the time spent building connections to HDFS.
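A hedged sketch of the idea with a hypothetical flat decoder (the real ParquetReader classes differ): instead of re-entering a virtual decode call for every run of non-null values, the whole null map is passed down so a single call decodes the full batch.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical plain-encoded int32 decoder; illustrative only.
struct PlainInt32Decoder {
    const uint8_t* data = nullptr;  // pointer to the plain-encoded values

    // Decode `num_values` slots in one call. null_map[i] != 0 marks a null
    // slot, which is left as a default value; non-null slots consume encoded
    // values in order. Taking the null map here avoids one virtual call per
    // run of non-null values.
    void decode_with_null_map(const std::vector<uint8_t>& null_map,
                              size_t num_values, std::vector<int32_t>* out) {
        out->resize(num_values);
        size_t encoded_idx = 0;
        for (size_t i = 0; i < num_values; ++i) {
            if (null_map[i]) {
                (*out)[i] = 0;  // null slot, filled with a default value
            } else {
                std::memcpy(&(*out)[i], data + encoded_idx * sizeof(int32_t),
                            sizeof(int32_t));
                ++encoded_idx;
            }
        }
    }
};
```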
* [bugfix](VecDateTimeValue) eat the value of microsecond in function from_date_format_str
* add sql based regression test
Co-authored-by: xiaojunjie <xiaojunjie@baidu.com>
# Proposed changes
This PR fixes many issues when building from source on macOS with the Apple M1 chip.
## ATTENTION
The work of supporting macOS with the Apple M1 chip is large, and there are still many unresolved runtime issues:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...
Temporarily, the following changes are made on macOS so that BE can start successfully.
1. Disable the memory tracker.
2. Use tcmalloc instead of jemalloc.
This PR kicks off the job. Anyone interested in this work can continue to fix these runtime issues.
## Use case
```shell
./build.sh -j 8 --be --clean
cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```
## Something else
It takes around _**10+**_ minutes to build BE (with prebuilt third parties) on macOS with the M1 chip. The development experience on macOS will improve greatly once the adaptation work is finished.
Previously, bthread_getspecific was called every time a bthread-local variable was used. In the test for #10823, frequent calls to bthread_getspecific were found to cause performance problems.
A cache was therefore implemented in pthread-local storage based on the btls key, but the btls key cannot correctly detect bthread switches.
So the cache is now keyed by the bthread id obtained from bthread_self; a hedged sketch follows.
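A hedged sketch of the idea, assuming a single fixed btls key; `get_bthread_local_cached` is an illustrative name, not the actual function:

```cpp
#include <bthread/bthread.h>  // bthread_self() / bthread_getspecific() from brpc

// Cache the btls lookup result per pthread, keyed by the bthread id, so
// bthread_getspecific() is only called again after the worker pthread
// switches to a different bthread. Assumes one fixed key is used.
void* get_bthread_local_cached(bthread_key_t key) {
    thread_local bthread_t cached_bthread = 0;
    thread_local void* cached_value = nullptr;

    bthread_t current = bthread_self();
    if (current == 0) {
        // Not running inside a bthread; fall back without caching.
        return bthread_getspecific(key);
    }
    if (current != cached_bthread) {
        // This pthread is now carrying a different bthread: refresh the cache.
        cached_bthread = current;
        cached_value = bthread_getspecific(key);
    }
    return cached_value;
}
```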
This config has never been used online, and there are bugs when it is enabled, so this PR removes the config and its related tests.
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. Fix issue #13115.
2. Modify the `get_next_block` method of `GenericReader` to return `read_rows` explicitly (see the sketch after this list).
Some columns in the block may not be filled by the reader; if the first column is not filled, `block->rows()` cannot return the real row count.
3. Add more checks for broker load test cases.
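A heavily simplified, hypothetical interface illustrating the change in item 2 (the real `GenericReader` likely returns a `Status` and differs in detail):

```cpp
#include <cstddef>

struct Block;  // opaque stand-in for the vectorized block type

// The reader reports how many rows it actually produced via `read_rows`
// instead of the caller inferring it from block->rows(), which is wrong
// when the first column of the block is not filled by this reader.
class GenericReaderSketch {
public:
    virtual ~GenericReaderSketch() = default;
    // Fills (some columns of) `block`, sets *read_rows to the number of rows
    // produced and *eof to true when the source is exhausted; returns 0 on success.
    virtual int get_next_block(Block* block, size_t* read_rows, bool* eof) = 0;
};
```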
Add a more detailed profile for ParquetReader:
- ParquetColumnReadTime: total time of reading parquet columns
- ParquetDecodeDictTime: time to parse dictionary pages
- ParquetDecodeHeaderTime: time to parse page headers
- ParquetDecodeLevelTime: time to parse pages' definition/repetition levels
- ParquetDecodeValueTime: time to decode page data into Doris columns
- ParquetDecompressCount: counter of page data decompressions
- ParquetDecompressTime: time to decompress page data
- ParquetParseMetaTime: time to parse parquet metadata
This change serves the following purposes:
1. Use ScanPredicate instead of TCondition for external tables, so the old code branch can be reused.
2. Simplify and delete some useless old code.
3. Use ColumnValueRange to save predicates (a simplified sketch follows this list).
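A greatly simplified sketch of a value-range container for one column's predicate (not the actual `ColumnValueRange` API, only an illustration of keeping predicates as ranges/value sets instead of raw `TCondition`s):

```cpp
#include <limits>
#include <set>

// A predicate on one column is kept either as a fixed value set (from
// IN/= conditions) or as a [low, high] range (from </>/between) instead
// of being carried around as raw TCondition objects.
template <typename T>
struct SimpleColumnValueRange {
    std::set<T> fixed_values;                  // non-empty => IN-list form
    T low = std::numeric_limits<T>::lowest();  // otherwise: range form
    T high = std::numeric_limits<T>::max();

    // Narrow the range with another [l, h] condition on the same column.
    void intersect_range(T l, T h) {
        if (l > low) low = l;
        if (h < high) high = h;
    }
    bool is_empty_range() const { return fixed_values.empty() && low > high; }
};
```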
Add the `JSON` datatype. The following features are implemented by this PR:
1. `CREATE` tables with `JSON` type columns
2. `INSERT` values containing `JSON` type values, stored in `String` and represented in a binary format (a.k.a. `JSONB`) on the BE
3. `SELECT` JSON columns
The detailed design is described in [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type).
* add JSONB data storage format type
* fix JsonLiteral resolve bug
* add DataTypeJson case in data_type_factory
* add JSON syntax check in FE
* add operators for jsonb_document; comparison between JSON type values is not supported yet
* add ColumnJson and DataTypeJson
* add JsonField to store JsonValue
* add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral
* add push_json for MysqlResultWriter
* JSON column need no zone_map_index
* Revert "JSON column need no zone_map_index"
This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79.
* add JSON writer and reader, ignore zone-map for JSON column
* add json_to_string for DataTypeJson
* add olap_data_convertor for JSON type
* add some enum
* add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions
* fix column_json offsets overflow bug, format code
* remove useless TODOs, add CmpType cases for JSON type
* add license header
* format license
* format be codes
* resolve rebase master conflicts
* fix bugs for CREATE and meta related code
* refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes
* modify BE code according to code review advice
* fix rebase conflicts with master
* add unit test for json_value and column_json
* fix rebase error
* rename json to jsonb
* fix some data convert bugs, set Mysql type to JSON