When a schema change and a compaction execute simultaneously, both nullable and non-nullable data can be read for the same column. We need to reset `_nullmap` for each Block when converting Block data, otherwise the converted Column will be wrong.
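A minimal sketch of the idea, assuming a per-block conversion loop and a reusable `_nullmap` buffer (the names and structure here are illustrative, not the actual Doris code):
```
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative only: the same column may arrive nullable in one block and
// non-nullable in the next while compaction runs alongside a schema change,
// so the reusable null map must be reset for every block.
struct ColumnConverterSketch {
    std::vector<uint8_t> _nullmap;  // 1 = null, 0 = not null

    void convert_block(const std::vector<std::optional<int64_t>>& column) {
        // Reset per block; reusing the previous block's null map would carry
        // stale null flags into this block.
        _nullmap.assign(column.size(), 0);
        for (size_t i = 0; i < column.size(); ++i) {
            if (!column[i].has_value()) {
                _nullmap[i] = 1;
            }
            // ... convert column[i] into the destination Column ...
        }
    }
};
```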
This config has never been used online, and enabling it triggers bugs, so the config and its related tests are removed.
Co-authored-by: yiguolei <yiguolei@gmail.com>
For string/varchar/text types, the length field in `ColumnMetaPB` is fixed at 2GB. We don't actually have to allocate 2GB for every string value, because `WrapperField::from_string()` reallocates exactly the amount of memory the string needs:
```
Status from_string(const std::string& value_string, const int precision = 0,
                   const int scale = 0) {
    if (_is_string_type) {
        if (value_string.size() > _var_length) {
            // Grow the backing buffer to exactly the string's size and point
            // the Slice at the newly allocated memory.
            Slice* slice = reinterpret_cast<Slice*>(cell_ptr());
            slice->size = value_string.size();
            _var_length = slice->size;
            _string_content.reset(new char[slice->size]);
            slice->data = _string_content.get();
        }
    }
    return _rep->from_string(_field_buf + 1, value_string, precision, scale);
}
```
Convert Parquet columns into Doris columns via a batch method.
In the previous implementation, only numeric types could be converted in batches; other types could only be inserted one value at a time, which produced repeated virtual function calls and container expansions.
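As a rough illustration of why batching helps (the `ColumnSink`, `insert`, and `insert_many` names are hypothetical, not the actual reader interface):
```
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical destination column interface, for illustration only.
struct ColumnSink {
    virtual ~ColumnSink() = default;
    virtual void insert(int64_t v) = 0;                           // one virtual call per value
    virtual void insert_many(const int64_t* vals, size_t n) = 0;  // one virtual call per batch
};

// Per-value path: one virtual call per row, and the container may grow many times.
void convert_one_by_one(ColumnSink& col, const std::vector<int64_t>& parquet_vals) {
    for (int64_t v : parquet_vals) {
        col.insert(v);
    }
}

// Batch path: a single virtual call, and the sink can reserve space once.
void convert_in_batch(ColumnSink& col, const std::vector<int64_t>& parquet_vals) {
    col.insert_many(parquet_vals.data(), parquet_vals.size());
}
```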
Fix some logic in broker load when using the new file scanner with parquet format:
1. If columns are specified in the load stmt but none of them exist in the parquet file, an error like `err: No columns found in file` is thrown. See `parquet_s3_case4`.
2. If the first column of the table is not in the parquet file, the resulting number of rows is wrong. See `parquet_s3_case8`.
3. If a column specified in `columns` in the load stmt exists in neither the file nor the table, an error like `failed to find default value expr for slot: x1` is thrown. See `parquet_s3_case2`.
Currently, every prepare puts a runtime filter controller into the controller map, so it takes the mutex lock on that map. Initializing a bloom filter takes some time for allocation and memset. If we run p1 tests with -parallel=20 -suiteParallel=20 -actionParallel=20, we get error messages like 'send fragment timeout 5s'.
The patch fixes the problem in the following 2 ways:
1. Replace the single mutex with 128 sharded locks (see the sketch after this list).
2. If a plan fragment does not have a runtime filter, it does not need to take the locks.
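A minimal sketch of lock striping as described in point 1, assuming the controller map is keyed by a query id (the names here are illustrative, not the actual Doris classes):
```
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>

// Illustrative sharded controller map: instead of one global mutex, the key is
// hashed to one of 128 shards, so unrelated prepare calls do not contend.
class ShardedControllerMap {
public:
    static constexpr size_t kNumShards = 128;

    void put(const std::string& query_id, int controller) {
        size_t shard = std::hash<std::string>{}(query_id) % kNumShards;
        std::lock_guard<std::mutex> guard(_locks[shard]);
        _maps[shard][query_id] = controller;
    }

    bool get(const std::string& query_id, int* controller) {
        size_t shard = std::hash<std::string>{}(query_id) % kNumShards;
        std::lock_guard<std::mutex> guard(_locks[shard]);
        auto it = _maps[shard].find(query_id);
        if (it == _maps[shard].end()) return false;
        *controller = it->second;
        return true;
    }

private:
    std::mutex _locks[kNumShards];
    std::map<std::string, int> _maps[kNumShards];  // int stands in for the controller type
};
```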
1. Fix issue #13115.
2. Modify the `get_next_block` method of `GenericReader` to return "read_rows" explicitly (see the sketch after this list). Some columns in the block may not be filled by the reader; if the first column is not filled, `block->rows()` cannot return the real number of rows.
3. Add more checks for broker load test cases.
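A rough sketch of the interface change in point 2, assuming a signature of this shape (the exact parameters in Doris may differ):
```
#include <cstddef>

struct Block;  // placeholder for vectorized::Block
struct Status { static Status OK() { return {}; } };

// Illustrative reader interface: the number of rows actually read is returned
// through an explicit out-parameter instead of being inferred from
// block->rows(), which is wrong when the first column is left unfilled.
class GenericReaderSketch {
public:
    virtual ~GenericReaderSketch() = default;
    virtual Status get_next_block(Block* block, size_t* read_rows, bool* eof) = 0;
};
```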
Add a more detailed profile for ParquetReader:
ParquetColumnReadTime: total time of reading parquet columns
ParquetDecodeDictTime: time to parse dictionary pages
ParquetDecodeHeaderTime: time to parse page headers
ParquetDecodeLevelTime: time to parse pages' definition/repetition levels
ParquetDecodeValueTime: time to decode page data into Doris columns
ParquetDecompressCount: counter of decompressing page data
ParquetDecompressTime: time to decompress page data
ParquetParseMetaTime: time to parse parquet metadata
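As an illustration of how one of these timers could be accumulated, here is a minimal sketch using a scoped std::chrono timer; the real counters in Doris are maintained through its RuntimeProfile machinery, not this helper, and `decompress_page` below is hypothetical:
```
#include <atomic>
#include <chrono>
#include <cstdint>

// Illustrative scoped timer: adds the elapsed nanoseconds of a code region to
// a counter such as "ParquetDecompressTime" when it goes out of scope.
class ScopedNsTimer {
public:
    explicit ScopedNsTimer(std::atomic<int64_t>* counter)
            : _counter(counter), _start(std::chrono::steady_clock::now()) {}

    ~ScopedNsTimer() {
        auto elapsed = std::chrono::steady_clock::now() - _start;
        _counter->fetch_add(
                std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
    }

private:
    std::atomic<int64_t>* _counter;
    std::chrono::steady_clock::time_point _start;
};

// Usage: wrap the decompress call so its cost lands in the decompress timer.
// std::atomic<int64_t> parquet_decompress_time{0};
// {
//     ScopedNsTimer t(&parquet_decompress_time);
//     decompress_page(...);  // hypothetical call
// }
```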
This change serves the following purposes:
1. Use ScanPredicate instead of TCondition for external tables, so the old code branch can be reused.
2. Simplify and delete some useless old code.
3. Use ColumnValueRange to store predicates (a simplified illustration follows this list).
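A simplified illustration of what a per-column value range can look like; this is not Doris's actual ColumnValueRange API, just a sketch of storing a predicate as bounds plus an in-list:
```
#include <optional>
#include <set>

// Illustrative per-column predicate container: a predicate such as
// "k1 > 10 AND k1 <= 100" or "k1 IN (1, 2, 3)" is stored as range bounds or a
// fixed value set instead of a generic TCondition.
template <typename T>
struct SimpleValueRange {
    std::optional<T> low;        // lower bound, if any
    bool low_inclusive = false;
    std::optional<T> high;       // upper bound, if any
    bool high_inclusive = false;
    std::set<T> fixed_values;    // non-empty for IN (...) predicates

    bool contains(const T& v) const {
        if (!fixed_values.empty()) return fixed_values.count(v) > 0;
        if (low && (v < *low || (v == *low && !low_inclusive))) return false;
        if (high && (v > *high || (v == *high && !high_inclusive))) return false;
        return true;
    }
};
```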
If a scan node has a predicate, we cannot limit the concurrency of its scanners, because we don't know how much data needs to be scanned; limiting the concurrency would make the query very slow.
For example:
for `select * from tbl limit 1`, the concurrency will be 1;
for `select * from tbl where k1 = 1 limit 1`, the concurrency will not be limited.
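A minimal sketch of the decision described above (the names and the exact formula are illustrative, not the actual scan node code):
```
#include <algorithm>
#include <cstdint>

// Illustrative scanner-concurrency decision: only a bare "limit N" without any
// predicate lets us cap the number of scanners, because then N output rows
// correspond to roughly N scanned rows.
int decide_scanner_concurrency(bool has_predicate, int64_t limit,
                               int max_concurrency, int64_t rows_per_scanner) {
    if (has_predicate || limit < 0) {
        // Unknown selectivity (or no limit): scan with full concurrency.
        return max_concurrency;
    }
    // Pure limit: a handful of scanners is enough to produce `limit` rows.
    int64_t needed = (limit + rows_per_scanner - 1) / rows_per_scanner;
    return static_cast<int>(std::clamp<int64_t>(needed, 1, max_concurrency));
}
```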
In the policy changed by PR #12716, when the hard limit is reached, multiple threads may pick the same LoadChannel and call reduce_mem_usage on the same TabletsChannel. Although a lock and a condition variable prevent multiple threads from reducing memory usage concurrently, they can still perform the same reduce work on that channel one after another, multiple times, even though it has just been reduced.
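One way to picture the redundant work and a possible guard against it is a reduce generation counter; the sketch below is illustrative and not necessarily the fix applied in the PR:
```
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Illustrative TabletsChannel-like guard: a waiting thread records which
// "reduce generation" it saw before blocking; if another thread finished a
// reduce while it waited, it skips its own redundant reduce_mem_usage pass.
class ReduceGuardSketch {
public:
    void reduce_mem_usage_once() {
        std::unique_lock<std::mutex> lock(_mutex);
        uint64_t seen = _reduce_generation;
        while (_reducing) {
            _cv.wait(lock);
        }
        if (_reduce_generation != seen) {
            // Someone else already reduced memory while we waited; nothing to do.
            return;
        }
        _reducing = true;
        lock.unlock();
        // ... flush memtables / reduce memory here ...
        lock.lock();
        _reducing = false;
        ++_reduce_generation;
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _reducing = false;
    uint64_t _reduce_generation = 0;
};
```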