doris

Author	SHA1	Message	Date
lihangyu	50c1d55769	[Improve](dynamic schema) support filtering invalid data (#21160 ) * [Improve](dynamic schema) support filtering invalid data 1. Support dynamic schema to filter illegal data. 2. Expand the regular expression for ColumnName to support more column names. 3. Be compatible with PropertyAnalyzer and support legacy tables. 4. Disable parsing of multi-dimension arrays by default, since some bugs are unresolved	2023-06-26 19:32:43 +08:00
HappenLee	5fdd9b9254	[Bug](RuntimeFilter) Fix bf error change the murmurhash to crc32 in regression test p2 (#21167 )	2023-06-26 16:39:45 +08:00
YueW	960e04b0ed	[fix](inverted index) fix build inverted index failed but not return immediately (#21165 )	2023-06-26 14:05:12 +08:00
ZhangYu0123	66005570c9	[fix](regression) fix p1 test_backup_restore fail caused by http download 401 invalid token error #21107	2023-06-26 12:56:46 +08:00
Mingyu Chen	1dec592e91	[improvement](fs_bench) optimize the usage of fs benchmark tool for hdfs (#21154 ) Optimize the usage of fs benchmark tool: 1. Remove `Open` benchmark, it is useless. 2. Remove `Delete` benchmark, it is dangerous. 3. Add `SingleRead` benchmark, user can specify an exist file to test read operation: `sh bin/run-fs-benchmark.sh --conf=conf/hdfs_read.conf --fs_type=hdfs --operation=single_read` 4. Modify the `run-fs-benchmark.sh`, remove `OPTS` section, use options in `fs_benchmark_tool` directly 5. Add some custom counters in the benchmark result, eg: ``` -------------------------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------------------------------------------------- HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 6864 ms 2385 ms 1 ReadRate=200.936M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3919 ms 1828 ms 1 ReadRate=351.96M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3839 ms 1819 ms 1 ReadRate=359.265M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_mean 4874 ms 2011 ms 3 ReadRate=304.054M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_median 3919 ms 1828 ms 3 ReadRate=351.96M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_stddev 1724 ms 324 ms 3 ReadRate=89.3768M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_cv 35.37 % 16.11 % 3 ReadRate=29.40% HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_max 6864 ms 2385 ms 3 ReadRate=359.265M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_min 3839 ms 1819 ms 3 ReadRate=200.936M/s ``` - For `open_read` and `single_read`, add `ReadRate` as `bytes per second`. - For `create_write`, add `WriteRate` as `bytes per second`. 
- For `exists` and `rename`, add `ExistsCost` and `RenameCost` as `time cost per one operation`.	2023-06-26 11:37:14 +08:00
Kang	2e6d91aa99	[chore](block) temporarily disable DCHECK for column name equality in MutableBlock (#21116 ) * temporarily disable DCHECK for column name equality in MutableBlock::add_rows * num columns EQ to LE	2023-06-26 10:49:27 +08:00
yiguolei	28abeef72b	[performance](colddata) opt cold data read performance (#21141 ) In #10370, we tried to optimize string evaluation performance by rewriting the predicate using dict values. But that requires checking whether the string column is fully dict-encoded, so we added logic to read the last page of the string column to check it. This performs badly for cold data because it has to load the column's ordinal index and zone map index. In some scenarios, for example select * from table where pk_col=1, where the query condition is on the primary key, the result may be just a few rows but may have 100 columns, and it costs a lot of time to load these indices. We can see a lot of time being spent in block_init_time. In my test, with a table of 50 string columns queried by primary key, the first read time dropped from 220ms to 40ms.	2023-06-26 10:39:20 +08:00
Xinyi Zou	6f7759b08d	[fix](memory) fix mem tracker grace exit (#21136 )	2023-06-26 10:28:24 +08:00
airborne12	1ac8cdec7e	[Fix](inverted index) fix inverted query cache for chinese tokenizer (#21106 ) 1. query cache for chinese tokenizer is confusing when just converting w_char to char. 2. separate query_type from inverted_index_reader to clean code.	2023-06-25 22:04:02 +08:00
Lijia Liu	76bdcf1d26	[improvement](pipeline) task group scan entity (#19924 )	2023-06-25 14:43:35 +08:00
Qi Chen	d49c412c59	[Feature](multi-catalog) Add hdfs benchmark tools. (#21074 )	2023-06-25 09:35:27 +08:00
HappenLee	601120db04	[Bug](pipeline) access map may cause coredump in sink buffer (#21108 )	2023-06-24 23:03:59 +08:00
Xin Liao	691a988c97	[enhancement](merge-on-write) add async publish task when version is discontinuous for merge on write table when clone (#21025 ) Version discontinuity may occur during clone. To deal with this case, add an async publish task when the version is discontinuous.	2023-06-22 21:50:14 +08:00
zhangstar333	a33521b2ce	[enhancement](exchange) add filter for exchange node in BE (#21087 )	2023-06-22 01:04:47 +08:00
zclllyybb	49bbe88327	[fix](log) fix the too large warning log of BE (#21027 )	2023-06-22 00:39:04 +08:00
TsukiokaKogane	3dfeee3946	[fix](typesystem) fix wrong return type argument cause type check fail (#21082 )	2023-06-22 00:04:46 +08:00
Xinyi Zou	2c9bdd64fa	[fix](memory) arena support memory reuse after clear() (#21033 )	2023-06-21 23:27:21 +08:00
Gabriel	2ce8cfbebd	[profile](sort) add some metrics in profile (#21056 )	2023-06-21 22:57:46 +08:00
Xinyi Zou	661e1ae7c5	[fix](memory) no switch bthread context in UBSAN compile (#21064 ) When UBSAN is compiled, all memory will be tracked to the orphan (unknown) mem tracker, and the bthread context and mem tracker will no longer be switched. The supplementary fixes are as follows: #20999	2023-06-21 21:14:07 +08:00
HHoflittlefish777	b2c4e51be1	[fix](load) delete lazy open DCheck when unknown load id (#21083 )	2023-06-21 20:42:31 +08:00
Chenyang Sun	18a0824eb3	[fix](compaction)Modify time series compaction policy default config (#21079 )	2023-06-21 20:29:58 +08:00
DongLiang-0	442a734ef5	[improvement](config) update be config max_runnings_transactions_per_txn_map default value (#21060 )	2023-06-21 20:29:13 +08:00
airborne12	6ac0bfeceb	[Feature](inverted index) add unicode parser for inverted index (#21035 )	2023-06-21 20:14:06 +08:00
Xinyi Zou	84b97860a1	[fix](memory) Fix memory exceed limit and query has been canceled, Allocator will block 100ms (#20959 )	2023-06-21 17:35:19 +08:00
zhannngchen	85ce6a22c0	[enhancement](merge-on-write) some misc optimizations (#21039 )	2023-06-21 16:16:06 +08:00
Yongqiang YANG	b65b821813	[enhancement](pk) add bvar stating cached io (#20977 )	2023-06-21 15:02:10 +08:00
Yongqiang YANG	c5560b8f93	[fix](load) segcompaction does not signal waiters when an error happens (#21043 ) This leads to a deadlock.	2023-06-21 14:56:34 +08:00
Qi Chen	bad22dd4e2	[Fix](orc-reader) Fix orc dict filter null value issue in `_convert_dict_cols_to_string_cols` which caused incorrect result. (#21047 ) Query results should not have empty values. ``` use regresssion.multi_catalog; select commit_id from github_events_orc WHERE (event_type = 'CommitCommentEvent') AND commit_id != "" limit 10; ``` ``` +------------------------------------------+ | commit_id | +------------------------------------------+ | 685c1fd8dbbdc10c042932f9a9f88be00ff96c75 | | 685c1fd8dbbdc10c042932f9a9f88be00ff96c75 | | 4e3ab2ff2d2474f5d51334b9b0fdf17e9845a166 | | | | | | | | | | | | | | 7191c20cb49da07a7fc16aa32dc0de4faff528b2 | +------------------------------------------+ 10 rows in set (0.54 sec) ```	2023-06-21 14:54:01 +08:00
zhannngchen	564b3533cf	[enhancement](merge-on-write) update publish/streamload/compaction co… (#21040 )	2023-06-21 14:49:51 +08:00
Gabriel	81abdeffbc	[Improvement](pipeline) Improve shared scan performance (#20785 )	2023-06-21 14:36:05 +08:00
Pxl	5f0bb49d46	[Feature](materialized-view) support create mv contain aggstate column (#20812 ) support create mv contain aggstate column	2023-06-21 13:06:52 +08:00
Jerry Hu	5f760a8939	[fix](runtime_filter) remove incorrect DCHECK (#21050 )	2023-06-21 11:27:53 +08:00
Ashin Gau	ef17289925	[feature](jni) add jni metrics and attach to BE profile automatically (#21004 ) Add JNI metrics, for example: ``` - HudiJniScanner: 0ns - FillBlockTime: 31.29ms - GetRecordReaderTime: 1m5s - JavaScanTime: 35s991ms - OpenScannerTime: 1m6s ``` Add three common performance metrics for JNI scanner: 1. `OpenScannerTime`: Time to init and open JNI scanner 2. `JavaScanTime`: Time to scan data and insert into vector table in java side 3. `FillBlockTime`: Time to convert java vector table to c++ block And support user defined metrics in java side, for example: `OpenScannerTime` is a long time for the open process, we want to determine which sub-process takes too much time, so we add `GetRecordReaderTime` in java side. The user defined metrics in java side can be attached to BE profile automatically.	2023-06-21 11:19:02 +08:00
dujl	0cf9de8cef	[fix](decimalv3) fix result error when cast a round decimalv3 to double (#20678 )	2023-06-21 00:02:48 +08:00
HappenLee	ca6f51fcd5	[Performance] disable mmap alloc for doris performance (#21034 ) disable mmap alloc for some benchmark	2023-06-20 23:27:49 +08:00
Xinyi Zou	6d579d924d	[fix](profile) delete useless profile add_child #20989	2023-06-20 23:21:52 +08:00
Xin Liao	b70a14d9c9	[fix](merge-on-write) fix that delete bitmap is not calculated correctly when has sequence column (#20955 )	2023-06-20 21:36:47 +08:00
Kang	2c11ce0a02	[bugfix](topn) fix key topn merge block conflict with index predicate result columns (#20820 )	2023-06-20 21:23:00 +08:00
airborne12	7a58a69aa9	[Fix](inverted index) skip index compaction when src rs did not have inverted index (#21010 )	2023-06-20 21:22:25 +08:00
Xinyi Zou	ce1b39e79d	[fix](profile) avoid unnecessary refresh profile of TabletsChannel Previously, the TabletsChannel profile was refreshed in the LoadChannelMgr refresh memory statistics thread. This meant the profile was refreshed even with enable_profile=false, causing performance loss in stress tests.	2023-06-20 21:09:43 +08:00
Xinyi Zou	622ef63c69	[fix](memory) fix `bthread_setspecific` error in rpc done.run() (#20999 )	2023-06-20 21:00:45 +08:00
TengJianPing	55a6649da9	[fix](testcase) fix test case failure of insert null value into not null column (#20963 )	2023-06-20 20:46:07 +08:00
zzzxl	190debaac9	[Improvement](load) single partition load optimize (#20876 ) 1. When creating a single partition, partition and tablet are not looked up for each row of data. 2. Only DISTRIBUTED BY random.	2023-06-20 20:29:39 +08:00
Xin Liao	9eade148dd	[enhancement](merge-on-write) add primary key data page size config (#20961 )	2023-06-20 19:51:02 +08:00
airborne12	ccba11d7ea	[Fix](inverted index) remove IndexReader::indexExists, use fs interface (#20970 )	2023-06-20 15:22:25 +08:00
Kaijie Chen	012813b3f7	[fix](load) add missing flush context for BetaRowsetWriter::_add_block() (#20884 )	2023-06-20 14:27:39 +08:00
Qi Chen	c85271d2ae	[Fix](orc-reader) Fix filter size mismatch in orc reader. (#20998 ) Fix filter size mismatch in orc reader introduced by #20806	2023-06-20 12:27:16 +08:00
zzzxl	d05614ef51	[Fix](invert index)all directories use NoLock (#20962 )	2023-06-20 12:12:16 +08:00
Ashin Gau	923f7edad0	[opt](hudi) using native reader to read the base file with no log file (#20988 ) Two optimizations: 1. Insert string bytes directly to remove decoding&encoding process. 2. Use native reader to read the hudi base file if it has no log file. Use `explain` to show how many splits are read natively.	2023-06-20 11:20:21 +08:00
zzzzzzzs	824bc02603	[Function] Support date function: microsecond() (#20044 )	2023-06-20 10:32:54 +08:00

4837 Commits