doris

Author	SHA1	Message	Date
Pxl	45f1909bc3	[Bug](lateral-view) make lateral view function's nullable mode work (#21242 ) make lateral view function's nullable mode work	2023-06-29 10:50:07 +08:00
Jerry Hu	7f0e37069f	[improvement](olap) filter the whole segment by dictionary (#21239 )	2023-06-29 10:34:29 +08:00
Jack Drogon	3f99b91ddf	[fix](gc_binlog) Fix tablet gc_binlogs nullptr (#21158 )	2023-06-29 10:10:33 +08:00
Pxl	f8cfe5e579	[Bug](pipeline) add DCHECK for _instance_to_sending_by_pipeline = false on _send_rpc (#21169 ) add DCHECK for _instance_to_sending_by_pipeline = false on _send_rpc	2023-06-29 10:03:57 +08:00
Xiangyu Wang	86af533e83	[Enhancement](heartbeat) make heartbeat ok when config repeated host-ip pairs (#21228 )	2023-06-28 23:12:06 +08:00
DongLiang-0	a6b51ec19a	[Feature](avro) Support Apache Avro file format (#19990 ) Support reading Avro files via hdfs() or s3(). ```sql select * from s3( "uri" = "http://127.0.0.1:9312/test2/person.avro", "ACCESS_KEY" = "ak", "SECRET_KEY" = "sk", "FORMAT" = "avro"); +--------+--------------+-------------+-----------------+ | name | boolean_type | double_type | long_type | +--------+--------------+-------------+-----------------+ | Alyssa | 1 | 10.0012 | 100000000221133 | | Ben | 0 | 5555.999 | 4009990000 | | lisi | 0 | 5992225.999 | 9099933330 | +--------+--------------+-------------+-----------------+ select * from hdfs( "uri" = "hdfs://127.0.0.1:9000/input/person2.avro", "fs.defaultFS" = "hdfs://127.0.0.1:9000", "hadoop.username" = "doris", "format" = "avro"); +--------+--------------+-------------+-----------+ | name | boolean_type | double_type | long_type | +--------+--------------+-------------+-----------+ | Alyssa | 1 | 8888.99999 | 89898989 | +--------+--------------+-------------+-----------+ ``` The current Avro reader only supports common data types; complex data types will be supported later.	2023-06-28 21:15:35 +08:00
Xinyi Zou	d2c42ec638	[fix](memory) Purge Jemalloc arena dirty pages when memory insufficient (#21237 ) Jemalloc releases dirty pages only with madvise MADV_FREE, so the memory is not returned to the system and RSS does not drop promptly. Therefore, when process memory exceeds its limit or available system memory is insufficient, manually transfer dirty pages to muzzy pages, which calls MADV_DONTNEED to release the physical memory back to the system. https://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms	2023-06-28 16:49:45 +08:00
Xinyi Zou	0396f78590	[fix](memory) Remove ChunkAllocator & fix Allocator no use mmap (#21259 )	2023-06-28 16:10:24 +08:00
wangbo	3304af848e	[Fix](storage)read page cache when seek #21272 Currently, when a columnIter is used for seek, the page cache is not set; when this columnIter is later used to read data, the page cache cannot be used.	2023-06-28 15:53:40 +08:00
Gabriel	e348b9464e	[scan](freeblocks) use ConcurrentQueue to replace vector for free blocks (#21241 )	2023-06-28 15:10:07 +08:00
Gabriel	a4fdf7324a	[Bug](javaudf) fix BE crash if javaudf is push down (#21139 )	2023-06-28 15:01:24 +08:00
Pxl	1fc1e76fc7	[Bug](alter table) return error status to avoid core dump on schema change meet invalid input (#21273 ) return error status to avoid core dump on schema change meet invalid input	2023-06-28 14:20:16 +08:00
zhannngchen	21b30820fd	[fix](partial-update) fix a coredump in commit_phase_update_delete_bitmap (#21254 )	2023-06-28 11:47:07 +08:00
Xin Liao	de9172e476	[enhancement](merge-on-write) replace map with vector for segment handle caches (#21162 )	2023-06-28 11:33:02 +08:00
Xin Liao	5d1fb33f2d	[enhancement](merge-on-write) increasing the max_write_buffer_number parameter to improve save meta performance (#21243 )	2023-06-28 11:32:11 +08:00
amory	b1e973b721	[Improve](func)support array to window-func first-last-value arg type (#21201 ) * support array to window-func first-last-value arg type * add regress test for first-last-value of array type * update * format be:	2023-06-28 10:02:00 +08:00
caiconghui	db50face41	[fix](time_zone) be compatible with doris old version for CST time_zone when load orc file in broker load (#21263 ) Fix the error "Failed to create orc row reader. reason = Can't open /usr/share/zoneinfo/CST" that occurs in broker load of ORC files when time_zone is CST. Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-06-28 09:44:42 +08:00
YueW	92882ebd91	[fix](inverted index) update output rowset index meta with input rowset when drop inverted index (#21248 )	2023-06-27 23:54:35 +08:00
Kaijie Chen	d545e00bc7	[improve](error) include detailed messages in rowset reader init error (#21229 )	2023-06-27 20:45:14 +08:00
zzzxl	4061783674	[Fix](invert index)fix s3 failed to check the directory (#21232 )	2023-06-27 20:01:46 +08:00
Mingyu Chen	7c569fd9db	[fix](s3_writer) init member's value to avoid undefined behavior (#21233 )	2023-06-27 20:01:20 +08:00
Yongqiang YANG	29b3d39561	[enhancement](memory) print stacktrace for large allocation (#21069 )	2023-06-27 19:39:51 +08:00
HappenLee	609410d82b	[opt](hashmap) memset the hashmap memory to improve performance (#21225 )	2023-06-27 19:30:57 +08:00
Adonis Ling	c470bf56a5	[chore](build) Fix compilation errors reported by GCC-13 (#21215 ) Add missing headers to fix the compilation errors reported by GCC-13.	2023-06-27 17:04:44 +08:00
zhannngchen	ec0e398c50	[enhancement](merge-on-write) record precise primary key index size (#21196 )	2023-06-27 16:50:09 +08:00
Pxl	70ddf64126	[Chore](agg-state) add documentation about agg_state, add group_concat agg_state test case (#21147 ) add documentation about agg_state, add group_concat agg_state test case	2023-06-27 11:28:19 +08:00
Yulei-Yang	e0b20f0437	[feature](function) add ip function ipv4numtostring (alias inet_ntoa) (#20936 )	2023-06-27 10:17:40 +08:00
airborne12	b2dc4a8cb9	[Fix](inverted index) check inverted index file existence before data compaction (#21173 )	2023-06-26 19:55:55 +08:00
lihangyu	50c1d55769	[Improve](dynamic schema) support filtering invalid data (#21160 ) * [Improve](dynamic schema) support filtering invalid data 1. Support dynamic schema to filter illegal data. 2. Expand the regular expression for ColumnName to support more column names. 3. Be compatible with PropertyAnalyzer and support legacy tables. 4. Disable parsing of multi-dimension arrays by default, since some bugs are unresolved.	2023-06-26 19:32:43 +08:00
HappenLee	5fdd9b9254	[Bug](RuntimeFiter) Fix bf error change the murmurhash to crc32 in regression test p2 (#21167 )	2023-06-26 16:39:45 +08:00
YueW	960e04b0ed	[fix](inverted index) fix build inverted index failed but not return immediately (#21165 )	2023-06-26 14:05:12 +08:00
ZhangYu0123	66005570c9	[fix](regression) fix p1 test_backup_restore fail caused by http download 401 invalid token error #21107	2023-06-26 12:56:46 +08:00
Mingyu Chen	1dec592e91	[improvement](fs_bench) optimize the usage of fs benchmark tool for hdfs (#21154 ) Optimize the usage of fs benchmark tool: 1. Remove the `Open` benchmark, as it is useless. 2. Remove the `Delete` benchmark, as it is dangerous. 3. Add a `SingleRead` benchmark; users can specify an existing file to test the read operation: `sh bin/run-fs-benchmark.sh --conf=conf/hdfs_read.conf --fs_type=hdfs --operation=single_read` 4. Modify `run-fs-benchmark.sh`: remove the `OPTS` section and use the options in `fs_benchmark_tool` directly. 5. Add some custom counters to the benchmark result, e.g.: ``` -------------------------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------------------------------------------------- HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 6864 ms 2385 ms 1 ReadRate=200.936M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3919 ms 1828 ms 1 ReadRate=351.96M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3839 ms 1819 ms 1 ReadRate=359.265M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_mean 4874 ms 2011 ms 3 ReadRate=304.054M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_median 3919 ms 1828 ms 3 ReadRate=351.96M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_stddev 1724 ms 324 ms 3 ReadRate=89.3768M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_cv 35.37 % 16.11 % 3 ReadRate=29.40% HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_max 6864 ms 2385 ms 3 ReadRate=359.265M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_min 3839 ms 1819 ms 3 ReadRate=200.936M/s ``` - For `open_read` and `single_read`, add `ReadRate` as `bytes per second`. - For `create_write`, add `WriteRate` as `bytes per second`. - For `exists` and `rename`, add `ExistsCost` and `RenameCost` as `time cost per one operation`.	2023-06-26 11:37:14 +08:00
Kang	2e6d91aa99	[chore](block) temporarily disable DCHECK for column name equality in MutableBlock (#21116 ) * temporarily disable DCHECK for column name equality in MutableBlock::add_rows * num columns EQ to LE	2023-06-26 10:49:27 +08:00
yiguolei	28abeef72b	[performace](colddata) opt cold data read performance (#21141 ) In #10370, we tried to optimize string predicate evaluation by rewriting the predicate using dictionary values. This requires checking whether the string column is fully dictionary-encoded, so we added logic to read the last page of the string column to check it. However, this performs badly for cold data, because it has to load the column's ordinal index and zone map index. For example, in select * from table where pk_col=1, the query condition is on the primary key, so the result may be just a few rows but may have 100 columns, and loading these indices costs a lot of time; much of it shows up in block_init_time. In my test, with a table of 50 string columns queried by primary key, the first read time dropped from 220ms to 40ms.	2023-06-26 10:39:20 +08:00
Xinyi Zou	6f7759b08d	[fix](memory) fix mem tracker grace exit (#21136 )	2023-06-26 10:28:24 +08:00
airborne12	1ac8cdec7e	[Fix](inverted index) fix inverted query cache for chinese tokenizer (#21106 ) 1. The query cache for the Chinese tokenizer gets confused when w_char is simply converted to char. 2. Separate query_type from inverted_index_reader to clean up the code.	2023-06-25 22:04:02 +08:00
Lijia Liu	76bdcf1d26	[improvement](pipeline) task group scan entity (#19924 )	2023-06-25 14:43:35 +08:00
Qi Chen	d49c412c59	[Feature](multi-catalog) Add hdfs benchmark tools. (#21074 )	2023-06-25 09:35:27 +08:00
HappenLee	601120db04	[Bug](pipeline) access map may cause coredump in sink buffer (#21108 )	2023-06-24 23:03:59 +08:00
Xin Liao	691a988c97	[enhancement](merge-on-write) add async publish task when version is discontinuous for merge on write table when clone (#21025 ) Version discontinuity may occur during clone. To handle this case, add an async publish task when the version is discontinuous.	2023-06-22 21:50:14 +08:00
zhangstar333	a33521b2ce	[enhancement](exchange) add filter for exchange node in BE (#21087 )	2023-06-22 01:04:47 +08:00
zclllyybb	49bbe88327	[fix](log) fix the too large warning log of BE (#21027 )	2023-06-22 00:39:04 +08:00
TsukiokaKogane	3dfeee3946	[fix](typesystem) fix wrong return type argument cause type check fail (#21082 )	2023-06-22 00:04:46 +08:00
Xinyi Zou	2c9bdd64fa	[fix](memory) arena support memory reuse after clear() (#21033 )	2023-06-21 23:27:21 +08:00
Gabriel	2ce8cfbebd	[profile](sort) add some metrics in profile (#21056 )	2023-06-21 22:57:46 +08:00
Xinyi Zou	661e1ae7c5	[fix](memory) no switch bthread context in UBSAN compile (#21064 ) When compiled with UBSAN, all memory is tracked to the orphan (unknown) mem tracker, and the bthread context and mem tracker are no longer switched. This supplements the fix in #20999.	2023-06-21 21:14:07 +08:00
HHoflittlefish777	b2c4e51be1	[fix](load) delete lazy open DCHECK when load id is unknown (#21083 )	2023-06-21 20:42:31 +08:00
Chenyang Sun	18a0824eb3	[fix](compaction)Modify time series compaction policy default config (#21079 )	2023-06-21 20:29:58 +08:00
DongLiang-0	442a734ef5	[improvement](config) update be config max_runnings_transactions_per_txn_map default value (#21060 )	2023-06-21 20:29:13 +08:00
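Commit e0b20f0437 adds the SQL function ipv4numtostring (alias inet_ntoa). As a rough illustration of the conversion such a function performs, here is a minimal Python sketch, assuming the common convention that the most significant byte of the 32-bit value becomes the first octet. The helper name `ipv4_num_to_string` and the choice to raise `ValueError` on out-of-range input are assumptions for this sketch; the log does not say how Doris handles such input.

```python
def ipv4_num_to_string(num: int) -> str:
    """Render a 32-bit unsigned integer as a dotted-quad IPv4 string.

    Hypothetical helper: treats the most significant byte as the first
    octet, which is the common convention for numeric IPv4 addresses.
    """
    if not 0 <= num <= 0xFFFFFFFF:
        # Assumption: reject out-of-range input; the commit does not state
        # how Doris itself treats such values.
        raise ValueError(f"{num} is outside the 32-bit IPv4 range")
    # Extract the four octets from the high byte down to the low byte.
    return ".".join(str((num >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(ipv4_num_to_string(3232235521))  # prints 192.168.0.1
```

3232235521 is 0xC0A80001, so its bytes from high to low are 192, 168, 0 and 1.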
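Commit 7f0e37069f filters a whole segment by dictionary, and commit 28abeef72b notes that dictionary-based predicate rewriting first has to check whether a string column is fully dictionary-encoded. The log gives no implementation details, so the following Python sketch only illustrates the general idea under stated assumptions: if a segment's column is fully dictionary-encoded and none of the values in an `=` / `IN (...)` predicate appear in its dictionary, no row of that segment can match and the segment can be skipped. The `Segment` class, its fields, and the helper functions are hypothetical and do not reflect Doris's actual storage structures.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one segment's metadata for a string column."""
    segment_id: int
    dictionary: frozenset          # distinct column values recorded for this segment
    fully_dict_encoded: bool = True


def can_skip_segment(segment: Segment, in_values: set) -> bool:
    """True if an `=` / `IN (...)` predicate cannot match any row of the segment."""
    if not segment.fully_dict_encoded:
        # Some pages are plain-encoded, so the dictionary may not list every
        # value in the segment; skipping would be unsafe.
        return False
    return segment.dictionary.isdisjoint(in_values)


def prune_segments(segments, in_values):
    """Return the ids of the segments that still have to be read."""
    return [s.segment_id for s in segments if not can_skip_segment(s, in_values)]


segments = [
    Segment(1, frozenset({"beijing", "shanghai"})),
    Segment(2, frozenset({"hangzhou"})),
    Segment(3, frozenset({"shenzhen"}), fully_dict_encoded=False),
]
print(prune_segments(segments, {"hangzhou"}))  # prints [2, 3]
```

Segment 3 is kept even though its dictionary lacks the value, because a partially plain-encoded column cannot be ruled out by its dictionary alone.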


4865 Commits