doris

Author	SHA1	Message	Date
Amos Bird	0b33824eef	[fix][Vectorized] Fix nullptr deref in data sink (#11473 ) brpc cache may return nullptr.	2022-08-22 11:44:55 +08:00
Xinyi Zou	92cef580f3	[enhancement](memory) Reduce virtual memory used by PaddedPODArray (#11816 )	2022-08-22 11:33:07 +08:00
Ashin Gau	6d925054de	[feature-wip](parquet-reader) decode parquet time & datetime & decimal (#11845 ) 1. Spark can set the timestamp precision by the following configuration: spark.sql.parquet.outputTimestampType = INT96(NANOS), TIMESTAMP_MICROS, TIMESTAMP_MILLIS DATETIME V1 only keeps the second precision, DATETIME V2 keeps the microsecond precision. 2. If using DECIMAL V2, the BE saves the value as decimal128, and keeps the precision of decimal as (precision=27, scale=9). DECIMAL V3 can maintain the right precision of decimal	2022-08-22 10:15:35 +08:00
Jerry Hu	dc8f64b3e3	[improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801 )	2022-08-22 10:12:06 +08:00
jiafeng.zhang	915d8989c5	[feature](spark-load)Spark load supports string type data import (#11927 )	2022-08-22 08:56:59 +08:00
Xinyi Zou	b1fd701493	[fix](memtracker) Improve memory tracking accuracy for exec nodes (#11947 )	2022-08-22 08:56:05 +08:00
camby	83ea4ea984	[refactor](bitmap) bitmap serialize and deserialize refactor (#11921 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-22 08:52:20 +08:00
Xinyi Zou	5eb5444476	[fix](memtracker) Remove useless memory exceed check #11939	2022-08-22 08:40:19 +08:00
Adonis Ling	982c5f06b5	[fix](build) Resolve the conflicts when building be with java-udf (#11938 )	2022-08-20 18:24:32 +08:00
Pxl	64dc3b360f	[Bug](function) fix dcheck fail on close vexpr ctx (#11908 )	2022-08-19 19:11:10 +08:00
carlvinhust2012	f66e42f848	[optimization](array-type) support the decimal/datetime as the nest type of array in print_value (#11784 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-19 17:59:09 +08:00
yixiutt	1b0b5b5f09	[Enhancement](load) add hidden_columns in stream load param (#11625 ) Stream load ignores invisible columns if the columns HTTP header is not specified, but in some cases the user cannot list all columns because the columns change frequently. Add a hidden_columns header to support importing hidden columns. The user can set hidden_columns to, for example, __DORIS_DELETE_SIGN__ and add this column to the stream load data so that the row is deleted. For example: curl -u root -v --location-trusted -H "hidden_columns: __DORIS_DELETE_SIGN__" -H "format: json" -H "strip_outer_array: true" -H "jsonpaths: [\"$.id\", \"$.name\",\"$.__DORIS_DELETE_SIGN__\"]" -T 1.json http://{beip}:{be_port}/api/test/test1/_stream_load Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-19 14:57:11 +08:00
yixiutt	01bd7f224b	[bugfix](compaction) fix filter_delete if schema has sequence column (#11909 ) Introduced in #11721, which uses the last column as the delete sign; this is wrong if a sequence column exists. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-19 14:56:06 +08:00
Gabriel	1f9eec5462	[Regression](datev2) Add test cases for datev2/datetimev2 (#11831 )	2022-08-19 10:57:55 +08:00
Pxl	089fe01aea	[Feature](vectorized alter table) set vectorized alter table to default open (#11897 )	2022-08-19 10:57:00 +08:00
chenlinzhong	7a505cf040	[remote-udaf](optimize) Optimize RPC exception handling logic (#11680 )	2022-08-19 10:25:01 +08:00
Xinyi Zou	fcae979798	[fix](memtracker) Fix PartitionedAggregationNode DCHECK when mem exceed limit (#11902 )	2022-08-19 09:56:49 +08:00
Yongqiang YANG	8eb9ac3b04	[improvement](sink) print load_id when sink fails (#11893 )	2022-08-19 08:48:02 +08:00
slothever	124b4f7694	[feature-wip](parquet-reader) row group reader ut finish (#11887 ) Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-08-18 17:18:14 +08:00
Pxl	c0dc51b453	[Bug](Vectorized alter table) modify schema change cast validate (#11864 )	2022-08-18 16:05:48 +08:00
Gabriel	1da39771e3	[Bug](runtime filter) Fix bug for runtime filter in concurrent scanners (#11848 )	2022-08-18 14:47:08 +08:00
Gabriel	b8a33d2629	[Improvement](load) turn `enable_vectorized_load` on by default (#11833 )	2022-08-18 14:43:09 +08:00
Pxl	cac317430f	[Bug](aggregation) fix core dump on 2nd phase aggregate (#11843 )	2022-08-18 14:42:34 +08:00
Xinyi Zou	b300b4faa0	[enhancement](memtracker) Optimize readability of mem exceed limit error message #11877	2022-08-18 14:39:41 +08:00
HappenLee	d505d1a5ae	[Vectorized](compaction) filter delete data in base compaction (#11721 ) * [Vectorized](compaction) filter delete data in base compaction Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-18 14:22:59 +08:00
Stalary	e1a1a04c2f	[Enhancement](Doe) Be query es use fe generate dsl. (#11840 )	2022-08-18 10:31:17 +08:00
lihangyu	cfb90b39c7	(vec-stream-load-json) simdjson throw exception lead to core dump (#11880 ) When config::enable_simdjson_parser=true in vectorized stream load, a core dump may occur when the JSON input is an invalid string such as '{ "a', or when all fields are null, as in '{}'. In these cases the simdjson library may throw an unhandled exception such as `Objects and arrays can only be iterated when they are first encountered`. We should handle these cases. Signed-off-by: eldenmoon <15605149486@163.com>	2022-08-18 10:27:34 +08:00
zxealous	881670566c	[fix]Fix the coredump when an IOError occurs in be (#11857 )	2022-08-18 09:13:41 +08:00
AlexYue	8b10a1a3f7	[enhancement](VSlotRef) enhance column_id check in execute function during runtime (#11862 ) The column id check in VSlotRef::execute function before is too strict for fuzzy test to continuously produce random query. Temporarily loosen the check logic. Moreover, there exists some careless call to VExpr::get_const_col, it might return a nullptr but not every function call checks if it's valid. It's an underlying problem.	2022-08-18 09:12:26 +08:00
HappenLee	582be130dd	[Feature] (ODBC) support read/write emoji of utf16 via odbc table (#11863 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-18 09:09:02 +08:00
yixiutt	11dc5cad83	[feature-wip](unique-key-merge-on-write) add min/max key in segment (#11830 ) some feature: 1. add min max key in segment footer to speed up get_row_ranges_by_keys 2. do not load pk bloom filter in query Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-17 18:11:39 +08:00
AlexYue	50ef6e35be	[enhancement](RowDescriptor) enhance tuple_idx check during runtime (#11835 )	2022-08-17 17:50:48 +08:00
zhangstar333	7df8c6f493	[vectorized](improvement) improve agg function of bitmap_union with f… (#11822 ) * [vectorized](improvement) improve agg function of bitmap_union with fastunion	2022-08-17 14:13:01 +08:00
Gabriel	18b84b2dfe	[Bug](compile) fix compiling problem (#11851 ) fix compiling problem	2022-08-17 13:44:57 +08:00
Gabriel	ba3e0b3f96	[feature](compaction) allow to set disable_auto_compaction for tables (#11743 )	2022-08-17 11:05:47 +08:00
Xin Liao	12c4d1f4dd	[feature-wip](unique-key-merge-on-write) unique key table with MOW supports sequence column (#11808 )	2022-08-17 10:56:14 +08:00
Xin Liao	c3e6a841c1	[feature-wip](unique-key-merge-on-write) fix that sort segments by segment id in descending order (#11811 )	2022-08-17 10:54:30 +08:00
wangbo	3a49156e30	[performance] (vectorization)optimize In Expr (#11826 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-17 10:46:37 +08:00
yiguolei	c715209a7e	[refactor](dpp) remove original dpp writer (#11838 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-17 10:42:29 +08:00
Lightman	3e13b7d2c2	[Bugfix](light-schema-change) fix _finish_clone dead lock (#11823 ) engine_clone_task.cpp uses tablet->tablet_schema() to create the rowset, but that method needs a lock that is already held at engine_clone_task.cpp:514. The code originally used cloned_tablet_meta->tablet_schema(), which was changed in #11131. Revert to using cloned_tablet_meta->tablet_schema().	2022-08-17 09:10:08 +08:00
camby	fadc78c6cf	[fix](str_to_date) str_to_date support format without leading zero (#11817 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-16 18:23:16 +08:00
plat1ko	30175010c7	Fix nullptr in perform_remote_tablet_gc (#11820 )	2022-08-16 16:50:21 +08:00
slothever	f39f57636b	[feature-wip](parquet-reader) update column read model and add page index (#11601 )	2022-08-16 15:04:07 +08:00
lihangyu	01383c3217	[Enhancement](stream-load-json) using simdjson to parse json (#11665 ) Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as simdjson. simdjson has a parsing front end called simdjson::ondemand, which parses JSON when fields are accessed and can strip the field token from the original document. Using this feature we can reduce the cost of string copies (e.g. we convert everything to a string literal in _write_data_to_column with sprintf, and I saw a hotspot in this function in the flame graph; simdjson::to_json_string strips the token (a string piece) as a std::string_view, which is exactly what we need). Second, in _set_column_value we can iterate through the JSON document with for (auto field: object_val) {xxx}, which is much faster than looking up a field by its name, as in objectValue.FindMember("k1"). The third optimization is the at_pointer interface that simdjson provides, which can directly get a JSON field from the original document.	2022-08-16 14:49:50 +08:00
plat1ko	fecfdd78bf	[enhancement](status) Fix Status related macros to enable RVO or move ctor (#11753 )	2022-08-16 14:40:35 +08:00
Xinyi Zou	c124470408	[enhancement](memory) Fix too much cache leads to less memory available for queries (#11751 ) Disable Chunk Allocator in Vectorized Allocator, this will reduce memory cache. For high concurrent queries, using Chunk Allocator with vectorized Allocator can reduce the impact of gperftools tcmalloc central lock. Jemalloc or google tcmalloc have core cache, Chunk Allocator may no longer be needed after replacing gperftools tcmalloc.	2022-08-16 14:35:57 +08:00
Kang	4be6e70f1c	[fix](query) fix orderby keys limit return less or no result (#11757 ) The bug is caused by using _num_rows_read for the limit check. _num_rows_read is the count of rows read from storage, but some of those rows may be removed by filter_block for the WHERE predicate. Add _num_rows_return, the number of rows remaining after filter_block applies the WHERE predicate, to count the rows actually returned.	2022-08-16 14:31:47 +08:00
Xinyi Zou	7d836cf0c7	[fix](memtracker) Fix flush memtable to reduce load channel mem not executed (#11771 ) The memory value automatically tracked by the tcmalloc hook in the DeltaWriter is smaller than the value recorded manually in the memtable, because the first 4096-byte Chunk requested by each MemPool when the memtable is initialized is not tracked to the DeltaWriter by the hook. The values of the two are not equal, causing the mem_consumption() == _mem_table->memory_usage branch judgment to fail.	2022-08-16 14:30:45 +08:00
Xinyi Zou	2a1803c646	[enhancement](memtracker) Optimize query memory accuracy (#11740 ) Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When allocated memory is not fully used, the recorded virtual memory is larger than the physical memory. At present, this is mainly because PODArray does not memset 0 when allocating memory, and blocks allocated through PODArray in places such as VOlapScanNode::_free_blocks are usually kept for memory reuse and are not fully used.	2022-08-16 14:23:28 +08:00
yixiutt	573588693c	[bugfix](load) get max version in read lock (#11806 ) Introduced by #11195. Getting the max version from the tablet meta should be done under the read lock in multi-threaded load.	2022-08-16 12:25:29 +08:00
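
The limit-check fix in commit 4be6e70f1c above can be sketched roughly as follows. This is a minimal Python illustration, not the Doris implementation (which is C++): the function scan_with_limit and its parameter count_before_filter are hypothetical, and only the counter names mirror _num_rows_read and _num_rows_return from the commit message.

```python
def scan_with_limit(blocks, predicate, limit, count_before_filter):
    """Return up to `limit` rows that satisfy `predicate`.

    Hypothetical sketch: count_before_filter=True models the behaviour
    described as buggy (limit checked against rows read from storage),
    False models the fix (limit checked against rows left after the filter).
    """
    result = []
    num_rows_read = 0    # rows read from storage, before the WHERE filter
    num_rows_return = 0  # rows remaining after the WHERE filter
    for block in blocks:
        num_rows_read += len(block)
        kept = [row for row in block if predicate(row)]
        num_rows_return += len(kept)
        result.extend(kept)
        counter = num_rows_read if count_before_filter else num_rows_return
        if counter >= limit:
            break  # stop scanning once the limit is considered reached
    return result[:limit]


if __name__ == "__main__":
    blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

    def is_even(row):
        return row % 2 == 0

    print(scan_with_limit(blocks, is_even, 2, count_before_filter=True))
    print(scan_with_limit(blocks, is_even, 2, count_before_filter=False))
```

With the pre-filter counter the scan stops after the first block even though only one row passed the predicate, so fewer rows than the limit come back; counting post-filter rows keeps scanning until enough rows have actually been collected.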


2632 Commits