doris

Author	SHA1	Message	Date
Mingyu Chen	cd8f0713ea	[refactor](new-scan) remove old vectorized scan node (#14029 )	2022-11-09 08:39:20 +08:00
Kang	151842a1fe	[feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430 ) Introduce a SQL syntax for creating inverted index and related metadata changes. ``` -- create table with INVERTED index CREATE TABLE httplogs ( ts datetime, clientip varchar(20), request string, status smallint, size int, INDEX idx_size (size) USING INVERTED, INDEX idx_status (status) USING INVERTED, INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none") ) DUPLICATE KEY(ts) DISTRIBUTED BY RANDOM BUCKETS 10 -- add an INVERTED index to a table CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english"); ```	2022-11-08 23:46:53 +08:00
Tiewei Fang	826cfdaf93	[feature](information_schema) add `backends` information_schema table (#13086 )	2022-11-08 22:15:10 +08:00
Pxl	ae3c513d74	use extern template to date_time_add (#13970 )	2022-11-08 22:11:41 +08:00
luozenglin	115c6bd411	[fix](keyranges) fix the split error of keyranges (#14049 ) fix the split error of keyranges	2022-11-08 22:09:16 +08:00
Pxl	df89e46761	[fix](build) fix compile fail on Segment::open (#14058 )	2022-11-08 14:38:40 +08:00
zhangstar333	f7ecb6d79f	[Bug](Bitmap) fix sub_bitmap calculate wrong result to return null (#13978 ) fix sub_bitmap calculate wrong result to return null	2022-11-08 14:10:12 +08:00
slothever	c2a01e84b4	[feature-wip](multi-catalog) fix page index filter bug (#14015 ) Fix page index filter not take effect when multiple columns Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-11-08 12:10:12 +08:00
Pxl	9d8b4bc176	[Enhancement](Dictionary-codec) update dict once on same segment (#13936 ) update dict once on same segment	2022-11-08 10:59:35 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.	2022-11-08 09:52:33 +08:00
Zhengguo Yang	e1654bc6ef	[Enhancement](function) add to_bitmap() function with int type (#13973 ) to_bitmap function only support string param only，add to_bitmap() function with int type, this can avoid convert int type to string and then convert string to int	2022-11-08 09:15:26 +08:00
Kang	34f43ac781	[bug](like function)fix like '' (empty string) get wrong result with all rows #14035	2022-11-08 08:51:39 +08:00
luozenglin	6ed443c7e8	[enhancement](profile) add instanceNum, tableIds to profile. (#13985 )	2022-11-08 08:49:16 +08:00
starocean999	95591ce49a	[refactor](cv)wait on condition variable more gently (#12620 )	2022-11-08 08:40:31 +08:00
zhannngchen	d1cbaa1de8	[fix](load) fix a bug that reduce memory work on hard limit might be triggered twice (#13967 ) When the load mem hard limit reached, all load channel should wait on the lock of LoadChannelMgr, util current reduce mem work finished. In current implementation, there's a bug might cause some threads be woke up before reduce mem work finished: thread A found that soft limit reached, picked a load channel and waiting for reduce memory work finish. The memory keep increasing thread B found that hard limit reached (either the load mem hard limit, or process soft limit), it picked a load channel to reduce memory and set the variable _should_wait_flush to true thread C found that _should_wait_flush is true, waiting on _wait_flush_cond thread A finished it's reduce memory work, found that _should_wait_flush is true, set it to false, and notify all threads. thread C is woke up and pick a load channel to do the reduce memory work, and now thread B's work is not finished. We can see 2 threads doing reduce memory work when hard limit reached, it's quite confusing.	2022-11-08 00:07:52 +08:00
yiguolei	32fea672b0	[chore](gutil) remove some gutil macros and solve some macro conflict with brpc (#13954 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-07 13:39:52 +08:00
Yiliang Qiu	e8d2fb6778	[feature](function)add search functions: multi_search_all_positions & multi_match_any (#13763 ) Co-authored-by: yiliang qiu <yiliang.qiu@qq.com>	2022-11-07 11:50:55 +08:00
lihangyu	7ffe88b579	[feature-array](array-type) Add array function array_popback (#13641 ) Remove the last element from array. ``` mysql> select array_popback(['test', NULL, 'value']); +-----------------------------------------------------+ \| array_popback(ARRAY('test', NULL, 'value')) \| +-----------------------------------------------------+ \| [test, NULL] \| +-----------------------------------------------------+ ```	2022-11-07 10:48:16 +08:00
Xinyi Zou	c7b2b90504	[fix](memtracker) Fix DCHECK !std::count(_consumer_tracker_stack.begin(), _consumer_tracker_stack.end(), tracker)	2022-11-06 16:41:03 +08:00
Tiewei Fang	27549564a7	[feature](table-valued-function) Support S3 tvf (#13959 ) This pr does three things： 1. Modified the framework of table-valued-function(tvf). 2. be support `fetch_table_schema` rpc. 3. Implemented `S3(path, AK, SK, format)` table-valued-function.	2022-11-06 11:04:26 +08:00
zhengyu	f29e43fee9	[fix](storage) rm unacessary check (#13986 ) (#13988 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com> Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-05 23:46:30 +08:00
Gabriel	2ee7ba79a8	[Improvement](javaudf) improve java loader usage (#13962 )	2022-11-05 13:20:04 +08:00
TengJianPing	04830af039	[fix](tablet sink) fallback to non-vectorized interface in tablet_sink if is in progress of upgrding from 1.1-lts to 1.2-lts (#13966 )	2022-11-05 10:19:51 +08:00
Xinyi Zou	f87be09d69	[fix](load) Fix load channel mgr lock (#13960 ) hot fix load channel mgr lock	2022-11-05 00:48:30 +08:00
jiafeng.zhang	a19e6881c7	[chore](be web ui)upgrade jquery version to 3.6.0 (#13942 ) * upgrade jquery version to 3.6.0 * update license dist	2022-11-04 16:20:17 +08:00
zhengyu	554f566217	[enhancement](compaction) introduce segment compaction (#12609 ) (#12866 ) ## Design ### Trigger Every time when a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job for a single rowset at a time to ensure no recursing/queuing nightmare. ### Target Selection We collect segments during every trigger. We skip big segments whose row num > M (e.g. 10000) coz we get little benefits from compacting them comparing our effort. Hence, we only pick the 'Longest Consecutive Small" segment group to do actual compaction. ### Compaction Process A new thread pool is introduced to help do the job. We submit the above-mentioned 'Longest Consecutive Small" segment group to the pool. Then the worker thread does the followings: - build a MergeIterator from the target segments - create a new segment writer - for each block readed from MergeIterator, the Writer append it ### SegID handling SegID must remain consecutive after segment compaction. If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4: - we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3 - delete seg_0, seg_1, seg_2 and seg_3 - rename seg_0-3 to seg_0 - rename seg_4 to seg_1 It is worth noticing that we should wait inflight segment compaction tasks to finish before building rowset meta and committing this txn.	2022-11-04 14:12:51 +08:00
Gabriel	948e080b31	[minor](error msg) Fix wrong error message (#13950 )	2022-11-04 13:49:46 +08:00
yinzhijian	e09033276e	[fix](runtime-filter) build thread destruct first may cause probe thread coredump (#13911 )	2022-11-04 09:29:37 +08:00
zhannngchen	698541e58d	[improvement](exec) add more debug info on fragment exec error (#13899 )	2022-11-04 08:55:31 +08:00
Gabriel	9869915279	[refactor](crossjoin) refactor cross join (#13896 )	2022-11-03 22:42:56 +08:00
Gabriel	0a228a68d6	[Improvement](javaudf) support different date argument for date/datetime type (#13920 )	2022-11-03 20:33:20 +08:00
AlexYue	5d7b51dcc2	[BugFix](Concat) output of string concat function exceeds UINT makes crash (#13916 )	2022-11-03 19:44:44 +08:00
camby	ff935ca1a0	[enhancement](chore) remove debug log which is really too frequent #13909 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-03 19:32:44 +08:00
xy720	d183199319	[Bug](array-type) Fix array product calculate decimal type return wrong result (#13794 )	2022-11-03 17:26:34 +08:00
luozenglin	ee934483eb	[Enhancement](function) optimize the `upper` and `lower` functions using the simd instruction. (#13326 ) optimize the `upper` and `lower` functions using the simd instruction.	2022-11-03 15:12:25 +08:00
zhangstar333	5fe3342aa3	[Vectorized](function) support bitmap_to_array function (#13926 )	2022-11-03 14:29:28 +08:00
Gabriel	bfba058ecf	[Feature](join) Support null aware left anti join (#13871 )	2022-11-03 12:11:25 +08:00
TengJianPing	5a700223fe	[fix](function) fix coredump cause by return type mismatch of vectorized repeat function (#13868 ) Will not support repeat function during upgrade in vectorized engine.	2022-11-03 09:53:02 +08:00
Xinyi Zou	32a029d9dc	[enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795 )	2022-11-03 09:47:12 +08:00
Zhengguo Yang	b3c6af0059	[Bugfix](MV) Fixed load negative values into bitmap type materialized views successfully under non-vectorization (#13719 ) * [Bugfix](MV) Fixed load negative values into bitmap type materialized views successfully under non-vectorization	2022-11-03 09:21:38 +08:00
Xin Liao	37e4a1769d	[fix](sequence) fix that update table core dump with sequence column (#13847 ) * [fix](sequence) fix that update table core dump with sequence column * update	2022-11-03 09:02:21 +08:00
Adonis Ling	1ee6518e00	[fix](unique-key-merge-on-write) Types don't match when calling IndexedColumnIterator::seek_at_or_after (#13885 )	2022-11-03 08:50:29 +08:00
Ashin Gau	28a4a8dc17	[fix](storage) evaluate_and of ComparisonPredicateBase has logical error (#13895 )	2022-11-03 08:48:28 +08:00
Mingyu Chen	7b4c2cabb4	[feature](new-scan) support transactional insert in new scan framework (#13858 ) Support running transactional insert operation with new scan framework. eg: admin set frontend config("enable_new_load_scan_node" = "true"); begin; insert into tbl1 values(1,2); insert into tbl1 values(3,4); insert into tbl1 values(5,6); commit; Add some limitation to transactional insert Do not support non-literal value in insert stmt Fix some issue about array type: Forbid cast other non-array type to NESTED array type, it may cause BE crash. Add getStringValueForArray() method for Expr, to get valid string-formatted array type value. Add useLocalSessionState=true in regression-test jdbc url without this config, the jdbc driver will send some init cmd each time it connect to server, such as select @@session.tx_read_only. But when we use transactional insert, after begin command, Doris do not support any other type of stmt except for insert, commit or rollback. So adding this config to let the jdbc NOT send cmd when connecting.	2022-11-03 08:36:07 +08:00
HappenLee	228e5afad8	[Load](Sink) remove validate the column data when data is NULL (#13919 )	2022-11-03 08:33:45 +08:00
qiye	b83744d2f6	[feature](function)add regexp functions: regexp_replace_one, regexp_extract_all (#13766 )	2022-11-02 23:15:57 +08:00
HappenLee	fbc8b7311f	[Opt](function) opt the function of ndv (#13887 )	2022-11-02 22:21:20 +08:00
Jerry Hu	62f765b7f5	[improvement](scan) speed up inserting strings into ColumnString (#13397 )	2022-11-02 22:19:02 +08:00
zhangstar333	374303186c	[Vectorized](function) support topn_array function (#13869 )	2022-11-02 19:49:23 +08:00
Adonis Ling	ba918b40e2	[chore](macOS) Fix compilation errors caused by the deprecated function (#13890 )	2022-11-02 13:34:51 +08:00

1 2 3 4 5 ...

3133 Commits