doris

Author	SHA1	Message	Date
zzzzzzzs	6fe060b79e	[fix](streamload) fix http_stream retry mechanism (#24978 ) If a failure occurs, Doris may retry. Because ctx->is_read_schema is a global variable that was not reset in time, the retry could cause exceptions. --------- Co-authored-by: yiguolei <676222867@qq.com>	2023-10-08 11:16:21 +08:00
bobhan1	642e5cdb69	[Fix](Status) Make `Status` `[[nodiscard]]` and handle returned `Status` correctly (#23395 )	2023-09-29 22:38:52 +08:00
zhengyu	d23bedf170	[fix](single-replica-load) fix duplicated done run in request_slave_tablet_pull_rowset (#25013 ) BE crashes because done runs twice when try_offer() fails in request_slave_tablet_pull_rowset. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-09-28 21:08:18 +08:00
Lijia Liu	864a0f9bcb	[opt](pipeline) Make pipeline fragment context send_report asynchronized (#23142 )	2023-09-28 17:55:53 +08:00
Xinyi Zou	fc12362a6d	[feature-wip](arrow-flight)(step2) FE support Arrow Flight server (#24314 ) This is a POC, the design documentation will be updated soon	2023-09-20 14:42:54 +08:00
Yongqiang YANG	3cac6806b4	[fix](txn) persist txn record of single replica load and ccr ingestion (#24543 ) Otherwise the txn would be dropped when a BE reboots.	2023-09-19 15:10:38 +08:00
plat1ko	b9ddcbf729	[feature](merge-cloud) Rewrite code related to IOContext (#24269 )	2023-09-15 19:57:58 +08:00
yiguolei	9c681692bd	Revert "[fix] fix http_stream retry mechanism (#23969 )" (#24407 ) This reverts commit 05e365ea137eb8c92b8e7eedc7d1435e83f065ae.	2023-09-15 10:07:53 +08:00
zzzzzzzs	05e365ea13	[fix] fix http_stream retry mechanism (#23969 ) Co-authored-by: yiguolei <676222867@qq.com>	2023-09-14 21:41:11 +08:00
meiyi	82dc970916	[feature](insert) Support group commit insert (#22829 )	2023-09-08 15:51:03 +08:00
yiguolei	f2ebe65ea4	[enhancement](exchange) not use thread pool to handle exchange block (#23970 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-09-07 19:40:48 +08:00
plat1ko	25b6e4deb2	[fix](daemon) Fix incorrect initialization order of daemon services (#23578 ) Current initialization dependency: Daemon ───┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo │ │ BackendService ─┘ However, the original code incorrectly initializes Daemon before StorageEngine. This PR also stops and joins the threads of daemon services in their destructors, to ensure that daemon services release resources in reverse order of initialization via RAII.	2023-08-31 19:46:38 +08:00
hzq	c083336bbe	[Improvement](pipeline) Cancel outdated query if original fe restarts (#23582 ) If any FE restarts, queries that were emitted from this FE will be cancelled. Implementation of #23704	2023-08-31 17:58:52 +08:00
abmdocrt	da9eb79ac4	[Enhancement](Schema hash) Remove schema hash in tablet info (#23516 )	2023-08-29 10:05:12 +08:00
Siyang Tang	650cc25ea4	[fix](light-schema-change) fix schema consistency check failed (#23283 )	2023-08-28 16:40:30 +08:00
Mingyu Chen	40be6a0b05	[fix](hive) do not split compress data file and support lz4/snappy block codec (#23245 ) 1. Do not split compressed data files. Some data files in hive are compressed with gzip, deflate, etc. These kinds of files cannot be split. 2. Support lz4 block codec for hive scan node, using the lz4 block codec instead of the lz4 frame codec. 4. Support snappy block codec for hadoop snappy. 5. Optimize the `count()` query of csv files. For a query like `select count() from tbl`, only the lines need to be split, not the columns. Need to pick to branch-2.0 after this PR: #22304	2023-08-26 12:59:05 +08:00
Kaijie Chen	98d0a2f6c1	[feature](move-memtable)[3/7] add load stream manager and rpc service (#23415 ) Co-authored-by: zhengyu <freeman.zhang1992@gmail.com> Co-authored-by: Yongqiang YANG <dataroaring@gmail.com> Co-authored-by: laihui <1353307710@qq.com>	2023-08-25 00:08:04 +08:00
zhengyu	d4642b47b4	[fix](InternalService) add short-cut return when offer failed (#23239 ) During offer_failed(), rpc done is executed, so the response is sent and released. Any further access to that object will cause an NPE. So just return after offer_failed().	2023-08-21 21:00:49 +08:00
Chuanle Chen	71807ceb5f	[Enhancement](tvf) Table value function support reading local file (#17404 ) I tested the local tvf with tpch queries. First, generate the `lineitem` dataset with 6001215 rows and load it into the `lineitem` table by: ``` insert into lineitem select c11, c1, c4, c2, c3, c5, c6, c7, c8, c9, c10, c12, c13, c14, c15, c16 from local( "file_path" = "tools/tpch-tools/bin/tpch-data/lineitem.tbl.1", "backend_id" = "10003", "format" = "csv", "column_separator" = "\|" ); ``` Then run the `q1` and `q16` tpch queries; the query results are correct. The local tvf can also be used to analyze the BE's log directly, like: ``` mysql> select * from local( "file_path" = "log/be.out", "backend_id" = "10006", "format" = "csv") where c1 like "%start_time%" limit 10; +--------------------------------------------------------+ \| c1 \| +--------------------------------------------------------+ \| start time: 2023年 08月 07日星期一 23:20:32 CST \| \| start time: 2023年 08月 07日星期一 23:32:10 CST \| \| start time: 2023年 08月 08日星期二 00:20:50 CST \| \| start time: 2023年 08月 08日星期二 00:29:15 CST \| +--------------------------------------------------------+ ```	2023-08-10 20:07:42 +08:00
zzzzzzzs	66784cef71	[Enhancement](Load) Stream Load using SQL (#22509 ) This PR was originally #16940 , but that PR has not been updated by its original author @Cai-Yao for a long time. For now, we merge some of the code into master first. Thanks @Cai-Yao @yiguolei	2023-08-08 13:49:04 +08:00
Pxl	7839a0e708	[Bug](brpc) fix brpc failed on big query came concurrently (#22600 ) Fix PriorityThreadPool get_info returning a wrong number; change the brpc pool from priority to FIFO; do not use the brpc pool when sending eos.	2023-08-05 21:24:32 +08:00
Mingyu Chen	1ed1b69485	[refactor](reader) move reader from vec/exec/scan to vec/exec/format (#22371 ) These readers should be in vec/exec/format	2023-08-04 09:47:20 +08:00
Pxl	c4cee5122b	[Chore](brpc) make error messages more verbose when brpc pool offer failed (#22558 )	2023-08-03 22:02:37 +08:00
Pxl	3d0d7a427b	[Chore](brpc) display pool name when try offer failed (#22514 )	2023-08-02 22:31:33 +08:00
Xinyi Zou	c25b9071ad	[opt](conf) Modify brpc work pool conf default value #22406 By default, with 32 cores or fewer, the following configs are 128, 128, 10240, 10240 in turn; with more than 32 cores, they are core num * 4, core num * 4, core num * 320, core num * 320 in turn: brpc_heavy_work_pool_threads, brpc_light_work_pool_threads, brpc_heavy_work_pool_max_queue_size, brpc_light_work_pool_max_queue_size.	2023-07-31 20:38:34 +08:00
Xinyi Zou	3b1be39033	[fix](load) load core dump print load id (#22388 ) Save the load id to the thread context. All task ids (compaction, schema change, etc.) are expected to be saved in the thread context.	2023-07-31 18:29:38 +08:00
lihangyu	0cc3232d6f	[Improve](topn opt) modify fetch rpc timeout from 20s to 30s, since fetch is quite heavy sometimes (#22163 )	2023-07-28 17:56:18 +08:00
Pxl	19ba6bec38	[Improvement](pipeline) support send eos on local exchange and remove some unused code (#22086 ) support send eos on local exchange and remove some unused code	2023-07-24 09:25:32 +08:00
HHoflittlefish777	c6063ed92f	[Revert](lazy open) revert lazy open and add case (#21821 )	2023-07-18 19:41:33 +08:00
lihangyu	9cad929e96	[Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query. (#21741 ) * [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query. Related PR: #20732. There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, which may cause the rowset metadata to grow too large. * not use unused rowsets	2023-07-13 11:46:12 +08:00
Pxl	ca71048f7f	[Chore](status) avoid empty error msg on status (#21454 ) avoid empty error msg on status	2023-07-11 13:48:16 +08:00
Mingyu Chen	4ad3a7a8de	[fix](exec) run exec_plan_fragment in pthread to avoid BE crash (#21343 ) If a query plan has only one fragment, FE calls the `exec_plan_fragment` rpc on BE. On the BE side, `exec_plan_fragment()` is executed directly in a bthread, but it may call JNI methods such as `AttachCurrentThread()`, which return an error in a bthread. So this change makes sure `exec_plan_fragment` is executed in a pthread pool.	2023-07-01 12:29:22 +08:00
DongLiang-0	a6b51ec19a	[Feature](avro) Support Apache Avro file format (#19990 ) Support reading avro files via hdfs() or s3(). ```sql select * from s3( "uri" = "http://127.0.0.1:9312/test2/person.avro", "ACCESS_KEY" = "ak", "SECRET_KEY" = "sk", "FORMAT" = "avro"); +--------+--------------+-------------+-----------------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------------+ \| Alyssa \| 1 \| 10.0012 \| 100000000221133 \| \| Ben \| 0 \| 5555.999 \| 4009990000 \| \| lisi \| 0 \| 5992225.999 \| 9099933330 \| +--------+--------------+-------------+-----------------+ select * from hdfs( "uri" = "hdfs://127.0.0.1:9000/input/person2.avro", "fs.defaultFS" = "hdfs://127.0.0.1:9000", "hadoop.username" = "doris", "format" = "avro"); +--------+--------------+-------------+-----------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------+ \| Alyssa \| 1 \| 8888.99999 \| 89898989 \| +--------+--------------+-------------+-----------+ ``` The current avro reader only supports common data types; complex data types will be supported later.	2023-06-28 21:15:35 +08:00
Xinyi Zou	622ef63c69	[fix](memory) fix `bthread_setspecific` error in rpc done.run() (#20999 )	2023-06-20 21:00:45 +08:00
Pxl	01e53f4e67	[Bug](materialized-view) fix problems about create mv on ssb_flat q4.1 failed (#20658 ) fix problems about create mv on ssb_flat q4.1 failed	2023-06-15 14:38:21 +08:00
Pxl	a0d4f11667	[Bug](function) catch error state in function cast to avoid core dump (#20751 ) catch error state in function cast to avoid core dump	2023-06-14 17:34:34 +08:00
Qi Chen	73ad885e19	[Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679 ) After support for insert-only transactional hive tables in #19518 and #19419, this PR supports transactional hive full acid tables. Hive3 transactional hive full acid tables are supported. Hive2 transactional hive full acid tables need to run major compactions.	2023-06-13 08:55:16 +08:00
Pxl	ab7ac31d89	[Chore](case) fix failed on test_big_pad when enable pipeline engine #20644	2023-06-12 09:15:55 +08:00
Xinyi Zou	e801e3b737	[fix](memory) Fix crash at `bthread_setspecific` in `brpc::Socket::CheckHealth()` (#20450 ) Only switch to bthread local when modifying the mem tracker in the thread context; no longer switch to bthread local by default when a bthread starts. Mem tracker increases brpc IOBufBlockMemory memory. Remove thread mem tracker metrics.	2023-06-08 19:48:19 +08:00
Pxl	fbbf4c420e	[Bug](Agg-State) fix agg state function get wrong input argument list (#20546 ) fix agg state function get wrong input argument list	2023-06-07 17:32:48 +08:00
wangbo	65100d8083	[improvement](profile)add max/min rpc time (#20339 )	2023-06-06 12:03:01 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema 1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse. 2. Wrap some variables with std::unique_ptr Performance: \| Status \| QPS \| Average response time (avg) \| P99 response time \| \|------------------\|-----\|------------------\|-------------\| \| SchemaCache enabled \| 501 \| 20ms \| 34ms \| \| SchemaCache disabled \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
airborne12	ac8599fedb	[Fix](single replica load) fix indices_size key not found core (#20047 )	2023-05-27 13:28:07 +08:00
lihangyu	317338913c	[Bug](topn) Fix topn fetch set real default value (#20074 ) 1. Before this PR, if a rowset does not contain a column that should be read for the related SlotDescriptor, `insert_default` is called on the column, but that is not the real default value. Information about the real default value should be provided by the frontend. 2. Support fetch when light schema change is not enabled, but disable it for the AGG or UNIQUE MOR model	2023-05-26 16:06:55 +08:00
yiguolei	0ed817ed1a	[improvement](status) should send query timeout status to be, instead of internal error (#20016 ) If a query is cancelled, the reason is very unclear and we do not know the call stack. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-05-26 15:11:17 +08:00
airborne12	303bee6fa3	[Fix](single replica load) add inverted index copy for single replica load (#19663 ) * [Fix](single replica load) add inverted index copy for single replica load	2023-05-18 14:13:41 +08:00
lihangyu	e22f5891d2	[WIP](row store) two phase opt read row store (#18654 )	2023-05-16 13:21:58 +08:00
HHoflittlefish777	f8ef25bb10	[enhancement](load) lazy-open necessary partitions when load (#18874 )	2023-05-14 16:09:55 +08:00
yiguolei	69ebb90225	[bugfix](core) be will core when coordinator callback (#19497 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-05-10 21:46:43 +08:00
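
The default sizing rule described in commit c25b9071ad (#22406) above can be sketched as follows. This is a minimal illustration in Python, not the BE implementation: the function name `brpc_work_pool_defaults` and the returned dictionary are assumptions made for the example; only the four config names and the arithmetic come from the commit message.

```python
def brpc_work_pool_defaults(num_cores: int) -> dict:
    """Hypothetical helper: default values as stated in the commit message."""
    if num_cores <= 32:
        # 32 cores or fewer: fixed defaults.
        threads, queue_size = 128, 10240
    else:
        # More than 32 cores: defaults scale with the core count.
        threads, queue_size = num_cores * 4, num_cores * 320
    return {
        "brpc_heavy_work_pool_threads": threads,
        "brpc_light_work_pool_threads": threads,
        "brpc_heavy_work_pool_max_queue_size": queue_size,
        "brpc_light_work_pool_max_queue_size": queue_size,
    }
```

Under this rule a 64-core machine would get 256 threads per pool and a maximum queue size of 20480 per pool.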


155 Commits
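
The replica selection described in commit accaff1026 (#19237) above, where the replica with the smallest hash(replica_id) executes compaction and the others fetch its results, can be sketched as follows. This is a minimal sketch under stated assumptions: the commit message does not say which hash function is used, so SHA-256 is a stand-in, and the names `replica_hash`, `pick_compaction_replica`, and `should_fetch_from_peer` are hypothetical.

```python
import hashlib


def replica_hash(replica_id: int) -> int:
    # Stand-in hash (assumption): the actual hash function is not given in the commit.
    digest = hashlib.sha256(str(replica_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_compaction_replica(replica_ids):
    # The replica with the smallest hash value is responsible for compaction.
    return min(replica_ids, key=replica_hash)


def should_fetch_from_peer(local_replica_id, replica_ids):
    # Every other replica fetches the compaction results from the chosen one.
    return pick_compaction_replica(replica_ids) != local_replica_id
```

Because every replica computes the same hashes over the same peer list, each one reaches the same choice independently, without extra coordination between backends.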