doris

Author	SHA1	Message	Date
HappenLee	8ba2d79fe1	[Bug] Change DateTimeValue Memory Layout To Old (#7022 ) Change DateTimeValue Memory Layout To Old to fix compatibility problems	2021-11-08 21:56:14 +08:00
Pxl	29ca77622f	[Refactor] Refactor part of RuntimeFilter's code (#6998 ) #6997	2021-11-07 17:40:45 +08:00
ccoffline	ca8268f1c9	[Feature] Extend logger interface, support structured log output (#6600 ) Support structured logging.	2021-11-07 17:39:53 +08:00
Mingyu Chen	4f13f98424	[Bug] Fix bug that memtracker in delta writer will be visited before initialized. (#7013 )	2021-11-06 13:29:49 +08:00
Zhengguo Yang	5ca271299a	[refactor] set `forward_to_master` true by default (#7017 ) * ot set forward_to_master true by default * Update docs/zh-CN/administrator-guide/variables.md	2021-11-06 13:27:26 +08:00
Zhengguo Yang	760fc02bfe	Added brpc stub cache check and reset api, used to test whether the brpc stub cache is available, and reset the brpc stub cache (#6916 ) Added brpc stub cache check and reset api, used to test whether the brpc stub cache is available, and reset the brpc stub cache add a config used for auto check and reset brpc stub	2021-11-05 09:45:37 +08:00
Mingyu Chen	db1c281be5	[Enhance][Load] Reduce the number of segments when loading a large volume of data in one batch (#6947 ) ## Case In the load process, each tablet will have a memtable to save the incoming data, and if the data in a memtable is larger than 100MB, it will be flushed to disk as a `segment` file. And then a new memtable will be created to save the following data. Assume that this is a table with N buckets (tablets). So the max size of all memtables will be `N * 100MB`. If N is large, it will cost too much memory. So for memory limit purposes, when the size of all memtables reaches a threshold (2GB by default), Doris will try to flush all current memtables to disk (even if their sizes have not reached 100MB). So you will see that a memtable is flushed when its size reaches `2GB/N`, which may be much smaller than 100MB, resulting in too many small segment files. ## Solution When deciding to flush memtables to reduce memory consumption, do NOT flush all memtables, but flush only part of them. For example, there are 50 tablets (with 50 memtables). The memory limit is 1GB, so when each memtable reaches 20MB, the total size reaches 1GB, and a flush will occur. If I only flush 25 of the 50 memtables, then next time the total size reaches 1GB, there will be 25 memtables with size 10MB, and the other 25 memtables with size 30MB. So I can flush those memtables with size 30MB, which is larger than 20MB. The main idea is to introduce some jitter during flush to ensure a small unevenness among the memtables, so that flush will only be triggered when a memtable is large enough. In my test, loading a table with 48 buckets with a mem limit of 2G, the average memtable size was 44MB in the previous version and 82MB after the modification.	2021-11-01 10:51:50 +08:00
Mingyu Chen	e8cabfff27	[S3] Support path style endpoint (#6962 ) Add a use_path_style property for S3 Upgrade hadoop-common and hadoop-aws to 2.8.0 to support path style property Fix some S3 URI bugs Add some logs for tracing load process.	2021-11-01 10:48:10 +08:00
Pxl	d4249e4f2d	[Bug] fix Runtime filter can't find fragment-id when apply_filter called early (#6923 ) #6921	2021-10-27 09:54:52 +08:00
Mingyu Chen	adb6bfdf74	[Bug] Fix bug that truncate table may change the storage medium property (#6905 )	2021-10-25 10:07:27 +08:00
Mingyu Chen	ed7a873a44	[Memory Usage] Implement segment lru cache to save memory of BE (#6829 )	2021-10-25 10:07:15 +08:00
HappenLee	da99749e7f	[Bug] Fix bug that BE will crash when backup using S3 (#6855 )	2021-10-17 22:54:42 +08:00
Xinyi Zou	eff076b355	[BUG] Fix printing ReservationTrackerCounters cause BE crash when mem_limit is reached (#6849 ) When the memory usage of BE reaches mem_limit, printing ReservationTrackerCounters through MemTracker may cause BE crash in high concurrency. ReservationTrackerCounters is not actually used in the current Doris, and the memory tracker in Doris will be redesigned in the future.	2021-10-16 21:57:09 +08:00
Mingyu Chen	59017cebe6	[ARM64] Fix some problems when compiling on ARM64 platform (#6836 ) 1. Refactor the create method of hdfs reader & writer. libhdfs3 does not support arm64. So we should not support hdfs reader & writer on arm64. 2. Add macro for LowerUpperImpl	2021-10-16 21:56:49 +08:00
Zhengguo Yang	24d38614a0	[Dependency] Upgrade thirdparty libs (#6766 ) Upgrade the following dependencies: libevent -> 2.1.12 OpenSSL 1.0.2k -> 1.1.1l thrift 0.9.3 -> 0.13.0 protobuf 3.5.1 -> 3.14.0 gflags 2.2.0 -> 2.2.2 glog 0.3.3 -> 0.4.0 googletest 1.8.0 -> 1.10.0 snappy 1.1.7 -> 1.1.8 gperftools 2.7 -> 2.9.1 lz4 1.7.5 -> 1.9.3 curl 7.54.1 -> 7.79.0 re2 2017-05-01 -> 2021-02-02 zstd 1.3.7 -> 1.5.0 brotli 1.0.7 -> 1.0.9 flatbuffers 1.10.0 -> 2.0.0 apache-arrow 0.15.1 -> 5.0.0 CRoaring 0.2.60 -> 0.3.4 orc 1.5.8 -> 1.6.6 libdivide 4.0.0 -> 5.0 brpc 0.97 -> 1.0.0-rc02 librdkafka 1.7.0 -> 1.8.0 After this PR, compiling Doris should use build-env:1.4.0	2021-10-15 13:03:04 +08:00
Mingyu Chen	5ef3f59928	[Optimize][RoutineLoad] Avoid sending tasks if there is no data to be consumed (#6805 ) 1 Avoid sending tasks if there is no data to be consumed By fetching latest offset of partition before sending tasks.(Fix [Optimize] Avoid too many abort task in routine load job #6803 ) 2 Add a preCheckNeedSchedule phase in update() of routine load. To avoid taking write lock of job for long time when getting all kafka partitions from kafka server. 3 Upgrade librdkafka's version to 1.7.0 to fix a bug of "Local: Unknown partition" See offsetsForTimes fails with 'Local: Unknown partition' edenhill/librdkafka#3295 4 Avoid unnecessary storage migration task if there is no that storage medium on BE. Fix [Bug] Too many unnecessary storage migration tasks #6804	2021-10-13 11:39:01 +08:00
Mingyu Chen	ad3c9390a2	[Bug] Fix bdbje getDatabaseNames() bug and scan node close bug (#6769 ) 1. This bug is introduced from #6582 2. Optimize the error log of Address used used error msg. 3. Add some document about compilation. 1. Add a custom thirdparty download url. 2. Add a custom com.alibaba maven jar package for DataX. 4. Fix bug that BE crash when closing scan node, introduced from #6622.	2021-09-29 11:11:28 +08:00
EmmyMiao87	bdc8c98008	[Outfile] Support hdfs in select outfile clause (#6644 ) Support hdfs in select outfile clause without broker. This PR implement a HDFS writer in BE which is used to write HDFS file directly without using broker. Also the hdfs outfile clause syntax check has been added in FE. The syntax: ``` select * from xx into outfile "hdfs://user/outfile_" format as csv properties ("hdfs.fs.dafultFS" = "xxx", "hdfs.hdfs_user" = "xxx"); ``` Note that all hdfs configurations need to carry a prefix `hdfs.`.	2021-09-24 10:07:11 +08:00
Zhengguo Yang	5c45e26644	Fixed zone map init error for string type (#6667 ) Fixed the problem that the StringValue memory generated by Expr may be released before use Fixed from_string for String type may overflow	2021-09-23 09:44:22 +08:00
Mingyu Chen	521fb15a9b	[Bug] Fix some memory bugs (#6699 ) 1. Fix a memory leak in `collect_iterator.cpp` (Fix #6700) 2. Add a new BE config `max_segment_num_per_rowset` to limit the num of segment in new rowset.(Fix #6701) 3. Make the error msg of stream load more friendly.	2021-09-22 12:30:14 +08:00
Zhengguo Yang	332ba4cded	[config] use thrift_rpc_timeout_ms config replace hard code value (#6637 ) use thrift_rpc_timeout_ms config to replace hard code value	2021-09-16 10:22:57 +08:00
Zhengguo Yang	61c9d11fdb	support change column type from decimal to string (#6643 )	2021-09-14 15:56:44 +08:00
Yunfeng,Wu	b3ae607fe9	[Spark-Doris-Connector] support boolean data type (#6601 ) 1. Support boolean data type for spark-doris-connector because Doris has previously supported the boolean data type 2. Bug-Fix for the Doris BE core when Spark requests data from BE	2021-09-12 10:07:23 +08:00
Mingyu Chen	b2f1e21a3b	[Bugs] Fix some bugs (#6586 ) * fix regex lazy * fix result file core * fix dynamic partition replica and table name length bug * fix replicanum 0 * fix delete bug * renew proxy Co-authored-by: morningman <chenmingyu@baidu.com>	2021-09-10 09:53:30 +08:00
Zhengguo Yang	4f744333c2	fix some core in local test: (#6594 ) 1. insert very large string value may coredump 2. some analytic function and agg function result may be incorrect 3. string compare may coredump when string type is too large 4. string type in delete condition can not process correctly 5. add text/blob as alias of string to be compatible with mysql 6. fix string type min/max agg may process incorrectly	2021-09-10 09:52:03 +08:00
Mingyu Chen	74ddea8d83	[Optimize] Remove some unused code to reduce lock contention (#6566 ) 1. Remove global runtime profile counter 2. Remove unused thread token register	2021-09-07 11:56:12 +08:00
EmmyMiao87	9469b2ce1a	[Outfile] Support concurrent export of query results (#6539 ) This PR mainly supports 1. Exporting query result sets concurrently 2. Exporting query result sets over the s3 protocol. There are several preconditions for concurrently exporting query result sets: 1. The concurrent export variable is enabled 2. The query itself can be exported concurrently (some queries containing sort nodes at the top level cannot be exported concurrently) 3. The export uses the s3 protocol instead of the broker. After exporting the result set concurrently, the file prefix is changed to outfile_{query_instance_id}_filenumber.{file_format}	2021-09-07 11:53:32 +08:00
Zhengguo Yang	9f7d4cf741	[BUG] fix bugs with string type (#6538 ) * fix bugs with string type 1. not support string with agg type min/max 2. agg_update with large string may coredump 3. stringval with large string may coredump 4. not support string as partition key	2021-09-01 15:59:55 +08:00
caiconghui	0393c9b3b9	[Optimize] Support send batch parallelism for olap table sink (#6397 ) * Support send batch parallelism for olap table sink Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-30 11:03:09 +08:00
Mingyu Chen	3f2fdd236f	Add scan thread token (#6443 )	2021-08-27 10:56:17 +08:00
Mingyu Chen	fa290383dc	[Doc] Modify README to add some statistical indicators (#6486 ) 1. Add license/total line/release badges. 2. Add monthly active contributor and contributor growth graph 3. fix a pom.xml bug 4. Modify some routine load log on BE side	2021-08-25 09:36:26 +08:00
caiconghui	7e30b28f3a	[Optimize] Speed up converting the data of other types to string in mysql_result_writer (#6384 ) Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-24 22:30:58 +08:00
Zhengguo Yang	146060dfc0	[Bug]Fix result_writer may coredump (#6482 ) fix result_writer may coredump, let BufferControlBlock owns the memory	2021-08-22 22:04:00 +08:00
Mingyu Chen	fa382f8602	[Bug][MemLimit] Modify the memory limit of storage page cache (#6451 ) This CL mainly changes: 1. the `storage_page_cache_limit` is based on config `mem_limit` the default is 20% of `mem_limit`. 2. the `buffer_pool_limit` is based on config `mem_limit` the default is 20% of `mem_limit`. 3. the `buffer_pool_clean_pages_limit` is based on config `buffer_pool_limit` the default is 50% of `buffer_pool_limit` 4. Fix some show bugs of lru cache hit ratio and usage ratio 5. Fix a create view bug that `notEvalNondeterministicFunction` should be reset after analyze.	2021-08-19 14:16:53 +08:00
Zhengguo Yang	0c5c3f7d87	Fixed the problem that there may be redundant retries when the query result export fails (#6436 )	2021-08-18 09:06:02 +08:00
Zhengguo Yang	8738ce380b	Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391 )	2021-08-18 09:05:40 +08:00
Mingyu Chen	2030c44dba	[Log] Modify some log level on BE side (#6381 )	2021-08-14 10:25:45 +08:00
HappenLee	9216735cfa	[New Feature] Support Vectorization Execution Engine Interface For Doris (#6329 ) 1. FE vectorized plan code 2. Function register vec function 3. Diff function nullable type 4. New thirdparty code and new thrift struct	2021-08-11 14:54:06 +08:00
zhangstar333	612684fb2e	[DOC]Add a profile counter of local exchange send bytes (#6372 ) Add a profile counter of local exchange send bytes: LocalBytesSent	2021-08-07 21:32:44 +08:00
caiconghui	d1007afe80	Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient (#6361 ) * [Optimize] optimize the speed of converting integer to string * Use fmt and std::from_chars to make convert integer to string and convert string to integer more efficient Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-04 10:55:19 +08:00
Mingyu Chen	2c208e932b	[Bug][RoutineLoad] Avoid TOO_MANY_TASKS error (#6342 ) Use `commitAsync` to commit offset to kafka, instead of using `commitSync`, which may block for a long time. Also assign a group.id to routine load if user not specified "property.group.id" property, so that all consumer of this job will use same group.id instead of a random id for each consume task.	2021-08-03 11:59:06 +08:00
weizuo93	cf1fcdd614	fix BE coredump in UserFunctionCache (#6331 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2021-07-30 09:24:30 +08:00
HappenLee	6597a338dc	[Feature] Support config max length of zone map index (#6293 )	2021-07-30 09:23:11 +08:00
Zhengguo Yang	5a7237062f	remove palo_internal_service.proto and PInternalService from code base, because it is not used now (#6341 )	2021-07-30 09:22:50 +08:00
stdpain	776df2effc	[BUG][stack-buffer-overflow] fix overflow while calculate hash code in ArrayType and fix some warning	2021-07-27 13:41:00 +08:00
HappenLee	02a00cdf35	[Bug] Fix the bug in `from_date_format_str` function (#6273 )	2021-07-21 12:31:37 +08:00
pengxiangyu	7592f52d2e	[Feature][Insert] Add transaction for the operation of insert #6244 (#6245 ) ## Proposed changes Add transaction for the operation of insert. It will cost less time than non-transactional insert (it will cost 1/1000 of the time) when you want to insert a large number of rows. ### Syntax ``` BEGIN [ WITH LABEL label]; INSERT INTO table_name ... [COMMIT \| ROLLBACK]; ``` ### Example commit a transaction: ``` begin; insert into Tbl values(11, 22, 33); commit; ``` rollback a transaction: ``` begin; insert into Tbl values(11, 22, 33); rollback; ``` commit a transaction with label: ``` begin with label test_label; insert into Tbl values(11, 22, 33); commit; ``` ### Description ``` begin: begin a transaction, the next insert will execute in the transaction until commit/rollback; commit: commit the transaction, the data in the transaction will be inserted into the table; rollback: abort the transaction, nothing will be inserted into the table; ``` ### The main realization principle: ``` 1. begin a transaction in the session. next sql is executed in the transaction; 2. the insert sql will be parsed to get the database name and table name, which will be used to select a be and create a pipe to accept data; 3. all inserted values will be sent to the be and written into the pipe; 4. a thread will get the data from the pipe, then write them to disk; 5. commit will complete this transaction and make these data visible; 6. rollback will abort this transaction ``` ### Some restrictions on the use of update syntax. 1. Only ```insert``` can be called in a transaction. 2. If an error happens, ```commit``` will not succeed, it will ```rollback``` directly; 3. By default, if part of the inserts in the transaction are invalid, ```commit``` will only insert the other correct data into the table. 4. If you need ```commit``` to return failure when any insert in the transaction is invalid, you need to execute ```set enable_insert_strict = true``` before ```begin```.	2021-07-21 10:54:11 +08:00
qiye	a1a37c8cba	[Feature] Support calc constant expr by BE (#6233 ) At present, some constant expression calculations are implemented on the FE side, but they are incomplete, and some expressions cannot be completely consistent with the value calculated by BE (such as part of the time function) Therefore, we provide a way to pass all the constants in SQL to BE for calculation, and then begin to analyze and plan SQL. This method can also solve the problem that some complex constant calculations issued by BI cannot be processed on the FE side. Here through a session variable enable_fold_constant_by_be to control this function, which is disabled by default.	2021-07-19 10:25:53 +08:00
HappenLee	fae3eff2e6	[Bug] Fix the bug of cast string to datetime return not null (#6228 )	2021-07-17 10:55:08 +08:00
HappenLee	8de09cbd21	[Bug-fix] Decimal Divide, Mod Zero Result Should be NULL. (#6051 )	2021-07-17 10:43:06 +08:00
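
The partial-flush idea described in commit db1c281be5 can be illustrated with a small simulation. This is only a toy model and not the Doris implementation: the function name `simulate`, the uniform 1 MB-per-step load, and the "flush the largest memtables first" policy are all assumptions made for the sketch. It uses the numbers from the commit message (50 memtables, 1GB limit, treated here as 1000 MB) and compares flushing everything with flushing half.

```python
def simulate(num_memtables, limit_mb, steps, flush_fraction):
    """Return the size (MB) of every segment flushed during the run.

    Hypothetical model: each memtable grows by 1 MB per step, and when the
    total reaches limit_mb the largest `flush_fraction` of them is flushed.
    """
    memtables = [0] * num_memtables
    flushed = []
    for _ in range(steps):
        # Uniform load across tablets (an assumption of this sketch).
        memtables = [size + 1 for size in memtables]
        if sum(memtables) >= limit_mb:
            count = max(1, int(num_memtables * flush_fraction))
            # Pick the largest memtables first.
            order = sorted(range(num_memtables),
                           key=lambda i: memtables[i], reverse=True)
            for i in order[:count]:
                flushed.append(memtables[i])  # one segment per flushed memtable
                memtables[i] = 0
    return flushed


def average(values):
    return sum(values) / len(values)


flush_all = simulate(50, 1000, 200, 1.0)
flush_half = simulate(50, 1000, 200, 0.5)
print("average segment size, flush all :", average(flush_all))
print("average segment size, flush half:", average(flush_half))
```

In this model, flushing every memtable always produces 20 MB segments, while flushing half lets the surviving memtables keep growing, so later flushes write larger segments. The exact sizes depend on the assumed load pattern and should not be read as Doris measurements.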

405 Commits
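
The derived defaults described in commit fa382f8602 (page cache and buffer pool each 20% of `mem_limit`, clean pages 50% of `buffer_pool_limit`) amount to simple arithmetic. The sketch below works through it for an assumed `mem_limit` of 10 GiB; the function name `derive_limits` is hypothetical and integer rounding is an assumption, so this is not how the BE parses its config.

```python
GIB = 1024 ** 3


def derive_limits(mem_limit_bytes):
    """Compute the default limits from mem_limit using the percentages
    quoted in the commit message (integer division is an assumption)."""
    storage_page_cache_limit = mem_limit_bytes * 20 // 100
    buffer_pool_limit = mem_limit_bytes * 20 // 100
    buffer_pool_clean_pages_limit = buffer_pool_limit * 50 // 100
    return {
        "storage_page_cache_limit": storage_page_cache_limit,
        "buffer_pool_limit": buffer_pool_limit,
        "buffer_pool_clean_pages_limit": buffer_pool_clean_pages_limit,
    }


limits = derive_limits(10 * GIB)
for name, value in limits.items():
    print(name, value / GIB, "GiB")
```

With a 10 GiB `mem_limit`, both 20% limits come to 2 GiB and the clean-pages limit to 1 GiB.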