Doris should provide an HTTP API that returns the backend list for connectors to submit stream loads,
and it should require no privilege checking, so that common users can use it.
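As a rough sketch only (the endpoint path `/api/backends`, the FE address, and the response handling below are assumptions, not the finalized API), a connector could fetch the backend list over plain HTTP before submitting a stream load:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BackendListSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint on the FE HTTP port; assumes no authentication is required.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://FE_IP:8030/api/backends"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // A connector would parse the returned JSON and pick an alive BE
        // as the target for the stream load request.
        System.out.println(response.body());
    }
}
```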
1. By default, the Spark connector must write values for all fields of the `Doris` table.
With this feature, the user can write only a subset of the fields, and even specify the order in which they are written.
For example:
I have a table named `student` with three columns (name, gender, age),
created with the following SQL:
```sql
CREATE TABLE student (name VARCHAR(255), gender VARCHAR(10), age INT)
DUPLICATE KEY (name)
DISTRIBUTED BY HASH(name) BUCKETS 2;
```
Now, I only want to write values to two columns: name and gender.
The code is as follows:
```scala
val df = spark.createDataFrame(Seq(
  ("m", "zhangsan"),
  ("f", "lisi"),
  ("m", "wangwu")
)).toDF("gender", "name")

df.write
  .format("doris")
  .option("doris.fenodes", dorisFeNodes)
  .option("doris.table.identifier", dorisTable)
  .option("user", dorisUser)
  .option("password", dorisPwd)
  // specify the fields to write, and their order
  .option("doris.write.field", "gender,name")
  .save()
```
Schema change can fail because memory allocation fails during row block sorting.
However, it should do internal sorting first before failing the schema change on a memory
allocation failure during row block sorting, since there may be enough memory after the
internal sorting.
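Below is a minimal sketch of the intended control flow only; the real schema-change path is C++ in the BE, and the parameter names here are hypothetical.

```java
import java.util.function.BooleanSupplier;

public class SchemaChangeSortSketch {
    /**
     * trySort attempts to allocate memory and sort the row blocks; internalSort
     * sorts what is already buffered, which may free enough memory for a retry.
     */
    static boolean sortWithFallback(BooleanSupplier trySort, Runnable internalSort) {
        if (trySort.getAsBoolean()) {
            return true;                  // normal path: allocation and sort succeeded
        }
        internalSort.run();               // do the internal sorting first
        return trySort.getAsBoolean();    // fail the schema change only if the retry also fails
    }
}
```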
## Case
In the load process, each tablet has a memtable to hold the incoming data,
and when the data in a memtable grows larger than 100MB, it is flushed to disk as a `segment` file. Then
a new memtable is created to hold the following data.
Assume the table has N buckets (tablets), so the max total size of all memtables is `N * 100MB`.
If N is large, this costs too much memory.
So, to limit memory usage, when the total size of all memtables reaches a threshold (2GB by default), Doris
tries to flush all current memtables to disk (even if their sizes have not reached 100MB).
As a result, each memtable ends up being flushed when its size reaches about `2GB/N`, which may be much smaller
than 100MB, resulting in too many small segment files.
## Solution
When deciding to flush memtables to reduce memory consumption, do NOT flush all memtables, but only part
of them.
For example, suppose there are 50 tablets (with 50 memtables). The memory limit is 1GB, so when each memtable reaches
20MB, the total size reaches 1GB and a flush occurs.
If only 25 of the 50 memtables are flushed, then the next time the total size reaches 1GB, there will be 25 memtables of
about 10MB and another 25 of about 30MB. So those 30MB memtables can be flushed, which is larger
than 20MB.
The main idea is to introduce some jitter during flushing to keep the memtables slightly uneven in size, so that a flush is only triggered when a memtable is large enough.
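Below is a minimal sketch of this partial-flush idea, not the actual BE implementation (which is C++); the method name `maybeFlush` and the flush-the-larger-half selection policy are illustrative assumptions.

```java
import java.util.Arrays;

public class PartialFlushSketch {

    /**
     * memtableSizes[i] is the current size (in bytes) of tablet i's memtable.
     * When the total reaches memLimit, flush only the larger half of the memtables
     * instead of all of them, so the remaining ones keep growing and are flushed
     * later at a bigger size (closer to the 100MB per-memtable target).
     */
    static void maybeFlush(long[] memtableSizes, long memLimit) {
        long total = Arrays.stream(memtableSizes).sum();
        if (total < memLimit) {
            return;
        }
        // Pick the median size as the flush threshold: memtables at or above it
        // are flushed, smaller ones stay in memory and keep growing.
        long[] sorted = memtableSizes.clone();
        Arrays.sort(sorted);
        long threshold = sorted[sorted.length / 2];
        for (int i = 0; i < memtableSizes.length; i++) {
            if (memtableSizes[i] >= threshold) {
                // A real implementation would write this memtable out as a segment file here.
                memtableSizes[i] = 0;
            }
        }
    }
}
```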
In my test, loading a table with 48 buckets and a memory limit of 2GB, the average memtable size was 44MB in the
previous version; after this modification, the average size is 82MB.
Add a use_path_style property for S3.
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support the path-style property.
Fix some S3 URI bugs.
Add some logs for tracing the load process.
1. Optimize the error message when using batch delete.
2. Rename session variable `is_report_success` to `enable_profile`.
3. Add the table name to the OlapScanner profile.
When a BE has an exception, the FE doesn't log the BackendHbResponse info, so we can't tell which BE had the exception.
The exception log is:
`WARN (heartbeat mgr|31) [HeartbeatMgr.runAfterCatalogReady():141] get bad heartbeat response: type: BACKEND, status: BAD, msg: java.net.ConnectException: Connection refused (Connection refused)
`
So we need to add a toString() method, so that the FE can log the BackendHbResponse info.
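A minimal sketch of such an override is shown below; the field names are assumptions, not the actual BackendHbResponse members.

```java
class BackendHbResponseSketch {
    long beId;
    String host;
    String status;
    String msg;

    @Override
    public String toString() {
        // Include the backend identity so the FE heartbeat log tells us which BE failed.
        return "BackendHbResponse [beId=" + beId + ", host=" + host
                + ", status=" + status + ", msg=" + msg + "]";
    }
}
```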
This CL mainly changes:
1. Add star schema benchmark tools in `tools/ssb-tools`, so that users can easily load and test the SSB data set.
2. Disable the segment cache for some read scenarios such as compaction and alter operations. (Fix #6924)
3. Fix a bug that `max_segment_num_per_rowset` does not work. (Fix #6926)
4. Enable `enable_batch_delete_by_default` by default.
1. Simplify the use of the Flink connector, making it work like other stream sinks via GenericDorisSinkFunction.
2. Add use cases for the Flink connector.
## Use case
```java
env.fromElements("{\"longitude\": \"116.405419\", \"city\": \"北京\", \"latitude\": \"39.916927\"}")
   .addSink(
       DorisSink.sink(
           DorisOptions.builder()
               .setFenodes("FE_IP:8030")
               .setTableIdentifier("db.table")
               .setUsername("root")
               .setPassword("")
               .build()));
```
When loading meta with meta_tool fails, we only get an error code from `json2pb`,
which makes it inconvenient to locate the problem.
This change adds an error message when loading meta fails.
The log change is shown below.
```
# before
./meta_tool --root_path=/home/disk1/qjl/mydoris/be/storage --operation=load_meta --json_meta_path=/home/disk1/qjl/data/meta-json.json
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1020 11:41:56.564241 74937 data_dir.cpp:837] path: /home/disk1/qjl/mydoris/be/storage total capacity: 7750843404288, available capacity: 7583325925376
I1020 11:41:56.564415 74937 data_dir.cpp:275] path: /home/disk1/qjl/mydoris/be/storage, hash: 7528840506668047470
load meta failed, status:-1410
# after
./meta_tool --root_path=/home/disk1/qjl/mydoris/be/storage --operation=load_meta --json_meta_path=/home/disk1/qjl/data/meta-json.json
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1020 14:41:40.084342 50727 data_dir.cpp:837] path: /home/disk1/qjl/mydoris/be/storage total capacity: 7750843404288, available capacity: 7584601022464
I1020 14:41:40.084496 50727 data_dir.cpp:275] path: /home/disk1/qjl/mydoris/be/storage, hash: 7528840506668047470
E1020 14:41:40.163007 50727 tablet_meta_manager.cpp:161] JSON to protobuf message failed: Fail to decode base64 string=0
load meta failed, status:-1410
```
String.valueOf() returns the string "null" for a null input, in which case requests with no SQL
will unexpectedly be accepted by TableQueryPlanAction, which is a potential risk.
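For reference, a small snippet demonstrating the behavior; the class and variable names are illustrative only:

```java
public class ValueOfNullDemo {
    public static void main(String[] args) {
        Object sql = null;
        // String.valueOf(Object) turns null into the 4-character string "null",
        // so a later null check on the converted value passes unexpectedly.
        String value = String.valueOf(sql);
        System.out.println(value == null);          // false
        System.out.println("null".equals(value));   // true
        // Safer: reject the request while the raw input is still null,
        // before converting it to a String.
    }
}
```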