doris

Author	SHA1	Message	Date
wangyongfeng	4c6cbdf463	[Bug] Fix version nav button loaded multiple times in docs website header (#7062 ) * Fix version nav button loaded multiple times Co-authored-by: 943155336 <wangyongfeng> Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>	2021-11-09 18:23:44 +08:00
wangyongfeng	906c305a19	[Bug] Fix docs website home page last news icon loading failure (#7057 ) * Fix last news icon loading failure Co-authored-by: 943155336 <wangyongfeng> Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>	2021-11-09 17:34:42 +08:00
EmmyMiao87	5d946ccd5e	[Docs] Add hdfs outfile example (#7052 )	2021-11-09 10:02:28 +08:00
wangyongfeng	34637589c5	[Website][Doc] Add the sharing blog function to the document site (#7047 ) Add the sharing blog function to the document site, including the blog list and detail page. At the same time, a guide on how to share blogs has been added to the developer guide.	2021-11-09 10:01:23 +08:00
jiafeng.zhang	31f3eb4a3c	[Doc] Use Flink CDC to sync MySQL data into Apache Doris in real time (#6933 ) * Best Practices: Use Flink CDC to sync MySQL data into Apache Doris in real time	2021-11-06 16:18:19 +08:00
Xinyi Zou	e69249c082	sub_bitmap (#6977 ) Starting from the offset position, extract up to limit elements of the bitmap and return them as a bitmap subset. Types of chang	2021-11-06 13:31:03 +08:00
Zhengguo Yang	5ca271299a	[refactor] set `forward_to_master` true by default (#7017 ) * set forward_to_master true by default * Update docs/zh-CN/administrator-guide/variables.md	2021-11-06 13:27:26 +08:00
Zhengguo Yang	760fc02bfe	Added brpc stub cache check and reset API, used to test whether the brpc stub cache is available and to reset the brpc stub cache (#6916 ) Added brpc stub cache check and reset API, used to test whether the brpc stub cache is available and to reset the brpc stub cache. Add a config to automatically check and reset brpc stubs.	2021-11-05 09:45:37 +08:00
Mingyu Chen	29838f07da	[HTTP][API] Add backends info API for spark/flink connector (#6984 ) Doris should provide an HTTP API that returns the list of backends, so that connectors can submit stream loads to them. The API performs no privilege checking, so ordinary users can use it.	2021-11-05 09:43:06 +08:00
pengxiangyu	599ecb1f30	[Function] Add bitmap function bitmap_subset_limit (#6980 ) Add bitmap function bitmap_subset_limit. This function will return subset in specified index.	2021-11-04 12:14:47 +08:00
xy720	aeec9c45e6	[Function] Add bitmap-xor-count function for doris (#6982 ) Add bitmap-xor-count function for doris, related to #6875	2021-11-02 16:37:00 +08:00
wei zhao	f39a5bc1d0	[Feature] Spark connector supports specifying fields to write (#6973 ) 1. By default, the Spark connector must write values for all fields of the `Doris` table. With this feature, users can write only some of the fields, and can even specify the order in which the fields are written. E.g., I have a table named `student` with three columns (name, gender, age), created with the following SQL: ```sql create table student (name varchar(255), gender varchar(10), age int) duplicate key (name) distributed by hash(name) buckets 2; ``` Now I only want to write values to two columns, name and gender. The code is as follows: ```scala val df = spark.createDataFrame(Seq( ("m", "zhangsan"), ("f", "lisi"), ("m", "wangwu") )) df.write .format("doris") .option("doris.fenodes", dorisFeNodes) .option("doris.table.identifier", dorisTable) .option("user", dorisUser) .option("password", dorisPwd) //specify your fields or the order .option("doris.write.field", "gender,name") .save() ```	2021-11-02 16:35:29 +08:00
zhangstar333	1ff3d708ca	[Function] add functions of bitmap_and/or_count (#6912 ) issue #6875 add bitmap_and_count/ bitmap_or_count	2021-11-01 14:00:07 +08:00
luozenglin	c7a3116f98	[Function] add bitmap function of bitmap_has_all (#6918 ) The 'bitmap_has_all' function returns true if the first bitmap contains all the elements of the second bitmap.	2021-11-01 12:50:47 +08:00
wei zhao	210625b358	[Doc] Update fe-idea developer guide for latest version (#6963 )	2021-11-01 11:42:13 +08:00
qiye	65ded82778	[Function] add BE bitmap function bitmap_subset_in_range (#6917 ) Add bitmap function bitmap_subset_in_range. This function returns the subset within the specified range (excluding range_end).	2021-11-01 11:05:19 +08:00
Mingyu Chen	db1c281be5	[Enhance][Load] Reduce the number of segments when loading a large volume of data in one batch (#6947 ) ## Case During loading, each tablet has a memtable that holds the incoming data. When the data in a memtable exceeds 100MB, it is flushed to disk as a `segment` file, and a new memtable is created for the following data. Assume the table has N buckets (tablets); then the total size of all memtables can reach `N * 100MB`, which costs too much memory when N is large. So, to limit memory, when the total size of all memtables reaches a threshold (2GB by default), Doris flushes all current memtables to disk, even those that have not reached 100MB. As a result, each memtable is flushed once its size reaches `2GB/N`, which may be much smaller than 100MB, producing too many small segment files. ## Solution When flushing memtables to reduce memory consumption, do NOT flush all of them; flush only part of them. For example, suppose there are 50 tablets (with 50 memtables) and the memory limit is 1GB. When each memtable reaches 20MB, the total reaches 1GB and a flush occurs. If I flush only 25 of the 50 memtables, then the next time the total reaches 1GB, 25 memtables will be 10MB and the other 25 will be 30MB, so I can flush the 30MB memtables, which are larger than 20MB. The main idea is to introduce some jitter during flushing that keeps the memtables slightly uneven, so that a flush is only triggered when a memtable is large enough. In my test, loading a table with 48 buckets with a 2GB memory limit, the average memtable size was 44MB in the previous version and 82MB after this change.	2021-11-01 10:51:50 +08:00
jiafeng.zhang	80f61c823b	Docker 1.4.1 Compile Environment, First Compile Description (#6943 )	2021-11-01 10:49:45 +08:00
Mingyu Chen	e8cabfff27	[S3] Support path style endpoint (#6962 ) Add a use_path_style property for S3 Upgrade hadoop-common and hadoop-aws to 2.8.0 to support path style property Fix some S3 URI bugs Add some logs for tracing load process.	2021-11-01 10:48:10 +08:00
Pxl	28030294f7	[Feature] Support bitmap_and_not & bitmap_and_not_count (#6910 ) Support bitmap_and_not & bitmap_and_not_count.	2021-11-01 10:11:54 +08:00
zhuixun	a842d41b87	[Function] add BE bitmap function bitmap_max (#6942 ) Support bitmap_max.	2021-10-30 18:16:38 +08:00
wunan1210	addfff74c4	support use char like \x01 in flink-doris-sink column & line delimiter (#6937 ) * support use char like \x01 in flink-doris-sink column & line delimiter * extend imports * add docs	2021-10-29 13:56:52 +08:00
EmmyMiao87	df43752257	[Docs] Fix error KEY url (#6955 )	2021-10-29 12:07:44 +08:00
Zhengguo Yang	4170aabf83	[Optimize] optimize some session variable and profile (#6920 ) 1. optimize error message when using batch delete 2. rename session variable is_report_success to enable_profile 3. add table name to OlapScanner profile	2021-10-27 18:03:12 +08:00
Mingyu Chen	00fe9deaeb	[Benchmark] Add star schema benchmark tools (#6925 ) This CL mainly changes: 1. Add star schema benchmark tools in `tools/ssb-tools`, for user to easy load and test with SSB data set. 2. Disable the segment cache for some read scenario such as compaction and alter operation.(Fix #6924 ) 3. Fix a bug that `max_segment_num_per_rowset` won't work(Fix #6926) 4. Enable `enable_batch_delete_by_default` by default.	2021-10-27 09:55:36 +08:00
luzhijing	9d4e6d8362	[Spark-Doris-Connector] fixed some typos in the spark-doris-connector docs	2021-10-26 18:23:53 +08:00
Mingyu Chen	ed7a873a44	[Memory Usage] Implement segment lru cache to save memory of BE (#6829 )	2021-10-25 10:07:15 +08:00
xiaokangguo	ebb4c282b1	[Flink]Simplify the use of flink connector (#6892 ) 1. Simplify the use of flink connector like other stream sink by GenericDorisSinkFunction. 2. Add the use cases of flink connector. ## Use case ``` env.fromElements("{\"longitude\": \"116.405419\", \"city\": \"北京\", \"latitude\": \"39.916927\"}") .addSink( DorisSink.sink( DorisOptions.builder() .setFenodes("FE_IP:8030") .setTableIdentifier("db.table") .setUsername("root") .setPassword("").build() )); ```	2021-10-23 18:10:47 +08:00
qiye	090d99b690	[Docs] fix urls and format in routine load docs (#6896 ) fix urls and format in routine load docs	2021-10-23 16:52:33 +08:00
xy720	7b50409ada	[Bug][Binlog] Fix the number of versions possibly exceeding the limit during data synchronization (#6889 ) Bug detail: #6887 To solve this problem, a transaction is committed only when at least one of the following conditions is met, to avoid committing too frequently: 1. The current accumulated event quantity is greater than the `min_sync_commit_size`. 2. The current accumulated data size is greater than the `min_bytes_sync_commit`. In addition, when the accumulated data size exceeds `max_bytes_sync_commit`, the transaction needs to be committed immediately. Before: ![a5e0a2ba01ec4935144253fe0a364af7](https://user-images.githubusercontent.com/22125576/137933545-77018e89-fa2e-4d45-ae5d-84638cc0506a.png) After: ![4577ec53afa47452c847bd01fa7db56c](https://user-images.githubusercontent.com/22125576/137933592-146bef90-1346-47e4-996e-4f30a25d73bc.png)	2021-10-23 16:47:32 +08:00
zh0122	3267455eca	Replace replica_allocation with replication_allocation (#6870 ) Fix #6869	2021-10-20 15:32:35 +08:00
Mingyu Chen	51e210869a	[ARM64] Fix some problems when compiling on the ARM64 platform (#6836 ) (#6872 ) With thirdparties moving from 1.4.0 to 1.4.1: 1. Add a patch for aws-c-cal-0.4.5 2. Add some solutions for `undefined reference libpsl` 3. Move libgsasl to fix a link problem of libcurl. 4. Downgrade openssl to 1.0.2k to fix a problem with low-version glibc	2021-10-19 13:26:02 +08:00
xy720	bd25d1a828	[Doc] Add documents for MySQL Binlog Load (#6859 ) * add zh-CN docs * add en docs and image * fix * fix	2021-10-19 10:25:42 +08:00
MHBoy	e96882f6c5	Update materialized_view.md (#6867 )	2021-10-19 10:24:38 +08:00
wunan1210	fbd75c88d0	[Docs] Fix exporter document error (#6864 ) * fix exporter document error * update en doc	2021-10-19 10:24:08 +08:00
zhoubintao	bb2b29c64f	[Doc] Add type BOOLEAN to the output of 'help create table' in the mysql client (#6852 ) Some users do not know that Doris supports the BOOLEAN type and use TINYINT instead, so I added BOOLEAN to the output of 'help create table' in the mysql client. Currently a BOOLEAN value takes 1 byte, but a boolean column only holds values in {0,1}, which wastes some memory; I want to change its implementation to 1 bit in the future.	2021-10-17 22:54:12 +08:00
Mingyu Chen	59017cebe6	[ARM64] Fix some problems when compiling on the ARM64 platform (#6836 ) 1. Refactor the create method of the hdfs reader & writer. libhdfs3 does not support arm64, so the hdfs reader & writer are not supported on arm64. 2. Add a macro for LowerUpperImpl	2021-10-16 21:56:49 +08:00
Zhengguo Yang	607eef8d4d	[Doc] Update compile docs add 0.15 build support. (#6850 )	2021-10-15 18:37:24 +08:00
Mingyu Chen	fcd15edbf9	[Export] Support export job with label (#6835 ) ``` EXPORT TABLE xxx ... PROPERTIES ( "label" = "mylabel", ... ); ``` Then the user can use the label to get the job info with the SHOW EXPORT stmt: ``` show export from db where label="mylabel"; ``` For compatibility, if no label is specified, a random label is used, and for historical jobs the label is "export_job_id". Unlike the LOAD stmt, the label is specified in `properties` here because this causes no grammatical conflicts and requires no change to the meta version of the metadata.	2021-10-15 10:18:11 +08:00
zhoubintao	ad949c2f65	Optimize Hex and add related Doc (#6697 ) I tested hex in a loop of 10 million iterations with random numbers: the old hex took 4.92 s on average and the optimized hex took 0.46 s on average, roughly 10x faster.	2021-10-13 11:36:14 +08:00
EmmyMiao87	6cbefa9f10	[Docs] Update materialized view document (#6710 ) * [Docs] Update materialized view document	2021-10-13 11:35:23 +08:00
Gabriel	30bf6c0d1d	[DOC] minor update (#6820 )	2021-10-13 09:14:56 +08:00
jiafeng.zhang	f439e5e533	[Doc] Documentation error (#6797 ) Documentation error	2021-10-10 23:08:16 +08:00
jiafeng.zhang	bd19491b5b	[Doc] Modify the description of dynamic partition hot partition (#6764 ) Modify the description of dynamic partition hot partition	2021-10-10 23:06:14 +08:00
qiye	675aef7d75	[AliasFunction] Add support for cast in alias function (#6754 ) support #6753	2021-10-10 23:05:44 +08:00
jiafeng.zhang	4232f787ad	[Doc] datax doriswriter use case (#6612 ) datax doriswriter use case	2021-10-10 23:03:12 +08:00
wei zhao	237a8ae948	[Feature] support spark connector sink data using sql (#6796 ) Co-authored-by: wei.zhao <wei.zhao@aispeech.com>	2021-10-09 15:47:36 +08:00
Mingyu Chen	7a20d6d4c2	[Doc] Modify document of resource tag (#6778 ) Fix typo	2021-10-03 11:37:45 +08:00
shee	e7707c8180	[FOLLOWUP] create table like clause support copy rollup (#6580 ) * Remove `ALL` key word to make grammar more clear. Co-authored-by: qzsee <shizhiqiang03@meituan.com>	2021-09-30 18:26:21 +08:00
Mingyu Chen	ad3c9390a2	[Bug] Fix bdbje getDatabaseNames() bug and scan node close bug (#6769 ) 1. This bug was introduced by #6582 2. Optimize the error log for the address-in-use error. 3. Add some documentation about compilation: 1. Add a custom thirdparty download url. 2. Add a custom com.alibaba maven jar package for DataX. 4. Fix a bug that BE crashes when closing the scan node, introduced by #6622.	2021-09-29 11:11:28 +08:00
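
The partial-flush idea in #6947 above ("flush part of them" to introduce jitter) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Doris's BE code: data arrives evenly at every tablet, sizes are whole MB, and the partial policy flushes the larger half of the memtables, which is one reading of the 25-of-50 example in the commit message. `simulate`, `flush_all` and `flush_larger_half` are hypothetical names.

```python
import statistics


def simulate(num_tablets, mem_limit_mb, rounds, pick_victims):
    """Toy model of one load job: return the size (MB) of every flushed segment.

    Each round, every tablet's memtable receives 1 MB. Whenever the total size
    of all memtables reaches mem_limit_mb, pick_victims(sizes) chooses which
    memtables to flush; each flushed memtable becomes one segment file.
    """
    sizes = [0] * num_tablets
    segments = []
    for _ in range(rounds):
        for i in range(num_tablets):
            sizes[i] += 1
        if sum(sizes) >= mem_limit_mb:
            for i in pick_victims(sizes):
                segments.append(sizes[i])
                sizes[i] = 0
    return segments


def flush_all(sizes):
    # Behavior described under "Case": flush every memtable at once.
    return list(range(len(sizes)))


def flush_larger_half(sizes):
    # Hypothetical partial policy: flush only the larger half of the memtables,
    # leaving the rest to keep growing so the next flush sees bigger ones.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    return order[: len(sizes) // 2]


if __name__ == "__main__":
    # 50 tablets sharing a 1 GB (1000 MB) limit, as in the commit message.
    for policy in (flush_all, flush_larger_half):
        segs = simulate(50, 1000, 400, policy)
        print(policy.__name__, len(segs), round(statistics.mean(segs), 1))
```

With 50 tablets and a 1000 MB limit, `flush_all` only ever produces 20 MB segments, matching the `limit / N` behavior described under "Case", while `flush_larger_half` produces fewer, larger segments because the memtables it leaves behind keep growing.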
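
The commit rule from #6889 above can be sketched as a small accumulator. The three parameter names come from the commit message; the class, its methods and the default values are hypothetical placeholders, not the FE implementation.

```python
from dataclasses import dataclass


@dataclass
class SyncCommitConfig:
    # Parameter names from #6889; the default values are placeholders.
    min_sync_commit_size: int = 10_000          # events
    min_bytes_sync_commit: int = 15 * 1024**2   # bytes
    max_bytes_sync_commit: int = 64 * 1024**2   # bytes


class PendingBatch:
    """Hypothetical accumulator for binlog events that are not yet committed."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.events = 0
        self.bytes = 0

    def add(self, event_bytes):
        """Record one event; return True if the batch must be committed now."""
        self.events += 1
        self.bytes += event_bytes
        return self.bytes > self.cfg.max_bytes_sync_commit

    def ready_to_commit(self):
        """A regular commit is allowed only once either threshold is exceeded."""
        return (self.events > self.cfg.min_sync_commit_size
                or self.bytes > self.cfg.min_bytes_sync_commit)

    def reset(self):
        # Called after a successful commit.
        self.events = 0
        self.bytes = 0
```

A caller would commit as soon as `add` returns True, and otherwise commit only when `ready_to_commit()` holds; since each commit adds a version, fewer and larger transactions keep the version count down.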

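The subset functions described in #6977, #6917 and #6942 above can be modeled with Python sets to pin down their semantics. This sketch assumes a bitmap is a set of non-negative integers read in ascending order, that `sub_bitmap` counts positions from 0 in that order (negative offsets are not modeled), and that `bitmap_max` of an empty bitmap corresponds to SQL NULL (here `None`). The real functions run in BE; these Python functions are illustrations only.

```python
def sub_bitmap(bitmap, offset, limit):
    """Model of sub_bitmap: starting at position `offset` (0-based, ascending
    order), take at most `limit` elements."""
    if offset < 0:
        raise ValueError("negative offsets are not modeled in this sketch")
    return set(sorted(bitmap)[offset:offset + limit])


def bitmap_subset_in_range(bitmap, range_start, range_end):
    """Model of bitmap_subset_in_range: keep v with range_start <= v < range_end
    (range_end itself is excluded, as the commit message states)."""
    return {v for v in bitmap if range_start <= v < range_end}


def bitmap_max(bitmap):
    """Model of bitmap_max: the largest element; None stands in for SQL NULL."""
    return max(bitmap, default=None)


if __name__ == "__main__":
    b = {9, 1, 5, 3, 7}
    print(sorted(sub_bitmap(b, 1, 3)))              # [3, 5, 7]
    print(sorted(bitmap_subset_in_range(b, 3, 7)))  # [3, 5]
    print(bitmap_max(b))                            # 9
```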
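Likewise, the counting and containment functions from #6982, #6912, #6918 and #6910 above reduce to ordinary set algebra. This is a two-argument model using Python sets, not Doris's API; whether the SQL functions accept more than two bitmaps is not covered here.

```python
def bitmap_and_count(a, b):
    # Cardinality of the intersection.
    return len(a & b)


def bitmap_or_count(a, b):
    # Cardinality of the union.
    return len(a | b)


def bitmap_xor_count(a, b):
    # Cardinality of the symmetric difference.
    return len(a ^ b)


def bitmap_and_not(a, b):
    # Elements of a that are not in b.
    return a - b


def bitmap_and_not_count(a, b):
    return len(a - b)


def bitmap_has_all(a, b):
    # True if a contains every element of b.
    return b <= a


if __name__ == "__main__":
    a, b = {1, 2, 3}, {2, 3, 4}
    print(bitmap_and_count(a, b), bitmap_or_count(a, b), bitmap_xor_count(a, b))  # 2 4 2
    print(bitmap_and_not(a, b), bitmap_has_all(a, {1, 2}))  # {1} True
```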

785 Commits