
3471 Commits

Author SHA1 Message Date
4c6cbdf463 [Bug] Fix version nav button loaded multiple times in docs website header (#7062)
* Fix version nav button loaded multiple times

Co-authored-by: 943155336 <wangyongfeng>
Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>
2021-11-09 18:23:44 +08:00
088a16d33b Chinese annotation modification (#6958)
* Modify Chinese comment (#6951)
2021-11-09 18:00:14 +08:00
906c305a19 [Bug] Fix docs website home page last news icon loading failure (#7057)
* Fix last news icon loading failure

Co-authored-by: 943155336 <wangyongfeng>
Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>
2021-11-09 17:34:42 +08:00
5d946ccd5e [Docs] Add hdfs outfile example (#7052) 2021-11-09 10:02:28 +08:00
b54a12ef11 [Build]Compile and output the jar file, add Spark, Flink version and Scala version (#7051)
The jar files built for the Flink and Spark connectors now carry the corresponding Flink or Spark version
and the Scala version used at compile time in their names, so that users can tell whether the versions match when using them.

Example of an output file name: doris-spark-1.0.0-spark-3.2.0_2.12.jar
2021-11-09 10:02:08 +08:00
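To illustrate the naming pattern from b54a12ef11, here is a minimal Python sketch that composes such a file name from its parts. The function name `connector_jar_name` and its parameters are hypothetical, and the pattern is inferred only from the single example file name in the commit message, so the actual build scripts may differ.

```python
def connector_jar_name(engine: str, connector_version: str,
                       engine_version: str, scala_version: str) -> str:
    """Compose a jar name following the pattern of the example file name.

    Assumed pattern (inferred from one example only):
    doris-<engine>-<connector version>-<engine>-<engine version>_<scala version>.jar
    """
    return (f"doris-{engine}-{connector_version}-"
            f"{engine}-{engine_version}_{scala_version}.jar")


# Reproduces the example from the commit message.
print(connector_jar_name("spark", "1.0.0", "3.2.0", "2.12"))
```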
34637589c5 [Website][Doc] Add the sharing blog function to the document site (#7047)
Add the sharing blog function to the document site, including the blog list and detail page. At the same time, a guide on how to share blogs has been added to the developer guide.
2021-11-09 10:01:23 +08:00
Pxl
fc62090558 [Bug] fix Log tags empty reference core dump (#7043)
The key may already have been destroyed by the time its reference is used.
2021-11-09 10:00:08 +08:00
8ba2d79fe1 [Bug] Change DateTimeValue Memory Layout To Old (#7022)
Change the DateTimeValue memory layout back to the old one to fix compatibility problems.
2021-11-08 21:56:14 +08:00
9c12060db3 [Compile] Fix FE compile problem (#7029)
Co-authored-by: morningman <chenmingyu@baidu.com>
2021-11-08 10:35:49 +08:00
Pxl
29ca77622f [Refactor] Refactor part of RuntimeFilter's code (#6998)
#6997
2021-11-07 17:40:45 +08:00
9b1a80114e [Bug] Fix some return logic error in init BE encoding_map (#6936)
The original code checked _encoding_map and returned early, which prevented some encodings from being added to default_encoding_type_map_ or value_seek_encoding_map_ in the EncodingInfoResolver constructor.
For example:
EncodingInfoResolver::EncodingInfoResolver() {
....
    _add_map<OLAP_FIELD_TYPE_BOOL, PLAIN_ENCODING>();
    _add_map<OLAP_FIELD_TYPE_BOOL, PLAIN_ENCODING, true>();
...
}
The second line of code has no effect.
2021-11-07 17:40:18 +08:00
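The early-return problem described in 9b1a80114e can be modelled with a simplified Python sketch. The class name, the `early_return` switch, and the body of `_add_map` are assumptions reconstructed from the commit message for illustration; they are not the actual BE code.

```python
class EncodingResolverSketch:
    """Simplified, hypothetical model of the registration logic."""

    def __init__(self, early_return: bool):
        self.early_return = early_return
        self.encoding_map = {}
        self.default_encoding_type_map = {}
        self.value_seek_encoding_map = {}
        # Mirrors the two calls quoted in the commit message.
        self._add_map("BOOL", "PLAIN")
        self._add_map("BOOL", "PLAIN", optimize_value_seek=True)

    def _add_map(self, field_type, encoding, optimize_value_seek=False):
        key = (field_type, encoding)
        if self.early_return and key in self.encoding_map:
            # Old behaviour (assumed): return before the side maps are updated.
            return
        if field_type not in self.default_encoding_type_map:
            self.default_encoding_type_map[field_type] = encoding
        if optimize_value_seek and field_type not in self.value_seek_encoding_map:
            self.value_seek_encoding_map[field_type] = encoding
        self.encoding_map.setdefault(key, encoding)


buggy = EncodingResolverSketch(early_return=True)
fixed = EncodingResolverSketch(early_return=False)
print("BOOL" in buggy.value_seek_encoding_map)  # second call was skipped
print("BOOL" in fixed.value_seek_encoding_map)  # second call took effect
```

In this model the second registration is lost only because the presence check comes before the side maps are updated; moving or removing that check lets both calls contribute.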
ca8268f1c9 [Feature] Extend logger interface, support structured log output (#6600)
Support structured logging.
2021-11-07 17:39:53 +08:00
3dd55701ba [Config] Support custom config handler (#6577)
Support custom config handler callback and types.
2021-11-07 17:39:24 +08:00
31f3eb4a3c [Doc] Use Flink CDC to realize real-time MySQL data into Apache Doris (#6933)
* Best practices: use Flink CDC to load MySQL data into Apache Doris in real time
2021-11-06 16:18:19 +08:00
e69249c082 sub_bitmap (#6977)
Starting from the offset position, take up to the specified limit of bitmap elements and return them as a bitmap subset.

2021-11-06 13:31:03 +08:00
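The offset-and-limit behaviour described in e69249c082 can be sketched in Python with a sorted list standing in for the bitmap. This is only a model of the description, not the Doris implementation; treating the offset as a zero-based position and returning an empty result for a negative offset or non-positive limit are assumptions.

```python
def sub_bitmap(elements, offset, limit):
    """Return up to `limit` elements starting at position `offset`.

    Assumptions: `offset` is a zero-based position in ascending order,
    and invalid arguments yield an empty result.
    """
    if offset < 0 or limit <= 0:
        return []
    ordered = sorted(set(elements))  # a bitmap holds distinct values in order
    return ordered[offset:offset + limit]


print(sub_bitmap([1, 0, 1, 2, 3, 1, 5], 0, 3))
print(sub_bitmap([1, 2, 3, 4, 5], 1, 3))
```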
4f13f98424 [Bug] Fix bug that memtracker in delta writer will be visited before initialized. (#7013) 2021-11-06 13:29:49 +08:00
974a894688 Update Spring version to fix CVE-2020-5421 (#7023) 2021-11-06 13:29:24 +08:00
3cef2fb0a8 Union stmt support 'OutFileClause' (#7026)
The union (set operation) stmt also needs to analyze 'OutFileClause'.

To determine whether a fragment is colocated, only the plan nodes belonging to that fragment need to be checked.
2021-11-06 13:28:52 +08:00
5ca271299a [refactor] set forward_to_master true by default (#7017)
* ot set forward_to_master true by default

* Update docs/zh-CN/administrator-guide/variables.md
2021-11-06 13:27:26 +08:00
760fc02bfe Added brpc stub cache check and reset api, used to test whether the brpc stub cache is available, and reset the brpc stub cache (#6916)
Added a brpc stub cache check and reset API, used to test whether the brpc stub cache is available and to reset it.
Added a config for automatically checking and resetting the brpc stub.
2021-11-05 09:45:37 +08:00
9171859c38 fix issue for JournalEntity (#7005)
Fix an incorrect logger class in JournalEntity.java.
2021-11-05 09:45:10 +08:00
1f196442f7 [Bug] Fix the nullptr of core in schema change (#7003)
Schema change fails when memory allocation fails during row block sorting. However, it should first perform internal sorting and fail only if memory allocation still fails afterwards, since there may be enough memory after internal sorting.
2021-11-05 09:44:08 +08:00
995fa992f7 Fix hadoop load failed when enable batch delete in unique table (#6996) 2021-11-05 09:43:28 +08:00
29838f07da [HTTP][API] Add backends info API for spark/flink connector (#6984)
Doris should provide an HTTP API that returns the list of backends so that connectors can submit stream loads.
The API performs no privilege checking, so ordinary users can call it.
2021-11-05 09:43:06 +08:00
599ecb1f30 [Function] Add bitmap function bitmap_subset_limit (#6980)
Add bitmap function bitmap_subset_limit.
This function returns a subset of the bitmap at the specified index.
2021-11-04 12:14:47 +08:00
d19a971582 [Revert] Revert RestService.java (#6994) 2021-11-04 12:13:18 +08:00
2351c421b4 Revert "[HTTP][API] Add Backend By Rest API (#6999)" (#7004)
This reverts commit f509e936573f8d6fdaf4de036bc3c6abef26a182.
2021-11-04 10:25:09 +08:00
d268d17f2a Fix the SQL execution error caused by tablet not being found due to Colocate join (#7002)
* Fix bug where SQL execution sometimes fails because the tablet cannot be found
2021-11-04 09:21:52 +08:00
f509e93657 [HTTP][API] Add Backend By Rest API (#6999)
* [HTTP][API] add backend rest api

* [HTTP][API] add backends rest api

* change api response

Co-authored-by: wudi <wud3@shuhaisc.com>
2021-11-04 09:21:07 +08:00
aeec9c45e6 [Function] Add bitmap-xor-count function for doris (#6982)
Add bitmap-xor-count function for doris

Related to #6875
2021-11-02 16:37:00 +08:00
f0a71a067b [Build] Generate compile_command.json (#6976)
Set CMake to generate compile_commands.json, which is useful for language servers such as clangd and cquery.
2021-11-02 16:36:35 +08:00
9c24334956 [BUG][Schedule] Fix getMixLoadScore error. (#6975) 2021-11-02 16:36:05 +08:00
f39a5bc1d0 [Feature] Spark connector supports to specify fields to write (#6973)
By default, the Spark connector must write values for all fields to the `Doris` table.
With this feature, users can specify a subset of the fields to write, and even the order in which they are written.

For example:
Suppose a table named `student` has three columns (name, gender, age),
created with the following SQL:
```sql
create table student (name varchar(255), gender varchar(10), age int) duplicate key (name) distributed by hash(name) buckets 2;
```
Now, to write values to only two columns, name and gender,
the code is as follows:
```scala
    val df = spark.createDataFrame(Seq(
      ("m", "zhangsan"),
      ("f", "lisi"),
      ("m", "wangwu")
    ))
    df.write
      .format("doris")
      .option("doris.fenodes", dorisFeNodes)
      .option("doris.table.identifier", dorisTable)
      .option("user", dorisUser)
      .option("password", dorisPwd)
      //specify your fields or the order
      .option("doris.write.field", "gender,name")
      .save()
```
2021-11-02 16:35:29 +08:00
aba7d2ccae [Thirdparty] Fix flatbuffers download url error (#6968)
Change google flatbuffers download URL in thirdparty.var.sh
2021-11-02 16:34:17 +08:00
2d10300547 [Bug] Fix schema change fail as memory allocation on row block sorting (#6932)
Schema change fails when memory allocation fails during row block sorting.
However, it should first perform internal sorting and fail only if
memory allocation still fails afterwards, since there may be enough
memory after internal sorting.
2021-11-02 16:33:38 +08:00
019e60e7bc [BUG] fix Calc capacityCoefficient mistake #6898 (#6899)
fix #6898
2021-11-02 16:32:44 +08:00
1ff3d708ca [Function] add functions of bitmap_and/or_count (#6912)
issue #6875
add bitmap_and_count/ bitmap_or_count
2021-11-01 14:00:07 +08:00
c7a3116f98 [Function] add bitmap function of bitmap_has_all (#6918)
The 'bitmap_has_all' function returns true if the first bitmap contains all the elements of the second bitmap.
2021-11-01 12:50:47 +08:00
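The containment rule described in c7a3116f98 can be modelled in Python with sets. This is only an illustration of the stated semantics, not the Doris implementation; the function below is a hypothetical stand-in that happens to share the SQL function's name.

```python
def bitmap_has_all(first, second):
    """True if every element of `second` is also in `first` (set model)."""
    return set(second) <= set(first)


print(bitmap_has_all([1, 2, 3], [1, 2]))
print(bitmap_has_all([1, 2], [1, 2, 3]))
```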
210625b358 [Doc] Update fe-idea developer guide for latest version (#6963) 2021-11-01 11:42:13 +08:00
65ded82778 [Function] add BE bitmap function bitmap_subset_in_range (#6917)
Add bitmap function bitmap_subset_in_range.
This function returns the subset within the specified range (excluding range_end).
2021-11-01 11:05:19 +08:00
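The range rule described in 65ded82778 can be sketched in Python as a half-open interval filter. The commit message only states that range_end is excluded; treating range_start as inclusive is an assumption, and the sorted list merely stands in for a bitmap.

```python
def bitmap_subset_in_range(elements, range_start, range_end):
    """Return the distinct elements v with range_start <= v < range_end.

    Assumption: range_start is inclusive; range_end is excluded as stated
    in the commit message.
    """
    return sorted(v for v in set(elements) if range_start <= v < range_end)


print(bitmap_subset_in_range([1, 2, 3, 4, 5], 2, 4))
```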
db1c281be5 [Enhance][Load] Reduce the number of segments when loading a large volume data in one batch (#6947)
## Case

During a load, each tablet has a memtable that holds the incoming data.
When the data in a memtable exceeds 100MB, it is flushed to disk as a `segment` file, and
a new memtable is created to hold the subsequent data.

Assume a table with N buckets (tablets). The maximum total size of all memtables is then `N * 100MB`.
If N is large, this costs too much memory.

So, to limit memory use, when the total size of all memtables reaches a threshold (2GB by default), Doris
tries to flush all current memtables to disk (even if they have not reached 100MB).

As a result, a memtable is flushed when its size reaches `2GB/N`, which may be much smaller
than 100MB, resulting in too many small segment files.

## Solution

When flushing memtables to reduce memory consumption, do NOT flush all of them; flush only part
of them.
For example, suppose there are 50 tablets (with 50 memtables) and the memory limit is 1GB. When each memtable reaches
20MB, the total size reaches 1GB and a flush occurs.

If only 25 of the 50 memtables are flushed, then the next time the total size reaches 1GB there will be 25 memtables of
size 10MB and 25 memtables of size 30MB. The 30MB memtables can then be flushed, and they are larger
than 20MB.

The main idea is to introduce some jitter during flush so that the memtable sizes are slightly uneven, which ensures that a flush is only triggered when the memtable is large enough.

In my test, loading a table with 48 buckets and a 2GB memory limit, the average memtable size was 44MB in the previous version
and 82MB after the modification.
2021-11-01 10:51:50 +08:00
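The effect described in db1c281be5 can be explored with a small Python simulation. It is a toy model under stated assumptions, not the Doris implementation: every memtable receives the same amount of data per step, the 100MB per-memtable threshold is ignored, 1GB is treated as 1000MB, and a partial flush always picks the largest memtables. The function name and parameters are hypothetical.

```python
def average_flush_size(num_memtables, limit_mb, flush_fraction, rounds, step_mb=1.0):
    """Average size (MB) of flushed memtables in a toy load simulation.

    flush_fraction=1.0 models flushing every memtable when the limit is hit;
    flush_fraction=0.5 models flushing only the largest half (assumption).
    """
    sizes = [0.0] * num_memtables
    flushed = []
    for _ in range(rounds):
        # Assumption: incoming data is spread evenly over all memtables.
        sizes = [s + step_mb for s in sizes]
        if sum(sizes) >= limit_mb:
            count = max(1, int(num_memtables * flush_fraction))
            by_size = sorted(range(num_memtables), key=lambda i: sizes[i], reverse=True)
            for i in by_size[:count]:
                flushed.append(sizes[i])
                sizes[i] = 0.0
    return sum(flushed) / len(flushed) if flushed else 0.0


flush_all = average_flush_size(50, 1000, 1.0, rounds=200)
flush_half = average_flush_size(50, 1000, 0.5, rounds=200)
print(f"flush all:  {flush_all:.1f} MB on average")
print(f"flush half: {flush_half:.1f} MB on average")
```

With the 50-memtable, 1GB example from the commit message, flushing everything always writes 20MB segments, while the partial-flush variant should report a larger average. The exact gain in a real load depends on how unevenly data arrives.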
80f61c823b Docker 1.4.1 Compile Environment, First Compile Description (#6943) 2021-11-01 10:49:45 +08:00
e8cabfff27 [S3] Support path style endpoint (#6962)
Add a use_path_style property for S3
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support path style property
Fix some S3 URI bugs
Add some logs for tracing load process.
2021-11-01 10:48:10 +08:00
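As background for the use_path_style property in e8cabfff27, the Python sketch below contrasts the two common S3 addressing styles. The helper `s3_object_url`, the endpoint, and the bucket name are hypothetical; how Doris builds its requests internally is not shown in the commit message.

```python
def s3_object_url(endpoint, bucket, key, use_path_style):
    """Build an object URL in path style or virtual-hosted style.

    path style:           <scheme>://<host>/<bucket>/<key>
    virtual-hosted style: <scheme>://<bucket>.<host>/<key>
    """
    scheme, host = endpoint.split("://", 1)
    host = host.rstrip("/")
    key = key.lstrip("/")
    if use_path_style:
        return f"{scheme}://{host}/{bucket}/{key}"
    return f"{scheme}://{bucket}.{host}/{key}"


# Hypothetical endpoint and bucket, for illustration only.
print(s3_object_url("https://s3.example.com", "my-bucket", "data/file.csv", True))
print(s3_object_url("https://s3.example.com", "my-bucket", "data/file.csv", False))
```

Path style is typically needed for S3-compatible services that are reached by IP address or that do not resolve per-bucket host names.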
Pxl
28030294f7 [Feature] Support bitmap_and_not & bitmap_and_not_count (#6910)
Support bitmap_and_not & bitmap_and_not_count.
2021-11-01 10:11:54 +08:00
f47919136a [Bug] Fix failure to stop sync job (#6950) 2021-10-30 18:17:15 +08:00
a842d41b87 [Function] add BE bitmap function bitmap_max (#6942)
Support bitmap_max.
2021-10-30 18:16:38 +08:00
c3b133bdb3 [Refactor] Refactor the reader code (#6866)
1. Removed useless redundant code logic
2. Change reader to interface, add tuple reader to simplify the structure of reader
2021-10-30 18:15:28 +08:00
466cd5dd09 [Optimize] Spark connector supports multiple spark versions:2.1.x/2.3.x/2.4.x/3.x (#6956)
* Spark connector supports multiple spark versions:2.1.x/2.3.x/2.4.x/3.x
Co-authored-by: wei.zhao <wei.zhao@aispeech.com>
2021-10-29 17:06:05 +08:00
1f65de1a5d Fix spark connector build error (#6948)
pom.xml error
2021-10-29 14:59:05 +08:00
addfff74c4 support use char like \x01 in flink-doris-sink column & line delimiter (#6937)
* Support characters such as \x01 as the column and line delimiter in flink-doris-sink

* extend imports

* add docs
2021-10-29 13:56:52 +08:00
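One way to interpret a delimiter written as `\x01` in a configuration string, as in addfff74c4, is to decode the hex escape into the actual control character. The Python sketch below shows the idea; `decode_delimiter` is a hypothetical helper, and the connector's real parsing logic may differ.

```python
import re


def decode_delimiter(raw: str) -> str:
    """Replace each literal \\xHH escape in `raw` with the character it names."""
    return re.sub(r"\\x([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  raw)


column_sep = decode_delimiter(r"\x01")  # four characters in, one character out
row = column_sep.join(["zhangsan", "m", "18"])
print(len(column_sep))
print(row.split(column_sep))
```

Plain delimiters such as `,` pass through unchanged, so the same helper can be applied to every configured value.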