Commit Graph

525 Commits

Author SHA1 Message Date
869fe2bc5d [Improvement](outfile) Support ORC format in outfile (#13019) 2022-10-08 20:56:32 +08:00
7b75c2df54 [fix](BE) fix the stream load error when upgrade BE from 1.1.2 to master (#13058) 2022-10-05 12:13:26 +08:00
e167aa120f [fix](jdbc) fix insert into date type to oracle using wrong type (#12883)
using JDBC insert into date type to ORACLE,
it's should be use to_date function convert string to java.sql.date
2022-10-04 21:24:33 +08:00
3294b18674 [Improvement](datev2) fix some compatible problems for datev2 (#13079) 2022-09-30 13:56:01 +08:00
d80b7b9689 [feature-wip](new-scan) support more load situation (#12953) 2022-09-27 21:48:32 +08:00
Pxl
9607f60845 [Feature](serialize) move block_data_version to fe heart beat (#12667)
Move block_data_version from be config to fe heart beat
2022-09-27 18:25:54 +08:00
3f99dd5c4b [function](bitmap) support bitmap_hash64 (#12992) 2022-09-27 12:16:02 +08:00
1bb42a7bc0 [function](hash) add support of murmur_hash3_64 (#12923) 2022-09-26 14:23:37 +08:00
9c03deb150 [fix](log)Audit log status is incorrect (#12824)
Audit log status is incorrect
2022-09-26 09:57:52 +08:00
59699a4321 [feature](JSON datatype)Support JSON datatype (#10322)
Add `JSON` datatype, following features are implemented by this PR:
1. `CREATE` tables with `JSON` type columns
2. `INSERT` values containing `JSON` type value stored in `String`, which is represented as binary format(AKA `JSONB`) at BE 
3. `SELECT` JSON columns

Detail design refers [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type)

* add JSONB data storage format type

* fix JsonLiteral resolve bug

* add DataTypeJson case in data_type_factory

* add JSON syntax check in FE

* add operators for jsonb_document, currently not support comparison between any JSON type value

* add ColumnJson and DataTypeJson

* add JsonField to store JsonValue

* add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral

* add push_json for MysqlResultWriter

* JSON column need no zone_map_index

* Revert "JSON column need no zone_map_index"

This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79.

* add JSON writer and reader, ignore zone-map for JSON column

* add json_to_string for DataTypeJson

* add olap_data_convertor for JSON type

* add some enum

* add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions

* fix column_json offsets overflow bug, format code

* remove useless TODOs, add CmpType cases for JSON type

* add license header

* format license

* format be codes

* resolve rebase master conflicts

* fix bugs for CREATE and meta related code

* refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes

* modification be codes along code review advice

* fix rebase conflicts with master

* add unit test for json_value and column_json

* fix rebase error

* rename json to jsonb

* fix some data convert bugs, set Mysql type to JSON
2022-09-25 14:06:49 +08:00
f7e3ca29b5 [Opt](Vectorized) Support push down no grouping agg (#12803)
Support push down no grouping agg
2022-09-23 18:29:54 +08:00
617820b1f5 [Refactor](parquet) refactor parquet write to uniform and consistent logic (#12730) 2022-09-23 09:12:34 +08:00
70ab9cb43e [feature](http) refactor version info and add new http api for get version info (#12513)
Refactor version info and add new http api for get version info
2022-09-22 10:53:04 +08:00
ec2b3bf220 [feature-wip](new-scan)Refactor VFileScanner, support broker load, remove unused functions in VScanner base class. (#12793)
Refactor of scanners. Support broker load.
This pr is part of the refactor scanner tasks. It provide support for borker load using new VFileScanner.
Work still in progress.
2022-09-21 12:49:56 +08:00
4f98146e83 [enhancement](tracing) Support forward to master tracing (#12290) 2022-09-18 17:39:04 +08:00
0a95ebf602 [feature](Nereids) Add scalar function code generator and some function trait (#12671)
This pr did these things:
1. Change the nullable mode of 'from_unixtime' and 'parse_url' from DEPEND_ON_ARGUMENT to ALWAYS_NULLABLE, which nullable configuration was missing previously.
2. Add some new interfaces for origin NullableMode. This change inspired by the grammar of scala's mix-in trait, It help us to quickly understand the traits of function without read the lengthy procedural code and save the work to write some template code, like `class Substring extends ScalarFunction implements ImplicitCastInputTypes, PropagateNullable`. These are the interfaces:
   - PropagateNullable: equals to NullableMode.DEPEND_ON_ARGUMENT
   - AlwaysNullable: equals to NullableMode.ALWAYS_NULLABLE
   - AlwaysNotNullable: equals to NullableMode.ALWAYS_NOT_NULLABLE
   - others ComputeNullable: equals to NullableMode.CUSTOM
3. Add `GenerateScalarFunction` to generate nereids-style function code from legacy functions, but not actual generate any new function class yet, because the function's trait is not ready for use. I need add some traits for the legacy function's CompareMode and NonDeterministic, this thought is the same as ComputeNullable.
2022-09-16 21:27:30 +08:00
9d6c199553 [Bug](vec) Fix avg overflow in clickbench (#12621) 2022-09-16 14:43:40 +08:00
22a8d35999 [Feature](vectorized) support jdbc sink for insert into data to table (#12534) 2022-09-15 11:08:41 +08:00
e413a2b8e9 [Opt](vectorized) Use new way to do hash shffle to speed up query (#12586) 2022-09-15 11:08:04 +08:00
f50054f547 [Enhancement](array-type) record offsets info to speed up the seek performance (#12293)
Store the offset rather than the length in file for the data with array type. The new file format can improve the seek performance. Please refer to #12246 to get the performance report.

Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
2022-09-14 22:41:54 +08:00
Pxl
0ead048b93 [Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456)
1. remove ColumnString terminating zero
    2. add a data_version for pblock
    3. change EncryptionMode to enum class
2022-09-14 21:25:22 +08:00
6bf5fc6db5 [improvement](storage) For debugging problems: add session variable skip_storage_engine_merge to treat agg and unique data model as dup model (#11952)
For debug purpose:
Add session variable skip_storage_engine_merge, when set to true, tables of aggregate key model and unique key model will be read as duplicate key model.
Add session variable skip_delete_predicate, when set to true, rows deleted with delete statement will be selected.
2022-09-13 19:18:56 +08:00
dc80a993bc [feature-wip](new-scan) New load scanner. (#12275)
Related pr:
https://github.com/apache/doris/pull/11582
https://github.com/apache/doris/pull/12048

Using new file scan node and new scheduling framework to do the load job, replace the old broker scan node.
The load part (Be part) is work in progress. Query part (Fe) has been tested using tpch benchmark.

Please review only the FE code in this pr, BE code has been disabled by enable_new_load_scan_node configuration. Will send another pr soon to fix be side code.
2022-09-13 13:36:34 +08:00
c8e9a32bb2 [Function](cbrt)Add cbrt function for doris (#12523)
Add cbrt function for doris
2022-09-12 19:58:45 +08:00
09b45f2b71 [Function](ELT)Add elt function (#12321) 2022-09-07 15:21:08 +08:00
42bdde8750 [Feature](Vectorized) support jdbc scan node (#12010) 2022-09-07 10:29:41 +08:00
e7303c12c7 [Enhancement](array-type) Support Floating/Decimal type for array aggregation functions (#12271) 2022-09-03 09:55:56 +08:00
3bcab8bbef [feature](function) support now/current_timestamp functions with precision (#12219)
* [feature](function) support now/current_timestamp functions with precision
2022-09-01 14:35:12 +08:00
22430cd7bb [feature](stmt) add ADMIN COPY TABLET stmt for local debug (#12176)
Add a new stmt ADMIN COPY TABLET for easy copy a tablet to local env to reproduce problem.
See document for more details.
2022-08-31 09:06:49 +08:00
2f192019d3 [bugfix](delete hanlder) delete predicate is merged and could not find schema cause core dump (#12161)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-30 09:18:21 +08:00
2b3a5b5fdd [fix](array-type) add ARRAY_BOOLEAN support for lots of array functions #12079
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-08-26 18:00:29 +08:00
ccff3f5711 [bugfix](light weight schema change) support delete condition in schema change (#11869)
* [bugfix](light weight schema change) support delete condition in schema change


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-26 11:45:55 +08:00
e3ab2caef8 [improvement](sink) Support local exchange for multi fragment instances (#12017) 2022-08-25 19:28:23 +08:00
c22d097b59 [improvement](compress) Support compress/decompress block with lz4 (#11955) 2022-08-22 17:35:43 +08:00
dc8f64b3e3 [improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801) 2022-08-22 10:12:06 +08:00
1b0b5b5f09 [Enhancement](load) add hidden_columns in stream load param (#11625)
Stream load will ignore invisible columns if no http header columns
specified, but in some case user cannot get all columns if columns
changed frequently。
Add a hidden_columns header to support hidden columns import。User can
set hidden_columns such as __DORIS_DELETE_SIGN__ and add this column
in stream load data so we can delete this line.
For example:
curl -u root -v --location-trusted -H "hidden_columns: __DORIS_DELETE_SIGN__" -H
"format: json" -H "strip_outer_array: true" -H "jsonpaths: [\"$.id\",
\"$.name\",\"$.__DORIS_DELETE_SIGN__\"]" -T 1.json
http://{beip}:{be_port}/api/test/test1/_stream_load

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-19 14:57:11 +08:00
4fa53b4cdb [chore](workflow) Add shellcheck to check shell scripts (#11744) 2022-08-18 16:07:28 +08:00
11dc5cad83 [feature-wip](unique-key-merge-on-write) add min/max key in segment (#11830)
some feature:
1. add min max key in segment footer to speed up get_row_ranges_by_keys
2. do not load pk bloom filter in query

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-17 18:11:39 +08:00
ba3e0b3f96 [feature](compaction) allow to set disable_auto_compaction for tables (#11743) 2022-08-17 11:05:47 +08:00
c715209a7e [refactor](dpp) remove original dpp writer (#11838)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-17 10:42:29 +08:00
8c8f48c4c2 [feature-wip](array-type) add the array_join function (#11406)
this pr is used to add the array_join function.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-15 11:43:17 +08:00
854b4b1b47 [fix](storage-policy) fix bug that missing field when refreshing storage policy (#11580)
1. Change all required fields to optional
    Although they all "required", but it not recommended to use `required`, because it is hard to modify in future.
2. Fix a missing field bug
2022-08-12 20:11:54 +08:00
7d97aa194b [feature-wip](datev2) Support to use datev2 as partition column (#11618) 2022-08-12 11:54:01 +08:00
Pxl
f5fe622a1b [Bug](materialized view) fix create materialized view fail
1. remove referenced_column(seems unused now).
2. fix mv slot ref id wrong.
3. add type check for hll_hash.
4. enable non-nullable column change to nullable column.
2022-08-12 09:49:16 +08:00
5d66839035 [feature-wip](unique-key-merge-on-write) push down runtime filter on unique key with merge on write table (#11695) 2022-08-11 22:50:13 +08:00
c8418d13b5 [improvement](config)Use session variable to replace configuration for 'enable_function_pushdown' (#11641) 2022-08-10 19:25:02 +08:00
df47b6941d [feature-wip](array-type) support the array type in reverse function (#11213)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-09 20:49:09 +08:00
169996d8e4 [feature](information_schema) add rowsets table into information_s… (#11266)
* [feature](information_schema) add 'segments' table into information_schema
2022-08-09 18:15:54 +08:00
f9b151744d optimize topn query if order by columns is prefix of sort keys of table (#10694)
* [feature](planner): push limit to olapscan when meet sort.

* if olap_scan_node's sort_info is set, push sort_limit, read_orderby_key
and read_orderby_key_reverse for olap scanner

* There is a common query pattern to find latest time serials data.
 eg. SELECT * from t_log WHERE t>t1 AND t<t2 ORDER BY t DESC LIMIT 100

If the ORDER BY columns is the prefix of the sort key of table, it can
be greatly optimized to read much fewer data instead of read all data
between t1 and t2.

By leveraging the same order of ORDER BY columns and sort key of table,
just read the LIMIT N rows for each related segment and merge N rows.

1. set read_orderby_key to true for read_params and _reader_context
   if olap_scan_node's sort info is set.
2. set read_orderby_key_reverse to true for read_params and _reader_context
   if is_asc_order is false.
3. rowset reader force merge read segments if read_orderby_key is true.
4. block reader and tablet reader force merge read rowsets if read_orderby_key is true.

5. for ORDER BY DESC, read and compare in reverse order
5.1 segment iterator read backward using a new BackwardBitmapRangeIterator and
    reverse the result block before return to caller.
5.2 VCollectIterator::LevelIteratorComparator, VMergeIteratorContext return
    opposite result for _is_reverse order in its compare function.

Co-authored-by: jackwener <jakevingoo@gmail.com>
2022-08-09 09:08:44 +08:00
f730a048b1 [feature-wip](load) Support single replica load (#10298)
During load process, the same operation are performed on all replicas such as sort and aggregation,
which are resource-intensive.
Concurrent data load would consume much CPU and memory resources.
It's better to perform write process (writing data into MemTable and then data flush) on single replica
and synchronize data files to other replicas before transaction finished.
2022-08-02 11:44:18 +08:00