Commit Graph

837 Commits

Author SHA1 Message Date
0b8bc51b72 [fix](inverted index) Fix key column match query failed (#18436)
* [fix](inverted index) Fix key column match query failed

* [chore](regression case) add regression case

* [fix] fix regression case no order by
2023-04-08 15:45:08 +08:00
759f1da32e [Enhencement](Backends) add HostName filed in backends table and delete backends table in information_schema (#18156)
1.  Add `HostName` field for `show backends` statement and `backends()` tvf.
2. delete the `backends` table in `information_schema` database
2023-04-07 08:30:42 +08:00
54dbb4af67 [vectorzied](jdbc) refactor jdbc table read array type (#18187)
jdbc read array type get result from Doris is string, PG is java.sql.array, CK is java.lang.object
it's difficult to maintain and read the code,
so change all database's array result to string, then add a cast function from string to doris array type
2023-04-04 11:57:04 +08:00
Pxl
e77833bfa1 [Bug](materialized-view) fix where clause persistence replay incorrect (#18228)
fix where clause persistence replay incorrect
2023-04-03 12:49:01 +08:00
d9fe5f7b67 [enhancement](memory) Remove MemPool and replace it with Arena (#17820)
Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not.

Some comparisons between MemPool and Arena:

 1. Expansion
     Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression;
     MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk
     
     After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K.

 2. Alignment
     MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment;
     Arena has no default alignment;

 3. Memory reuse
     Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time.
     MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation

 4. Realloc
     Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is:
         1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start
         2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory
     MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools

 5. check mem limit
     MemPool checks the mem limit, and Arena checks at the Allocator layer.

 6. Support for ASAN
     Arena does something extra

 7. Error handling
     MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception.
Tests that Arena can consider

 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory;

 2. Support clear, memory multiplexing;

 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful.

 4. In some cases, it may be possible to allocate backwards to find chunks t
2023-03-29 20:56:49 +08:00
ac5b47e515 [bugfix](addlog) expr context is not closed and will core during deconstructor (#18134) 2023-03-27 21:59:46 +08:00
b0948ea4cd [Fix](SAP Hana External Table) fix that SAP Hana external table can not insert batch values (#17957)
In the batch insertion scenario, sap hana database does not support syntax insert into tables values (...),(...);
what it supports is:
```sql
INSERT INTO table(col1,col2)
SELECT c1v1, c2v1 FROM dummy
UNION ALL
SELECT c1v2, c2v2 FROM dummy;
```
2023-03-23 18:49:50 +08:00
Pxl
40ca250678 [Feature](materialized-view) support where clause on create materialized view (#17534)
support where clause on create materialized view
2023-03-22 11:25:13 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not test:
- cold & host data separation case.
2023-03-21 21:08:38 +08:00
bd8e3e6405 [refactor](date) unify DateTimeValue and VecDateTimeValue (#17670) 2023-03-20 16:27:08 +08:00
dd53bc1c8d [unify type system](remove unused type desc) remove some code (#17921)
There are many type definitions in BE. Should unify the type system and simplify the development.



---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-19 14:05:02 +08:00
921e8192b2 [fix](multi-catalog) fix hana jdbc catalog insert error (#17838) 2023-03-16 07:25:19 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
4ba93efc98 [Enhance](DOE)Support parse default es iso datetime string (#17412)
* support parse default es iso datetime string
2023-03-10 09:59:20 +08:00
a745ab1703 [fix](schema scanner) fix query some schema table report invalid parameter (#17626)
Example:

SELECT ROUTINE_SCHEMA AS PROCEDURE_CAT, NULL AS PROCEDURE_SCHEM,ROUTINE_NAME AS PROCEDURE_NAME,NULL AS NUM_INPUT_PARAMS,NULL AS NUM_OUTPUT_PARAMS,NULL AS NUM_RESULT_SETS,ROUTINE_COMMENT AS REMARKS,IF(ROUTINE_TYPE = 'FUNCTION', 2,IF(ROUTINE_TYPE= 'PROCEDURE', 1, 0)) AS PROCEDURE_TYPE FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_SCHEMA = DATABASE();
ERROR 1105 (HY000): errCode = 2, detailMessage = invalid parameter

This wrong and some BI tools could not work correctly.
2023-03-10 08:52:09 +08:00
08f0170895 [fix](olap) The 'scan key' generated by the 'is null' expression causes incorrect query results (#17569) 2023-03-10 08:51:06 +08:00
a767472c56 [fix](DOE)Fix es p0 case error (#17502)
Fix es array parse error, introduced by #16806
2023-03-08 08:06:30 +08:00
ee1be6edd7 [chore](fe) enhance_mysql_data_type (#17429) 2023-03-06 10:42:01 +08:00
a8f20eb4ac [Enhencement](schema_scanner) Optimize the performance of reading information schema tables (#17371)
batch fill block
batch call rpc from FE to get table desc
For 34w colunms

SELECT COUNT( * ) FROM information_schema.columns;
time: 10.3s --> 0.4s
2023-03-06 09:53:01 +08:00
f1db0d9501 [Enhencement](File Reader) delete old file_reader (#17261)
* delete old file_reader

* fix 1
2023-03-01 20:24:03 +08:00
68e9a66aa0 [Enchancement](schema scanner) add SchemaScanner profile (#17230)
Add some profile information to the schema scanner to facilitate performance optimization.

Example:

SchemaScanner:
      -  FillBlockTime:  9s131ms
      -  GetDbTime:  12.816ms
      -  GetDescribeTime:  1s645ms
      -  GetTableTime:  25.433ms
2023-03-01 08:34:27 +08:00
9bcc3ae283 [Fix](DOE)Fix be core dump when parse es epoch_millis date format (#17100) 2023-02-28 20:09:35 +08:00
1771d1e5e7 [fix](value-range) fix the value range of non-nullable column contains null causes query short key index error. (#16943)
* [fix](value-range) fix the value range of non-nullable column contains null causes query short key index error.
2023-02-28 11:15:32 +08:00
c39914c0a0 [feature](partition)add default list partition (#15509)
This pr implements the list default partition referred in related #15507.
It's similar as GreenPlum's default's partition which would store all data not satisfying prior partition key's
constraints and optimizer wouldn't filter default partition which means default partition would be scanned
each time you try to select data from one table with default partition.

User could either create one table with default partition or alter add one default partition.

```sql
PARTITION LIST(key) {
PARTITION p1 values in (xx,xx),
PARTITION DEFAULT
}

ALTER TABLE XXX ADD PARTITION DEFAULT
```

We don't support automatically migrate data inside default partition which meets newly added partition key's
constraint to newly add partition when alter add new partition. User should select default partition using new 
constraints as predicate and insert them to new partition.

```sql
insert into tbl select * from tbl partition default where partition_key=xx;
```
2023-02-24 15:24:59 +08:00
be047f11aa [BugFix](csv_reader) csv_reader support datev2/datetimev2 (#17031) 2023-02-24 11:13:48 +08:00
92ecd16573 (feature)[DOE]Support array for Doris on ES (#16941)
* (feature)[DOE]Support array for Doris on ES
2023-02-23 19:31:18 +08:00
0b3e18d060 [chore](macOS) Support LLVM Clang 15 (#16991)
Remove the deprecated classes std::codecvt_utf8_utf16<char16_t> and std::wstring_convert.
Use libiconv to convert UTF-8 strings to UTF-16LE ones.
2023-02-22 15:04:48 +08:00
e2e6a0dd83 [Feature](load) Support mutable property for partition (#16036)
The background is described in this issue: #15723,
where users used Apache Druid to satisfy such lambada requirements before.
We will not make Doris dropping data not belonged to current time window automatically like Druid,
which is not flexible. We demand a ability to support mutable/immutable partition, the PR works this way:

1. Support mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE in a load procedure
3. If a record's partition is immutable, we mark this row as "un selected" which will not be included in computation of 'max_filter_ratio',
   so that data write to immutable partition will be neglected and not cause load failure.

Use Example:

1. Add immutable partition or modify an partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');

2. Write 5 records into table, two of then belongs to immutable partition
2023-02-18 23:09:34 +08:00
0bb6005143 [Improvement](thrift) optimize thrift messages (#16383)
Now we use a thrift message per fragment instance. However, there are many same messages between instances in a fragment. So this PR aims to extract the same messages and we only need to send thrift message once for a fragment
2023-02-16 11:07:46 +08:00
af1329936e [Improvement](ES)Supprt datav2 and datetimev2 for es query (#16633)
* Supprt datav2 and datetimev2 for es query
2023-02-14 14:47:00 +08:00
f3ab55d27d [Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569) 2023-02-14 00:12:45 +08:00
09b7c22f6b [Opt](exec) remove unless null key when no split in convert key range (#16624) 2023-02-11 15:44:35 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
ad141747b4 [fix](inverted index) fix array type inverted index query error (#16582) 2023-02-10 17:57:15 +08:00
43eca4f209 [Feature-WIP](inverted index) Implementation for alter inverted index. (#16371)
implementation for add/drop inverted index.
2023-02-10 17:56:17 +08:00
1f631c388d [enhance](cooldown)accelerate cooldown task produce efficiency (#16089) 2023-02-10 16:58:27 +08:00
06788bc2d0 [Bug](pipeline) Fix projection on streaming operator (#16592) 2023-02-10 15:57:26 +08:00
9114896178 [DecimalV3](opt) opt the function of decimalv3 to_string logic (#16427) 2023-02-07 13:28:07 +08:00
125b60b4b9 [improvement](compatibility) add DATA_TYPE in information schema for new types #16391
Add DATA_TYPE in information schema for types: datev2, datatimev2, decimal, jsonb. It was 'unknown' for these types and cause problem for tools such as BI using information schema.
2023-02-03 22:28:42 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
545b91f8f7 [bug](jdbc) fix jdbc insert decimalv3 be core dump (#16353) 2023-02-03 10:00:06 +08:00
557159d3ce [feature](JdbcExternalCatalog) support insert data in JdbcExternalCatalog (#16271) 2023-02-02 17:31:33 +08:00
eba70f972e [improvement](global context) remove some unused method from runtime state (#16329)
This is part of #16296.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-02 10:24:55 +08:00
Pxl
ca73c60442 [Chore](build) enable ignored-qualifiers check (#16196)
enable ignored-qualifiers check
2023-02-01 15:15:59 +08:00
90b12143a3 [refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237) 2023-01-30 16:45:14 +08:00
4b6a4b3cf7 [refactor](remove unused code) Remove unused mempool declare or function params (#16222)
* Remove unused mempool declare or function params

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-30 13:03:18 +08:00
69e748b076 [fix](schema scanner)change schema_scanner::get_next_row to get_next_block (#15718) 2023-01-30 10:01:50 +08:00
62ff20d462 [fix](compile) fix the compile error when WITH_MYSQL is ON. (#16195) 2023-01-29 22:52:44 +08:00
Pxl
2b5f95f08a [Bug](function) remove datev2 signature of hour_ceil/hour_floor #16168 2023-01-29 11:27:56 +08:00
e49766483e [refactor](remove unused code) remove many xxxVal structure (#16143)
remove many xxxVal structure
remove BetaRowsetWriter::_add_row
remove anyval_util.cpp
remove non-vectorized geo functions
remove non-vectorized like predicate
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-28 14:17:43 +08:00