Commit Graph

166 Commits

Author SHA1 Message Date
Pxl
8ed4045df9 [Chore](primitive-type) remove VecPrimitiveTypeTraits (#22842) 2023-08-23 08:37:40 +08:00
Pxl
477961dc21 [Chore](agg) refactor of hash map (#22958)
refactor of hash map
2023-08-18 17:59:30 +08:00
Pxl
3f55d5d4d5 [Chore](excution) change some log fatal and dcheck to exception (#22890)
change some log fatal and dcheck to exception
2023-08-15 10:45:00 +08:00
4e880288c6 [refactor]use clear concept to replace std::enable_if_t (#22801)
---------

Signed-off-by: flynn <fenglv15@mails.ucas.ac.cn>
2023-08-12 15:10:30 +08:00
Pxl
56392e21ae [Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818 2023-08-10 18:15:40 +08:00
b25d52b736 [feature](cast) remove some unused in functioncast and support some function in nereids (#22729)
1 ConvertImplGenericFromString do not need a template StringColumnType
2 remove timev1 in function cast
3 support time_to_sec , sec_to_time in nereids
2023-08-10 10:10:32 +08:00
Pxl
098bab7b30 [Bug](exchange) disable implicit conversion of block to bool (#22534)
disable implicit conversion of block to bool
2023-08-03 20:37:14 +08:00
5584d7a5ba [Improve](point query) Improve lookup connection cache from DoubleBuffer to LRU cache for better item pruning (#22041) 2023-07-27 22:22:50 +08:00
8973610543 [feature](datetime) "timediff" supports calculating microseconds (#21371) 2023-07-10 19:21:32 +08:00
36524f2b72 [improvement](functions) avoid copying of block in create_block_with_nested_columns (#21526)
avoid copying of block in create_block_with_nested_columns
2023-07-10 17:21:23 +08:00
2678afd2db [fix][improvement](fs) add HdfsIO profile and modification time (#21638)
Refactor the interface of create_file_reader

the file_size and mtime are merged into FileDescription, not in FileReaderOptions anymore.
Now the file handle cache can get correct file's modification time from FileDescription.
Add HdfsIO for hdfs file reader
pick from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442
2023-07-08 14:49:44 +08:00
242a35fa80 [fix](s3) fix s3 fs benchmark tool (#21401)
1. fix concurrency bug of s3 fs benchmark tool, to avoid crash on multi thread.
2. Add `prefetch_read` operation to test prefetch reader.
3. add `AWS_EC2_METADATA_DISABLED` env in `start_be.sh` to avoid call ec2 metadata when creating s3 client.
4. add `AWS_MAX_ATTEMPTS` env in `start_be.sh` to avoid warning log of s3 sdk.
2023-07-05 16:20:58 +08:00
Pxl
f7c724f8a3 [Bug](excution) avoid core dump on filter_block_internal and add debug information (#21433)
avoid core dump on filter_block_internal and add debug information
2023-07-03 18:10:30 +08:00
2e6d91aa99 [chore](block) temporarily disable DCHECK for column name equality in MutableBlock (#21116)
* tempororyly disable DCHECK for column name equality in MutableBlock::add_rows

* num columns EQ to LE
2023-06-26 10:49:27 +08:00
2c11ce0a02 [bugfix](topn) fix key topn merge block conflict with index predicate result columns (#20820) 2023-06-20 21:23:00 +08:00
08fff8923f [improvement](serde) Optimizing the performance of mysql result writter (#20928)
When converting query results into MySQL format, it involves transforming columnar data storage into row-based storage. This process raises the question of choosing between sequential reading and sequential writing. In reality, sequential writing is the better choice for performance optimization.

Test with 9M rows contains more than 20 columns, this patch can reduce the conversion time from 20s to 11s.
2023-06-19 12:29:01 +08:00
93b53cf2f4 [improvement](exception-safe) create and prepare node/sink support exception safe (#20551) 2023-06-09 21:06:59 +08:00
b0bbff0fd1 [performance](load) improve memtable sort performance (#20392) 2023-06-04 20:33:15 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
068a32bc49 [Improvement](memory) faststring use Allocator #19762
After the outer catch exception, faststring resize reserve build may throw a memory alloc failure exception from the Allocator.

Currently page body compress will catch memory alloc failure exception
2023-05-18 15:00:49 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
fix distributionColumnIds not updated correct when outputColumnUnique
2023-05-17 00:13:10 +08:00
Pxl
4eb2604789 [Bug](function) fix function define of Retention inconsist and change some static_cast to assert cast (#19455)
1. fix function define of `Retention` inconsist, this function return tinyint on `FE` and return uint8 on `BE`
2. make assert_cast support cast to derived
3. change some static cast to assert cast
4. support sum(bool)/avg(bool)
2023-05-15 11:50:02 +08:00
1d421a26d9 [bugfix](memory) merge block may allocate failed (#19507) 2023-05-11 10:42:47 +08:00
Pxl
dff669899a [Feature](generic-aggregation) add some type define for generic aggregate functions support (#19252)
add some type define for generic aggregate functions support
2023-05-06 11:30:13 +08:00
de0e89d1b4 [feature](function) Modified cast as time to behave more like MySQL (#18565)
Because the underlying type of time was float64, select cast("19:22:18" as time) would result in a null value in the past.
Results in the following:
2023-04-22 06:11:59 +08:00
63a76ed115 [refactor](exceptionsafe) disallow call new method explicitly (#18830)
disallow call new method explicitly
force to use create_shared or create_unique to use shared ptr
placement new is allowed
reference https://abseil.io/tips/42 to add factory method to all class.
I think we should follow this guide because if throw exception in new method, the program will terminate.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-21 09:13:24 +08:00
c4e469c82c [feature](agg) Support spill to disk in aggregation (#18051) 2023-04-20 18:59:08 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
a68af93d30 [fix](compile) Fix block.cpp compilation failure (#18797) 2023-04-19 08:49:23 +08:00
79c446c89f [enhancement](exception) Column filter/replicate supports exception safety (#18503) 2023-04-18 19:23:09 +08:00
c704351273 [enhancement](memory) Refactor memory limit exceeded behavior (#18590)
No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.

Optimize mem limit exceeded log printing

Optimize compilation time
2023-04-14 10:42:35 +08:00
012a261f69 [FIX](complex-type) fixed complex type with create_column_const_with_default_value #18463 2023-04-10 14:11:15 +08:00
f38e00b4c0 [refactor](typesystem) using typeindex to create column instead of type name because type name is not stable (#18328)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-09 18:08:31 +08:00
Pxl
307170030c [Bug](materialized-view) fix core dump when create mv have case different with base table (#18206)
fix core dump when create mv have case different with base table
2023-03-31 12:32:09 +08:00
05db6e9b55 [refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009)
Follow #17586.
This PR mainly changes:

Remove env/
Remove FileUtils/FilesystemUtils
Some methods are moved to LocalFileSystem
Remove olap/file_cache
Add s3 client cache for s3 file system
In my test, the time of open s3 file can be reduced significantly
Fix cold/hot separation bug for s3 fs.
This is the last PR of #17764.
After this, all IO operation should be in io/fs.

Except for tests in #17586, I also tested some case related to fs io:

clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
2023-03-29 09:00:52 +08:00
09e346e47c [fix](type) Data precision is lost when converting DOUBLE type data to DECIMAL (#17191) (#17562)
1. Fix bug when converting DOUBLE to DECIMAL;
2. Fix bug when converting DOUBLE to DECIMALV3;
2023-03-28 09:46:43 +08:00
78abb40fdc [improvement](string) throw exception instead of log fatal if string column exceed total size limit (#17989)
Throw exception instead of log fatal if string column exceed total size limit, so that we can catch it and let query fail, instead of causing be exit.
2023-03-27 08:55:26 +08:00
Pxl
a8753faeb1 [Bug](function) fix column complex not resize after filter (#18043) 2023-03-25 21:48:13 +08:00
7ae51c856e [refactor](unify exception) unify exception definition and error code (#18006)
* [refactor](unify exception) unify exception definition and error code


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-25 12:41:07 +08:00
Pxl
40ca250678 [Feature](materialized-view) support where clause on create materialized view (#17534)
support where clause on create materialized view
2023-03-22 11:25:13 +08:00
Pxl
401836f523 [Bug](planner) fix core dump when lateral view above union node and have predicate (#17912)
fix core dump when lateral view above union node and have predicate
2023-03-22 11:24:45 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not test:
- cold & host data separation case.
2023-03-21 21:08:38 +08:00
dd53bc1c8d [unify type system](remove unused type desc) remove some code (#17921)
There are many type definitions in BE. Should unify the type system and simplify the development.



---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-19 14:05:02 +08:00
a993ac91d4 [bugfix](jsonb memory leak) there are memory leak in jsonb field (#17922)
* [bugfix](jsonb memory leak) there are memory leak in jsonb field


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-19 14:04:14 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now quantile column only support not nullable
2. add up some regression test cases
3. set default enable_quantile_state_type = true
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
2023-03-13 15:12:42 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
f9baf9c556 [improvement](scan) Support pushdown execute expr ctx (#15917)
In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. scan process:

Read part of the column first, and calculate the row ids with a simple push-down predicate.
Use row ids to read the remaining columns and pass them to the scanner, and the scanner filters the remaining predicates.
This pr will also push-down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. scan process:

Read part of the column first, and use the push-down simple predicate to calculate the row ids, (same as above)
Use row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
Use row ids to read the remaining columns and pass them to the scanner.
2023-03-10 08:35:32 +08:00
caacee253d [fix](olap)Crashing caused by IS NULL expression (#17463)
Issue Number: close #17462
2023-03-07 15:32:52 +08:00
94e9a226a6 [Bug](Block compression) Fix bug if config::compress_rowbatches=false then the block column values could be empty (#17325) 2023-03-03 10:31:12 +08:00