Commit Graph

55 Commits

Author SHA1 Message Date
25358564ca [Fix](compile) Fix gcc compile on master (#33864)
This is imported by #33511. wrongly used

ColumnStr<T> ();

which violate C++20 standard(see https://wg21.cmeerw.net/cwg/issue2237) but still supported by clang up until now(see llvm/llvm-project#58112)
2024-04-19 23:41:37 +08:00
5b616da543 [refine](Operator) When _stop_emplace_flag is not set to true, perform batch processing on the block. (#33173) 2024-04-17 23:42:12 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
add -Wconversion and remove some unused code
2024-04-10 15:26:08 +08:00
Pxl
e4993a19e5 [Chore](column) remove ColumnVectorHelper (#33036)
remove ColumnVectorHelper
2024-04-10 11:56:41 +08:00
f2a38e6345 [chore](columns) remove update_hashes_with_value for SipHash (#31224) 2024-02-22 13:01:48 +08:00
Pxl
bb4575a392 [Improvement](join) optimization for build_side_output_column (#30826)
optimization for build_side_output_column
2024-02-19 17:22:03 +08:00
8ff8d94697 [fix](ip) change IPv6 to little-endian byte order storage (like IPv4) (#30730) 2024-02-05 21:56:57 +08:00
82aa304706 [Opt](exec) opt the repeat node code (#30683) 2024-02-01 23:14:14 +08:00
221308f78a [fix](datatype) fix bugs for IPv4/v6 datatype and add some basic regression test cases (#30261) 2024-01-31 23:53:39 +08:00
9ebacb1faa [fix](expr) fix performance problem caused by too many virtual function call (#28508) 2023-12-18 12:01:55 +08:00
81a0f8c041 [Feature](function) support generating const values from tvf numbers (#28051)
If specified, got a column of constant. otherwise an incremental series like it always be.

mysql> select * from numbers("number" = "5", "const_value" = "-123");
+--------+
| number |
+--------+
|   -123 |
|   -123 |
|   -123 |
|   -123 |
|   -123 |
+--------+
5 rows in set (0.11 sec)
2023-12-07 22:26:43 +08:00
Pxl
e3d2425d47 [Improvement](join) remove insert_indices_from_join and special judge for -1 (#27779)
remove insert_indices_from_join and special judge for -1
2023-12-04 11:03:22 +08:00
Pxl
d969047b50 [Refactor](join) refactor of hash join (#27557)
Improve the performance under the tpch data set by reconstructing the join related code and the use of hash table

Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
2023-11-28 19:46:00 +08:00
fe7ff6f113 [Opt](functions) Opt tvf number for performance regression framework (#27582)
Opt tvf number for performance regression framework
2023-11-28 10:43:51 +08:00
3ad865fef9 [refactor](storage) Expressing the types of computation layer and storage layer in PrimitiveTypeTraits (#26191) 2023-11-15 21:34:49 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
44b51bf0b9 [Feature](Variant) support variant load (#26572) 2023-11-08 00:37:57 -06:00
Pxl
642c149e6a remove datetime_value and move vecdatetime_value to doris namespace (#25695)
remove datetime_value and move vecdatetime_value to doris namespace
2023-10-20 22:08:17 +08:00
b964ab76b3 [refactor](shuffle) Simplify hash partitioning strategy (#25596) 2023-10-19 19:28:22 +08:00
Pxl
f4e2eb6564 remove unused code and adjust clang-tidy checks (#25405)
remove unused code and adjust clang-tidy checks
2023-10-13 16:27:37 +08:00
Pxl
1a0344df16 [Improvement](hash) refactor of hash map context (#24966)
refactor of hash map context
2023-10-12 18:10:21 +08:00
913282b29b [refactor](column) remove get_data_type in IColumn (#25242) 2023-10-10 20:27:15 +08:00
53b46b7e6c [FIX](filter) update for filter_by_select logic (#25007)
this pr is aim to update for filter_by_select logic and change delete limit

only support scala type in delete statement where condition
only support column nullable and predict column support filter_by_select logic, because we can not push down non-scala type to storage layer to pack in predict column but do filter logic
2023-10-09 21:27:40 +08:00
4de3df6a46 [refactor](column) remove unused method and column definitions (#25152)
remove unused method and column definitions
using primitive type in predicate column to check datev1 and datev2
2023-10-09 17:14:35 +08:00
Pxl
9451382428 [Improvement](aggregate) optimization for AggregationMethodKeysFixed::insert_keys_into_columns (#22216)
optimization for AggregationMethodKeysFixed::insert_keys_into_columns
2023-07-26 16:19:15 +08:00
b35cfc5d5e [opt](join) Opt the performance of join probe (#21845) 2023-07-19 01:21:22 +08:00
d0eb4d7da3 [Improve](hash-fun)improve nested hash with range #21699
Issue Number: close #xxx

when cal array hash, elem size is not need to seed hash
hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
                                                   sizeof(elem_size), hash);
but we need to be care [[], [1]] vs [[1], []], when array nested array , and nested array is empty, we should make hash seed to
make difference
2. use range for one hash value to avoid virtual function call in loop.
which double the performance. I make it in ut

column: array[int64]
50 rows , and single array has 10w elements
2023-07-11 14:40:40 +08:00
b7d6a70868 [FIX](datatype) Implement hash func with array/map/struct type (#21334)
we do not Implement any hash functions in array/map/struct column , so we use sql like this will make be core

select * from (
        select
            bdp.nc_num,
            collect_list(distinct(bd.catalog_name)) as catalog_name,
            material_qty
        from
            dataease.bu_delivery_product bdp
            left join dataease.bu_trans_transfer btt on bdp.delivery_product_id = btt.delivery_product_id
            left join dataease.bu_delivery bd on bdp.delivery_id = bd.delivery_id
        where
            bd.val_status in ('10', '20', '30', '90')
            and bd.delivery_type in (0, 1, 2)
        group by
            nc_num,
            material_qty
        union
        ALL
        select
            bdp.nc_num,
            collect_list(distinct(bd.catalog_name)) as catalog_name,
            material_qty
        from
            dataease.bu_trans_transfer btt
            left join dataease.bu_delivery_product bdp on bdp.delivery_product_id = btt.delivery_product_id
            left join dataease.bu_delivery bd on bdp.delivery_id = bd.delivery_id
        where
            bd.val_status in ('10', '20', '30', '90')
            and bd.delivery_type in (0, 1, 2)
        group by
            nc_num,
            material_qty
) aa;
core :
2023-06-30 17:11:35 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
f38e00b4c0 [refactor](typesystem) using typeindex to create column instead of type name because type name is not stable (#18328)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-09 18:08:31 +08:00
08adf914f9 [improvement](vec) avoid creating a new column while filtering mutable columns (#16850)
Currently, when filtering a column, a new column will be created to store the filtering result, which will cause some performance loss。 ssb-flat without pushdown expr from 19s to 15s.
2023-02-21 09:47:21 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
e279c90965 [fix](ColumnVector) ColumnVector::insert_date_column crashed #14839
ColumnVector::insert_date_column make BE crashed with large data(>512 rows).


Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-12-06 09:06:57 +08:00
cd3450bd9d [Improvement](join) optimize join probing phase (#13357) 2022-10-18 12:37:17 +08:00
1cd4e5cec6 refractor insert_xxx functions (#13088)
As mentioned in #13074, there will be some problem in ColumnVector<int>::insert_many_in_copy_way.
Column::insert_xxx functions will append some data, they should reserve or resize before append data.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-10 11:54:27 +08:00
3cfaae0031 [Improvement](sort) Use heap sort to optimize sort node (#12700) 2022-09-21 10:01:52 +08:00
35b97a5af0 [Opt](hash) Speed up insert from dict data map and not datetime (#12670)
Speed up dict data read and not datetime. same target #12636
2022-09-17 17:02:43 +08:00
e413a2b8e9 [Opt](vectorized) Use new way to do hash shffle to speed up query (#12586) 2022-09-15 11:08:04 +08:00
d913ca5731 [Opt](vectorized) Speed up bucket shuffle join hash compute (#12407)
* [Opt](vectorized) Speed up bucket shuffle join hash compute
2022-09-13 20:19:22 +08:00
66491ec137 [Improvement](sort) improve partial sort algorithm (#12349)
* [Improvement](sort) improve partial sort algorithm
2022-09-09 15:44:18 +08:00
54d1630c42 [Opt](vectorized) speed up hash function compute in hash partition (#12334)
After do the opt of hash function, the compute of siphash in HASH_PARTITION in vdata_stream_sender

Before: 1s800ms
After: 800ms
2022-09-07 10:11:40 +08:00
922b04fdc1 [Improvement](vectorized) change static_cast to assert_cast for reference (#12379)
* [Improvement](vectorized) change `static_cast` to `assert_cast` for reference
2022-09-07 09:27:13 +08:00
90fb3b7783 [Improvement](load) accelerate tablet sink (#12174) 2022-09-01 10:08:09 +08:00
b7c9007776 [improvement][agg]Process aggregated results in the vectorized way (#11084) 2022-07-22 22:04:43 +08:00
e293fbd277 [improvement]pre-serialize aggregation keys (#10700) 2022-07-09 06:21:56 +08:00
d73f170eeb [optimize](storage)optimize date in storage layer (#8967)
* opt date in storage

* code style

Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-06-23 12:29:10 +08:00
Pxl
681f960257 [fix](storage)(vectorized) query get wrong result when read datetime type column (#8872) 2022-04-18 19:34:06 +08:00
d711d64dda [fix](vectorization)Some small fix for SegmentIter Vectorization (#8267)
1. No longer using short-circuit to evaluate date type, because the cost of read date type is small,
    lazy materialization has higher costs.
2. Fix read hll/bitmap/date type error results.
2022-03-08 13:13:17 +08:00
50864aca7d [refactor] fix warings when compile with clang (#8069) 2022-02-19 11:29:02 +08:00
b9f0b5565c [refactor](storage) refactor some interfaces of storage layer column (#8064)
1 format binary plain
2 remove batch_set_null_bitmap
3 fix segiter return value
4 set insert_many_binary_data args
2022-02-18 10:54:51 +08:00