If const_value is specified, the result is a column of that constant; otherwise it is an incremental series, as before.
mysql> select * from numbers("number" = "5", "const_value" = "-123");
+--------+
| number |
+--------+
|   -123 |
|   -123 |
|   -123 |
|   -123 |
|   -123 |
+--------+
5 rows in set (0.11 sec)
Improve performance on the TPC-H data set by restructuring the join-related code and its use of the hash table.
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
This PR updates the filter_by_select logic and changes the DELETE limits, as sketched below:
Only scalar types are supported in the WHERE condition of a DELETE statement.
Only nullable columns and predicate columns support the filter_by_select logic, because non-scalar types cannot be pushed down to the storage layer and packed into predicate columns; they have to go through the filter logic instead.
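A minimal sketch of that rule; TypeKind and can_pack_into_predicate_column are illustrative names, not the actual Doris API:

#include <cassert>

// Only scalar-typed predicates can be packed into a predicate column and
// pushed down to the storage layer; complex types stay in the filter logic.
enum class TypeKind { Scalar, Array, Map, Struct };

bool can_pack_into_predicate_column(TypeKind kind) {
    return kind == TypeKind::Scalar;
}

int main() {
    assert(can_pack_into_predicate_column(TypeKind::Scalar));  // pushed down
    assert(!can_pack_into_predicate_column(TypeKind::Array));  // filter logic
}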
Issue Number: close #xxx
1. When calculating the array hash, the element size does not need to seed the hash, so this seeding is removed:
hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
                               sizeof(elem_size), hash);
But we need to be careful about [[], [1]] vs. [[1], []]: when an array nests another array and the nested array is empty, we must still fold a marker into the hash seed to keep the two values distinct (see the sketch after item 2).
2. Use a range for one hash value to avoid a virtual function call per element in the loop, which doubles the performance. I measured this in a unit test with a column of ARRAY<INT64>: 50 rows, each array holding 100,000 elements.
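A minimal sketch of both points; fold_hash and hash_array_row are illustrative stand-ins for HashUtil::zlib_crc_hash and the real column code:

#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder byte mixer; the real code uses a CRC32-based hash.
inline uint32_t fold_hash(const void* data, size_t len, uint32_t seed) {
    const auto* p = static_cast<const uint8_t*>(data);
    uint32_t h = seed;
    for (size_t i = 0; i < len; ++i) h = h * 31 + p[i];
    return h;
}

// Hash one row of ARRAY<INT64>, given the row's element range [begin, end).
uint32_t hash_array_row(const std::vector<int64_t>& elems, size_t begin,
                        size_t end, uint32_t hash) {
    if (begin == end) {
        // Empty nested array: still perturb the seed, otherwise
        // [[], [1]] and [[1], []] would hash identically.
        const uint8_t empty_marker = 0;
        return fold_hash(&empty_marker, sizeof(empty_marker), hash);
    }
    // One bulk call over the contiguous range instead of one
    // virtual call per element in the loop.
    return fold_hash(elems.data() + begin, (end - begin) * sizeof(int64_t), hash);
}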
We do not implement any hash functions for ARRAY/MAP/STRUCT columns, so a SQL query like the following causes a core dump:
select * from (
select
bdp.nc_num,
collect_list(distinct(bd.catalog_name)) as catalog_name,
material_qty
from
dataease.bu_delivery_product bdp
left join dataease.bu_trans_transfer btt on bdp.delivery_product_id = btt.delivery_product_id
left join dataease.bu_delivery bd on bdp.delivery_id = bd.delivery_id
where
bd.val_status in ('10', '20', '30', '90')
and bd.delivery_type in (0, 1, 2)
group by
nc_num,
material_qty
union all
select
bdp.nc_num,
collect_list(distinct(bd.catalog_name)) as catalog_name,
material_qty
from
dataease.bu_trans_transfer btt
left join dataease.bu_delivery_product bdp on bdp.delivery_product_id = btt.delivery_product_id
left join dataease.bu_delivery bd on bdp.delivery_id = bd.delivery_id
where
bd.val_status in ('10', '20', '30', '90')
and bd.delivery_type in (0, 1, 2)
group by
nc_num,
material_qty
) aa;
core:
Currently there are some unnecessary includes in the codebase. We can use a tool named include-what-you-use to optimize them. Enforcing a strict include-what-you-use policy brings real benefits, such as faster compilation and clearer header dependencies.
Currently, when filtering a column, a new column is created to store the filter result, which causes some performance loss. With this change, ssb-flat without pushdown expressions improves from 19s to 15s.
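A minimal sketch of the idea, with a std::vector standing in for the column's storage (names are illustrative, not the actual column API): compact the surviving rows in place instead of materializing the result into a freshly allocated column.

#include <cstdint>
#include <vector>

void filter_in_place(std::vector<int64_t>& data, const std::vector<uint8_t>& filter) {
    size_t write_pos = 0;
    for (size_t read_pos = 0; read_pos < data.size(); ++read_pos) {
        if (filter[read_pos]) {
            data[write_pos++] = data[read_pos]; // keep row, no new allocation
        }
    }
    data.resize(write_pos); // drop the filtered-out tail
}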
Issue Number: close #16351
A dynamic schema table is a special type of table whose schema changes with the load process. We implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special kind of table reduces manual schema change operations and makes it easy to import semi-structured data and extend its schema automatically.
As mentioned in #13074, there are problems in ColumnVector<int>::insert_many_in_copy_way.
The Column::insert_xxx functions append data, so they should reserve or resize before appending.
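A minimal sketch of the fix, with a std::vector standing in for the column's storage: grow the buffer up front, then bulk-copy, so the append can never write past the current capacity.

#include <cstdint>
#include <cstring>
#include <vector>

void insert_many_in_copy_way(std::vector<int32_t>& column_data,
                             const int32_t* src, size_t num) {
    size_t old_size = column_data.size();
    column_data.resize(old_size + num);        // reserve/resize first
    std::memcpy(column_data.data() + old_size, // then bulk-copy the new data
                src, num * sizeof(int32_t));
}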
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
1. No longer use short-circuit evaluation for the date type: the cost of reading a date column is small, so lazy materialization costs more than it saves.
2. Fix wrong results when reading HLL/bitmap/date types.