And Refactor ColumnRangeValue and OlapScanNode
This patch mainly do the following:
- Fix issue #5071
- Change type_min in ColumnRangeValue as static
- Add Class of type_limit make code clear
- Refactor the function of normalize_in_and_eq_predicate
issue:#5031
1. Support ODBC Sink for insert into data to ODBC external table.
2. Support Transaction for ODBC sink to make sure insert into data is atomicital.
3. The document about ODBC sink has been modified
The close method of OlapTabletSink may be called twice.
In the open_internal() method of plan_fragment_executor, close is called once.
If an error occurs in this call, it will be called again in fragment_mgr.
So here we use a flag to prevent repeated close operations.
Co-authored-by: morningman <chenmingyu@baidu.com>
1. Add metadata table 'statistics' to store index information;
2. In the header information returned by mysql, the data type length is returned according to the actual type.
When a parquet file contains a `Map/List/Struct` structure, Doris can not recognize the column correctly,
and throws exception 'Invalid column: xxxx', that means Doris can not find the column.
The `Map` structure will be recognized into two columns: `key and value`.
The follow is the schema of a parquet file recognized by Doris. This patch tries to solve this problem.
When a user tries to load parquet file into Doris, like this path: `hdfs://hadoop/user/data/date=20201024/*`,
but acturally the path contains some none parquet files,the error is throwed
`Couldn't deserialize thrift: No more data to read.\\nDeserializing page header failed.`.
If the error message includes the file name information, we can quickly locate the errors.
Therefore, this patch try to add the file name to the error message.
A large number of small segment files will lead to low efficiency for scan operations.
Multiple small files can be merged into a large file by compaction operation.
So we could take the tablet scan frequency into consideration when selecting an tablet for compaction
and preferentially do compaction for those tablets which are scanned frequently during a
latest period of time at the present.
Using the compaction strategy of Kudu for reference, scan frequency can be calculated
for tablet during a latest period of time and be taken into consideration when calculating compaction score.
mainly includes:
- `OLAP_SCAN_NODE` profile layering: `OLAP_SCAN_NODE`,`OlapScanner`, and `SegmentIterator`.
- Delete meaningless statistical values. mainly in scan_node.cpp.
- Increase `RowsConditionsFiltered` statistical, split from `RowsDelFiltered`, the meaning is the number of rows filtered by various column indexes, only in segment V2.
- Modify the document based on the above, and enhance readability.
1. Support limit clause push down both odbc table and mysql table.
2. Code refactor of ODBC Scan Node, change `build_connect_string` and `query_string` from BE to FE to make it easily to modify
Describe the bug
DATA_TYPE in information_schema.columns is not compatible to mysql meta
To Reproduce
Steps to reproduce the behavior:
select * from information_schema.columns
Expected behavior
the result of data_type is (int, decimal, char, varchar, ...),but doris data_type is (int(11), varchar(20), ...)
Excess number will affect some BI systems or upper system can't get right type
BE can not graceful exit because some threads are running in endless
loop. This patch do the following optimization:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
and std::vector<std::thread>
- Use CountDownLatch in thread's loop condition to avoid endless loop
- Introduce a new class Daemon for daemon works, like tcmalloc_gc,
memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
by submit tasks to TaskWorkerPool's queue
- Reorder objects' stop and deconstruct in main(), i.e. stop network
services at first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
then EvHttpServer can exit gracefully in multi-threads
- Call brpc::Server's Stop() and ClearServices() explicitly
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
1. Support convert(expr, target_type) function, which is same as CastExpr
2. Support cast (expr as signed/unsigned int)
This is just for compatibility, the signed/unsigned specification is meaningless.
1. Fix core bug wild pointer in PlanFragmentExecutor, fix issue #4447
2. Fix core bug wild pointer json load, fix issue #4452
3. Change the declare order of ODBC type in thrift for compatibility
1. The base column of bitmap_union could must be integer. The largeint is not supported too.
2. The base column of hll_union could not be decimal.
Check error msg of const expr in Union Node
If user wants to insert a negative number into bitmap mv, Doris will thrown exception 'invalid input'.
The const value in Union Node is checked in this commit.
* Implements the grammar of the batch delete #4051
* Process create, alter table when table has delete sign column
* Support the syntax for enabling the delete column
* Automatically filtered deleted data in the select statement.
* Automatically add delete sign when create rollup table
TODO:
* Optimize the reading and compaction logic on the be side, so that the data marked as deleted will be completely deleted during base compaction
```
SELECT *
FROM
(SELECT cs_order_number,
cs_warehouse_sk
FROM catalog_sales
WHERE cs_order_number = 125005
AND cs_warehouse_sk = 4) cs1
LEFT SEMI JOIN
(SELECT cs_order_number,
cs_warehouse_sk
FROM catalog_sales
WHERE cs_order_number = 125005) cs2
ON cs1.cs_order_number = cs2.cs_order_number
AND cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk;
```
The above query has an equal predicate and a not equal predicate.
If there exists not equal preidcate, the build table should be remained
as it is. So the deduplication should be removed.
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
After PR: #4135, If a mem tracker has parent, it should be created by 'CreateTracker'.
So I removed other unused constructors.
And also fix the bug described in #4344
Doris use HashTable to implement except.
If user send A except B except C, first do A except B and then except C.
After A except B, HashTable will be rebuild.
There is a bug here to throw some rows.