307 Commits

Author SHA1 Message Date
0a0e46fd53 [Bug] Fix the bug where the condition a in ('A', 'B', 'V') and a in ('A') returns a wrong result (#5072)
Also refactor ColumnRangeValue and OlapScanNode.

This patch mainly does the following:
- Fix issue #5071
- Make type_min in ColumnRangeValue static
- Add a type_limit class to make the code clearer
- Refactor the function normalize_in_and_eq_predicate
2020-12-15 09:29:10 +08:00
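The condition named in 0a0e46fd53 is a conjunction of two IN predicates on the same column, so the correct result is defined by the intersection of the two value lists. The following Python sketch only illustrates that semantics. `intersect_in_predicates` and `evaluate` are hypothetical helpers for illustration, not the `normalize_in_and_eq_predicate` code in Doris.

```python
def intersect_in_predicates(*value_lists):
    """Combine `col IN (...) AND col IN (...)` into one set of allowed values.

    Hypothetical helper: models the predicate semantics only.
    """
    if not value_lists:
        raise ValueError("at least one IN list is required")
    allowed = set(value_lists[0])
    for values in value_lists[1:]:
        # A conjunction of IN predicates keeps only values present in every list.
        allowed &= set(values)
    return allowed


def evaluate(column_values, *value_lists):
    """Return the column values that satisfy every IN predicate."""
    allowed = intersect_in_predicates(*value_lists)
    return [value for value in column_values if value in allowed]
```

For the condition in the commit title, `intersect_in_predicates(['A', 'B', 'V'], ['A'])` leaves only `'A'` as an allowed value.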
90e7f7005e [Bug] Fix bug that querying multiple MySQL external tables with union returns an incomplete result (#5067)
The `eos` flag should be reset to false after opening the next child of the union node.
2020-12-15 09:28:39 +08:00
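The `eos` reset described in 90e7f7005e can be sketched as follows. This is a simplified Python model under stated assumptions, not the Doris union node: `ListChild` and `union_all` are hypothetical names, and each child is reduced to a list of row batches.

```python
class ListChild:
    """Hypothetical stand-in for a child node that returns fixed row batches."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._pos = 0

    def open(self):
        self._pos = 0

    def get_next(self):
        """Return (batch, eos); eos is True once the last batch was returned."""
        if self._pos >= len(self._batches):
            return [], True
        batch = self._batches[self._pos]
        self._pos += 1
        return batch, self._pos >= len(self._batches)


def union_all(children):
    """Read every child to the end and concatenate the rows."""
    rows = []
    eos = False
    for child in children:
        child.open()
        # Reset eos after opening the next child. Without this reset, the
        # eos=True left over from the previous child would skip the loop
        # below and the remaining children would contribute no rows.
        eos = False
        while not eos:
            batch, eos = child.get_next()
            rows.extend(batch)
    return rows
```

Removing the `eos = False` line inside the loop reproduces the incomplete result in this model: only the first child is read.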
193db4207e [enhancement] Improve performance of json load (#5055)
* improve performance of json load
2020-12-15 09:27:51 +08:00
ff4bd1223f [Profile] Add cpu time cost in query audit (#5051) 2020-12-13 22:22:15 +08:00
115d4332aa [ODBC] Support ODBC sink for inserting data into ODBC external tables (#5033)
issue: #5031

1. Support ODBC sink for inserting data into ODBC external tables.
2. Support transactions for the ODBC sink to make sure that inserting data is atomic.
3. Update the document about the ODBC sink.
2020-12-13 21:53:27 +08:00
ca9e5c4785 [Bug] Add a flag to prevent repeated close operation of OlapTabletSink (#5034)
The close method of OlapTabletSink may be called twice.
In the open_internal() method of plan_fragment_executor, close is called once.
If an error occurs in this call, it will be called again in fragment_mgr.
So here we use a flag to prevent repeated close operations.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-12-09 09:30:09 +08:00
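The flag described in ca9e5c4785 makes `close` safe to call more than once. A minimal Python sketch of that pattern follows; the `Sink` class and its `release_count` field are hypothetical and only model the idea, not the OlapTabletSink code.

```python
class Sink:
    """Hypothetical sink whose close() is safe to call repeatedly."""

    def __init__(self):
        self._is_closed = False
        self.release_count = 0  # counts how often resources were released

    def close(self):
        if self._is_closed:
            # A second call (for example from an error path) does nothing.
            return
        self._is_closed = True
        self.release_count += 1
```

With the flag in place, a second `close()` from an error-handling path returns immediately instead of releasing resources again.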
b9dabc3b5b [Enhance] Push down predicate on value column of unique table to base rowset (#5022) 2020-12-06 08:50:37 +08:00
6021d6fc7f [Performance Optimization] Remove push down conjuncts in olap scan node (#4999)
Push conjuncts down to the storage engine as much as possible.

The olap scan node does not need to filter data with the pushed-down conjuncts again.

fix #4986
2020-12-06 08:50:08 +08:00
b954dfd82d [Bug] Fix the bug that json load of Largeint and Decimal fails (#4983)
Use the json load parameter "num_as_string" to parse json data with the kParseNumbersAsStringsFlag flag.
2020-12-06 08:49:30 +08:00
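The idea behind "num_as_string" in b954dfd82d is to keep numeric tokens as their original text so that no digits are lost before conversion to a wide integer or decimal type. The Python sketch below shows the same idea with the standard `json` module, whose `parse_int` and `parse_float` hooks receive the raw number text. It is an analogy only: `parse_numbers_as_strings` is a hypothetical helper and the sample document is made up.

```python
import json
from decimal import Decimal


def parse_numbers_as_strings(text):
    """Parse JSON but keep every number as its original string."""
    # parse_int and parse_float are called with the number's source text,
    # so passing str returns that text unchanged.
    return json.loads(text, parse_int=str, parse_float=str)


def to_decimal(number_text):
    """Convert the preserved text to an exact decimal value."""
    return Decimal(number_text)
```

Parsing the same decimal through a binary float first would round it, which is the kind of precision loss the string path avoids.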
1ae6de7117 [Enhance] Add "statistics" meta table and fix some mysql compatibility problem (#4991)
1. Add metadata table 'statistics' to store index information;
2. In the header information returned by mysql, the data type length is returned according to the actual type.
2020-12-03 09:38:18 +08:00
af06adb57f [Doris On ES][Bug-fix] fix boolean predicate pushdown manner (#4990)
Correctly handle `boolean` field predicates by setting the predicate value to `true`, `false`, or `empty set` for DOE
2020-12-02 10:13:13 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
2331ce10f1 [Bug]Parquet map/list/struct structure recognize (#4968)
When a parquet file contains a `Map/List/Struct` structure, Doris can not recognize the column correctly
and throws the exception 'Invalid column: xxxx', which means Doris can not find the column.
The `Map` structure will be recognized as two columns: `key` and `value`.
The following is the schema of a parquet file recognized by Doris. This patch tries to solve this problem.
2020-11-28 09:56:29 +08:00
cb749ce51d [Improvement] Add parquet file name to the error message (#4954)
When a user tries to load parquet files into Doris from a path like `hdfs://hadoop/user/data/date=20201024/*`,
but the path actually contains some non-parquet files, the following error is thrown:
`Couldn't deserialize thrift: No more data to read.\\nDeserializing page header failed.`.
If the error message includes the file name information, we can quickly locate the errors.
Therefore, this patch tries to add the file name to the error message.
2020-11-28 09:54:18 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
6247408689 [Compact] Take tablet scan frequency into consideration when selecting tablet for compaction (#4837)
A large number of small segment files leads to low efficiency for scan operations.
Multiple small files can be merged into a large file by a compaction operation.
So we can take the tablet scan frequency into consideration when selecting a tablet for compaction
and preferentially compact those tablets that have been scanned frequently during the
most recent period of time.

Using the compaction strategy of Kudu for reference, the scan frequency of a tablet can be calculated
over the most recent period of time and taken into consideration when calculating the compaction score.
2020-11-18 21:51:12 +08:00
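One way to read the selection rule in 6247408689 is as a score that adds a weighted scan frequency to the existing compaction score. The Python sketch below is only an illustration under that assumption: the additive formula, the `scan_frequency_factor` parameter, and the `Tablet` fields are all hypothetical and are not taken from the Doris implementation.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    tablet_id: int
    compaction_score: float  # assumed to grow with the number of small files
    scan_frequency: float    # assumed: scans seen in the most recent period


def pick_tablet_for_compaction(tablets, scan_frequency_factor=0.0):
    """Pick the tablet with the highest combined score.

    Assumed formula: compaction_score + factor * scan_frequency.
    A factor of 0 ignores scan frequency entirely.
    """
    def combined_score(tablet):
        return tablet.compaction_score + scan_frequency_factor * tablet.scan_frequency

    return max(tablets, key=combined_score)
```

With a factor of 0 the choice depends on the compaction score alone. A positive factor lets a frequently scanned tablet win over one with a slightly higher base score.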
448df42fb0 [Compatibility] Add table_privileges, schema_privileges and user_privileges tables(#4899)
Add privileges tables in information_schema database
2020-11-16 21:58:30 +08:00
e706a6bca4 [Doc] Running Profile document add HASH_JOIN_NODE, etc. (#4878)
- Add `HASH_JOIN_NODE`, `CROSS_JOIN_NODE`, `UNION_NODE`, and `ANALYTIC_EVAL_NODE` to the Running Profile document.
- Add the `MaterializeExprsEvaluateTime` profile to `UNION_NODE`.
2020-11-16 21:53:25 +08:00
18a22bd347 [BUG] Fix field error in information_schema.columns (#4858) 2020-11-15 22:01:32 +08:00
e9923100f2 [Profile][UT] Fix UT and remove useless profile (#4879)
Fix the UT failure caused by #4825 and remove useless profile
2020-11-12 16:28:57 +08:00
66132d2836 [Feature] Running Profile OLAP_SCAN_NODE layering and enhance readability (#4825)
Mainly includes:
- `OLAP_SCAN_NODE` profile layering: `OLAP_SCAN_NODE`, `OlapScanner`, and `SegmentIterator`.
- Delete meaningless statistical values, mainly in scan_node.cpp.
- Add the `RowsConditionsFiltered` statistic, split from `RowsDelFiltered`. It is the number of rows filtered by various column indexes and exists only in segment V2.
- Modify the document based on the above, and enhance readability.
2020-11-11 21:21:25 +08:00
b1c1ffda4a [Refactor] Refactor olap scan node code (#4823)
1. Remove meaningless code in Doris
2. Replace string copy by string reference
3. Simplified the implementation of some functions
2020-11-01 09:12:23 +08:00
44498a1ae2 [Compatibility] Add table "views" in information_schema database (#4778)
To support some tools like DBeaver
2020-10-30 11:44:44 +08:00
7b2762b1b1 [Doris On ES][Bug-Fix] Can not push down limit when some predicates can not be processed by ES (#4768)
Limit can not be pushed down when some predicates are not processed by ES, fixes #4761
2020-10-21 12:10:55 +08:00
349cc9ef17 [Bug] Do not push down limit operation when the ODBC table does not push all conjuncts as filters. (#4764) 2020-10-21 10:12:12 +08:00
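Commits 7b2762b1b1 and 349cc9ef17 share one rule: a limit may be sent to the external source only if every conjunct is also sent. Otherwise the source could return its first N rows and the remaining local filters could then discard some of them, leaving fewer than N rows. A minimal Python sketch of that rule follows; `plan_remote_scan` and the `can_push` callback are hypothetical names, not Doris APIs.

```python
def plan_remote_scan(conjuncts, can_push, limit):
    """Split conjuncts into pushed and local ones, and decide on the limit.

    Returns (pushed, remaining, remote_limit). remote_limit is None when
    the limit must be applied locally after the remaining filters.
    """
    pushed = [c for c in conjuncts if can_push(c)]
    remaining = [c for c in conjuncts if not can_push(c)]
    # The limit is only safe at the source when no filter is left to run locally.
    remote_limit = limit if not remaining else None
    return pushed, remaining, remote_limit
```

When `remote_limit` is `None`, the limit still applies, but only after the local filters have run.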
09f97f8a05 [Refactor] Fixes some be typo part 2 (#4747) 2020-10-20 09:28:57 +08:00
83f6f46c34 [Config] Limit the version number of tablet (#4687)
Add a BE config `max_tablet_version_num` to limit the version number of a single tablet.
To avoid too many versions
2020-10-13 10:08:16 +08:00
75e0ba32a1 Fixes some be typo (#4714) 2020-10-13 09:37:15 +08:00
d73d205de7 [ODBC/MySQL] Support Limit Clause Push Down For ODBC Table And MySQL Table(#4706) (#4707)
1. Support limit clause push down for both ODBC tables and MySQL tables.
2. Refactor the ODBC Scan Node code: move `build_connect_string` and `query_string` from BE to FE to make them easier to modify.
2020-10-11 21:11:04 +08:00
1dacadb015 [BUG] Fix DATA_TYPE in information_schema.columns is not compatible to mysql meta (#4648)
Describe the bug

DATA_TYPE in information_schema.columns is not compatible to mysql meta

To Reproduce
Steps to reproduce the behavior:
select * from information_schema.columns

Expected behavior
The result of data_type should be (int, decimal, char, varchar, ...), but Doris data_type is (int(11), varchar(20), ...).
The extra length numbers affect some BI systems, and upper-layer systems can't get the right type.
2020-09-25 13:38:09 +08:00
a61d0de173 [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs (#4438) 2020-09-25 10:19:50 +08:00
9419c73472 [Bug] Fix bug that BE will crash when querying information_schema.columns (#4595) 2020-09-14 15:47:08 +08:00
b780df697a [refactor] Optimize threads usage mode in BE (#4440)
BE can not exit gracefully because some threads are running in endless
loops. This patch makes the following optimizations:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
  and std::vector<std::thread>
- Use CountDownLatch in thread's loop condition to avoid endless loop
- Introduce a new class Daemon for daemon works, like tcmalloc_gc,
  memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
  by submitting tasks to TaskWorkerPool's queue
- Reorder objects' stop and destruction in main(), i.e. stop network
  services first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
  then EvHttpServer can exit gracefully in multi-threads
- Call brpc::Server's Stop() and ClearServices() explicitly
2020-09-06 20:19:14 +08:00
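The "latch in the loop condition" item in b780df697a can be illustrated with a stop signal that doubles as the sleep between iterations. The Python sketch below uses `threading.Event` as a stand-in for CountDownLatch. The `Daemon` class here is a hypothetical illustration and not the Daemon class added to BE.

```python
import threading


class Daemon:
    """Hypothetical periodic worker that can be stopped promptly."""

    def __init__(self, work, interval_seconds):
        self._work = work
        self._interval = interval_seconds
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def _run(self):
        # wait() returns False on timeout (run the work again) and True
        # once stop() has set the event, which ends the loop.
        while not self._stop_event.wait(self._interval):
            self._work()

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def is_alive(self):
        return self._thread.is_alive()
```

Because the wait is also the loop condition, `stop()` wakes the thread immediately instead of waiting for a full sleep interval, and `join()` can return.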
068707484d Support sequence column for UNIQUE_KEYS Table (#4256)
* add sequence  col

Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>
2020-09-04 10:10:17 +08:00
d7ac44ac79 [Bug] Fix bug that BE will crash when querying information_schema.columns (#4511)
This bug is introduced from #4364
2020-09-03 16:57:56 +08:00
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspots of a cluster, for example, hot scanned tablets and hot written tablets,
but we have no insight into the tablets in the cluster.
This patch introduces tablet level metrics to help achieve this goal. It now supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`.
However, one BE may hold hundreds of thousands of tablets, so I add a parameter to the metrics HTTP request
and do not return tablet level metrics by default.
2020-09-02 10:39:41 +08:00
f218327dd9 [Mysql Compatibility] Support convert() and signed/unsigned integer cast (#4364)
1. Support convert(expr, target_type) function, which is same as CastExpr
2. Support cast (expr as signed/unsigned int)
   This is just for compatibility, the signed/unsigned specification is meaningless.
2020-08-27 12:07:58 +08:00
8b0b120aca [Profile] Add 2 Segment related metrics in query profile (#4348)
Total number of segments and number of filtered segments
2020-08-27 12:07:21 +08:00
e4e9af4577 This PR contains three things (#4448)
1. Fix core bug caused by a wild pointer in PlanFragmentExecutor, fix issue #4447
2. Fix core bug caused by a wild pointer in json load, fix issue #4452
3. Change the declaration order of ODBC type in thrift for compatibility
2020-08-26 10:53:53 +08:00
97d963468a [Code Cleanup] Template nest convert to c++11 syntax and style (#4442) 2020-08-26 10:51:52 +08:00
b4d8b3d9ba Forbid illegal column types on BITMAP_UNION or HLL_UNION mv (#4432)
1. The base column of bitmap_union must be an integer type. Largeint is not supported either.
2. The base column of hll_union can not be decimal.

Check error msg of const expr in Union Node

If a user wants to insert a negative number into a bitmap mv, Doris will throw the exception 'invalid input'.
The const value in Union Node is checked in this commit.
2020-08-26 10:49:32 +08:00
d61c10b761 [Delete] Support batch delete [part 1] (#4310)
* Implement the grammar of the batch delete #4051
* Process create and alter table when the table has a delete sign column
* Support the syntax for enabling the delete column
* Automatically filter deleted data in the select statement
* Automatically add the delete sign when creating a rollup table
TODO:
 * Optimize the reading and compaction logic on the BE side, so that the data marked as deleted will be completely deleted during base compaction
2020-08-21 22:57:16 +08:00
5976395bb6 [BUG] Remove the deduplication of LEFT SEMI/ANTI JOIN with not equal predicate (#4417)
```
SELECT *
FROM
  (SELECT cs_order_number,
          cs_warehouse_sk
   FROM catalog_sales
   WHERE cs_order_number = 125005
     AND cs_warehouse_sk = 4) cs1
LEFT SEMI JOIN
  (SELECT cs_order_number,
          cs_warehouse_sk
   FROM catalog_sales
   WHERE cs_order_number = 125005) cs2
ON cs1.cs_order_number = cs2.cs_order_number
AND cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk;
```

The above query has an equal predicate and a not equal predicate.
If there is a not equal predicate, the build table should remain
as it is. So the deduplication should be removed.
2020-08-21 19:55:09 +08:00
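Why deduplication is unsafe for the query in 5976395bb6 can be shown with a small model. The Python sketch below assumes that the removed deduplication kept one build row per equality key. That assumption, the `left_semi_join` helper, and the sample rows are all illustrative and not taken from the Doris hash join code.

```python
def left_semi_join(left, right, dedup_build_side):
    """LEFT SEMI JOIN on order_number with warehouse_sk <> as extra predicate.

    Rows are (cs_order_number, cs_warehouse_sk) tuples.
    """
    build = {}
    for order_number, warehouse_sk in right:
        bucket = build.setdefault(order_number, [])
        if dedup_build_side and bucket:
            # Assumed old behavior: keep only the first row per equality key.
            continue
        bucket.append(warehouse_sk)
    return [
        (order_number, warehouse_sk)
        for order_number, warehouse_sk in left
        # A left row matches if ANY build row with the same key differs
        # in warehouse_sk, so every build row has to be kept.
        if any(warehouse_sk != other for other in build.get(order_number, []))
    ]
```

In this model, if the one row kept per key happens to have the same `cs_warehouse_sk` as the probe row, the match is lost even though another build row with a different value exists.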
b6859f1bd4 [JsonLoad] Fix bug that row num stat is not correct when loading json (#4379)
When all fields are null, the row is invalid and should be filtered
2020-08-20 09:30:19 +08:00
4c571cb6f5 Revert "[Metrics] Support tablet level metrics (#4327)" (#4397)
This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-19 22:37:52 +08:00
56260a65c8 [Metrics] Support tablet level metrics (#4327)
Sometimes we want to detect the hotspots of a cluster, for example, hot scanned tablets and hot written tablets,
but we have no insight into the tablets in the cluster.
This patch introduces tablet level metrics to help achieve this goal. It now supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`.
However, one BE may hold hundreds of thousands of tablets, so I add a parameter to the metrics HTTP request
and do not return tablet level metrics by default.
2020-08-18 16:56:12 +08:00
e25108097d [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345)
After PR #4135, if a mem tracker has a parent, it should be created by 'CreateTracker'.
So I removed the other unused constructors.

And also fix the bug described in #4344
2020-08-18 16:54:55 +08:00
d5e456a3c3 [BUG] Fix except wrong answer bug (#4369)
Doris uses a HashTable to implement except.
If the user sends A except B except C, Doris first does A except B and then except C.
After A except B, the HashTable will be rebuilt.
There is a bug here that drops some rows.
2020-08-18 09:23:48 +08:00
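The evaluation order described in d5e456a3c3 can be modeled as a table of distinct rows that is rebuilt from the survivors after each step. The Python sketch below shows the expected semantics of a chained except only. `except_chain` is a hypothetical helper and a dict stands in for the HashTable.

```python
def except_chain(first, *others):
    """Evaluate `first EXCEPT others[0] EXCEPT others[1] ...` left to right."""
    # dict keys give distinct rows while keeping first-seen order.
    table = dict.fromkeys(first)
    for other in others:
        removed = set(other)
        # Rebuild the table from the surviving rows before the next EXCEPT.
        # Every survivor must be carried over, or rows go missing.
        table = {row: None for row in table if row not in removed}
    return list(table)
```

A correct rebuild keeps every row of A that appears in neither B nor C, each exactly once.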
391d534ae7 [Bug]Fix bug that BE crash when load ORC file (#4350) 2020-08-17 22:55:29 +08:00
c81862ebec Remove palo::PInternalService_Stub in BE code. (#4298)
We can remove the caller on the sender side.
After all nodes are upgraded, we can remove the callee
on the receiver side.
2020-08-10 09:46:17 +08:00