doris

Author	SHA1	Message	Date
EmmyMiao87	30df9fcae9	Serialize origin stmt in Rollup Job and MV Meta (#3705 ) * Serialize origin stmt in Rollup Job and MV Meta In materialized view 2.0, the define expr is serialized in column. The method is that doris serialzie the origin stmt of Create Materialzied View Stmt in RollupJobV2 and MVMeta. The define expr will be extract from the origin stmt after meta is deserialized. The define expr is necessary for bitmap and hll materialized view. For example: MV meta: __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1) Origin stmt: select bitmap_union(to_bitmap(k1)) from table Deserialize meta: __doris_mv_bitmap_k1, bitmap_union, null After extract: the define expr `to_bitmap(k1)` from origin stmt should be extracted. __doris_mv_bitmap_v1, bitmap_union, to_bitmap(k1) (which comes from the origin stmt) Change-Id: Ic2da093188d8985f5e97be5bd094e5d60d82c9a7 * Add comment of read method Change-Id: I4e1e0f4ad0f6e76cdc43e49938de768ec3b0a0e8 * Fix ut Change-Id: I2be257d512bf541f00912a374a2e07a039fc42b4 * Change code style Change-Id: I3ab23f5c94ae781167f498fefde2d96e42e05bf9	2020-05-30 20:17:46 +08:00
Binglin Chang	5cb4063904	Fix UT ThreadPoolManagerTest failure (#3723 )	2020-05-30 10:35:07 +08:00
Yingchun Lai	43d25afa2c	[compaction] Update cumulative point calculate algorithm (#3690 ) Current cumulative point calculate algorithm may skip singleton rowset when the rowset has only one segment and with NONOVERLAPPING flag. When a tablet is new created and cumulate many singleton rowsets, cumulative point will be calculated as the max version + 1, and then cumulative compaction couldn't pick any rowsets and compaction failed, and will lead the next base compaction on this tablet with all rowsets, which can also cause memory consume problem, suppose there are thousands of rowsets. All singleton rowsets must be newly wrote by delta writer and hasn't do any compaction, we should place cumulative point before any of these rowsets.	2020-05-30 10:34:53 +08:00
Binglin Chang	7524c5ef63	[Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch (#3637 )	2020-05-30 10:33:10 +08:00
Binglin Chang	c967eaf496	[Memory Engine] Add TabletType to PartitionInfo and TabletMeta (#3668 )	2020-05-29 20:20:44 +08:00
lichaoyong	93aae6bdff	[Bug] fix mixed used of counter (#3720 ) MysqlResultWriter _sent_rows_counter and _result_send_timer are mixed used. It will results core dump when checking counter->type().	2020-05-29 15:36:21 +08:00
lichaoyong	5f1d25a31a	[Bug] Set the HttpResponseStatus for QueryProfile when query_id been not set (#3710 ) Doris can get query profile by HttpRequest ``` http://fe_host:web_port/query_profile?query_id=123456 ``` Now, if query_id is not found, the 404 error is not set in HttpHeader.	2020-05-29 10:06:43 +08:00
Dayue Gao	9c85d05e41	[Bug] RuntimeState should be destructed after DataSink (#3709 ) Fixes #3706 DataSink uses instance and query MemTracker from RuntimeState, therefore it should be destructed before RuntimeState. Otherwise memory corruption and segfault could happen.	2020-05-28 17:31:01 +08:00
worker24h	e76f712bb3	[Bug] Load data is error in json load	2020-05-28 17:28:33 +08:00
lichaoyong	8f71c7a331	Duplicate Key table core when predicate on metric column (#3699 ) ``` CREATE TABLE `query_detail` ( `query_id` varchar(100) NULL COMMENT "", `start_time` datetime NULL COMMENT "", `end_time` datetime NULL COMMENT "", `latency` int(11) NULL COMMENT "unit is milliseconds", `state` varchar(20) NULL COMMENT "RUNNING/FINISHED/FAILED", `sql` varchar(1024) NULL COMMENT "" ) DUPLICATE KEY(`query_id`) SELECT COUNT(*) FROM query_detail WHERE start_time >= '2020-05-27 14:52:16' AND start_time < '2020-05-27 14:52:31'; ``` The above query will core because of ZoneMap only in query_id. Use start_time to match ZoneMap cause this core.	2020-05-28 14:35:40 +08:00
Mingyu Chen	f89d970cfd	[Bug][Metrics] Fix bug that some of metrics can not be got (#3708 ) The metrics in a metric collector need have same type, but no need to have same unit.	2020-05-28 09:09:14 +08:00
Mingyu Chen	bc35f3a31f	[DynamicPartition] Optimize the rule of creating dynamic partition (#3679 ) Problem is described in ISSUE #3678 This CL mainly changed to rule of creating dynamic partition. 1. If time unit is DAY, the logic remains unchanged. 2. If time unit is WEEK, the logical changes are as follows: 1. Allow to set the start day of every week, the default is Monday. Optional Monday to Sunday 2. Assuming that the starting day is a Tuesday, the range of the partition is Tuesday of the week to Monday of the next week. 3. If time unit is MONTH, the logical changes are as follows: 1. Allow to set the start date of each month. The default is 1st, and can be selected from 1st to 28th. 2. Assuming that the starting date is the 2nd, the range of the partition is from the 2nd of this month to the 1st of the next month. 4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of week or month. It is recommended to refer to the example in `dynamic-partition.md` to understand. TODO: Better to support HOUR and YEAR time unit. Maybe in next PR. FIX: #3678	2020-05-27 16:42:41 +08:00
lichaoyong	1cc78fe69b	[Enhancement] Convert metric to Json format (#3635 ) Add a JSON format for existing metrics like this. ``` { "tags": { "metric":"thread_pool", "name":"thrift-server-pool", "type":"active_thread_num" }, "unit":"number", "value":3 } ``` I add a new JsonMetricVisitor to handle the transformation. It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor. Also I add 1. A unit item to indicate the metric better 2. Cloning tablet statistics divided by database. 3. Use white space to replace newline in audit.log	2020-05-27 08:49:30 +08:00
turbo jason	12c59ba889	[Thirdparty][glog][bug] convert init be log file length use fopen function (#3649 )	2020-05-26 22:42:50 +08:00
worker24h	fb66bac5fe	[Bug] Fix null pointer access in json-load (#3692 ) Add check for null pointer to avoid core dump	2020-05-26 22:41:30 +08:00
Mingyu Chen	dcd5e5df12	[AuditPlugin] Modify load label of audit plugin to avoid load confliction (#3681 ) Change the load label of audit plugin as: `audit_yyyyMMdd_HHmmss_feIdentity`. The `feIdentity` is got from the FE which run this plugin, currently just use FE's IP_editlog_port.	2020-05-26 18:23:07 +08:00
wyb	4978bd6c81	[Spark load] Add resource manager (#3418 ) 1. User interface: 1.1 Spark resource management Spark is used as an external computing resource in Doris to do ETL work. In the future, there may be other external resources that will be used in Doris, for example, MapReduce is used for ETL, Spark/GPU is used for queries, HDFS/S3 is used for external storage. We introduced resource management to manage these external resources used by Doris. ```sql -- create spark resource CREATE EXTERNAL RESOURCE resource_name PROPERTIES ( type = spark, spark_conf_key = spark_conf_value, working_dir = path, broker = broker_name, broker.property_key = property_value ) -- drop spark resource DROP RESOURCE resource_name -- show resources SHOW RESOURCES SHOW PROC "/resources" -- privileges GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name ``` - CREATE EXTERNAL RESOURCE: FOR user_name is optional. If there has, the external resource belongs to this user. If not, the external resource belongs to the system and all users are available. PROPERTIES： 1. type: resource type. Only support spark now. 2. spark configuration: follow the standard writing of Spark configurations, refer to: https://spark.apache.org/docs/latest/configuration.html. 3. working_dir: optional, used to store ETL intermediate results in spark ETL. 4. broker: optional, used in spark ETL. The ETL intermediate results need to be read with the broker when pushed into BE. Example: ```sql CREATE EXTERNAL RESOURCE "spark0" PROPERTIES ( "type" = "spark", "spark.master" = "yarn", "spark.submit.deployMode" = "cluster", "spark.jars" = "xxx.jar,yyy.jar", "spark.files" = "/tmp/aaa,/tmp/bbb", "spark.yarn.queue" = "queue0", "spark.executor.memory" = "1g", "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999", "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000", "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris", "broker" = "broker0", "broker.username" = "user0", "broker.password" = "password0" ) ``` - SHOW RESOURCES: General users can only see their own resources. Admin and root users can show all resources. 1.2 Create spark load job ```sql LOAD LABEL db_name.label_name ( DATA INFILE ("/tmp/file1") INTO TABLE table_name, ... ) WITH RESOURCE resource_name [(key1 = value1, ...)] [PROPERTIES (key2 = value2, ... )] ``` Example: ```sql LOAD LABEL example_db.test_label ( DATA INFILE ("hdfs:/127.0.0.1:10000/tmp/file1") INTO TABLE example_table ) WITH RESOURCE "spark0" ( "spark.executor.memory" = "1g", "spark.files" = "/tmp/aaa,/tmp/bbb" ) PROPERTIES ("timeout" = "3600") ``` The spark configurations in load stmt can override the existing configuration in the resource for temporary use. #3010	2020-05-26 18:21:21 +08:00
Mingyu Chen	77b9acc242	[Stmt] Add rowCount column to SHOW DATA stmt (#3676 ) User can see the row count of all materialized indexes of a table. ``` mysql> show data from test; +-----------+-----------+-----------+--------------+----------+ \| TableName \| IndexName \| Size \| ReplicaCount \| RowCount \| +-----------+-----------+-----------+--------------+----------+ \| test2 \| r1 \| 10.000MB \| 30 \| 10000 \| \| \| r2 \| 20.000MB \| 30 \| 20000 \| \| \| test2 \| 50.000MB \| 30 \| 50000 \| \| \| Total \| 80.000 \| 90 \| \| +-----------+-----------+-----------+--------------+----------+ ``` Fix #3675	2020-05-26 15:53:38 +08:00
EmmyMiao87	aa4ac2d078	[Bug] Serialize storage format in rollup job (#3686 ) The segment v2 rollup job should set the storage format v2 and serialize it. If it is not serialized, the rollup of segment v2 may use the error format 'segment v1'.	2020-05-26 15:35:12 +08:00
HappenLee	f4c03fe8e2	1. Delete the code of Sort Node we do not use now. (#3666 ) Optimize the quick sort by find_the_median and try to reduce recursion level of quick sort.	2020-05-26 10:20:57 +08:00
hffariel	963d4d48aa	Override the style of sidebar's sub-direcotry (#3683 ) Override the style of sidebar's sub-directory.	2020-05-26 09:07:55 +08:00
Mingyu Chen	3ffc447b38	[OUTFILE] Support `INTO OUTFILE` to export query result (#3584 ) This CL mainly changes: 1. Support `SELECT INTO OUTFILE` command. 2. Support export query result to a file via Broker. 3. Support CSV export format with specified column separator and line delimiter.	2020-05-25 21:24:56 +08:00
yangzhg	6788cacb94	Fix unit test failed (#3642 ) Fix some unittest failed due to glog， this may be we change the ut build dir，and the log path is not exist in new build dir， so we change the log from file to stdout	2020-05-25 18:55:19 +08:00
caiconghui	e6864a1cda	Allow user to set thrift_client_timeout_ms config for thrift server (#3670 ) 1. Allow user to set thrift_client_timeout_ms config for thrift server 2. Add doc for thrift_client_timeout_ms config	2020-05-25 11:32:14 +08:00
HangyuanLiu	2608f83bdc	[WIP] Add define expr for column (#3651 ) In the materialized view 2.0 the define expr should be set in column. For example, the to_bitmap function on integer should be define in mv column. ``` create materialized view mv as select bitmap_union(to_bitmap(k1)) from table. ``` The meta of mv as following: column name: __doris_materialized_view_bitmap_k1 column aggregate type: bitmap_union column define exrp: to_bitmap(k1) The is WIP pr for materialized view 2.0. #3344	2020-05-25 11:00:29 +08:00
Mingyu Chen	ec955b8a36	[Bug] Fix bug that runningTxnNum does not equal to the real running txn num. (#3674 ) This is because the logic for modifying the number of things running is wrong. Because we did not persist the previous status(preStatus) of a transaction. Therefore, when replaying the metadata log, we cannot decide whether to modify the `runningTxnNum` value based on `preStatus`. This info is lost.	2020-05-25 10:41:38 +08:00
Binglin Chang	12ebd5d82b	Remove some outdate test (#3672 )	2020-05-25 09:23:56 +08:00
HangyuanLiu	838c1e9212	Modify HLL functions return type (#3656 ) 1、Modify hll_hash function return type to HLL 2、Make HLL_RAW_AGG is alias of HLL_UNION	2020-05-24 21:22:43 +08:00
Mingyu Chen	ef9c716682	[Bug] Fix bug that missing OP_SET_REPLICA_STATUS when reading journal (#3662 )	2020-05-22 23:04:47 +08:00
Mingyu Chen	1124808fbc	[Enhancement] Add detail msg to show the reason of publish failure. (#3647 ) Add 2 new columns `PublishTime` and `ErrMsg` to show publish version time and errors happen during the transaction process. Can be seen by executing: `SHOW PROC "/transactions/dbId/";` or `SHOW TRANSACTION WHERE ID=xx;` Currently is only record error happen in publish phase, which can help us to find out which txn is blocked. Fix #3646	2020-05-22 22:59:53 +08:00
yangzhg	ba7d2dbf7b	[Function] Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638 ) Support utf-8 encoding for string function `instr`, `locate`, `locate_pos`, `lpad`, `rpad` and add unit test for them	2020-05-22 14:34:26 +08:00
EmmyMiao87	16deac96a9	[UT][Bug] Fix the ut error of bitmap_intersect (#3664 ) Change-Id: Id32fd9381119f30786acae9b4ac61b0d5ef9df48	2020-05-22 10:29:12 +08:00
yangzhg	00d563d014	[SQL] Support more syntax in case when clause (#3625 ) support support more syntax in case-when clause with subquey. suport query like ` case when k1 > subquery1 and k2 < subquey2 then ... else ... ` or `case when subquey in null then ...`	2020-05-22 10:22:59 +08:00
EmmyMiao87	dbfe8a067f	[Doc ]Add docs of max_running_txn_num_per_db (#3657 ) Change-Id: Ibdbc19a5558b0eb3f6a5fc4ef630de255b408a92	2020-05-22 10:22:11 +08:00
morningman	74fb1b830a	[Bug] Fix bug that missing OP_SET_REPLICA_STATUS when reading journal	2020-05-22 10:01:34 +08:00
hffariel	4eade9bd95	[website] Upgrade node version in travis (#3653 ) Upgrade node version to 14 to speed up the Vuepress building process.	2020-05-22 09:14:35 +08:00
HuangWei	fb02bb5cd9	[Load] Fix mem limit in NodeChannel (#3643 )	2020-05-22 09:11:59 +08:00
worker24h	4f79036a7e	Add error code into error message (#3645 )	2020-05-21 19:14:35 +08:00
Mingyu Chen	f6b5c8839b	[Bug] Ignore loading DELETE status tablet error when restarting BE (#3641 ) Fix: #3640 Also add a `batch delete meta` feature for `meta tool` Fix #3639	2020-05-21 19:08:28 +08:00
worker24h	ef8fd1fcbe	[Load] Support load json-data into Doris by RoutineLoad or StreamLoad (#3553 ) Doris support load json-data by RoutineLoad or StreamLoad	2020-05-21 13:00:49 +08:00
sduzh	792307ae54	[CMake] Different cmake build directories for different build types (#3623 ) (#3629 ) add `CMAKE_BUILD_TYPE` as the suffix of build directory.	2020-05-20 21:41:44 +08:00
EmmyMiao87	0d66e6bd15	Support bitmap_intersect (#3571 ) * Support bitmap_intersect Support aggregate function Bitmap Intersect, it is mainly used to take intersection of grouped data. The function 'bitmap_intersect(expr)' calculates the intersection of bitmap columns and returns a bitmap object. The defination is following: FunctionName: bitmap_intersect, InputType: bitmap, OutputType: bitmap The scenario is as follows: Query which users satisfy the three tags a, b, and c at the same time. ``` select bitmap_to_string(bitmap_intersect(user_id)) from ( select bitmap_union(user_id) user_id from bitmap_intersect_test where tag in ('a', 'b', 'c') group by tag ) a ``` Closed #3552. * Add docs of bitmap_union and bitmap_intersect * Support null of bitmap_intersect	2020-05-20 21:12:02 +08:00
Binglin Chang	c54cb4b14e	[Memory Engine] Add column reader/writer (#3580 )	2020-05-20 11:09:30 +08:00
yangzhg	6be7a6232f	[Config] Add ignore config to determine whether to continue to start be when load tablet from header failed. (#3632 ) Add config ignore_load_tablet_failure to determine whether to continue to start be when load tablet from header failed.	2020-05-20 09:40:50 +08:00
yangzhg	58a6628af2	[Bug] Fix first start error after upgrade doris to support delete dulplicate table value columns (#3628 )	2020-05-20 09:39:24 +08:00
Dayue Gao	9425f17d28	[Bug] instance mem tracker should has no limit (#3592 )	2020-05-19 19:49:39 +08:00
令狐少侠	8018b1c348	[Doris on ES]Fix bug of like not translate correctly (#3602 ) Why this case happened In current implement, translation into dsl only if it is not the first charactor. Thus, when sql is write like '%abc', translation would not run. How fixed Now, translation will trigger with charactor '?' or '*' if it is the first charactor, translate directly else, check the preceding char is escaped or not to determin translation or not	2020-05-19 17:06:46 +08:00
Mingyu Chen	4cbcae1574	[Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631 ) Mainly changes: 1. Shade and provide the thrift lib in spark-doris-connector 2. Add a `build.sh` for spark-doris-connector 3. Move the README.md of spark-doris-connector to `docs/` 4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`	2020-05-19 14:20:21 +08:00
Mingyu Chen	7fb74db0a1	[Trace] Introduce trace util to BE Ref https://github.com/apache/incubator-doris/issues/3566 Introduce trace utility from Kudu to BE. This utility has been widely used in Kudu, Impala also import this trace utility. This trace util is used for tracing each phases in a thread, and can be dumped to string to see each phases' time cost and diagnose which phase cost more time. This util store a Trace object as a threadlocal variable, we can add trace entries which record the current file name, line number, user specified symbols and timestamp to this object, and it's able to add some counters to this Trace object. And then, it can be dumped to human readable string. There are some helpful macros defined in trace.h, here is a simple example for usage: ``` scoped_refptr<Trace> t1(new Trace); // New 2 traces scoped_refptr<Trace> t2(new Trace); t1->AddChildTrace("child_trace", t2.get()); // t1 add t2 as a child named "child_trace" TRACE_TO(t1, "step $0", 1); // Explicitly trace to t1 usleep(10); // ... do some work ADOPT_TRACE(t1.get()); // Explicitly adopt to trace to t1 TRACE("step $0", 2); // Implicitly trace to t1 { // The time spent in this scope is added to counter t1.scope_time_cost TRACE_COUNTER_SCOPE_LATENCY_US("scope_time_cost"); ADOPT_TRACE(t2.get()); // Adopt to trace to t2 for the duration of the current scope TRACE("sub start"); // Implicitly trace to t2 usleep(10); // ... do some work TRACE("sub before loop"); for (int i = 0; i < 10; ++i) { TRACE_COUNTER_INCREMENT("iterate_count", 1); // Increase counter t2.iterate_count MicrosecondsInt64 start_time = GetMonoTimeMicros(); usleep(10); // ... do some work MicrosecondsInt64 end_time = GetMonoTimeMicros(); int64_t dur = end_time - start_time; // t2's simple histogram metric with name prefixed with "lbm_writes" const char* counter = BUCKETED_COUNTER_NAME("lbm_writes", dur); TRACE_COUNTER_INCREMENT(counter, 1); } TRACE("sub after loop"); } TRACE("goodbye $0", "cruel world"); // Automatically restore to trace to t1 std::cout << t1->DumpToString(Trace::INCLUDE_ALL) << std::endl; ``` output looks like: ``` 0514 02:16:07.988054 (+ 0us) trace_test.cpp:76] step 1 0514 02:16:07.988112 (+ 58us) trace_test.cpp:80] step 2 0514 02:16:07.988863 (+ 751us) trace_test.cpp:103] goodbye cruel world Related trace 'child_trace': 0514 02:16:07.988120 (+ 0us) trace_test.cpp:85] sub start 0514 02:16:07.988188 (+ 68us) trace_test.cpp:88] sub before loop 0514 02:16:07.988850 (+ 662us) trace_test.cpp:101] sub after loop Metrics: {"scope_time_cost":744,"child_traces":[["child_trace",{"iterate_count":10,"lbm_writes_lt_1ms":10}]]} ``` Exclude the original source code, this patch do the following work to adapt to Doris: - Rename "kudu" namespace to "doris" - Update some names to the existing function names in Doris, i.g. strings::internal::SubstituteArg::kNoArg -> strings::internal::SubstituteArg::NoArg - Use doris::SpinLock instead of kudu::simple_spinlock which hasn't been imported - Use manual malloc() and free() instead of kudu::Arena which hasn't been imported - Use manual rapidjson::Writer instead of kudu::JsonWriter which hasn't been imported - Remove all TRACE_EVENT related unit tests since TRACE_EVENT is not imported this time - Update CMakeLists.txt NOTICE(#3622): This is a "revert of revert pull request". This pr is mainly used to synthesize the PRs whose commits were scattered and submitted due to the wrong merge method into a complete single commit.	2020-05-18 14:55:11 +08:00
Mingyu Chen	9d72d1bb87	[Refactor] Refactor some redundant code && Replace some UT by UtFrameUtils This CL have no logic changed, just do some code refactor and use new UtFrameWork to replace some old UT. NOTICE(#3622): This is a "revert of revert pull request". This pr is mainly used to synthesize the PRs whose commits were scattered and submitted due to the wrong merge method into a complete single commit.	2020-05-18 14:53:59 +08:00

... 222 223 224 225 226 ...

13073 Commits