doris

Author	SHA1	Message	Date
Yingchun Lai	f39c8b156d	[refactor] A small refactor on class DataDir (#3276 ) main refactor points are: - Use a single get_absolute_tablet_path function instead of 3 independent functions - Remove meaningless return value of register_tablet and deregister_tablet - Some typo and format	2020-04-10 00:32:22 +08:00
Mingyu Chen	ce1d5ab9ab	[Bug] Fix some bugs of install/uninstall plugins (#3267 ) 1. Avoid losing plugin if plugin failed to load when replaying When in replay process, the plugin should always be added to the plugin manager, even if that plugin failed to be loaded. 2. `show plugin` statement should show all plugins, not only the successfully installed plugins. 3. plugin's name should be unique globally and case insensitive. 4. Avoid creating new instances of plugins when doing metadata checkpoint. 5. Add a __builtin_ prefix for builtin plugins.	2020-04-09 23:04:28 +08:00
caiconghui	a5703ef114	[Performance] Support sharding txn_map_lock into more small map locks to make good performance for txn manage task (#3222 ) This PR is to enhance the performance for txn manage task, when there are so many txn in BE, the only one txn_map_lock and additional _txn_locks may cause poor performance, and now we remove the additional _txn_locks and split the txn_map_lock into many small locks.	2020-04-09 22:35:15 +08:00
HangyuanLiu	037bc53b54	[BUG] Fix cast result expr bug (#3279 ) When the result type is a date type, the result expr type should not be cast, because in the FE function the specific type of the date type is determined by the actual type of the return value, not by the function return value type. For example, the function `str_to_date` may return DATE or DATETIME, depending on the format pattern. DATE: ``` mysql> select str_to_date('11/09/2011', '%m/%d/%Y'); +---------------------------------------+ | str_to_date('11/09/2011', '%m/%d/%Y') | +---------------------------------------+ | 2011-11-09 | +---------------------------------------+ ``` DATETIME: ``` mysql> select str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s'); +---------------------------------------------------------+ | str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s') | +---------------------------------------------------------+ | 2014-12-21 12:34:56 | +---------------------------------------------------------+ ```	2020-04-09 22:02:05 +08:00
yangzhg	8699bb7bd4	[Query] Optimize where clause by extracting the common predicate in the OR compound predicate. (#3278 ) Queries like the one below cannot finish in an acceptable time. `store_sales` has 28 million rows and `customer_address` has 50 thousand rows. For now Doris creates only one cross join node to execute this SQL. Evaluating the where clause takes about 200-300 ns, and the total number of evaluations is 28 million * 50 thousand = 1.4 * 10^12, so the cost is about 1.4 * 10^12 * 250 ns = 350,000 seconds (roughly 4 days). ``` select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales, customer_address where ((ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ``` But this SQL can be rewritten to ``` select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales, customer_address where ss_addr_sk = ca_address_sk and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ) ``` Therefore we can do a hash join first and then use ``` (((ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ) ``` to filter the values. On the TPC-DS 10g dataset, the rewritten SQL costs only about 1 second.	2020-04-09 21:57:45 +08:00
HangyuanLiu	3dc7ef634b	[Dependency]Add cctz lib (#3280 ) Add Google/CCTZ lib in Doris	2020-04-09 19:14:09 +08:00
HuangWei	e32ed28bf4	[Storage] Use getmntent_r() for thread-safe (#3284 )	2020-04-09 14:19:09 +08:00
Yunfeng,Wu	614a76beea	[Doris on ES] Support compound_and predicate push down to Elasticsearch (#3277 ) Related issue: https://github.com/apache/incubator-doris/issues/3248 SQL: ``` select * from test where (k2 = 6 and k3 = 1) or (k2 = 2 and k3 =3 and k4 = 'beijing'); ``` Output filter: ``` ((#k2:[6 TO 6] #k3:[1 TO 1]) (#(#k2:[2 TO 2] #k3:[3 TO 3]) #k4:beijing))~1 ``` SQL: ``` select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and k3 =3 and (k4 = 'beijing' or k4 = 'zhaochun')); ``` Output filter: ``` (k2:[6 TO 6] k3:[7 TO 7] (#(#k2:[2 TO 2] #k3:[3 TO 3]) #((k4:beijing k4:zhaochun)~1)))~1 ``` SQL: ``` select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and abs(k3) =3 and (k4 = 'beijing' or k4 = 'zhaochun')); ``` Output filter (`abs` cannot be pushed down to ES, so Doris on ES does not process this scenario): ``` match_all ```	2020-04-08 21:09:39 +08:00
yangzhg	f37dbbc890	Fix openssl download url being unavailable (#3281 )	2020-04-08 19:00:48 +08:00
Dayue Gao	3557b12de5	[Bug] Avoid compacting recently added rowset (#3271 ) This CL fixes #3270 by skipping recently added versions when performing cumulative compaction. A new config named "cumulative_compaction_skip_window_seconds" is added to adjust the time window.	2020-04-08 18:58:12 +08:00
Yingchun Lai	8fc284d593	[config] Support to modify configs when BE is running without restarting (#3264 ) In the past, when we wanted to modify some BE configs, we had to modify be.conf and then restart BE. This patch provides a way to modify configs of the types 'threshold', 'interval' and 'enable flag' while BE is running, without restarting it. You can update one config at a time through BE's HTTP API: `be_host:be_http_port/api/update_config?config_name=new_value`	2020-04-08 11:17:47 +08:00
Dayue Gao	d110629a5f	Optimize performance of TxnManager::build_expire_txn_map (#3269 ) It's not possible to insert duplicated transaction ids for a specific tablet, therefore we could use map<TabletInfo, vector<int64_t>> instead of map<TabletInfo, set<int64_t>> for expire_txn_map.	2020-04-07 23:54:05 +08:00
HuangWei	162b1c5d8b	[Storage] Open data dirs parallelly (#3260 )	2020-04-07 20:59:56 +08:00
EmmyMiao87	d0f87728e0	[Doc] Add example of timeout property in alter table stmt (#3274 )	2020-04-07 19:51:16 +08:00
lichaoyong	c9c58342b2	[License] Add License to codes (#3272 )	2020-04-07 16:35:13 +08:00
Mingyu Chen	1ef4cb2d24	[Bug] Base compaction failed because of overlapping of input rowsets (#3262 ) When calculating the cumulative point for the first time, we should stop increasing the cumulative point when we meet a rowset with overlap flag as OVERLAPPING, even if it has only one segment.	2020-04-07 11:26:57 +08:00
frwrdt	79bac50361	Fix the bug that 'username' in broker load is invalid (#3237 )	2020-04-06 22:15:37 +08:00
HuangWei	2ed184e06a	Add config: tablet writer open rpc timeout (#3258 )	2020-04-03 16:43:56 +08:00
lichaoyong	d2307c719c	Fix be unit test error (#3259 )	2020-04-03 15:02:49 +08:00
lichaoyong	a86161f6ce	[Bug]Fix compile error (#3257 )	2020-04-03 13:38:44 +08:00
HangyuanLiu	3f247b0d2d	Fix cast date type return wrong result (#3214 ) We have multiple date types, and we also need to cast between different date types. Without the cast, it will cause problems in BinaryPredicate	2020-04-03 12:08:18 +08:00
lichaoyong	881661ac10	Fix spell error (#3255 )	2020-04-03 10:43:09 +08:00
Mingyu Chen	fcb651329c	[Plugin] Making FE audit module pluggable (#3219 ) Currently we have implemented the plugin framework in FE. This CL makes the original audit log logic pluggable. The following classes are mainly implemented: 1. AuditPlugin The interface of audit plugin 2. AuditEvent An AuditEvent contains all information about an audit event, such as a query, or a connection. 3. AuditEventProcessor Audit event processor receives all audit events and delivers them to all installed audit plugins. This CL implements two audit module plugins: 1. The builtin plugin `AuditLogBuilder`, which acts the same as the previous logic, saving the audit log to the `fe.audit.log` 2. An optional plugin `AuditLoader`, which periodically inserts the audit log into a Doris table specified by the user. In this way, users can conveniently use SQL to query and analyze this audit log table. Some documents are added: 1. HELP docs of install/uninstall/show plugin. 2. Rename the `README.md` in `fe_plugins/` dir to `plugin-development-manual.md` and move it to the `docs/` dir 3. `audit-plugin.md` to introduce the usage of `AuditLoader` plugin. ISSUE: #3226	2020-04-03 09:53:50 +08:00
kangkaisen	c9ff6f68d1	Fix Rewrite count distinct bitmap and hll order by bug (#3251 )	2020-04-03 09:08:27 +08:00
yangzhg	d14726e05b	Fix join hints not working when table reorder is needed (#3188 ) * fix join hints not working when table reorder is needed, fix cross join numNodes not computed * fix some typos * disable table reorder when there are join hints	2020-04-02 17:13:35 +08:00
Mingyu Chen	390f462f55	[Bug] Fix read schema change job meta bug (#3244 )	2020-04-02 12:31:46 +08:00
kangkaisen	6252a271dd	Rewrite count distinct bitmap and hll in order by and having (#3232 )	2020-04-02 09:11:42 +08:00
EmmyMiao87	29b37dad49	Sql reference of materialized view (#3208 ) * Sql reference of materialized view Sql reference of Create and drop materialized view in English and Chinese. * Change description	2020-04-01 21:22:19 +08:00
WingC	9c937180cd	[Alter]Clean SchemaChangeJobV2 when schema change CANCELLED or FINISHED (#3212 ) SchemaChangeJobV2 will use too much memory in FE, which may cause FullGC. But this data is useless after the job is done, so we need to clean it up. NOTICE: update FE meta version to 80	2020-04-01 21:05:17 +08:00
yangzhg	63cee94c5c	Fix output results that may be incorrect when using intersect and except statements (#3228 )	2020-04-01 20:58:43 +08:00
kangkaisen	34993a69a8	Fix colocate relocateGroup bug after decommission (#3239 )	2020-04-01 18:50:36 +08:00
lichaoyong	6a9a62901f	Fix bug of memory limit when group by varchar columns. (#3242 ) select date_format(k10, '%Y%m%d') as myk10 from baseall group by myk10; The date_format function in the query above will be stored in MemPool during the query execution. If the query handles millions of rows, it will consume much memory. The MemPool should be cleared at intervals.	2020-04-01 18:48:18 +08:00
Dayue Gao	8a2eb8fbcf	[Bug][segment_v2] Fix a bug that NullBitmapBuilder is not reset when data page doesn't have null (#3240 ) This CL fixes a bug that could cause wrong answer for beta rowset with nullable column. The root cause is that NullBitmapBuilder is not reset when the current page doesn't contain NULL, which leads to wrong null map to be written for the next page. Added a test case to reproduce the problem.	2020-04-01 18:39:04 +08:00
Stalary	028da655a9	Increased compatibility with mysql (#3235 ) Add divPrecisionIncrement and utf8-superset transform	2020-04-01 09:57:00 +08:00
wangbo	68a801ffbe	Support Java version 64 bits Integers for BITMAP type (#3090 ) Fork from roaringbitmap's Roaring64NavigableMap, overwrite serialize/deserialize method to keep compatibility with be's bitmap storage format	2020-03-31 15:29:41 +08:00
Mingyu Chen	0554e89645	[Alter] Fix bug of assertion failure when submitting schema change job (#3181 ) When creating a schema change job, we will create a corresponding shadow replica for each replica. Here we should check the state of the replica and only create replicas in the normal state. The process here may need to be modified later. We should completely allow users to submit alter jobs under any circumstances, and then in the job scheduling process, dynamically detect changes in the replicas and do replica repairs, instead of forcing a check on submission.	2020-03-31 12:06:30 +08:00
Mingyu Chen	e9b3584d45	[Bug] Fix bug that `desc tbl all` stmt throw error: Malformed packet (#3233 )	2020-03-31 10:29:53 +08:00
Mingyu Chen	4131afe316	[Bug] NPE when using unknown function in broker load process (#3225 ) This CL fixes the bug described in issue #3224 by 1. Forbidding UDF in broker load process 2. Improving the function checking logic to avoid NPE when trying to get default database from ConnectionContext.	2020-03-30 18:34:41 +08:00
gengjun-git	2e1a0030bc	Add some connect samples (#3221 ) Add connect samples for golang, java , nodejs, php, python.	2020-03-30 13:54:36 +08:00
HuangWei	5f9359d618	Use SleepFor() instead of usleep() (#3211 )	2020-03-29 14:18:19 +08:00
Yingchun Lai	e4682398bd	[web] Dump configs on BE's website '/varz' (#3220 ) Dump configs on BE's website '/varz' Change NAVIGATION_BAR_PREFIX from 'Impala' to 'Doris' Format the related files by clang-format	2020-03-28 16:26:38 +08:00
HangyuanLiu	41f1ab006b	Add curdate/now function in fe (#3215 )	2020-03-28 13:39:54 +08:00
Stalary	6cf217f0c7	Fix WARNING to WARN in fe.conf sys_log_level (#3218 ) When I used it, I set it to WARNING as the comments suggested, and the log didn't work because there is no WARNING log level in Java	2020-03-28 10:13:15 +08:00
frwrdt	4a5164ab9d	Fix 'Filesystem closed' in broker load (#3216 )	2020-03-28 09:14:45 +08:00
gengjun-git	d3555e3624	[Conf][API Change] Change the default FE meta dir and BE storage_root_path 1. Change the word palo to doris in conf files. 2. Set default meta_dir to ${DORIS_HOME}/doris-meta 3. Comment out FE meta_dir, leave it to ${DORIS_HOME}/doris-meta, as existing in FE Config.java. 4. Comment out BE storage_root_path, leave it to ${DORIS_HOME}/storage, as existing in BE config.h. NOTICE: default config is changed.	2020-03-27 20:42:12 +08:00
EmmyMiao87	cb68e10217	[MaterializedView] Add 'IndexKeysType' field in 'Desc all table stmt' (#3209 ) Since Doris supports aggregation materialized views on duplicate tables, the metadata shown by the desc stmt is sometimes confusing. The reason is that the desc stmt shows no grouping information. For example, there are two materialized views as follows: 1. create materialized view k1_k2 as select k1, k2 from table; 2. create materialized view deduplicated_k1_k2 as select k1, k2 from table group by k1, k2; Before this commit, the metadata in the desc stmt is the same for both. ``` +-----------------------+-------+----------+------+-------+---------+-------+ | IndexName | Field | Type | Null | Key | Default | Extra | +-----------------------+-------+----------+------+-------+---------+-------+ | k1_k2 | k1 | TINYINT | Yes | true | N/A | | | | k2 | SMALLINT | Yes | true | N/A | | | deduplicated_k1_k2 | k1 | TINYINT | Yes | true | N/A | | | | k2 | SMALLINT | Yes | true | N/A | | +-----------------------+-------+----------+------+-------+---------+-------+ ``` So we need to show the KeysType of each materialized view in the desc stmt. Now the desc stmt output for all mvs changes as follows: ``` +-----------------------+---------------+-------+----------+------+-------+---------+-------+ | IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra | +-----------------------+---------------+-------+----------+------+-------+---------+-------+ | k1_k2 | DUP_KEYS | k1 | TINYINT | Yes | true | N/A | | | | | k2 | SMALLINT | Yes | true | N/A | | | deduplicated_k1_k2 | AGG_KEYS | k1 | TINYINT | Yes | true | N/A | | | | | k2 | SMALLINT | Yes | true | N/A | | +-----------------------+---------------+-------+----------+------+-------+---------+-------+ ``` NOTICE: this modifies the columns of the `desc` stmt.	2020-03-27 20:36:02 +08:00
Mingyu Chen	aa8b2f86c4	[Bug][Refactor] Fix the conflict of temp partition and dynamic partition operations (#3201 ) The bug is described in issue: #3200. This CL solves the problem by: 1. Refactoring the alter operation conflict checking logic by introducing new classes `AlterOperations` and `AlterOpType`. 2. Allowing add/drop temporary partition when the dynamic partition feature is enabled. 3. Allowing modification of a table's property when there is a temporary partition in the table. 4. Making the property `dynamic_partition.enable` optional, with a default of true.	2020-03-27 20:25:15 +08:00
WingC	c1969a3fb3	[Conf] Make default_storage_medium configurable (#2980 ) Doris supports choosing the storage medium when creating a table, and the cluster balance strategy depends on the storage medium. Most users do not specify the storage medium when creating a table; even if they know that they should choose one, they have no idea about the cluster's storage medium. So I think we should make storage_medium and storage_cooldown_time configurable, and this should be the admin's responsibility. For example, suppose the cluster's storage medium is HDD but we need to change part of the machines to SSD. After the change, the tablets created before it are stored on HDD and cannot find a dest path to migrate to, while users create tables as usual, so all tablets are stored on the old machines and the new machines store only a few tablets. Without this config, the only way is for the admin to traverse all partitions in the cluster and change the storage_medium property, which increases operational and maintenance costs. So I add a FE config default_storage_medium, so that users can set the default storage medium.	2020-03-27 20:22:18 +08:00
caiconghui	32c4fc691c	Support determining isPreviousLoadFinished for some alter jobs at table level (#3196 ) This PR reduces the time spent waiting for transactions in the same db to complete, by filtering the running transactions at table level. NOTICE: Update FE meta version to 79	2020-03-27 20:16:23 +08:00
HuangWei	0462607d8d	StorageEngine: unused_rowsets use unordered_multimap (#3207 )	2020-03-27 14:30:31 +08:00


1708 Commits