doris

Author	SHA1	Message	Date
Mingyu Chen	c8899ee5bd	[Build][ARM] Fix some compilation problems on ARM64 (#6076 ) 1. Disable libhdfs3 on ARM, because it doesn't support ARM now. 2. Add compilation doc for ARM64	2021-06-23 09:38:16 +08:00
Zhengguo Yang	68bab73c35	[Bug] Fix randomly selected storage path possibly staying the same for a long time (#6062 ) random_shuffle generates the same random sequence when called multiple times. Although we shuffle twice, if the size relationship between adjacent numbers does not change, the result of the second shuffle does not change either.	2021-06-20 16:16:32 +08:00
xinghuayu007	5dabf0bef5	[Alter] validate data file after alter operation success (#6022 ) Co-authored-by: wangxixu <wangxixu@xiaomi.com>	2021-06-20 16:15:14 +08:00
stdpain	1999a0c26b	[optimization] Enable gcc strict-aliasing optimization (#6034 ) * Enable gcc strict-aliasing optimization * Use -Werror=strict-aliasing	2021-06-18 11:39:24 +08:00
Mingyu Chen	5cfe081b05	[Bug] Remove duplicate memtracker (#6041 ) * [Enhance] Remove duplicate memtracker This problem causes frequent creation of memtrackers and affects query concurrency.	2021-06-18 11:28:37 +08:00
weizuo93	9f52f4f9e5	fix stream load error msg missing (#6050 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2021-06-18 09:21:12 +08:00
Mingyu Chen	d57c2344e1	[MemTracker] Refactored the hierarchical structure of memtracker (#5956 ) To avoid showing too many memtrackers on BE web pages, MemTracker now has 3 levels: OVERVIEW, TASK and VERBOSE. OVERVIEW is mainly used for the main memory-consuming modules such as Query/Load/Metadata. TASK is mainly used to record the memory overhead of a single task such as a single query, load, or compaction task. VERBOSE is used for other, more detailed memtrackers.	2021-06-16 09:44:24 +08:00
stdpain	bde60280b8	[Optimize] use string_view instead of std::string in string function (#6010 )	2021-06-16 09:40:13 +08:00
crazyleeyang	8b4721c941	[Bug] Fix kafka consumer reuse bug (#6007 ) When judging whether consumer can be reused, it is necessary to judge whether the parameter content is equal.	2021-06-16 09:39:05 +08:00
Yingchun Lai	6d6c3d9703	[Enhancement] Reduce memory consumption by releasing readers earlier (#5811 ) We create multiple rowset readers to read the data of one tablet. After a rowset reader has reached EOF, it can be released to reduce resource (typically memory) consumption. Similarly, a segment reader can be released when it reaches EOF.	2021-06-16 09:37:50 +08:00
luozenglin	d33a6d1b98	[Function] Support date function: yearweek(), week(), makedate(). (#6000 )	2021-06-10 17:38:25 +08:00
HappenLee	80220af271	[Enhancement] Use Parallel Hash Map To Replace Unordered Map In Dict Encoding Map And Hyper Set (#5990 ) Replaces the unordered map with a parallel hash map in the dict encoding map and hyper set to improve performance.	2021-06-10 17:38:08 +08:00
Mingyu Chen	206a711f9b	[Bug] SimplifyInvalidDateBinaryPredicatesDateRule may cause invalid query plan (#5987 ) 1. "where 1k > to_date(now())" will return EMPTYSET in query plan. 2. DateLiteral should accept date string like "2021-6-1".	2021-06-10 17:37:26 +08:00
xinghuayu007	e245aee33e	[Feature] Select outfile support parquet format (#5938 ) `Select outfile into` currently only supports exporting data in CSV format. This patch extends the feature to support the parquet format. Usage: LocalFile: ``` SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES ("schema"="required,int32,siteid;", "parquet.compression"="snappy"); ``` BrokerFile: ``` SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET PROPERTIES ( "broker.name" = "hdfs_broker", "broker.hadoop.security.authentication" = "kerberos", "broker.kerberos_principal" = "test", "broker.kerberos_keytab_content" = "base64" , "schema"="required,int32,siteid;" ); ``` The `schema` field is required and defines the schema of the parquet file. Properties prefixed with `parquet.` are parquet file properties, such as compression, version, and enable_dictionary.	2021-06-10 17:34:01 +08:00
Lijia Liu	4d64612b96	[ARRAY]Save array's size instead of offset. (#5983 ) * Save array's size instead of offset. * Optimize variable name * Fix comment	2021-06-10 12:32:58 +08:00
caiconghui	d9c128b744	[BrokerLoad] Support read properties for broker load when read data (#5845 ) * [BrokerLoad] support read properties for broker load when read data Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-06-09 14:59:55 +08:00
weizuo93	61af76b8fb	[Log] fix log error when commit transaction in txn manager (#5937 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2021-06-06 22:05:40 +08:00
stdpain	d790cc6a50	[BUG] Fixed the problem that substring function may access illegal address (#5952 )	2021-06-03 18:38:10 +08:00
weizuo93	4ef1dbf394	[Bug] Fix lack of rdlock before rowset_with_max_version() in compaction log (#5953 )	2021-06-03 10:01:35 +08:00
Mingyu Chen	81ecf3d097	[Bug] Rebuild version graph of a tablet when there are too many orphan vertices (#5945 ) The version information of a tablet is stored in memory in an adjacency graph data structure. As new versions are written and old versions are deleted, the data structure accumulates empty vertices with no associated edges (orphan vertices). These orphan vertices should be removed.	2021-06-03 09:59:20 +08:00
曹建华	4c0a98e8bf	[BE] Optimize version retrieval efficiency. (#5831 ) * [FE] Optimize version retrieval efficiency in high-frequency import/compaction scenarios. * Jump out of the loop when encountering the reverse edge.	2021-06-02 09:58:21 +08:00
Mingyu Chen	ba868c610f	[Optimize] Optimize some tablet scheduling logic (#5926 ) 1. The partitions set by the admin repair command are prioritized to ensure that the tablets of these partitions can be repaired as soon as possible. 2. Add an FE metric "query_begin" to monitor the number of queries submitted to the Doris.	2021-05-30 23:08:59 +08:00
xinghuayu007	63c99eb4cb	[Cache][Enhancement] Ensure sql cache keeps only one version (#5793 ) For PR #5792. This patch adds a new param `cache type` to distinguish the sql cache from the partition cache. When updating the sql cache, we ensure that one sql key has only one cached version.	2021-05-28 13:45:47 +08:00
Xinyi Zou	80f0b5fd1c	[BUG] Fix calculation error when the memory parameter is a float value percentage (#5916 ) When parsing memory parameters in `ParseUtil::parse_mem_spec`, convert the percentage to `double` instead of `int`. The currently affected parameters include `mem_limit` and `storage_page_cache_limit`	2021-05-27 22:06:50 +08:00
stdpain	f4ebac0210	[BUG] BE core when FE get_stream_load_record (#5913 )	2021-05-27 22:06:26 +08:00
Xinyi Zou	4343354711	[BUG] Fix in memory table may cause a lot of CPU consumption when LRU Cache evict (#5908 ) According to the LRU priority, the `lru list` is split into `lru normal list` and `lru durable list`, and the two lists are traversed in sequence during LRU evict, avoiding invalid cycles.	2021-05-27 22:05:41 +08:00
EmmyMiao87	0f4a39f82d	[LOG] Hide stack info of memory exceeded errors from the end user (#5896 ) If a query exceeds its memory limit, detailed info about where the limit was exceeded is required. However, it is not necessary to return the entire query stack to the end user. The query stack only needs to be printed in the BE log.	2021-05-27 22:04:17 +08:00
Zhengguo Yang	ba38973209	use virtual hosted-style request to access object store (#5894 ) * use virtual hosted-style access request object store	2021-05-27 15:52:07 +08:00
stdpain	d6076af938	[BUG] fix BE coredump if result sink prepare failed (#5899 )	2021-05-26 10:02:55 +08:00
stdpain	6924637e64	[BUG] fix compression bug while compaction (#5893 ) Because the maximum length of LZ4 compression is 2^32, it can cause some memory problems	2021-05-26 10:02:39 +08:00
HappenLee	629e440a67	[Bug] Fix the bug of nullif function: (#5882 ) 1. Prevent nullif(98, null) from returning NULL in FE 2. Support DecimalV2 in the nullif function to get the right result	2021-05-26 10:01:17 +08:00
stdpain	9dd54b83b8	[optimize] avoid extra memory alloc in object pool (#5871 )	2021-05-26 09:58:21 +08:00
stdpain	1ec615c562	[BUG] Fixed some uninitialized variables (#5850 ) Fixed some potential bugs caused by uninitialized variables	2021-05-25 10:34:35 +08:00
stdpain	63662194ab	[BUG] Fix Stream Load cost too much memory (#5875 )	2021-05-25 10:34:10 +08:00
stdpain	659d6347a0	[BUG] fix some extra memory in bitmap operate (#5857 )	2021-05-22 23:38:28 +08:00
Mingyu Chen	591d391bbc	[Bug] Fix bug that the buffered reader may read at wrong position. (#5847 ) The buffered reader's _cur_offset should be initialized to the same value as the inner file reader's, to make sure that the reader starts reading at the right position.	2021-05-22 23:38:10 +08:00
Mingyu Chen	07ad038870	[Feature][RoutineLoad] Support for consuming kafka from the point of time (#5832 ) Support when creating a kafka routine load, start consumption from a specified point in time instead of a specific offset. eg: ``` FROM KAFKA ( "kafka_broker_list" = "broker1:9092,broker2:9092", "kafka_topic" = "my_topic", "property.kafka_default_offsets" = "2021-10-10 11:00:00" ); or FROM KAFKA ( "kafka_broker_list" = "broker1:9092,broker2:9092", "kafka_topic" = "my_topic", "kafka_partitions" = "0,1,2", "kafka_offsets" = "2021-10-10 11:00:00, 2021-10-10 11:00:00, 2021-10-10 12:00:00" ); ``` This PR also reconstructed the analysis method of properties when creating or altering routine load jobs, and unified the analysis process in the `RoutineLoadDataSourceProperties` class.	2021-05-22 23:37:53 +08:00
Mingyu Chen	ad2820768c	[Build] Enable brpc cpu and heap profile (#5835 ) Enable brpc profile with macro BRPC_ENABLE_CPU_PROFILER, so that we can get cpu and heap profile at runtime via brpc web interface.	2021-05-19 09:30:25 +08:00
HappenLee	1a81b9e160	[MemTracker] Some enhancements of MemTracker (#5783 ) 1. Give some MemTrackers a reasonable parent MemTracker instead of the root tracker. 2. Make each MemTracker easy to trace. 3. Add a show level to MemTracker to control how many trackers are shown on the web page.	2021-05-19 09:27:50 +08:00
Xinyi Zou	5748241dab	[Bug-fix] After query cancel, transfer_thread no longer continues to schedule scanner_thread (#5768 ) The cause of the problem is that after a query is cancelled, OlapScanNode::transfer_thread still continues to schedule OlapScanNode::scanner_thread until all tasks are scheduled. Although each task does not scan data and exits quickly, it still consumes a lot of resources. (Guess) This may be the cause of the BUG (#5767) that makes the I/O full. So after query cancel, transfer_thread immediately exits the scheduling loop, waits for all scanner_threads to end, and then exits as well.	2021-05-19 09:26:58 +08:00
caiconghui	add8c4bb74	[Load] Support reading multi-line json objects for JsonScanner (#5774 ) Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-05-18 15:44:45 +08:00
stdpain	17cd32ffee	[BUG] Fixed uninitialized variables in compaction (#5828 )	2021-05-18 12:13:58 +08:00
HappenLee	d0462f4383	[Bug] Fix Backend UT Problem (#5784 ) (#5785 ) 1. relocation R_X86_64_32 against `__gxx_personality_v0' can not be used when making a shared object; recompile with -fPIC 2. warning: the use of `tmpnam' is dangerous, better use `mkstemp' 3. Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.	2021-05-17 11:51:59 +08:00
stdpain	a359b1cb8b	[UT] fix ut failed in new_metrics_test (#5817 )	2021-05-17 11:51:22 +08:00
Mingyu Chen	9d25bfe980	[Bug] Fix bug that database not found when replaying batch transaction remove log (#5815 ) [GlobalTransactionMgr.replayBatchRemoveTransactions():353] replay batch remove transactions failed. db 0 org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = databaseTransactionMgr[0] does not exist at org.apache.doris.transaction.GlobalTransactionMgr.getDatabaseTransactionMgr(GlobalTransactionMgr.java:84) ~[palo-fe.jar:3.4.0] at org.apache.doris.transaction.GlobalTransactionMgr.replayBatchRemoveTransactions(GlobalTransactionMgr.java:350) [palo-fe.jar:3.4.0] at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:601) [palo-fe.jar:3.4.0] at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2452) [palo-fe.jar:3.4.0] at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:101) [palo-fe.jar:3.4.0] at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0] at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0] The id of the information_schema database is 0, and it has no txn at all.	2021-05-17 11:50:46 +08:00
Zhengguo Yang	01a45e8691	add read buffer when use s3 reader (#5791 )	2021-05-17 11:46:38 +08:00
luozenglin	0c83e43a67	[Optimize] Optimize profile lock conflict and view profile while query is executing (#5762 ) 1. Reduce lock conflicts in RuntimeProfile of be; 2. can view query profile when the query is executing; 3. reduce wait time for 'show proc /current_queries'.	2021-05-13 22:33:26 +08:00
luozenglin	b686205b97	[Optimize] Reduce lock conflicts in ThreadResourceMgr of be (#5772 ) Removed some useless code that caused lock conflicts in ThreadResourceMgr of be.	2021-05-12 10:59:53 +08:00
HappenLee	d7d50f7ffa	[Optimize] Speed up the bulk data load to ODBC table. (#5765 ) 1. Batch Insert 2. Use fmt to replace stringstream 3. Add some profile of ODBC_TABLE_SINK	2021-05-12 10:58:52 +08:00
stdpain	bd88309346	[Refactor] fix warning in gcc8+, fix warning from brpc, s2 (#5763 ) Fix warning from brpc, S2 Fix -Warray-bounds	2021-05-12 10:38:46 +08:00
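
Commit 80f0b5fd1c above describes converting a memory percentage to `double` instead of `int`. The effect can be illustrated with a minimal Python sketch. This is not the Doris `ParseUtil::parse_mem_spec` code: the function name `parse_percent_spec`, its `as_float` switch, and the sample numbers are assumptions made only for illustration.

```python
def parse_percent_spec(spec: str, total_bytes: int, as_float: bool = True) -> int:
    """Hypothetical helper: turn a spec such as '12.5%' into a byte count."""
    text = spec.strip()
    if not text.endswith("%"):
        raise ValueError(f"not a percentage spec: {spec!r}")
    number = text[:-1]
    # Truncating to an integer silently turns '12.5%' into '12%'.
    percent = float(number) if as_float else int(float(number))
    if not 0 <= percent <= 100:
        raise ValueError(f"percentage out of range: {spec!r}")
    return int(total_bytes * percent / 100)


# Same spec, two parsing strategies, two different limits.
with_float = parse_percent_spec("12.5%", 1000)
with_int = parse_percent_spec("12.5%", 1000, as_float=False)
```

With a total of 1000 bytes, the floating-point path keeps the fractional half percent, while the integer path drops it, which is the kind of calculation error the commit message refers to.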


1407 Commits
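
Commit 4343354711 in the list above splits the `lru list` into a normal list and a durable list that are traversed in sequence during eviction. A toy Python sketch of that idea follows, assuming a simple entry-count capacity. The class `TwoListLRU` and its methods are hypothetical and do not mirror the Doris LRU cache API.

```python
from collections import OrderedDict


class TwoListLRU:
    """Toy cache: durable entries are evicted only after all normal ones."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.normal = OrderedDict()   # ordinary entries, oldest first
        self.durable = OrderedDict()  # high-priority entries, oldest first

    def put(self, key, value, durable: bool = False) -> None:
        # Drop any old copy so a key lives in exactly one list.
        self.normal.pop(key, None)
        self.durable.pop(key, None)
        (self.durable if durable else self.normal)[key] = value
        self._evict()

    def get(self, key):
        for entries in (self.normal, self.durable):
            if key in entries:
                entries.move_to_end(key)  # mark as most recently used
                return entries[key]
        return None

    def _evict(self) -> None:
        # Visit the normal list first, then the durable list, so eviction
        # never has to step over durable entries to find a victim.
        while len(self.normal) + len(self.durable) > self.capacity:
            victims = self.normal if self.normal else self.durable
            victims.popitem(last=False)


cache = TwoListLRU(capacity=2)
cache.put("a", 1, durable=True)
cache.put("b", 2)
cache.put("c", 3)  # over capacity: the oldest normal entry "b" is evicted
```

In a single mixed list, eviction would have to skip over durable entries each time it looked for a victim. Keeping them in a separate list lets the eviction loop pick the oldest normal entry directly.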