doris

Author	SHA1	Message	Date
Xinyi Zou	dd11d5c0a5	[enhancement](memory) Support try catch bad alloc (#14135 )	2022-11-13 11:22:56 +08:00
xy720	035657c5a1	[typo](comment) Fix a lot of spell errors in be comments (#14208 ) fix typos.	2022-11-12 16:06:15 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compaction task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. Other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so it is put back into the orphan mem tracker.	2022-11-08 09:52:33 +08:00
jiafeng.zhang	a19e6881c7	[chore](be web ui)upgrade jquery version to 3.6.0 (#13942 ) * upgrade jquery version to 3.6.0 * update license dist	2022-11-04 16:20:17 +08:00
Yongqiang YANG	54545c6446	[improvement](config) enlarge default value of create_table_timeout and remove disable_stream_load_2pc (#13520 ) Users do not need to set create_table_timeout: it is a DDL command, and when users encounter a timeout they will set a larger timeout and retry. Stream load 2pc is used by default in the Flink connector, so we should not disable it by config; the config item is useless.	2022-10-24 11:51:18 +08:00
yixiutt	6d322f85ac	[improvement](compaction) delete num based compaction policy (#13409 ) Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-10-18 16:13:28 +08:00
Mingyu Chen	dbf71ed3be	[feature-wip](new-scan) Support stream load with csv in new scan framework (#13354 ) 1. Refactor the file reader creation in FileFactory, for simplicity. Previously, FileFactory had too many `create_file_reader` interfaces. Now unified into two categories: the interface used by the previous BrokerScanNode, and the interface used by the new FileScanNode. And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files. 2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode 3. Now for generic reader, the file reader will be created inside the reader, not passed from the outside. 4. Add some test cases for csv stream load, the behavior is same as the old broker scanner.	2022-10-17 23:33:41 +08:00
pengxiangyu	af7b6524f2	add hide config to hide config in webserver for safety. (#13255 )	2022-10-12 10:27:09 +08:00
yixiutt	3dc4dc6d43	[compaction](http_action) enable be run manual compaction concurrently (#13219 ) In some cases, we need to run manual compaction via the http interface concurrently, so we remove the mutex; the tablet's compaction lock is enough to prevent concurrent compaction in a tablet. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-10-10 08:33:18 +08:00
xueweizhang	70ab9cb43e	[feature](http) refactor version info and add new http api for get version info (#12513 ) Refactor version info and add new http api for get version info	2022-09-22 10:53:04 +08:00
Xinyi Zou	3bb042e45c	[fix](memtracker) Process physical mem check does not include tc/jemalloc allocator cache (#12688 ) tcmalloc/jemalloc allocator cache does not participate in the mem check as part of the process physical memory, because new/malloc will trigger the mem hook when using the tcmalloc/jemalloc allocator cache, but it may not actually alloc physical memory, so a mem hook failure is not expected in that case. In addition: The value of tcmalloc/jemalloc allocator cache is used as a mem tracker, the parent is the process mem tracker, which is updated every 1s. Modify the process default mem_limit to 90%. Expect the mem tracker to effectively limit the memory usage of the process.	2022-09-17 11:31:01 +08:00
Stalary	87439e227e	[Enhancement](DOE): Doe support object/nested use string (#12401 ) * MOD: doe support object/nested use string	2022-09-13 09:59:48 +08:00
zhannngchen	38937c15d7	[typo](streamload) fix typo and remove useless method declaration #12343	2022-09-05 19:16:36 +08:00
Mingyu Chen	22430cd7bb	[feature](stmt) add ADMIN COPY TABLET stmt for local debug (#12176 ) Add a new stmt ADMIN COPY TABLET to easily copy a tablet to a local env to reproduce problems. See document for more details.	2022-08-31 09:06:49 +08:00
Pxl	67e94d2aea	[Enhancement](compaction) add compaction use time count (#12141 )	2022-08-30 09:18:02 +08:00
yixiutt	1b0b5b5f09	[Enhancement](load) add hidden_columns in stream load param (#11625 ) Stream load will ignore invisible columns if no http header columns is specified, but in some cases users cannot list all columns if columns change frequently. Add a hidden_columns header to support importing hidden columns. Users can set hidden_columns such as __DORIS_DELETE_SIGN__ and add this column to the stream load data so that the line can be deleted. For example: curl -u root -v --location-trusted -H "hidden_columns: __DORIS_DELETE_SIGN__" -H "format: json" -H "strip_outer_array: true" -H "jsonpaths: [\"$.id\", \"$.name\",\"$.__DORIS_DELETE_SIGN__\"]" -T 1.json http://{beip}:{be_port}/api/test/test1/_stream_load Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-19 14:57:11 +08:00
Xinyi Zou	7d836cf0c7	[fix](memtracker) Fix flush memtable to reduce load channel mem not executed (#11771 ) The memory value automatically tracked by the tcmalloc hook in the DeltaWriter is smaller than the value recorded manually in the memtable, because the first 4096-byte Chunk requested by each MemPool when the memtable is initialized is not tracked to the DeltaWriter by the hook. The values of the two are not equal, causing the mem_consumption() == _mem_table->memory_usage branch judgment to fail.	2022-08-16 14:30:45 +08:00
weizuo93	838fdc1354	[Bug](httpserver) Fix bug that http server should not be stopped in destructor if it is not running Co-authored-by: weizuo <weizuo@xiaomi.com>	2022-08-03 19:44:46 +08:00
weizuo93	f730a048b1	[feature-wip](load) Support single replica load (#10298 ) During the load process, the same operations, such as sort and aggregation, are performed on all replicas, which is resource-intensive. Concurrent data load would consume much CPU and memory resources. It's better to perform the write process (writing data into MemTable and then flushing data) on a single replica and synchronize data files to other replicas before the transaction finishes.	2022-08-02 11:44:18 +08:00
Mingyu Chen	abbf75d302	[doc][refactor](metrics) Reorganize FE and BE metrics and add document (#11307 )	2022-08-02 11:34:06 +08:00
weizuo93	5c1cd058f2	[Feature] Add interface to check tablet segment lost (#10711 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2022-08-02 09:40:04 +08:00
Xinyi Zou	b6bdb3bdbc	[fix] (mem tracker) Fix MemTracker accuracy (#11190 )	2022-07-27 18:59:24 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Xinyi Zou	d5fa66d9a3	[Enhancement] [Memory] Limit memory usage use process actual physical memory (#10924 )	2022-07-19 11:08:39 +08:00
lihangyu	b04a791895	[Enhancement] support compile with jemalloc (#10542 ) A test feature to use jemalloc as default malloc.	2022-07-11 12:15:35 +08:00
plat1ko	331fa50501	[feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280 ) This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet, and there is no need to prohibit loading new data to cooled tablets. Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without perceiving the underlying filesystem. The abstracted `RemoteFileSystem` can try local caching strategies with different granularity, instead of caching segment files as before. To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory. In the future, `FileReader`s and `FileWriter`s should be unified.	2022-07-08 12:18:39 +08:00
Tiewei Fang	c9f86bc7e2	[refactor] Refactoring Status static methods to format message using fmt(#9533 )	2022-07-02 18:58:23 +08:00
yiguolei	aab7dc956f	[refactor](load) Remove mini load (#10520 )	2022-06-30 23:21:41 +08:00
Mingyu Chen	8a49c7ef04	[chore] Rename Doris binary output format	2022-06-24 15:30:05 +08:00
yinzhijian	75a7e72402	[Refactor] Use iequal to replace boost::iequals (#10146 ) * [Refactor] Use iequal to replace boost::iequals * remove unused include	2022-06-16 18:18:38 +08:00
yinzhijian	cbbda7857b	[feature-wip](parquet-orc) Support orc scanner in vectorized engine (#9541 )	2022-05-26 21:39:12 +08:00
jacktengg	9236c2efc9	[improvement] Show detail status code string for be http api (#9771 ) 1. move to_json method to common/status 2. modify related usage in http folder	2022-05-26 15:09:21 +08:00
Yongqiang YANG	defdae1e7d	[improvement](stream-load) adjust read unit of http to optimize stream load (#9154 )	2022-05-20 09:52:36 +08:00
yinzhijian	bee5c2f8aa	[feature-wip](parquet-vec) Support parquet scanner in vectorized engine (#9433 )	2022-05-17 09:37:17 +08:00
plat1ko	4cd579b155	[refactor] Check status precise_code instead of construct OLAPInternalError (#9514 ) * check status precise_code instead of construct OLAPInternalError * move is_io_error to Status	2022-05-12 15:39:29 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
Xinyi Zou	b34ed43ec9	[feature-wip] (memory tracker) (step6, End) Fix some details (#9301 ) 1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track. 2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability. 3. More powerful MemTracker debugging capabilities. 4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%. 5. Fix some other details.	2022-05-10 18:17:09 +08:00
hongbin	e61d296486	[Refactor] Replace '#ifndef' with '#pragma once' (#9456 ) * Replace '#ifndef' with '#pragma once'	2022-05-10 09:25:59 +08:00
Gabriel	e7f12db06c	[fixbug][compaction] update OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSION (#9410 )	2022-05-07 08:39:20 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
chenlinzhong	afce993ca7	[feature](load)(csv) CSV import and export support header (#8765 ) - Add two new types to stream load and broker load: csv_with_names and csv_with_names_and_types - Add two new types to export: csv_with_names and csv_with_names_and_types	2022-04-18 15:29:18 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are 2 status codes in BE: one is common/Status.h, and the other, in olap/olap_define.h, is called OLAPStatus. OLAPStatus is just an enum type; it is very simple and cannot carry much information, so I will unify this code into common/Status.	2022-04-14 11:43:49 +08:00
caiconghui	98cab78320	[refactor](schema_hash) remove schema_hash since every tablet id in be is unique (#8574 )	2022-04-07 08:37:45 +08:00
zhannngchen	0c98c1ee03	[Improvement][fix](compaction) Change min_compaction_failure_interval_sec to 5 and fix a bug of log (#8781 ) see issue #8767	2022-04-02 13:00:56 +08:00
caiconghui	c69dd54116	[refactor](mutex) Use std::mutex to replace Mutex and refactor some lock logic (#8452 )	2022-03-24 14:50:02 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing; 3. Add consume/release cache, trigger a consume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection 5. Added Virtual MemTracker, consume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
caiconghui	83521a826a	[Feature](create_table) Support create table with random distribution to avoid data skew (#8041 ) In some scenarios, users cannot find a suitable hash key to avoid data skew, so we need to provide an additional data distribution for olap table to avoid data skew example: CREATE TABLE random_table ( siteid INT DEFAULT '10', citycode SMALLINT, username VARCHAR(32) DEFAULT '', pv BIGINT SUM DEFAULT '0' ) AGGREGATE KEY(siteid, citycode, username) DISTRIBUTED BY random BUCKETS 10 PROPERTIES("replication_num" = "1"); Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2022-02-26 10:38:55 +08:00
Zhengguo Yang	50864aca7d	[refactor] fix warnings when compiling with clang (#8069 )	2022-02-19 11:29:02 +08:00
Mingyu Chen	26289c28b0	[fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072 ) 1. Fix the problem of BE crash caused by destruct sequence. (close #8058) 2. Add a new BE config `compaction_task_num_per_fast_disk`. This config specifies the max number of concurrent compaction tasks on fast disks (typically SSD), so that on high-speed disks we can execute more compaction tasks at the same time and compact the data as soon as possible. 3. Avoid frequent selection of unqualified tablets to perform compaction. 4. Modify some log levels to reduce the log size of BE. 5. Modify some clone logic to handle errors correctly.	2022-02-17 10:52:08 +08:00

211 Commits