doris

Author	SHA1	Message	Date
Tiewei Fang	759f1da32e	[Enhancement](Backends) add `HostName` field in backends table and delete backends table in information_schema (#18156 ) 1. Add `HostName` field for `show backends` statement and `backends()` tvf. 2. Delete the `backends` table in `information_schema` database	2023-04-07 08:30:42 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820 ) Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not. Some comparisons between MemPool and Arena: 1. Expansion Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression; MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K. 2. Alignment MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment; Arena has no default alignment; 3. Memory reuse Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time. MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation 4. Realloc Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is: 1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start 2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools 5. Check mem limit MemPool checks the mem limit, and Arena checks at the Allocator layer. 6. Support for ASAN Arena does something extra 7. Error handling MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception. Tests that Arena can consider 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory; 2. Support clear, memory multiplexing; 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
yiguolei	dd53bc1c8d	[unify type system](remove unused type desc) remove some code (#17921 ) There are many type definitions in BE. Should unify the type system and simplify the development. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-19 14:05:02 +08:00
WenYao	a8f20eb4ac	[Enhancement](schema_scanner) Optimize the performance of reading information schema tables (#17371 ) Batch-fill blocks and batch the RPC calls to FE that fetch table descriptions. For 340,000 columns, `SELECT COUNT(*) FROM information_schema.columns;` time: 10.3s --> 0.4s	2023-03-06 09:53:01 +08:00
WenYao	68e9a66aa0	[Enhancement](schema scanner) add SchemaScanner profile (#17230 ) Add some profile information to the schema scanner to facilitate performance optimization. Example: SchemaScanner: - FillBlockTime: 9s131ms - GetDbTime: 12.816ms - GetDescribeTime: 1s645ms - GetTableTime: 25.433ms	2023-03-01 08:34:27 +08:00
Pxl	ca73c60442	[Chore](build) enable ignored-qualifiers check (#16196 ) enable ignored-qualifiers check	2023-02-01 15:15:59 +08:00
yiguolei	90b12143a3	[refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237 )	2023-01-30 16:45:14 +08:00
WenYao	69e748b076	[fix](schema scanner)change schema_scanner::get_next_row to get_next_block (#15718 )	2023-01-30 10:01:50 +08:00
xueweizhang	1597afcd67	[fix](multi-catalog) fix get many same name db/table when show where (#15076 ) When running `show databases/tables/table status where xxx`, the statement is rewritten as a SELECT from information_schema. Scanning the schema table needs the catalog info; otherwise it may return same-name databases or tables from multiple catalogs. For example: mysql> show databases where schema_name='test'; +----------+ \| Database \| +----------+ \| test \| \| test \| +----------+ MySQL [internal.test]> show tables from test where table_name='test_dc'; +----------------+ \| Tables_in_test \| +----------------+ \| test_dc \| \| test_dc \| +----------------+	2022-12-19 14:27:48 +08:00
Tiewei Fang	826cfdaf93	[feature](information_schema) add `backends` information_schema table (#13086 )	2022-11-08 22:15:10 +08:00
Ashin Gau	6d925054de	[feature-wip](parquet-reader) decode parquet time & datetime & decimal (#11845 ) 1. Spark can set the timestamp precision by the following configuration: spark.sql.parquet.outputTimestampType = INT96(NANOS), TIMESTAMP_MICROS, TIMESTAMP_MILLIS DATETIME V1 only keeps the second precision, DATETIME V2 keeps the microsecond precision. 2. If using DECIMAL V2, the BE saves the value as decimal128, and keeps the precision of decimal as (precision=27, scale=9). DECIMAL V3 can maintain the right precision of decimal	2022-08-22 10:15:35 +08:00
hongbin	e61d296486	[Refactor] Replace '#ifndef' with '#pragma once' (#9456 ) * Replace '#ifndef' with '#pragma once'	2022-05-10 09:25:59 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr with std::shared_ptr 2. replace all boost::scoped_ptr with std::unique_ptr 3. replace all boost::scoped_array with std::unique_ptr<T[]> 4. replace all boost::thread with std::thread	2021-11-17 10:18:35 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
Mingyu Chen	a46bf1ada3	[Authorization] Modify the authorization checking logic (#2372 ) Authorization checking logic: There are some problems with the current password and permission checking logic. For example: First, we create a user by: `create user cmy@"%" identified by "12345";` Then 'cmy' can log in with password '12345' from any host. Second, we create another user by: `create user cmy@"192.168.%" identified by "abcde";` "192.168.%" has a higher priority in the permission table than "%", so when "cmy" tries to log in with password "12345" from host "192.168.1.1", the login should match the second permission entry and be rejected because of the invalid password. But in the current implementation, Doris continues to check the password against the first entry, then lets it pass. So we should change it. Permission checking logic: After a user logs in, the user should have a unique identity obtained from the permission table. For example, when "cmy" logs in from host "192.168.1.1", its identity should be `cmy@"192.168.%"`. Doris should use this identity to check other permissions, not the user's real identity, which is `cmy@"192.168.1.1"`. Black list: Functionally speaking, Doris only supports a WHITE LIST, which allows a user to log in from the hosts in the white list. But in some cases, we do need a BLACK LIST function. Fortunately, by changing the logic described above, we can simulate the effect of a BLACK LIST. For example, first we add a user by: `create user cmy@'%' identified by '12345';` Now user 'cmy' can log in from any host. If we don't want 'cmy' to log in from host A, we can add a new user by: `create user cmy@'A' identified by 'other_passwd';` Because "A" has a higher priority in the permission table than "%", if 'cmy' tries to log in from A using password '12345', the login will be rejected.	2019-12-06 17:45:56 +08:00
chenhao7253886	37b4cafe87	Change variable and namespace name in BE (#268 ) Change 'palo' to 'doris'	2018-11-02 10:22:32 +08:00
morningman	2868793b6b	Change license to Apache License 2.0 (#262 )	2018-11-01 09:06:01 +08:00
morningman	cc74efb3c5	merge to ddb65b69f9c788e359e191889cb31f15279c41ec (#224 ) 1. Apache HDFS broker support HDFS HA and Hadoop kerberos authentication. 2. New Backup and Restore function. Use Fs Broker to backup your data to HDFS or restore them from HDFS. 3. Table-Level Privileges. Grant fine-grained privileges on table-level to specified user. 4. A lot of bugs fixed. 5. Performance improvement.	2018-08-24 17:12:26 +08:00
cyongli	e2311f656e	baidu palo	2017-08-11 17:51:21 +08:00

20 Commits