doris

Author	SHA1	Message	Date
lichaoyong	720808fda5	Remove config::max_file_descriptor_number (#1833 )	2019-09-20 07:50:57 +08:00
lichaoyong	315f762523	Seek block when starts a ScanKey (#1828 ) In Doris, one block has 1024 rows. Without a block seek, the position of the prefix short-key columns is wrong when both of the following hold: 1. The previous ScanKey scanned multiple blocks and its final block held exactly 1024 rows. 2. The current ScanKey scans fewer rows than one block.	2019-09-19 20:08:03 +08:00
ZHAO Chun	17e52a4bac	Improve LRUCache to get better performance (#1826 ) In this CL, I move the entry's deleter out of LRUCache's mutex block, so other threads can access the cache without waiting for cache entries to be freed.	2019-09-19 17:37:02 +08:00
ZHAO Chun	11eafe524f	Add ChunkAllocator to accelerate chunk allocation (#1792 ) This CL adds ChunkAllocator, which puts unused memory chunks into a chunk pool rather than returning them to the system allocator. For now, only MemPool's chunk allocation and free use it. Two configurations are introduced as well. 'chunk_reserved_bytes_limit' is the limit on how many bytes the chunk pool can reserve in total; its default value is 2147483648 (2GB). 'use_mmap_allocate_chunk' controls whether chunks are allocated via mmap; its default value is false. In my test case with the default configuration, a simple query like "select * from table limit 10" improves throughput from 280 QPS to 650 QPS. When 'chunk_reserved_bytes_limit' is set to 0, which disables the pool, the throughput is the same as the original's.	2019-09-13 08:27:24 +08:00
Mingyu Chen	9aa2045987	Refactor alter job (#1695 )	2019-09-12 16:31:29 +08:00
wubiao	dad4def708	Support estimate size for v2 segment writer (#1787 )	2019-09-12 15:15:39 +08:00
Dayue Gao	5653822298	Write magic number in footer instead of header (#1771 )	2019-09-10 09:54:13 +08:00
kangkaisen	cd5cfea5cc	Encapsulate HLL logic (#1756 )	2019-09-09 15:52:10 +08:00
ZHAO Chun	a349409838	Move compare from RowCursor to row (#1764 )	2019-09-09 14:51:13 +08:00
Dayue Gao	65dcabf1df	Use crc32c checksum for segment v2 (#1753 )	2019-09-06 15:23:57 +08:00
Dayue Gao	54fd3652e6	Fix bug in BetaRowsetReader which results in empty result (#1754 )	2019-09-06 15:07:23 +08:00
kangkaisen	3f22238012	Add check for to_bitmap function argument (#1747 )	2019-09-05 18:11:38 +08:00
Dayue Gao	85940a292b	RowsetFactory as a single entry for Rowset creation (#1748 )	2019-09-05 18:10:18 +08:00
Dayue Gao	a63989cc61	Use RowsetFactory to create and init RowsetWriter (#1740 )	2019-09-04 17:02:43 +08:00
Dayue Gao	f76dad289e	Basic implementation for BetaRowsetReader (#1718 )	2019-09-03 13:52:16 +08:00
yiguolei	a80e9996a6	Move version to high 8 bit (#1736 )	2019-09-02 19:43:04 +08:00
ZHAO Chun	b4f6f755f1	Add exchange in MemPool to reduce alloc/free operation (#1732 ) Reuse allocated chunks during storage read operations.	2019-09-02 19:29:30 +08:00
yiguolei	6f4feca3dc	Add rowset id generator to FE and BE (#1678 )	2019-09-02 18:51:31 +08:00
Mingyu Chen	76987275b9	Fix result of unix_timestamp() (#1727 )	2019-08-30 21:39:16 +08:00
kangpinghuang	6865f4238b	Add limit to show tablet stmt (#1547 ) Also add some where predicates for filtering results ISSUE #1687	2019-08-28 16:25:12 +08:00
kangpinghuang	34a6e06cb1	fix from string bug(#1710 ) (#1713 )	2019-08-27 18:43:49 +08:00
Dayue Gao	ae22d5e682	Support multiple key ranges in RowwiseIterator and StorageReadOptions (#1704 ) Support multiple key ranges in RowwiseIterator and StorageReadOptions; remove unused fields and member functions in RowBlock and ColumnData; read num_rows_per_block from short key index footer.	2019-08-27 17:57:42 +08:00
ZHAO Chun	58801c6ab0	Support converting RowBatch and RowBlockV2 to/from Arrow (#1699 )	2019-08-27 11:30:00 +08:00
kangkaisen	1e4dd77d2a	Add bitmap agg type and udaf (#1610 )	2019-08-26 14:24:42 +08:00
Dayue Gao	da8b9aad9a	Remove preaggregation and index stream cache stuff out of RowsetReaderContext (#1698 )	2019-08-26 14:19:03 +08:00
kangpinghuang	6d040a33af	Add zone map page(#1390 ) (#1633 )	2019-08-24 00:57:30 +08:00
ZHAO Chun	acf868c9d0	Support page compression and checksum in BetaRowset (#1646 )	2019-08-19 09:40:47 +08:00
yuanli	ba6d728f26	Enable parsing columns from file path for Broker Load (#1582 ) (#1635 ) Currently, we do not support parsing encoded/compressed columns in file path, e.g. extracting column k1 from file path /path/to/dir/k1=1/xxx.csv. This patch parses columns from the file path as Spark does (Partition Discovery). It parses partition columns in BrokerScanNode.java and saves the parsing result of each file path as a property of TBrokerRangeDesc, so the broker reader of BE can read the value of the specified partition column.	2019-08-19 09:39:21 +08:00
Mingyu Chen	82d0afc1ba	FROM_UNIXTIME should only convert timestamp from 0 to 253402271999 (#1658 ) which is between 1970-01-01 00:00:00 ~ 9999-12-31 23:59:59, otherwise, return null	2019-08-16 18:29:57 +08:00
HangyuanLiu	199ff968dc	Fix time zone compatibility (#1631 )	2019-08-13 18:44:35 +08:00
ZHAO Chun	032d0b41bb	Fix compile error (#1630 )	2019-08-13 10:00:18 +08:00
HangyuanLiu	69af50aa8c	Time zone related BE function (#1598 ) Details can be found in time-zone.md document	2019-08-12 20:57:59 +08:00
ZHAO Chun	c0253a17fc	Add block compression codec and remove not used codec (#1622 )	2019-08-12 20:47:16 +08:00
Dayue Gao	af8256be2a	Implement BetaRowsetWriter (#1590 ) BetaRowsetWriter is used to write rowset in V2 segment format. This PR contains several interface changes 1. Rowset.make_snapshot() is renamed to `link_files_to` because hard links are also useful in copy task, linked schema change, etc 2. Rowset.copy_files_to_path() is renamed to `copy_files_to` to be consistent with other names 3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool 4. RowsetWriter.garbage_collection() is removed because it's not used by clients 5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality	2019-08-12 16:41:47 +08:00
ZHAO Chun	2bd01b23c7	Add page cache for column page in BetaRowset (#1607 )	2019-08-12 10:42:00 +08:00
Yunfeng,Wu	e3348c46a9	Expose data pruned-filter-scan ability (#1527 )	2019-08-11 12:59:24 +08:00
ZHAO Chun	b2e678dfc1	Support Segment for BetaRowset (#1577 ) We create a new segment format for BetaRowset. The new format merges the data file and index file into one file. We also create a new format for the short key index. In the original code, the index is stored in a RowCursor-like format, which is not efficient to compare. Now we encode multiple columns into binary, and we ensure that this binary sorts in the same order as the key columns.	2019-08-06 17:15:11 +08:00
ZHAO Chun	c5edf9dae0	Unify Field and ColumnSchema in Storage (#1561 ) Currently, we have Field and ColumnSchema to access column data in a row. These two classes are mostly the same, so we should unify them into one class. Field currently holds offset information, which is a row attribute, so we remove offset from Field. RowCursor currently has some logic which belongs to Schema, so this patch adds a Schema attribute to RowCursor to make RowCursor simpler. After this change, only Schema handles Field/ColumnSchema. I extract some logic from RowCursor to be/src/olap/row.h, so we can use the same logic to handle different types of row. Each type of row has the same function to get a Cell of the row. A cell represents a column's content with a null indicator.	2019-07-30 14:01:57 +08:00
Mingyu Chen	0694b6a6fa	Fix bugs of Broker load (#1546 ) Use the same UUID as query ID and load ID of a load execution plan. Each load execution plan has a load ID, and as a plan, it also has a query ID. We can use the same UUID as query ID and load ID to trace the load process more easily. Change the load ID when retrying a load execution plan. When a load execution plan retries, the load ID should be changed, otherwise BE cannot distinguish the old and new load requests. Cancel the running loading task when cancelling the broker load. When a user cancels a broker load, the running loading task should also be cancelled, or it may occupy the worker thread for a long time. Remove the unnecessary query report when doing load execution plan. Only the last query report is needed. Add a new BE config tablet_writer_rpc_timeout_sec. It is used for the RPC of tablet sink. The default is 600 seconds, which is long enough for flushing about 6GB of data. The long timeout will reduce the possibility of encountering the "fail to send batch" error when loading. Use streaming_load_max_mb instead of mini_load_max_mb in BE config. Add more logs for tracing a broker load process easily.	2019-07-27 20:17:05 +08:00
kangpinghuang	e8561d71a6	Add dict page (#1409 ) Add dict encoding page for binary/string type data. Construct a dict for the original data, and save the encoded id instead of the original data to save space. If the dict is too big, the page automatically falls back to plain encoding.	2019-07-26 09:47:11 +08:00
ZHAO Chun	dbc912d2df	Unify ColumnSchemaV2 and ColumnSchema to one (#1545 ) Currently, we have two versions of ColumnSchema, in this patch, we unify these two classes to one class.	2019-07-25 10:48:16 +08:00
ZHAO Chun	0805b05d81	Remove unused FieldInfo (#1540 )	2019-07-24 19:33:30 +08:00
ZHAO Chun	68782be7a6	Refactor storage aggregate framework (#1529 ) Add AggregateInfo to enclose all functions that are used to aggregate value columns.	2019-07-24 10:02:35 +08:00
HangyuanLiu	4aedaea84e	Support TIME type and timediff function (#1505 )	2019-07-23 13:42:39 +08:00
ZHAO Chun	0c8e91adf4	Add storage rowwise iterator (#1515 ) Use RowwiseIterator to unify all data fetching in the storage engine. All objects in the storage engine, for example Segment and Rowset, can be read in iterator format. This patch implements two generic iterators: UnionRowwiseIterator and MergeRowwiseIterator. These two classes take iterators as their inputs. To implement iterators, we define a new class RowBlockV2; all data read from an iterator is in this format. We define a new class rather than using the old version's RowBlock because we want to keep the old code working normally.	2019-07-22 14:35:11 +08:00
ZHAO Chun	41499061ac	Refactor types.h to reduce code and add UT (#1498 )	2019-07-18 12:24:41 +08:00
lichaoyong	a9e8113b82	Fix heap-buffer-overflow in split_part() function in StringFunctions (#1482 )	2019-07-15 23:00:37 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478 ) NOTE: This patch modifies all of a Backend's data, which makes restarting BE take a very long time. To avoid disrupting your production environment, upgrade backends one by one. 1. Refactor BE to clarify the structure of the code. 2. Use a unique id to identify a rowset. Naming rowsets by tablet_id and version leads to many conflicts among compaction, clone, and restore. 3. Extract a rowset interface to encapsulate rowsets with different formats.	2019-07-15 21:18:22 +08:00
HangyuanLiu	a7390c03f4	Add percentile_approx aggregate function (#1432 )	2019-07-11 16:44:43 +08:00
Candy	98bd4b4565	Add string function split_part (#1451 )	2019-07-10 09:47:33 +08:00
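
Commit ba6d728f26 (#1582, #1635) describes extracting partition columns such as k1 from a file path like /path/to/dir/k1=1/xxx.csv. The idea can be sketched in Python as below. This is an illustration only, not the code in BrokerScanNode.java: the function name `parse_columns_from_path` and the choice to return `None` when a requested column is missing are assumptions.

```python
from typing import Dict, List, Optional


def parse_columns_from_path(file_path: str, columns: List[str]) -> Optional[Dict[str, str]]:
    """Collect key=value directory components for the requested columns.

    Hypothetical helper for illustration; not part of Doris.
    """
    # Drop the file name: only directory components may carry columns.
    directories = file_path.split("/")[:-1]
    found: Dict[str, str] = {}
    for part in directories:
        key, sep, value = part.partition("=")
        if sep and key in columns:
            found[key] = value
    # Assumption: a path that lacks any requested column is rejected.
    if any(column not in found for column in columns):
        return None
    return found


print(parse_columns_from_path("/path/to/dir/k1=1/xxx.csv", ["k1"]))  # {'k1': '1'}
```

Values stay as strings here; converting them to the column's type is left out of the sketch.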

127 Commits
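
Commit e8561d71a6 (#1409) describes dict encoding for binary/string data with a fallback to plain encoding when the dict grows too big. A minimal Python sketch of that idea follows. The names `encode_page` and `decode_page`, the tuple layout, and the entry-count threshold `max_dict_entries` are all assumptions made for illustration; the commit message does not say how "too big" is measured.

```python
from typing import List


def encode_page(values: List[bytes], max_dict_entries: int = 4):
    """Dictionary-encode values, or fall back to plain encoding.

    Hypothetical sketch; the size limit is an assumed entry count.
    """
    dictionary = {}  # value -> id, in first-seen order
    ids = []
    for value in values:
        if value not in dictionary:
            if len(dictionary) >= max_dict_entries:
                # Dict would grow too big: store the original values instead.
                return ("plain", list(values))
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return ("dict", (list(dictionary), ids))


def decode_page(page) -> List[bytes]:
    """Recover the original values from either encoding."""
    kind, payload = page
    if kind == "plain":
        return list(payload)
    words, ids = payload
    return [words[i] for i in ids]


page = encode_page([b"a", b"b", b"a"])
print(page)  # ('dict', ([b'a', b'b'], [0, 1, 0]))
```

Repeated values cost one small id each in the dict form, which is where the space saving comes from; data with many distinct values gains nothing, hence the fallback.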