Commit Graph

44 Commits

Author SHA1 Message Date
ccc1b9d98c Optimize percentile_approx through radix sort (#2102) (#2107) 2019-11-05 09:25:47 +08:00
f53f188c5d Add arrow IPC serialization for Doris-Spark-Connector (#2013) 2019-10-31 10:32:06 +08:00
05643dc403 Replace Arena with MemPool (#2012)
After replacing Arena with MemPool, string values read from segment v2
need only one copy. MemPool chunks can be exchanged between RowBlockV2
and RowBlock. This change only replaces Arena; the chunk exchange will
be done in another change list.
2019-10-19 15:53:24 +08:00
024348d74b Enable auto convert when check in (#1926)
Leverage gitattributes to automatically convert end-of-line to LF when
checking in. Convert existing CRLF to LF by removing all files and
checking them out with the new .gitattributes file. Except for
.gitattributes, files are modified only at their line endings.
2019-10-09 22:31:27 +08:00
f852f50acb Improve unique id performance (#1911)
Remove the default constructor for UniqueId.
Add a gen_uid method to UniqueId. Users who need to generate a new uid should call this API explicitly.
Reuse the boost random generator instead of creating a new one every time.
2019-09-29 18:20:02 +08:00
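
The idea behind this change can be illustrated with a minimal sketch. It is written in Python rather than the BE's C++, and every name except gen_uid is hypothetical: one random generator is created once and shared, and a new id is produced only when the caller asks for it explicitly.

```python
import random

# Assumption: a single module-level generator stands in for the reused
# boost random generator mentioned in the commit message.
_SHARED_RNG = random.Random()


class UniqueId:
    """Hypothetical 128-bit id made of two 64-bit halves."""

    def __init__(self, hi: int, lo: int):
        # No default constructor: both halves must be supplied, so an id
        # is never generated implicitly.
        self.hi = hi
        self.lo = lo

    @classmethod
    def gen_uid(cls) -> "UniqueId":
        # Explicit generation path that reuses the shared generator
        # instead of building a new one on every call.
        return cls(_SHARED_RNG.getrandbits(64), _SHARED_RNG.getrandbits(64))

    def __repr__(self) -> str:
        return f"{self.hi:016x}-{self.lo:016x}"
```

Calling UniqueId() with no arguments raises TypeError in this sketch, which mirrors the removal of the default constructor.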
0c22d8fa08 Add frame_of_reference page (#1818) 2019-09-28 01:10:29 +08:00
c643cbd30c Optimize the load performance for large file (#1798)
The current load process is:

Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk

In the path of Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed:

1. Insert tuples into different memtables according to tablet ID.
2. When a memtable's size reaches the threshold, write it to disk.

For a single load task, these operations are effectively executed in a single thread.
In fact, memtable insertion and memtable flush can be executed asynchronously.
Performing them in separate threads prevents memtable insertion from being delayed by slow disk writes.

In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads.
By default, each data directory uses two worker threads for flush, which can be modified by the parameter flush_thread_num_per_store of BE.
DeltaWriter will push the full memtable to MemTableFlushExecutor for flush operation and generate a new memtable for receiving new data.

This design can improve the performance of loading large files.
In single-host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.
2019-09-25 13:49:32 +08:00
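
The flush design described in this commit can be sketched roughly as follows. This is an illustrative Python model, not the BE's C++ code: the class names MemTableFlushExecutor and DeltaWriter come from the commit message, but the methods, the routing of tablets to queues by tablet id, the row-count threshold, and the list standing in for the disk are all assumptions.

```python
import queue
import threading


class MemTableFlushExecutor:
    """Sketch: a set of flush queues, each drained by its own worker thread."""

    def __init__(self, num_threads: int = 2):
        self._queues = [queue.Queue() for _ in range(num_threads)]
        self._workers = [
            threading.Thread(target=self._run, args=(q,), daemon=True)
            for q in self._queues
        ]
        for worker in self._workers:
            worker.start()

    def _run(self, q: queue.Queue) -> None:
        while True:
            task = q.get()
            try:
                if task is None:  # sentinel: stop this worker
                    return
                task()
            finally:
                q.task_done()

    def push(self, tablet_id: int, flush_fn) -> None:
        # Assumption: a tablet always maps to the same queue, so its
        # memtables are flushed in the order they were filled.
        self._queues[tablet_id % len(self._queues)].put(flush_fn)

    def wait(self) -> None:
        for q in self._queues:
            q.join()

    def close(self) -> None:
        for q in self._queues:
            q.put(None)
        for worker in self._workers:
            worker.join()


class DeltaWriter:
    """Sketch: fills a memtable and hands full ones to the executor."""

    def __init__(self, tablet_id, executor, threshold, sink):
        self._tablet_id = tablet_id
        self._executor = executor
        self._threshold = threshold  # rows per memtable (hypothetical unit)
        self._sink = sink            # list standing in for the disk
        self._memtable = []

    def write(self, row) -> None:
        self._memtable.append(row)
        if len(self._memtable) >= self._threshold:
            self._flush_async()

    def _flush_async(self) -> None:
        # Swap in a new memtable right away so inserts never wait on the flush.
        full, self._memtable = self._memtable, []
        self._executor.push(self._tablet_id, lambda: self._sink.append(full))

    def close(self) -> None:
        if self._memtable:
            self._flush_async()
```

In this model the writer returns from write() as soon as the full memtable is queued, which is the decoupling of insertion from disk writes that the commit message describes.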
65dcabf1df Use crc32c checksum for segment v2 (#1753) 2019-09-06 15:23:57 +08:00
f76dad289e Basic implementation for BetaRowsetReader (#1718) 2019-09-03 13:52:16 +08:00
6865f4238b Add limit to show tablet stmt (#1547)
Also add some where predicates for filtering results
ISSUE #1687
2019-08-28 16:25:12 +08:00
58801c6ab0 Support converting RowBatch and RowBlockV2 to/from Arrow (#1699) 2019-08-27 11:30:00 +08:00
acf868c9d0 Support page compression and checksum in BetaRowset (#1646) 2019-08-19 09:40:47 +08:00
c0253a17fc Add block compression codec and remove not used codec (#1622) 2019-08-12 20:47:16 +08:00
a9e8113b82 Fix heap-buffer-overflow in split_part() function in StringFunctions (#1482) 2019-07-15 23:00:37 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies all of the Backend's data,
which makes restarting BE take a very long time.
If you do not want to disrupt your production environment,
upgrade the backends one by one.

1. Refactor BE to clarify the structure of the code.
2. Use a unique id to identify a rowset.
   Naming a rowset with tablet_id and version leads to
   many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
a7390c03f4 Add percentile_approx aggregate function (#1432) 2019-07-11 16:44:43 +08:00
7eab12a40e Support reading Parquet file when loading data (#1173) 2019-07-01 18:39:27 +08:00
7f1720b632 Add rle encoding (#1326) 2019-06-18 14:48:33 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
e9b2d30c6a Add faststring and cpu util (#1281) 2019-06-12 14:00:50 +08:00
84632cd062 Add BitMapIterator (#1277) 2019-06-11 09:23:02 +08:00
3e1c70d1b7 Add coding function (#1264) 2019-06-08 21:02:31 +08:00
a08170fd50 Enhance the usabilities (#1100)
* Enhance usability

1. Add metrics to monitor transactions and the streaming load process in BE.
2. Modify BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Modify FE config 'enable_metric_calculator' to true.
4. Add more log for tracing broker load process.
5. Modify the query report process to cancel the query immediately if any instance fails.

* Fix bugs
1. Avoid NullPointer when enabling colocation join with broker load
2. Return immediately when the pull load task coordinator execution fails
2019-05-07 15:55:04 +08:00
e8b360d193 Merge master and fix BE ut 2019-04-28 10:33:50 +08:00
ff7d3e5878 Unify the print method of TUniqueId (#487) 2018-12-29 16:22:38 +08:00
e2bb86cf78 Add Md5Digest to util (#420) 2018-12-12 20:06:35 +08:00
9a2ad18428 Add path info of replica in catalog (#327)
Add path info of replica in catalog

Also fix a bug: when check_none_row_oriented_table is called,
store is null, so it cannot be used to create the table.
Instead, OLAPHeader can be used to get the storage type information.
2018-11-19 17:42:46 +08:00
0aea149c0b Fix core local value UT failed (#324)
Issue: #323
2018-11-16 15:27:16 +08:00
c877b43013 Remove my aes and fix palo ns to doris (#277) 2018-11-02 17:05:48 +08:00
d57e91db6e Rewrite aes encryption (#264)
Resolve #257
2018-11-02 15:26:31 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00
2868793b6b Change license to Apache License 2.0 (#262) 2018-11-01 09:06:01 +08:00
051aced48d Missing many files in last commit
Many files were missed in the last commit
2018-10-31 16:19:21 +08:00
5d3fc80067 Added:
* Add streaming load feature. You can execute 'help stream load;' to see more information.

Changed:
* The loading phase of a table can be parallelized, to reduce load job execution time when multiple load jobs target a single table.
* Use RocksDB to save the header info of tablets in Backends, to reduce IO operations and speed up restarts.

Fixed:
* A lot of bugs fixed.
2018-10-31 14:46:22 +08:00
765c91bbc2 Added: change Doris build.sh to get environment variables from
custom_env.sh, and add run-ut.sh and run-fe-ut.sh
2018-10-30 23:42:05 +08:00
ae9ce81453 Changed: change build.sh to use environment variable to get thirdparty's
path, and change PALO_HOME to DORIS_HOME
2018-10-30 16:29:06 +08:00
4f6f8572de Added: Add 3 new metrics of Backends: host_fd_metrics, process_fd_metrics and process_thread_metrics, to monitor open file number and thread number.
Added: Support getting column size and precision info of table or view using JDBC.

Updated: Change the Prometheus type name GAUGE to lowercase, to fit the latest Prometheus version.
Updated: Backend ip saved in FE will be compared with BE's local ip when doing heartbeat, to avoid false positive heartbeat response.
Updated: Use the tablet's version_num instead of a calculated nice value to select cumulative compaction candidates.

Fixed: Predicates should not be pushed down to subquery which contains limit clause.
Fixed: Fix the formula of calculating BE load score.
Fixed: Fix a bug where, in some edge cases, a non-master Frontend may wait for an unnecessarily long timeout after forwarding a cmd to the Master FE.
Fixed: A bug that granting privs on more than one table does not work.
Fixed: Support 'Insert into' table which contains HLL columns.
Fixed: ExportStmt's toSql() method may throw a NullPointerException if the table does not exist.
Fixed: Remove unnecessary 'get capacity' operation to avoid IO impact.

Internal commit id: merge to c16bd603a53dfe2089ff95704c698a738c317792
2018-10-26 14:48:21 +08:00
19997510a6 merge to 9625ef157dd44c58802d63cb7547f037b75fd710 (#208)
1. Implement Backend http server using libevent instead of mongoose.
2. Remove Old Hypertable rpc framework, use brpc instead.
3. Change rpc from FE to BE to brpc.
4. Fs broker supports HDFS HA.
5. Add more metrics for monitoring.
6. Fix lots of bugs.
2018-07-17 09:20:30 +08:00
2419384e8a push 3.3.19 to github (#193)
* push 3.3.19 to github

* merge to 20ed420122a8283200aa37b0a6179b6a571d2837
2018-05-15 20:38:22 +08:00
1e951e5c1c fix ut compile. set timeout to pull load task. fix export sink bug (#175) 2018-01-09 10:42:57 +08:00
585c21fab4 add feature and fix bugs (#148)
Add new features:
1. plugins of Ambari and k8s deploy
2. specified config 'priority_network' to solve some ip problems

Fix bugs:
fix bugs where rebalance does not work in some cases.
fix count(*) from union stmt bug
fix some union stmt bugs
fix bugs when trying to apply a schema change to a clone replica
2017-11-30 16:31:12 +08:00
db8c40e5f0 add authentication to DownloadAction (#91)
* add authentication to DownloadAction

1. use cluster_id as token;
2. add dir limit, only files in data dir can be accessed.

* enable authentication in DownloadAction by default
2017-09-13 16:54:00 +08:00
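
The two checks listed in this commit (cluster_id used as the token, and access limited to files under a data dir) could look roughly like the sketch below. It is a Python illustration only; the function name, its parameters, and the way the token is compared are assumptions, not the actual DownloadAction code.

```python
from pathlib import Path


def is_download_allowed(token, cluster_id, requested_path, data_dirs):
    """Hypothetical check: valid token and a path inside one of the data dirs."""
    # 1. The caller must present the cluster_id as its token.
    if token != str(cluster_id):
        return False

    # 2. Resolve ".." components and symlinks before comparing, so a path
    #    such as "<data_dir>/../etc/passwd" cannot escape the data dir.
    target = Path(requested_path).resolve()
    for data_dir in data_dirs:
        root = Path(data_dir).resolve()
        if root in target.parents:
            return True
    return False
```

Resolving the path before the containment test is the design choice that matters here: a plain string prefix comparison would accept paths that climb out of the data dir.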
6486be64c3 fix license statement (#29)
* change picture to word

* change picture to word

* SHOW FULL TABLES WHERE Table_type != VIEW sql can not execute

* change license description
2017-08-18 19:16:23 +08:00
e2311f656e baidu palo 2017-08-11 17:51:21 +08:00