Fix the query profile not working when sql_cache is enabled; it threw a NullPointerException.
The reason is that the Coordinator used to initialize the profile is null when the cache is enabled.
Therefore, profile processing is now handled differently for cache hits and cache misses, so the null pointer is avoided.
Fixed #7104
BitShufflePageDecoder reuses the memory for storing decode results and allocates that memory directly from the
`ChunkAllocator`, which improves performance to a certain extent.
In the case of #6285, total time consumption is reduced by 13.5%, the time share of `~Reader()` drops
from 17.65% to 1.53%, and memory allocation is unified under `ChunkAllocator` for centralized
management, which helps subsequent memory optimization.
This also avoids the memory waste caused by `Mempool`, because a chunk can be freed at any time. However, the
allocation itself is slower than allocating from `Mempool`. The guess is that without `Mempool` performing the secondary
allocation of large chunks, a large number of small chunks are requested directly from `ChunkAllocator`, and more time is
spent locking in `pop_free_chunk` and `push_free_chunk` (but this is not confirmed by BE's CPU and lock-contention flame graphs).
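Below is a minimal, self-contained sketch of the reuse pattern, assuming a hypothetical `ChunkAllocator`-style interface (the stand-in class here is not the actual Doris API): the decoder borrows one chunk, grows it only when a larger page arrives, and returns it in its destructor instead of leaving the memory in a `Mempool` until the pool itself is destroyed.
```
// Sketch only: ChunkAllocator below is a stand-in, not the real Doris class.
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <iostream>

struct Chunk {
    char* data = nullptr;
    size_t size = 0;
};

// Hypothetical process-wide allocator; the real one keeps free lists of chunks.
class ChunkAllocator {
public:
    static ChunkAllocator* instance() {
        static ChunkAllocator alloc;
        return &alloc;
    }
    bool allocate(size_t size, Chunk* chunk) {
        chunk->data = static_cast<char*>(std::malloc(size));
        chunk->size = size;
        return chunk->data != nullptr;
    }
    void free(const Chunk& chunk) { std::free(chunk.data); }
};

// The decoder keeps a single chunk across decode() calls and frees it as soon
// as the decoder is destroyed, rather than allocating a new buffer per page
// from a MemPool that holds the memory until the whole pool goes away.
class BitShuffleDecoder {
public:
    ~BitShuffleDecoder() {
        if (_chunk.data != nullptr) {
            ChunkAllocator::instance()->free(_chunk);
        }
    }

    bool decode(const char* page, size_t page_size) {
        if (_chunk.size < page_size) {
            if (_chunk.data != nullptr) {
                ChunkAllocator::instance()->free(_chunk);
                _chunk = Chunk();
            }
            if (!ChunkAllocator::instance()->allocate(page_size, &_chunk)) {
                return false;
            }
        }
        // Real code would bitshuffle-decompress here; memcpy keeps the sketch runnable.
        std::memcpy(_chunk.data, page, page_size);
        return true;
    }

private:
    Chunk _chunk;
};

int main() {
    BitShuffleDecoder decoder;
    const char page[] = "encoded-page-bytes";
    std::cout << (decoder.decode(page, sizeof(page)) ? "decoded" : "failed") << "\n";
    return 0;
}
```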
1. Forbid non-string columns as the parameter of explode_view.
The first parameter of explode_view must be a string column (VARCHAR/CHAR/STRING).
2. N-to-1: n lateral views map to one TableFunctionNode.
The TableFunctionNode includes all of the fnExprs that belong to one table.
For example:
select pageid,mycol1, mycol2 from pageAds
lateral view explode_string(col1) myTable1 as mycol1
lateral view explode_string(col2) myTable2 as mycol2;
TableFunctionNode
|----
|- fnExprList: explode_string(col1), explode_string(col2)
1. Replace all boost::shared_ptr with std::shared_ptr
2. Replace all boost::scoped_ptr with std::unique_ptr
3. Replace all boost::scoped_array with std::unique_ptr<T[]>
4. Replace all boost::thread with std::thread
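For illustration only (not taken from the Doris source), the mapping looks like this:
```
// Illustration of the boost -> std replacements (not actual Doris code).
#include <iostream>
#include <memory>
#include <thread>

int main() {
    std::unique_ptr<int> owner(new int(42));                  // was boost::scoped_ptr<int>
    std::unique_ptr<int[]> buffer(new int[16]);               // was boost::scoped_array<int>
    std::shared_ptr<int> shared = std::make_shared<int>(7);   // was boost::shared_ptr<int>
    std::thread worker([&] { std::cout << *owner + *shared << "\n"; });  // was boost::thread
    worker.join();
    return 0;
}
```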
Users can directly query data in Hive tables from Doris, and can use joins to perform complex queries, without laboriously importing data from Hive.
Main changes list below:
FE:
Extend HiveScanNode from BrokerScanNode
HiveMetaStoreClientHelper communicates with Hive and HDFS.
BE:
Treat HiveScanNode as BrokerScanNode, and treat HiveTable as BrokerTable.
broker_scanner.cpp: support reading columns from the HDFS path.
orc_scanner.cpp: support reading HDFS files.
POM:
Add hive.version=2.3.7, hive-metastore and hive-exec
Add hadoop.version=2.8.0, hadoop-hdfs
Upgrade commons-lang to fix incompatibility with Java 9 and later.
Thrift:
Add THiveTable
Add read_by_column_def in TBrokerRangeDesc
The new session variable 'close_join_reorder' is used to turn off all automatic join reorder algorithms.
If close_join_reorder is true, Doris will execute the query using the join order written in the original query.
1. Migrate some of the best-practice articles to the Blog.
2. Rename the "performance tests and best practices" section to "performance tests and examples".
1. Reduce the memory occupied by HLL (see the sketch after this list):
replace `uint64_t _explicit_data[1602]` with a `uint64_t*` pointer,
and allocate memory for the explicit data only when it is really needed.
2. Trace HLL memory usage.
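A minimal sketch of the idea, with hypothetical member names (this is not the actual Doris HLL class): the fixed 1602-element array becomes a pointer that is only backed by memory once a value is actually stored, and the consumed size can be reported for memory tracing.
```
// Sketch only: simplified stand-in for the HLL explicit-value storage.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <memory>

class HyperLogLog {
public:
    static constexpr int kExplicitMax = 1602;

    void add_explicit(uint64_t hash) {
        if (_explicit_data == nullptr) {
            // Allocate only when the first value arrives ("when really needed").
            _explicit_data = std::make_unique<uint64_t[]>(kExplicitMax);
        }
        if (_explicit_size < kExplicitMax) {
            _explicit_data[_explicit_size++] = hash;
        }
    }

    // Hook that a memory tracker could query to trace HLL memory usage.
    size_t memory_consumed() const {
        return _explicit_data ? kExplicitMax * sizeof(uint64_t) : 0;
    }

private:
    std::unique_ptr<uint64_t[]> _explicit_data;  // was: uint64_t _explicit_data[1602]
    int _explicit_size = 0;
};

int main() {
    HyperLogLog hll;
    std::cout << "before: " << hll.memory_consumed() << " bytes\n";  // 0
    hll.add_explicit(0x12345678ULL);
    std::cout << "after:  " << hll.memory_consumed() << " bytes\n";  // 12816
    return 0;
}
```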
1. Optimize HighWaterMarkCounter::add(): call `UpdateMax()` only if the delta is greater than 0,
to reduce the number of function calls (see the sketch after this list).
2. Delete useless code lines to keep MemTracker clean:
some member data is never set but its value is checked, so the if condition is never met; these lines are removed.
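A simplified sketch of the counter for point 1 (not the actual Doris class; member names are assumptions): the maximum can only change when the counter grows, so the update is skipped for zero or negative deltas.
```
// Sketch only: simplified high-water-mark counter with the delta > 0 check.
#include <atomic>
#include <cstdint>
#include <iostream>

class HighWaterMarkCounter {
public:
    void add(int64_t delta) {
        int64_t new_value = _current.fetch_add(delta) + delta;
        if (delta > 0) {          // previously the max update ran unconditionally
            update_max(new_value);
        }
    }
    int64_t peak() const { return _max.load(); }

private:
    void update_max(int64_t value) {
        int64_t cur = _max.load();
        while (value > cur && !_max.compare_exchange_weak(cur, value)) {
        }
    }
    std::atomic<int64_t> _current{0};
    std::atomic<int64_t> _max{0};
};

int main() {
    HighWaterMarkCounter counter;
    counter.add(128);
    counter.add(-64);  // shrinking the counter cannot raise the peak, so no update
    std::cout << "peak=" << counter.peak() << "\n";  // prints peak=128
    return 0;
}
```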
In debug mode, when query memory is not enough, BE may go down.
FE sets useStreamingPreagg to true, but the BE function CreateHashPartitions checks that is_streaming_preagg_ should be false,
which then causes a core dump.
```
*** Check failure stack trace: ***
@ 0x2aa48ad google::LogMessage::Fail()
@ 0x2aa6734 google::LogMessage::SendToLog()
@ 0x2aa43d4 google::LogMessage::Flush()
@ 0x2aa7169 google::LogMessageFatal::~LogMessageFatal()
@ 0x24703be doris::PartitionedAggregationNode::CreateHashPartitions()
@ 0x2468fd6 doris::PartitionedAggregationNode::open()
@ 0x1e3b153 doris::PlanFragmentExecutor::open_internal()
@ 0x1e3af4b doris::PlanFragmentExecutor::open()
@ 0x1d81b92 doris::FragmentExecState::execute()
@ 0x1d840f7 doris::FragmentMgr::_exec_actual()
```
We should remove DCHECK(!is_streaming_preagg_).
The `defineExpr` in `Column` must be analyzed before calling its `treeToThrift` method.
And for CreateReplicaTask, there is no need to set `defineExpr` in TColumn.
Main changes:
1. Fix [Bug] Colocate group can not redistributed after dropping a backend #7019
2. Add a detailed message about why a colocate group is unstable.
3. Add more suggestion when upgrading Doris cluster.
Add a new field `runningTxns` in the result of `SHOW ROUTINE LOAD`. e.g.:
```
Id: 11001
Name: test4
CreateTime: 2021-11-02 00:04:54
PauseTime: NULL
EndTime: NULL
DbName: default_cluster:db1
TableName: tbl1
State: RUNNING
DataSourceType: KAFKA
CurrentTaskNum: 1
JobProperties: {xxx}
CustomProperties: {"kafka_default_offsets":"OFFSET_BEGINNING","group.id":"test4"}
Statistic: {"receivedBytes":6,"runningTxns":[1001, 1002],"errorRows":0,"committedTaskNum":1,"loadedRows":2,"loadRowsRate":0,"abortedTaskNum":13,"errorRowsAfterResumed":0,"totalRows":2,"unselectedRows":0,"receivedBytesRate":0,"taskExecuteTimeMs":20965}
Progress: {"0":"10"}
ReasonOfStateChanged:
ErrorLogUrls:
OtherMsg:
```
So that users can view the status of the corresponding transactions of this job by executing `show transaction where id=xx`.
The jar files compiled from the Flink and Spark Connectors now carry the corresponding Flink or Spark version
and the Scala version used at compile time, so that users can tell whether the version numbers match when using them.
Example of the output file name: doris-spark-1.0.0-spark-3.2.0_2.12.jar
Add a blog-sharing function to the documentation site, including a blog list and detail pages. A guide on how to share blogs has also been added to the developer guide.