4430 Commits

Author SHA1 Message Date
bdf915abd4 [Enhancement] (image) check image validity as soon as generated (#9011)
* load newly generated image file as soon as generated to check if it is valid.

* delete the latest invalid image file

* fix

* fix

* get filePath from saveImage() to ensure the correct file is deleted when an exception happens

* fix

Co-authored-by: wuhangze <wuhangze@jd.com>
2022-04-25 19:35:41 +08:00
687421b43f keep at least one validated image file (#9192)
* rename ImageSeq to LatestImageSeq in Storage

* keep at least one validated image file
2022-04-25 19:32:43 +08:00
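The retention rule named in the commit above could be sketched as follows. This is only an illustration, not the code from the PR: the class `ImageRetentionSketch`, the method `imagesToDelete`, and its parameters are hypothetical names, and the actual policy in Doris may differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Doris: decides which image files may be removed.
final class ImageRetentionSketch {

    /**
     * Returns the image sequence numbers that can be deleted, in ascending order.
     * The newest image and the newest validated image are always kept, so at
     * least one validated image survives even if the newest one turns out to be bad.
     */
    static List<Long> imagesToDelete(Collection<Long> imageSeqs,
                                     long latestSeq,
                                     long latestValidatedSeq) {
        List<Long> toDelete = new ArrayList<>();
        // TreeSet sorts the sequence numbers and drops duplicates.
        for (long seq : new TreeSet<>(imageSeqs)) {
            if (seq != latestSeq && seq != latestValidatedSeq) {
                toDelete.add(seq);
            }
        }
        return toDelete;
    }
}
```

With images 1 to 4 on disk, image 4 the newest and image 3 the newest validated one, the sketch marks only images 1 and 2 for deletion.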
3bdfcde8e8 [Improvement] not print logs to fe.out when fe is running under daemon mode (#9195)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-04-25 18:29:29 +08:00
7226089116 FIX: getChannel -> getChannel() (#9217)
Co-authored-by: Rongqian Li <rongqian_li@idgcapital.com>
2022-04-25 17:46:00 +08:00
5b9a1a2a5d avoiding a corrupt image file when there is image.ckpt with non-zero … (#9180)
* avoiding a corrupt image file when there is image.ckpt with non-zero size

For now, saveImage writes data to image.ckpt via an append-mode FileOutputStream.
If a non-zero-size file named image.ckpt already exists, the new data is appended
to it and the resulting image file is corrupt. Even worse, fe only keeps the latest
image file and removes the others.

BTW, the image file should be synced to disk.

It is dangerous to keep only the latest image file, because an image file is
validated only when the next image file is generated. So we keep a non-validated
image file but remove validated ones. I will issue a PR which keeps at least
2 image files.

* append other data after MetaHeader

* use channel.force instead of sync
2022-04-25 17:01:01 +08:00
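A minimal sketch of the idea described in the commit above, under stated assumptions: the class `CheckpointWriteSketch`, the method `writeImage`, and the `image.<seq>` target name are hypothetical and are not taken from the Doris source. It opens image.ckpt with truncation instead of append, forces the data to disk with `FileChannel.force`, and only then renames the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Doris.
final class CheckpointWriteSketch {

    /** Writes payload to dir/image.ckpt, syncs it, then renames it to dir/image.<seq>. */
    static Path writeImage(Path dir, long seq, byte[] payload) throws IOException {
        Path ckpt = dir.resolve("image.ckpt");
        Path target = dir.resolve("image." + seq); // assumed naming scheme

        // TRUNCATE_EXISTING discards any leftover bytes from an earlier failed
        // checkpoint; an append-mode stream would keep them and corrupt the image.
        try (FileChannel channel = FileChannel.open(ckpt,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(payload);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Flush file content and metadata to the storage device before renaming.
            channel.force(true);
        }

        Files.move(ckpt, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }
}
```

Writing to a temporary name and renaming afterwards means a reader never sees a half-written file under the final name.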
b81f49b0d3 [BUG] fix compiling bug for java udf (#9161) 2022-04-25 10:02:01 +08:00
c3d0fee01b [fix](broker load) sync the workflow of BrokerScanner to other Scanner to avoid oom (#9173) 2022-04-25 10:01:42 +08:00
af2295f971 MOD: remove <scope>provided</scope> (#9177) 2022-04-25 10:00:57 +08:00
a608c3d5dc [Fixbug]assure transaction num in image file is right (#9181)
For now, dbTransactionManager::getTransactionNum is only used by
checkpoint to get the transaction num to put into an image file. However,
the transactions written into an image file do not come from the same
data structure as the num. Thus, we would have to take great care to
keep the two data structures consistent in size, which is
very difficult to do.

This patch just lets getTransactionNum get the number from the same data
structure as the write method.

The change was introduced by b93e841688.
2022-04-25 09:59:18 +08:00
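The fix described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class `TransactionImageSketch`, its two maps, and the record layout are hypothetical and do not mirror the real transaction manager. The point is only that the count and the written records are both derived from one collection.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-database transaction manager.
final class TransactionImageSketch {
    private final Map<Long, String> running = new LinkedHashMap<>();
    private final Map<Long, String> finished = new LinkedHashMap<>();

    void addRunning(long id, String label) { running.put(id, label); }

    void addFinished(long id, String label) { finished.put(id, label); }

    // Single source of truth for what ends up in the image.
    private List<Map.Entry<Long, String>> transactionsToPersist() {
        List<Map.Entry<Long, String>> all = new ArrayList<>(running.entrySet());
        all.addAll(finished.entrySet());
        return all;
    }

    // The count comes from the same collection that write() iterates over.
    int getTransactionNum() {
        return transactionsToPersist().size();
    }

    // Writes a count header followed by one (id, label) record per transaction.
    void write(DataOutputStream out) throws IOException {
        List<Map.Entry<Long, String>> txns = transactionsToPersist();
        out.writeInt(txns.size());
        for (Map.Entry<Long, String> txn : txns) {
            out.writeLong(txn.getKey());
            out.writeUTF(txn.getValue());
        }
    }
}
```

Because both methods go through `transactionsToPersist()`, the header can never disagree with the number of records that follow it.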
Pxl
2d83167e50 [Feature] [Lateral-View] support outer combinator of table function (#9147) 2022-04-24 12:09:40 +08:00
4e1b75f5e7 [doc] add docker for Mac note (#9178) 2022-04-23 22:08:53 +08:00
48ac0d9591 [Refactor][doc]Modify the flink doris connector compilation documentation (#9169) 2022-04-23 22:08:09 +08:00
bfa9814350 [doc] add scala2.11 compile doc (#9166) 2022-04-23 22:07:45 +08:00
f2d741fa95 [doc] Modify the release version to prepare the key generation problem solution (#9165) 2022-04-23 22:06:48 +08:00
4911d6898a [docs][typo] Fix some typos in "alter-table" content. (#9131) 2022-04-23 22:05:13 +08:00
6756db6587 [enhancment](*): polish ignore with build_* (#9128) 2022-04-23 22:04:46 +08:00
4445d3188d [docs][typo] Fix some typos in "getting-started" content. (#9124) 2022-04-23 22:03:59 +08:00
ae25633d50 [fix](cache) Generate md5 value using utf8 encoding for sqlkey string (#9121) 2022-04-23 21:37:34 +08:00
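The commit title above only states that the MD5 of the sqlkey string is computed from UTF-8 bytes. A minimal sketch of that technique, assuming a hypothetical helper `SqlKeyDigestSketch.md5Hex` that is not taken from the Doris source:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Doris.
final class SqlKeyDigestSketch {

    /** Returns the lowercase hex MD5 of the key, hashed over its UTF-8 bytes. */
    static String md5Hex(String sqlKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // An explicit charset keeps the digest identical on every node,
            // regardless of the JVM's platform default encoding.
            byte[] digest = md.digest(sqlKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Calling `getBytes()` without a charset uses the platform default, so two machines with different defaults could compute different digests for the same non-ASCII SQL text.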
89d37d920e [fix](transaction) Fix running transaction num always be zero when execute show proc '/transactions' stmt (#9106) 2022-04-23 21:37:18 +08:00
4a10b37ca2 [feature](image tool) support image load tool (#8982) 2022-04-23 21:36:58 +08:00
e157c2c254 [feature-wip](remote-storage) step3: Support remote storage, only for be, add migration_task_v2 (#8806)
1. Add TStorageMigrationReqV2 and EngineStorageMigrationTask to support migration action
2. Change TabletManager::create_tablet() for remote storage
3. Change TabletManager::try_delete_unused_tablet_path() for remote storage
2022-04-22 22:38:10 +08:00
e880dde7a5 [feature-wip](statistics) step1: create the statistics job (#8858)
This is the first PR for statistics collection and includes some implementations of the statistics (#6370). It will not affect any existing code, and users will not be able to create statistics jobs.
It mainly implements the semantic checking module for statistics collection jobs and the job creation module.
The syntax is:

ANALYZE [[ db_name.tb_name ] [( column_name [, ...] )], ...] [ PROPERTIES(...) ]

e.g. 
ANALYZE;
ANALYZE tbl1;
ANALYZE tbl1(col1, col2) PROPERTIES("cbo_statistics_task_timeout" = "10");

Two configurations have been added:

- max_cbo_statistics_task_timeout_sec: timeout of a single task
- cbo_max_statistics_job_num: the maximum number of running jobs the system can receive

Co-authored-by: weizhengte <1141550741@qq.com>
Co-authored-by: weizhengte <weizhengte@foxmail.com>
Co-authored-by: EmmyMiao87 <522274284@qq.com>
Co-authored-by: frankywei <frankywei@tencent.com>
2022-04-22 18:24:54 +08:00
81ff49f8e3 [revert] "[Fix bug] fix non-equal out join is not supported (#8857)" (#9150)
This PR caused the following FE unit tests to fail:

InferFiltersRuleTest
testOn3Tables1stInner2ndRightJoinEqLiteralAt2nd
testOn3Tables1stInner2ndRightJoinEqLiteralAt3rd
2022-04-21 18:20:19 +08:00
ae680b4248 [UDF] support RPC udaf part 1: support create RPC udaf in fe (#8510) 2022-04-21 17:38:58 +08:00
2c2e06a5fe [Fix bug] fix non-equal out join is not supported (#8857) 2022-04-21 12:44:20 +08:00
7af684ad0f [fix] Fix a compatibility problem caused by using a non-existent database when connecting via mysql client (#9127) 2022-04-21 12:43:39 +08:00
7b3865b524 [fix](ut)(vectorized) fix a potential stack overflow bug and some unit test (#9140) 2022-04-21 12:17:03 +08:00
fa5b5fc6d1 [fix](dynamic_partition) fix dynamic partition scheduler not work for olap table with random hash info (#9108) 2022-04-21 12:16:27 +08:00
498f50a837 [regression-test] update test case dir which divided by basic functions (#9084)
1. Add test case dir.
2. Add some test suites.
2022-04-21 11:55:41 +08:00
Pxl
dda7604e16 [Bug][Storage-vectorized] fix code dump on outer join with not nullable column (#9112) 2022-04-21 11:02:04 +08:00
40362dfaca [fix](partition) Fix wrong partition distribution key info for random hash olap table (#9104) 2022-04-20 17:08:42 +08:00
f253e260c8 [fix] Modify fe jetty configuration parameters (#9075) 2022-04-20 14:51:25 +08:00
39c0fec680 [fix] fix bug when partition_id exceeds integer range in spark load (#9073) 2022-04-20 14:50:55 +08:00
a2edc6fd8b [feature-wip](array-type) replicate impl for ColumnArray to support join with array column (#9070)
A SQL statement with JOIN and ARRAY columns will call the function ColumnArray::replicate. In this PR,
we implement replicate for the ARRAY type, to support SQL like this:
`SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM  lineorder, date WHERE  lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`
2022-04-20 14:50:34 +08:00
df3a8545dc [fix](routine_load) Add retry mechanism for routine load task which encounter Broker transport failure (#9067) 2022-04-20 14:49:58 +08:00
3cd432c83a [community](*) polish config about project (#8987)
- Add `.editorconfig`
- Polish `.gitignore`
2022-04-20 14:48:32 +08:00
bd126f0679 [improvement] Refactor type info for further optimizations. (#8786)
## Design:

For now, there are two categories of types in Doris: one for scalar types (such as int, char, etc.) and the other for composite types (array, etc.). For the sake of performance, we can cache the type info of scalar types globally (as unique objects) because the number of scalar types is limited. For composite types, the type info is normally generated at runtime (we could also use some cache strategy to speed this up). The memory therefore has to be reclaimed when we create type info for composite types.

There are a lot of interfaces to get the type info of a specific type. I reorganized them as described below.
1. `const TypeInfo* get_scalar_type_info(FieldType field_type)`
    The function is used to get the type info of scalar types. Due to the cache, the caller can use the result **WITHOUT** worrying about memory reclamation.
2. `const TypeInfo* get_collection_type_info(FieldType sub_type)`
    The function is used to get the type info of array types with just **ONE** depth. Due to the cache, the caller can use the result **WITHOUT** worrying about memory reclamation.
3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)`
4. `TypeInfoPtr get_type_info(const TabletColumn* col)`
    These functions are used to get the type info of **BOTH** scalar types and composite types. The caller is responsible for managing the returned resources.

#### About the new type `TypeInfoPtr`
`TypeInfoPtr` is an alias for `unique_ptr` with a custom deleter.
1. For scalar types, the deleter does nothing.
2. For composite types, the deleter reclaims the memory.

By analyzing the callers of `get_type_info`, these classes should hold TypeInfoPtr:
1. `Field`
2. `ColumnReader`
3. `DefaultValueColumnIterator`

Other classes are either constructed by the foregoing classes or hold those, so they can just use the raw pointer of `TypeInfo` directly for the sake of performance.
1. `ScalarColumnWriter` - holds `Field`
    1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types.
    2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter`, and `BitmapIndexWriter` doesn't support `ArrayType`.
    3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types.
2. `IndexedColumnReader` initializes `type_info` by the field type in meta (only scalar types).
3. `ColumnVectorBatch`
    1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `IndexedColumnReader`
    2. `BitmapIndexReader` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BitmapIndexReader`
    3. `BloomFilterIndexWriter` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BloomFilterIndexWriter`
2022-04-20 14:47:29 +08:00
1b4cd76847 [feature](vectorized)(function) Support min_by/max_by function. (#8623)
Support min_by/max_by on vectorized engine.
2022-04-20 14:46:19 +08:00
d58e8d76b5 [doc]Release manager docment (#9081)
Add Release manager document
2022-04-20 14:16:12 +08:00
304bd9ab62 Change Get-Starting (#9102)
Modify the download address
2022-04-20 14:15:18 +08:00
8d0f06e49a [docs][typo] Fixed Chinese and English "advance-usage.md" files. (#9099)
Fixed Chinese and English "advance-usage.md" files
2022-04-20 14:14:39 +08:00
3a9008f06a [dosc][typo] Fix "basic-usage.md" files. (#9097)
Fix "basic-usage.md"
2022-04-20 14:14:12 +08:00
1d0629925f Modify the compilation docs whether it supports avx2. (#9095)
Modify the compilation docs whether it supports avx2
2022-04-20 14:13:39 +08:00
48f805fbab Fix routine-load-manual.md (#9090)
Fix routine-load-manual
2022-04-20 14:13:19 +08:00
37bd89d24b [typo](docs) fix some typos in docs (#9085)
fix some typos in docs
2022-04-20 14:12:54 +08:00
869fdff2f0 [refactor] add reference path for source file from impala (#9115)
According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.
2022-04-20 12:29:57 +08:00
2cecb5dc82 [fix](regression-test) disable test for hdfs and fix double type with null (#9080)
1. add a new config, enableHdfs, in regression-conf.groovy;
    the default is false, which skips tests with hdfs
2. fix a bug where an exception is thrown when a double type column result is null
2022-04-18 19:37:37 +08:00
0f86fed547 [improvement](insert) Support verbose keyword in insert query stmt (#9047) 2022-04-18 19:36:40 +08:00
51db4e54c0 [fix](table-function) Fix bug of table function with outer join cause nullptr of tuple (#9041) 2022-04-18 19:35:26 +08:00
f3dce9a6c1 [fix](planner) fix is-null predicate in where statement cannot be pushed down to the storage layer (#9035) 2022-04-18 19:35:02 +08:00