11558 Commits

Author SHA1 Message Date
16c218fde5 [feature](nereids) support bind external relation out of Doris fe environment (#21123)
Support binding external relations outside the Doris FE environment, for example to analyze SQL in another Java application.
See BindRelationTest.bindExternalRelation.
2023-06-29 14:29:29 +08:00
f5668ac1a0 [fix](doc) Fix table typo in star schema benchmark documentation and join optimization (#19181) 2023-06-29 11:50:04 +08:00
Pxl
87e64115ae [Chore](materialized-view) add case about insert data immediately after create mv (#21281)
Add a case about inserting data immediately after creating an mv.
2023-06-29 11:17:38 +08:00
3a12b67517 [Improvement](statistics, multi catalog)Implement hive table statistic connector (#21053)
This PR adds collection of Hive table statistics. When the CBO fetches Hive table statistics, the statistics cache
first loads from the internal stats OLAP table. If nothing is found there, it uses this PR's function to fetch from the remote Hive metastore.
2023-06-29 10:50:54 +08:00
Pxl
45f1909bc3 [Bug](lateral-view) make lateral view function's nullable mode work (#21242)
make lateral view function's nullable mode work
2023-06-29 10:50:07 +08:00
7f0e37069f [improvement](olap) filter the whole segment by dictionary (#21239) 2023-06-29 10:34:29 +08:00
3f99b91ddf [fix](gc_binlog) Fix tablet gc_binlogs nullptr (#21158) 2023-06-29 10:10:33 +08:00
Pxl
f8cfe5e579 [Bug](pipeline) add DCHECK for _instance_to_sending_by_pipeline = false on _send_rpc (#21169)
add DCHECK for _instance_to_sending_by_pipeline = false on _send_rpc
2023-06-29 10:03:57 +08:00
30b1b93353 [dependency](fe)Dependency version upgrade (#21191)
Keep hadoop-aliyun version consistent with hadoop main version (3.3.5)
upgrade jackson to 2.14.3
upgrade netty version to 4.1.94.final
bind checker framework version to 3.32.0
upgrade snappy-java to 1.1.10.1
upgrade hudi version to 0.13.1
upgrade spring version to 2.7.13
upgrade orc version to 1.8.4
revert nonsensical changes
2023-06-29 10:01:33 +08:00
54e2e2f7ee [typo](doc)FlinkCDC access to multi-table or whole database example document mod… (#21295) 2023-06-29 09:42:13 +08:00
64ffb06a79 [fix](Nereids) olap scan should not be gather since coordinator could not process (#21298)
PR #21168 refactored the physical properties and the translator
to avoid generating useless exchanges. As a result, an OLAP scan node
could be gather in Nereids but be translated to hash partitioned.
Since the coordinator cannot process a gather OLAP scan node,
we remove the candidate distribution spec of OLAP scan.
2023-06-29 09:12:08 +08:00
9af714bceb [fix](catalog) disable FileSystem Cache to avoid too many fs cache (#21283)
When creating a new Hive catalog or refreshing a Hive catalog, the HiveMetaStore cache is refreshed,
which calls "FileInputFormat.setInputPaths()".
This method creates a new FileSystem instance and stores it in FileSystem's cache.
So if the catalog is refreshed frequently, too many FileSystem instances accumulate in the cache, causing OOM.

This PR disables the FileSystem cache.
2023-06-29 09:06:00 +08:00
73bce9e750 [typo](doc) add params description and example for accessing hdfs in ha mode by tvf #21277 2023-06-29 09:05:35 +08:00
884c908e25 [Enhancement](multi-catalog) try to reuse existed ugi. (#21274)
Try to reuse an existing UGI in DFSFileSystem. Otherwise, querying an HMS table with more than ten thousand partitions performs more than ten thousand login operations, and each login operation costs hundreds of milliseconds in my test.
Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
2023-06-29 09:04:59 +08:00
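The reuse idea in the commit above can be illustrated with a minimal sketch. This is not the Doris or Hadoop code: `LoginCache` and `fake_login` are hypothetical names standing in for a cache of login handles and for an expensive login call, and the sketch ignores thread safety and credential expiry.

```python
from typing import Callable, Dict


class LoginCache:
    """Keeps one login handle per user so repeated requests skip the login."""

    def __init__(self, login: Callable[[str], object]) -> None:
        self._login = login                    # the expensive operation
        self._handles: Dict[str, object] = {}  # user -> cached handle
        self.login_calls = 0                   # counts real logins only

    def get(self, user: str) -> object:
        handle = self._handles.get(user)
        if handle is None:
            # First request for this user: pay the login cost once.
            self.login_calls += 1
            handle = self._login(user)
            self._handles[user] = handle
        return handle


def fake_login(user: str) -> object:
    # Hypothetical stand-in for a login that takes hundreds of milliseconds.
    return {"user": user}


cache = LoginCache(fake_login)
for _ in range(10_000):  # e.g. one request per partition
    cache.get("doris")
print(cache.login_calls)  # prints 1
```

With the cache in place, the number of logins depends on the number of distinct users rather than on the number of partitions touched.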
86af533e83 [Enhancement](heartbeat) make heartbeat ok when config repeated host-ip pairs (#21228) 2023-06-28 23:12:06 +08:00
449c8d4568 [fix](jdbc) Handling Zero DateTime Values in Non-nullable Columns for JDBC Catalog Reading MySQL (#21296) 2023-06-28 22:51:17 +08:00
e7dd65f551 [fix](test) fix PlannerTest testEliminatingSortNode (#21112)
testEliminatingSortNode needs to check whether a SortNode exists in the plan tree, so it should check plan1.contains("order by:") rather than plan1.contains("SORT INFO:") or plan1.contains("SORT LIMIT:").
2023-06-28 21:29:23 +08:00
274203a59c [typo](storage)Fixed wrong description about Storage_root_path parameter (#20641) 2023-06-28 21:28:50 +08:00
a6b51ec19a [Feature](avro) Support Apache Avro file format (#19990)
Support reading Avro files via hdfs() or s3().
```sql
select * from s3(
         "uri" = "http://127.0.0.1:9312/test2/person.avro",
         "ACCESS_KEY" = "ak",
         "SECRET_KEY" = "sk",
         "FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name   | boolean_type | double_type | long_type       |
+--------+--------------+-------------+-----------------+
| Alyssa |            1 |     10.0012 | 100000000221133 |
| Ben    |            0 |    5555.999 |      4009990000 |
| lisi   |            0 | 5992225.999 |      9099933330 |
+--------+--------------+-------------+-----------------+

select * from hdfs(
                "uri" = "hdfs://127.0.0.1:9000/input/person2.avro",
                "fs.defaultFS" = "hdfs://127.0.0.1:9000",
                "hadoop.username" = "doris",
                "format" = "avro");
+--------+--------------+-------------+-----------+
| name   | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------+
| Alyssa |            1 |  8888.99999 |  89898989 |
+--------+--------------+-------------+-----------+
```

The current Avro reader only supports common data types; complex data types will be supported later.
2023-06-28 21:15:35 +08:00
4e082a803f [typo](docs) improvement lakehouse doc sidebar (#21270) 2023-06-28 20:19:17 +08:00
325504deeb [bugfix](recover) do not need dynamic partition recover except olap table (#21290)
Introduced by #19031.

The FE could no longer recover because the code converts the table to an OLAP table. But many table types are not OLAP tables, such as view, jdbc table ...
The conversion fails and the FE does not start correctly.
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-28 19:56:17 +08:00
016870b673 [opt](nereids) use Expression's isConstant to check whether could be remove from group by key (#21195) 2023-06-28 19:12:36 +08:00
f77c69ab95 [fix](test) case bug, streamload without sync. (#21161) 2023-06-28 18:22:19 +08:00
283fd2903f [typo](doc)json document optimization (#20753) 2023-06-28 18:01:41 +08:00
76620c21aa [improvement](nereids) prune hash join output slot ids list (#20789)
1. Prune the hash join output slot ids list based on the slot ids in the required project and other conjuncts, to reduce work on the BE side.
2. Also support pruning for semi/anti joins.
2023-06-28 17:28:18 +08:00
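A minimal sketch of the pruning described in the commit above, under stated assumptions: slot ids are plain integers, and `prune_output_slots` is a hypothetical helper rather than the Nereids implementation. It keeps only the join output slots that the parent projection requires or that the other conjuncts reference.

```python
from typing import Iterable, List


def prune_output_slots(
    join_output: List[int],
    required_by_project: Iterable[int],
    used_by_other_conjuncts: Iterable[int],
) -> List[int]:
    """Return the join output slot ids that are still needed, in original order."""
    keep = set(required_by_project) | set(used_by_other_conjuncts)
    return [slot for slot in join_output if slot in keep]


# The join produces slots 1..6; the projection needs 2 and 5,
# and a non-equal join condition references slot 3.
pruned = prune_output_slots([1, 2, 3, 4, 5, 6], [2, 5], [3])
print(pruned)  # prints [2, 3, 5]
```

Slots 1, 4 and 6 are never read downstream, so the backend does not need to materialize them in the join output.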
d2c42ec638 [fix](memory) Purge Jemalloc arena dirty pages when memory insufficient (#21237)
Jemalloc only applies madvise MADV_FREE to dirty pages, so the memory is not released back to the system and RSS does not drop in time.

So when the process memory exceeds its limit or the system's available memory is insufficient,
manually transfer dirty pages to muzzy pages, which calls MADV_DONTNEED to release the physical memory back to the system.

https://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms
2023-06-28 16:49:45 +08:00
0396f78590 [fix](memory) Remove ChunkAllocator & fix Allocator no use mmap (#21259) 2023-06-28 16:10:24 +08:00
3304af848e [Fix](storage)read page cache when seek #21272
Currently, when a columnIter is used for seek, the page cache is not set.
When this columnIter is later used to read data, the page cache cannot be used.
2023-06-28 15:53:40 +08:00
7588abe76b [refactor](Nereids) refactor physical properties and plan translator (#21168)
This PR:
1. refactors physical properties, the property deriver and the property regulator
to ensure Nereids can generate plans with sufficient PhysicalDistribute.
2. refactors PhysicalPlanTranslator to ensure all ExchangeNodes are generated
by PhysicalDistribute, except for CTEConsumer. We will refactor all CTE-related
nodes later.

The detailed changes in this PR:
1. update DistributionSpec of physical properties:
- Any: random distribution, used in output and require
- StorageAny: random distribution but constrained by where the data is stored, used in output
- ExecutionAny: random distribution representing a random shuffle, used in output
- Gather: gather distribution, used in output and require
- StorageGather: gather distribution but constrained by where the data is stored, used in output
- Replicated: broadcast distribution
- Hash: bucket distribution

2. update shuffle type of DistributionSpecHash
- REQUIRE: used in require
- NATURAL: distribution as storage engine hash algorithm, constrained by where the data is stored
- STORAGE_BUCKETED: distribution as storage engine hash algorithm
- EXECUTION_BUCKETED: distribution as execution engine hash algorithm

3. update HideOneRowRelationUnderSetOperation to MergeOneRowRelationIntoSetOperation

4. update property deriver of SetOperation to ensure a suitable PhysicalDistribute is added
above and below SetOperation

5. refactor PhysicalPlanTranslator to ensure no unplanned exchange node will be added
2023-06-28 15:15:11 +08:00
e348b9464e [scan](freeblocks) use ConcurrentQueue to replace vector for free blocks (#21241) 2023-06-28 15:10:07 +08:00
a4fdf7324a [Bug](javaudf) fix BE crash if javaudf is push down (#21139) 2023-06-28 15:01:24 +08:00
Pxl
1fc1e76fc7 [Bug](alter table) return error status to avoid core dump on schema change meet invalid input (#21273)
return error status to avoid core dump on schema change meet invalid input
2023-06-28 14:20:16 +08:00
21b30820fd [fix](partial-update) fix a coredump in commit_phase_update_delete_bitmap (#21254) 2023-06-28 11:47:07 +08:00
de9172e476 [enhancement](merge-on-write) replace map with vector for segment handle caches (#21162) 2023-06-28 11:33:02 +08:00
5d1fb33f2d [enhancement](merge-on-write) increasing the max_write_buffer_number parameter to improve save meta performance (#21243) 2023-06-28 11:32:11 +08:00
824c1fe165 [typo](docs)delete the native udf doc (#21146) 2023-06-28 11:29:49 +08:00
1d406d486c [typo](docs) modify invalid URLs in release-1.2.0 (#21175) 2023-06-28 11:29:33 +08:00
08fe22cb0c [improvement](backup) Add BackupJobInfo with tableCommitSeqMap (#21255)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-06-28 11:10:12 +08:00
e9bbac71dc [typo](docs) poor phrasing (#21224) 2023-06-28 11:05:09 +08:00
a6ff87f32c [docker](trino) add Trino docker compose and hive catalog (#21086) 2023-06-28 11:04:41 +08:00
33ace22471 [typo](docs) improvement SQL manual doc sidebar (#21267) 2023-06-28 11:03:53 +08:00
18878df1c0 [typo](doc)outfile export document optimization (#21211) 2023-06-28 10:30:30 +08:00
ac62ca0320 [typo](doc) add model limitation description for inverted index (#21245) 2023-06-28 10:13:42 +08:00
853fa5f688 [typo](nativeInsertStmt) fix object-stored column exception description (#21221) 2023-06-28 10:12:55 +08:00
b1e973b721 [Improve](func)support array to window-func first-last-value arg type (#21201)
* support array to window-func first-last-value arg type

* add regress test for first-last-value of array type

* update

* format be:
2023-06-28 10:02:00 +08:00
db50face41 [fix](time_zone) be compatible with doris old version for CST time_zone when load orc file in broker load (#21263)
Fix the error for broker load with an ORC file when time_zone is CST, whose message is "Failed to create orc row reader. reason = Can't open /usr/share/zoneinfo/CST".
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-06-28 09:44:42 +08:00
98b2bc87b5 [typo](MultiPartitionDesc) fix Multi partition time interval exception description (#21222) 2023-06-28 00:42:25 +08:00
d871df64ca [improvement](oracle jdbc)Support for automatically obtaining the precision of the oracle timestamp type (#21252) 2023-06-28 00:19:01 +08:00
db7eaad3cf [Fix](CI)After Approve, even comments should be considered as mergeable (#21264) 2023-06-28 00:18:25 +08:00
92882ebd91 [fix](inverted index) update output rowset index meta with input rowset when drop inverted index (#21248) 2023-06-27 23:54:35 +08:00