Commit Graph

110 Commits

da5c78019c [opt](fe-ui) support read hardware info from aarch64 MacOS (#23708)
Update the versions of oshi and jna to support reading hardware info from aarch64 macOS.
2023-08-31 18:16:33 +08:00
5b641ebd40 [feature-wip](catalog) support deltalake catalog step1-metadata (#22493) 2023-08-29 10:31:37 +08:00
e17779f193 [Dependency](fe)Upgrade dependency version (#22496)
Upgrade guava to 32.1.2-jre
Set ck dependency scope to provided
Upgrade okio to 3.4.0
Upgrade snake yaml to 1.33
Upgrade aws-java-sdk to 1.12.519
Upgrade hadoop to 3.3.6
2023-08-11 10:54:37 +08:00
582acad8a1 [feature](stats) Enable period time with cron expr (#22095)
Supports the following grammar:

ANALYZE TABLE test WITH CRON "* * * * * ?"

Such a job is scheduled as the cron expression specifies, but only minute-level scheduling is natively supported.
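
As a sketch of how such an expression could be evaluated, assuming Quartz-style cron parsing (which the trailing `?` field suggests; the actual scheduler used by the stats job is not shown here):

```java
import java.util.Date;
import org.quartz.CronExpression;

public class CronDemo {
    public static void main(String[] args) throws Exception {
        // "0 0 * * * ?" fires at the top of every hour; any Quartz-style expression
        // is accepted by the grammar above, but scheduling is effectively minute-level.
        CronExpression cron = new CronExpression("0 0 * * * ?");
        Date next = cron.getNextValidTimeAfter(new Date());
        System.out.println("next analyze run: " + next);
    }
}
```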
2023-07-26 17:25:57 +08:00
964ac4e601 [opt](nereids) Retry when async analyze task failed (#21889)
Retry at most 5 times when an async analyze task execution fails.
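
A minimal sketch of the retry idea; the 5-attempt bound comes from this message, while the `Callable`-based task shape is only illustrative:

```java
import java.util.concurrent.Callable;

public class RetryingExecutor {
    private static final int MAX_RETRY = 5;

    // Re-run the analyze task until it succeeds or the retry budget is used up.
    public static <T> T runWithRetry(Callable<T> analyzeTask) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= MAX_RETRY; attempt++) {
            try {
                return analyzeTask.call();
            } catch (Exception e) {
                last = e;
                System.err.println("analyze task failed, attempt " + attempt + ": " + e.getMessage());
            }
        }
        throw last;
    }
}
```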
2023-07-26 17:16:56 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1. Support filesystem metastore
2. Support predicate and projection pushdown when generating splits
3. Fix a partition table query error

TODO: For now, you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when using an S3 filesystem.

doc pr: #21966
2023-07-24 22:02:57 +08:00
30b1b93353 [dependency](fe)Dependency version upgrade (#21191)
Keep hadoop-aliyun version consistent with hadoop main version (3.3.5)
upgrade jackson to 2.14.3
upgrade netty version to 4.1.94.final
bind checker-framework version to 3.32.0
upgrade snappy-java to 1.1.10.1
upgrade hudi version to 0.13.1
upgrade spring version to 2.7.13
upgrade orc version to 1.8.4
revert nonsensical changes
2023-06-29 10:01:33 +08:00
d4240ac21b [fix](multi-catalog)add oss sdk, supported oss properties (#21029) 2023-06-26 13:00:44 +08:00
46f0295b78 [feature](load-refactor-with-tvf) S3 load with S3 tvf and native insert (#19937) 2023-06-25 17:45:31 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00
9a83d78dfe [Enhancement](hudi) support hudi mor table, step2 follow #19909 (#20570)
PR(https://github.com/apache/doris/pull/19909) has implemented the framework of hudi reader for MOR table. This PR completes all functions of reading MOR table and enables end-to-end queries.
Key Implementations:
1. Use Hudi meta information to generate the table schema instead of getting it from the Hive client.
2. Use the Hive client to list Hudi partitions, so this strongly depends on the sync tools (https://hudi.apache.org/docs/syncing_metastore/) that sync the partitions of Hudi into the Hive metastore. However, we may later get the Hudi partitions directly from the .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like Glue are compatible with the Hive catalog.
4. COW tables are still read natively in C++ as before.
5. The Hudi RecordReader uses ProcessBuilder to start a HotSpot debugger process, which may get stuck when attaching to the original JNI process, so I use a tricky method to kill this useless process.
2023-06-10 12:25:53 +08:00
fe63a0a3bb [Feature](multi-catalog)support paimon catalog (#19681)
CREATE CATALOG paimon_n2 PROPERTIES (
"dfs.ha.namenodes.HDFS1006531" = "nn2,nn1",
"dfs.namenode.rpc-address.HDFS1006531.nn2" = "172.16.65.xx:4007",
"dfs.namenode.rpc-address.HDFS1006531.nn1" = "172.16.65.xx:4007",
"hive.metastore.uris" = "thrift://172.16.65.xx:7004",
"type" = "paimon",
"dfs.nameservices" = "HDFS1006531",
"hadoop.username" = "hadoop",
"paimon.catalog.type" = "hms",
"warehouse" = "hdfs://HDFS1006531/data/paimon1",
"dfs.client.failover.proxy.provider.HDFS1006531" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
2023-06-06 15:08:30 +08:00
b7fc17da68 [feature-wip](multi-catalog)(step2)support read max compute data by JNI (#19819)
Issue Number: #19679
2023-06-05 22:10:08 +08:00
499f443779 [feature](iceberg) Support read iceberg data on gcs (#19815) 2023-05-20 12:40:03 +08:00
f68d3a660e [improvement](opentelemetry) upgrade opentelemetry jar to v1.26.0 and opentelemetry-cpp to v1.8.3 (#19733)
Why upgrade? Is anything wrong?

This tries to fix a problem with opentelemetry::v1::ext::http::client::curl::HttpOperation::Send(); I have updated the PR description.
2023-05-18 18:46:20 +08:00
3f2d1ae9a4 [feature-wip](multi-catalog)(step1)support connect to max compute (#19606)
Issue Number: #19679

Support connecting to MaxCompute metadata via the ODPS SDK.
2023-05-16 11:30:27 +08:00
ccd22c508a [chore](fe) Fix the build on Centos 6 (#19255) 2023-05-06 14:50:56 +08:00
c9fa10ac10 [fix](doc) avoid generate config doc automatically (#19302)
After #19246, compiling the FE automatically generates the Config and Session Variables docs and overwrites the original ones.
This needs to be avoided because the feature is not ready for use yet.
2023-05-05 20:39:05 +08:00
70236adc1f [Refactor](doc)(config)(variable) use script to generate doc for FE config and session variables (#19246)
The documentation of configs (FE and BE) and session variables is hard to maintain,
because developers need to modify both the code and the document,
and some of the config documentation is already missing.

So I plan to write the documentation of configs and variables directly in the code,
and use a script to generate the documents automatically.

How To
This CL mainly changes:

Add fields to the Config and Session Variables annotation:

description: the description of the config or variable item. It is a String array; the first element is in Chinese, the second in English.
options: the valid options, if the config or variable is an enum. (A sketch of such an annotated field is shown after the TODO list below.)
Add a script docs/generate-config-and-variable-doc.sh.

Simply run sh docs/generate-config-and-variable-doc.sh and it will generate the docs for FE configs and variables,
saving them under docs/admin-manual/config/fe-config.md and docs/advanced/variables.md,
in both Chinese and English.

There are template markdowns for this script to read and replace with the real doc content.

TODO
Too many descriptions need to be filled in; I will finish them in the next PR. For now the original docs remain unchanged.
Find a way to check the description field of configs and variables to make sure we don't miss any.
Generate docs for the BE config.
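
A minimal sketch of what an annotated config item could look like under this scheme; the annotation name `ConfField`, its fields, and the example item are assumptions for illustration, not the exact Doris definitions:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical shape of the annotation after this change.
@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
@interface ConfField {
    // first element in Chinese, second in English
    String[] description() default {};
    // valid options when the config item is an enum
    String[] options() default {};
}

public class Config {
    @ConfField(description = {"FE 元数据的存储目录", "The directory used to save FE metadata"})
    public static String meta_dir = System.getenv("DORIS_HOME") + "/doris-meta";
}
```

The generation script can then read these description arrays (for example via reflection) and render them into the Chinese and English markdown files.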
2023-05-05 14:42:43 +08:00
5459cd9c30 [Improve](fe)Upgrade dependencies and optimize jar package management (#18882)
bind netty-version to 4.1.89-final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade ranger-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove meaningless plugins
Optimize doris maven path
Unify the Java modules for management in fe
2023-05-04 10:07:37 +08:00
57982ddc46 [Fix](catalog)Fix hudi-catalog get file split error (#18644) (#18673)
`hudi-common` depends on `parquet-avro`, but the dependency scope is `provided`.
When the `hudi-catalog` is used, `HoodieAvroWriteSupport` will be called. This class depends on `parquet-avro`, so a ClassNotFoundException will be thrown.
2023-04-16 21:56:14 +08:00
b39846c2c7 [Fix](Catalog)Delete duplicate defined dependencies to avoid class loading exceptions (#18628)
`iceberg-hive-metastore` and `hive-storage-api` have been defined in hive-catalog-shade,
and some classes in the shade have been renamed, so we cannot declare them again.
The classes in the shade should be kept.

The `hive-metastore-api` used by `ranger` can also use the jar from the `shade`.
Since we only renamed the utility classes used inside `hive`, this has no effect on it.
2023-04-13 22:12:19 +08:00
75fd4b70fa [improve](fe)Optimize fe binary package packaging (#18554) 2023-04-12 12:58:45 +08:00
5f981b0b1f [fix](catalog)Use hive-catalog-shade to solve thrift version compatibility issues (#18504)
`Hive 3` uses the `thrift-0.9.3` package, and `Doris` uses the `thrift-0.16.0` package.
These two packages are not compatible, so we use the `hive-shade` package to manage hive dependencies
in a unified way. This jar renames the `thrift` classes, so the conflict can be resolved.
2023-04-11 13:19:39 +08:00
d0219180a9 [feature-wip](multi-catalog)add properties converter (#18005)
Refactor the properties of each cloud and use a property converter to convert the properties used for FE
metadata access and BE data access.
user docs #18287
2023-04-06 09:55:30 +08:00
c2dd005efb [fix](chore) fix BE compile and FE protoc artifact issue (#18120)
Add the <optional> header to solve the compilation issue.
Use 3.12.9 as the protoc.artifact version, because there is no 3.12.21.
See: https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/
Remove the --show-progress argument of wget because it is not supported by older wget versions.
2023-03-27 08:53:42 +08:00
93cfd5cd2b [Enhance](ComputeNode)support k8s watch (#17442)

1. Add a watch mechanism to listen for changes in the k8s StatefulSet and update nodes in time.
2. For the broker, there is only one name by default when using deployManager.
3. Refactor the code to make it easier to understand and maintain.
4. Fix jar package conflicts between okhttp-ws and okhttp.

Previously, the logic of k8sDeployManager.getGroupHostInfos was to call the endpoints() interface of k8s.
If a pod was unexpectedly restarted, k8sDeployManager would delete the pre-restart pod from the FE or BE
list and add the post-restart pod to that list, which obviously does not meet our expectations.
Now, after FQDN is enabled, we call the statefulSets() interface of k8s and listen to the number of replicas
to determine whether nodes need to be brought online or offline.
In addition, the watch mechanism avoids the possible A-B-A problem caused by timed polling.
For the sake of stability, when the watch mechanism does not receive messages for a period of time,
it is degraded to polling mode.

Several environment variables have been added: ENV_FE_STATEFULSET, ENV_FE_OBSERVER_STATEFULSET, ENV_BE_STATEFULSET, ENV_BROKER_STATEFULSET, and ENV_CN_STATEFULSET for the statefulset names, corresponding one-to-one with ENV_FE_SERVICE, ENV_FE_OBSERVER_SERVICE, ENV_BE_SERVICE, ENV_BROKER_SERVICE, and ENV_CN_SERVICE. If a serviceName is configured, the corresponding statefulsetName must also be configured, otherwise the program cannot start.
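
A hedged sketch of listening for StatefulSet replica changes; it assumes the fabric8 kubernetes-client (which may not be the client actually used here), and the namespace and statefulset names are placeholders:

```java
import io.fabric8.kubernetes.api.model.apps.StatefulSet;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

public class StatefulSetWatchDemo {
    public static void main(String[] args) throws InterruptedException {
        String stsName = System.getenv("ENV_BE_STATEFULSET");  // e.g. "doris-be"
        KubernetesClient client = new KubernetesClientBuilder().build();
        client.apps().statefulSets().inNamespace("doris").withName(stsName)
                .watch(new Watcher<StatefulSet>() {
                    @Override
                    public void eventReceived(Action action, StatefulSet sts) {
                        // Compare the desired replica count with the current BE list
                        // and decide whether nodes should be brought online or offline.
                        Integer replicas = sts.getSpec().getReplicas();
                        System.out.println(action + " -> replicas=" + replicas);
                    }

                    @Override
                    public void onClose(WatcherException cause) {
                        // On watch failure, fall back to periodic polling as described above.
                        System.err.println("watch closed, fall back to polling: " + cause);
                    }
                });
        Thread.currentThread().join();  // keep the demo process alive
    }
}
```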
2023-03-20 11:36:32 +08:00
295b26db00 [chore](fe) update aspectj-maven-plugin to 1.14.0 version (#17890)
In #17797, we introduced AspectJ to help log exceptions easily.
However, plugin version 1.11 does not support JDK 9 and later.
To support compiling the FE with JDK 11:

- update aspectj-maven-plugin to version 1.14.0
- add a new dependency, org.aspectj aspectjrt 1.9.7, to fe-core

References: AspectJ Java version compatibility, the aspectj-maven-plugin issue, the AspectJ release notes, and an intro to AspectJ.
2023-03-19 14:50:09 +08:00
0ec10d4836 [Enhancement](fe exception) write a java annotation to catch throwable from a method and print log (#17797)
How does it work?
AspectJ is used to implement the aspect behavior of the annotations. During compilation, the aspectj-maven-plugin automatically weaves the code carrying aspect annotations into the generated class files.
When to use it?
When a method would otherwise need a try/catch just to record exception information, the LogException annotation can be used. When a method must not tolerate errors, the NoException annotation can be used.
What is the result of adding these annotations?
The LogException annotation automatically captures exceptions into the log file, keeping the code more concise. The NoException annotation automatically captures the exception into the log file and exits the program when an exception occurs.
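
A usage sketch of the idea; the annotation body below is a simplified stand-in (the real definitions and the weaving configuration live in the FE modules), and the method shown is hypothetical:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Simplified stand-in for the marker annotation; the aspect is woven at compile time
// by the aspectj-maven-plugin.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.CLASS)
@interface LogException {}

public class ReplayHelper {
    // The woven aspect is expected to wrap this method in a try/catch and write any
    // throwable to the log, so the method body itself stays concise.
    @LogException
    public void replayJournal(long journalId) {
        // ... replay logic that may throw ...
    }
}
```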
2023-03-17 08:52:27 +08:00
310bdb60f4 [chore](maven) Prefer protoc in thirdparty to the one in maven artifacts (#17596)
The prebuilt protoc-gen-grpc-java binary uses glibc on Linux and the version of glibc which Centos 6 uses is too old.
2023-03-09 16:21:38 +08:00
b6128f9b65 [dependency](fe) Replace jackson-mapper-asl with fasterxml-jackson (#17303) 2023-03-09 09:35:58 +08:00
d908d5fe01 [dependency](fe)Dependency Upgrade (#17377)
* Upgrade log4j to 2.X
  - binding log4j version to 2.18.0
  - use log4j-1.2-api to complete a smooth upgrade
* Upgrade commons-fileupload to 1.5
* Upgrade commons-io to 2.7
* Upgrade commons-compress to 1.22
* Upgrade gson to 2.8.9
* Upgrade guava to 30.0-jre
* Binding jackson version to 2.14.2
* Upgrade netty-all to 4.1.89.final
* Upgrade protobuf to 3.21.12
* Upgrade kafka-clients to 3.4.0
* Upgrade calcite version to 1.33.0
* Upgrade aws-java-sdk to 1.12.302
* Upgrade hadoop to 3.3.4
* Upgrade zookeeper to 3.4.14
* Binding tomcat-embed-core to 8.5.86
* Upgrade apache parent pom to 25
* Use hive-exec-core as a hive dependency, add the missing jar-hive-serde separately
* Basic public dependencies are extracted to parent dependencies
* Use jackson uniformly as the basic json tool
* Remove springloaded, spring-boot-devtools has the same functionality
* Modify the spark-related dependency scope to provided, since they should be provided at runtime
2023-03-08 14:28:40 +08:00
48c2d806d7 [enhancement](jdbc catalog) Use Druid instead of HikariCP in JdbcClient (#17395)
This PR does three things:
1. Use Druid instead of HikariCP in JdbcClient.
2. When downloading a UDF jar, append the jar package name to the local file name.
3. Refactor some JdbcResource code.
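
A minimal sketch of building a Druid pool, roughly what a JdbcClient could do after this change; the MySQL URL, driver, and pool sizes are illustrative:

```java
import com.alibaba.druid.pool.DruidDataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class DruidPoolExample {
    public static void main(String[] args) throws Exception {
        DruidDataSource ds = new DruidDataSource();
        ds.setDriverClassName("com.mysql.cj.jdbc.Driver");
        ds.setUrl("jdbc:mysql://127.0.0.1:3306/demo");
        ds.setUsername("user");
        ds.setPassword("pass");
        ds.setInitialSize(1);
        ds.setMinIdle(1);
        ds.setMaxActive(10);

        // Connections are borrowed from and returned to the Druid pool.
        try (Connection conn = ds.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
        ds.close();
    }
}
```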
2023-03-07 08:51:10 +08:00
449f2953c9 [Improvement](auth)(step-1) add ranger authorizer for hms catalog (#17153) 2023-03-03 09:45:08 +08:00
51bbae27b8 [feature-wip](iceberg) add dlf and glue catalog impl for iceberg catalog (#16602)
The Iceberg catalog supports DLF on Alibaba Cloud and the AWS Glue Catalog.
2023-02-23 14:02:41 +08:00
d56043ab5a [feature-wip](MTMV) Support setting variables in query statement (#16060)
## Use case

```shell
mysql> CREATE TABLE t_user (
    ->   event_day DATE,
    ->   id bigint,
    ->   username varchar(20)
    -> )
    -> DISTRIBUTED BY HASH(id) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.07 sec)

mysql> CREATE TABLE t_user_pv(
    ->   event_day DATE,
    ->   id bigint,
    ->   pv bigint
    -> )
    -> DISTRIBUTED BY HASH(id) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.09 sec)

mysql> CREATE MATERIALIZED VIEW mv
    -> BUILD IMMEDIATE REFRESH COMPLETE
    -> KEY (username)
    -> DISTRIBUTED BY HASH(username) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1')
    -> AS SELECT /*+ SET_VAR(exec_mem_limit=1048576, query_timeout=3600) */ t1.username ,t2.pv FROM t_user t1 LEFT JOIN t_user_pv t2 on t1.id = t2.id;
Query OK, 0 rows affected (0.10 sec)
```
2023-01-30 01:05:41 +08:00
da28d2faee [deps](http)Upgrade springboot version to 2.7.8 (#16158)
* Upgrade springboot version to 2.7.8

* fix
2023-01-28 20:13:50 +08:00
726427b795 [refactor](fe) refactor and upgrade dependency tree of FE and support AWS glue catalog (#16046)
1. Spark dpp
 
	Move `DppResult` and `EtlJobConfig` to the sparkdpp package in the `fe-common` module,
	so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar`
	will not be moved into `fe/lib`, which reduces the size of the FE output.
	
2. Modify start_fe.sh

	Modify the CLASSPATH to make sure that doris-fe.jar is at the front, so that
	when loading classes with the same qualified name, they are loaded from doris-fe.jar first.
	
3. Upgrade hadoop and hive version

	hadoop: 2.10.2 -> 3.3.3
	hive: 2.3.7 -> 3.1.3
	
4. Override the IHiveMetastoreClient implementations from dependency

	`ProxyMetaStoreClient.java` for Aliyun DLF.
	`HiveMetaStoreClient.java` for origin Apache Hive metastore.

	Because I need to modify some of their methods to make them compatible with
	different versions of Hive.
	
5. Exclude some unused dependencies to reduce the size of FE output

	Now it is only 370 MB (it was 600 MB before)
	
6. Upgrade aws-java-sdk version to 1.12.31

7. Support AWS Glue Data Catalog

8. Remove HudiScanNode (no longer supported)
2023-01-20 14:42:16 +08:00
d48abd91df [deps](fe)upgrade deps version (#15262)
upgrade hadoop version to 2.10.2
jackson-databind to 2.14.1
2022-12-24 22:18:10 +08:00
e8bac706d3 [deps](FE)Upgrade the velocity version that hive-exec depends on to 2.3 (#15067) 2022-12-19 14:20:11 +08:00
ef1bb9819a [feature-wip](MTMV) Support mapping the partition rule of base table to the materialized view (#14930)
When we create a materialized view for multiple tables, users may not figure out the partition rule for the materialized view, because the query result can be too complex. If the query result doesn't match one of the partition rules, the insertion will fail.

We can resolve this issue by mapping the partition rule of the base table to the materialized view. As a result, users don't need to specify the partition rules, and the query results are all valid because they are retrieved from the partitions of the base table.

## Use case

mysql> CREATE TABLE t1 (pk INT NOT NULL, v1 INT SUM) PARTITION BY RANGE(pk) (
    ->   PARTITION p1 VALUES LESS THAN ('10'),
    ->   PARTITION p2 VALUES LESS THAN ('90')
    -> )
    -> DISTRIBUTED BY HASH(pk)
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.04 sec)

mysql> CREATE TABLE t2 (pk INT NOT NULL, v2 INT SUM) PARTITION BY LIST(pk) (
    ->   PARTITION odd VALUES IN ('10', '30', '50', '70', '90'),
    ->   PARTITION even VALUES IN ('20', '40', '60', '80')
    -> )
    -> DISTRIBUTED BY HASH(pk)
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.02 sec)

mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE
    -> KEY (mpk) PARTITION BY (t1.pk) DISTRIBUTED BY HASH(mpk) PROPERTIES ('replication_num' = '1')
    -> AS SELECT t1.pk AS mpk, v1, v2 FROM t1, t2 WHERE t1.pk = t2.pk;
Query OK, 0 rows affected (0.10 sec)

mysql> SHOW CREATE TABLE mv;
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Materialized View | Create Materialized View                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mv                | CREATE MATERIALIZED VIEW `mv`
BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND
KEY(`mpk`)
PARTITION BY RANGE(`mpk`)
(PARTITION p1 VALUES [("-2147483648"), ("10")),
PARTITION p2 VALUES [("10"), ("90")))
DISTRIBUTED BY HASH(`mpk`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
)
AS SELECT `t1`.`pk` AS `mpk`, `v1` AS `v1`, `v2` AS `v2` FROM `default_cluster:dev`.`t1` , `default_cluster:dev`.`t2` WHERE `t1`.`pk` = `t2`.`pk`; |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
2022-12-09 22:47:21 +08:00
ec2539e2a3 [chore](macOS) Resolve the issue with missing python program (#14864) 2022-12-07 15:30:12 +08:00
ed96442b85 [fix](multi-catalog) fix persist issue about jdbc catalog and class loader issue #14794
Fix a bug where the JDBC catalog/database/table should be added to GsonUtil.

Fix a class loader issue that sometimes causes a ClassNotFoundException.

Fix regression tests to use different catalog names.

Comment out 2 regression tests:

regression-test/suites/query_p0/system/test_query_sys.groovy
regression-test/suites/statistics/alter_col_stats.groovy
These need to be fixed later.
2022-12-05 09:05:13 +08:00
ce95da8dfb [improvement](multi-catalog) support specify hadoop username (#14734)
Support setting "hadoop.username" property when creating hms catalog.
2022-12-04 21:09:39 +08:00
fb5a3e118a [feature-wip](dlf) prepare to support aliyun dlf (#13969)
[What is DLF](https://www.alibabacloud.com/product/datalake-formation)

This PR is a preparation for supporting DLF, with some changes to multi-catalog:

1. Add RuntimeException for most Hive metastore and ES client access operations.
2. Add DLF-related dependencies.
3. Move the checks of ES catalog properties to the analysis phase of ES catalog creation.

TODO (in next PR):

1. Refactor the `getSplit` method to support not only HDFS but also S3-compatible object storage.
2. Finish the implementation of DLF support.
2022-11-06 10:01:57 +08:00
477b28efac [deps](fe)upgrade commons-text to 1.10.0 (#13562) 2022-10-23 23:30:02 +08:00
b042ef9765 [chore](macOS) Fix the issues with protoc and protoc-gen-grpc-java on M1 (#13571)
Some errors occur when building the FE with an arm64 JDK on M1 because the dependencies protoc and grpc-java don't support M1.
#13563 modified build.sh to fix these issues by adding -Dos.arch=x86_64 to the build command.
However, if someone executes `mvn clean package -DskipTests=true` under the fe folder, the errors occur again.

This PR introduces a better way to fix them.
2022-10-23 14:10:46 +08:00
50ae9e6b19 [enhancement](planner) support select table sample (#10170)
### Motivation
TABLESAMPLE allows you to limit the number of rows from a table in the FROM clause.

It is used for data inspection, quick verification of SQL correctness, and table statistics collection.

### Grammar
```
[TABLET tids] TABLESAMPLE n [ROWS | PERCENT] [REPEATABLE seek]
```

This limits the number of rows read from the table in the FROM clause by pseudo-randomly selecting
a number of tablets from the table according to the specified row count or percentage.
Specifying a seed in REPEATABLE returns the same selected samples again.
In addition, tablet IDs can also be specified manually.
Note that this can only be used for OLAP tables.

### Example
Q1:
```
SELECT * FROM t1 TABLET(10001,10002) limit 1000;
```
explain:
```
partitions=1/1, tablets=2/12, tabletList=10001,10002
```
Selects the specified tablet IDs of t1.

Q2:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 1 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10001,10002,10003
```

Q3:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 2 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10002,10003,10004
```

Pseudo-randomly sample 1000 rows in t1.
Note that several Tablets are actually selected according to the statistics of the table, 
and the total number of selected Tablet rows may be greater than 1000, 
so if you want to explicitly return 1000 rows, you need to add Limit.

### Design
First, determine how many rows to sample from each partition according to the number of partitions.
Then determine the number of tablets to be selected from each partition according to the average number of rows per tablet.
If seek is not specified, the required number of tablets is pseudo-randomly selected from each partition.
If seek is specified, tablets are selected sequentially starting from the seek-th tablet of the partition.
Finally, any manually specified tablet IDs are added to the selected tablets.
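
A simplified sketch of the per-partition tablet selection described above; it is not the actual planner code, and the row-count bookkeeping is illustrative only:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TabletSampler {
    /**
     * Picks enough tablets from one partition to cover sampleRows.
     * A negative seek means REPEATABLE was not specified.
     */
    public static List<Long> pickTablets(List<Long> tabletIds, long avgRowsPerTablet,
                                         long sampleRows, long seek) {
        long safeAvg = Math.max(avgRowsPerTablet, 1);
        int needed = (int) Math.min(tabletIds.size(), (sampleRows + safeAvg - 1) / safeAvg);
        List<Long> picked = new ArrayList<>();
        if (seek < 0) {
            // Pseudo-random selection when REPEATABLE is absent.
            List<Long> shuffled = new ArrayList<>(tabletIds);
            Collections.shuffle(shuffled, new Random());
            picked.addAll(shuffled.subList(0, needed));
        } else {
            // With REPEATABLE, select sequentially starting from the seek-th tablet,
            // so the same seed always yields the same tablets.
            for (int i = 0; i < needed; i++) {
                picked.add(tabletIds.get((int) ((seek + i) % tabletIds.size())));
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Long> tablets = List.of(10001L, 10002L, 10003L, 10004L, 10005L);
        System.out.println(pickTablets(tablets, 500_000L, 1_000_000L, 1));
    }
}
```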
2022-10-14 15:05:23 +08:00
0a95ebf602 [feature](Nereids) Add scalar function code generator and some function trait (#12671)
This PR did these things:
1. Change the nullable mode of 'from_unixtime' and 'parse_url' from DEPEND_ON_ARGUMENT to ALWAYS_NULLABLE; their nullable configuration was missing previously.
2. Add some new interfaces for the original NullableMode. This change is inspired by the grammar of Scala's mix-in traits. It helps us quickly understand the traits of a function without reading the lengthy procedural code and saves the work of writing template code, e.g. `class Substring extends ScalarFunction implements ImplicitCastInputTypes, PropagateNullable`. These are the interfaces (a sketch is shown at the end of this message):
   - PropagateNullable: equals NullableMode.DEPEND_ON_ARGUMENT
   - AlwaysNullable: equals NullableMode.ALWAYS_NULLABLE
   - AlwaysNotNullable: equals NullableMode.ALWAYS_NOT_NULLABLE
   - ComputeNullable: equals NullableMode.CUSTOM (all other cases)
3. Add `GenerateScalarFunction` to generate Nereids-style function code from legacy functions, but it does not actually generate any new function classes yet, because the function traits are not ready for use. I need to add some traits for the legacy functions' CompareMode and NonDeterministic; this idea is the same as ComputeNullable.
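
An illustrative sketch of the mix-in style described above; the interface names follow this message, but the bodies are simplified stand-ins, not the actual Nereids sources:

```java
// Marker traits that map one-to-one onto the legacy NullableMode values.
interface PropagateNullable {}   // NullableMode.DEPEND_ON_ARGUMENT
interface AlwaysNullable {}      // NullableMode.ALWAYS_NULLABLE
interface AlwaysNotNullable {}   // NullableMode.ALWAYS_NOT_NULLABLE

abstract class ScalarFunction {}

// Reading the implements clause alone tells you the nullable behavior of the
// function, without digging through procedural nullable-mode code.
class Substring extends ScalarFunction implements PropagateNullable {}
class ParseUrl extends ScalarFunction implements AlwaysNullable {}

public class NullableTraitDemo {
    public static void main(String[] args) {
        System.out.println("Substring propagates nullable: "
                + PropagateNullable.class.isAssignableFrom(Substring.class));
        System.out.println("ParseUrl is always nullable: "
                + AlwaysNullable.class.isAssignableFrom(ParseUrl.class));
    }
}
```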
2022-09-16 21:27:30 +08:00
d7ffb4e26e [deps](httpv2)upgrade springboot version to 2.7.3 (#11963) 2022-08-24 08:49:57 +08:00