Commit Graph

110 Commits

da5c78019c [opt](fe-ui) support read hardware info from aarch64 MacOS (#23708)
Update the versions of oshi and jna to support reading hardware info from aarch64 macOS.
2023-08-31 18:16:33 +08:00
5b641ebd40 [feature-wip](catalog) support deltalake catalog step1-metadata (#22493) 2023-08-29 10:31:37 +08:00
e17779f193 [Dependency](fe)Upgrade dependency version (#22496)
Upgrade guava to 32.1.2-jre
Set ck dependency scope to provided
Upgrade okio to 3.4.0
Upgrade snake yaml to 1.33
Upgrade aws-java-sdk to 1.12.519
Upgrade hadoop to 3.3.6
2023-08-11 10:54:37 +08:00
582acad8a1 [feature](stats) Enable period time with cron expr (#22095)
Supports the following grammar:

ANALYZE TABLE test WITH CRON "* * * * * ?"

Such a job is scheduled as the cron expression specifies, but only minute-level scheduling is natively supported.
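
As a sketch of how such an expression could be evaluated, assuming Quartz-style cron parsing (which the trailing `?` field suggests; the actual scheduler used by the stats job is not shown here):

```java
import java.util.Date;
import org.quartz.CronExpression;

public class CronDemo {
    public static void main(String[] args) throws Exception {
        // "0 0 * * * ?" fires at the top of every hour; any Quartz-style expression
        // is accepted by the grammar above, but scheduling is effectively minute-level.
        CronExpression cron = new CronExpression("0 0 * * * ?");
        Date next = cron.getNextValidTimeAfter(new Date());
        System.out.println("next analyze run: " + next);
    }
}
```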
2023-07-26 17:25:57 +08:00
964ac4e601 [opt](nereids) Retry when async analyze task failed (#21889)
Retry at most 5 times when an async analyze task execution fails.
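
A minimal sketch of the retry idea; the 5-attempt bound comes from this message, while the `Callable`-based task shape is only illustrative:

```java
import java.util.concurrent.Callable;

public class RetryingExecutor {
    private static final int MAX_RETRY = 5;

    // Re-run the analyze task until it succeeds or the retry budget is used up.
    public static <T> T runWithRetry(Callable<T> analyzeTask) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= MAX_RETRY; attempt++) {
            try {
                return analyzeTask.call();
            } catch (Exception e) {
                last = e;
                System.err.println("analyze task failed, attempt " + attempt + ": " + e.getMessage());
            }
        }
        throw last;
    }
}
```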
2023-07-26 17:16:56 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1. Support filesystem metastore
2. Support predicate and projection pushdown when generating splits
3. Fix a partition table query error

TODO: For now, you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when using an S3 filesystem.

doc pr: #21966
2023-07-24 22:02:57 +08:00
30b1b93353 [dependency](fe)Dependency version upgrade (#21191)
Keep hadoop-aliyun version consistent with hadoop main version (3.3.5)
upgrade jackson to 2.14.3
upgrade netty version to 4.1.94.final
bind checker-framework version to 3.32.0
upgrade snappy-java to 1.1.10.1
upgrade hudi version to 0.13.1
upgrade spring version to 2.7.13
upgrade orc version to 1.8.4
revert nonsensical changes
2023-06-29 10:01:33 +08:00
d4240ac21b [fix](multi-catalog)add oss sdk, supported oss properties (#21029) 2023-06-26 13:00:44 +08:00
46f0295b78 [feature](load-refactor-with-tvf) S3 load with S3 tvf and native insert (#19937) 2023-06-25 17:45:31 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00
9a83d78dfe [Enhancement](hudi) support hudi mor table, step2 follow #19909 (#20570)
PR(https://github.com/apache/doris/pull/19909) has implemented the framework of hudi reader for MOR table. This PR completes all functions of reading MOR table and enables end-to-end queries.
Key Implementations:
1. Use Hudi meta information to generate the table schema instead of getting it from the Hive client.
2. Use the Hive client to list Hudi partitions, so this strongly depends on the sync tools (https://hudi.apache.org/docs/syncing_metastore/) that sync the partitions of Hudi into the Hive metastore. However, we may later get the Hudi partitions directly from the .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like Glue are compatible with the Hive catalog.
4. COW tables are still read natively in C++ as before.
5. The Hudi RecordReader uses ProcessBuilder to start a HotSpot debugger process, which may get stuck when attaching to the original JNI process, so I use a tricky method to kill this useless process.
2023-06-10 12:25:53 +08:00
fe63a0a3bb [Feature](multi-catalog)support paimon catalog (#19681)
CREATE CATALOG paimon_n2 PROPERTIES (
"dfs.ha.namenodes.HDFS1006531" = "nn2,nn1",
"dfs.namenode.rpc-address.HDFS1006531.nn2" = "172.16.65.xx:4007",
"dfs.namenode.rpc-address.HDFS1006531.nn1" = "172.16.65.xx:4007",
"hive.metastore.uris" = "thrift://172.16.65.xx:7004",
"type" = "paimon",
"dfs.nameservices" = "HDFS1006531",
"hadoop.username" = "hadoop",
"paimon.catalog.type" = "hms",
"warehouse" = "hdfs://HDFS1006531/data/paimon1",
"dfs.client.failover.proxy.provider.HDFS1006531" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
2023-06-06 15:08:30 +08:00
b7fc17da68 [feature-wip](multi-catalog)(step2)support read max compute data by JNI (#19819)
Issue Number: #19679
2023-06-05 22:10:08 +08:00
499f443779 [feature](iceberg) Support read iceberg data on gcs (#19815) 2023-05-20 12:40:03 +08:00
f68d3a660e [improvement](opentelemetry) upgrade opentelemetry jar to v1.26.0 and opentelemetry-cpp to v1.8.3 (#19733)
Why upgrade? Is anything wrong?

This tries to fix a problem with opentelemetry::v1::ext::http::client::curl::HttpOperation::Send(); I have updated the PR description.
2023-05-18 18:46:20 +08:00
3f2d1ae9a4 [feature-wip](multi-catalog)(step1)support connect to max compute (#19606)
Issue Number: #19679

Support connecting to MaxCompute metadata via the ODPS SDK.
2023-05-16 11:30:27 +08:00
ccd22c508a [chore](fe) Fix the build on Centos 6 (#19255) 2023-05-06 14:50:56 +08:00
c9fa10ac10 [fix](doc) avoid generate config doc automatically (#19302)
After #19246, compiling the FE automatically generates the Config and Session Variables docs and overwrites the original ones.
This needs to be avoided because the feature is not ready for use yet.
2023-05-05 20:39:05 +08:00
70236adc1f [Refactor](doc)(config)(variable) use script to generate doc for FE config and session variables (#19246)
The documentation of configs (FE and BE) and session variables is hard to maintain,
because developers need to modify both the code and the document,
and some of the config documentation is already missing.

So I plan to write the documentation of configs and variables directly in the code,
and use a script to generate the documents automatically.

How To
This CL mainly changes:

Add fields to the Config and Session Variables annotation:

description: the description of the config or variable item. It is a String array; the first element is in Chinese, the second in English.
options: the valid options, if the config or variable is an enum. (A sketch of such an annotated field is shown after the TODO list below.)
Add a script docs/generate-config-and-variable-doc.sh.

Simply run sh docs/generate-config-and-variable-doc.sh and it will generate the docs for FE configs and variables,
saving them under docs/admin-manual/config/fe-config.md and docs/advanced/variables.md,
in both Chinese and English.

There are template markdowns for this script to read and replace with the real doc content.

TODO
Too many descriptions need to be filled in; I will finish them in the next PR. For now the original docs remain unchanged.
Find a way to check the description field of configs and variables to make sure we don't miss any.
Generate docs for the BE config.
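
A minimal sketch of what an annotated config item could look like under this scheme; the annotation name `ConfField`, its fields, and the example item are assumptions for illustration, not the exact Doris definitions:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical shape of the annotation after this change.
@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
@interface ConfField {
    // first element in Chinese, second in English
    String[] description() default {};
    // valid options when the config item is an enum
    String[] options() default {};
}

public class Config {
    @ConfField(description = {"FE 元数据的存储目录", "The directory used to save FE metadata"})
    public static String meta_dir = System.getenv("DORIS_HOME") + "/doris-meta";
}
```

The generation script can then read these description arrays (for example via reflection) and render them into the Chinese and English markdown files.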
2023-05-05 14:42:43 +08:00
5459cd9c30 [Improve](fe)Upgrade dependencies and optimize jar package management (#18882)
bind netty-version to 4.1.89-final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade ranger-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove meaningless plugins
Optimize doris maven path
Unify the Java modules for management in fe
2023-05-04 10:07:37 +08:00
57982ddc46 [Fix](catalog)Fix hudi-catalog get file split error (#18644) (#18673)
`hudi-common` depends on `parquet-avro`, but the dependency scope is `provided`.
When the `hudi-catalog` is used, `HoodieAvroWriteSupport` will be called. This class depends on `parquet-avro`, so a ClassNotFoundException will be thrown.
2023-04-16 21:56:14 +08:00
b39846c2c7 [Fix](Catalog)Delete duplicate defined dependencies to avoid class loading exceptions (#18628)
`iceberg-hive-metastore` and `hive-storage-api` have been defined in hive-catalog-shade,
and some classes in the shade have been renamed, so we cannot declare them again.
The classes in the shade should be kept.

The `hive-metastore-api` used by `ranger` can also use the jar from the `shade`.
Since we only renamed the utility classes used inside `hive`, this has no effect on it.
2023-04-13 22:12:19 +08:00
75fd4b70fa [improve](fe)Optimize fe binary package packaging (#18554) 2023-04-12 12:58:45 +08:00
5f981b0b1f [fix](catalog)Use hive-catalog-shade to solve thrift version compatibility issues (#18504)
`Hive 3` uses the `thrift-0.9.3` package, and `Doris` uses the `thrift-0.16.0` package.
These two packages are not compatible, so we use the `hive-shade` package to manage hive dependencies
in a unified way. This jar renames the `thrift` classes, so the conflict can be resolved.
2023-04-11 13:19:39 +08:00
d0219180a9 [feature-wip](multi-catalog)add properties converter (#18005)
Refactor the properties of each cloud and use a property converter to convert the properties used for FE
metadata access and BE data access.
user docs #18287
2023-04-06 09:55:30 +08:00
c2dd005efb [fix](chore) fix BE compile and FE protoc artifact issue (#18120)
Add the <optional> header to solve the compilation issue.
Use 3.12.9 as the protoc.artifact version, because there is no 3.12.21.
See: https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/
Remove the --show-progress argument of wget because it is not supported by older wget versions.
2023-03-27 08:53:42 +08:00
93cfd5cd2b [Enhance](ComputeNode)support k8s watch (#17442)

1. Add a watch mechanism to listen for changes in the k8s StatefulSet and update nodes in time.
2. For the broker, there is only one name by default when using deployManager.
3. Refactor the code to make it easier to understand and maintain.
4. Fix jar package conflicts between okhttp-ws and okhttp.

Previously, the logic of k8sDeployManager.getGroupHostInfos was to call the endpoints() interface of k8s.
If a pod was unexpectedly restarted, k8sDeployManager would delete the pre-restart pod from the FE or BE
list and add the post-restart pod to that list, which obviously does not meet our expectations.
Now, after FQDN is enabled, we call the statefulSets() interface of k8s and listen to the number of replicas
to determine whether nodes need to be brought online or offline.
In addition, the watch mechanism avoids the possible A-B-A problem caused by timed polling.
For the sake of stability, when the watch mechanism does not receive messages for a period of time,
it is degraded to polling mode.

Several environment variables have been added: ENV_FE_STATEFULSET, ENV_FE_OBSERVER_STATEFULSET, ENV_BE_STATEFULSET, ENV_BROKER_STATEFULSET, and ENV_CN_STATEFULSET for the statefulset names, corresponding one-to-one with ENV_FE_SERVICE, ENV_FE_OBSERVER_SERVICE, ENV_BE_SERVICE, ENV_BROKER_SERVICE, and ENV_CN_SERVICE. If a serviceName is configured, the corresponding statefulsetName must also be configured, otherwise the program cannot start.
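
A hedged sketch of listening for StatefulSet replica changes; it assumes the fabric8 kubernetes-client (which may not be the client actually used here), and the namespace and statefulset names are placeholders:

```java
import io.fabric8.kubernetes.api.model.apps.StatefulSet;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

public class StatefulSetWatchDemo {
    public static void main(String[] args) throws InterruptedException {
        String stsName = System.getenv("ENV_BE_STATEFULSET");  // e.g. "doris-be"
        KubernetesClient client = new KubernetesClientBuilder().build();
        client.apps().statefulSets().inNamespace("doris").withName(stsName)
                .watch(new Watcher<StatefulSet>() {
                    @Override
                    public void eventReceived(Action action, StatefulSet sts) {
                        // Compare the desired replica count with the current BE list
                        // and decide whether nodes should be brought online or offline.
                        Integer replicas = sts.getSpec().getReplicas();
                        System.out.println(action + " -> replicas=" + replicas);
                    }

                    @Override
                    public void onClose(WatcherException cause) {
                        // On watch failure, fall back to periodic polling as described above.
                        System.err.println("watch closed, fall back to polling: " + cause);
                    }
                });
        Thread.currentThread().join();  // keep the demo process alive
    }
}
```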
2023-03-20 11:36:32 +08:00
295b26db00 [chore](fe) update aspectj-maven-plugin to 1.14.0 version (#17890)
In #17797, we introduced AspectJ to help log exceptions easily.
However, plugin version 1.11 does not support JDK 9 and later.
To support compiling the FE with JDK 11:

- update aspectj-maven-plugin to version 1.14.0
- add a new dependency, org.aspectj aspectjrt 1.9.7, to fe-core

References: AspectJ Java version compatibility, the aspectj-maven-plugin issue, the AspectJ release notes, and an intro to AspectJ.
2023-03-19 14:50:09 +08:00
0ec10d4836 [Enhancement](fe exception) write a java annotation to catch throwable from a method and print log (#17797)
How does it work?
AspectJ is used to implement the aspect behavior of the annotations. During compilation, the aspectj-maven-plugin automatically weaves the code carrying aspect annotations into the generated class files.
When to use it?
When a method would otherwise need a try/catch just to record exception information, the LogException annotation can be used. When a method must not tolerate errors, the NoException annotation can be used.
What is the result of adding these annotations?
The LogException annotation automatically captures exceptions into the log file, keeping the code more concise. The NoException annotation automatically captures the exception into the log file and exits the program when an exception occurs.
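
A usage sketch of the idea; the annotation body below is a simplified stand-in (the real definitions and the weaving configuration live in the FE modules), and the method shown is hypothetical:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Simplified stand-in for the marker annotation; the aspect is woven at compile time
// by the aspectj-maven-plugin.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.CLASS)
@interface LogException {}

public class ReplayHelper {
    // The woven aspect is expected to wrap this method in a try/catch and write any
    // throwable to the log, so the method body itself stays concise.
    @LogException
    public void replayJournal(long journalId) {
        // ... replay logic that may throw ...
    }
}
```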
2023-03-17 08:52:27 +08:00
310bdb60f4 [chore](maven) Prefer protoc in thirdparty to the one in maven artifacts (#17596)
The prebuilt protoc-gen-grpc-java binary uses glibc on Linux and the version of glibc which Centos 6 uses is too old.
2023-03-09 16:21:38 +08:00
b6128f9b65 [dependency](fe) Replace jackson-mapper-asl with fasterxml-jackson (#17303) 2023-03-09 09:35:58 +08:00
d908d5fe01 [dependency](fe)Dependency Upgrade (#17377)
* Upgrade log4j to 2.X
  - binding log4j version to 2.18.0
  - use log4j-1.2-api to complete a smooth upgrade
* Upgrade commons-fileupload to 1.5
* Upgrade commons-io to 2.7
* Upgrade commons-compress to 1.22
* Upgrade gson to 2.8.9
* Upgrade guava to 30.0-jre
* Binding jackson version to 2.14.2
* Upgrade netty-all to 4.1.89.final
* Upgrade protobuf to 3.21.12
* Upgrade kafka-clients to 3.4.0
* Upgrade calcite version to 1.33.0
* Upgrade aws-java-sdk to 1.12.302
* Upgrade hadoop to 3.3.4
* Upgrade zookeeper to 3.4.14
* Binding tomcat-embed-core to 8.5.86
* Upgrade apache parent pom to 25
* Use hive-exec-core as a hive dependency, add the missing jar-hive-serde separately
* Basic public dependencies are extracted to parent dependencies
* Use jackson uniformly as the basic json tool
* Remove springloaded, spring-boot-devtools has the same functionality
* Modify the spark-related dependency scope to provided, since they should be provided at runtime
2023-03-08 14:28:40 +08:00
48c2d806d7 [enhancement](jdbc catalog) Use Druid instead of HikariCP in JdbcClient (#17395)
This PR does three things:
1. Use Druid instead of HikariCP in JdbcClient.
2. When downloading a UDF jar, append the jar package name to the local file name.
3. Refactor some JdbcResource code.
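
A minimal sketch of building a Druid pool, roughly what a JdbcClient could do after this change; the MySQL URL, driver, and pool sizes are illustrative:

```java
import com.alibaba.druid.pool.DruidDataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class DruidPoolExample {
    public static void main(String[] args) throws Exception {
        DruidDataSource ds = new DruidDataSource();
        ds.setDriverClassName("com.mysql.cj.jdbc.Driver");
        ds.setUrl("jdbc:mysql://127.0.0.1:3306/demo");
        ds.setUsername("user");
        ds.setPassword("pass");
        ds.setInitialSize(1);
        ds.setMinIdle(1);
        ds.setMaxActive(10);

        // Connections are borrowed from and returned to the Druid pool.
        try (Connection conn = ds.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
        ds.close();
    }
}
```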
2023-03-07 08:51:10 +08:00
449f2953c9 [Improvement](auth)(step-1) add ranger authorizer for hms catalog (#17153) 2023-03-03 09:45:08 +08:00
51bbae27b8 [feature-wip](iceberg) add dlf and glue catalog impl for iceberg catalog (#16602)
The Iceberg catalog supports DLF on Alibaba Cloud and the AWS Glue Catalog.
2023-02-23 14:02:41 +08:00
d56043ab5a [feature-wip](MTMV) Support setting variables in query statement (#16060)
## Use case

```shell
mysql> CREATE TABLE t_user (
    ->   event_day DATE,
    ->   id bigint,
    ->   username varchar(20)
    -> )
    -> DISTRIBUTED BY HASH(id) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.07 sec)

mysql> CREATE TABLE t_user_pv(
    ->   event_day DATE,
    ->   id bigint,
    ->   pv bigint
    -> )
    -> DISTRIBUTED BY HASH(id) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.09 sec)

mysql> CREATE MATERIALIZED VIEW mv
    -> BUILD IMMEDIATE REFRESH COMPLETE
    -> KEY (username)
    -> DISTRIBUTED BY HASH(username) BUCKETS 10
    -> PROPERTIES ('replication_num' = '1')
    -> AS SELECT /*+ SET_VAR(exec_mem_limit=1048576, query_timeout=3600) */ t1.username ,t2.pv FROM t_user t1 LEFT JOIN t_user_pv t2 on t1.id = t2.id;
Query OK, 0 rows affected (0.10 sec)
```
2023-01-30 01:05:41 +08:00
da28d2faee [deps](http)Upgrade springboot version to 2.7.8 (#16158)
* Upgrade springboot version to 2.7.8

* fix
2023-01-28 20:13:50 +08:00
726427b795 [refactor](fe) refactor and upgrade dependency tree of FE and support AWS glue catalog (#16046)
1. Spark dpp
 
	Move `DppResult` and `EtlJobConfig` to the sparkdpp package in the `fe-common` module,
	so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar`
	will not be moved into `fe/lib`, which reduces the size of the FE output.
	
2. Modify start_fe.sh

	Modify the CLASSPATH to make sure that doris-fe.jar is at the front, so that
	when loading classes with the same qualified name, they are loaded from doris-fe.jar first.
	
3. Upgrade hadoop and hive version

	hadoop: 2.10.2 -> 3.3.3
	hive: 2.3.7 -> 3.1.3
	
4. Override the IHiveMetastoreClient implementations from dependency

	`ProxyMetaStoreClient.java` for Aliyun DLF.
	`HiveMetaStoreClient.java` for origin Apache Hive metastore.

	Because I need to modify some of their methods to make them compatible with
	different versions of Hive.
	
5. Exclude some unused dependencies to reduce the size of FE output

	Now it is only 370 MB (it was 600 MB before)
	
6. Upgrade aws-java-sdk version to 1.12.31

7. Support AWS Glue Data Catalog

8. Remove HudiScanNode (no longer supported)
2023-01-20 14:42:16 +08:00
d48abd91df [deps](fe)upgrade deps version (#15262)
upgrade hadoop version to 2.10.2
jackson-databind to 2.14.1
2022-12-24 22:18:10 +08:00
e8bac706d3 [deps](FE)Upgrade the velocity version that hive-exec depends on to 2.3 (#15067) 2022-12-19 14:20:11 +08:00
ef1bb9819a [feature-wip](MTMV) Support mapping the partition rule of base table to the materialized view (#14930)
When we create a materialized view for multiple tables, users may not figure out the partition rule for the materialized view, because the query result can be too complex. If the query result doesn't match one of the partition rules, the insertion will fail.

We can resolve this issue by mapping the partition rule of the base table to the materialized view. As a result, users don't need to specify the partition rules, and the query results are all valid because they are retrieved from the partitions of the base table.

## Use case

mysql> CREATE TABLE t1 (pk INT NOT NULL, v1 INT SUM) PARTITION BY RANGE(pk) (
    ->   PARTITION p1 VALUES LESS THAN ('10'),
    ->   PARTITION p2 VALUES LESS THAN ('90')
    -> )
    -> DISTRIBUTED BY HASH(pk)
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.04 sec)

mysql> CREATE TABLE t2 (pk INT NOT NULL, v2 INT SUM) PARTITION BY LIST(pk) (
    ->   PARTITION odd VALUES IN ('10', '30', '50', '70', '90'),
    ->   PARTITION even VALUES IN ('20', '40', '60', '80')
    -> )
    -> DISTRIBUTED BY HASH(pk)
    -> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.02 sec)

mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE
    -> KEY (mpk) PARTITION BY (t1.pk) DISTRIBUTED BY HASH(mpk) PROPERTIES ('replication_num' = '1')
    -> AS SELECT t1.pk AS mpk, v1, v2 FROM t1, t2 WHERE t1.pk = t2.pk;
Query OK, 0 rows affected (0.10 sec)

mysql> SHOW CREATE TABLE mv;
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Materialized View | Create Materialized View                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mv                | CREATE MATERIALIZED VIEW `mv`
BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND
KEY(`mpk`)
PARTITION BY RANGE(`mpk`)
(PARTITION p1 VALUES [("-2147483648"), ("10")),
PARTITION p2 VALUES [("10"), ("90")))
DISTRIBUTED BY HASH(`mpk`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
)
AS SELECT `t1`.`pk` AS `mpk`, `v1` AS `v1`, `v2` AS `v2` FROM `default_cluster:dev`.`t1` , `default_cluster:dev`.`t2` WHERE `t1`.`pk` = `t2`.`pk`; |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
2022-12-09 22:47:21 +08:00
ec2539e2a3 [chore](macOS) Resolve the issue with missing python program (#14864) 2022-12-07 15:30:12 +08:00
ed96442b85 [fix](multi-catalog) fix persist issue about jdbc catalog and class loader issue #14794
Fix a bug where the JDBC catalog/database/table should be added to GsonUtil.

Fix a class loader issue that sometimes causes a ClassNotFoundException.

Fix regression tests to use different catalog names.

Comment out 2 regression tests:

regression-test/suites/query_p0/system/test_query_sys.groovy
regression-test/suites/statistics/alter_col_stats.groovy
These need to be fixed later.
2022-12-05 09:05:13 +08:00
ce95da8dfb [improvement](multi-catalog) support specify hadoop username (#14734)
Support setting "hadoop.username" property when creating hms catalog.
2022-12-04 21:09:39 +08:00
fb5a3e118a [feature-wip](dlf) prepare to support aliyun dlf (#13969)
[What is DLF](https://www.alibabacloud.com/product/datalake-formation)

This PR is a preparation for supporting DLF, with some changes to multi-catalog:

1. Add RuntimeException for most Hive metastore and ES client access operations.
2. Add DLF-related dependencies.
3. Move the checks of ES catalog properties to the analysis phase of ES catalog creation.

TODO (in next PR):

1. Refactor the `getSplit` method to support not only HDFS but also S3-compatible object storage.
2. Finish the implementation of DLF support.
2022-11-06 10:01:57 +08:00
477b28efac [deps](fe)upgrade commons-text to 1.10.0 (#13562) 2022-10-23 23:30:02 +08:00
b042ef9765 [chore](macOS) Fix the issues with protoc and protoc-gen-grpc-java on M1 (#13571)
Some errors occur when building the FE with an arm64 JDK on M1 because the dependencies protoc and grpc-java don't support M1.
#13563 modified build.sh to fix these issues by adding -Dos.arch=x86_64 to the build command.
However, if someone executes `mvn clean package -DskipTests=true` under the fe folder, the errors occur again.

This PR introduces a better way to fix them.
2022-10-23 14:10:46 +08:00
50ae9e6b19 [enhancement](planner) support select table sample (#10170)
### Motivation
TABLESAMPLE allows you to limit the number of rows from a table in the FROM clause.

It is used for data inspection, quick verification of SQL correctness, and table statistics collection.

### Grammar
```
[TABLET tids] TABLESAMPLE n [ROWS | PERCENT] [REPEATABLE seek]
```

This limits the number of rows read from the table in the FROM clause by pseudo-randomly selecting
a number of tablets from the table according to the specified row count or percentage.
Specifying a seed in REPEATABLE returns the same selected samples again.
In addition, tablet IDs can also be specified manually.
Note that this can only be used for OLAP tables.

### Example
Q1:
```
SELECT * FROM t1 TABLET(10001,10002) limit 1000;
```
explain:
```
partitions=1/1, tablets=2/12, tabletList=10001,10002
```
Selects the specified tablet IDs of t1.

Q2:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 1 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10001,10002,10003
```

Q3:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 2 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10002,10003,10004
```

Pseudo-randomly sample 1000 rows in t1.
Note that several Tablets are actually selected according to the statistics of the table, 
and the total number of selected Tablet rows may be greater than 1000, 
so if you want to explicitly return 1000 rows, you need to add Limit.

### Design
First, determine how many rows to sample from each partition according to the number of partitions.
Then determine the number of tablets to be selected from each partition according to the average number of rows per tablet.
If seek is not specified, the required number of tablets is pseudo-randomly selected from each partition.
If seek is specified, tablets are selected sequentially starting from the seek-th tablet of the partition.
Finally, any manually specified tablet IDs are added to the selected tablets.
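
A simplified sketch of the per-partition tablet selection described above; it is not the actual planner code, and the row-count bookkeeping is illustrative only:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TabletSampler {
    /**
     * Picks enough tablets from one partition to cover sampleRows.
     * A negative seek means REPEATABLE was not specified.
     */
    public static List<Long> pickTablets(List<Long> tabletIds, long avgRowsPerTablet,
                                         long sampleRows, long seek) {
        long safeAvg = Math.max(avgRowsPerTablet, 1);
        int needed = (int) Math.min(tabletIds.size(), (sampleRows + safeAvg - 1) / safeAvg);
        List<Long> picked = new ArrayList<>();
        if (seek < 0) {
            // Pseudo-random selection when REPEATABLE is absent.
            List<Long> shuffled = new ArrayList<>(tabletIds);
            Collections.shuffle(shuffled, new Random());
            picked.addAll(shuffled.subList(0, needed));
        } else {
            // With REPEATABLE, select sequentially starting from the seek-th tablet,
            // so the same seed always yields the same tablets.
            for (int i = 0; i < needed; i++) {
                picked.add(tabletIds.get((int) ((seek + i) % tabletIds.size())));
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Long> tablets = List.of(10001L, 10002L, 10003L, 10004L, 10005L);
        System.out.println(pickTablets(tablets, 500_000L, 1_000_000L, 1));
    }
}
```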
2022-10-14 15:05:23 +08:00
0a95ebf602 [feature](Nereids) Add scalar function code generator and some function trait (#12671)
This PR did these things:
1. Change the nullable mode of 'from_unixtime' and 'parse_url' from DEPEND_ON_ARGUMENT to ALWAYS_NULLABLE; their nullable configuration was missing previously.
2. Add some new interfaces for the original NullableMode. This change is inspired by the grammar of Scala's mix-in traits. It helps us quickly understand the traits of a function without reading the lengthy procedural code and saves the work of writing template code, e.g. `class Substring extends ScalarFunction implements ImplicitCastInputTypes, PropagateNullable`. These are the interfaces (a sketch is shown at the end of this message):
   - PropagateNullable: equals NullableMode.DEPEND_ON_ARGUMENT
   - AlwaysNullable: equals NullableMode.ALWAYS_NULLABLE
   - AlwaysNotNullable: equals NullableMode.ALWAYS_NOT_NULLABLE
   - ComputeNullable: equals NullableMode.CUSTOM (all other cases)
3. Add `GenerateScalarFunction` to generate Nereids-style function code from legacy functions, but it does not actually generate any new function classes yet, because the function traits are not ready for use. I need to add some traits for the legacy functions' CompareMode and NonDeterministic; this idea is the same as ComputeNullable.
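
An illustrative sketch of the mix-in style described above; the interface names follow this message, but the bodies are simplified stand-ins, not the actual Nereids sources:

```java
// Marker traits that map one-to-one onto the legacy NullableMode values.
interface PropagateNullable {}   // NullableMode.DEPEND_ON_ARGUMENT
interface AlwaysNullable {}      // NullableMode.ALWAYS_NULLABLE
interface AlwaysNotNullable {}   // NullableMode.ALWAYS_NOT_NULLABLE

abstract class ScalarFunction {}

// Reading the implements clause alone tells you the nullable behavior of the
// function, without digging through procedural nullable-mode code.
class Substring extends ScalarFunction implements PropagateNullable {}
class ParseUrl extends ScalarFunction implements AlwaysNullable {}

public class NullableTraitDemo {
    public static void main(String[] args) {
        System.out.println("Substring propagates nullable: "
                + PropagateNullable.class.isAssignableFrom(Substring.class));
        System.out.println("ParseUrl is always nullable: "
                + AlwaysNullable.class.isAssignableFrom(ParseUrl.class));
    }
}
```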
2022-09-16 21:27:30 +08:00
d7ffb4e26e [deps](httpv2)upgrade springboot version to 2.7.3 (#11963) 2022-08-24 08:49:57 +08:00