doris

Author	SHA1	Message	Date
Qi Chen	84e9a14063	[Fix](hive-writer) Fix partition column orders issue when the partition fields inserted into the target table are inconsistent with the field order of the query source table and the schema field order of the query source table. (#35543 ) ## Proposed changes backport #35347	2024-05-28 18:11:55 +08:00
Qi Chen	68eda58a8c	[Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335 ) When null-related functions are applied to a dictionary column, as in the following SQL statements, the results will be incorrect. ``` select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null'; ``` ``` select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null' ``` ``` select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'; ```	2024-05-27 15:25:29 +08:00
wuwenchi	f98ed4e4c5	[bugfix](hive)Misspelling of class names (#34981 )	2024-05-27 15:24:38 +08:00
wuwenchi	b1795d44ec	[bugfix](hive)fix testcase for test_hive_write_different_path (#35209 ) Hive's test environment uses docker, so when using 127.0.0.1, BE will write the file to the docker on its own machine. But if FE and BE are not on the same machine, FE cannot read this file because it can only read the docker on its own machine. Therefore, the address 127.0.0.1 cannot be used in the test environment.	2024-05-27 15:24:30 +08:00
Tiewei Fang	f6beeb1ddd	[Enhancement](tvf) select tvf supports using resource (#35139 ) Create an S3/HDFS resource that TVF can use directly to access the data source.	2024-05-24 16:23:58 +08:00
Mingyu Chen	adc364a6fd	[feature](Paimon) support deletion vector for Paimon naive reader (#34743 ) (#35241 ) bp #34743 Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>	2024-05-23 00:01:30 +08:00
zy-kkk	24990383ff	[refactor](jdbc catalog) split clickhouse jdbc executor (#34794 ) (#35174 ) pick master #34794	2024-05-22 19:09:05 +08:00
Qi Chen	291cf57c54	[Configurations](multi-catalog) Add `enable_parquet_filter_by_min_max` and `enable_orc_filter_by_min_max` Session variables. (#35012 ) (#35164 ) backport #35012	2024-05-22 19:06:12 +08:00
Tiewei Fang	c0fd98abe5	[Fix](tvf) Fix that tvf reading empty files in compressed formats. (#34926 ) 1. Fix the issue with tvf reading empty compressed files. 2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0	2024-05-21 12:59:31 +08:00
Mingyu Chen	22f85be712	[fix](hive-ctas) support create hive table with full qualified name (#34984 ) Before, when executing `create table hive.db.table as select` to create a table in a hive catalog, if the current catalog was not a hive catalog, the default engine name was filled with `olap`, which is wrong. This PR fills the default engine name based on the specified catalog.	2024-05-18 18:42:43 +08:00
wuwenchi	4dd5379951	[bugfix](hive)fix error for writing to hive for 2.1 (#34518 ) mirror #34520	2024-05-14 23:27:29 +08:00
zy-kkk	5a3107442a	[feature](tvf) support query table value function (#34516 ) (#34640 ) This PR supports a Table Value Function called `Query`. It can push a query directly to the catalog source for execution by specifying `catalog` and `query`, without parsing by Doris. Doris only receives the results returned by the query. Currently only JDBC Catalog is supported. Example: ``` Doris > desc function query('catalog' = 'mysql','query' = 'select count(*) as cnt from test.test'); +-------+--------+------+------+---------+-------+ \| Field \| Type \| Null \| Key \| Default \| Extra \| +-------+--------+------+------+---------+-------+ \| cnt \| BIGINT \| Yes \| true \| NULL \| NONE \| +-------+--------+------+------+---------+-------+ Doris > select * from query('catalog' = 'mysql','query' = 'select count(*) as cnt from test.test'); +----------+ \| cnt \| +----------+ \| 30000000 \| +----------+ ```	2024-05-10 14:29:17 +08:00
Mingyu Chen	3ae3f9d6e1	[opt](catalog) support using loading cache for db/table list in external catalog (#33610 ) (#34596 ) bp #33610	2024-05-09 17:50:39 +08:00
wangbo	39fdc9ba0c	[refactor](executor)Rename workload schedule policy #34497	2024-05-08 08:35:20 +08:00
Qi Chen	99af54f779	[Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146 ) (#34248 ) backport #34146	2024-04-28 19:43:57 +08:00
苏小刚	11039ade7b	[opt](paimon) support mapping Paimon column type "Row" to Doris type "Struct" (#34239 ) backport: #33786	2024-04-28 19:38:50 +08:00
Mingyu Chen	45556686ea	[fix](test) fix some external test cases (#34209 ) Fix some test cases and enable `test_information_schema_external` suite	2024-04-27 23:25:33 +08:00
qiye	414fbd353e	[fix](ES catalog)Make col != '' behavior consistent with SQL (#34151 ) In SQL syntax, `col != ''` equals `col.length() > 0`. It means that this column must exist in ES doc fields and its content is not empty. In this PR, we make a special translation for this binary predicate to keep the behavior of both consistent. --------- Co-authored-by: Luennng <luennng@gmail.com>	2024-04-27 02:29:33 +08:00
苏小刚	0f0c0a266b	[opt](parquet)Skip page with offset index (#33082 ) Make skip_page() in ColumnChunkReader more efficient. No more reading page headers if there are pagelocations in chunk.	2024-04-26 15:06:16 +08:00
Qi Chen	acc2b532e7	[Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve special characters issue with git on windows. (#34026 )	2024-04-26 15:05:47 +08:00
Mingyu Chen	50f9d47e96	[test](hive) run suite cases both in hive2 and hive3 (#33874 ) (#34156 ) bp #33874 Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>	2024-04-26 13:48:09 +08:00
wangbo	03c3419265	[Refactor](executor)Add workload schedule policy table (#33729 )	2024-04-20 20:06:34 +08:00
Mingyu Chen	0e3ad5cd9d	[fix](parquet) fix time zone error(isAdjustedToUTC=true) in parquet reader (#33675 ) (#33924 ) bp (#33675) Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>	2024-04-20 19:06:54 +08:00
苏小刚	1c025c0488	[docker](hive) add hive3 docker compose and modify scripts (#33115 ) add hive3 docker compose from: big-data-europe/docker-hive#56	2024-04-17 23:42:13 +08:00
zy-kkk	1be753ed75	[enhancement](mysql compatible) add user and procs_priv tables to mysql db in all catalogs (#33058 ) This PR aims to enhance the compatibility of BI tools (such as Dbeaver, DataGrip) when using the mysql connector to connect to Doris, because some BI tools query some tables in the mysql database. In our tests, the user and procs_priv tables were mainly queried. This PR adds these two tables and adds actual data to the user table. However, please note that most of the fields in the user table are in Doris' own format rather than mysql format, so it can only ensure that no error is reported when the BI tool queries these tables; it does not guarantee that the data is completely displayed, and the tables under Doris's mysql database do not support data modification. Thanks to @liujiwen-up for assisting in testing	2024-04-17 23:42:12 +08:00
zy-kkk	b035c7ceb4	[fix](catalog) fix resource is not reopen when rename catalog (#33432 ) During the renaming of `JdbcCatalog`, I noticed that the `jdbcClient` was being closed, resulting in exceptions during subsequent queries. This happens because the `removeCatalog` method is invoked when changing the name, which in turn calls the `onClose` method of the catalog. Ideally, the client should not be closed when renaming the catalog. However, to avoid extra checks in the `removeCatalog` method, we can simply execute `onRefresh` in the `addCatalog` method to address this issue.	2024-04-12 15:09:25 +08:00
slothever	18fb8407ae	[feature](insert)use optional location and add hive regression test (#33153 )	2024-04-12 10:38:54 +08:00
slothever	07f296734a	[regression](insert)add hive DDL and CTAS regression case (#32924 ) Issue Number: #31442 dependent on #32824 add ddl(create and drop) test add ctas test add complex type test TODO: bucketed table test truncate test add/drop partition test	2024-04-12 10:24:23 +08:00
slothever	716c146750	[fix](insert)fix hive external return msgs and exception and pass all columns to BE (#32824 )	2024-04-12 10:23:52 +08:00
Tiewei Fang	61e214c327	[Fix](Hive-Metastore) fix that if JDBC reads the NULL value, it will cause NPE (#32831 )	2024-04-10 11:55:17 +08:00
Qi Chen	5116724494	[Fix](hive-writer) Fix the issue of block was not copied to do filtering when hive partition writer write block to file. (#32775 ) (#33447 ) backport #32775	2024-04-10 11:42:23 +08:00
Qi Chen	4963d60a07	[Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721 ) (#33446 ) backport #32721.	2024-04-10 11:42:22 +08:00
zy-kkk	fae55e0e46	[Feature](information_schema) add processlist table for information_schema db (#32511 )	2024-04-07 23:24:22 +08:00
Ashin Gau	29556f758e	[fix](parquet) fix time zone error in parquet reader (#33217 ) `isAdjustedToUTC` is exactly the opposite in parquet reader(https://github.com/apache/parquet-format/blob/master/LogicalTypes.md), resulting in the time with `isAdjustedToUTC=true` being increased by eight hours(UTC8). The parquet with `isAdjustedToUTC=true` can be produced by spark-sql with the following configuration: ``` --conf spark.sql.session.timeZone=UTC --conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS ``` However, using the following configuration, there is no logical or converted type in the parquet metadata, so the time read by doris will also increase by eight hours(UTC8). Users need to set their own UTC time zone in doris(https://doris.apache.org/docs/dev/advanced/time-zone/) ``` --conf spark.sql.session.timeZone=UTC --conf spark.sql.parquet.outputTimestampType=INT96 ```	2024-04-07 23:24:22 +08:00
feiniaofeiafei	b5a1914740	[Fix](nereids) Fix deletestmt getting catalog (#32701 )	2024-03-26 20:29:03 +08:00
yiguolei	7b94cfdba1	Revert "[Fix](tests) add regression tests for trino-connector (#32552 )" This reverts commit 3fc3a4650681cb519405730899a2f22f268b38c1.	2024-03-25 22:38:21 +08:00
Tiewei Fang	3fc3a46506	[Fix](tests) add regression tests for trino-connector (#32552 )	2024-03-25 22:31:55 +08:00
Tiewei Fang	d7a3ff1ddf	[Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281 ) \| Doris Type \| Orc Type \| Parquet Type \| \|---------------------\|--------------------\|------------------------\| \| Date \| Long (logical: DATE) \| int32 (Logical: Date) \| \| DateTime \| TIMESTAMP (logical: TIMESTAMP) \| int96 \|	2024-03-22 08:52:16 +08:00
zy-kkk	dea6859e0d	[refactor](jdbc catalog) refactor jdbc catalog get databases logic (#32579 )	2024-03-21 14:07:50 +08:00
zy-kkk	26ed4b69b1	[opt](jdbc catalog) filter jdbc datasource internal database (#32294 )	2024-03-21 14:07:23 +08:00
Mryange	8bd101129a	[behavior change](output) change float output format (#32049 )	2024-03-21 14:07:22 +08:00
yiguolei	85b2c42f76	[Enhancement](jdbc catalog) Add a property to test the connection when creating a Jdbc catalog (#32125 ) (#32531 )	2024-03-21 14:05:59 +08:00
wangbo	258dcfca97	[Refactor](executor)Add information_schema.workload_groups (#32195 ) (#32314 )	2024-03-15 20:46:54 +08:00
wangbo	df5ec16d7c	[Refactor](executor)Add schema type table active_queries (#32057 ) * Add schema type table active_queries	2024-03-15 17:57:28 +08:00
zy-kkk	31ee448c87	[test](fix) Fix one missing line of output in out file (#32036 )	2024-03-12 14:17:55 +08:00
zy-kkk	cf6b22c621	[fix](jdbc catalog) fix type conversion error in MySQL JDBC Driver 5.x (#31880 )	2024-03-12 14:07:57 +08:00
wangbo	c5390d00bb	[Improvement]Add schema table backend_active_tasks (#31945 )	2024-03-09 19:55:48 +08:00
zy-kkk	5b00f4fbeb	[improvement](jdbc catalog) opt get db2 schema list & xml type mapping (#31856 ) 1. Trim Schema Names: Adapted the system to remove trailing spaces from DB2 schema names, ensuring compatibility without affecting query operations. 2. XML Mapping: Implemented a feature to directly map XML types to String.	2024-03-07 16:53:19 +08:00
zy-kkk	2e9bd268cd	[improvement](jdbc catalog) support sqlserver timestamp type read (#31805 )	2024-03-06 13:08:04 +08:00
zy-kkk	07224686ef	[feature](jdbc catalog) support db2 jdbc catalog (#31627 )	2024-03-01 14:19:28 +08:00


162 Commits