1079 Commits

Author SHA1 Message Date
289d621faa [improvement](information_schema)Show view definition in information_schema.views. (#45857) (#45930)
backport: https://github.com/apache/doris/pull/45857
2024-12-26 10:11:13 +08:00
1db78d4496 branch-2.1: [fix](hive) fix block decompressor bug #45289 (#45379)
Cherry-picked from #45289

Co-authored-by: Socrates <suyiteng@selectdb.com>
2024-12-14 19:20:55 -08:00
5d3f0a267a [opt](scan) unify the local and remote scan bytes stats for all scanners for 2.1 (#45167)
pick part of #40493

TODO: not working with s3 reader
2024-12-10 14:19:19 +08:00
f0324e2a56 branch-2.1: [improvement](information_schema)Support show default value in information_schema. #44849 (#45080)
Cherry-picked from #44849

Co-authored-by: James <lijibing@selectdb.com>
2024-12-06 14:54:09 +08:00
00c7394813 branch-2.1: [fix](scanner) Delete meaningless finish dependency in schema scanner #44915 (#44963)
Cherry-picked from #44915

Co-authored-by: Gabriel <liwenqiang@selectdb.com>
2024-12-04 13:16:08 +08:00
48e33bfb2a branch-2.1: [fix](hive)Fixed the issue of reading hive table with empty lzo files #43979 (#44063)
Cherry-picked from #43979

Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2024-11-16 16:14:50 +08:00
1101fbaf04 [fix](column_complex) wrong type of Field returned by ColumnComplex (#43515) (#43860) 2024-11-13 19:07:00 +08:00
2e64491ee3 [branch-2.1](insert-overwrite) Support create partition for auto partition table when insert overwrite (#38628) (#42644)
pick https://github.com/apache/doris/pull/38628
2024-11-13 11:16:00 +08:00
66dcb943c3 branch-2.1: [opt](log) change lzo decompress log to debug level (#43583)
Cherry-picked from #43540

Co-authored-by: morningman <yunyou@selectdb.com>
2024-11-12 14:35:31 +08:00
d209c16d81 [fix](schema-change) fix the bug of alter column nullable when double writing (#41737) (#42352)
pick master #41737 

## problem 

CREATE TABLE t (
    `k1` VARCHAR(30) NOT NULL,
    `v1` INT NOT NULL
)

alter table t modify column `v1` INT NULL

insert into t values ('1', 2), ('1', 3);

core dump

## reason

A schema change triggers double writing. During double writing, the two
schemas and the slots are as follows:
 
```
old tablet schema 
k1 varchar not null
v1 int not null
```
 
```
new tablet schema
k1 varchar not null
v1 int null
```

```
slot
k1 varchar not null
v1 int not null
v1 int null
```

During the double writing process, when selecting slots through the
schema, only the column names and types were compared, without comparing
the nullable attributes, which led to the selection of the wrong slot.
Since the slot determines the nullable attribute of the block, the
nullable attribute of the columns in the block is different from that of
the columns in the schema, resulting in a core dump.
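
The slot selection problem described above can be sketched in a few lines. This is only an illustration, not the Doris implementation: `Column`, `find_slot`, and the `check_nullable` flag are hypothetical names, and the types are plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for both a schema column and a slot."""
    name: str
    type: str
    nullable: bool


def find_slot(column: Column, slots: List[Column],
              check_nullable: bool = True) -> Optional[Column]:
    """Return the first slot that matches the schema column.

    With check_nullable=False only name and type are compared, which is
    the behavior described as buggy in the commit message.
    """
    for slot in slots:
        if slot.name != column.name or slot.type != column.type:
            continue
        if check_nullable and slot.nullable != column.nullable:
            continue
        return slot
    return None


# Slots present during double writing, as listed above.
slots = [
    Column("k1", "varchar", False),
    Column("v1", "int", False),
    Column("v1", "int", True),
]
new_v1 = Column("v1", "int", True)  # v1 in the new tablet schema

wrong = find_slot(new_v1, slots, check_nullable=False)  # picks the NOT NULL slot
right = find_slot(new_v1, slots)                        # picks the nullable slot
```

Comparing the nullable attribute as well makes the lookup unambiguous when both versions of `v1` are present.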

2024-10-24 16:49:46 +08:00
78b6157aa9 [fix](ip/variant) fix information meta (#41871)
fix datatype information meta for ip/variant (#41666)

2024-10-15 18:01:14 +08:00
90d6985f91 [Fix](bug) Is null predicate get error query result (#41704)
cherry-pick #41668
2024-10-12 13:18:14 +08:00
34429bfa0e [Chore](inverted index) remove useless code of compound filters for inverted index #40258 (#41448)
cherry pick from #40258
2024-09-29 17:27:29 +08:00
0b4552f74b [cherry-pick](branch-2.1) pick hive text write from master (#40537)
## Proposed changes
pick prs:
https://github.com/apache/doris/pull/38549
https://github.com/apache/doris/pull/40183
https://github.com/apache/doris/pull/40315

---------

Co-authored-by: Calvin Kirs <kirs@apache.org>
2024-09-27 20:57:07 +08:00
eb13cd4154 [branch-2.1] Picks "[Fix](partial update) Fix __DORIS_SEQUENCE_COL__ is not set for newly inserted rows in partial update #40272" (#40964)
picks https://github.com/apache/doris/pull/40272
2024-09-26 22:54:27 +08:00
c6a6adb3a4 [Fix](topn) avoid mismatched row count when upgrading (#40999)
#41000
2024-09-21 08:46:57 +08:00
8e860a26a7 [fix](systable) fix unstable case for partitions table (#40553) (#41043)
bp #40553
2024-09-20 17:13:30 +08:00
e0fac66223 [branch-2.1](fix) fix snappy decompressor bug (#40862)
## Proposed changes
Hadoop SnappyCodec source:

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/SnappyCodec.cc
Example:
OriginData (the original data is divided into several large data
blocks):
     large data block1 | large data block2 | large data block3 | ....
Each large data block is divided into several small data blocks.
Suppose a large data block is divided into three small blocks:
large data block1: | small block1 | small block2 | small block3 |
CompressData: <A [B1 compress(small block1)] [B2 compress(small block2)]
[B3 compress(small block3)]>

A: original (uncompressed) length of the current large data block.
sizeof(A) = 4 bytes.
A = length(small block1) + length(small block2) + length(small block3)
Bx: length of the compressed small data block x.
sizeof(Bx) = 4 bytes.
Bx = length(compress(small block x))
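
The layout above can be walked with a short decoder. This is a minimal sketch under stated assumptions, not the Doris decompressor: the 4-byte length fields are assumed to be big-endian, `encode_block_stream` and `decode_block_stream` are hypothetical helper names, and `zlib` stands in for snappy only because it ships with Python.

```python
import struct
import zlib
from typing import Callable, Iterable


def encode_block_stream(large_blocks: Iterable[bytes], small_size: int,
                        compress: Callable[[bytes], bytes]) -> bytes:
    """Build <A [B1 c1] [B2 c2] ...> records, one per large block."""
    out = bytearray()
    for block in large_blocks:
        # A: uncompressed length of the whole large block.
        out += struct.pack(">I", len(block))
        for i in range(0, len(block), small_size):
            comp = compress(block[i:i + small_size])
            # Bx: compressed length of this small block, then its bytes.
            out += struct.pack(">I", len(comp)) + comp
    return bytes(out)


def decode_block_stream(data: bytes,
                        decompress: Callable[[bytes], bytes]) -> bytes:
    """Reverse encode_block_stream and return the original bytes."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        (large_len,) = struct.unpack_from(">I", data, pos)  # A
        pos += 4
        produced = 0
        # Keep reading small blocks until A uncompressed bytes are produced.
        while produced < large_len:
            (small_len,) = struct.unpack_from(">I", data, pos)  # Bx
            pos += 4
            chunk = decompress(data[pos:pos + small_len])
            pos += small_len
            produced += len(chunk)
            out += chunk
        if produced != large_len:
            raise ValueError("small blocks do not add up to A")
    return bytes(out)


blocks = [b"a" * 10, b"hello world"]
encoded = encode_block_stream(blocks, small_size=4, compress=zlib.compress)
decoded = decode_block_stream(encoded, zlib.decompress)
```

The key point is that a single A header can be followed by several Bx records, so the decoder has to loop until A bytes have been produced instead of expecting exactly one compressed chunk per header.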
2024-09-20 11:57:14 +08:00
b8bc9b699c [fix](scan) Incorrect scan keys lead to wrong query results. (#40814) (#40971)
## Proposed changes
pick #40814
```
mysql [doris_14555]>select * from table_9436528_3;
+------+------+------+------+------------------------+--------------------+------+
| col1 | col2 | col3 | col5 | col4                   | col6               | col7 |
+------+------+------+------+------------------------+--------------------+------+
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.904282 | NULL |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |   23423423.0324234 | NULL |
| -100 |    0 |  -82 |    0 | 2023-11-11 10:49:43.00 |   840968969.872149 | NULL |
```
wrong result:
```
mysql [doris_14555]>select * from table_9436528_3 where col1 <= -100 and col2 in (true, false) and col3 = -82;
+------+------+------+------+------------------------+--------------------+------+
| col1 | col2 | col3 | col5 | col4                   | col6               | col7 |
+------+------+------+------+------------------------+--------------------+------+
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.904282 | NULL |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |   23423423.0324234 | NULL |
+------+------+------+------+------------------------+--------------------+------+
```

2024-09-19 22:01:02 +08:00
b52b572ade [branch-2.1](memory) When Load ends, check memory tracker value returns is equal to 0 (#40850)
pick
#38960
#39908
#40043
#40092
#40016
#40439

---------

Co-authored-by: hui lai <1353307710@qq.com>
Co-authored-by: yiguolei <676222867@qq.com>
2024-09-15 23:47:53 +08:00
7851563829 [fix](brpc_client_cache) resolve hostname in DNS cache before passing to brpc (#40074) (#40786)
backport #40074
2024-09-13 14:28:01 +08:00
3604d63184 [Branch 2.1] backport systable PR (#34384,#40153,#40456,#40455,#40568) (#40687)
backport
https://github.com/apache/doris/pull/40568
https://github.com/apache/doris/pull/40455
https://github.com/apache/doris/pull/40456
https://github.com/apache/doris/pull/40153
https://github.com/apache/doris/pull/34384

Test result:
2024-09-11 11:00:45.618 INFO [suite-thread-1] (SuiteContext.groovy:309)
- Recover original connection
2024-09-11 11:00:45.619 INFO [suite-thread-1] (Suite.groovy:359) -
Execute sql: REVOKE SELECT_PRIV ON
test_partitions_schema_db.duplicate_table FROM partitions_user
2024-09-11 11:00:45.625 INFO [suite-thread-1] (SuiteContext.groovy:299)
- Create new connection for user 'partitions_user'
2024-09-11 11:00:45.632 INFO [suite-thread-1] (Suite.groovy:1162) -
Execute tag: select_check_5, sql: select
TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,PARTITION_NAME,SUBPARTITION_NAME,PARTITION_ORDINAL_POSITION,SUBPARTITION_ORDINAL_POSITION,PARTITION_METHOD,SUBPARTITION_METHOD,PARTITION_EXPRESSION,SUBPARTITION_EXPRESSION,PARTITION_DESCRIPTION,TABLE_ROWS,AVG_ROW_LENGTH,DATA_LENGTH,MAX_DATA_LENGTH,INDEX_LENGTH,DATA_FREE,CHECKSUM,PARTITION_COMMENT,NODEGROUP,TABLESPACE_NAME
from information_schema.partitions where
table_schema="test_partitions_schema_db" order by
TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,PARTITION_NAME,SUBPARTITION_NAME,PARTITION_ORDINAL_POSITION,SUBPARTITION_ORDINAL_POSITION,PARTITION_METHOD,SUBPARTITION_METHOD,PARTITION_EXPRESSION,SUBPARTITION_EXPRESSION,PARTITION_DESCRIPTION,TABLE_ROWS,AVG_ROW_LENGTH,DATA_LENGTH,MAX_DATA_LENGTH,INDEX_LENGTH,DATA_FREE,CHECKSUM,PARTITION_COMMENT,NODEGROUP,TABLESPACE_NAME
2024-09-11 11:00:45.644 INFO [suite-thread-1] (SuiteContext.groovy:309)
- Recover original connection
2024-09-11 11:00:45.645 INFO [suite-thread-1] (ScriptContext.groovy:120)
- Run test_partitions_schema in
/root/doris/workspace/doris/regression-test/suites/query_p0/system/test_partitions_schema.groovy
succeed
2024-09-11 11:00:45.652 INFO [main] (RegressionTest.groovy:259) - Start
to run single scripts
2024-09-11 11:01:10.321 INFO [main] (RegressionTest.groovy:380) -
Success suites:

/root/doris/workspace/doris/regression-test/suites/query_p0/system/test_partitions_schema.groovy:
group=default,p0, name=test_partitions_schema
2024-09-11 11:01:10.322 INFO [main] (RegressionTest.groovy:459) - All
suites success.
 ____   _    ____ ____  _____ ____
|  _ \ / \  / ___/ ___|| ____|  _ \
| |_) / _ \ \___ \___ \|  _| | | | |
|  __/ ___ \ ___) |__) | |___| |_| |
|_| /_/   \_\____/____/|_____|____/

2024-09-11 11:01:10.322 INFO [main] (RegressionTest.groovy:410) - Test 1
suites, failed 0 suites, fatal 0 scripts, skipped 0 scripts
2024-09-11 11:01:10.322 INFO [main] (RegressionTest.groovy:119) - Test
finished


2024-09-11 11:03:00.712 INFO [suite-thread-1] (Suite.groovy:1162) -
Execute tag: select_check_5, sql: select * from
information_schema.table_options ORDER BY
TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,TABLE_MODEL,TABLE_MODEL_KEY,DISTRIBUTE_KEY,DISTRIBUTE_TYPE,BUCKETS_NUM,PARTITION_NUM;
2024-09-11 11:03:00.729 INFO [suite-thread-1] (SuiteContext.groovy:309)
- Recover original connection
2024-09-11 11:03:00.731 INFO [suite-thread-1] (ScriptContext.groovy:120)
- Run test_table_options in
/root/doris/workspace/doris/regression-test/suites/query_p0/system/test_table_options.groovy
succeed
2024-09-11 11:03:04.817 INFO [main] (RegressionTest.groovy:259) - Start
to run single scripts
2024-09-11 11:03:28.741 INFO [main] (RegressionTest.groovy:380) -
Success suites:

/root/doris/workspace/doris/regression-test/suites/query_p0/system/test_table_options.groovy:
group=default,p0, name=test_table_options
2024-09-11 11:03:28.742 INFO [main] (RegressionTest.groovy:459) - All
suites success.
 ____   _    ____ ____  _____ ____
|  _ \ / \  / ___/ ___|| ____|  _ \
| |_) / _ \ \___ \___ \|  _| | | | |
|  __/ ___ \ ___) |__) | |___| |_| |
|_| /_/   \_\____/____/|_____|____/

2024-09-11 11:03:28.742 INFO [main] (RegressionTest.groovy:410) - Test 1
suites, failed 0 suites, fatal 0 scripts, skipped 0 scripts
2024-09-11 11:03:28.742 INFO [main] (RegressionTest.groovy:119) - Test
finished


*************************** 7. row ***************************
             PartitionId: 18035
           PartitionName: p100
          VisibleVersion: 2
      VisibleVersionTime: 2024-09-11 10:59:28
                   State: NORMAL
            PartitionKey: col_1
                   Range: [types: [INT]; keys: [83647]; ..types: [INT]; keys: [2147483647]; )
         DistributionKey: pk
                 Buckets: 10
          ReplicationNum: 1
           StorageMedium: HDD
            CooldownTime: 9999-12-31 15:59:59
     RemoteStoragePolicy: 
LastConsistencyCheckTime: NULL
                DataSize: 2.872 KB
              IsInMemory: false
       ReplicaAllocation: tag.location.default: 1
               IsMutable: true
      SyncWithBaseTables: true
            UnsyncTables: NULL
        CommittedVersion: 2
                RowCount: 4
7 rows in set (0.01 sec)

---------

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2024-09-12 11:50:09 +08:00
8708fae420 [fix](ES Catalog)Support parse single value for array column (#40614) (#40660)
bp #40614
2024-09-11 17:26:48 +08:00
314f6ae823 [fix](ES Catalog)Fix int parse error when querying by doc_values (#40385) (#40521)
bp #40385
2024-09-09 14:29:21 +08:00
92752b90e7 [feature](metacache) add system table catalog_meta_cache_statistics #40155 (#40210)
bp #40155
2024-09-02 23:23:35 +08:00
ca07a00c93 Revert "[branch-2.1](hive) support hive write text table (#38549) (#40063)" (#40157)

This reverts commit c6df7c21a3c09ae1664deabacb88dfcea9d94b68.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-08-30 10:25:38 +08:00
c6df7c21a3 [branch-2.1](hive) support hive write text table (#38549) (#40063)
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd

pick from https://github.com/apache/doris/pull/38549
2024-08-29 16:50:40 +08:00
131238ff71 [fix](file-cache) change metric_value column in file_cache_statistics table to string (#40083)
Make it more flexible
followup #39552
2024-08-29 16:39:22 +08:00
173aafc86f [Enhancement] add information_schema.table_properties #38745 (#38746) (#39886)
bp #38746

---------

Co-authored-by: Vallish Pai <vallishpai@gmail.com>
2024-08-27 17:22:19 +08:00
6ceb574aa0 [branch-2.1]Pick IO limit/workload group usage table (#39839) 2024-08-23 18:51:47 +08:00
a55e109e97 [pick][Improvement]Add schema table workload_group_privileges (#38436) (#39708)
pick #38436
2024-08-22 00:44:43 +08:00
0bfcee1251 [opt](file-cache) support system table file_cache_statistics (#39552)
1. Add new system table: `file_cache_statistics`

	This table is used for viewing metrics related to file cache on BE side

	```
	mysql> select * from information_schema.file_cache_statistics limit 10;
	+-------+---------------+----------------------------+--------------------------------+--------------------+
	| BE_ID | BE_IP         | CACHE_PATH                 | METRIC_NAME                    | METRIC_VALUE       |
	+-------+---------------+----------------------------+--------------------------------+--------------------+
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_curr_elements | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_curr_size     | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_max_elements  | 102400             |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_max_size      | 21474836480        |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio                     | 0.8539634687001242 |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio_1h                  | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio_5m                  | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_curr_elements      | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_curr_size          | 0                  |
	| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_max_elements       | 102400             |
	+-------+---------------+----------------------------+--------------------------------+--------------------+
	```

	It will show metrics of file caches on each BE.

2. Add new metrics `hits_ratio_1h` and `hits_ratio_5m` for file cache

These two metrics show the file cache hit ratio over the most recent
1 hour and 5 minutes, so that we can see the recent hit ratio rather
than only the global historical hit ratio.
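
One way to compute a recent hit ratio of this kind is a sliding time window. The sketch below only illustrates the idea and is not how the BE computes `hits_ratio_1h` or `hits_ratio_5m`: the class `WindowedHitRatio` and its methods are hypothetical, and timestamps are passed in explicitly as seconds.

```python
from collections import deque


class WindowedHitRatio:
    """Hit ratio over the last `window_seconds` of recorded accesses."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events = deque()  # (timestamp, hit) pairs, oldest first
        self.hits = 0          # number of hits currently inside the window

    def _evict(self, now: float) -> None:
        # Drop accesses that fell out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, was_hit = self.events.popleft()
            self.hits -= int(was_hit)

    def record(self, now: float, hit: bool) -> None:
        self.events.append((now, hit))
        self.hits += int(hit)
        self._evict(now)

    def ratio(self, now: float) -> float:
        self._evict(now)
        # Report 0.0 when there was no access inside the window.
        return self.hits / len(self.events) if self.events else 0.0


five_min = WindowedHitRatio(300)
five_min.record(0, True)
five_min.record(10, False)
early = five_min.ratio(10)    # one hit out of two accesses
five_min.record(400, True)    # the two earlier accesses are now too old
later = five_min.ratio(400)
idle = five_min.ratio(1000)   # nothing recorded in the last 5 minutes
```

Unlike a global counter, old accesses stop influencing the result once they leave the window, which is what makes a short-window ratio useful for spotting recent changes.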
2024-08-21 10:03:39 +08:00
43cc8d648d [fix](ES Catalog)Check isArray before parse json to array (#39104) (#39273)
## Proposed changes

bp #39104
2024-08-13 15:13:40 +08:00
fc0222a64c [opt](info) processlist schema table support show all fe (#38701) (#38953)
pick #38701
2024-08-07 11:01:46 +08:00
9d23ccf1f2 [Improvement](schema scan) Use async scanner for schema scanners (#38403) (#38666)
2024-08-01 16:05:24 +08:00
017dad8c54 [fix](type)support runtime predicate for time type (#38258) (#38465)
## Proposed changes
https://github.com/apache/doris/pull/38258
2024-07-31 10:27:36 +08:00
e2bb86e7f8 [fix](inverted index) fixed in_list condition not indexed on pipelinex (#38178)
## Proposed changes

https://github.com/apache/doris/pull/36565
https://github.com/apache/doris/pull/37842
https://github.com/apache/doris/pull/37921
https://github.com/apache/doris/pull/37386

2024-07-25 14:42:34 +08:00
10c5c336d8 [branch-2.1](arrow-flight-sql) Add config arrow_flight_result_sink_buffer_size_rows (#38223)
pick #38221
2024-07-24 15:15:39 +08:00
e5339a4014 [feature](ES Catalog)Support control scroll level by config #37180 (#37290)
## Proposed changes

backport #37180
2024-07-15 16:41:38 +08:00
f8cee439b6 [feature](ES Catalog) map nested/object type in ES to JSON type in Doris (#37101) (#37182)
backport #37101
2024-07-05 10:48:32 +08:00
02fad48870 [Fix](upgrade) Fix fields not handled correctly during upgrade and downgrade (#36691)
master version is #36690
2024-06-22 14:23:04 +08:00
445d42a57d [fix](topn-opt) remove redundant check for fetch phase (#36676)
#36629
2024-06-21 22:28:38 +08:00
bd47d5a681 [branch-2.1](auto-partition) Fix auto partition load failure in multi replica (#36586)
This PR:
1. picks #35630, which was previously reverted by #36098.
2. picks #36344 from master.

These two PRs fix existing bugs in auto partition load.

---------

Co-authored-by: Kaijie Chen <ckj@apache.org>
2024-06-20 17:51:18 +08:00
Pxl
dda25cceb6 [Bug](information-schema) fix some bug of information_schema.PROCESSLIST (#36447)
## Proposed changes
pick from #36409
2024-06-18 16:45:48 +08:00
3b23eee37c Revert "[fix](auto-partition) fix auto partition load lost data in multi sender (#35287)" (#36098)
Reverts apache/doris#35630 because it introduced more damaging bugs.
We will fix it and merge it in the next version.
2024-06-11 17:11:42 +08:00
75a6f28f2e [cherry-pick]Add query type when report (#35918)
pick #34978
2024-06-11 10:51:59 +08:00
b5a35b9cef [FIX] Pick array inverted index bugfix (#35837)
Picks several bug fixes for arrays with inverted indexes.
See also:
https://github.com/apache/doris/pull/34766
https://github.com/apache/doris/pull/35086
https://github.com/apache/doris/pull/34683
https://github.com/apache/doris/pull/34076
2024-06-06 09:54:14 +08:00
fe1a4c4136 [Feature](IP) support ipv4/ipv6 with inverted index and conjuncts for query (#35734)
Support the ipv4/ipv6 data types with inverted indexes, so that conjunct
expressions on IP columns such as `>`, `<`, `>=`, `<=`, and `in`/`not in`
can be sped up by the inverted index.
2024-06-03 23:24:03 +08:00
c2fc485327 [fix](auto-partition) fix auto partition load lost data in multi sender (#35287) (#35630)
## Proposed changes

Change the `use_cnt` mechanism for incremental (auto partition) channels
and streams; it is now counted dynamically.
Use `close_wait()` of regular partitions as a synchronization point to
make sure all sinks are in the close phase before closing any incremental
(auto partition) channels and streams.
Add a dummy (fake) partition and tablet if there is no regular partition
in the auto partition table.

Backport #35287

Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2024-05-31 10:27:03 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00