Commit Graph

127 Commits

Author SHA1 Message Date
3358f76a7f [feature](spill) Implement spill to disk for hash join, aggregation and sort for pipelineX (#31910)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
2024-03-12 14:12:09 +08:00
538032a75c [fix](partition) add log when tablet partition id eq 0 (#31796) 2024-03-07 16:11:25 +08:00
65d45daf8a [Bug](coredump) fix regresstion test coredump in multi thread access map (#31664) 2024-03-03 19:30:55 +08:00
e7de2ba0ac [refactor](raw ptr) disable some raw pointer usage and some unused code (#31595)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-02-29 19:51:18 +08:00
5b343911e8 [log](gc) add log for unused rowsets gc (#30854) 2024-02-16 10:12:23 +08:00
6eba030897 [fix](chore) path gc should consider tablet migration (#30095) (#30548)
Background:

- Migration creates a new tablet in a different DataDir; the old tablet is moved to TabletManager::_shutdown_tablets.
- The migration task does not copy data in stale rowsets to the new tablet, so after migration the new tablet does not contain the stale rowsets of the old tablet.
- The path GC process checks every path to determine whether it belongs to a useless tablet or a useless rowset. If it does, the data of that tablet or rowset is removed.

The issue:

- When path GC gets a stale rowset path from the data dir of the old tablet, it extracts the tablet id and rowset id.
- It then checks whether the tablet id exists in TabletManager, and the answer is YES!
- It gets the tablet instance, which is the new tablet, then checks whether the stale rowset id from the old tablet path exists in the new tablet instance, and gets the answer NO.
- The path GC process treats the rowset as useless, since it cannot find anyone holding a reference to it, and deletes the data of this stale rowset.
- But some queries may still hold a reference to this stale rowset, so the deletion causes query failures.

Solution:

- The lifecycle of all rowsets in a shutdown tablet should be tied to the lifecycle of that tablet.
- We need to differentiate the old tablet from the new one created by the migration task while performing path GC.
2024-01-30 12:03:21 +08:00
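
The failure mode and the fix described in #30548 can be sketched roughly as follows. This is a minimal Python model, not the Doris C++ code: `Tablet`, `TabletManager`, `migrate`, `is_garbage_naive` and `is_garbage_fixed` are hypothetical names introduced for illustration, and only the idea of a shutdown-tablet list comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Tablet:
    tablet_id: int
    data_dir: str                                   # DataDir the tablet lives in
    rowset_ids: set = field(default_factory=set)


@dataclass
class TabletManager:
    live: dict = field(default_factory=dict)               # tablet_id -> Tablet
    shutdown_tablets: list = field(default_factory=list)   # models _shutdown_tablets

    def migrate(self, tablet_id, new_dir, stale):
        """Move a tablet to new_dir; stale rowsets are not copied over."""
        old = self.live[tablet_id]
        self.shutdown_tablets.append(old)
        self.live[tablet_id] = Tablet(tablet_id, new_dir, old.rowset_ids - stale)


def is_garbage_naive(mgr, data_dir, tablet_id, rowset_id):
    """Pre-fix behaviour: look the tablet up by id only (data_dir is ignored)."""
    tablet = mgr.live.get(tablet_id)
    if tablet is None:
        return True
    return rowset_id not in tablet.rowset_ids


def is_garbage_fixed(mgr, data_dir, tablet_id, rowset_id):
    """Assumed fix: a path in the old DataDir belongs to the shutdown tablet."""
    for old in mgr.shutdown_tablets:
        if old.tablet_id == tablet_id and old.data_dir == data_dir:
            return False  # rowsets live and die with the shutdown tablet
    tablet = mgr.live.get(tablet_id)
    if tablet is None or tablet.data_dir != data_dir:
        return True
    return rowset_id not in tablet.rowset_ids


# Tablet 1 migrates from /disk1 to /disk2 and leaves a stale rowset behind.
mgr = TabletManager(live={1: Tablet(1, "/disk1", {"rs_a", "rs_stale"})})
mgr.migrate(1, "/disk2", stale={"rs_stale"})
naive_verdict = is_garbage_naive(mgr, "/disk1", 1, "rs_stale")   # deletes too early
fixed_verdict = is_garbage_fixed(mgr, "/disk1", 1, "rs_stale")   # keeps the rowset
```

In this model the naive check reports the stale rowset as garbage because it only consults the new tablet, while the fixed check first matches the path against shutdown tablets in the same data dir.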
de3fdc7d08 [chore](Fix) Fix uninitilized buffer in read_cluster_id() (#29949) 2024-01-14 15:56:19 +08:00
b0cac0014d [enhance](FS) Improve FS error code (#29432) 2024-01-06 21:17:22 +08:00
f40cce1406 [Fix](partition) Skip rowset partition id eq 0 smaller than config wh… (#29510) 2024-01-05 19:39:51 +08:00
d8ad6ebff2 [enhancement](disk) log disk path when creating tablet (#29464) 2024-01-04 20:36:37 +08:00
329d20df3b [fix](regression) spare .testfile to make disk checker happy when injecting fault (#29477)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2024-01-04 15:09:57 +08:00
4581618b09 [improvement](disk) pick disk randomly when usage is less than 0.7 (#29368) 2024-01-02 14:08:09 +08:00
3661c316c9 Revert "[improvement](create tablet) backend create tablet round robin among disks (#23218)" (#29347)
This reverts commit df5b5ae0cb2f30f026ec104a64b4d9a5ce2904f3.
2023-12-31 12:51:21 +08:00
cefae3dc90 [bug](storage) Fix gc rowset bug (#28979) 2023-12-26 00:29:03 +08:00
aab859be56 [enhance](partition_id) check partition id before store meta (#28055) 2023-12-19 21:31:41 +08:00
c26f5a2bd2 [improvement](BE) Remove unnecessary error handling codes (#26760) 2023-11-12 00:02:51 +08:00
d767804815 [feature](merge-cloud) Decouple rowset id generator and local rowsets gc implementation (#25921) 2023-11-10 10:07:02 +08:00
eb9ba59996 [improvement](show trash) Fix be restart slow when too many trash files (#26147) 2023-11-01 17:43:24 +08:00
411fae951b [fix](trash core) fix get trash directory core when stop be (#25428) (#25829) 2023-10-25 11:05:25 +08:00
Pxl
2e2d5bcba2 [Improvements](status) catch some error status (#25677)
catch some error status
2023-10-23 10:19:08 +08:00
9c9fc84f39 [feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929) 2023-10-18 20:29:04 +08:00
e9157a3dba [fix](path gc) fix data dir path gc (#25420) 2023-10-16 20:25:20 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
8eb14eec7c [enhancement](baddisk) record bad disk in be_custom.conf to handle (#24639) 2023-09-21 18:31:58 +08:00
df5b5ae0cb [improvement](create tablet) backend create tablet round robin among disks (#23218)
The backend chooses a disk by its available bytes and tablet count. If both are equal, it round-robins among the tied disks.
2023-09-15 11:39:43 +08:00
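
One way to read the selection rule in #23218 is sketched below. The commit message does not say how the two criteria are ordered, so this sketch assumes "most available bytes first, then fewest tablets"; `Disk` and `DiskPicker` are hypothetical names, not Doris classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    path: str
    available_bytes: int
    tablet_num: int


class DiskPicker:
    def __init__(self):
        self._next = 0  # round-robin cursor shared across calls

    def pick(self, disks):
        # Assumed ordering: more free space wins, then fewer tablets.
        def key(d):
            return (-d.available_bytes, d.tablet_num)

        best = min(key(d) for d in disks)
        tied = [d for d in disks if key(d) == best]
        # Disks that tie on both criteria are served in turn.
        chosen = tied[self._next % len(tied)]
        self._next += 1
        return chosen


disks = [Disk("/d1", 100, 5), Disk("/d2", 100, 5), Disk("/d3", 50, 1)]
picker = DiskPicker()
picks = [picker.pick(disks).path for _ in range(4)]
```

Here `/d1` and `/d2` tie on both criteria, so successive calls alternate between them and `/d3` is never chosen.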
d20365cdcf [fix](transaction) fix publish txn fake succ (#24273) 2023-09-14 21:04:59 +08:00
d8ef9dda59 [feature](merge-cloud) Rewrite FS interface (#23953) 2023-09-12 19:20:25 +08:00
acbd8ca185 [improvement](show backends) show backends print trash used (#23792) 2023-09-03 20:30:58 +08:00
1ac0ff0ea9 [feature](delete-predicate) support delete sub predicate v2 (#22442)
New structure for delete sub predicate.
The delete sub predicate is currently stored temporarily as a string, condition_str, and its fields are extracted from it using std::regex, which may cause a stack overflow when matching an extremely large string (a bug in libc).

Now we attempt to use a new PB structure to hold the delete sub predicate, to avoid that problem.

message DeleteSubPredicatePB {
    optional int32 column_unique_id = 1;
    optional string column_name = 2;
    optional string op = 3;
    optional string cond_value = 4;
}
Currently, both versions of the sub predicate are filled. Queries use v2, while compaction still uses v1. Old rowset meta whose delete predicates carry v1 sub predicates is converted to v2, where possible, when read from PB. Moreover, efforts will be made to rewrite this meta with the new delete sub predicate.

Prepare to use the column unique id to identify a column globally.
Using the column unique id rather than the column name to identify a column is vital for flexible schema change. The rewritten delete predicate will attach column unique id.
2023-08-29 19:37:23 +08:00
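
The v1-to-v2 conversion described in #22442 might look roughly like the sketch below. The exact layout of `condition_str` is not given in the commit message, so the `name<op>value` form, the operator list, and the `unique_ids` lookup table are all assumptions made for illustration; only the four fields mirror `DeleteSubPredicatePB`. The sketch splits the string with plain `str.find` instead of a regex.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed operator set; the real set is not listed in the commit message.
_OPS = ("<=", ">=", "!=", "=", "<", ">")


@dataclass
class DeleteSubPredicate:
    """Mirrors the fields of DeleteSubPredicatePB."""
    column_unique_id: Optional[int] = None
    column_name: Optional[str] = None
    op: Optional[str] = None
    cond_value: Optional[str] = None


def v1_to_v2(condition_str, unique_ids):
    """Convert an assumed 'name<op>value' v1 string into the v2 structure."""
    # Earliest operator wins; on a tie the longer one does (">=" over ">").
    hits = [(condition_str.find(op), -len(op), op)
            for op in _OPS if op in condition_str]
    if not hits:
        raise ValueError(f"no operator found in {condition_str!r}")
    pos, _, op = min(hits)
    name = condition_str[:pos].strip()
    value = condition_str[pos + len(op):].strip()
    # unique_ids is a hypothetical column-name -> unique-id mapping.
    return DeleteSubPredicate(unique_ids.get(name), name, op, value)


pred = v1_to_v2("k1>=10", {"k1": 3})
```

Carrying `column_unique_id` alongside `column_name` is what lets a rewritten predicate keep pointing at the same column after a rename.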
da9eb79ac4 [Enhancement](Schema hash) Remove schema hash in tablet info (#23516) 2023-08-29 10:05:12 +08:00
d4694167a8 [Enhancement](chore) Some Status relevant enhancement (#23072) 2023-08-21 14:14:38 +08:00
f0d08da97c [enhancement](merge-on-write) split delete bitmap from tablet meta (#21456) 2023-07-12 19:13:36 +08:00
Pxl
ca71048f7f [Chore](status) avoid empty error msg on status (#21454)
avoid empty error msg on status
2023-07-11 13:48:16 +08:00
2678afd2db [fix][improvement](fs) add HdfsIO profile and modification time (#21638)
- Refactor the interface of create_file_reader
  - file_size and mtime are merged into FileDescription and are no longer in FileReaderOptions.
  - The file handle cache can now get the file's correct modification time from FileDescription.
- Add HdfsIO for the hdfs file reader
  - Picked from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442
2023-07-08 14:49:44 +08:00
691a988c97 [enhancement](merge-on-write) add async publish task when version is discontinuous for merge on write table when clone (#21025)
Version discontinuity may occur during clone. To handle this case, add an async publish task when the version is discontinuous.
2023-06-22 21:50:14 +08:00
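
The idea in #21025 can be illustrated with a simplified model: a publish whose version does not directly follow the tablet's current maximum is deferred until the gap closes. `TabletPublisher` and its fields are hypothetical, and the synchronous draining shown here merely stands in for the async task the commit adds.

```python
class TabletPublisher:
    def __init__(self, max_version):
        self.max_version = max_version  # highest continuous published version
        self.pending = {}               # version -> txn id waiting for the gap
        self.applied = []               # (version, txn id) in publish order

    def publish(self, version, txn_id):
        """Return True if the version is published now, False if deferred."""
        if version <= self.max_version:
            return True  # already covered, nothing to do
        if version != self.max_version + 1:
            # Discontinuous: park it instead of failing the publish.
            self.pending[version] = txn_id
            return False
        self._apply(version, txn_id)
        # The gap may now be closed for previously deferred versions.
        while self.max_version + 1 in self.pending:
            nxt = self.max_version + 1
            self._apply(nxt, self.pending.pop(nxt))
        return True

    def _apply(self, version, txn_id):
        self.applied.append((version, txn_id))
        self.max_version = version


p = TabletPublisher(max_version=5)
deferred = p.publish(7, "t7")   # version 6 is missing, so 7 is parked
filled = p.publish(6, "t6")     # fills the gap and drains 7
```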
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
161678380c [bug](GC)the issue of incorrect disk usage (#18397) 2023-04-08 09:32:36 +08:00
e848e456be [config] modify tablet_shard to 4 and add some log (#18416)
Modify the default value of the BE config tablet_map_shard_size to 4 to reduce lock contention.
Add a log when writing the disk test file fails, for debugging.
2023-04-06 17:18:16 +08:00
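
Why a larger `tablet_map_shard_size` reduces lock contention can be seen in a small sketch: each shard has its own lock, so operations on tablets in different shards do not block each other. `ShardedTabletMap` is a hypothetical class, and picking the shard by `tablet_id % shard_size` is an assumption rather than the BE's actual mapping.

```python
import threading


class ShardedTabletMap:
    def __init__(self, shard_size=4):
        # One dict and one lock per shard.
        self._shards = [({}, threading.Lock()) for _ in range(shard_size)]

    def _shard(self, tablet_id):
        # Assumed mapping from tablet id to shard.
        return self._shards[tablet_id % len(self._shards)]

    def put(self, tablet_id, tablet):
        table, lock = self._shard(tablet_id)
        with lock:  # only this shard is locked
            table[tablet_id] = tablet

    def get(self, tablet_id):
        table, lock = self._shard(tablet_id)
        with lock:
            return table.get(tablet_id)


tablets = ShardedTabletMap(shard_size=4)
tablets.put(10, "tablet-10")
tablets.put(11, "tablet-11")  # lands in a different shard than tablet 10
```

With a shard size of 1 every access would take the same lock; with 4, accesses spread across four independent locks.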
05db6e9b55 [refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009)
Follows #17586.
This PR mainly:

- Removes env/
- Removes FileUtils/FilesystemUtils; some methods are moved to LocalFileSystem
- Removes olap/file_cache
- Adds an s3 client cache for the s3 file system; in my test, the time to open an s3 file is reduced significantly
- Fixes a cold/hot separation bug for s3 fs

This is the last PR of #17764.
After this, all IO operations should be in io/fs.

In addition to the tests in #17586, I also tested some cases related to fs io:

- clone
- concurrent queries on local/s3/hdfs
- load error log creation and cleanup
- disk metrics
2023-03-29 09:00:52 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not tested:
- cold & hot data separation case.
2023-03-21 21:08:38 +08:00
7754619e2b [fix](quit) be can not quit cleanly due to deadlock (#17971) 2023-03-21 12:52:48 +08:00
6eeba204f9 [Enhancement] path scan causes disk io to skyrocket (#16968) 2023-02-25 09:15:15 +08:00
5014ad03e7 [feature](cooldown) Auto delete unused remote files (#16588) 2023-02-13 23:59:39 +08:00
bd8ef4edeb [fix](cooldown) Fix core in remove_all_remote_rowsets (#16374) 2023-02-04 22:31:38 +08:00
00a598a839 [feature](cooldown) Decouple storage policy and resource (#15873) 2023-01-31 14:13:47 +08:00
1489e3cfbf [Fix](file system) Make the constructor of XxxFileSystem a private method (#15889)
Since FileSystem inherits from std::enable_shared_from_this, it is dangerous to create a native (raw) pointer to a FileSystem.
To avoid this, make the constructor of XxxFileSystem private and use the static method create(...) to get a new FileSystem object.
2023-01-13 15:32:16 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
30175010c7 Fix nullptr in perform_remote_tablet_gc (#11820) 2022-08-16 16:50:21 +08:00
b35daf0a04 [improvement](light-schema-change) Support tablet schema cache (#11131) 2022-08-01 12:18:00 +08:00