Because some code has recently been moved to a new repository, the documentation for release and verification
needs to be reorganized. There are 5 relevant documents, as follows.
1. release-prepare.md
General instructions for the release and related preparation work.
2. release-doris-core.md
The Doris Core release process
3. release-doris-connectors.md
The Doris Connectors release process
4. release-complete.md
Steps to complete the release after the vote has passed.
5. release-verify.md
Release verification methods.
Add a scalable regression testing framework (#7584). It contains:
- A test framework written in Groovy, with support for a built-in **readable DSL** named `Action`
- A demo in `${DORIS_HOME}/regression-test/data/demo`
- Chinese documentation in `${DORIS_HOME}/docs/zh-CN/developer-guide/regression-testing.md`
The English documentation is coming soon.
As described in #8120, a large number of rowset metas remain in RocksDB. They may be generated by:
1. drop tablet
The drop tablet task itself just sets the state of the tablet meta to `SHUTDOWN`
and moves the tablet to the `_shutdown_tablets` vector; a background thread then
periodically cleans up the tablets in `_shutdown_tablets` (that is why, even if we execute
`drop table xx force`, the tablet may take 10 minutes to 1 hour before it goes into the trash directory).
When deleting a tablet, the regular background cleanup thread saves the complete tablet meta as a `.hdr` file
and then moves it to the trash directory along with the data files.
But this process does not handle the rowset metas (before the tablet meta checkpoint,
each rowset meta is stored independently in RocksDB as a key-value pair), so residual rowset metas are left behind.
2. clone task
Clone tasks may migrate a tablet back and forth between BEs, which can result in a situation
where the tablet id on a BE is the same but the tablet uid is different.
Some rowset metas then cannot find their corresponding tablet, and since no thread
processes these rowsets, they are eventually left behind as residue.
In this PR, I handle the residual rowset metas in the regular cleanup thread with the method `_clean_unused_rowset_metas()`.
I did not delete the rowset metas as part of the drop tablet task, because drop tablet itself is not a synchronous operation;
it also relies on a background thread to clean up the tablet periodically.
So I put this operation in the background cleanup thread.
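For context, here is a minimal, self-contained sketch of the cleanup rule that `_clean_unused_rowset_metas()` applies; the struct and map types below are simplified stand-ins, not the actual Doris classes.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-ins for the real Doris types; only the fields needed
// to illustrate the cleanup rule are kept.
struct RowsetMeta {
    int64_t tablet_id;
    std::string tablet_uid;
    std::string rowset_id;
};
struct Tablet {
    std::string tablet_uid;
};

// Hypothetical view of the meta store: rowset metas keyed by rowset id,
// tablets keyed by tablet id.
using MetaStore = std::unordered_map<std::string, RowsetMeta>;
using TabletMap = std::unordered_map<int64_t, Tablet>;

// Sketch of the cleanup pass: a rowset meta is residual if its tablet no
// longer exists (drop tablet case) or the tablet uid no longer matches
// (clone case); residual metas are removed from the store.
void clean_unused_rowset_metas(MetaStore& meta_store, const TabletMap& tablets) {
    std::vector<std::string> invalid_keys;
    for (const auto& [key, meta] : meta_store) {
        auto it = tablets.find(meta.tablet_id);
        if (it == tablets.end() || it->second.tablet_uid != meta.tablet_uid) {
            invalid_keys.push_back(key);
        }
    }
    for (const auto& key : invalid_keys) {
        meta_store.erase(key);
    }
}
```

In the actual BE the rowset metas live in RocksDB rather than an in-memory map, but the two checks correspond to the drop tablet and clone cases described above.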
1. Move `group_concat` from string-functions to aggregate-functions.
2. Add `json_array`/`json_object`/`json_quote` to the sidebar file.
3. Move `json_array`/`json_object`/`json_quote`/`get_json_double`/`get_json_int`/`get_json_string` to json-functions.
4. Change the `group_concat` documentation to uppercase.
This bug was introduced by PR #7936, which changed the key type of connectionMap from Long to Integer,
causing connectionMap to fail to find the connectContext by connectionId.
Previously, if you wanted to test the vectorized storage layer, you had to modify several places in the code
to make it work, which was tedious.
Now you only need to change one line (redefine the macro STORAGE_LAYER_VECTORIZED_SWITCH as 1 or 0),
which is much more convenient.
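For illustration, a minimal sketch of this kind of single compile-time switch; only the macro name comes from this change, the surrounding function names are hypothetical.

```cpp
#include <iostream>

// Hypothetical illustration: one macro decides whether the vectorized
// storage layer code path is compiled in. Redefine it as 1 or 0.
#define STORAGE_LAYER_VECTORIZED_SWITCH 1

void read_block_vectorized() { std::cout << "vectorized read path\n"; }
void read_block_row_based() { std::cout << "row-based read path\n"; }

void read_block() {
#if STORAGE_LAYER_VECTORIZED_SWITCH
    read_block_vectorized();
#else
    read_block_row_based();
#endif
}
```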
Support a thread pool per disk for scanners, to prevent a few disks with high I/O utilization from dragging down the performance of the whole pool.
Key points:
1. Each disk has its own thread pool for scanners.
2. Whenever the thread pool of one disk runs out of local work, it can steal tasks from other disks' pools. This is done round-robin (see the sketch below).
Performance testing:
- vec version: 25% faster than a single thread pool in a high I/O utilization disk test case
- normal version: 8% faster than a single thread pool in a high I/O utilization disk test case
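A rough, self-contained sketch of the per-disk queue plus round-robin stealing idea described above; the class and method names are illustrative, not the actual Doris scanner pool.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <vector>

// Simplified model: one task queue per disk; a worker bound to disk `self`
// first takes local work, then steals from the other disks in round-robin
// order when its own queue is empty.
using ScanTask = std::function<void()>;

struct DiskQueue {
    std::mutex mu;
    std::deque<ScanTask> tasks;
};

class PerDiskScannerPool {
public:
    explicit PerDiskScannerPool(size_t num_disks) : _queues(num_disks) {}

    void submit(size_t disk_index, ScanTask task) {
        std::lock_guard<std::mutex> lock(_queues[disk_index].mu);
        _queues[disk_index].tasks.push_back(std::move(task));
    }

    // Called by a worker thread that belongs to disk `self`.
    std::optional<ScanTask> next_task(size_t self) {
        // 1. Prefer local work.
        if (auto task = _pop(self)) return task;
        // 2. Steal from other disks, round-robin starting after `self`.
        for (size_t i = 1; i < _queues.size(); ++i) {
            size_t victim = (self + i) % _queues.size();
            if (auto task = _pop(victim)) return task;
        }
        return std::nullopt;
    }

private:
    std::optional<ScanTask> _pop(size_t index) {
        std::lock_guard<std::mutex> lock(_queues[index].mu);
        if (_queues[index].tasks.empty()) return std::nullopt;
        ScanTask task = std::move(_queues[index].tasks.front());
        _queues[index].tasks.pop_front();
        return task;
    }

    std::vector<DiskQueue> _queues;
};
```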
Check the Java version when Doris starts, to prevent the poor user experience caused by an inconsistency
between the Java version used for compilation and the Java version used at runtime.
If the compile-time and runtime Java versions are inconsistent, Doris will not start and a prompt message will be given.
1. Fix a BE crash caused by the destruction sequence. (close #8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`.
This config specifies the maximum number of concurrent compaction tasks on a fast disk (typically an SSD),
so that on high speed disks we can execute more compaction tasks at the same time
and compact the data as soon as possible (a sketch is shown after this list).
3. Avoid frequently selecting unqualified tablets for compaction.
4. Lower some log levels to reduce the log size of BE.
5. Modify some clone logic to handle errors correctly.
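A rough sketch of how a per-disk-type limit like this could be applied when deciding whether a disk can take another compaction task; apart from `compaction_task_num_per_fast_disk` itself, the names and default values below are illustrative assumptions.

```cpp
#include <cstdint>

namespace config {
// `compaction_task_num_per_fast_disk` is the new config; the other name
// and the default values here are illustrative only.
int32_t compaction_task_num_per_disk = 2;
int32_t compaction_task_num_per_fast_disk = 4;
}  // namespace config

struct DataDir {
    bool is_ssd = false;
    int32_t running_compaction_tasks = 0;
};

// A disk can accept another compaction task only while it is below the
// limit for its disk type: fast disks (SSD) get the higher limit.
bool can_schedule_compaction(const DataDir& dir) {
    int32_t limit = dir.is_ssd ? config::compaction_task_num_per_fast_disk
                               : config::compaction_task_num_per_disk;
    return dir.running_compaction_tasks < limit;
}
```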
1. Support some function aliases: mod/fmod, adddate/add_date.
2. Support some multi-argument functions: week, yearweek.
3. Fix a bug where multi-argument functions defined for the DATETIME type did not take effect for the DATE type.
Sometimes BE is built on a machine that supports SIMD instructions such as AVX2,
but the BE binary is then copied to a machine without AVX2, where it crashes without any error message.
This PR checks the required SIMD instructions during startup and prints an error message if they are missing.
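A minimal sketch of such a startup check using the GCC/Clang CPU-feature builtins; the exact set of instructions the real check covers is not shown here, AVX2 is just the example from the description above.

```cpp
#include <cstdio>
#include <cstdlib>

// Abort early with a readable message instead of crashing later with an
// illegal-instruction signal when the binary was built with AVX2 but the
// host CPU does not support it.
void check_required_simd_instructions() {
#if defined(__AVX2__)
    if (!__builtin_cpu_supports("avx2")) {
        fprintf(stderr,
                "BE was compiled with AVX2 support, but this CPU does not "
                "support AVX2. Please rebuild BE on this machine or run it "
                "on a CPU with AVX2.\n");
        exit(1);
    }
#endif
}
```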
Hive Bitmap UDF provides UDFs for generating bitmaps and performing bitmap operations in Hive tables.
The bitmap in Hive is exactly the same as the Doris bitmap,
so bitmaps in Hive can be imported into Doris through Spark bitmap load.