This commit enable the avg operator in fe instead of converting the avg function into sum/count.
Also, this commit fix the bug of deciamlv2 avg which cause the core in be.
The int128 could not be assinged directly.
The speed of avg function is similar to sum function after enhancement.
In this change list
1. validate HLL column when loading data, if data is invalid, this row
will be filtered.
2. seems as empty HLL when serializing invalid type of HLL data, with
this change, all ingested data will be valid.
3. seems as empty HLL when deserializing nullptr or invalid type of HLL data.
With this change, dirty data can be handled normally.
4. rename function empty_hll to hll_empty.
5. disable memtable_flush_execute_test because this will fails
sometimes. When tearing down, some thread is not joined, and they will
visit destroyed resource, which is invalid.
Now size of HyperLogLog struct is so large that it lead the rowset is
too small when ingesting data. In this CL, registers in HyperLogLog are
only created when it is needed. When ingesting data, it's normal case
that there are only few values in one HyperLogLog.
* Fix bug that <=> operator and in operator get wrong result
* Add some comment to get_result_for_null
* Add an new Binary Operator to replace is_safe_for_null for handleing '<=>' operator
* Add EQ_FOR_NULL to TExprOpcode
* Remove macro definition last backslash
1. get_json_xxx() now support using quoto to escape dot
2. Implement json_path_prepare() function to preprocess json_path
Performance of get_json_string() on 1000000 rows reduces from 2.27s to 0.27s
* Remove build rows counter in PartitionHashJoinNode
* Fix unit test fail in RuntimeProfileTest
* Add check for result type length in cast_to_string_val
* Add UserFunctionCache to cache UDF's library
This patch replace LibCache with UserFunctionCache. LibCache use HDFS
URL to identify a UDF's Library, and when BE process restart all of
downloaded library should be loaded another time. We use function id
corresponding to a library, and when process restart, all downloaded
libraries can be loaded without another downloading.
* update
* Add streaming load feature. You can execute 'help stream load;' to see more information.
Changed:
* Loading phase of a certain table can be parallelized, to reduce the load job execution time when multi load jobs to a single table.
* Using RocksDB to save the header info of tablets in Backends, to reduce the IO operations and increate speeding of restarting.
Fixed:
* A lot of bugs fixed.
2. add 2 new proc '/current_queries' and '/current_backend_instances' to monitor the current running queries.
3. add a manual compaction api on Backend to trigger cumulative or base compaction manually.
4. add Frontend config 'max_bytes_per_broker_scanner' to limit to bytes per one broker scanner. This is to limit the memory cost of a single broker load job
5. add Frontend config 'max_unfinished_load_job' to limit load job number: if number of running load jobs exceed the limit, no more load job is allowed to be submmitted.
6. a log of bug fixed
1. Apache HDFS broker support HDFS HA and Hadoop kerberos authentication.
2. New Backup and Restore function. Use Fs Broker to backup your data to HDFS or restore them from HDFS.
3. Table-Level Privileges. Grant fine-grained privileges on table-level to specified user.
4. A lot of bugs fixed.
5. Performance improvement.