doris

Author	SHA1	Message	Date
wudongliang	22bf2889e5	[feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236 ) Support avro's enum, record, union data types	2023-11-09 13:58:49 +08:00
Gabriel	809510f8b2	[bug](udf) Fix method invoking (#26131 )	2023-10-31 11:46:14 +08:00
DongLiang-0	267c11207b	[feature](paimon)paimon catalog supports complex types (#25364 )	2023-10-23 17:32:13 +08:00
Ashin Gau	a2ceea5951	[refactor](jni) unified jni framework for java udaf (#25591 ) Follow https://github.com/apache/doris/pull/25302, and use the unified jni framework to refactor java udaf. This PR has removed the old interfaces to run java udf/udaf. Thanks to the ease of use of the new framework, the core code for modifying UDAF does not exceed 100 lines, and the logic is similar to that of UDF.	2023-10-20 16:13:40 +08:00
Ashin Gau	47689fd452	[refactor](jni) unified jni framework for java udf (#25302 ) Use the unified jni framework to refactor java udf. The unified jni framework takes VectorTable as the container to transform data between c++ and java, and hide the details of data format conversion. In addition, the unified framework supports complex and nested types. The performance of basic types remains consistent, with a 30% improvement in string types and an order of magnitude improvement in complex types.	2023-10-18 09:27:54 +08:00
slothever	18c2a13e09	[fix](multi-catalog)fix maxcompute partition filter and session creation (#24911 ) add maxcompute partition support fix maxcompute partition filter modify maxcompute session create method	2023-10-17 22:36:10 +08:00
zhangdong	ce18f1148a	[improvement](catalog)compatible with paimon 0.5 (#24985 ) compatible with paimon 0.5 add p0 for paimon,need set enablePaimonTest=true	2023-10-17 22:07:13 +08:00
Ashin Gau	522faa8cd2	[fix](jni) the offset in map type is int64 (#25394 ) The offset in map type column is int64, but #24810 has put as int32, causing error like:	2023-10-13 14:23:17 +08:00
zhangdong	4e8cde127c	[Enhance](catalog)add table cache in paimon jni (#25014 ) - fix get old schema after refresh paimon table - add table cache in paimon jni	2023-10-08 10:36:18 +08:00
Ashin Gau	26818de9c8	[feature](jni) support complex types in jni framework (#24810 ) Support complex types in jni framework, and successfully run end-to-end on hudi. ### How to Use Other scanners only need to implement three interfaces in `ColumnValue`: ``` // Get array elements and append into values void unpackArray(List<ColumnValue> values); // Get map key array&value array, and append into keys&values void unpackMap(List<ColumnValue> keys, List<ColumnValue> values); // Get the struct fields specified by `structFieldIndex`, and append into values void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values); ``` Developers can take `HudiColumnValue` as an example.	2023-09-27 14:47:41 +08:00
lsy3993	1f8e0b48bc	[fix](S3)delete main function because hardcoded ip is not safe (#24872 )	2023-09-26 10:49:16 +08:00
Calvin Kirs	c832e018d0	[Dependence](Fe)Upgrade Fe dependencies (#24606 ) * be scanner - Upgrade avro to 1.11.2 fe - Upgrade quartz to 2.5.0-rc1 - Upgrade maxcompute to 0.45-2-publish - Binding avro-ipc to 1.11.2 * Binding hbase version to 2.5.5 binding nimbusds version to 9.35	2023-09-22 10:14:42 +08:00
Mryange	ee56783629	[fix](Java UDF) Do not use enum as the data type for JavaUdfDataType. (#24460 )	2023-09-19 14:06:02 +08:00
slothever	4816ca6679	[fix](multi-catalog)fix mc decimal type parse, fix wrong obj location (#24242 ) 1. mc decimal type need parse correctly by arrow vector method 2. fix wrong obj location if use oss,obs,cosn Will add test case in another PR	2023-09-15 17:44:56 +08:00
zy-kkk	dbfacdc4af	[improvement](jdbc catalog) Optimize Loop Performance by Caching `isNebula` Method Result (#24260 )	2023-09-13 21:40:28 +08:00
slothever	fca34ec337	[fix](multi-catalog)support bit type and hidden mc secret key (#24124 ) support max compute bit type and mask mc secret key bool type will use bit arrow vector should mask secret key: close #24019	2023-09-12 10:36:48 +08:00
Ashin Gau	6e28d878b5	[fix](hudi) compatible with hudi spark configuration and support skip merge (#24067 ) Fix three bugs: 1. Hudi slice maybe has log files only, so `new Path(filePath)` will throw errors. 2. Hive column names are lowercase only, so match column names in ignore-case-mode. 3. Compatible with [Spark Datasource Configs](https://hudi.apache.org/docs/configurations/#Read-Options), so users can add `hoodie.datasource.merge.type=skip_merge` in catalog properties to skip merge logs files.	2023-09-11 19:54:59 +08:00
Mingyu Chen	f85da7d942	[improvement](jdbc) add profile for jdbc read and convert phase (#23962 ) Add 2 metrics in jdbc scan node profile: - `CallJniNextTime`: call get next from jdbc result set - `ConvertBatchTime`: call convert jobject to columm block Also fix a potential concurrency issue when init jdbc connection cache pool	2023-09-10 21:42:06 +08:00
Ashin Gau	13c9c41c1f	[opt](hudi) reduce the memory usage of avro reader (#23745 ) 1. Reduce the number of threads reading avro logs and keep the readers in a fixed thread pool. 2. Regularly cleaning the cached resolvers in the thread local map by reflection.	2023-09-05 23:59:23 +08:00
GoGoWen	228f0ac5bb	[Feature](Multi-Catalog) support query doris bitmap column in external jdbc catalog (#23021 )	2023-09-02 12:46:33 +08:00
Mryange	96c4471b4a	[feature](udf) udf array/map support decimal and update doc (#23560 ) * update * decimal * update table name * remove log * add log	2023-08-31 07:44:18 +08:00
zhangstar333	aef162ad4c	[test](log) add some log in udf function when thrown exception (#23651 ) [test](log) add some log in udf function when thrown exception (#23651)	2023-08-30 14:16:05 +08:00
slothever	f66f161017	[fix](multi-catalog)fix hive table with cosn location issue (#23409 ) Sometimes, the partitions of a hive table may on different storage, eg, some is on HDFS, others on object storage(cos, etc). This PR mainly changes: 1. Fix the bug of accessing files via cosn. 2. Add a new field `fs_name` in TFileRangeDesc This is because, when accessing a file, the BE will get a hdfs client from hdfs client cache, and different file in one query request may have different fs name, eg, some of are `hdfs://`, some of are `cosn://`, so we need to specify fs name for each file, otherwise, it may return error: `reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://[172.xxxx:4007](http://172.xxxxx:4007/)`	2023-08-26 00:16:00 +08:00
slothever	5ba505ebf4	[fix](multi-catalog)fix avro and jdbc scanner dependency (#23015 ) add preload-extensions module, put all conflict dependencies to pom.xml in preload-extensions	2023-08-20 19:28:17 +08:00
zy-kkk	221e7bdd17	[test](jdbc external) fix mysql and pg external regression test (#22998 )	2023-08-16 10:44:47 +08:00
zhangdong	fa6110accd	[fix](catalog)paimon support more data type (#22899 )	2023-08-14 13:48:33 +08:00
DongLiang-0	a089fe3e43	[Improve](jni-avro)Reduce the volume of the avro-scanner-jar package (#22276 ) The avro-scanner-jar package is reduced from 204M to 160M. Hadoop-related dependencies in the original avro pom are directly packaged into a jar package, resulting in a jar volume of 200M. Now since there is already a hadoop jar package environment in be lib, it can be directly referenced.	2023-08-11 17:26:14 +08:00
DongLiang-0	db69457576	[fix](avro)Fix S3 TVF avro format reading failure (#22199 ) This pr fixes two issues: 1. when using s3 TVF to query files in AVRO format, due to the change of `TFileType`, the originally queried `FILE_S3 ` becomes `FILE_LOCAL`, causing the query failed. 2. currently, both parameters `s3.virtual.key` and `s3.virtual.bucket` are removed. A new `S3Utils` in jni-avro to parse the bucket and key of s3. The purpose of doing this operation is mainly to unify the parameters of s3.	2023-08-11 17:22:48 +08:00
slothever	209f36f1bf	[fix](multi-catalog)fix jdbc loader (#22814 )	2023-08-11 14:36:19 +08:00
slothever	919bfd73f1	[improvement](multi-catalog)add scanner isolation class loader (#22247 ) Add scanner isolation class loader to make each plugin non-conflicting. The BE will get scanner classes by JNI call and use JniClassLoader load them. In the last version，we always get canner classes from the system class path by default, so it cannot isolate the classes for each scanner	2023-08-10 10:02:46 +08:00
Mryange	768088c95e	[refactor](udaf) refactor call udaf function and support map type in return (#22508 )	2023-08-09 22:44:07 +08:00
Mryange	ddd90855a9	[vectorized](udaf) java udaf support with map type (#22397 ) [vectorized](udaf) java udaf support with map type (#22397) * test * remove some unused * update * add case	2023-08-02 15:03:44 +08:00
Mryange	47c2cc5c74	[vectorized](udf) java udf support with return map type (#22300 )	2023-07-29 12:52:27 +08:00
lsy3993	6f1c03c766	[fix](jdbc_catalog) fix int and bigint in mysql view when use doris catalog (#22251 )	2023-07-27 16:50:42 +08:00
lsy3993	4f6a3c5bf0	[feature](catalog) support clob type in oracle jdbc catalog (#21532 )	2023-07-27 15:49:15 +08:00
zy-kkk	619a2857e1	[improvement](jdbc catalog) improve mysql jdbc catalog read bytea`s types & else improve (#22233 )	2023-07-27 10:18:37 +08:00
Ashin Gau	4c4f08f805	[fix](hudi) the required fields are empty if only reading partition columns (#22187 ) 1. If only read the partition columns, the `JniConnector` will produce empty required fields, so `HudiJniScanner` should read the "_hoodie_record_key" field at least to know how many rows in current hoodie split. Even if the `JniConnector` doesn't read this field, the call of `releaseTable` in `JniConnector` will reclaim the resource. 2. To prevent BE failure and exit, `JniConnector` should call release methods after `HudiJniScanner` is initialized. It should be noted that `VectorTable` is created lazily in `JniScanner`, so we don't need to reclaim the resource when `HudiJniScanner` is failed to initialize. ## Remaining works Other jni readers like `paimon` and `maxcompute` may encounter the same problems, the jni reader need to handle this abnormal situation on its own, and currently this fix can only ensure that BE will not exit.	2023-07-26 10:59:45 +08:00
lsy3993	9abf32324b	[improvement](jdbc) add `timestamp` put to `datev2` (#21680 )	2023-07-26 09:10:34 +08:00
Tiewei Fang	e8f4323e0f	[Fix](jdbcCatalog) fix typo of some variable #22214	2023-07-26 08:34:45 +08:00
Ashin Gau	3414d1a61f	[fix](hudi) table schema is not the same as parquet schema (#22186 ) Upgrade hudi version from 0.13.0 to 0.13.1, and keep the hudi version of jni scanner the same as that of FE. This may fix the bug of the table schema is not same as parquet schema.	2023-07-26 00:29:53 +08:00
zy-kkk	cf677b327b	[fix](jdbc catalog) Fixed mappings with type errors for bool and tinyint(1) (#22089 ) First of all, mysql does not have a boolean type, its boolean type is actually tinyint(1), in the previous logic, We force tinyint(1) to be a boolean by passing tinyInt1isBit=true, which causes an error if tinyint(1) is not a 0 or 1, Therefore, we need to match tinyint(1) according to tinyint instead of boolean, and this change will not affect the correctness of where k = 1 or where k = true queries	2023-07-25 22:45:22 +08:00
lsy3993	999fbdc802	[improvement](jdbc) add new type 'object' of int (#21681 )	2023-07-25 21:29:46 +08:00
Mryange	0f439bb1ca	[vectorized](udf) java udf support map type (#22059 )	2023-07-25 11:56:20 +08:00
zhangdong	7fcf702081	[improvement](multi catalog)paimon support filesystem metastore (#21910 ) 1.support filesystem metastore 2.support predicate and project when split 3.fix partition table query error todo: Now you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when use s3 filesystem doc pr: #21966	2023-07-24 22:02:57 +08:00
slothever	7d488688b4	[fix](multi-catalog)fix minio default region and throw minio error msg, support s3 bucket root path (#21994 ) 1. check minio region, set default region if user region is not provided, and throw minio error msg 2. support read root path s3://bucket1 3. fix max compute public access	2023-07-20 20:48:55 +08:00
zhangstar333	c07e2ada43	[imporve](udaf) refactor java-udaf executor by using for loop (#21713 ) refactor java-udaf executor by using for loop	2023-07-14 11:37:19 +08:00
Ashin Gau	4158253799	[feature](hudi) support hudi time travel in external table (#21739 ) Support hudi time travel in external table: ``` select * from hudi_table for time as of '20230712221248'; ``` PR(https://github.com/apache/doris/pull/15418) supports to take timestamp or version as the snapshot ID in iceberg, but hudi only has timestamp as the snapshot ID. Therefore, when querying hudi table with `for version as of`, error will be thrown like: ``` ERROR 1105 (HY000): errCode = 2, detailMessage = Hudi table only supports timestamp as snapshot ID ``` The supported formats of timestamp in hudi are: 'yyyy-MM-dd HH:mm:ss[.SSS]' or 'yyyy-MM-dd' or 'yyyyMMddHHmmss[SSS]', which is consistent with the [time-travel-query.](https://hudi.apache.org/docs/quick-start-guide#time-travel-query) ## Partitioning Strategies Before this PR, hudi's partitions need to be synchronized to hive through [hive-sync-tool](https://hudi.apache.org/docs/syncing_metastore/#hive-sync-tool), or by setting very complex synchronization parameters in [spark conf](https://hudi.apache.org/docs/syncing_metastore/#sync-template). These processes are exceptionally complex and unnecessary, unless you want to query hudi data through hive. In addition, partitions are changed in time travel. We cannot guarantee the correctness of time travel through partition synchronization. So this PR directly obtain partitions by reading hudi meta information. Caching and updating table partition information through hudi instant timestamp, and reusing Doris' partition pruning.	2023-07-13 22:30:07 +08:00
zy-kkk	0be349e250	[feature](jdbc) Support jdbc catalog to read json types (#21341 )	2023-07-10 16:21:00 +08:00
zhangstar333	bb985cd9a1	[refactor](udf) refactor java-udf execute method by using for loop (#21388 )	2023-07-07 11:43:11 +08:00
Ashin Gau	0084b9fd9a	[fix](hudi) scala can't call Properties.putAll in jdk11 (#21494 )	2023-07-05 10:53:09 +08:00

1 2

62 Commits