doris

Author	SHA1	Message	Date
Ashin Gau	eb49cd839b	[refactor](datalake) return the error status instead of static_cast<void> (#34873 ) Follow-up to #34797. `static_cast<void>` silently discarded error statuses, some of which should make the query finish with an error status, so replace `static_cast<void>` with `RETURN_IF_ERROR`. The following three scenarios need to be handled separately and cannot be simply replaced: 1. The outer function returns void; 2. A status-returning function is called inside a constructor or destructor; 3. A status-returning function is called on a best-effort basis, and its error status should be ignored.	2024-05-23 19:06:21 +08:00
Mingyu Chen	799c43686c	[fix](jni-connector) avoid core dump if init connector failed (#34007 ) _jni_scanner_cls may be null if connector init failed, so check it before deleting it.	2024-04-24 17:13:50 +08:00
Mingyu Chen	0c8d3d007d	[fix](jni) don't delete global ref if scanner is not opened (#33398 )	2024-04-09 09:06:16 +08:00
Mingyu Chen	ed93d6132f	[fix](jni) avoid coredump if failed to get jni env (#32950 ) PR #32217 found that getting the jni env may fail, and added a workaround to avoid a BE crash. This PR follows up on that issue to avoid a BE crash when calling `close()` of JniConnector after failing to get the jni env. The `close()` method returns an error when: 1. Failed to get jni env 2. Failed to release jni resource. This PR ignores the first error, and still logs fatal for the second error.	2024-04-07 22:16:53 +08:00
Mingyu Chen	4c8aaa156a	[fix](jni) remove 'push_down_predicates' and fix BE crash with decimal predicate (#32253 ) (#32599 )	2024-03-21 14:07:50 +08:00
Mingyu Chen	2e564036ef	[fix](profile) avoid update profile in destructor (#32131 ) Previously, the counter in `profile` could be updated when closing the file reader, and the file reader could be closed while the object was being destructed. But at that time, the `profile` object may already have been deleted, causing an NPE and a BE crash. This PR tries to fix this issue: 1. Remove the "profile counter update" logic from all `close()` methods. 2. Add a new interface `ProfileCollector`. It has 2 methods: - `collect_profile_at_runtime()` It can be called at runtime, e.g., in every `get_next_block()` method, so that the counter in profile can be updated at runtime. - `collect_profile_before_close()` Should be called before the object calls `close()`, and it will only be called once. 3. Derive from `ProfileCollector`. All classes which may update the profile counter in their `close()` method, such as `GenericReader`, should extend `ProfileCollector` and implement `collect_profile_before_close()`. `collect_profile_before_close()` will be called in `scanner->mark_to_need_to_close()`.	2024-03-21 14:07:22 +08:00
Tiewei Fang	4636b6195b	[Fix](JNI) fix BE core when using JNI to query the empty `map` type value (#31502 )	2024-02-29 14:03:38 +08:00
Pxl	5687ca977d	[Bug](java-udf) fix core dump when javaudf input 0 row block (#30720 )	2024-02-03 20:25:25 +08:00
nanfeng	be893d792c	[fix](jni) fix jni_reader function name get_nex_block to get_next_block (#29943 )	2024-01-16 18:39:00 +08:00
Ashin Gau	5789b7e380	[fix](jni) add datetimev2 precision (#29528 )	2024-01-06 13:35:26 +08:00
TengJianPing	a525d5c5a3	[refactor](decimal) change type name Decimal128 to Decimal128V2, Decimal128I to Decimal128V3 to avoid confusion (#29265 )	2023-12-29 10:11:44 +08:00
Mingyu Chen	c459408580	[fix](jni) avoid BE crash and NPE when close paimon reader (#27129 ) 1. Do not use FATAL log when jni encounters an error, to avoid a crash. 2. Fix NPE when closing PaimonReader; the reader may not be assigned if PaimonReader failed to open.	2023-11-17 20:01:08 +08:00
Ashin Gau	a2ceea5951	[refactor](jni) unified jni framework for java udaf (#25591 ) Follow https://github.com/apache/doris/pull/25302, and use the unified jni framework to refactor java udaf. This PR has removed the old interfaces to run java udf/udaf. Thanks to the ease of use of the new framework, the core code for modifying UDAF does not exceed 100 lines, and the logic is similar to that of UDF.	2023-10-20 16:13:40 +08:00
lihangyu	c21eb315b0	[feature](thrift api) support expr in MemoryScratchSink and make arrow::Schema recalculate with block info (#24603 )	2023-10-18 07:51:56 -05:00
Ashin Gau	47689fd452	[refactor](jni) unified jni framework for java udf (#25302 ) Use the unified jni framework to refactor java udf. The unified jni framework takes VectorTable as the container to transfer data between c++ and java, and hides the details of data format conversion. In addition, the unified framework supports complex and nested types. The performance of basic types remains consistent, with a 30% improvement in string types and an order of magnitude improvement in complex types.	2023-10-18 09:27:54 +08:00
Ashin Gau	26818de9c8	[feature](jni) support complex types in jni framework (#24810 ) Support complex types in jni framework, and successfully run end-to-end on hudi. ### How to Use Other scanners only need to implement three interfaces in `ColumnValue`: ``` // Get array elements and append into values void unpackArray(List<ColumnValue> values); // Get map key array&value array, and append into keys&values void unpackMap(List<ColumnValue> keys, List<ColumnValue> values); // Get the struct fields specified by `structFieldIndex`, and append into values void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values); ``` Developers can take `HudiColumnValue` as an example.	2023-09-27 14:47:41 +08:00
slothever	209f36f1bf	[fix](multi-catalog)fix jdbc loader (#22814 )	2023-08-11 14:36:19 +08:00
slothever	919bfd73f1	[improvement](multi-catalog)add scanner isolation class loader (#22247 ) Add scanner isolation class loader to keep plugins from conflicting with each other. The BE gets scanner classes via JNI calls and uses JniClassLoader to load them. In the previous version, we always got scanner classes from the system class path by default, so the classes for each scanner could not be isolated.	2023-08-10 10:02:46 +08:00
Ashin Gau	4c4f08f805	[fix](hudi) the required fields are empty if only reading partition columns (#22187 ) 1. If only reading the partition columns, the `JniConnector` will produce empty required fields, so `HudiJniScanner` should read at least the "_hoodie_record_key" field to know how many rows are in the current hoodie split. Even if the `JniConnector` doesn't read this field, the call of `releaseTable` in `JniConnector` will reclaim the resource. 2. To prevent BE failure and exit, `JniConnector` should call release methods after `HudiJniScanner` is initialized. It should be noted that `VectorTable` is created lazily in `JniScanner`, so we don't need to reclaim the resource when `HudiJniScanner` fails to initialize. ## Remaining work Other jni readers like `paimon` and `maxcompute` may encounter the same problems; each jni reader needs to handle this abnormal situation on its own, and currently this fix can only ensure that BE will not exit.	2023-07-26 10:59:45 +08:00
DongLiang-0	a6b51ec19a	[Feature](avro) Support Apache Avro file format (#19990 ) Support reading avro files via hdfs() or s3(). ```sql select * from s3( "uri" = "http://127.0.0.1:9312/test2/person.avro", "ACCESS_KEY" = "ak", "SECRET_KEY" = "sk", "FORMAT" = "avro"); +--------+--------------+-------------+-----------------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------------+ \| Alyssa \| 1 \| 10.0012 \| 100000000221133 \| \| Ben \| 0 \| 5555.999 \| 4009990000 \| \| lisi \| 0 \| 5992225.999 \| 9099933330 \| +--------+--------------+-------------+-----------------+ select * from hdfs( "uri" = "hdfs://127.0.0.1:9000/input/person2.avro", "fs.defaultFS" = "hdfs://127.0.0.1:9000", "hadoop.username" = "doris", "format" = "avro"); +--------+--------------+-------------+-----------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------+ \| Alyssa \| 1 \| 8888.99999 \| 89898989 \| +--------+--------------+-------------+-----------+ ``` The current avro reader only supports common data types; complex data types will be supported later.	2023-06-28 21:15:35 +08:00
Ashin Gau	ef17289925	[feature](jni) add jni metrics and attach to BE profile automatically (#21004 ) Add JNI metrics, for example: ``` - HudiJniScanner: 0ns - FillBlockTime: 31.29ms - GetRecordReaderTime: 1m5s - JavaScanTime: 35s991ms - OpenScannerTime: 1m6s ``` Add three common performance metrics for JNI scanner: 1. `OpenScannerTime`: Time to init and open JNI scanner 2. `JavaScanTime`: Time to scan data and insert into vector table on the java side 3. `FillBlockTime`: Time to convert java vector table to c++ block. Also support user defined metrics on the java side, for example: if `OpenScannerTime` is long, we want to determine which sub-process takes too much time, so we add `GetRecordReaderTime` on the java side. The user defined metrics on the java side can be attached to BE profile automatically.	2023-06-21 11:19:02 +08:00
lexluo09	57656b2459	[Enhancement](java-udf) java-udf module split to sub modules (#20185 ) The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as: java-common, java-udf, jdbc-scanner, hudi-scanner, paimon-scanner. Co-authored-by: lexluo <lexluo@tencent.com>	2023-06-13 09:41:22 +08:00
slothever	b7fc17da68	[feature-wip](multi-catalog)(step2)support read max compute data by JNI (#19819 ) Issue Number: #19679	2023-06-05 22:10:08 +08:00
zy-kkk	56fa38de1d	[Enhancement](JDBC Catalog) refactor jdbc catalog insert logic (#19950 ) This PR refactors the old way of writing data to JDBC External Table & JDBC Catalog, mainly including the following tasks: 1. Continuing the work of @BePPPower 's PR #18594, replacing the logic of splicing INSERT SQL with operating on off-heap memory and using preparedStatement.set to write data 2. Add write support for the largeint type, mainly by adapting to java.math.BigInteger, which uses binary operations 3. Delete the SQL splicing logic in the JDBC External Table & JDBC Catalog write code. ToDo: Binary types, like bit, binary, blob... Finally, special thanks to @BePPPower , @AshinGau for their work. Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>	2023-05-30 22:03:39 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Ashin Gau	ddbff2aa39	[feature](jni) map c++ block to java vector table (#18566 ) PR(#17960) introduced the vector table, which can map a java table to a c++ block. In some cases (java udf & jdbc executor), we should map a c++ block to a java table. This PR implements this function. The memory structure of java vector table and c++ block is consistent, so the implementation doesn't copy the block, just passes the memory address.	2023-04-17 00:04:53 +08:00
Ashin Gau	d6b0fe9072	[feature](jni) jni table scanner framework (#17960 ) A framework that reads data from jni scanners, which can support data sources from the java ecosystem (java API). ## Java Interface Java scanner should extend `org.apache.doris.jni.JniScanner` and implement the following methods: ``` // Initialize JniScanner public abstract void open() throws IOException; // Close JniScanner and release resources public abstract void close() throws IOException; // Scan data and save as vector table public abstract int getNext() throws IOException; ``` See demo usage in `org.apache.doris.jni.MockJniScanner` ## c++ interface C++ reader should use `doris::JniConnector` to get data from `org.apache.doris.jni.JniScanner`. See demo usage in `doris::MockJniReader`. ## Pushed-down predicates Java scanner can get pushed-down predicates by `org.apache.doris.jni.vec.ScanPredicate`. ## Remaining work: 1. Implement complex nested types. 2. Read hudi MOR table as the end-to-end demo usage.	2023-03-30 23:47:45 +08:00
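
The replacement of `static_cast<void>` with `RETURN_IF_ERROR` described in commit eb49cd839b can be illustrated with a minimal sketch. The `Status` class, the macro body, and the functions `open_scanner`, `init_ignoring_errors`, and `init_propagating_errors` below are hypothetical stand-ins written for this illustration; they are not the Doris definitions, and only the macro name comes from the commit message.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal status type; NOT the real doris::Status.
class Status {
public:
    static Status OK() { return Status(true, ""); }
    static Status Error(std::string msg) { return Status(false, std::move(msg)); }
    bool ok() const { return _ok; }
    const std::string& message() const { return _msg; }

private:
    Status(bool ok, std::string msg) : _ok(ok), _msg(std::move(msg)) {}
    bool _ok;
    std::string _msg;
};

// Assumed shape of the macro: return early from the enclosing function
// when the expression yields a non-OK status.
#define RETURN_IF_ERROR(expr)                   \
    do {                                        \
        Status status_ = (expr);                \
        if (!status_.ok()) return status_;      \
    } while (false)

// Hypothetical status-returning call that can fail.
Status open_scanner(bool fail) {
    return fail ? Status::Error("open failed") : Status::OK();
}

// Before: the error is discarded, so the caller always sees OK.
Status init_ignoring_errors(bool fail) {
    static_cast<void>(open_scanner(fail));
    return Status::OK();
}

// After: the error reaches the caller and can fail the query.
Status init_propagating_errors(bool fail) {
    RETURN_IF_ERROR(open_scanner(fail));
    return Status::OK();
}

// Small usage check contrasting the two behaviors.
void check_status_propagation() {
    assert(init_ignoring_errors(true).ok());
    assert(!init_propagating_errors(true).ok());
}
```

The three exceptions listed in the commit message (void outer functions, constructors and destructors, best-effort calls) are cases where an early `return` of a status is either impossible or unwanted, which is why they cannot use the macro directly.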

27 Commits
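
The `ProfileCollector` interface described in commit 2e564036ef can be sketched roughly as follows. Only the two public method names come from the commit message; the protected hook methods, the once-only flag, the `Profile` struct, and `MockReader` are assumptions made for this illustration and may differ from the actual Doris code.

```cpp
#include <cassert>

// Sketch of the interface: counters are updated at runtime or once
// before close(), never inside close() or a destructor.
class ProfileCollector {
public:
    virtual ~ProfileCollector() = default;

    // May be called repeatedly, e.g. in every get_next_block().
    void collect_profile_at_runtime() { _collect_profile_at_runtime(); }

    // Runs the hook at most once, before the object is closed.
    void collect_profile_before_close() {
        if (_collected_before_close) return;
        _collected_before_close = true;
        _collect_profile_before_close();
    }

protected:
    // Hypothetical hook names; subclasses override what they need.
    virtual void _collect_profile_at_runtime() {}
    virtual void _collect_profile_before_close() {}

private:
    bool _collected_before_close = false;
};

// Hypothetical stand-in for a runtime profile with two counters.
struct Profile {
    long rows_read = 0;
    int final_updates = 0;
};

// Hypothetical reader whose close() never touches the profile.
class MockReader : public ProfileCollector {
public:
    explicit MockReader(Profile* profile) : _profile(profile) {}

    void get_next_block() {
        _rows += 100;
        collect_profile_at_runtime();
    }

    void close() { _closed = true; }  // no profile access here
    bool closed() const { return _closed; }

protected:
    void _collect_profile_at_runtime() override { _profile->rows_read = _rows; }
    void _collect_profile_before_close() override { ++_profile->final_updates; }

private:
    Profile* _profile;
    long _rows = 0;
    bool _closed = false;
};

// Usage: the second collect_profile_before_close() call is a no-op.
void check_profile_collector() {
    Profile profile;
    MockReader reader(&profile);
    reader.get_next_block();
    reader.collect_profile_before_close();
    reader.collect_profile_before_close();
    reader.close();
    assert(profile.rows_read == 100);
    assert(profile.final_updates == 1);
}
```

Keeping profile updates out of `close()` means a reader that is closed from a destructor no longer dereferences a profile that may already have been deleted, which is the crash the commit describes.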