doris

Author	SHA1	Message	Date
gnehil	c78341b728	[improvement](spark-load) support datev2 and datetimev2 #21839	2023-07-24 09:07:53 +08:00
HonestManXin	365afb5389	[fix](sparkdpp) Hive table properties not take effect when create spark session (#21881 ) When creating a Hive external table for Spark loading, the Hive external table includes related information such as the Hive Metastore. However, when submitting a job, it is required to have the hive-site.xml file in the Spark conf directory; otherwise, the Spark job may fail with an error message indicating that the corresponding Hive table cannot be found. The SparkEtlJob.initSparkConfigs method sets the properties of the external table into the Spark conf. However, at this point, the Spark session has already been created, and the Hive-related parameters will not take effect. To ensure that the Spark Hive catalog properly loads Hive tables, you need to set the Hive-related parameters before creating the Spark session. Co-authored-by: zhangshixin <zhangshixin@youzan.com>	2023-07-20 14:36:00 +08:00
Qi Chen	4418eb36a3	[Fix](multi-catalog) Fix some hive partition issues. (#19513 ) Fix some hive partition issues. 1. Fix BE crash when a Hive partition field is of `date`, `timestamp`, or `decimal` type. 2. Fix HDFS URI decode error when using a `timestamp` partition field, whose value contains URL-encoded special characters (for example, `:` is encoded as `%3A`).	2023-05-11 07:49:46 +08:00
Calvin Kirs	5459cd9c30	[Improve](fe)Upgrade dependencies and optimize jar package management (#18882 ) bind netty-version to 4.1.89-final bind jettison to 1.5.4 upgrade hadoop version to 3.3.5 upgrade range-plugins-common to 2.4.0 bind bcprov-jdk15on to 2.4.0 upgrade and bind woodstox to 6.5.1 upgrade and bind kerby to 2.0.3 upgrade hudi to 0.13.0 upgrade parquet to 1.13.0 upgrade maven-source-plugin to 3.2.1 upgrade maven-assembly-plugin to 3.3.0 upgrade maven-javadoc-plugin to 3.3.2 upgrade maven-shade-plugin to 3.3.4 upgrade maven-clean-plugin to 3.1.0 Remove meaningless plugins Optimize doris maven path Unify the Java modules for management in fe	2023-05-04 10:07:37 +08:00
Calvin Kirs	75fd4b70fa	[improve](fe)Optimize fe binary package packaging (#18554 )	2023-04-12 12:58:45 +08:00
WenYao	b5b595519a	[fix](log) use logger to replace printStackTrace() (#17382 ) Use Logger to replace printStackTrace to better locate problems.	2023-03-03 14:51:30 +08:00
Mingyu Chen	726427b795	[refactor](fe) refactor and upgrade dependency tree of FE and support AWS glue catalog (#16046 ) 1. Spark dpp Move `DppResult` and `EtlJobConfig` to sparkdpp package in `fe-common` module, so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar` is not moved into `fe/lib`, which reduces the size of the FE output. 2. Modify start_fe.sh Modify the CLASSPATH to make sure that doris-fe.jar is at the front, so that classes with the same qualified name are loaded from doris-fe.jar first. 3. Upgrade hadoop and hive version hadoop: 2.10.2 -> 3.3.3 hive: 2.3.7 -> 3.1.3 4. Override the IHiveMetastoreClient implementations from dependency `ProxyMetaStoreClient.java` for Aliyun DLF. `HiveMetaStoreClient.java` for origin Apache Hive metastore. Some of their methods need to be modified to make them compatible with different versions of Hive. 5. Exclude some unused dependencies to reduce the size of the FE output. Now it is only 370MB (previously 600MB) 6. Upgrade aws-java-sdk version to 1.12.31 7. Support AWS Glue Data Catalog 8. Remove HudiScanNode (no longer supported)	2023-01-20 14:42:16 +08:00
PF FOUR	650136c32e	[Enhancement](fe): replace assertTrue(X.equals(X)) with assertEquals (#15356 )	2022-12-27 00:37:24 +08:00
jiafeng.zhang	d48abd91df	[deps](fe)upgrade deps version (#15262 ) upgrade hadoop version to 2.10.2 jackson-databind to 2.14.1	2022-12-24 22:18:10 +08:00
xiaoDjun	c5f9fd5619	[fix](spark load)partition column is not duplicate key, spark load IndexOutOfBounds error (#14661 ) * [fix](spark load)partition column is not duplicate key，spark load IndexOutOfBoundsException error Co-authored-by: 张放(vivianv.zhang) <vivianv.zhang@huolala.cn>	2022-11-29 15:21:21 +08:00
ChenJiaHao	91bd76a902	[enhancement](FE) use forEach() to replace stream().forEach() (#14039 )	2022-11-21 15:40:43 +08:00
jiafeng.zhang	7fedfdcf6a	[fix](spark load)The where condition does not take effect when spark load loads the file (#13803 )	2022-11-01 23:01:45 +08:00
liujinhui	60d5e4dfce	[improvement](spark-load) support parquet and orc file (#13438 ) Add support for parquet/orc in SparkDpp.java Fixed sparkDpp checkstyle issue	2022-10-20 08:59:22 +08:00
DingGeGe	7aae98eb71	[fix](comment) sparkload comment mislead which file types it support (#12982 )	2022-09-29 20:23:57 +08:00
HouRong	f0cde35ea6	[performance improvement] Spark Load, SparkDpp processRDDAggregate performance improvement (#12186 ) Co-authored-by: hourong <hourong@zhihu.com>	2022-08-31 09:14:13 +08:00
jiafeng.zhang	915d8989c5	[feature](spark-load)Spark load supports string type data import (#11927 )	2022-08-22 08:56:59 +08:00
jakevin	976e7685db	[minor](*): remove redundant log and unused code. (#11620 )	2022-08-10 19:28:04 +08:00
Gabriel	e769597fd2	[Improvement] (datetime) support microsecond for date literal (#10917 ) * [Improvement] (datetime) support microsecond for date literal * remove joda dependency	2022-07-18 21:39:39 +08:00
Gabriel	3b46242483	[feature-wip] Optimize Decimal type (#10794 ) * [feature-wip](decimalv3) support decimalv3 * [feature-wip] Optimize Decimal type Co-authored-by: liaoxin <liaoxinbit@126.com>	2022-07-14 10:50:50 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916 )	2022-06-26 16:07:56 +08:00
morrySnow	b7b78ae707	[style](fe)the last step of fe CheckStyle (#10134 ) 1. fix all checkstyle warning 2. change all checkstyle rules to error 3. remove some java doc rules a. RequireEmptyLineBeforeBlockTagGroup b. JavadocStyle c. JavadocParagraph 4. suppress some rules for old codes a. all java doc rules only affect on Nereids b. DeclarationOrder only affect on Nereids c. OverloadMethodsDeclarationOrder only affect on Nereids d. VariableDeclarationUsageDistance only affect on Nereids e. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java f. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java g. suppress LineLength on org/apache/doris/catalog/FunctionSet.java h. suppress LineLength on org/apache/doris/common/ErrorCode.java	2022-06-17 21:02:45 +08:00
morrySnow	e701c057dc	[style](fe) wrap and whitespace rules (#9764 ) change below rules' severity to error and fix original code error: - EmptyBlock - EmptyCatchBlock - LeftCurly - RightCurly - IllegalTokenText - MultipleVariableDeclarations - OneStatementPerLine - StringLiteralEquality - UnusedLocalVariable - Indentation - OuterTypeFilename - MethodParamPad - GenericWhitespace - NoWhitespaceBefore - OperatorWrap - ParenPad - WhitespaceAfter - WhitespaceAround	2022-05-26 16:56:20 +08:00
Shuangchi He	77297bb7ee	Fix some typos in fe/. (#9682 )	2022-05-23 12:11:01 +08:00
spaces-x	c048b1f0f9	[fix](sparkload): fix min_value will be negative number when `maxGlobalDictValue` exceeds integer range (#9436 )	2022-05-19 23:56:24 +08:00
morrySnow	235d586f11	[style](fe) code correct rules and name rules (#9670 ) * [style](fe) code correct rules and name rules * revert some change according to comments	2022-05-19 16:36:03 +08:00
morrySnow	8a0097cfb9	[style](java) format fe code with some check rules (#9460 ) Issue Number: close #9403 set below rules' severity to error and format code according check info. a. Merge conflicts unresolved b. Avoid using corresponding octal or Unicode escape c. Avoid Escaped Unicode Characters d. No Line Wrap e. Package Name f. Type Name g. Annotation Location h. Interface Type Parameter i. CatchParameterName j. Pattern Variable Name k. Record Component Name l. Record Type Parameter Name m. Method Type Parameter Name n. Redundant Import o. Custom Import Order p. Unused Imports q. Avoid Star Import r. tab character in file s. Newline At End Of File t. Trailing whitespace found	2022-05-12 20:14:38 +08:00
leo65535	d1b85d51a0	[code style](fe) Include test sources (#9366 ) Include test sources, we also need to check them.	2022-05-09 09:40:44 +08:00
Shuo Wang	1746f61388	[refactor](test) Refactor FE unit test framework that starts a FE server. (#9388 ) Currently, we use `UtFrameUtils` to start a FE server in the FE unit test. Each test class has to do some initialization and clean up stuff with the JUnit4 `@BeforeClass` and `@AfterClass` annotation. It's redundant and boring. Besides, almost all the APIs in `UtFrameUtils` has a `ConnectContext` parameter, which is not easy to use. This PR proposes to use an inherit-manner, i.e., wrap all the common logic in base class `TestWithFeService`, leveraging the JUnit5 `@BeforeAll` and `@AfterAll` annotation to narrow down the setup and cleanup lifecycle to each test class instance. At the same time, the derived concrete test class could directly use utility methods inherited from the base class, without calling a util class and passing a `ConnectContext` argument. `UtFrameUtils` and `DorisAssert` are marked as deprecated. We could remove these two classes if this refactor works well for a time.	2022-05-07 21:28:42 +08:00
leo65535	c5941fd166	[FE Code Style][sub] Adjust some check rules (#9345 ) Adjust `RedundantImport`,`UnusedImports`,`EmptyStatement`,`NewlineAtEndOfFile`,`UpperEll`, `AvoidStarImport`, `MissingOverride` rules.	2022-05-04 23:34:55 +08:00
morrySnow	784681f106	[FE Code Style][step 0]add github action to check incremental code in pr (#9328 ) 1. add rules to checkstyle 2. add github action to check incremental code in pr	2022-05-01 17:30:29 +08:00
spaces-x	62b38d7a75	[fix](spark load) fix `getHashValue` of string type is always zero in spark load. (#9136 ) Buffer flip is used incorrectly. When the hash key is string type, the hash value is always zero. The reason is that the buffer of string type is obtained by wrap, which is not needed to flip. If we do so, the buffer limit for read will be zero.	2022-04-26 10:14:21 +08:00
spaces-x	39c0fec680	[fix] fix bug when partition_id exceeds integer range in spark load (#9073 )	2022-04-20 14:50:55 +08:00
Mingyu Chen	50a59f3f86	[license] Organize third-party dependent licenses for binary releases (#8350 )	2022-03-07 23:18:58 +08:00
Zhengguo Yang	4bdeef3b64	[chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804 ) 1. fix problems when build fe_plugins 2. format 3. add docs about dump data using mysql dump	2022-01-26 09:11:23 +08:00
Zhengguo Yang	738d2d2e07	[refactor] update parent pom version and optimize build scripts (#7548 )	2022-01-05 10:45:11 +08:00
Zhengguo Yang	2872dbfeb8	[refactor] Standardize the writing of pom files, prepare for deployment to maven (#7477 )	2021-12-30 10:16:37 +08:00
caiconghui	382351b0ee	[fix](ut) Fix run fe ut failed, be ut memory leak and build thirdparty failed (#7377 )	2021-12-15 11:00:20 +08:00
Zhengguo Yang	926540c561	[feature] Support return bitmap/hll data in select statement (#7276 ) Support returning bitmap/hll data in select statement; this can be used when show_object_data=true is set.	2021-12-15 09:48:27 +08:00
Zhengguo Yang	d420ff0afd	display current load bytes to show load progress, (#7134 ) this value may be greater than the file size when loading a parquet or orc file, and less than the file size when loading a csv file.	2021-11-24 10:08:32 +08:00
lihuigang	e9282205f1	[feat-opt](spark-load) support bitmap binary data from hive in spark load (#6883 ) Support to load the binary data of bitmap value from Hive into Doris. fix #6461	2021-11-20 21:38:38 +08:00
lihuigang	35da149ebe	[SparkDpp]Add not() and xor() methods to bitmapValue (#6885 ) Add not() and xor() methods to bitmapValue	2021-11-12 10:38:15 +08:00
dohongdayi	ea17682d1f	[Typo] Correct misspellings in SparkDpp (#6789 ) Correct misspellings in SparkDpp	2021-10-10 23:07:39 +08:00
Xiang Wei	6ac0ab6b29	fix(sparkload): bitmap deep copy in `or` operator (#6480 ) * fix(sparkload): bitmap deep copy in `or` operator fix multi rollup hold the same Ref of bitmapvalue which may be updated repeatedly. Co-authored-by: weixiang <weixiang06@meituan.com>	2021-09-02 12:15:02 +08:00
Xiang Wei	52f39e3fde	[Bug][SparkLoad]: bitmap value in `or` operator in spark load should be deep copied (#6453 ) fix multi rollup hold the same Ref of bitmapvalue which may be updated repeatedly. fix #6452	2021-08-19 14:17:31 +08:00
Xiang Wei	60ac4a9660	[Bug][SparkLoad] Fix bucket_hash_value for bool value (#6284 ) Co-authored-by: weixiang <weixiang06@meituan.com>	2021-07-27 13:38:42 +08:00
wangbo	ba84eacb8c	(#6009 ) fix bucket key distribute error when using spark load (#6087 )	2021-06-29 12:30:08 +08:00
Xiang Wei	9f706848b9	[Bug] Fix some bugs about Spark Load (#5701 ) The distinct count result of bitmap/hll column may be incorrect in the spark load mode. Fix some bugs in spark load to solve the above problem. 1. FE is big-endian but BE is little-endian. BitmapValues should be converted to little-endian in FE's serialization 2. BitmapUnionAggregator/HllUnionAggregator ignore `null` value 3. Make sure encodeVarint64 in FE is consistent with BE Co-authored-by: weixiang <weixiang06@meituan.com>	2021-05-07 11:18:23 +08:00
zh0122	18c2553ef8	[FE][Bug] Update Spark version to fix a security issue (#5593 ) Fix CVE-2020-9480: Apache Spark RCE vulnerability in auth-enabled standalone master https://spark.apache.org/security.html#CVE-2020-9480	2021-04-06 11:02:04 +08:00
copperybean	d8202ca9cc	[Enhancement] move common codes from fe-core to fe-common and remove log4j1 (#5317 ) (#5318 ) The IO-related code may be used by new modules, so it's better to move it to fe-common. fe-core is modified frequently, and the many Java files generated by thrift slow down its compilation, so it's better to move the thrift generation process to fe-common. Currently both log4j1 and log4j2 are used, which leads to logs being written to the wrong files. This change removes log4j1 from the dependencies and uses slf4j + slf4j -> log4j2 instead.	2021-02-04 13:41:03 +08:00
wangbo	41ef9ccda9	(#5224 )some little fix for spark load (#5233 ) * (#5224)some little fix for spark load * 1 use yyyy-MM-dd instead of YYYY-MM-DD 2 unify lower case for bitmap column name	2021-01-27 11:16:59 +08:00
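
Commit 62b38d7a75 in the list above attributes the always-zero string hash to calling `flip()` on a buffer obtained from `wrap`. The following is a minimal sketch of that standard `java.nio.ByteBuffer` behavior only; it is not the Doris `getHashValue` code, and the class and method names (`WrapFlipDemo`, `remainingAfterWrap`, and so on) are hypothetical, chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Doris code base.
public class WrapFlipDemo {

    // A wrapped buffer starts with position = 0 and limit = array length,
    // so it is already readable without a flip.
    static int remainingAfterWrap(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        return buf.remaining();
    }

    // flip() sets limit = current position and position = 0.
    // Right after wrap() the position is 0, so the limit becomes 0
    // and nothing is left to read.
    static int remainingAfterWrapAndFlip(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        buf.flip();
        return buf.remaining();
    }

    // flip() is the right call after writing with put(): the position has
    // advanced past the written bytes, so the limit lands at the end of them.
    static int remainingAfterPutAndFlip(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        buf.put(data);
        buf.flip();
        return buf.remaining();
    }

    public static void main(String[] args) {
        byte[] key = "doris".getBytes(StandardCharsets.UTF_8);
        System.out.println("wrap only:     " + remainingAfterWrap(key));
        System.out.println("wrap + flip:   " + remainingAfterWrapAndFlip(key));
        System.out.println("put + flip:    " + remainingAfterPutAndFlip(key));
    }
}
```

A hash routine that reads only the remaining bytes of the buffer would see an empty input in the wrap-then-flip case, which is consistent with the symptom the commit message describes.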

58 Commits