doris

Author	SHA1	Message	Date
Zhengguo Yang	e023ef5404	[Load] Support multi bytes LineDelimiter and ColumnSeparator (#5462 ) * [Internal][Support Multibytes Separator] doris-1079 support multi bytes LineDelimiter and ColumnSeparator	2021-03-09 09:35:39 +08:00
Zhengguo Yang	6ede4c6ec1	[Feature] Support backup,restore,load,export directly connect to s3 (#5399 ) * [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol * Internal][S3DirectAccess] Support backup,restore,load,export directlyconnect to s3 1. Support load and export data from/to s3 directly. 2. Add a config to auto convert broker access to s3 acces when available Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7 (cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201) * [Internal][S3DirectAccess] File path glob compatible with broker Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68 (cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f) * [internal] [doris-1008] fix log4j class not found Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3 (cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232) * add poms Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>	2021-02-22 16:07:56 +08:00
Mingyu Chen	780900ac9c	[Feature] Support preceding filter original data when loading (#5338 ) Support conditional filtering of original data in broker load and routine load eg: ``` LOAD LABEL `label1` ( DATA INFILE ('bos://cmy-repo/1.csv') INTO TABLE tbl2 COLUMNS TERMINATED BY '\t' (event_day, product_id, ocpc_stage, user_id) SET ( ocpc_stage = ocpc_stage + 100 ) PRECEDING FILTER user_id = 1381035 WHERE ocpc_stage > 30 ) ... ```	2021-02-07 22:37:48 +08:00
Mingyu Chen	cd96ded1ad	[Bugs] Fix bugs that FE heartbeat api of httpv2 does not return version info (#5306 ) Co-authored-by: morningman <chenmingyu@baidu.com>	2021-01-30 20:34:33 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
Mingyu Chen	3f31866169	[Bug][Load][Json] #4124 Load json format with stream load failed (#4217 ) Stream load should read all the data completely before parsing the json. And also add a new BE config streaming_load_max_batch_read_mb to limit the data size when loading json data. Fix the bug of loading empty json array [] Add doc to explain some certain case of loading json format data. Fix: #4124	2020-08-04 12:55:53 +08:00
Mingyu Chen	c3d9feed75	[Load][Json] Refactor json load logic to make it more reasonable (#4020 ) This CL mainly changes: 1. Reorganized the code logic to limit the supported json format to two, and the import behavior is more consistent. 2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly. 3. See `load-json-format.md` to get details of loading json format.	2020-07-07 23:07:28 +08:00
worker24h	ef8fd1fcbe	[Load] Support load json-data into Doris by RoutineLoad or StreamLoad (#3553 ) Doris support load json-data by RoutineLoad or StreamLoad	2020-05-21 13:00:49 +08:00
yangzhg	852046de29	Fix incompatibility with arm architecture in olap #2645 (#2682 )	2020-01-07 19:16:10 +08:00
caiconghui	043a9528f7	Support decompressing csv file with deflate format in hdfs broker load (#2583 )	2019-12-27 08:06:22 +08:00
HangyuanLiu	35b2800542	Keep num_of_columns_from_file incompatibile with 0.10 protocol (#2187 ) After checking, I found that broker load in 0.11 added num_of_columns_from_file parameter in thrift. This parameter does not consider compatibility in BE. So broker load could cause BE crashed during the upgrade	2019-11-13 22:04:15 +08:00
kangkaisen	cd2b8373c2	Fix Stream load double NumberTotalRows (#1664 )	2019-08-19 12:23:43 +08:00
yuanli	ba6d728f26	Enable parsing columns from file path for Broker Load (#1582 ) (#1635 ) Currently, we do not support parsing encoded/compressed columns in file path, eg: extract column k1 from file path /path/to/dir/k1=1/xxx.csv This patch is able to parse columns from file path like in Spark(Partition Discovery). This patch parse partition columns at BrokerScanNode.java and save parsing result of each file path as a property of TBrokerRangeDesc, then the broker reader of BE can read the value of specified partition column.	2019-08-19 09:39:21 +08:00
Mingyu Chen	51c92a0bec	Validate the UTF-8 encode of loading data (#1457 ) Currently, Doris only support UTF-8 encoded data. All data will be shown to user in UTF-8 format. So if data loaded in Doris does not UTF-8 encoded, user will see garbled data when querying. I introduce a fast UTF-8 validator from https://github.com/lemire/fastvalidate-utf-8 This validator is highly optimized that it only takes 0.7 CPU cycles to validata a 64k string. And by testing 1GB data load to Doris, the validator has no impact on performance.	2019-07-11 09:46:38 +08:00
worker24h	7eab12a40e	Support reading Parquet file when loading data (#1173 )	2019-07-01 18:39:27 +08:00
ZHAO Chun	9d03ba236b	Uniform Status (#1317 )	2019-06-14 23:38:31 +08:00
EmmyMiao87	53062122ea	Change strategy of incorrect data (#1255 ) This change adds a load property named strict_mode which is used to prohibit the incorrect data. When it is set to false, the incorrect data will be loaded by NULL just like before. When it is set to true, the incorrect data which belongs to a column without expr will be filtered. The strict_mode is supported in broker load v2 now. It will be supported in stream load later.	2019-06-10 20:39:45 +08:00
Mingyu Chen	b7b66527ce	Fix some load bugs (#961 ) 1. Use load job's timeout as its txn timeout 2. Add a new session variable 'forward_to_master' for SHOW PROC and ADMIN stmt	2019-04-28 10:33:50 +08:00
Mingyu Chen	2b4d02b2fa	Add error load log url for routine load job (#938 )	2019-04-28 10:33:50 +08:00
Mingyu Chen	8474061d63	Add some logs (#711 )	2019-04-28 10:33:50 +08:00
Mingyu Chen	0820a29b8d	Implement the routine load process of Kafka on Backend (#671 )	2019-04-28 10:33:50 +08:00
李超勇	6b4049e21c	Unify Slice code path (#380 )	2018-12-03 18:11:47 +08:00
chenhao7253886	37b4cafe87	Change variable and namespace name in BE (#268 ) Change 'palo' to 'doris'	2018-11-02 10:22:32 +08:00
morningman	2868793b6b	Change license to Apache License 2.0 (#262 )	2018-11-01 09:06:01 +08:00
morningman	051aced48d	Missing many files in last commit In last commit, a lot of files has been missed	2018-10-31 16:19:21 +08:00
morningman	cc74efb3c5	merge to ddb65b69f9c788e359e191889cb31f15279c41ec (#224 ) 1. Apache HDFS broker support HDFS HA and Hadoop kerberos authentication. 2. New Backup and Restore function. Use Fs Broker to backup your data to HDFS or restore them from HDFS. 3. Table-Level Privileges. Grant fine-grained privileges on table-level to specified user. 4. A lot of bugs fixed. 5. Performance improvement.	2018-08-24 17:12:26 +08:00
李超勇	6486be64c3	fix license statement (#29 ) * change picture to word * change picture to word * SHOW FULL TABLES WHERE Table_type != VIEW sql can not execute * change license description	2017-08-18 19:16:23 +08:00
cyongli	e2311f656e	baidu palo	2017-08-11 17:51:21 +08:00

30 Commits