When executing broker load in ASAN mode, BE may crash with an error:
```
F20231010 18:18:17.044978 185490 block.cpp:694] Check failed: d.column->use_count() == 1 (3 vs. 1)
*** Check failure stack trace: ***
@ 0x55e9d94c4e46 google::LogMessage::SendToLog()
@ 0x55e9d94c1410 google::LogMessage::Flush()
@ 0x55e9d94c5689 google::LogMessageFatal::~LogMessageFatal()
@ 0x55e9c509f80d doris::vectorized::Block::clear_column_data()
@ 0x55e9b6c170b3 doris::PlanFragmentExecutor::get_vectorized_internal()
@ 0x55e9b6c147e6 doris::PlanFragmentExecutor::open_vectorized_internal()
@ 0x55e9b6c12d9a doris::PlanFragmentExecutor::open()
@ 0x55e9b6c18426 doris::PlanFragmentExecutor::execute()
@ 0x55e9b6945cca doris::FragmentMgr::_exec_actual()
@ 0x55e9b696456c doris::FragmentMgr::exec_plan_fragment()::$_0::operator()()
```
It may happen when there is a column mapping like:
```
(k1,v2,v3,v4,v5,v6,v7,v8)
set (k2=v4,k3=v4,k4=v4)
```
in the load statement.
This case is covered by Baidu test cases.
should check that the pb's type is set, otherwise the deserialization will core dump.
should not return an unknown type, because the deserialization will core dump.
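Below is a minimal, self-contained sketch of the defensive check described above. `PColumnMeta`, its fields, and `check_column_type` are illustrative stand-ins, not the actual Doris protobuf or API:
```cpp
#include <iostream>
#include <optional>
#include <string>

// Stand-in for a generated protobuf message: the optional<> mimics
// proto2's has_type() presence check (hypothetical, for illustration only).
struct PColumnMeta {
    enum Type { UNKNOWN = 0, INT32 = 1, STRING = 2 };
    std::optional<Type> type;
    bool has_type() const { return type.has_value(); }
};

// Reject an unset or unknown type instead of proceeding, which is
// what previously led to a core dump during deserialization.
bool check_column_type(const PColumnMeta& pb, std::string* err) {
    if (!pb.has_type()) {
        *err = "column type is not set in protobuf";
        return false;
    }
    if (*pb.type == PColumnMeta::UNKNOWN) {
        *err = "column type is UNKNOWN";
        return false;
    }
    return true;
}

int main() {
    PColumnMeta meta;  // type intentionally left unset
    std::string err;
    if (!check_column_type(meta, &err)) {
        std::cerr << "reject deserialization: " << err << "\n";
    }
    return 0;
}
```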
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
After the outer exception is caught, faststring resize/reserve/build may throw a memory-allocation-failure exception from the Allocator.
Currently, page body compression catches the memory-allocation-failure exception.
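Below is a minimal sketch of the pattern described above, assuming a `faststring`-like buffer whose resize can fail with `std::bad_alloc`; `compress_page_body` and the buffer type are illustrative, not the actual Doris code:
```cpp
#include <iostream>
#include <new>
#include <vector>

// Hypothetical stand-in for faststring: resize() may throw when the
// underlying allocator fails (modeled here with std::bad_alloc).
using faststring = std::vector<unsigned char>;

// The page-body compression step wraps the buffer resize in a try/catch so
// an allocation failure becomes an error instead of escaping past an
// already-finished outer catch.
bool compress_page_body(const faststring& body, faststring* out) {
    try {
        out->resize(body.size());  // may throw on allocation failure
        // ... run the real compression into *out here ...
        return true;
    } catch (const std::bad_alloc&) {
        std::cerr << "memory alloc failed while compressing page body\n";
        return false;
    }
}

int main() {
    faststring body(128, 0x42), out;
    std::cout << (compress_page_body(body, &out) ? "ok" : "failed") << "\n";
    return 0;
}
```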
disallow calling the new method explicitly
force the use of create_shared or create_unique to obtain shared/unique pointers
placement new is still allowed
following https://abseil.io/tips/42, add a factory method to every class
I think we should follow this guide because if an exception is thrown in the new method, the program will terminate.
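Below is a minimal sketch of the factory pattern referenced above, using a private constructor so that `new SegmentWriter(...)` cannot be written at call sites; the class name is hypothetical, and Doris's actual mechanism (a macro) may differ in detail:
```cpp
#include <iostream>
#include <memory>
#include <utility>

class SegmentWriter {
public:
    // Factories are the only way to obtain an instance.
    template <typename... Args>
    static std::shared_ptr<SegmentWriter> create_shared(Args&&... args) {
        return std::shared_ptr<SegmentWriter>(new SegmentWriter(std::forward<Args>(args)...));
    }
    template <typename... Args>
    static std::unique_ptr<SegmentWriter> create_unique(Args&&... args) {
        return std::unique_ptr<SegmentWriter>(new SegmentWriter(std::forward<Args>(args)...));
    }

private:
    // The constructor is private, so `new SegmentWriter(...)` cannot be
    // called outside the class; callers must go through the factories above.
    explicit SegmentWriter(int segment_id) : _segment_id(segment_id) {}
    int _segment_id;
};

int main() {
    auto w1 = SegmentWriter::create_shared(1);
    auto w2 = SegmentWriter::create_unique(2);
    std::cout << "created " << ((w1 && w2) ? 2 : 0) << " writers\n";
    return 0;
}
```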
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, there are some unnecessary includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By enforcing a strict include-what-you-use policy, we gain a lot of benefits.
There are many type definitions in BE. We should unify the type system and simplify development.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of the newly generated columns
2. introduce a new expression `SchemaChangeExpr` to perform schema changes in an extensible way
In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. The scan process was:
1. Read part of the columns first, and compute the row ids with the simple push-down predicates.
2. Use the row ids to read the remaining columns and pass them to the scanner, and the scanner filters with the remaining predicates.
This PR also pushes the remaining predicates (functions, nested predicates, ...) down from the scanner to the storage layer for filtering (a minimal sketch of the new flow follows this list). The new scan process is:
1. Read part of the columns first, and compute the row ids with the simple push-down predicates (same as above).
2. Use the row ids to read the columns needed by the remaining predicates, and use the pushed-down remaining predicates to further reduce the row ids.
3. Use the row ids to read the remaining columns and pass them to the scanner.
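A minimal, self-contained sketch of the new scan flow, with columns modeled as plain vectors and predicates as `std::function`; the function names are illustrative and do not correspond to the actual Doris segment iterator API:
```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

using RowIds = std::vector<uint32_t>;
using Column = std::vector<int64_t>;
using Predicate = std::function<bool(int64_t)>;

// Step 1: evaluate the simple push-down predicate on its column
// and collect the surviving row ids.
RowIds eval_simple_predicate(const Column& col, const Predicate& pred) {
    RowIds ids;
    for (uint32_t i = 0; i < col.size(); ++i) {
        if (pred(col[i])) ids.push_back(i);
    }
    return ids;
}

// Step 2 (new in this PR): read only the columns needed by the remaining
// (complex) predicates for the surviving row ids, and shrink the row id set.
RowIds eval_remaining_predicate(const Column& col, const Predicate& pred, const RowIds& in) {
    RowIds ids;
    for (uint32_t row : in) {
        if (pred(col[row])) ids.push_back(row);
    }
    return ids;
}

// Step 3: read the remaining columns only for the final row ids.
Column fetch_rows(const Column& col, const RowIds& ids) {
    Column out;
    for (uint32_t row : ids) out.push_back(col[row]);
    return out;
}

int main() {
    Column k1 = {1, 5, 9, 13}, k2 = {2, 4, 6, 8}, v = {10, 20, 30, 40};
    RowIds ids = eval_simple_predicate(k1, [](int64_t x) { return x > 3; });        // k1 > 3
    ids = eval_remaining_predicate(k2, [](int64_t x) { return x % 4 == 0; }, ids);  // complex predicate on k2
    for (int64_t x : fetch_rows(v, ids)) std::cout << x << " ";  // only rows passing both predicates
    std::cout << "\n";
    return 0;
}
```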
Currently, when filtering a column, a new column is created to store the filtering result, which causes some performance loss. ssb-flat without push-down exprs improves from 19s to 15s.
Issue Number: close #16351
A dynamic schema table is a special type of table whose schema changes during the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is schema self-describing, we can extract schema information from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap since it is only used in the storage predicate process
remove some code in tuple; the Tuple structure should be removed in the future
remove a lot of code in the collection value structure, since it is useless
This PR optimizes TopN queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.
TopN is composed of a SortNode and a ScanNode. When the user table is wide (100+ columns), the ORDER BY clause usually covers only a few columns, but the ScanNode has to scan all the data from the storage engine even if the limit is very small. This can lead to a lot of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase we only need to read `columnA`'s data from the storage engine along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode because it is the central node for the TopN nodes in the cluster. The ExchangeNode spawns RPCs to the other nodes using the RowIds (sorted and limited by the SortNode) read in the first phase, and reads the rows one by one from the storage engine.
After the second-phase read, the Block contains all the data needed by the query; a toy sketch of the two-phase flow is shown below.
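A toy, single-node sketch of the two-phase idea, with row ids modeled as vector indices standing in for `__DORIS_ROWID_COL__` and no RPC or ExchangeNode modeled; all names are illustrative:
```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Toy storage: one narrow sort column plus the other ("wide") payload columns.
struct Storage {
    std::vector<int64_t> columnA;              // the ORDER BY column
    std::vector<std::vector<int64_t>> others;  // remaining columns, indexed by row id
};

// Phase 1: scan only columnA plus an implicit row id (the vector index here),
// then sort and apply LIMIT n.
std::vector<uint32_t> phase1_topn_rowids(const Storage& s, size_t n) {
    std::vector<uint32_t> row_ids(s.columnA.size());
    for (uint32_t i = 0; i < row_ids.size(); ++i) row_ids[i] = i;
    size_t k = std::min(n, row_ids.size());
    std::partial_sort(row_ids.begin(), row_ids.begin() + k, row_ids.end(),
                      [&](uint32_t a, uint32_t b) { return s.columnA[a] < s.columnA[b]; });
    row_ids.resize(k);
    return row_ids;
}

// Phase 2: fetch the remaining wide columns only for the winning row ids
// (in the real plan this fetch is an RPC issued by the ExchangeNode).
std::vector<std::vector<int64_t>> phase2_fetch_rows(const Storage& s,
                                                    const std::vector<uint32_t>& row_ids) {
    std::vector<std::vector<int64_t>> rows;
    for (uint32_t id : row_ids) rows.push_back(s.others[id]);
    return rows;
}

int main() {
    Storage s{{9, 3, 7, 1}, {{90, 91}, {30, 31}, {70, 71}, {10, 11}}};
    auto ids = phase1_topn_rowids(s, 2);  // ORDER BY columnA ASC LIMIT 2
    for (const auto& row : phase2_fetch_rows(s, ids)) {
        std::cout << row[0] << "," << row[1] << "\n";  // prints 10,11 then 30,31
    }
    return 0;
}
```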
Read the predicate columns first, and use the VExprContext (push-down predicates)
to generate the select vector, which is then applied when reading the non-predicate columns.
Data in the non-predicate columns may be skipped according to the select vector, so the value-decode time can be reduced.
If a whole page can be skipped, the decompress time can also be reduced.
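A toy sketch of how a select vector can skip value decoding and whole pages, assuming fixed-size pages and a sorted select vector; names and layout are illustrative, not the actual Doris page format:
```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Toy model: a column stored as fixed-size pages; the select vector lists
// the row ids that survived the push-down predicates (sorted ascending).
constexpr size_t kPageSize = 4;

// Decode only the selected rows; if no selected row falls inside a page,
// the whole page is skipped (no decompression, no value decoding).
std::vector<int64_t> read_with_select_vector(const std::vector<std::vector<int64_t>>& pages,
                                             const std::vector<uint32_t>& select_vector) {
    std::vector<int64_t> out;
    size_t pos = 0;  // cursor into the sorted select vector
    for (size_t p = 0; p < pages.size(); ++p) {
        size_t page_begin = p * kPageSize;
        size_t page_end = page_begin + kPageSize;
        if (pos >= select_vector.size() || select_vector[pos] >= page_end) {
            continue;  // whole page skipped: saves decompress and decode time
        }
        // "Decompress" the page (already plain data here) and decode only the selected rows.
        for (; pos < select_vector.size() && select_vector[pos] < page_end; ++pos) {
            out.push_back(pages[p][select_vector[pos] - page_begin]);
        }
    }
    return out;
}

int main() {
    std::vector<std::vector<int64_t>> pages = {{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}};
    std::vector<uint32_t> select_vector = {1, 9, 11};  // rows kept by the predicate columns
    for (int64_t v : read_with_select_vector(pages, select_vector)) std::cout << v << ' ';
    std::cout << '\n';  // prints: 1 9 11  (page 1 is skipped entirely)
    return 0;
}
```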
# Proposed changes
This PR fixes many issues encountered when building from source on macOS with the Apple M1 chip.
## ATTENTION
The job of supporting macOS with the Apple M1 chip is very large, and there are many unresolved issues at runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...
Temporarily, the following changes are made on macOS so that BE can start successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.
This PR kicks off the job. Anyone interested in this work can continue to fix these runtime issues.
## Use case
```shell
./build.sh -j 8 --be --clean
cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```
## Something else
It takes around _**10+**_ minutes to build BE (with prebuilt third parties) on macOS with the M1 chip. We will greatly improve the development experience on macOS once the adaptation job is finished.