Fix the mistake for HLL in mini load (#1981)

[Docs] Fix mistakes for HLL column in mini load
Author: EmmyMiao87
Date: 2019-10-14 19:46:23 +08:00
Committed by: ZHAO Chun
Parent: ccc236484b
Commit: b84ef013eb
4 changed files with 20 additions and 11 deletions


@@ -37,11 +37,11 @@
2. Import data; see the relevant help curl for how to import it.
a. Generate the HLL column from columns in the table
curl --location-trusted -uname:password -T data http://host/api/test_db/test/_load?label=load_1\&hll=set1,id:set2,name
curl --location-trusted -uname:password -T data -H "label:load_1" -H "columns:dt, id, name, province, os, set1=hll_hash(id), set2=hll_hash(name)"
http://host/api/test_db/test/_stream_load
b. Generate the HLL column from a column in the data
curl --location-trusted -uname:password -T data http://host/api/test_db/test/_load?label=load_1\&hll=set1,cuid:set2,os
\&columns=dt,id,name,province,sex,cuid,os
curl --location-trusted -uname:password -T data -H "label:load_1" -H "columns:dt, id, name, province, sex, cuid, os, set1=hll_hash(cuid), set2=hll_hash(os)"
http://host/api/test_db/test/_stream_load
3. Aggregate the data, commonly in one of three ways: (if you query the base table directly without aggregating, the speed may be about the same as using ndv directly)
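
As a note, the corrected stream load call from step 2.a reads more easily as a single command; a sketch below, where name:password, host, and the data file are the doc's placeholders:

# Corrected stream load for 2.a: set1/set2 are derived from table columns id/name.
curl --location-trusted -uname:password -T data \
    -H "label:load_1" \
    -H "columns:dt, id, name, province, os, set1=hll_hash(id), set2=hll_hash(name)" \
    http://host/api/test_db/test/_stream_load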


@@ -95,6 +95,9 @@
6. Load into a table containing HLL columns; either columns of the table or columns in the data can be used to generate the HLL columns (the user is in default_cluster)
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123\&max_filter_ratio=0.2
\&columns=k1,k2,k3\&hll=hll_column1,k1:hll_column2,k2
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123\&max_filter_ratio=0.2
\&hll=hll_column1,tmp_k4:hll_column2,tmp_k5\&columns=k1,k2,k3,tmp_k4,tmp_k5
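
A one-line form of the second example may be easier to copy; quoting the URL avoids the \& escaping (host:port, testDb/testTbl, and the column names are the doc's placeholders):

# Mini load where tmp_k4/tmp_k5 exist only in the data and feed the HLL columns;
# note that every source column named in hll= also appears in columns=.
curl --location-trusted -u root -T testData \
    'http://host:port/api/testDb/testTbl/_load?label=123&max_filter_ratio=0.2&hll=hll_column1,tmp_k4:hll_column2,tmp_k5&columns=k1,k2,k3,tmp_k4,tmp_k5'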


@@ -36,12 +36,15 @@ distributed by hash(id) buckets 32;
2. Import data. See help curl for how to import it.
A. Generate HLL columns using columns in tables
curl --location-trusted -uname:password -T data http://host/api/test_db/test/_load?label=load_1\&hll=set1,id:set2,name
A. Generate HLL columns using columns in tables
B. Generate HLL columns using a column in the data
curl --location-trusted -uname:password -T data http://host/api/test_db/test/_load?label=load_1\&hll=set1,cuid:set2,os
\&columns=dt,id,name,province,sex,cuid,os
curl --location-trusted -uname:password -T data -H "label:load_1" -H "columns:dt, id, name, province, os, set1=hll_hash(id), set2=hll_hash(name)"
http://host/api/test_db/test/_stream_load
B. Generate HLL columns using a column in the data
curl --location-trusted -uname:password -T data -H "label:load_1" -H "columns:dt, id, name, province, sex, cuid, os, set1=hll_hash(cuid), set2=hll_hash(os)"
http://host/api/test_db/test/_stream_load
3. There are three common ways of aggregating data (if you query the base table directly without aggregating, the speed may be similar to using NDV directly):
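
Likewise, the corrected case B call as one command; here the source columns cuid and os come from the data file rather than the table (same placeholders as above):

# Corrected stream load for case B: hll_hash is applied to data columns cuid/os.
curl --location-trusted -uname:password -T data \
    -H "label:load_1" \
    -H "columns:dt, id, name, province, sex, cuid, os, set1=hll_hash(cuid), set2=hll_hash(os)" \
    http://host/api/test_db/test/_stream_load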


@@ -92,8 +92,11 @@ seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root
6. Load into a table containing HLL columns; either columns of the table or columns in the data can be used to generate the HLL columns (the user is in default_cluster)
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123\&max_filter_ratio=0.2
\&hll=hll_column1,tmp_k4:hll_column2,tmp_k5\&columns=k1,k2,k3,tmp_k4,tmp_k5
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123\&max_filter_ratio=0.2\&hll=hll_column1,k1:hll_column2,k2
\&columns=k1,k2,k3
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123\&max_filter_ratio=0.2
\&hll=hll_column1,tmp_k4:hll_column2,tmp_k5\&columns=k1,k2,k3,tmp_k4,tmp_k5
7. View imports after submission
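
Step 7's check can be done over the FE's MySQL interface once the load is submitted; a minimal sketch, assuming the default query port 9030 and the label 123 from the examples (host and testDb are placeholders):

# Mini load is asynchronous, so poll its state; SHOW LOAD filters by label.
mysql -h host -P 9030 -u root -D testDb \
    -e 'SHOW LOAD WHERE LABEL = "123";'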