数据压测"/>
tpcds数据压测
文章目录
- 1.tpcds数据压测(支持关系数据库)
- TPC资源准备
- SQL语句兼容性测试以及语句修改
1.tpcds数据压测(支持关系数据库)
TPC资源准备
这个部分主要介绍如何从TPC上获得测试所需资源,入1GB数据、500GB数据的生成,建表语句和sql语句的生成等等。
1.官网下载tpc-ds压缩包
参考 git clone .git
2.在tools文件中执行make
[root@node1tpc]# cd /tpcds-kit/tools
[root@node1 tools]# make
注意:若没有安装gcc会报错!
3.在tools目录下使用./dsdgen生成数据。--help查看帮助常用:- DIR 生成目录- SCALE 数据大小(单位GB)- DELIMITER 字段分隔符,默认|- TERMINATE 末尾是否有分隔符,参数 Y或者N例子:在/usr/datas目录下生成分隔符为|且末尾没有分隔符的1GB数据:./dsdgen -DIR /usr/datas -SCALE 1 -TERMINATE N
dsdgen是个单线程程序,一般在测试过程中,会通过指定表名(“-TABLE”参数)以多个进程并发(每个进程对应1张表)的方式来加快生成数据:
./dsdgen -SCALE 1 -DISTRIBUTIONS tpcds.idx -TERMINATE N -TABLE time_dim
因为事实表普遍比较大,所以考虑使用dsdgen通过分块的方式加速生成数据:
./dsdgen -SCALE 1 -DISTRIBUTIONS tpcds.idx -TERMINATE N -TABLE catalog_sales -PARALLEL 10 -CHILD 1
命令中:“-PARALLEL 10”参数表示整个表分成10块,“-CHILD 1”参数表示生成第1块;同时启动10个dsdgen进程,每个进程CHILD编号递增,加速效果就出来了。
如果数据集比较大,可以考虑把事实表分散到多台服务器上的多个进程来生成,比如20台服务器,每台服务器启动32个进程,这样效果就更明显了:4.(./dsqgen)生成sql测试用例--help 查看帮助常用:-input 输入,读取测试用例包含的模板,一般使用/query_templates/templates.lst即可。-directory 模板所在目录, 一般使用-directory../query_templates即可。-dialect 生成某个数据库的语言,可选项可以查看/query_templates目录,有oracle、db2、SqlServer等。-scale 数据大小(单位G)特别说明:/query_templates目录下的每一个query末尾都要加上:由此,可以在tools目录下生成一个query_0.sql,即为生成的sql测试用例
执行:
mkdir sql (tools下)
for id in `seq 1 99`; do ./dsqgen -DIRECTORY ../query_templates -TEMPLATE "query$id.tpl" -DIALECT netezza -FILTER Y > ./sql/"query$id.sql"; done6.建表语句
/tpcds-kit/tools目录下,tpcds.sql文件里。很多数据平台可能不能直接使用,需要修改。建表语句的修改主要是依据不同环境支持的数据类型修改和一些基础语法修正,还需依照生成的数据的分割符在建表时指定分隔符。
7.测试用例结果
在目录/tpcds-kit/answer_sets下。
SQL语句兼容性测试以及语句修改
1.SQL语句兼容性测试
在测试前最好能进行一次99条语句的格式整理,这是有必要的,原因是获取的脚本中可能含有制表符,
完成database的建立,进入这个database进行建表操作。注意,如果不建立新的数据库,操作将在default数据库中进行。
之后可以在少量数据(1GB)条件下,遍历进行SQL语句的支持情况测试(这个测试可以在虚拟机中完成,因为本测试仅测试兼容性,对测试性能要求较低)。
首先,将本地数据load到刚才建立好的表中,如:
Load完成后,检查各表数据是否完整,这之后开始SQL兼容性测试。主要测试方法是,先遍历执行各个测试案例,如果SQL语句能执行完成,进行可完成的记录;若不能完成,则将系统报错记录,直至所有语句测试完成。2.语句修改
通过上阶段的兼容性测试后,可将无法完成SQL的语句的报错进行汇总,将问题分别归类,同类问题一起修改。
每修改一次语句都需要对修改后的语句进行测试,这部分测试,为了效率的保证,也是在少量数据(1GB)的条件下进行测试的。在测试中可能会出现别的错误,这时需要立刻着手解决这个问题。3.TPC-DS测试
在完成SQL语句兼容性测试后,将可运行的语句汇总,编写在一个脚本中。在测试环境中重复上述建表和load过程,500GB数据的导入完成后,执行这个脚本,实现TPC-DS测试。这样做的主要原因是,这些SQL案例测试时间较长,可能需要通过利用夜间时间进行不断电测试,否则,也可以采用如同兼容性测试同样的方法,一条一条进行测试。
在执行load以及SQL语句时,需要将log保留,以hive中执行SQL语句为例,首先切换至root用户,新建一个log文档并改变它的读写权限,然后执行脚本:
若想知道语句的执行情况,可以通过cat语句查询log,在脚本出错时需及时停止执行,导出log,分析出错原因,重新执行或是执行下一条语句。
将所有语句执行完成后,将log导出,进行下一步处理。4.测试结果整理
将log中的各个平台的SQL执行时间数据制成表格,由此可直观的对比各个平台性能的优劣;
要了解具体支持程度,还可以进一步将测试结果正确与否、出错原因进行比对。本测试主要关心的因素有:
1.大表之间的join
2.数据倾斜问题
3.普通action测试
tpcds总共24张表,7张事实表(加粗)、17张维度表.
create table dbgen_version
(dv_version varchar(16) ,dv_create_date date ,dv_create_time TIMESTAMP ,dv_cmdline_args varchar(200)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/dbgen_version.dat' overwrite into table dbgen_version;create table customer_address
(ca_address_sk int ,ca_address_id char(16) ,ca_street_number char(10) ,ca_street_name varchar(60) ,ca_street_type char(15) ,ca_suite_number char(10) ,ca_city varchar(60) ,ca_county varchar(30) ,ca_state char(2) ,ca_zip char(10) ,ca_country varchar(20) ,ca_gmt_offset decimal(5,2) ,ca_location_type char(20)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/customer_address.dat' overwrite into table customer_address;create table customer_demographics
(cd_demo_sk int ,cd_gender char(1) ,cd_marital_status char(1) ,cd_education_status char(20) ,cd_purchase_estimate int ,cd_credit_rating char(10) ,cd_dep_count int ,cd_dep_employed_count int ,cd_dep_college_count int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/customer_demographics.dat' overwrite into table customer_demographics;create table date_dim
(d_date_sk int ,d_date_id char(16) ,d_date date ,d_month_seq int ,d_week_seq int ,d_quarter_seq int ,d_year int ,d_dow int ,d_moy int ,d_dom int ,d_qoy int ,d_fy_year int ,d_fy_quarter_seq int ,d_fy_week_seq int ,d_day_name char(9) ,d_quarter_name char(6) ,d_holiday char(1) ,d_weekend char(1) ,d_following_holiday char(1) ,d_first_dom int ,d_last_dom int ,d_same_day_ly int ,d_same_day_lq int ,d_current_day char(1) ,d_current_week char(1) ,d_current_month char(1) ,d_current_quarter char(1) ,d_current_year char(1)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/date_dim.dat' overwrite into table date_dim;create table warehouse
(w_warehouse_sk int ,w_warehouse_id char(16) ,w_warehouse_name varchar(20) ,w_warehouse_sq_ft int ,w_street_number char(10) ,w_street_name varchar(60) ,w_street_type char(15) ,w_suite_number char(10) ,w_city varchar(60) ,w_county varchar(30) ,w_state char(2) ,w_zip char(10) ,w_country varchar(20) ,w_gmt_offset decimal(5,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/warehouse_1_10.dat' overwrite into table warehouse;create table ship_mode
(sm_ship_mode_sk int ,sm_ship_mode_id char(16) ,sm_type char(30) ,sm_code char(10) ,sm_carrier char(20) ,sm_contract char(20)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/ship_mode.dat' overwrite into table ship_mode;create table time_dim
(t_time_sk int ,t_time_id char(16) ,t_time int ,t_hour int ,t_minute int ,t_second int ,t_am_pm char(2) ,t_shift char(20) ,t_sub_shift char(20) ,t_meal_time char(20)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/time_dim_1_10.dat' overwrite into table time_dim;create table reason
(r_reason_sk int ,r_reason_id char(16) ,r_reason_desc char(100)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/reason.dat' overwrite into table reason;create table income_band
(ib_income_band_sk int ,ib_lower_bound int ,ib_upper_bound int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/income_band.dat' overwrite into table income_band;create table item
(i_item_sk int ,i_item_id char(16) ,i_rec_start_date date ,i_rec_end_date date ,i_item_desc varchar(200) ,i_current_price decimal(7,2) ,i_wholesale_cost decimal(7,2) ,i_brand_id int ,i_brand char(50) ,i_class_id int ,i_class char(50) ,i_category_id int ,i_category char(50) ,i_manufact_id int ,i_manufact char(50) ,i_size char(20) ,i_formulation char(20) ,i_color char(20) ,i_units char(10) ,i_container char(10) ,i_manager_id int ,i_product_name char(50)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/item.dat' overwrite into table item;create table store
(s_store_sk int ,s_store_id char(16) ,s_rec_start_date date ,s_rec_end_date date ,s_closed_date_sk int ,s_store_name varchar(50) ,s_number_employees int ,s_floor_space int ,s_hours char(20) ,s_manager varchar(40) ,s_market_id int ,s_geography_class varchar(100) ,s_market_desc varchar(100) ,s_market_manager varchar(40) ,s_division_id int ,s_division_name varchar(50) ,s_company_id int ,s_company_name varchar(50) ,s_street_number varchar(10) ,s_street_name varchar(60) ,s_street_type char(15) ,s_suite_number char(10) ,s_city varchar(60) ,s_county varchar(30) ,s_state char(2) ,s_zip char(10) ,s_country varchar(20) ,s_gmt_offset decimal(5,2) ,s_tax_precentage decimal(5,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/store.dat' overwrite into table store;create table call_center
(cc_call_center_sk int ,cc_call_center_id char(16) ,cc_rec_start_date date ,cc_rec_end_date date ,cc_closed_date_sk int ,cc_open_date_sk int ,cc_name varchar(50) ,cc_class varchar(50) ,cc_employees int ,cc_sq_ft int ,cc_hours char(20) ,cc_manager varchar(40) ,cc_mkt_id int ,cc_mkt_class char(50) ,cc_mkt_desc varchar(100) ,cc_market_manager varchar(40) ,cc_division int ,cc_division_name varchar(50) ,cc_company int ,cc_company_name char(50) ,cc_street_number char(10) ,cc_street_name varchar(60) ,cc_street_type char(15) ,cc_suite_number char(10) ,cc_city varchar(60) ,cc_county varchar(30) ,cc_state char(2) ,cc_zip char(10) ,cc_country varchar(20) ,cc_gmt_offset decimal(5,2) ,cc_tax_percentage decimal(5,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/call_center.dat' overwrite into table call_center;create table customer
(c_customer_sk int ,c_customer_id char(16) ,c_current_cdemo_sk int ,c_current_hdemo_sk int ,c_current_addr_sk int ,c_first_shipto_date_sk int ,c_first_sales_date_sk int ,c_salutation char(10) ,c_first_name char(20) ,c_last_name char(30) ,c_preferred_cust_flag char(1) ,c_birth_day int ,c_birth_month int ,c_birth_year int ,c_birth_country varchar(20) ,c_login char(13) ,c_email_address char(50) ,c_last_review_date_sk int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/customer.dat' overwrite into table customer;create table web_site
(web_site_sk int ,web_site_id char(16) ,web_rec_start_date date ,web_rec_end_date date ,web_name varchar(50) ,web_open_date_sk int ,web_close_date_sk int ,web_class varchar(50) ,web_manager varchar(40) ,web_mkt_id int ,web_mkt_class varchar(50) ,web_mkt_desc varchar(100) ,web_market_manager varchar(40) ,web_company_id int ,web_company_name char(50) ,web_street_number char(10) ,web_street_name varchar(60) ,web_street_type char(15) ,web_suite_number char(10) ,web_city varchar(60) ,web_county varchar(30) ,web_state char(2) ,web_zip char(10) ,web_country varchar(20) ,web_gmt_offset decimal(5,2) ,web_tax_percentage decimal(5,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/web_site_1_10.dat' overwrite into table web_site;create table store_returns
(sr_returned_date_sk int ,sr_return_time_sk int ,sr_item_sk int ,sr_customer_sk int ,sr_cdemo_sk int ,sr_hdemo_sk int ,sr_addr_sk int ,sr_store_sk int ,sr_reason_sk int ,sr_ticket_number int ,sr_return_quantity int ,sr_return_amt decimal(7,2) ,sr_return_tax decimal(7,2) ,sr_return_amt_inc_tax decimal(7,2) ,sr_fee decimal(7,2) ,sr_return_ship_cost decimal(7,2) ,sr_refunded_cash decimal(7,2) ,sr_reversed_charge decimal(7,2) ,sr_store_credit decimal(7,2) ,sr_net_loss decimal(7,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/store_returns.dat' overwrite into table store_returns;create table household_demographics
(hd_demo_sk int ,hd_income_band_sk int ,hd_buy_potential char(15) ,hd_dep_count int ,hd_vehicle_count int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/household_demographics.dat' overwrite into table household_demographics;create table web_page
(wp_web_page_sk int ,wp_web_page_id char(16) ,wp_rec_start_date date ,wp_rec_end_date date ,wp_creation_date_sk int ,wp_access_date_sk int ,wp_autogen_flag char(1) ,wp_customer_sk int ,wp_url varchar(100) ,wp_type char(50) ,wp_char_count int ,wp_link_count int ,wp_image_count int ,wp_max_ad_count int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/web_page_1_10.dat' overwrite into table web_page;create table promotion
(p_promo_sk int ,p_promo_id char(16) ,p_start_date_sk int ,p_end_date_sk int ,p_item_sk int ,p_cost decimal(15,2) ,p_response_target int ,p_promo_name char(50) ,p_channel_dmail char(1) ,p_channel_email char(1) ,p_channel_catalog char(1) ,p_channel_tv char(1) ,p_channel_radio char(1) ,p_channel_press char(1) ,p_channel_event char(1) ,p_channel_demo char(1) ,p_channel_details varchar(100) ,p_purpose char(15) ,p_discount_active char(1)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/promotion.dat' overwrite into table promotion;create table catalog_page
(cp_catalog_page_sk int ,cp_catalog_page_id char(16) ,cp_start_date_sk int ,cp_end_date_sk int ,cp_department varchar(50) ,cp_catalog_number int ,cp_catalog_page_number int ,cp_description varchar(100) ,cp_type varchar(100)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/catalog_page.dat' overwrite into table catalog_page;create table inventory
(inv_date_sk int ,inv_item_sk int ,inv_warehouse_sk int ,inv_quantity_on_hand int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/inventory.dat' overwrite into table inventory;create table catalog_returns
(cr_returned_date_sk int ,cr_returned_time_sk int ,cr_item_sk int ,cr_refunded_customer_sk int ,cr_refunded_cdemo_sk int ,cr_refunded_hdemo_sk int ,cr_refunded_addr_sk int ,cr_returning_customer_sk int ,cr_returning_cdemo_sk int ,cr_returning_hdemo_sk int ,cr_returning_addr_sk int ,cr_call_center_sk int ,cr_catalog_page_sk int ,cr_ship_mode_sk int ,cr_warehouse_sk int ,cr_reason_sk int ,cr_order_number int ,cr_return_quantity int ,cr_return_amount decimal(7,2) ,cr_return_tax decimal(7,2) ,cr_return_amt_inc_tax decimal(7,2) ,cr_fee decimal(7,2) ,cr_return_ship_cost decimal(7,2) ,cr_refunded_cash decimal(7,2) ,cr_reversed_charge decimal(7,2) ,cr_store_credit decimal(7,2) ,cr_net_loss decimal(7,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/catalog_returns.dat' overwrite into table catalog_returns;create table web_returns
(wr_returned_date_sk int ,wr_returned_time_sk int ,wr_item_sk int ,wr_refunded_customer_sk int ,wr_refunded_cdemo_sk int ,wr_refunded_hdemo_sk int ,wr_refunded_addr_sk int ,wr_returning_customer_sk int ,wr_returning_cdemo_sk int ,wr_returning_hdemo_sk int ,wr_returning_addr_sk int ,wr_web_page_sk int ,wr_reason_sk int ,wr_order_number int ,wr_return_quantity int ,wr_return_amt decimal(7,2) ,wr_return_tax decimal(7,2) ,wr_return_amt_inc_tax decimal(7,2) ,wr_fee decimal(7,2) ,wr_return_ship_cost decimal(7,2) ,wr_refunded_cash decimal(7,2) ,wr_reversed_charge decimal(7,2) ,wr_account_credit decimal(7,2) ,wr_net_loss decimal(7,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/web_returns_1_10.dat' overwrite into table web_returns;create table web_sales
(ws_sold_date_sk int ,ws_sold_time_sk int ,ws_ship_date_sk int ,ws_item_sk int ,ws_bill_customer_sk int ,ws_bill_cdemo_sk int ,ws_bill_hdemo_sk int ,ws_bill_addr_sk int ,ws_ship_customer_sk int ,ws_ship_cdemo_sk int ,ws_ship_hdemo_sk int ,ws_ship_addr_sk int ,ws_web_page_sk int ,ws_web_site_sk int ,ws_ship_mode_sk int ,ws_warehouse_sk int ,ws_promo_sk int ,ws_order_number int ,ws_quantity int ,ws_wholesale_cost decimal(7,2) ,ws_list_price decimal(7,2) ,ws_sales_price decimal(7,2) ,ws_ext_discount_amt decimal(7,2) ,ws_ext_sales_price decimal(7,2) ,ws_ext_wholesale_cost decimal(7,2) ,ws_ext_list_price decimal(7,2) ,ws_ext_tax decimal(7,2) ,ws_coupon_amt decimal(7,2) ,ws_ext_ship_cost decimal(7,2) ,ws_net_paid decimal(7,2) ,ws_net_paid_inc_tax decimal(7,2) ,ws_net_paid_inc_ship decimal(7,2) ,ws_net_paid_inc_ship_tax decimal(7,2) ,ws_net_profit decimal(7,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/web_sales_1_10.dat' overwrite into table web_sales;create table catalog_sales
(cs_sold_date_sk int ,cs_sold_time_sk int ,cs_ship_date_sk int ,cs_bill_customer_sk int ,cs_bill_cdemo_sk int ,cs_bill_hdemo_sk int ,cs_bill_addr_sk int ,cs_ship_customer_sk int ,cs_ship_cdemo_sk int ,cs_ship_hdemo_sk int ,cs_ship_addr_sk int ,cs_call_center_sk int ,cs_catalog_page_sk int ,cs_ship_mode_sk int ,cs_warehouse_sk int ,cs_item_sk int ,cs_promo_sk int ,cs_order_number int ,cs_quantity int ,cs_wholesale_cost decimal(7,2) ,cs_list_price decimal(7,2) ,cs_sales_price decimal(7,2) ,cs_ext_discount_amt decimal(7,2) ,cs_ext_sales_price decimal(7,2) ,cs_ext_wholesale_cost decimal(7,2) ,cs_ext_list_price decimal(7,2) ,cs_ext_tax decimal(7,2) ,cs_coupon_amt decimal(7,2) ,cs_ext_ship_cost decimal(7,2) ,cs_net_paid decimal(7,2) ,cs_net_paid_inc_tax decimal(7,2) ,cs_net_paid_inc_ship decimal(7,2) ,cs_net_paid_inc_ship_tax decimal(7,2) ,cs_net_profit decimal(7,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/catalog_sales.dat' overwrite into table catalog_sales;
;create table store_sales
(ss_sold_date_sk int ,ss_sold_time_sk int ,ss_item_sk int ,ss_customer_sk int ,ss_cdemo_sk int ,ss_hdemo_sk int ,ss_addr_sk int ,ss_store_sk int ,ss_promo_sk int ,ss_ticket_number int ,ss_quantity int ,ss_wholesale_cost decimal(7,2) ,ss_list_price decimal(7,2) ,ss_sales_price decimal(7,2) ,ss_ext_discount_amt decimal(7,2) ,ss_ext_sales_price decimal(7,2) ,ss_ext_wholesale_cost decimal(7,2) ,ss_ext_list_price decimal(7,2) ,ss_ext_tax decimal(7,2) ,ss_coupon_amt decimal(7,2) ,ss_net_paid decimal(7,2) ,ss_net_paid_inc_tax decimal(7,2) ,ss_net_profit decimal(7,2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data local inpath '/glusterfs-user-data/data/store_sales.dat' overwrite into table store_sales;
更多推荐
tpcds数据压测
发布评论