Druid at Pulsar

编程入门行业动态更新时间:2024-10-23 19:31:59

Druid at Pulsar

作者：Xiaoming Zhang

A glance of Pulsar and druid

Pulsar is anopen source project of eBay and it includes two parts, pulsar pipeline andpulsar reporting. Pulsar pipeline is a streaming framework which willdistribute more than 8 billion events every day and pulsar reporting is in responseof storing, querying and visualizing these data. Druid is part of pulsarreporting.

This paper willhave an introduction and a little deep dive of druid and show you the role itis playing at pulsar reporting.

Druid components introduction

Druid is an open source project which is ananalytics data store designed for business intelligence (Online analyticalprocessing) queries on event data.

Druid Skills (From official website):

1. Sub-Second Queries.

Support multidimensional filtering, aggression and is ableto target the very data to do query.

2. Real time Ingestion

Support streaming data ingestion and offers insightson events immediately after they occur

3. Scalable

Able to deal with trillions of events for total,millions events for each second

4. Highly Available

SaaS (Software as a service), need to be up all the timeand Scale up and down will not lose data

5. Designed for Analytics

Supports a lot of filters, aggregators and query types, is ableto plugging in new functionality.

Supports approximate algorithms for cardinality estimation,and histogram and quantile calculations.

Glance at Druid Structure of Pulsarreporting:

Receiveabout 10 Billion events per day and the peak traffic is about 200k/s.

Eachmachine at our cluster is with 128GB memory and for each historical nodes, diskis more than 6 TB.

Druid ata glance:

Briefintroduction to all nodes:

Real-time

Real-timenode index the coming data and these indexed data are able to queryimmediately. Real-time nodes will build up data to segments and after a periodof time the segment will handover to historical node.

Anexample of real-time segment: 2015-11-18T06:00:00.000Z_2015-11-18T07:00:00.000Z,which will be stored at the folder of the scheme you defined. All segments arestored like the above format.

Here isthe segment information at My SQL:

Id |dataSource | created_date | start | end | partitioned | version | used |payload pulsar_event_2014-09-15T05:00:00.000-07:00_2014-09-15T06:00:00.000-07:00_2014-09-15T05:00:00.000-07:00_1| pulsar_event | 2014-09-15T09:37:30.231-07:00 | 2014-09-15T05:00:00.000-07:00| 2014-09-15T06:00:00.000-07:00 | 1 | 2014-09-15T05:00:00.000-07:00 | 0 | {"dataSource":"pulsar_event","interval":"2014-09-15T05:00:00.000-07:00/2014-09-15T06:00:00.000-07:00","version":"2014-09-15T05:00:00.000-07:00","loadSpec":{"type":"hdfs","path":"hdfs://xxxx/20140915T050000.000-0700_20140915T060000.000-0700/2014-09-15T05_00_00.000-07_00/1/index.zip"},"dimensions":"browserfamily,browserversion,city,continent,country,deviceclass,devicefamily,eventtype,guid,js_ev_type,linespeed,osfamily,osversion,page,region,sessionid,site,tenant,timestamp,uid","metrics":"count","shardSpec":{"type

更多推荐

Druid at Pulsar

本文发布于:2024-02-17 04:01:21，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1692567.html