Building a Customer Data Platform using Databricks in AWS
What is a CDP?
In a Customer Data Platform (CDP), all customer data is brought into one place and stitched together to give a holistic view of each customer. It also serves as a single source of truth for all the questions/queries an organisation would like to ask.
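To make "stitched together" concrete, here is a minimal sketch of identity stitching in plain Python. It assumes that two records belong to the same customer whenever they share an email or a phone number; the record fields (`source`, `email`, `phone`) and the function name `stitch_profiles` are hypothetical and only for illustration. A real platform would typically use richer matching rules.

```python
from collections import defaultdict


def stitch_profiles(records):
    """Group records that share an email or phone into one merged profile."""
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as root so output order is stable.
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (identifier type, value) -> index of first record using it
    for idx, rec in enumerate(records):
        for key in ("email", "phone"):
            value = rec.get(key)
            if not value:
                continue
            ident = (key, value)
            if ident in first_seen:
                union(idx, first_seen[ident])
            else:
                first_seen[ident] = idx

    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[find(idx)].append(rec)

    profiles = []
    for root in sorted(groups):
        merged = {"sources": [], "emails": set(), "phones": set()}
        for rec in groups[root]:
            merged["sources"].append(rec["source"])
            if rec.get("email"):
                merged["emails"].add(rec["email"])
            if rec.get("phone"):
                merged["phones"].add(rec["phone"])
        profiles.append(merged)
    return profiles


# Made-up sample records from three hypothetical source systems.
records = [
    {"source": "web", "email": "a@example.com", "phone": None},
    {"source": "crm", "email": "a@example.com", "phone": "555-0100"},
    {"source": "app", "email": None, "phone": "555-0100"},
    {"source": "web", "email": "b@example.com", "phone": None},
]
profiles = stitch_profiles(records)
```

In this sample the first three records collapse into one profile, because the CRM record links the email seen on the web to the phone number seen in the app. The fourth record stays a separate customer.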
Architecture
Components
Ingestion Layer
In this layer we onboard all incoming data onto the platform. Input data can broadly be classified into 2 types:
- Streaming or realtime data
  - All streaming data passes through the streaming layer.
  - Technology stack used – AWS Kinesis + Databricks Structured Streaming.
- Non-streaming data
  - Examples of non-streaming data are
    - API calls
    - S3 dumps
    - SFTP transfers, etc.
  - All non-streaming data passes through the batch layer.
  - Technology stack used – Databricks batch processing / SQL layer.
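The split between the streaming layer and the batch layer can be sketched as a small routing function. This is only an illustration in plain Python: the source kinds (`"kinesis"`, `"api"`, `"s3_dump"`, `"sftp"`) and the names `Source`, `Layer` and `route` are assumptions made for this example, not part of any Databricks or AWS API.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    STREAMING = "streaming"  # e.g. Kinesis + Structured Streaming
    BATCH = "batch"          # e.g. Databricks batch jobs / SQL


@dataclass(frozen=True)
class Source:
    name: str
    kind: str  # hypothetical label describing how the data arrives


# Assumed mapping of arrival mechanism to ingestion layer.
STREAMING_KINDS = {"kinesis"}
BATCH_KINDS = {"api", "s3_dump", "sftp"}


def route(source: Source) -> Layer:
    """Decide which ingestion layer a source should pass through."""
    if source.kind in STREAMING_KINDS:
        return Layer.STREAMING
    if source.kind in BATCH_KINDS:
        return Layer.BATCH
    # Fail loudly so a new kind of source is classified explicitly.
    raise ValueError(f"unknown source kind: {source.kind!r}")


# Made-up sources for illustration.
sources = [
    Source("clickstream", "kinesis"),
    Source("crm_export", "s3_dump"),
    Source("partner_feed", "sftp"),
    Source("loyalty_service", "api"),
]
routing = {s.name: route(s) for s in sources}
```

Raising an error for unknown kinds, rather than defaulting to batch, is a design choice: it forces every new source to be assigned to a layer deliberately.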
Processing Layer
Let us divide the processing layer into 3 parts based on the needs of an organisation.
- Realtime processing – discussed above
- Batch processing – discussed above
- On-demand processing
  - Root cause analysis of an issue, exploratory data analysis, use-case studies, prototyping of a data science model, etc. can be grouped together as ad hoc needs.
  - Predictive analytics would ideally be achieved by combining all three (realtime + batch + on-demand).
  - Technology stack used – Databricks notebooks + Databricks SQL Analytics.
Visualization Layer
The visualisation needs of an organisation can be broadly categorised into the following types:
- BI dashboards
  - Redash – available by default with Databricks
  - Bring your own BI tool – Tableau, Microsoft Power BI, etc.
- Ad hoc visualisations
  - We could use the visualisations available in Databricks notebooks.
- Custom applications
  - Custom applications built in ReactJS, Angular, etc.
Analytics Layer
The overall analytics needs of a company can be divided into the following parts. We have discussed all of them except “Predictive Analytics”.
- Structured Analytics (or Business Intelligence / Business Insights)
- Realtime Analytics
- On-Demand Analytics (or Ad hoc Analytics)
- Predictive Analytics
  - It needs the flexibility of using both batch and realtime processing.
  - Batch modelling falls under batch processing, and realtime models fall under realtime processing.
  - However, you will find that most performant realtime models are not purely realtime. They also have some components that come from batch processing.
  - Hence, realtime models are actually a hybrid of batch processing and realtime processing.
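The hybrid nature of realtime models can be sketched as follows: features precomputed by a batch job are looked up at scoring time and combined with features taken from the live event. Everything here is an assumption for illustration, including the feature names, the in-memory `batch_features` dictionary standing in for a batch-layer table, and the hand-picked weights, which are not a trained model.

```python
import math

# Hypothetical output of a nightly batch job: per-customer history features.
batch_features = {
    "c1": {"orders_90d": 12, "avg_basket": 40.0},
    "c2": {"orders_90d": 0, "avg_basket": 0.0},
}

# Fallback for customers the batch job has not seen yet.
DEFAULT_BATCH = {"orders_90d": 0, "avg_basket": 0.0}

# Illustrative weights only; a real model would be trained offline.
WEIGHTS = [0.15, 0.01, 0.02, 0.05]
BIAS = -2.0


def build_feature_vector(event, batch_store):
    """Join batch features (history) with realtime features (current event)."""
    batch = batch_store.get(event["customer_id"], DEFAULT_BATCH)
    return [
        float(batch["orders_90d"]),        # from batch processing
        float(batch["avg_basket"]),        # from batch processing
        float(event["cart_value"]),        # from the realtime event
        float(event["pages_in_session"]),  # from the realtime event
    ]


def score(event, batch_store):
    """Logistic score in (0, 1) computed from the combined feature vector."""
    x = build_feature_vector(event, batch_store)
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


# The same live event scored for a known and an unknown customer.
known = {"customer_id": "c1", "cart_value": 50.0, "pages_in_session": 4}
unknown = {"customer_id": "c9", "cart_value": 50.0, "pages_in_session": 4}
known_score = score(known, batch_features)
unknown_score = score(unknown, batch_features)
```

The two events are identical, yet the known customer scores higher because of the history supplied by the batch layer. This is the sense in which a realtime model depends on batch processing.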