


Building a Customer Data Platform using Databricks in AWS

What is a CDP?

In a Customer Data Platform (CDP), all customer data is brought into one place and stitched together to give a holistic view of each customer. It also serves as a single source of truth for all the questions and queries an organisation would like to ask.
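
To make "stitched together" more concrete, here is a minimal sketch of identity stitching. It assumes that records belonging to the same customer share at least one identifier, such as an email address, phone number or device ID, and it uses a union-find (disjoint-set) structure, which is one common technique for this. The record fields and the `stitch_profiles` function are hypothetical and not part of any Databricks API.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Walk up to the root, pointing each node at its grandparent on the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def stitch_profiles(records):
    """Group records that share any identifier value (hypothetical schema).

    Each record is a dict with a "record_id" plus optional identifier fields
    "email", "phone" or "device_id". Returns a list of sets of record_ids,
    one set per stitched customer.
    """
    uf = UnionFind()
    identifier_fields = ("email", "phone", "device_id")
    for rec in records:
        node = ("record", rec["record_id"])
        uf.find(node)  # register records that carry no identifiers at all
        for field in identifier_fields:
            value = rec.get(field)
            if value:
                # Link the record to the identifier; records sharing it merge.
                uf.union(node, (field, value))

    groups = defaultdict(set)
    for rec in records:
        groups[uf.find(("record", rec["record_id"]))].add(rec["record_id"])
    return list(groups.values())


if __name__ == "__main__":
    records = [
        {"record_id": "crm-1", "email": "ann@example.com"},
        {"record_id": "web-7", "email": "ann@example.com", "device_id": "d-42"},
        {"record_id": "app-3", "device_id": "d-42", "phone": "555-0100"},
        {"record_id": "crm-2", "email": "bob@example.com"},
    ]
    for profile in stitch_profiles(records):
        print(sorted(profile))
```

Here the CRM, web and app records end up in one profile because they are chained through a shared email and a shared device ID, while the second CRM record stays on its own. A production CDP would also need matching rules for fuzzy identifiers, which this sketch deliberately leaves out.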



Ingestion Layer

In this layer we onboard all incoming data into the platform. Broadly, input data can be classified into two categories:

  • Streaming or realtime data
    • All streaming data passes through the streaming layer.
    • Technology stack used – AWS Kinesis + Databricks Structured Streaming.
  • Non-streaming data
    • Examples of non-streaming data include:
      • API calls
      • S3 dumps
      • SFTP transfers, etc.
    • All non-streaming data passes through the batch layer.
    • Technology stack used – Databricks batch processing / SQL layer.
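
The split between the two ingestion paths can be sketched as a simple routing rule. This is illustrative only: the `SourceType` values and the `route_source` function are hypothetical names, and in practice the path is chosen when a pipeline is designed rather than decided per record.

```python
from enum import Enum


class SourceType(Enum):
    KINESIS_STREAM = "kinesis_stream"  # streaming / realtime data
    API_CALL = "api_call"              # non-streaming examples from the list above
    S3_DUMP = "s3_dump"
    SFTP_TRANSFER = "sftp_transfer"


# Hypothetical set of source types that are onboarded by the streaming layer.
STREAMING_SOURCES = {SourceType.KINESIS_STREAM}


def route_source(source_type: SourceType) -> str:
    """Return which ingestion layer should onboard data from this source."""
    if source_type in STREAMING_SOURCES:
        # AWS Kinesis + Databricks Structured Streaming
        return "streaming"
    # API calls, S3 dumps, SFTP transfers: Databricks batch processing / SQL
    return "batch"


if __name__ == "__main__":
    for source in SourceType:
        print(f"{source.value:>15} -> {route_source(source)} layer")
```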

Processing Layer

Let us divide the processing layer into three parts based on the needs of an organisation:

  • Realtime Processing – discussed above
  • Batch Processing – discussed above
  • On-Demand Processing
    • Root cause analysis of an issue, exploratory data analysis, use case studies, prototyping of a data science model, etc. can be grouped together as ad hoc needs.
    • Predictive Analytics would ideally be achieved by combining all three (Realtime + Batch + On Demand).
    • Technology stack used – Databricks notebooks + Databricks SQL Analytics.
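
One way to picture how an on-demand query can draw on both batch and realtime processing is the serving pattern from the Lambda architecture. This is a well-known pattern swapped in for illustration, not something this post prescribes: a batch view is precomputed up to a cutoff time, a realtime view is updated event by event after that cutoff, and an ad hoc query merges the two. The event tuples and all function names below are hypothetical.

```python
from collections import Counter


def build_batch_view(history, cutoff):
    """Batch job: recompute total spend per customer from events up to `cutoff`."""
    view = Counter()
    for customer_id, amount, ts in history:
        if ts <= cutoff:
            view[customer_id] += amount
    return view


def update_realtime_view(view, event, cutoff):
    """Streaming job: fold one newly arrived event into the realtime view."""
    customer_id, amount, ts = event
    if ts > cutoff:  # earlier events are already covered by the batch view
        view[customer_id] += amount
    return view


def total_spend(customer_id, batch_view, realtime_view):
    """On-demand query: merge both views (a Counter yields 0 for unknown keys)."""
    return batch_view[customer_id] + realtime_view[customer_id]


if __name__ == "__main__":
    # Hypothetical events: (customer_id, amount, timestamp)
    history = [("c1", 20.0, 100), ("c2", 5.0, 110)]
    cutoff = 120
    batch = build_batch_view(history, cutoff)

    live = Counter()
    for event in [("c1", 15.0, 150), ("c2", 2.5, 155)]:
        update_realtime_view(live, event, cutoff)

    print(total_spend("c1", batch, live))
```

The cutoff is what keeps the two views from double counting: each event is owned by exactly one of them, so the on-demand query can simply add the results.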

Visualization Layer

The visualisation needs of an organisation can be broadly categorised into the following types:

  • BI Dashboards
    • Redash – comes by default with Databricks
    • Bring your own BI tool – Tableau, Microsoft Power BI, etc.
  • Ad Hoc Visualizations
    • Use the built-in visualisations available in Databricks notebooks.
  • Custom Applications
    • Custom applications built with ReactJS, Angular, etc.

Analytics Layer

The overall analytics needs of a company can be divided into the following parts. We have discussed all of them except “Predictive Analytics”.

  • Structured Analytics (or Business Intelligence / Business Insights)
  • Realtime Analytics
  • On-Demand Analytics (or Ad Hoc Analytics)
  • Predictive Analytics
    • It needs the flexibility to use both batch and realtime processing.
    • Batch models fall under batch processing, and realtime models fall under realtime processing.
    • However, you will find that most performant realtime models are not purely realtime; they also include components from batch processing.
    • Hence, realtime models are in practice a hybrid of batch processing and realtime processing.
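
A minimal sketch of such a hybrid model, assuming a toy purchase-propensity score: slow-changing features are precomputed by a batch job and looked up per customer, while fast-changing features are computed from the latest streaming events at scoring time. The feature names, the hand-picked weights and the `score` function are all hypothetical; this is not a trained model, only an illustration of where each component comes from.

```python
import math

# Batch component (hypothetical): refreshed periodically by a batch job.
BATCH_FEATURES = {
    "c1": {"lifetime_orders": 12, "avg_order_value": 40.0},
    "c2": {"lifetime_orders": 1, "avg_order_value": 15.0},
}

# Hypothetical hand-picked weights for a logistic score.
WEIGHTS = {
    "lifetime_orders": 0.1,
    "avg_order_value": 0.02,
    "events_last_window": 0.3,
    "cart_value_now": 0.01,
}
BIAS = -2.0


def realtime_features(recent_events):
    """Realtime component: features computed from the latest streaming events."""
    return {
        "events_last_window": len(recent_events),
        "cart_value_now": sum(e["amount"] for e in recent_events),
    }


def score(customer_id, recent_events):
    """Combine batch and realtime features into a score between 0 and 1."""
    features = {"lifetime_orders": 0, "avg_order_value": 0.0}  # unknown customer
    features.update(BATCH_FEATURES.get(customer_id, {}))  # batch lookup
    features.update(realtime_features(recent_events))     # computed on the fly
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    events = [{"amount": 30.0}, {"amount": 20.0}]
    print(round(score("c1", events), 3), round(score("c2", events), 3))
```

With the same realtime activity, the customer with the richer batch history scores higher, which is the point made above: the realtime model only performs well because it also carries batch-computed context.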
