Analytical Data Platform for the Digital Innovation Team of a US Bank
Simplity’s data integration, data validation, and data quality analysis to enable advanced data science for the Digital Innovation team of an American bank.
The Digital Innovation team needed a data solution for enabling data science analytics
Simplity was approached by the Digital Innovation and Data Science team of an American bank. The team was responsible for digital experiences of their customers, namely internet and mobile banking. The bank had identified the need to have a data solution for enabling data science analytics, such as predictive analytics, measuring digital adoption, or product management of deposits. The bank also needed to operationalize the data for the business users and was missing the consumption layer to enable this. The teams at the bank were facing these challenges:
Data were dispersed across different systems with no single point of access.
The digital innovation team was lacking historical data and needed to prepare data for each analytic task individually without the ability to reuse it next time.
Their data warehouse (DWH) was built primarily to serve their CRM. As a result, it lacked the flexibility to define use cases incrementally and did not support advanced analytics using data science.
Time to delivery in the DWH needed to be improved to allow faster operations and decision making.
The development and preparation of ETL processes lacked automation, and defined processes were lacking overall.
Access for business users to data was not flexible enough.
Project in numbers
source systems implemented
tables in 3 months on average
measures in consumption layer
terabytes of data
Enable data for digital adoption of internet and mobile banking
The bank intended to increase the digital adoption of its internet and mobile banking, while tracking relevant KPIs on the digital adoption dashboard (e.g., adoption rate, churn rate) and monitoring friction in digital experiences. They needed a flexible way to store and work with historical data so that both the Data Science team and business users could get insights from the data and make data-driven product and customer management decisions.
Simplity designed and implemented a data vault-based solution for the bank. The chosen solution had the advantage that adding new source systems or calculated values does not change the data model and can be done incrementally.
In our experience, statistical modelling techniques prefer flat (denormalized) datasets as input rather than a classic dimensional model. As new sources were added over time, changes to existing parts of the data model were minimized and the existing ETLs were kept intact.
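The incremental property described above can be sketched in a few lines. The following is a minimal illustration using Python's built-in sqlite3 module, not the bank's actual model. All table and column names (hub_customer, sat_customer_crm, sat_customer_digital, mobile_logins_30d), the sample rows, and the MD5-based hash key are assumptions made for the example. It shows a hub holding business keys, a satellite holding descriptive history, a second satellite added for a new source without altering the existing tables, and a flat dataset assembled for modelling.

```python
import hashlib
import sqlite3


def hash_key(business_key: str) -> str:
    """Derive a deterministic key from a business key (hypothetical convention)."""
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")

# Initial model: one hub (business keys) and one satellite (CRM attributes).
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL,
    load_ts     TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
CREATE TABLE sat_customer_crm (
    customer_hk TEXT NOT NULL,
    load_ts     TEXT NOT NULL,
    segment     TEXT,
    PRIMARY KEY (customer_hk, load_ts)
);
""")


def load_hub(customer_id: str, load_ts: str, record_src: str) -> None:
    """Insert a business key once; reloading the same key is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
        (hash_key(customer_id), customer_id, load_ts, record_src),
    )


load_hub("C001", "2020-01-01", "CRM")
load_hub("C002", "2020-01-01", "CRM")

# Satellites are insert-only: a changed attribute becomes a new row.
conn.executemany(
    "INSERT INTO sat_customer_crm VALUES (?, ?, ?)",
    [
        (hash_key("C001"), "2020-01-01", "retail"),
        (hash_key("C001"), "2020-02-01", "affluent"),
        (hash_key("C002"), "2020-01-01", "retail"),
    ],
)

# A new source system arrives: add a new satellite, touch nothing else.
conn.executescript("""
CREATE TABLE sat_customer_digital (
    customer_hk       TEXT NOT NULL,
    load_ts           TEXT NOT NULL,
    mobile_logins_30d INTEGER,
    PRIMARY KEY (customer_hk, load_ts)
);
""")
conn.execute(
    "INSERT INTO sat_customer_digital VALUES (?, ?, ?)",
    (hash_key("C001"), "2020-03-01", 12),
)

# Flat (denormalized) dataset: latest satellite row per customer.
FLAT_QUERY = """
SELECT h.customer_id, c.segment, d.mobile_logins_30d
FROM hub_customer AS h
LEFT JOIN sat_customer_crm AS c
  ON c.customer_hk = h.customer_hk
 AND c.load_ts = (SELECT MAX(c2.load_ts) FROM sat_customer_crm AS c2
                  WHERE c2.customer_hk = h.customer_hk)
LEFT JOIN sat_customer_digital AS d
  ON d.customer_hk = h.customer_hk
 AND d.load_ts = (SELECT MAX(d2.load_ts) FROM sat_customer_digital AS d2
                  WHERE d2.customer_hk = h.customer_hk)
ORDER BY h.customer_id
"""
flat_rows = conn.execute(FLAT_QUERY).fetchall()
```

In this sketch the earlier "retail" row for C001 stays in the satellite after the segment changes, so history is kept while the flat query returns only the latest state.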
Metadata-driven approach: The whole solution is defined and driven by metadata stored in a combination of storage areas – a data model, a service database, and Excel sheets. The same metadata is also used for code generation.
ETL transformation code & documentation: Transformation code that moves data from the sources into the data vault was written in the technology or technologies selected for the project (SQL, MS SSIS, etc.), and additional documentation was created following the agreed documentation approach.
Documented data validation findings: During data validations, data quality issues were identified and documented – ready to be raised to data owners for correction.
Defined data quality rules for measurement: Findings were translated into data quality rules that can be used for subsequent data quality measurement and reporting (monitoring to track improvements in the future).
Documented best practices: Simplity contributed its experience and best practices throughout the whole project. The first results were presented to the bank even before the official engagement began, in a remote workshop held to review the data model and historization approach.
Automation approach: Both ETL processes and data quality measurements needed to be operationalized to run smoothly in business as usual (BAU). Simplity experts shared their experience to help define the right approach. The code itself is generated using BIML.
Data marts and consumption layer: In phase two of the project, Simplity introduced a consumption layer built on data marts, so that business users can also work with the data in the data vault.
Handover and knowledge transfer: Besides providing proper documentation as part of the above deliverables, Simplity conducted handover workshops to ensure proper knowledge transfer to the client's team members.
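The metadata-driven approach in the deliverables above can be illustrated with a small sketch. The project generated its code with BIML. The Python below is only a stand-in for the same idea, and the mapping structure (HubMapping), the table names, and the generated SQL pattern are all assumptions made for the example. One metadata record produces both a load statement and a documentation line, and the generated statement is run against an in-memory SQLite database to show that it executes and can be rerun safely.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class HubMapping:
    """One metadata record (hypothetical shape): where a business key comes from."""
    source_table: str
    business_key: str
    hub_table: str
    record_source: str


# Metadata would normally live in a model, database, or spreadsheet.
METADATA = [
    HubMapping("stg_crm_customer", "customer_id", "hub_customer", "CRM"),
    HubMapping("stg_atm_terminal", "atm_id", "hub_atm", "ATM"),
]


def generate_hub_load(m: HubMapping) -> str:
    """Generate a rerunnable load statement from trusted metadata."""
    return (
        f"INSERT INTO {m.hub_table} ({m.business_key}, load_ts, record_src)\n"
        f"SELECT DISTINCT s.{m.business_key}, CURRENT_TIMESTAMP, '{m.record_source}'\n"
        f"FROM {m.source_table} AS s\n"
        f"WHERE NOT EXISTS (\n"
        f"    SELECT 1 FROM {m.hub_table} AS h\n"
        f"    WHERE h.{m.business_key} = s.{m.business_key}\n"
        f");"
    )


def generate_doc_line(m: HubMapping) -> str:
    """Generate documentation from the same metadata record."""
    return (
        f"{m.hub_table}: loaded from {m.source_table} "
        f"({m.record_source}), business key {m.business_key}"
    )


statements = [generate_hub_load(m) for m in METADATA]
docs = [generate_doc_line(m) for m in METADATA]

# Demonstration: run the first generated statement twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_crm_customer (customer_id TEXT);
CREATE TABLE hub_customer (
    customer_id TEXT PRIMARY KEY,
    load_ts     TEXT,
    record_src  TEXT
);
INSERT INTO stg_crm_customer VALUES ('C001'), ('C002'), ('C001');
""")
conn.execute(statements[0])
conn.execute(statements[0])  # second run inserts nothing new
hub_count = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
```

Adding a source in this sketch means adding one metadata record. The generator and the statements for existing sources stay unchanged.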
To illustrate the scope of work done on the project, here are examples of the sources, source tables, and measures added in each quarterly iteration (not a complete list):
Q4/2019 - 120 source tables added, mainly for loan origination data used to assess clients' risk profile to approve or disapprove a loan.
Q1/2020 - 120 source tables added. In this phase, we focused mainly on aligning data and processes in the data vault.
Q2/2020 - 143 source tables added with a focus on wealth management portfolio of services. Table examples are details about clients, their accounts, their activities, advisors that set the deals, what stocks/bonds they own, etc.
Q3/2020 - 166 source tables added. The main theme for this phase was the Master Data Repository (MDR), a system keeping the most accurate master data (non-transactional data about customers, accounts, products). This enables the bank to have a comprehensive picture of its customers. Another source added was ATM data, e.g., information about transactions carried out by ATM users and details about ATMs such as their location.
Q4/2020 - 187 source tables added. Simplity delivered the first POC in the form of a "Consumption Layer" for digital users' data. We also focused on credit card data and household data, and added more cleaned data about accounts and the product catalog to the MDR.
Q1/2021 - 206 source tables added. We added more measures for end users, e.g., account balance over time or transactions on accounts, from both digital and core banking systems. In addition, data from the call center were included in this phase.
Q2/2021 - 234 source tables added. This time the focus was on fees and an Early Warning System. The EWS is a general application gathering data about the clients and their history in terms of paying their obligations through regular installments.
Q3/2021 - 265 source tables added, mainly data used by the Data Science team and the Small to Medium Enterprise Underwriting application. The bank uses this application to assess whether a small business owner can get a loan.
The Data Vault is a hybrid approach combining the advantages of the classical normalized (3NF) data warehouse with those of dimensional design. The main advantages are:
Incremental delivery: We started with a pilot and integrated more sources over time.
Short time to value: Rapidly delivering impact to business even in the pilot project phase.
Flexibility: Data sources can be added incrementally to the data vault. In addition, the raw and business-derived data are stored separately, which makes it easy to change business rules.
Reduced Complexity: As the data vault separates the technical data restructuring from the application of business rules, this helps isolate these potentially complex stages. Also, data cleaning is considered a business rule and can be managed independently of the initial data load.
Data lineage and audit: A metadata-driven approach helps identify the source systems and makes it easier to support data lineage. Results are never lost during incremental data vault changes, providing an automatic audit trail.
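Treating data cleaning and data quality checks as business rules that are managed apart from the raw load can be sketched as follows. This is a minimal illustration, not the rules defined in the project: the rule names, the record fields (customer_id, zip), and the sample records are hypothetical. Each rule is a named predicate, and measurement yields a pass rate per rule while the raw records stay untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


@dataclass(frozen=True)
class QualityRule:
    """A named check applied to one raw record (hypothetical structure)."""
    name: str
    check: Callable[[Record], bool]


def zip_is_five_digits(record: Record) -> bool:
    value = record.get("zip") or ""
    return len(value) == 5 and value.isdigit()


RULES: List[QualityRule] = [
    QualityRule("customer_id_present", lambda r: bool(r.get("customer_id"))),
    QualityRule("zip_is_5_digits", zip_is_five_digits),
]


def measure(records: List[Record], rules: List[QualityRule]) -> Dict[str, float]:
    """Return the share of records passing each rule; inputs are not modified."""
    total = len(records)
    if total == 0:
        return {rule.name: 0.0 for rule in rules}
    return {
        rule.name: sum(1 for r in records if rule.check(r)) / total
        for rule in rules
    }


# Raw records as loaded from a source, kept exactly as delivered.
raw_records: List[Record] = [
    {"customer_id": "C001", "zip": "10001"},
    {"customer_id": "C002", "zip": "1000"},
    {"customer_id": None, "zip": "94105"},
    {"customer_id": "C004", "zip": None},
]

pass_rates = measure(raw_records, RULES)
```

Because the rules only read the raw records, a rule can be added or changed later and measured again over the full history without reloading any data.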
The bank now benefits from a flexible data vault that enables tracking of digital adoption and predictive analytics of customer behavior, for example: correlating behaviors as indicators of major purchases; identifying usage patterns and volumes as predictors of account closure; identifying friction points in a digital process where customers abandon the application or call the contact center; and determining life stage or lifestyle to offer relevant products and services.
The implemented solution increased the quality of, and trust in, data across our client's entire organization.