Solid ground for data silo development

November 28, 2019

What is DataStore Community?
Why use DataStore Community?
How we deliver the DataStore Community
Useful links

What is DataStore Community?

DataStore Community is a lightweight, open-source database based on the DataStore schema and populated with data from credit bureau reports. It contains several components:

DataStore model / schema and SQL code to create a sample database in SQL Server (any edition – Express, Standard or Enterprise)
XSLT code to re-engineer data from credit report XML files into the DataStore structure
SSIS package(s) to load the transformed XML files into SQL Server
SQL code to detect and fix common data quality issues
Documentation and source code on our GitHub page
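The components above split the work into a transformation step (XSLT) and a loading step (SSIS). As a rough illustration of what the transformation achieves, the Python sketch below flattens one nested credit report into row lists for separate normalized tables. It is not the project's XSLT or SSIS code: the XML layout, element names and table names are hypothetical, since real bureau report schemas differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical credit report layout; real bureau XML schemas differ.
SAMPLE_REPORT = """
<CreditReport id="R-001" date="2019-11-28">
  <Subject nationalId="123456">
    <Name>Jane Doe</Name>
  </Subject>
  <Accounts>
    <Account number="A1" type="Mortgage" balance="120000.50"/>
    <Account number="A2" type="CreditCard" balance="850"/>
  </Accounts>
</CreditReport>
"""


def flatten_report(xml_text: str) -> dict[str, list[dict]]:
    """Split one nested report into row lists for separate normalized tables."""
    root = ET.fromstring(xml_text.strip())
    report_id = root.get("id")
    subject = root.find("Subject")
    subject_id = subject.get("nationalId")

    reports = [{"report_id": report_id, "subject_id": subject_id,
                "report_date": root.get("date")}]
    subjects = [{"subject_id": subject_id, "name": subject.findtext("Name")}]
    accounts = [
        {
            "report_id": report_id,  # foreign key back to the report row
            "account_number": acc.get("number"),
            "account_type": acc.get("type"),
            "balance": float(acc.get("balance")),
        }
        for acc in root.iter("Account")
    ]
    return {"report": reports, "subject": subjects, "account": accounts}


tables = flatten_report(SAMPLE_REPORT)
for name, rows in tables.items():
    print(name, rows)
```

Each list maps to one table, so a person who appears in many reports is stored once in the subject table while every report and account keeps its own row.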

Note: more features will be added in the future through community contributions and from DataFactory Enterprise.

Please see the DataFactory overview and feature comparison for more details.

Why use DataStore Community?

It is free

Great value for zero money – the DataStore gets better with every credit report parsed and stored in it, eventually becoming a mini-copy of the credit histories database, limited only by the number of people and legal entities whose credit reports are loaded into it.

The bigger the data —> the better the predictive models

Loading customer-centric data from ERP, CRM and other data sources into the DataStore makes your datasets richer. Whatever the credit bureau’s models can do, your models can do better, because they are built on both the bureau data and your own customer data.

In this way, the credit report costs incurred over the years are turned into a valuable digital asset at no extra cost.

A 3NF normalized datastore that is easily extensible

Thanks to the 3NF database model normalization, additional data sources (e.g. customer card transactions, marketing campaign responses) are added simply as new tables alongside the tables storing data from credit reports.
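A minimal sketch of this extensibility, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names (subject, credit_account, card_transaction) are hypothetical, not the actual DataStore schema. The new source gets its own table that references the existing subject table through a foreign key, so the credit report tables are not altered and rows for unknown subjects are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical tables standing in for the credit report part of the schema.
conn.executescript("""
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE credit_account (
    account_id INTEGER PRIMARY KEY,
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    balance    REAL
);
""")

# A new data source is added as a new table; existing tables are not altered.
conn.execute("""
CREATE TABLE card_transaction (
    transaction_id INTEGER PRIMARY KEY,
    subject_id     TEXT NOT NULL REFERENCES subject(subject_id),
    amount         REAL NOT NULL,
    tx_date        TEXT NOT NULL
)
""")

conn.execute("INSERT INTO subject VALUES ('123456', 'Jane Doe')")
conn.execute("INSERT INTO credit_account VALUES (1, '123456', 850.0)")
conn.execute("INSERT INTO card_transaction VALUES (1, '123456', 42.5, '2019-11-01')")

# A transaction for a subject not in the datastore violates the foreign key.
try:
    conn.execute("INSERT INTO card_transaction VALUES (2, 'unknown', 10.0, '2019-11-02')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print("orphan transaction rejected:", fk_rejected)
```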

A 3NF normalized datastore is a good starting point for building a designated DS/ML data silo, and can be structured to function as the single source of truth in data integration projects.

Integration into existing data warehousing infrastructure

Like other mature database management systems, SQL Server has well-designed tools for importing all sorts of data from various sources – relational and NoSQL databases, CSV, TXT and XML files. There are also multiple options for exporting data from SQL Server to other destinations.

The data warehousing ecosystem of your company probably has a number of data silos, some of which are used for BI, reporting and data mining. Depending on the specific setup and business requirements, there are two major options for integrating DataStore into it, and a range of combinations in between:

As yet another datamart in the existing data warehousing infrastructure, importing pre-processed, ready-to-use ‘final’ data into a master DWH;
As a stand-alone data silo for DS/ML projects, eventually transforming it into a master DWH;
Various combinations of the two options above.
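For the first option, one common hand-off is a flat file exported from the datastore and picked up by the master DWH's own import tooling. The sketch below uses sqlite3 as a stand-in for SQL Server (where tools such as bcp or SSIS would normally do this) and writes a hypothetical ‘final’ table to CSV with Python's csv module. The helper name export_table_to_csv and the subject_features table are illustrative assumptions.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn: sqlite3.Connection, table: str, out) -> None:
    """Write a header row plus every row of `table` to the file-like `out`."""
    # The table name is interpolated directly, so it must come from trusted
    # code, never from user input.
    cur = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)  # the cursor yields one tuple per row


# Hypothetical 'final' table prepared for a master DWH.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subject_features "
    "(subject_id TEXT, n_accounts INTEGER, total_balance REAL)"
)
conn.executemany(
    "INSERT INTO subject_features VALUES (?, ?, ?)",
    [("123456", 2, 120850.5), ("654321", 1, 300.0)],
)

buf = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
export_table_to_csv(conn, "subject_features", buf)
print(buf.getvalue())
```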

How we deliver the DataStore Community

It is available for review and free download from our GitHub page.

Useful links

Data warehouses, data lakes and datamarts are a big topic revolving around two major data warehouse design philosophies, those of Bill Inmon and Ralph Kimball. Our design of DataStore is in line with Bill Inmon’s paradigm. Please see below a few links on DWH design.

While the source data in DataStore is structured into many tables linked to each other through foreign keys to ensure data integrity and minimize data redundancy, denormalized datamarts can be created as and when required. In practice, ‘denormalized’ usually means that tables with ‘final’ data, such as variables for predictive analytics and modeling, are created for use in BI, reporting, DS/ML and other projects. Strict 3NF data atomicity requirements do not apply to denormalized datamarts.
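A minimal sketch of building such a denormalized datamart from normalized tables, using sqlite3 as a stand-in for SQL Server (in T-SQL the same idea would typically be a view or a SELECT ... INTO statement). The table names and derived variables (n_accounts, total_balance, n_credit_cards) are hypothetical examples of model-ready variables, not part of DataStore.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source tables (hypothetical names).
CREATE TABLE subject (subject_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE credit_account (
    account_id   INTEGER PRIMARY KEY,
    subject_id   TEXT REFERENCES subject(subject_id),
    account_type TEXT,
    balance      REAL
);
INSERT INTO subject VALUES ('123456', 'Jane Doe'), ('654321', 'John Roe');
INSERT INTO credit_account VALUES
    (1, '123456', 'Mortgage',   120000.5),
    (2, '123456', 'CreditCard', 850.0),
    (3, '654321', 'CreditCard', 300.0);

-- Denormalized datamart: one wide row per subject with model-ready variables.
CREATE TABLE dm_subject_features AS
SELECT s.subject_id,
       s.name,
       COUNT(a.account_id)         AS n_accounts,
       COALESCE(SUM(a.balance), 0) AS total_balance,
       SUM(CASE WHEN a.account_type = 'CreditCard' THEN 1 ELSE 0 END) AS n_credit_cards
FROM subject s
LEFT JOIN credit_account a ON a.subject_id = s.subject_id
GROUP BY s.subject_id, s.name;
""")

for row in conn.execute("SELECT * FROM dm_subject_features ORDER BY subject_id"):
    print(row)
```

The datamart repeats subject attributes next to aggregated values on purpose; it is rebuilt from the normalized tables whenever needed, so the 3NF source remains the single source of truth.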