November 28, 2019
What is DataFactory Azure?
DataFactory Azure is our flagship product, designed for ease of use, low cost of ownership and a rich set of features. It requires no costly, complex in-house DS/ML infrastructure (servers, applications, etc.), as all data is processed in the cloud. Most of the modeling is done by us and verified by you before deployment, so a large team of data scientists is not a strict requirement. Taking advantage of the ample extension possibilities of the Azure cloud, DataFactory Azure lets you start small and grow big. Powered by Microsoft Azure and managed by us, it is, like most SaaS products, priced as a monthly or yearly subscription.
DataFactory Azure is equally suitable for lending, retail, insurance and manufacturing companies that want to leverage the power of DS/ML by putting data from ERP, CRM, accounting software, etc. to good use.
The data
For use cases where most of the data comes from credit reports, we deploy DataFactory Enterprise in SQL Server on Azure, adding custom tables and business logic on top of the standard DataStore schema to handle extra data sources.
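As an illustration of what such an extension looks like, a custom table for an extra data source can sit in its own schema alongside the standard entities. A minimal sketch in Python with pyodbc; the server, the 'ext' schema and all table and column names are hypothetical, not part of the actual DataStore schema:

```python
# Hypothetical extension of the DataStore schema: a custom table for an
# extra data source (utility payment history). All names are made up.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server.database.windows.net;"
    "DATABASE=DataStore;UID=<user>;PWD=<password>"
)
ddl = """
CREATE TABLE ext.UtilityPayments (            -- assumes an 'ext' schema exists
    UtilityPaymentId BIGINT IDENTITY PRIMARY KEY,
    SubjectId        BIGINT NOT NULL,          -- would reference the standard subject table
    BillingMonth     DATE NOT NULL,
    AmountDue        DECIMAL(18, 2) NOT NULL,
    AmountPaid       DECIMAL(18, 2) NOT NULL,
    DaysLate         INT NOT NULL DEFAULT 0
);
"""
cur = conn.cursor()
cur.execute(ddl)
conn.commit()
```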
For data coming primarily from ERP, CRM and similar business applications, we deploy a data lake according to the Common Data Model (CDM) schema. Backed by a consortium of Microsoft, Adobe, SAP and other partners, CDM is an open-source collection of schemas for storing a high variety and volume of data from business software, and it is broadly applicable to building ML applications across many industries. For example, Microsoft's Dynamics 365 cloud-based enterprise software platform relies on CDM for data storage. See the CDM GitHub page and our FAQ for more details on CDM.
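In practice, a CDM folder in the data lake is described by a model.json file that lists the entities, their attributes and the CSV partitions holding the data. A minimal sketch of reading one such folder with pandas; the path and the 'Account' entity are assumptions for illustration:

```python
# Minimal sketch: locate an entity's partitions via model.json and load
# them with pandas. Path and entity name are illustrative assumptions.
import json
import pandas as pd

with open("/data/cdm/model.json", encoding="utf-8") as f:
    model = json.load(f)

# model.json lists each entity with its attribute definitions and the
# CSV partitions that hold the actual rows.
entity = next(e for e in model["entities"] if e["name"] == "Account")
columns = [a["name"] for a in entity["attributes"]]

# CDM partitions are plain CSV files without headers; the schema comes
# from the attribute list above (assumes at least one partition exists).
frames = [
    pd.read_csv(p["location"], header=None, names=columns)
    for p in entity["partitions"]
]
accounts = pd.concat(frames, ignore_index=True)
print(accounts.head())
```

Because the schema travels with the data in model.json, downstream feature engineering does not depend on any single application's export format.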
The models
To achieve a balanced assessment of the problem at hand, we build ensembles of different types of predictive models. For credit risk modeling, 'classic' ML algorithms and deep neural networks are combined in 2-4 different scenarios to assess risk on both new loan applications and existing loans, so the same data and models are applied to loan approval and portfolio monitoring alike. The building blocks include:
- classification models with binary default target variables (the standard approach)
- regression models with numeric measures of default (e.g. share of debt in default)
- survival models with time-based measures of default, e.g. 'time till default'
- complex architectures such as Markov decision processes and deep neural networks for modeling scenarios like 'which of several outstanding loans will a borrower default on, and why'
Basel- and IFRS 9-compliant risk assessment is built from combinations of these models to address PD, EAD and LGD calculations, which feed expected-loss figures as sketched below.
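As a worked illustration (not our production methodology), the outputs of these models combine into a per-loan expected credit loss in the usual ECL = PD × LGD × EAD form; all column names and figures below are made up:

```python
# Minimal sketch: combine PD, LGD and EAD model outputs into an
# IFRS 9 style expected credit loss per loan. Figures are illustrative.
import pandas as pd

loans = pd.DataFrame({
    "loan_id": [101, 102, 103],
    "pd_12m": [0.02, 0.15, 0.07],       # probability of default (classification model)
    "lgd":    [0.45, 0.60, 0.50],       # loss given default (regression model)
    "ead":    [10_000, 2_500, 40_000],  # exposure at default
})

# Expected credit loss: ECL = PD x LGD x EAD
loans["ecl"] = loans["pd_12m"] * loans["lgd"] * loans["ead"]
print(loans[["loan_id", "ecl"]])
# e.g. loan 101: 0.02 * 0.45 * 10000 = 90.0
```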
Retail companies have diverse data usage scenarios and prediction use cases, so we match methods and algorithms to the specific business requirement (a small code sketch follows the list):
- classification, regression and time series models for 'standard' scenarios such as customer churn, sales forecasting and inventory management
- the Apriori, Eclat and FP-growth algorithms for frequent itemset analysis
- causality analysis (e.g. of marketing campaigns)
- deep-learning-based recommender systems
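As an example of the frequent itemset analysis mentioned above, here is a minimal sketch using the open-source mlxtend library on made-up basket data:

```python
# Minimal frequent-itemset sketch with mlxtend; the baskets are invented.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

# One-hot encode the transactions into a boolean item matrix.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

# Frequent itemsets with >= 40% support, then association rules from them.
itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```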
The power of the Azure cloud
Powered by Azure Data Factory and Microsoft Flow, automated workflows take care of repetitive administrative tasks, such as frequent data export/import across various applications, without the need for complex custom integrations.
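For instance, a Data Factory pipeline run can be triggered programmatically. A minimal sketch using recent versions of the azure-mgmt-datafactory SDK; the subscription, resource group, factory and pipeline names are placeholders:

```python
# Minimal sketch: start an Azure Data Factory pipeline run from Python.
# All identifiers below are placeholders for illustration.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
adf = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Kick off a nightly export/import pipeline with a runtime parameter.
run = adf.pipelines.create_run(
    resource_group_name="rg-datafactory",
    factory_name="df-azure-demo",
    pipeline_name="nightly-data-sync",
    parameters={"windowStart": "2019-11-28"},
)
print("Started pipeline run:", run.run_id)
```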
Ancillary web and mobile apps are created by us and/or by you with PowerApps, a service for building custom business apps that connect to your data without the time and expense of custom software development. Apps built with PowerApps provide rich business logic and workflow capabilities that turn manual business processes into digital, automated ones. Employee engagement surveys, cost estimators, budget trackers, to-do lists, booking apps: these can all be created for web and mobile without coding knowledge or input from developers. In this sense, PowerApps 'democratizes' custom business app development.
Another important component is Power BI. It comes as a cloud service, a server installation and a desktop application. See examples of business analytics applications developed with Power BI.
Why use DataFactory Azure?
Start using machine learning without an upfront investment. Pay as you go for the data storage and computing power you actually use, and pay for bottom-line deliverables such as the predictive models you need, instead of investing in hardware, software and labour.
'Goodbye ownership, hello usership' is the new paradigm for companies that want to leverage cloud-based computing and analytics resources without having to invest heavily in DS/ML.
Customizable
Combining our Core ML track and Deep learning fundamentals training programs with the CDM-based Azure data lake, DataFactory Azure can be upgraded and customized at scale to cover most of your data science needs.
Stable enterprise-grade data schema
The CDM data schema is a battle-tested design that underpins Microsoft's Dynamics 365 ERP software and is supported by the CDM consortium. Unlike custom-developed data warehouses, the CDM schema is a mission-critical component of Microsoft's cloud business and will continue to be developed. You can be confident that most of the data you employ in DS/ML projects will fit the Common Data Model, with substantial schema depth remaining for greater variety and volume of data in the future.
Easy integration
The models are as lightweight as can be: there is no need for servers, applications or infrastructure administrators. All that is required is a minor customization of your core business applications (e.g. the ERP) to call the web services that expose your predictive models deployed in Microsoft Azure Machine Learning Studio or Azure Machine Learning Service.
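For illustration, a core application can score a new case with a plain HTTPS call to the deployed web service. A minimal sketch in Python; the endpoint URL, API key and payload schema are placeholders, not the actual service contract:

```python
# Minimal sketch: call a model deployed as an Azure ML web service.
# Endpoint, key and input fields are placeholders for illustration.
import json
import urllib.request

scoring_uri = "https://<scoring-endpoint>/score"  # from the ML workspace
api_key = "<api-key>"

payload = json.dumps({"data": [{"loan_amount": 10000,
                                "term_months": 36,
                                "monthly_income": 2500}]}).encode("utf-8")

req = urllib.request.Request(
    scoring_uri,
    data=payload,
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {api_key}"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # e.g. a predicted default probability
```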
How we deliver DataFactory Azure
SaaS applications are usually ready-to-use solutions that come with standardized features out of the box, but the actual features may vary depending on the edition. Think of DataFactory Azure as a SaaS application that comes in two editions: the DataStore schema and the Common Data Model schema.
The complexity of data engineering and predictive modeling, and the overall cost, of the CDM edition will depend on the variety and amount of data to be stored; the DataStore edition is standardized.
The next factor is the number of predictive models to be deployed. Our standardized models are 'pre-fabricated', i.e. they run on pre-specified datasets and require only calibration on your data prior to deployment. Custom development of predictive models for customer-specific use cases will add to the upfront deployment cost.