Building Blocks of Modern Data Management: Data Subsets and Data Products – Part 1


We are witnessing the transformation of data management into a Commercial Discipline. I have challenged the business and IT communities to reframe the conversation about data management: to transform it from an IT practice into a Commercial Discipline focused on leveraging data and advanced analytics (AI/ML) to derive and drive new sources of customer, product, service, and operational value within a constantly changing and adapting organization (Figure 1).


Figure 1: Accelerate the data-driven value creation wheel

As a reminder:

A Discipline consists of systematic research, observation, measurement and experimentation, with the resulting learnings assimilated into laws, theorems, concepts, principles, practices, frameworks and formulas that enable the consistent application and continuous improvement of the discipline in the real world.

I believe two key modern data management “products” are needed to make data management a business discipline focused on helping organizations accelerate their data-driven business innovation. One of these “products”, Data Products, is already widely accepted as a way for organizations to monetize their information about customers, products, services and operations, including predicted behavioral and performance propensities. However, I want to introduce the concept of a second data management product, Data Subsets, which package data together with its supporting accessories to accelerate the development, operationalization and scaling of data products (Figure 2):

  • Data subsets (or Data-as-a-Product) are the packaging and pre-wiring of data and its supporting accessories (e.g., rich metadata, data access methods, data governance policies and procedures, security protocols, access codes, data quality scores, privacy policies and regulations, usage patterns) in a single “package” with the aim of simplifying the discovery, access and exploration of data in order to optimize the efficiency and productivity of data workers (e.g., data engineers, data scientists, business analysts). Data subsets enable key data management results, including accelerating the discovery of relevant data sources, simplifying data access and data exploration, streamlining analytic experimentation and AI/ML modeling, speeding up feature engineering and improving the development and validation of analytical (AI/ML) models.
  • Data products (or Data Apps) package AI/ML analytics and customer, product, service and/or operational analytical insights (predicted behavioral and performance propensities) into an application that helps end users (non-data workers) achieve specific business or operational results. Data products integrate data visualizations, key performance indicators (KPIs) and composite metrics, data transformations and enrichments, analytical (predictive) scores, intelligent data pipelines, ML features, ML models and APIs. Data products require comprehensive DevOps product and capability management, including user interface, user experience, application development, operationalization, support, maintenance, and upgrade management.

Figure 2: Data Subset vs. Data Product

These two modern data management “products” are essential in helping organizations more effectively leverage data and analytics to power their business and operating models. Let’s look at some real-world examples to bring these important data management products to life.

Data subsets are the packaging of data and its supporting features and capabilities to optimize the productivity and efficiency of data workers.

Data subsets package data and its accessories to help data workers improve their productivity and efficiency in data science and data engineering. An excellent concrete example of the role subassemblies play can be found in the automotive industry, where subassemblies are the pre-packaging and pre-wiring of related components to expedite the construction of the final automotive product (Figure 3).


Figure 3: Automotive sub-assemblies

Automotive subassemblies increase production efficiency and reduce production risk by reducing time to final product, decreasing manufacturing risk, improving worker safety, reducing assembly failures, improving product quality and reliability, and reducing costs associated with labor, manufacturing, procurement, inventory, logistics, maintenance and support. The primary beneficiaries of subassemblies are the engineers and technicians who assemble and build the final automotive product.

In the case of data subsets, the beneficiaries are the data engineers, data scientists and business analysts who are trying to accelerate their ability to discover, explore, analyze, model and visualize data in the delivery of business and operational results.
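To make the idea of a data subset a little more concrete, here is a minimal Python sketch rather than a definitive implementation. It assumes a data subset can be modeled as a single object that bundles a data access method with the accessories listed earlier (metadata, governance policies, a data quality score, access rules). The names DataSubset, discover and read, the role strings and the 0.8 quality threshold are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataSubset:
    """Hypothetical package: data plus its supporting accessories."""
    name: str
    load: Callable[[], list[dict]]                   # data access method
    metadata: dict[str, str] = field(default_factory=dict)
    governance_policies: list[str] = field(default_factory=list)
    quality_score: float = 0.0                       # assumed scale: 0.0 to 1.0
    allowed_roles: set[str] = field(default_factory=set)

    def read(self, role: str) -> list[dict]:
        """Return the data only if the caller's role is permitted."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.name!r}")
        return self.load()


def discover(catalog: list[DataSubset], keyword: str,
             min_quality: float = 0.8) -> list[DataSubset]:
    """Find subsets whose metadata mentions the keyword and whose
    quality score meets the (assumed) minimum threshold."""
    keyword = keyword.lower()
    return [
        subset for subset in catalog
        if subset.quality_score >= min_quality
        and any(keyword in value.lower() for value in subset.metadata.values())
    ]


# Usage: a data worker discovers a relevant subset, then reads it.
trips = DataSubset(
    name="trips",
    load=lambda: [{"trip_id": 1, "minutes": 12}],    # stand-in for a real source
    metadata={"description": "Completed customer trips", "owner": "operations"},
    governance_policies=["retain 24 months"],
    quality_score=0.93,
    allowed_roles={"data_scientist"},
)
catalog = [trips]
matches = discover(catalog, "customer")
rows = matches[0].read("data_scientist")
```

The point of the sketch is that discovery, access control and quality checks travel with the data, so a data worker does not have to rebuild them for every project.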

Data products are the packaging of domain-based, AI/ML-powered applications designed to help non-technical users manage data- and analytics-intensive operations to achieve specific, meaningful and relevant business or operational results.

Data products deliver well-defined business or operational outcomes, such as improving customer retention, improving cross-sell and customer efficiency, reducing unplanned operational downtime, reducing outages and inventory levels, optimizing load balancing and asset utilization, improving marketing and sales efficiency, and reducing unplanned hospital readmissions.

My favorite example of a data product is Uber. The Uber data product aggregates passenger, driver, and traffic data, along with analytical insights, to optimize the desired outcome of getting from where I am to where I want to go. And it does so with an easy-to-understand, game-like user experience (UEX).

There are many real-life examples of data products – end-user products that leverage data and advanced analytics to uncover and apply analytical insights to deliver well-defined business or operational outcomes, including (Figure 4):

  • Google Nest Thermostat regulates your home’s temperature in response to your heating and cooling needs – raising temperatures when you’re home and lowering temperatures when you’re out – ultimately reducing energy use and energy costs.
  • BeClose Elderly Care Monitor uses an array of sensors placed throughout the home to track an elderly person’s routines, allowing those who live independently to continue to do so while allowing family and caregivers to monitor their well-being.
  • Babolat Play is a tennis racquet designed to improve performance by allowing a player or coach to set specific goals and track their progress using data analysis. The system adds a social element by making it easy to post game data to social media, challenge a friend to a side-by-side stats battle or see how your numbers stack up against the pros.
  • GreenIQ saves you time and resources by keeping plants fed according to their growing needs and conditions while automating much of the work process.
  • Echelon Smart Lighting enables a city to intelligently provide the right level of lighting needed depending on the time of day, season and weather conditions.
  • OnFarm combines real-time sensor data on soil moisture levels, weather forecasts and pesticide use to detect crop issues and remotely monitor all farm assets and resource usage levels.

Figure 4: Examples of real-world data and analytics-driven data products

What is most exciting about data products is the ability to integrate AI/ML into their operations so that they can continuously learn and adapt, becoming progressively more valuable the more they are used (i.e., more predictive, more accurate, more relevant, more reliable and more personalized). This is a subject that I will explore in a future blog.

I hope I have made a convincing case for two important data management products:

  • Data subsets are the packaging and pre-wiring of data and its supporting accessories in a single “package” to accelerate the productivity and efficiency of data workers.
  • Data products are the packaging of domain-based, AI/ML-powered applications designed to help non-technical users manage data- and analytics-intensive operations to drive specific, meaningful and relevant business and operational outcomes.

In Part 2 of this blog series, I’ll explore the economics (of course) of data subsets and the role that data subsets and data products play in accelerating your data management journey, from one-time data projects to engineering composable and reusable data products.

Ramon J. Espinoza