Summary

Data Products is a newsletter by Chad Sanderson

… on my thoughts around Data Product Development, Next Generation Data Modeling, Knowledge Layers, Semantic Layers, Data Mesh and more (any more buzzwords or is that enough?)


OnAir Post: Data Products

News

The Shift Left Data Manifesto: Data for the AI Era
Data Products, Chad Sanderson, March 26, 2025

Conway’s Law

Conway’s Law is the observation that organizations tend to design systems that mirror their communication structure. A product designed by a three-person organization will likely have three components. A product designed by a single team will likely be built as one large service.

A media company with separate teams for video encoding, recommendation algorithms, and user interfaces might build a streaming platform where these components are loosely coupled, reflecting the teams’ structure. Hospitals and insurance companies often run separate IT systems because they have distinct legal and compliance teams. The result is medical records fragmented across providers, forcing patients to transfer records manually or redo tests.

There are three primary stakeholders in the data management value chain:

1. Producers: The teams generating the data

2. Platforms: The teams maintaining the data infrastructure

3. Consumers: The teams leveraging the data to accomplish tasks

The Data-Conscious Software Engineer
Substack, Mark Freeman, January 28, 2025

The Unicorn That Data Teams Actually Need

Do Data Teams Need SWE Best Practices?

Before you grab your pitchforks: the answer to this question is a resounding yes, but I’ve recently changed my opinion on how data teams should get there. Early in my startup career, I prided myself on being a “full stack data scientist,” comfortable not only developing models and analyses but also putting them into production as data products. I saw the tremendous impact this had on my career growth and on the teams I worked with, so I pushed hard for every data professional to upskill as a way to bridge the gap between software and data teams.

However, I overlooked one caveat when assuming my situation applied to others: software engineering teams often don’t trust code written by data teams (the irony of a data scientist falling for selection bias). In hindsight, this was obvious even in my own situation. For example, in my last startup role, it took over a year before the engineering team trusted me with read/write access to the transactional database. Other clues included engineers saying things like, “Oh, you created unit tests… Nice!” in complete shock that someone on the data team could write production code.

While frustrating at times, those engineers were right to be skeptical, and I would honestly feel the same way in their shoes. At the end of the day, software engineering teams are the ones held accountable for application uptime, managing the transactional database, and protecting their codebase. Anyone writing code outside these constraints is a potential threat to all of the above, which is why engineering teams maintain so many internal best practices to prevent issues (e.g., CI/CD, unit tests, version control). Furthermore, while every SWE is aware of these standards, understanding of them varies widely across the data roles engineers interact with.

While I still think data teams need to upskill on SWE best practices, I no longer think engineering-focused data teams are enough to build robust data workflows and bridge the gap between both sides. This problem statement, “improving the collaboration between data and software developers,” is what pushed me to devote the last couple of years to data contracts and helped me find the missing link.


About

Overview

About me

My name is Chad Sanderson. I currently lead the Data Platform team at Convoy, a digital freight unicorn. The data we work with is incredibly complex, with an inordinate number of entities, real-world events, non-linear workflows, and more. For the last ~3 years, I have focused exclusively on rebuilding a data modeling framework for the modern data stack, one that will allow our data consumers to define the data they need in advance, seamlessly create data products, and construct semantic-based metrics, attributes, and dimensions.

Here is what you can expect from this newsletter:

  • Weekly long-form content on my philosophical musings about data, architecture, governance, semantics, and data APIs
  • An open line to me to talk data! My Calendly is available on request to practitioners and data operators (vendors interested in collaboration, please reach out via LinkedIn)
  • Deep insights into the work we are currently doing at Convoy, including architecture designs, UX, and videos

A fun architecture!

I’m looking forward to hearing from other forward-thinking data folks in the space (like you?). Feedback is always welcome. Talk soon!

Chad

Source: Data Products

Web Links

Videos

Data Contracts: The Rise of Modern Data Management with Chad Sanderson

(19:07)
By: Starburst

In this session, Chad Sanderson, CEO of Gable.ai and author of the upcoming O’Reilly book “Data Contracts,” tackles the necessity of modern data management in an age of hyper-iteration, experimentation, and AI. He explores why traditional data management practices fail and how the cloud has fundamentally changed data development. The talk covers a modern application of data management best practices, including data change detection, data contracts, observability, and CI/CD tests, and outlines the roles of data producers and consumers. Attendees will leave with a clear understanding of modern data management’s components and how to use them for better data handling and decision-making.

825: Data Contracts: The Key to Data Quality — with Chad Sanderson

October 8, 2024 (59:31)
By: Super Data Science: ML & AI Podcast with Jon Krohn

Data contracts are redefining data quality and governance, and Chad Sanderson, CEO of Gable.ai, joins host @JonKrohnLearns to explain how they can transform your data strategy. He breaks down what data contracts are, how they shift data quality checks closer to production, and why they’re essential for reducing data debt. Chad also highlights how better alignment between data producers and consumers can improve data reliability and address change-management challenges in modern organizations.
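To make the idea of shifting quality checks toward producers a bit more concrete, here is a minimal sketch of a contract check that a producer's CI/CD pipeline might run before a schema change ships. It assumes a hypothetical contract format that maps field names to Python types; this is not Gable.ai's format or any specific tool's API, and the `DataContract`, `breaking_changes`, and `shipments` names are illustrative only. Removed fields and changed types count as breaking, while newly added fields do not.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical contract: a named set of fields consumers depend on."""
    name: str
    fields: dict[str, type]  # field name -> expected Python type


def breaking_changes(contract: DataContract, proposed_schema: dict[str, type]) -> list[str]:
    """Compare a producer's proposed schema against the contract.

    Returns one message per breaking change: a contracted field that was
    removed, or whose type changed. Extra fields are allowed (non-breaking).
    """
    problems = []
    for field, expected in contract.fields.items():
        if field not in proposed_schema:
            problems.append(f"{contract.name}: field '{field}' was removed")
        elif proposed_schema[field] is not expected:
            problems.append(
                f"{contract.name}: field '{field}' changed type "
                f"from {expected.__name__} to {proposed_schema[field].__name__}"
            )
    return problems


if __name__ == "__main__":
    # Illustrative contract and a proposed change that alters one type.
    shipments = DataContract("shipments", {"shipment_id": str, "weight_kg": float})
    proposed = {"shipment_id": str, "weight_kg": int, "carrier": str}
    for problem in breaking_changes(shipments, proposed):
        print(problem)
```

Run against the example above, the check reports only the `weight_kg` type change; the new `carrier` field passes. In a real pipeline, a non-empty result would typically fail the build so the producer and consumers can agree on the change before it reaches production.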
