The Data-Conscious Software Engineer

Source: Substack

By Mark Freeman, January 28, 2025

The Unicorn That Data Teams Actually Need

Do Data Teams Need SWE Best Practices?

Before you get your pitchforks out: the answer to this question is a resounding yes, but I’ve recently changed my opinion on how data teams should get there. Something I prided myself on early in my startup career was being a “full stack data scientist,” where I felt comfortable not only developing models and analyses but also putting them into production as data products. I saw the tremendous impact it had on my career growth and on the teams I worked with, and thus I pushed hard for every data professional to upskill as a means to bridge the gap between software and data teams.

However, I overlooked one caveat when assuming my situation could be applied to others: software engineering teams often don’t trust the code written by data teams (the irony of a data scientist falling for selection bias). In hindsight, this was obvious even in my own situation. For example, in my last startup role, it took over a year before the engineering team trusted me with read/write access to the transactional database. Other clues included engineers saying things like, “Oh, you created unit tests… Nice!” in complete shock that someone on the data team could write production code.

While frustrating at times, it was the right move by those engineers to be skeptical, and I would honestly hold the same sentiment if I were in their shoes. At the end of the day, software engineering teams are the ones held accountable for application uptime, managing the transactional database, and protecting their codebase. Anyone implementing code outside of these constraints is a potential threat to all of the above, which is why engineering teams maintain so many internal best practices to prevent issues (e.g., CI/CD, unit tests, and version control). Furthermore, while all SWEs are aware of these standards, understanding of them falls along a spectrum among the various data roles they interact with.

While I still think data teams need to upskill on SWE best practices, I no longer think engineering-focused data teams are enough, on their own, to build robust data workflows and bridge the gap between both sides. This problem statement of “improving the collaboration between data and software developers” is what pushed me to devote the last couple of years to data contracts and allowed me to find the missing link.
