Data Engineering: Now with 30% More Bullshit
Tools don't solve problems. People do. No buzzword replaces craftsmanship.
Cue the music. Roll the Gartner slide deck.
Let's talk about something that everyone in data engineering has felt at some point but rarely says out loud: we're being bombarded with marketing BS that's trying to replace actual engineering.
You know what I mean.
Every week there's a new "revolutionary" architecture, some AI-powered hammer promising to fix all our problems. And of course, everyone on LinkedIn swears this new thing is the future.
Spoiler: it's not.
Most of it is just rebranded old ideas, overpriced software, or, in the best case, thin abstractions over things we've already been doing. Worst case? It's a distraction that wastes your time, burns your budget, and adds complexity without delivering real value.
Let's dig into some of the tools pitched at data engineers - the ones I'm kinda sick of hearing about - and I hope you'll convince me I'm wrong.
Data Fabric
"Data Fabric" sounds like something woven by data elves in the night — a magical tapestry where all your systems are seamlessly connected, metadata flows like lifeblood, and business users get everything they need without ever pinging engineering again.
Just buy a few licenses from a big vendor, and voilà — your data problems disappear.
Except... they don't.
So, what is Data Fabric, really?
Behind the buzzword, Data Fabric is just a fancy way of describing a mix of:
Centralized metadata
Data virtualization
Real-time sync
And a dash of machine learning
That's it.
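Because I can already hear "but the connectivity!", here's a rough sketch of what "data virtualization" boils down to at small scale: query two independent systems and join the results on demand. The connection strings and table names below are made up; treat it as an illustration, not a reference implementation.

```python
# Hypothetical example: a join computed at read time across two sources,
# which is the core idea "data virtualization" has always dressed up.
import pandas as pd
from sqlalchemy import create_engine

# Two unrelated systems, each with its own credentials and quirks.
orders_engine = create_engine("postgresql://user:pass@orders-db/prod")
crm_engine = create_engine("mysql+pymysql://user:pass@crm-db/prod")

orders = pd.read_sql("SELECT customer_id, total FROM orders", orders_engine)
customers = pd.read_sql("SELECT id, segment FROM customers", crm_engine)

# The "unified view": an on-demand join, not a magically self-syncing fabric.
revenue_by_segment = (
    orders.merge(customers, left_on="customer_id", right_on="id")
          .groupby("segment")["total"]
          .sum()
)
print(revenue_by_segment)
```

A real virtualization engine adds caching and query pushdown, but the shape of the problem is the same.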
On the surface, it sounds amazing:
No more complex pipelines — everything connects itself!
Metadata handles everything — no manual work!
Real-time data access — who needs ETL?
Self-service for business users — no more Slack discussions!
In theory, it's an architecture designed to unify your scattered systems, automate data workflows, and make data accessible across the board. Sounds like a dream, right?
In practice, though:
Every connector still needs to be configured, secured, and monitored
Metadata still has to be collected, cleaned, and constantly updated
Dirty source data is still dirty — licenses don't scrub data
Each data source behaves differently — there's no magic fix for that
AI doesn't clean bad inputs — it just guesses faster
And performance suffers when you try to make everything talk to everything else in real time
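And about that "everything connects itself" pitch: below is a hedged sketch of what wiring up a few sources actually involves. Every name is illustrative, but notice who still fills in the credentials, retry policy, and alert channel for each one. Hint: not the fabric.

```python
# Hypothetical sketch: the per-source decisions a human still has to make,
# no matter what the architecture diagram calls the box around them.
from dataclasses import dataclass, field

@dataclass
class SourceConnector:
    name: str
    dsn: str                                    # secrets management: still your problem
    schema_checks: list[str] = field(default_factory=list)  # data quality: still your problem
    retry_limit: int = 3                        # flaky sources: still your problem
    alert_channel: str = "#data-alerts"         # monitoring: still your problem

connectors = [
    SourceConnector("billing_db", "postgresql://...", schema_checks=["not_null(order_id)"]),
    SourceConnector("crm_api", "https://crm.example.com/api", retry_limit=5),
    SourceConnector("legacy_ftp", "ftp://...", alert_channel="#oncall"),
]

# Each source behaves differently, and each still gets watched separately.
for c in connectors:
    print(f"{c.name}: checks={c.schema_checks}, retries={c.retry_limit}, alerts -> {c.alert_channel}")
```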
All those glowing articles about "data stitching" and "seamless connectivity"? Just creative ways to avoid admitting the obvious: data integration is still hard.
Data Fabric doesn't eliminate these problems; it just moves them around.
This isn't new
Despite what vendors claim, none of this is revolutionary. It's all been done before — we just didn't call it something this shiny.
Data virtualization? Been around since the early 2000s — Denodo, Composite, take your pick.
Metadata catalogs? Alation, Collibra, Informatica — nothing new there.
Orchestration? Airflow, Dagster, Luigi. We've had those tools for years.
The concepts behind Data Fabric have existed for decades. The only thing that's really new is the branding. And hey — that's fine. Everything's a remix. Just don't pretend it's magic.
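On the orchestration point specifically, here's roughly what we've been writing for years, long before anyone stitched it into a fabric: a bare-bones Airflow DAG. The task bodies and IDs are placeholders, and on Airflow versions before 2.4 the parameter is schedule_interval rather than schedule.

```python
# A minimal daily extract -> load DAG, the kind of plumbing that predates
# every "fabric" slide. Task logic is a placeholder.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extracting from the source system...")


def load():
    print("loading into the warehouse...")


with DAG(
    dag_id="nightly_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",          # schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task
```

Rename the operators and it could just as easily be Dagster or Luigi; the point is the pattern, not the logo.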