INSERT vs. UPDATE and Their Impact on Data Ingestion
INSERT vs. UPDATE: which one is faster for data ingestion?
I recently stumbled across an interesting interview question: "Which is harder for a database to handle — INSERT or UPDATE?"
At first, you might think, "Does it even matter? They serve different purposes, right?" INSERT adds new rows, and UPDATE modifies existing ones — case closed. But when it comes to data ingestion, these two operations often collide. When you're loading data into a system — syncing external sources, streaming events, or updating analytics tables — choosing between INSERT and UPDATE can significantly affect your database's performance and scalability.
So, let's explore how INSERT and UPDATE work under the hood (we will focus on the data ingestion context, but the concept should apply to other scenarios as well). We'll dig into the challenges they create, the trade-offs you should consider, and when you might use one over the other — or a hybrid approach like UPSERT.
The INSERT Operation
At first glance, INSERT seems straightforward — add a new row into the system. However, behind the scenes, it's far from trivial. Let's dive into the mechanics.
Indexes
When you insert a row, it's not just about adding data to the table. The database engine must also update all associated indexes. If you think of indexes as catalogs, every new entry needs to be filed carefully in the right spot, ensuring the catalog stays sorted and useful.
More indexes mean more overhead. Each additional index requires extra effort to calculate, locate, and store the new data. While a single-column index is relatively quick to update, composite or unique indexes (like (user_id, created_at)) add complexity and latency.
💡 Avoid blindly adding indexes. Each one improves query performance but slows down INSERT. Aim for balance.
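To make this concrete, here is a minimal sketch in PostgreSQL syntax (the events table and all of its columns are invented for illustration) of a table carrying a primary key, a unique constraint, and the composite index from the example above. Every single INSERT has to maintain all three index structures in addition to the table itself:

-- Hypothetical ingestion table; names are illustrative only.
CREATE TABLE events (
    id          bigserial   PRIMARY KEY,            -- backed by a unique index
    user_id     bigint      NOT NULL,
    external_id text        NOT NULL UNIQUE,        -- a second unique index to check
    payload     jsonb,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Composite index like the (user_id, created_at) example above.
CREATE INDEX idx_events_user_created ON events (user_id, created_at);

-- This one row lands in the table plus three separate index structures:
-- the primary key, the unique index on external_id, and the composite index.
INSERT INTO events (user_id, external_id, payload)
VALUES (42, 'evt-001', '{"type": "signup"}');

Every one of those index updates is extra CPU and I/O on the write path, which is exactly the overhead the tip above is warning about.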
Constraints and Triggers
Before new data officially lands in a table, it must pass several integrity checks. These constraints include foreign keys to enforce relationships, UNIQUE constraints to prevent duplicate entries, and NOT NULL requirements to ensure data quality. If the schema includes computed columns, the database must calculate their values on the fly.
Additionally, triggers can execute custom logic during insertion, such as logging, cascading updates, or whatever else you might have hooked in. While useful, they can significantly increase the workload, especially for complex data schemas.
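As a rough sketch of how much machinery a single row can wake up, here is a hypothetical orders schema (PostgreSQL 12+ syntax, all table and column names invented) that combines a foreign key, a UNIQUE constraint, NOT NULL columns, a generated (computed) column, and an audit trigger. Every one of these checks and actions runs on each INSERT:

-- Hypothetical schema; every object below adds work to each INSERT.
CREATE TABLE customers (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),  -- foreign-key lookup
    order_ref   text   NOT NULL UNIQUE,                     -- uniqueness check
    quantity    int    NOT NULL,
    unit_price  numeric(10, 2) NOT NULL,
    total_price numeric(12, 2)
        GENERATED ALWAYS AS (quantity * unit_price) STORED  -- computed column
);

CREATE TABLE order_audit (
    order_id  bigint,
    logged_at timestamptz DEFAULT now()
);

-- Trigger function that logs every insertion, adding yet another write.
CREATE FUNCTION log_order_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO order_audit (order_id) VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_log_order_insert
AFTER INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION log_order_insert();

None of this is bad design on its own; the point is simply that each constraint and trigger is work the database must finish before your INSERT is acknowledged.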