<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[luminousmen: Spark Under the Hood]]></title><description><![CDATA[Engineering-grade breakdowns of Spark internals, tuning, and pitfalls]]></description><link>https://luminousmen.substack.com/s/apache-spark-under-the-hood</link><image><url>https://substackcdn.com/image/fetch/$s_!JtUF!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b28cd70-157c-4b06-872e-a38fe5155009_297x297.png</url><title>luminousmen: Spark Under the Hood</title><link>https://luminousmen.substack.com/s/apache-spark-under-the-hood</link></image><generator>Substack</generator><lastBuildDate>Wed, 06 May 2026 03:41:19 GMT</lastBuildDate><atom:link href="https://luminousmen.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[luminousmen]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[luminousmen@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[luminousmen@substack.com]]></itunes:email><itunes:name><![CDATA[luminousmen]]></itunes:name></itunes:owner><itunes:author><![CDATA[luminousmen]]></itunes:author><googleplay:owner><![CDATA[luminousmen@substack.com]]></googleplay:owner><googleplay:email><![CDATA[luminousmen@substack.com]]></googleplay:email><googleplay:author><![CDATA[luminousmen]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Apache Spark Optimization Checklist]]></title><description><![CDATA[Distilled from 16 deep dives and roughly that many incidents I'd rather 
forget]]></description><link>https://luminousmen.substack.com/p/the-apache-spark-optimization-checklist</link><guid isPermaLink="false">https://luminousmen.substack.com/p/the-apache-spark-optimization-checklist</guid><dc:creator><![CDATA[luminousmen]]></dc:creator><pubDate>Tue, 05 May 2026 13:02:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4lPN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4lPN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4lPN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!4lPN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!4lPN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!4lPN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4lPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!4lPN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!4lPN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!4lPN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!4lPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0180af6b-cabf-4c65-92d6-2e96787a8d3c_2752x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">I&#8217;ve made every dumb Spark mistake at least once. At production scale &#8212; real data, real concurrency, real stakeholders yelling in Slack &#8212; <em>&#8220;it works&#8221;</em> and <em>&#8220;it works well&#8221;</em> are completely different conversations. So I started writing them down.</p><p style="text-align: justify;">This is the checklist I wish I had taped to my monitor when I started. 
Every item comes from a real production screwup &#8212; mine or someone else&#8217;s.</p><h2>Before You Write a Single Line</h2><ul><li><p><strong>Use the <a href="https://luminousmen.com/post/spark-tips-dataframe-api/">DataFrame / Dataset API</a>, not RDDs.</strong> RDDs are lambda-driven &#8212; Spark can&#8217;t see inside them, can&#8217;t optimize them. DataFrames go through the Catalyst optimizer. You get predicate pushdown, filter reordering, Adaptive Query Execution, and cost-based join reordering for free. The RDD API in MLlib is in maintenance mode. Let it go.</p></li><li><p><strong>Pick the right file format.</strong> <a href="https://luminousmen.com/post/why-parquet-is-the-goto-format-for-data-engineers/">Parquet</a> for analytical queries &#8212; column pruning, predicate pushdown, stats-based skipping all work out of the box. <a href="https://luminousmen.com/post/big-data-file-formats/">Avro</a> for schema-evolution-heavy ingestion. For anything read more than once in the same week, use a table format on top &#8212; Iceberg or Delta &#8212; so you get ACID, time travel, and stats the query planner can actually use. If you&#8217;re reading raw CSV or JSON, always specify the schema explicitly. Inference means a full scan just to figure out types.</p></li><li><p><strong>Use splittable <a href="https://luminousmen.com/post/choosing-the-right-compression-codec/">compression</a>.</strong> Snappy, LZ4, or ZSTD &#8212; never GZIP. A 10GB GZIP file can&#8217;t be split across executors &#8211; one poor node has to decompress the whole thing. Snappy is the reliable default. ZSTD is higher-compression and, as of Spark 4.x, runs in parallel for shuffle (<a href="https://issues.apache.org/jira/browse/SPARK-46256">SPARK-46256</a>) &#8212; use it for shuffle spill and intermediate files to cut network time.</p></li><li><p><strong>Know your Spark version.</strong> Spark 4.0 dropped Scala 2.12, JDK 8, JDK 11, Mesos, and Python 3.8. 
If your platform team is still on any of those, that&#8217;s the first fight to pick &#8212; the rest of this checklist assumes you&#8217;re on JDK 17, Scala 2.13, and Python 3.9+. Don&#8217;t tune a job on a platform that&#8217;s already a stage behind.</p></li></ul><h2>Partitioning &#8212; Hidden Architecture</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qZoS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qZoS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png 424w, https://substackcdn.com/image/fetch/$s_!qZoS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png 848w, https://substackcdn.com/image/fetch/$s_!qZoS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png 1272w, https://substackcdn.com/image/fetch/$s_!qZoS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qZoS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png" width="1000" height="698" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44180,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/195456322?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!qZoS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png 424w, https://substackcdn.com/image/fetch/$s_!qZoS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png 848w, https://substackcdn.com/image/fetch/$s_!qZoS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png 1272w, https://substackcdn.com/image/fetch/$s_!qZoS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd310aea3-cf98-420e-b198-c7073eb5ae37_1000x698.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Tune </strong><code>spark.sql.files.maxPartitionBytes</code><strong> for your storage layout.</strong> Default is 128 MB. If your Parquet files are mostly 256 MB+, you&#8217;re under-parallelizing reads. If they&#8217;re 8 MB, you&#8217;ve got too many tasks. Match the split size to your actual file distribution, not the Spark default. This is the input-side knob &#8212; it sets your initial partition count before any AQE coalescing kicks in.</p></li><li><p><strong>Target <a href="https://luminousmen.com/post/spark-tips-partition-tuning/">2-4 partitions per available core</a>.</strong> Fewer than that and cores sit idle. More than that and the scheduler spends more time tracking tasks than running them. If tasks routinely finish in under 100ms, your partitions are too small. 
If one task runs 10x longer than the others, you have skew &#8212; see the joins section.</p></li><li><p><strong>Filter early, filter hard.</strong> Push filters as close to the data source as possible. Partition pruning exists for a reason &#8212; if your data is partitioned by date and you only need last week, Spark should never touch the other 51 weeks. With Dynamic Partition Pruning (default-on in 4.x), this also works at runtime across joins. Which brings me to the next point.</p></li><li><p><strong>[4.x] Use multi-key Dynamic Partition Pruning for compound partitions.</strong> <a href="https://issues.apache.org/jira/browse/SPARK-46946">SPARK-46946</a> added multi-key DPP. If your fact table is partitioned by (date, region) and you join it against a small filtered dim, Spark now prunes on both keys at runtime. This unlocks real star-schema performance that was 3.x-impossible. No config needed &#8212; it just works if the dim side broadcasts.</p></li><li><p><strong>Coalesce after heavy filtering.</strong> You just filtered 2 billion rows down to 2 million but still have 10,000 partitions. That&#8217;s 10,000 nearly-empty tasks. <code>.coalesce()</code> fixes this without a full shuffle. <code>.repartition()</code> if you need even distribution across the new partition count. AQE&#8217;s coalescePartitions handles shuffle-output coalescing automatically, but won&#8217;t help on input-side partition count after filter.</p></li><li><p><strong>Repartition on join keys before multiple joins.</strong> If you&#8217;re joining the same DataFrame three times on user_id, repartition on user_id once and cache it. Otherwise you&#8217;re shuffling the same data three times. Better: if the table is long-lived, bucket it &#8212; see the joins section.</p></li><li><p><strong>Repartition after </strong><code>flatMap</code><strong>.</strong> flatMap can 10x your row count without touching the partition count. Now you have massively uneven partitions. 
Repartition explicitly or enjoy your disk spills.</p></li><li><p><strong>Use </strong><code>.partitionBy()</code><strong> on writes</strong> when downstream jobs filter on those columns. Limit partition columns to low cardinality (&#8804; a few hundred distinct values). <code>.partitionBy("user_id")</code> on a user table is a <a href="https://luminousmen.com/post/how-not-to-partition-data-in-s3-and-what-to-do-instead/">disaster</a> &#8212; millions of tiny directories. <code>.partitionBy("date")</code> on daily data is the canonical right answer.</p></li></ul><h2>Memory</h2><ul><li><p><strong>Know your <a href="https://luminousmen.com/post/dive-into-spark-memory/">memory layout</a>.</strong> Default: 60% of executor memory for execution + storage (<code>spark.memory.fraction</code>), split 50/50 (<code>spark.memory.storageFraction</code>). The remaining 40% is user memory. Don&#8217;t blindly increase executor memory &#8212; understand <em>which</em> pool is running out first. The Spark UI&#8217;s Storage tab tells you what&#8217;s cached; the Executors tab tells you what&#8217;s in use.</p></li><li><p><strong>For PySpark: bump </strong><code>memoryOverhead</code><strong> to 20&#8211;25%.</strong> Default is 10% of executor memory or 384 MB, whichever is larger. Arrow and pandas UDFs allocate native memory that doesn&#8217;t show up in JVM metrics. Your executor gets killed by <a href="https://luminousmen.substack.com/p/cluster-managers-for-apache-spark">YARN or Kubernetes</a> and you have no idea why. This is why.</p></li><li><p><strong><a href="https://luminousmen.com/post/spark-tips-dont-collect-data-on-driver/">Don&#8217;t collect on the driver</a>.</strong> <code>df.collect()</code> pulls the entire dataset to a single machine. Use <code>.take()</code>, <code>.takeSample()</code>, or <code>.show()</code>. Same goes for <code>.countByKey()</code>, <code>.countByValue()</code>, <code>.collectAsMap()</code> &#8212; all driver-side.
Spark logs a warning when task serialized size exceeds 1 MB (<code>TASK_SIZE_TO_WARN_KIB = 1000</code>); past <code>spark.driver.maxResultSize</code> (1 GB default), it errors out.</p></li><li><p><strong>Watch for disk spills.</strong> Check the Spark UI&#8217;s Stages tab. If you see &#8220;Spill (Memory)&#8221; or &#8220;Spill (Disk)&#8221; on a stage: reduce data per partition (more partitions), increase executor memory, or both. Spills mean your data didn&#8217;t fit and Spark wrote intermediate results to disk. That&#8217;s 10-100x slower than in-memory.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bwgJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bwgJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png 424w, https://substackcdn.com/image/fetch/$s_!bwgJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png 848w, https://substackcdn.com/image/fetch/$s_!bwgJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png 1272w, https://substackcdn.com/image/fetch/$s_!bwgJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!bwgJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png" width="387" height="191.68309859154928" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:422,&quot;width&quot;:852,&quot;resizeWidth&quot;:387,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Skew and spill | Databricks on AWS&quot;,&quot;title&quot;:&quot;Skew and spill | Databricks on AWS&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Skew and spill | Databricks on AWS" title="Skew and spill | Databricks on AWS" srcset="https://substackcdn.com/image/fetch/$s_!bwgJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png 424w, https://substackcdn.com/image/fetch/$s_!bwgJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png 848w, https://substackcdn.com/image/fetch/$s_!bwgJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png 1272w, https://substackcdn.com/image/fetch/$s_!bwgJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12d08f8f-fc16-4dd8-bac9-8ec7716840b1_852x422.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><ul><li><p><strong>Use off-heap memory for big shuffles.</strong> Set <code>spark.memory.offHeap.enabled=true</code> and <code>spark.memory.offHeap.size</code> for jobs doing heavy shuffles or joins. Off-heap bypasses GC pressure and is more predictable. You pay for it with a fixed allocation &#8212; still worth it for anything that spills regularly.</p></li></ul><h2>Caching &#8212; It&#8217;s Not Free</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PJMy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PJMy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png 424w, https://substackcdn.com/image/fetch/$s_!PJMy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png 848w, https://substackcdn.com/image/fetch/$s_!PJMy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png 1272w, https://substackcdn.com/image/fetch/$s_!PJMy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!PJMy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png" width="1000" height="375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51341,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/195456322?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!PJMy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png 424w, https://substackcdn.com/image/fetch/$s_!PJMy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png 848w, https://substackcdn.com/image/fetch/$s_!PJMy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PJMy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a7e7476-2753-4889-896c-e931bf32163a_1000x375.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><ul><li><p><strong><a href="https://luminousmen.com/post/explaining-the-mechanics-of-spark-caching/">Only cache what you reuse</a>.</strong> Caching a DataFrame you touch once just wastes memory that could go to shuffles and joins.
Cache when you have branching logic, iterative ML workloads, or multiple actions on the same data.</p></li><li><p><strong><a href="https://luminousmen.com/post/spark-tips-caching/">Force materialization after caching</a>.</strong> <code>.cache()</code> is lazy. Until you trigger an action, nothing is cached. Always follow with <code>.count()</code> or a full action. Otherwise you think you cached it, but you didn&#8217;t, and your next job re-computes everything.</p></li><li><p><strong>Use </strong><code>MEMORY_AND_DISK</code><strong> as your default storage level.</strong> Pure <code>MEMORY_ONLY</code> means data gets evicted silently when memory fills up. <code>MEMORY_AND_DISK</code> spills to disk instead of recomputing. That&#8217;s almost always what you want.</p></li><li><p><strong>Assume your cache will be evicted.</strong> Cache competes with execution memory. Spark uses LRU eviction and won&#8217;t warn you when it drops your cached blocks. Design your job so it&#8217;s correct even without the cache &#8212; cache is for speed, not correctness.</p></li><li><p><strong>Beware partial caching.</strong> <code>.cache()</code> followed by <code>.take(10)</code> only materializes the partitions Spark touched to get 10 rows. The rest is uncached, and the next action recomputes them without warning. Always cache with a <code>.count()</code> or full action first.</p></li></ul><h2>Joins &#8212; The Biggest Shuffle of Your Life</h2><ul><li><p><strong><a href="https://luminousmen.com/post/introduction-to-pyspark-join-types/">Broadcast small tables</a>.</strong> If one side of your join is under <code>spark.sql.autoBroadcastJoinThreshold</code> (10 MB default), Spark broadcasts it &#8212; no shuffle, no exchange, just a hash lookup on every executor. For medium dims (10-200 MB), consider raising the threshold or using <code>.broadcast(df)</code> explicitly. 
Past 200 MB, broadcasting costs more than it saves.</p></li><li><p><strong>Diagnose skew before you &#8220;fix&#8221; joins.</strong> Open the Spark UI, go to the Stages tab, look at the task duration distribution. If 99 tasks finish in 2 seconds and 1 takes 40 minutes, you have skew. AQE&#8217;s skewJoin handling (default-on in 4.x) covers most cases automatically. If it doesn&#8217;t trigger: broadcast the small side, salt the join key, or do an iterative broadcast join.</p></li><li><p><strong>[4.x] Use Storage Partition Join for pre-partitioned tables.</strong> If both sides of a join come from a DSv2 source (Iceberg, Delta) and are partitioned on the same columns, SPJ (<a href="https://issues.apache.org/jira/browse/SPARK-51938">SPARK-51938</a> improvements in 4.x) skips the shuffle entirely. Set <code>spark.sql.sources.v2.bucketing.enabled=true</code>. This is the single biggest shuffle-elimination feature in Spark 4.x and it&#8217;s criminally underused.</p></li><li><p><strong><a href="https://luminousmen.com/post/the-5-minute-guide-to-using-bucketing-in-pyspark/">Bucket your tables</a> when SPJ isn&#8217;t an option.</strong> For Hive-style writes, pre-bucket both tables on the join key with the same bucket count. Spark skips the shuffle on subsequent joins. Older pattern than SPJ but still relevant for non-DSv2 sources.</p></li><li><p><strong>Order joins smallest-first when AQE isn&#8217;t picking up slack.</strong> AQE reorders most joins automatically based on runtime stats, but if you&#8217;re outside AQE&#8217;s reach (e.g. 
RDD path, or a shuffle-heavy plan that AQE can&#8217;t retry), put the smallest table first in your explicit join chain.</p></li></ul><h2>JDBC Sources</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qlbF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qlbF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png 424w, https://substackcdn.com/image/fetch/$s_!qlbF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png 848w, https://substackcdn.com/image/fetch/$s_!qlbF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png 1272w, https://substackcdn.com/image/fetch/$s_!qlbF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qlbF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png" width="549" height="411.75" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:800,&quot;resizeWidth&quot;:549,&quot;bytes&quot;:43638,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/195456322?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!qlbF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png 424w, https://substackcdn.com/image/fetch/$s_!qlbF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png 848w, https://substackcdn.com/image/fetch/$s_!qlbF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png 1272w, https://substackcdn.com/image/fetch/$s_!qlbF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa888a500-1cc2-412e-b14e-bb5033cb09b2_800x600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong><a href="https://luminousmen.com/post/spark-tips-optimizing-jdbc-data-source-reads/">Set </a></strong><code>numPartitions</code><strong><a href="https://luminousmen.com/post/spark-tips-optimizing-jdbc-data-source-reads/"> for parallel reads</a>.</strong> Default JDBC reads load everything into a single partition on a single executor. Set <code>.option("numPartitions", N)</code> with the <code>partitionColumn</code>, <code>lowerBound</code>, and <code>upperBound</code> options to parallelize. This can be the difference between a 2-hour read and a 5-minute read.</p></li><li><p><strong>Use predicates for non-numeric partitioning.</strong> If your partition key isn&#8217;t a clean numeric range, pass a list of SQL WHERE clauses via the <code>predicates</code> argument of <code>spark.read.jdbc(...)</code>. One task per predicate, hand-sized ranges.
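One way to hand-build month-sized predicates for a date column. The helper name, column name, and table are made up for illustration; only the commented `spark.read.jdbc` line touches Spark:

```python
from datetime import date

def month_predicates(col: str, year: int) -> list[str]:
    """Build one WHERE clause per month; Spark runs one JDBC task per clause."""
    preds = []
    for m in range(1, 13):
        lo = date(year, m, 1)
        # December rolls over into January of the next year
        hi = date(year + (m == 12), m % 12 + 1, 1)
        preds.append(f"{col} >= '{lo}' AND {col} < '{hi}'")
    return preds

predicates = month_predicates("created_at", 2024)
# df = spark.read.jdbc(url, "orders", predicates=predicates, properties=props)
```

Keep the ranges non-overlapping and exhaustive, or you will silently duplicate or drop rows.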
Ugly but effective.</p></li><li><p><strong>Push down what you can, don&#8217;t trust the driver to do it for you.</strong> Spark pushes basic filters (column equality, IN, IS NULL) to JDBC automatically, but anything with a computed column or cast won&#8217;t push and will get evaluated in Spark after the full read. For complex predicates, write the filter explicitly in a query option instead of relying on pushdown.</p></li></ul><h2>What Spark 4.x Actually Changed</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C03A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C03A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png 424w, https://substackcdn.com/image/fetch/$s_!C03A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png 848w, https://substackcdn.com/image/fetch/$s_!C03A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!C03A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!C03A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png" width="1456" height="798" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:798,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Apache Spark&quot;,&quot;title&quot;:&quot;Apache Spark&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Apache Spark" title="Apache Spark" srcset="https://substackcdn.com/image/fetch/$s_!C03A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png 424w, https://substackcdn.com/image/fetch/$s_!C03A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png 848w, https://substackcdn.com/image/fetch/$s_!C03A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!C03A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57281b-effc-47ac-bd72-7035366d7195_2368x1298.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://www.databricks.com/blog/introducing-apache-spark-40">Databricks blog</a></figcaption></figure></div><p style="text-align: justify;">These items used to require config flags. In 4.x they&#8217;re on by default. If you&#8217;re copy-pasting old configs forward, some are now redundant and a couple are quietly doing the opposite of what you think.</p><ul><li><p><strong>[4.x] AQE is default-on. Stop toggling it.</strong> spark.sql.adaptive.enabled=true is the default since 3.2. If you&#8217;re copy-pasting configs that explicitly set it, trim them. 
What to care about instead: <code>spark.sql.adaptive.coalescePartitions.parallelismFirst</code> (default true &#8212; set it to false if you want AQE to respect the advisory partition size instead of maximizing parallelism), and the skew join thresholds if your data is unusual.</p></li><li><p><strong>[4.x] DPP is default-on, and now multi-key.</strong> Same story &#8212; stop toggling <code>spark.sql.optimizer.dynamicPartitionPruning.enabled</code>. The upgrade that matters in 4.x is <a href="https://issues.apache.org/jira/browse/SPARK-46946">SPARK-46946</a>: DPP now broadcasts multiple keys, so joins against compound-partition fact tables prune at runtime.</p></li><li><p><strong>[4.x] RocksDB is the default shuffle service DB backend</strong> (<a href="https://issues.apache.org/jira/browse/SPARK-45351">SPARK-45351</a>). If you run the external shuffle service with a database backend, this changed under you. Usually a win, but worth checking your ESS metrics after upgrade.</p></li><li><p><strong>[4.x] </strong><code>spark.shuffle.service.removeShuffle</code><strong> is default-on</strong> (<a href="https://issues.apache.org/jira/browse/SPARK-47448">SPARK-47448</a>). Shuffle data is cleaned up automatically when referenced RDDs are GC&#8217;d. Your &#8220;cluster disk fills up after long jobs&#8221; problem from 3.x is probably gone. If it isn&#8217;t, check your lineage &#8212; something is holding references.</p></li><li><p><strong>[4.x] Parallel ZSTD/LZF for shuffle compression.</strong> <a href="https://issues.apache.org/jira/browse/SPARK-46256">SPARK-46256</a> and <a href="https://issues.apache.org/jira/browse/SPARK-48518">SPARK-48518</a>. If you&#8217;re still on the default LZ4 for shuffle compression, you&#8217;re leaving CPU parallelism unused on modern multi-core executors. Set <code>spark.shuffle.compress=true</code> (default) and <code>spark.io.compression.codec=zstd</code>.</p></li><li><p><strong>Kryo vs Java serializer &#8212; still worth it.</strong> Default is still Java, which is 2-10x slower and larger on the wire.
<code>spark.serializer=org.apache.spark.serializer.KryoSerializer</code>. You pay this cost on every shuffle. Register your custom classes (<code>spark.kryo.classesToRegister</code> or <code>spark.kryo.registrator</code>) or Kryo writes the full class name alongside every unregistered object, giving back much of the size win.</p></li></ul><h2>Before You Ship to Prod</h2><ul><li><p><strong><a href="https://luminousmen.com/post/spark-history-server-and-monitoring-jobs-performance/">Actually read the Spark UI</a>.</strong> Look at the DAG, the stage timeline, the task distribution. Most performance problems are visible in the UI if you bother to look. Uneven task bars = skew. Lots of stages = unnecessary shuffles. Red bars in the stage view = spills. The SQL tab shows you the physical plan with runtime stats &#8212; this is where AQE&#8217;s decisions become visible.</p></li><li><p><strong>Monitor first, tune second.</strong> Don&#8217;t guess. Don&#8217;t pre-optimize. Run the job, look at the metrics, then adjust. Bumping executor memory to 64 GB &#8220;just in case&#8221; is overprovisioning. You&#8217;ll keep paying for it every day until someone audits the bill.</p></li><li><p><strong>Use </strong><code>.localCheckpoint()</code><strong> to break lineage.</strong> Long chains of transformations build massive execution plans. A checkpoint before a repartition-and-write breaks the plan into manageable stages and can prevent stack overflows on deeply nested DAGs. Also the only way to truncate lineage cheaply when you don&#8217;t have reliable distributed storage.</p></li><li><p><strong>Turn on Prometheus metrics.</strong> <code>spark.ui.prometheus.enabled=true</code> is default-on in 4.x (<a href="https://issues.apache.org/jira/browse/SPARK-46886">SPARK-46886</a>). Scrape the executor endpoints into whatever observability stack you have.
If you don&#8217;t have metrics on executor memory pressure and shuffle throughput, you&#8217;re tuning blind.</p></li><li><p><strong>Set a job timeout.</strong> <code>spark.task.reaper.killTimeout</code> (the reaper only runs with <code>spark.task.reaper.enabled=true</code>) and a driver-level cutoff. Without one, a runaway job will burn cluster cost over the weekend before anyone notices.</p></li></ul><h2>TL;DR</h2><p style="text-align: justify;">If you only remember five things:</p><ol><li><p><strong>Use the DataFrame API.</strong> Everything else in this list depends on it.</p></li><li><p><strong>Filter and partition-prune as early as possible.</strong> Don&#8217;t compute on rows you&#8217;re going to throw away.</p></li><li><p><strong>Find your skew before you tune anything else.</strong> One slow task in 200 is the whole problem 80% of the time.</p></li><li><p><strong>AQE, DPP, SPJ are default-on in 4.x.</strong> Know what they do so you stop double-configuring them and start noticing when they don&#8217;t fire.</p></li><li><p><strong>Read the UI.</strong> The answer is almost always in the UI if you look.</p></li></ol><div><hr></div><p style="text-align: justify;"><strong>Want this as a printable PDF?</strong> Subscribe at <a href="https://luminousmen.substack.com/welcome">luminousmen.substack.com</a> and I&#8217;ll send it &#8212; plus a new data engineering deep-dive every week.</p><div class="embedded-publication-wrap" data-attrs="{&quot;id&quot;:1936637,&quot;name&quot;:&quot;luminousmen&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JtUF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b28cd70-157c-4b06-872e-a38fe5155009_297x297.png&quot;,&quot;base_url&quot;:&quot;https://luminousmen.substack.com&quot;,&quot;hero_text&quot;:&quot;Helping robots conquer the earth and trying not to increase entropy. Deep-dives on data engineering, career, and industry rants from the big-tech trenches. 
Author of Grokking Concurrency&quot;,&quot;author_name&quot;:&quot;luminousmen&quot;,&quot;show_subscribe&quot;:true,&quot;logo_bg_color&quot;:null,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPublicationToDOMWithSubscribe"><div class="embedded-publication show-subscribe"><a class="embedded-publication-link-part" native="true" href="https://luminousmen.substack.com?utm_source=substack&amp;utm_campaign=publication_embed&amp;utm_medium=web"><img class="embedded-publication-logo" src="https://substackcdn.com/image/fetch/$s_!JtUF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b28cd70-157c-4b06-872e-a38fe5155009_297x297.png" width="56" height="56"><span class="embedded-publication-name">luminousmen</span><div class="embedded-publication-hero-text">Helping robots conquer the earth and trying not to increase entropy. Deep-dives on data engineering, career, and industry rants from the big-tech trenches. Author of Grokking Concurrency</div></a><form class="embedded-publication-subscribe" method="GET" action="https://luminousmen.substack.com/subscribe?"><input type="hidden" name="source" value="publication-embed"><input type="hidden" name="autoSubmit" value="true"><input type="email" class="email-input" name="email" placeholder="Type your email..."><input type="submit" class="button primary" value="Subscribe"></form></div></div>]]></content:encoded></item><item><title><![CDATA[Spark Caching Explained: What Really Happens Under the Hood]]></title><description><![CDATA[You called .cache(). Spark said "maybe". 
Let&#8217;s talk about what actually happens]]></description><link>https://luminousmen.substack.com/p/spark-caching-explained-what-really</link><guid isPermaLink="false">https://luminousmen.substack.com/p/spark-caching-explained-what-really</guid><dc:creator><![CDATA[luminousmen]]></dc:creator><pubDate>Tue, 03 Feb 2026 14:01:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!yEX7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yEX7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yEX7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yEX7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yEX7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yEX7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yEX7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg" width="800" height="599" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:599,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70485,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/183597683?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yEX7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yEX7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yEX7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yEX7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88140424-7de6-43d8-8757-84fc0bdfed45_800x599.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Most Spark engineers use caching without actually knowing how it works. Not because they&#8217;re lazy, but because Spark makes it look simple. Under the hood, caching is tightly coupled to how Spark handles <a href="https://luminousmen.substack.com/i/165644813/logical-vs-physical-plans">plans</a>, <a href="https://luminousmen.substack.com/p/deep-dive-into-spark-memory-management">memory</a>, <a href="https://luminousmen.substack.com/p/spark-partitions">partitions</a>, and block-level execution. 
Miss one of those details, and Spark will ignore your cache, drop it, or recompute the entire DAG from scratch without saying a word.</p><p>This post isn&#8217;t about what <code>.cache()</code> does. It&#8217;s about what Spark actually does when you call it, and all the reasons your cache might not behave the way you think.</p><h2>Recap</h2><p>Before we touch caching, let&#8217;s remind ourselves how Spark thinks. </p><p>At its core, Spark builds everything on top of RDDs (Resilient Distributed Datasets), which are immutable collections of records split across your cluster. Think of RDDs as the assembly language of Spark: everything higher-level (like DataFrames or Datasets) eventually compiles down to RDDs and their operations.</p><p>When you write a Spark job, you&#8217;re either transforming data (via transformations like <code>map</code>, <code>filter</code>, <code>join</code>) or triggering actions (<code>count</code>, <code>take</code>, <code>collect</code>). <a href="https://luminousmen.substack.com/i/165644813/transformations">Transformations</a> are lazy &#8212; they don&#8217;t do anything when you define them. Instead, Spark just builds a DAG of steps it <em>could</em> run, like a to-do list for later.
It&#8217;s only when you hit an <a href="https://luminousmen.substack.com/i/165644813/actions">action</a> that Spark wakes up and starts actually executing the DAG to produce a result.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W_2m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W_2m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png 424w, https://substackcdn.com/image/fetch/$s_!W_2m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png 848w, https://substackcdn.com/image/fetch/$s_!W_2m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png 1272w, https://substackcdn.com/image/fetch/$s_!W_2m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W_2m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png" width="1000" height="698" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W_2m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png 424w, https://substackcdn.com/image/fetch/$s_!W_2m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png 848w, https://substackcdn.com/image/fetch/$s_!W_2m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png 1272w, https://substackcdn.com/image/fetch/$s_!W_2m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3238d569-08ad-48d9-9a8f-d55979ed74cd_1000x698.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But once that action is done, Spark throws away the intermediate results. <em>Poof</em> &#8212; gone from memory. If you later run another action on the same logic &#8212; say, you call <code>.count()</code> on a filtered DataFrame and then <code>.show()</code> it &#8212; Spark will re-run the entire DAG from the original source. That might mean fetching a terabyte of data from S3, reading from Kafka, or doing expensive decompression on disk. None of this is free, and if you repeat that logic in multiple places &#8212; like in two different branches of your pipeline &#8212; you&#8217;re basically signing up to recompute everything from scratch multiple times, burning time and cluster resources.</p><p>This is where caching enters the picture. Or rather &#8212; <em>should</em>. 
But as we&#8217;ll see next, it&#8217;s not always as straightforward as dropping in <code>.cache()</code> and calling it a day.</p><h2>Caching in Spark</h2><div class="pullquote"><p><strong>Let&#8217;s get one thing out of the way first: Spark will recompute things as many times as you ask it to, unless you explicitly tell it not to.</strong></p></div><p>This might sound obvious, but it&#8217;s a common source of confusion for folks expecting Spark to &#8220;remember&#8221; results just because the logic hasn&#8217;t changed. It will not. Every time you call an action on an uncached DataFrame or RDD, Spark starts from the source and rebuilds all intermediate steps &#8212; even if you already did the exact same thing 5 seconds ago.</p><p>To prevent this waste, Spark offers two tools: <code>.cache()</code> and <code>.persist()</code>. Both let you tell Spark that you are probably going to need this data again in the future. The difference is in how and where the data gets stored.</p><p><code>.cache()</code> is the friendly one-liner. For DataFrames, it&#8217;s just shorthand for <code>.persist(StorageLevel.MEMORY_AND_DISK)</code> (the RDD version defaults to <code>MEMORY_ONLY</code>), which means Spark will try to keep the data in RAM. But if that fills up, it&#8217;ll spill evicted blocks to disk. You get some performance benefits without risking immediate OutOfMemory errors.</p><p>If you need more control, you use <code>.persist()</code> and pick a <code>StorageLevel</code>. 
There&#8217;s a whole zoo of these, and they&#8217;re worth understanding &#8212; especially when you&#8217;re tuning long-running pipelines or squeezing performance out of memory-starved clusters.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vt0B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vt0B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vt0B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vt0B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vt0B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vt0B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg" width="1456" height="546" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:335551,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/183597683?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!vt0B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vt0B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vt0B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vt0B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69cd8fe-7a45-4926-900e-c9c45463d9c9_2388x895.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>At the simple end, you&#8217;ve got <code>MEMORY_ONLY</code>, which stores raw, unserialized objects in RAM &#8212; fast, but risky. If memory runs out, Spark just evicts blocks, and you&#8217;re on your own. Then there&#8217;s <code>MEMORY_AND_DISK</code>, which gives you a safety net by backing up evicted blocks to disk. 
You can also go with the serialized variants (<code>_SER</code>), which store data as byte arrays instead of full JVM objects &#8212; reducing memory usage and garbage collection pressure, but increasing CPU cost due to (de)serialization.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mG3a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mG3a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mG3a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mG3a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mG3a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mG3a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg" width="1456" height="1052" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1052,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:328114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/183597683?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mG3a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mG3a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mG3a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mG3a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1b9e6c9-3b6a-4b0d-a4af-c2231dafc344_2064x1491.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>&#128161; You can also add a <code>_2</code> suffix to any storage level (like <code>MEMORY_AND_DISK_2</code>) to enable replication &#8212; useful for fault-tolerant workloads where you&#8217;d rather not recompute partitions from scratch if a node goes down.</p></blockquote><p>For cold-storage-style caching, there&#8217;s <code>DISK_ONLY</code>, which doesn&#8217;t even try to use memory. And if you&#8217;re dealing with native code or need to avoid the JVM heap entirely, there&#8217;s <code>OFF_HEAP</code>, which pushes data out of the garbage collector&#8217;s reach altogether.</p><blockquote><p>&#128161;DISK_ONLY and OFF_HEAP always serialize, whether you like it or not.</p></blockquote><p>Now here&#8217;s what this looks like in practice:</p><pre><code>df = spark.read.parquet(data_path)
df.cache()  # marks for caching (lazy)
df.count()  # materializes the cache
df.is_cached  # True
df.count()  # uses cached data
df.storageLevel  # StorageLevel(True, True, False, True, 1)
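# (sketch) switching levels means unpersisting first, then persisting
# with an explicit StorageLevel -- for example, disk-only:
from pyspark import StorageLevel
df.unpersist()
df.persist(StorageLevel.DISK_ONLY)  # skips memory entirely, always serialized
df.count()  # re-materializes the cache under the new level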
df.unpersist()  # manually clears it</code></pre><p><code>.cache()</code> doesn&#8217;t do anything until you trigger an action. It just marks the plan as cacheable. Only when Spark actually executes a full pass over the data &#8212; for example, during a <code>.count()</code> action &#8212; will it start caching partitions as they&#8217;re computed.</p><blockquote><p>&#128161; Once you persist a DataFrame with a given storage level, you can&#8217;t change it. You&#8217;d have to unpersist and re-persist it.</p></blockquote><p>You can do the same with temporary views, which is sometimes cleaner when working with Spark SQL-style APIs:</p><pre><code>df.createOrReplaceTempView("df")

spark.catalog.cacheTable("df")
spark.catalog.isCached("df")  # True
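# cacheTable is lazy as well -- run an action to actually materialize it
spark.table("df").count()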
spark.catalog.uncacheTable("df")
spark.catalog.clearCache()</code></pre><h4>So, does caching make things faster?</h4><p>The answer, of course, is: <em>sometimes</em>.</p><p>When used correctly, caching can be a massive win &#8212; especially in iterative jobs (like ML workflows), interactive analysis, or when you&#8217;re branching logic and reusing intermediate results. But misuse it &#8212; say, by caching everything just because it &#8220;seems safe&#8221; &#8212; and you&#8217;ll tank memory, increase garbage collector (GC) time, and possibly make things worse.</p><p>The trick isn&#8217;t just to use caching. It&#8217;s to know when it helps and what is actually being cached. Spark will happily let you believe something is cached, and then silently recompute everything anyway if you break its assumptions. We&#8217;ll get to that next.</p><h3>Caching Behavior</h3><p>So you&#8217;ve called <code>.cache()</code>, maybe even <code>.persist()</code>, and you&#8217;re expecting Spark to behave like a good little compute engine and reuse your precious data instead of recomputing it. Seems simple, right? Just stash the data and read it back later.</p><p>Well&#8230; not quite.</p><p>Let&#8217;s talk about what actually happens under the hood &#8212; starting with the thing that controls all of this: <a href="https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala">CacheManager</a>.</p><p>When you mark a DataFrame to be cached, what Spark really does is modify the logical plan of your query by wrapping it in a special logical operator called InMemoryRelation. This isn&#8217;t some background process &#8212; it&#8217;s part of Spark&#8217;s planning pipeline. Specifically, this wrapping happens <em>after</em> the analyzer phase, but <em>before</em> optimization. In other words, the cache is tied to the <strong>analyzed</strong> <strong>logical plan</strong>, not the final optimized one. 
And this is where things get tricky.</p><p>Let&#8217;s walk through a real example:</p><pre><code>df = spark.read.parquet(path)
df.select("col1", "col2").filter("col2 &gt; 0").cache()
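# (sketch) a query built in the same order re-derives the cached analyzed
# plan; its explain() output shows an InMemoryRelation scan, not a file scan
df.select("col1", "col2").filter("col2 &gt; 0").explain()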
df.filter("col2 &gt; 0").select("col1", "col2")</code></pre><p>Now, if you&#8217;ve worked with Spark optimizations before, you&#8217;re probably thinking: <em>&#8220;Hey, those two queries are semantically the same. The optimizer will push down the filter either way. The final execution plan should be identical&#8221;.</em></p><p>And you&#8217;d be right &#8212; but that doesn&#8217;t matter.</p><p>Because again: caching happens <em>before</em> optimization. The second query has a different analyzed plan than the first, even if the optimizer will eventually produce the same physical plan. Spark will look at the analyzed plan of your current query, compare it to the analyzed plan that was cached, and if they don&#8217;t match exactly &#8212; it will skip the cache and recompute everything.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ICGD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ICGD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ICGD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ICGD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ICGD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ICGD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg" width="1456" height="418" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:418,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:211152,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/183597683?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ICGD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ICGD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!ICGD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ICGD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14d9e24d-6b4f-46a4-a29c-0cdf51db4353_2394x687.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is why you can &#8220;cache something&#8221; and then later run what looks like the same query &#8212; and Spark still redoes 
the work. And unless you know this, it&#8217;s incredibly easy to shoot yourself in the foot (or other sensitive areas).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!19oc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!19oc!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif 424w, https://substackcdn.com/image/fetch/$s_!19oc!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif 848w, https://substackcdn.com/image/fetch/$s_!19oc!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif 1272w, https://substackcdn.com/image/fetch/$s_!19oc!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!19oc!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif" width="292" height="365" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:384,&quot;resizeWidth&quot;:292,&quot;bytes&quot;:6128086,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!19oc!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif 424w, https://substackcdn.com/image/fetch/$s_!19oc!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif 848w, https://substackcdn.com/image/fetch/$s_!19oc!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif 1272w, https://substackcdn.com/image/fetch/$s_!19oc!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a257786-8194-4972-b734-ad1b5333e8c5_384x480.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Block-Level Reality</h3><p>So far, we&#8217;ve been talking about caching at the level of DataFrames, queries, logical plans &#8212; the kind of stuff you see in code. But underneath all that, Spark doesn&#8217;t think in terms of &#8220;DataFrames&#8221;. It thinks in blocks. And understanding how block-level caching works will save you from some very expensive surprises.</p><p>Every RDD (and by extension, every DataFrame) is made up of partitions. Each partition maps to a <strong>block</strong>, and these blocks are the actual units of caching in Spark.</p><p>Let&#8217;s say you run a <code>.cache()</code> followed by a <code>.count()</code>. Spark will compute every partition, and as each partition is computed on a given executor, the resulting block is stored by that executor&#8217;s local <code>BlockManager</code>. 
That <code>BlockManager</code> decides where and how to store the block &#8212; in memory, on disk, serialized, off-heap &#8212; depending on the <code>StorageLevel</code> you set.</p><p>But the important part is this: <strong>caching happens locally and incrementally.</strong></p><p>There&#8217;s no centralized controller orchestrating &#8220;cache this entire dataset&#8221;. Each executor only caches the blocks <em>it computes</em>. And it only caches them <em>when they&#8217;re computed</em>. No data is prefetched or broadcasted unless you explicitly tell Spark to do so.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ynvs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ynvs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ynvs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ynvs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ynvs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ynvs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg" width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:595204,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/183597683?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ynvs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ynvs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ynvs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ynvs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b8eef9-d7da-4657-a7de-e6b4625ea476_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This has two major implications:</p><p>First, if a block is computed on Executor A, then cached, and Executor A goes down before you reuse that block &#8212; Spark will have to recompute it. Unless you used a replicated storage level like <code>MEMORY_AND_DISK_2</code>, the data&#8217;s just gone. 
This is why block replication exists &#8212; not for performance, but for fault tolerance.</p><p>Second, the only blocks that get cached are the ones Spark was forced to touch. If your action only scans a subset of the dataset &#8212; think <code>.take(10)</code>, <code>.limit(100)</code>, or any filtered scan &#8212; then only those blocks will be materialized and cached. The rest? Still lazily waiting. So unless you run an action that touches every partition (like <code>.count()</code> or a full write), you&#8217;re working with a partial cache at best.</p><p>Even worse, Spark won&#8217;t warn you about any of this. There&#8217;s no built-in visibility into how much of your DataFrame is actually cached, or how much has already been evicted. You&#8217;re flying blind unless you hook into metrics yourself or use third-party observability tools.</p><blockquote><p>&#128161; This is why experienced Spark users follow <code>.cache()</code> with a full action like <code>.count()</code> &#8212; not because they&#8217;re curious about how many rows there are, but because they want to <em>force Spark to scan all partitions</em> and trigger caching across the whole dataset.</p></blockquote><h2>Eviction</h2><p>Now, let&#8217;s talk memory. The <code>BlockManager</code> doesn&#8217;t get to use the whole heap. Spark allocates a chunk of the JVM memory for execution and storage via <code>spark.memory.fraction</code>, and then further splits that chunk between caching and temporary execution data using <code>spark.memory.storageFraction</code>. We talked about that in the previous post, so check it out.</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;15ffe0b3-8acf-40b7-b9e7-c1228da08cf7&quot;,&quot;caption&quot;:&quot;Apache Spark is a distributed computing engine. 
Its main feature is the ability to perform computations in memory.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Deep Dive into Spark Memory Management&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:29227863,&quot;name&quot;:&quot;luminousmen&quot;,&quot;bio&quot;:&quot;helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering, Machine Learning\n\n\n\n\n\n&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffead33a9-5e35-4522-b96e-c1a523419524_300x297.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-12-23T14:02:47.135Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!J1O0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://luminousmen.substack.com/p/deep-dive-into-spark-memory-management&quot;,&quot;section_name&quot;:&quot;Spark Under the Hood&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:182333466,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:41,&quot;comment_count&quot;:0,&quot;publication_id&quot;:1936637,&quot;publication_name&quot;:&quot;Blog | luminousmen&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JtUF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b28cd70-157c-4b06-872e-a38fe5155009_297x297.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>And that means one thing: you&#8217;re always 
fighting for memory.</p><p>Your cache competes directly with joins, aggregations, and shuffles, and Spark will steal from it whenever an active task needs more execution memory, happily evicting cached blocks to make room. That block you spent 5 minutes reading from S3? Gone. Evicted mid-job. No warning.</p><p>Spark uses a Least Recently Used (LRU) eviction strategy: when memory gets tight, it starts dumping the least recently used blocks to make room.</p><p>To make this worse, Spark doesn&#8217;t wait until your next action to evict things &#8212; blocks can be kicked out of memory before you ever reuse them, especially on busy clusters or long-running jobs. So just because you called <code>.cache()</code> doesn&#8217;t mean your data is still around.</p><p>In the best-case scenario, if you&#8217;re using a disk-backed storage level like <code>MEMORY_AND_DISK_SER</code>, Spark will spill those evicted blocks to disk, and pull them back in later. You&#8217;ll pay the disk I/O cost, but avoid recomputing from source.</p><p>In the worst case &#8212; say, you used <code>MEMORY_ONLY</code> &#8212; those blocks are gone for good. The next time you need them, Spark has to go all the way back to the source and rebuild the full DAG.</p><p>Here&#8217;s the kind of warning you might see when that happens:</p><pre><code>WARN org.apache.spark.sql.execution.datasources.SharedInMemoryCache: 
Evicting cached table partition metadata from memory due to size constraints 
(spark.sql.hive.filesourcePartitionFileCacheSize = 262144000 bytes).</code></pre><p>Translation: something else needed the memory more than you did, and your cached data got sacrificed.</p><p>So if you&#8217;re running a big job on a small cluster, or working with long pipelines where memory pressure fluctuates &#8212; don&#8217;t assume your cache is still intact. Spark gives you no guarantees. It&#8217;s a best-effort system. Useful? Absolutely. Predictable? Only if you know what&#8217;s happening behind the curtain.</p><h2>To Wrap It Up</h2><p>Caching is one of those Spark features that everyone uses, few people understand, and most overuse. It&#8217;s easy to reach for it &#8212; because at a glance, it feels like a no-brainer. Why <em>wouldn&#8217;t</em> you want to avoid recomputation? Why <em>wouldn&#8217;t</em> you want your pipeline to go faster?</p><p>But caching isn&#8217;t about calling <code>.cache()</code> and moving on. It&#8217;s about systems thinking &#8212; knowing what actually happens under the hood when you ask a cluster to keep data around for you. You have to know how it works. You have to test under pressure. You have to assume the cache will disappear at the worst possible time and make sure your system handles that gracefully. If caching is part of your performance strategy, make sure it&#8217;s backed by either disk or replication &#8212; and design your jobs assuming eviction <em>will</em> happen. Because sooner or later, it will.</p><h4>Additional materials</h4><ul><li><p><a href="https://amzn.to/4qh2QzT">Spark: The Definitive Guide by Bill Chambers, Matei Zaharia</a></p></li><li><p><a href="https://amzn.to/4ayam3F">Learning Spark by Jules S. 
Damji, Brooke Wenig, Tathagata Das, Denny Lee</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Deep Dive into Spark Memory Management]]></title><description><![CDATA[The real reason your Spark cluster is burning money]]></description><link>https://luminousmen.substack.com/p/deep-dive-into-spark-memory-management</link><guid isPermaLink="false">https://luminousmen.substack.com/p/deep-dive-into-spark-memory-management</guid><dc:creator><![CDATA[luminousmen]]></dc:creator><pubDate>Tue, 23 Dec 2025 14:02:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!J1O0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J1O0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J1O0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!J1O0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!J1O0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png 1272w, 
https://substackcdn.com/image/fetch/$s_!J1O0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J1O0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png" width="800" height="405" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f8265cc-822b-498b-ac77-948e175bc085_800x405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32378,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/182333466?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J1O0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!J1O0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!J1O0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png 
1272w, https://substackcdn.com/image/fetch/$s_!J1O0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f8265cc-822b-498b-ac77-948e175bc085_800x405.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Apache Spark is a distributed computing engine. 
Its main feature is the ability to perform computations in memory.</p><p>That&#8217;s what the brochure says.</p><p>And while that&#8217;s technically true &#8212; Spark can crunch terabytes in-memory across a cluster of machines &#8212; what that brochure is not saying is that the exact same job that ran fine yesterday can blow up with an <code>OutOfMemoryError</code> today, and you won&#8217;t even know what changed.</p><p>Welcome to the wild west of memory management in Spark!</p><p>The concept of memory management is quite complex. Python has its arenas and pools, C has malloc/free hell, and Java has garbage collectors and metaphysical tuning flags. Spark adds another abstraction layer on top of all that, and most engineers &#8212; myself included (back when I was young and hopeful) &#8212; treat it like magic and hope for the best.</p><p>To use Spark well and get real performance out of it, however, you need a deep understanding of its memory management model.</p><p>Hell yeah, we&#8217;re diving into Apache Spark memory management!</p><h2>High-level Overview</h2><p>The memory management system in Spark is built on layers of abstraction stacked on top of each other like a bad lasagna.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bLR_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bLR_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp 424w, 
https://substackcdn.com/image/fetch/$s_!bLR_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp 848w, https://substackcdn.com/image/fetch/$s_!bLR_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp 1272w, https://substackcdn.com/image/fetch/$s_!bLR_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bLR_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp" width="480" height="270" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:270,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:407520,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/182333466?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!bLR_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp 424w, https://substackcdn.com/image/fetch/$s_!bLR_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp 848w, https://substackcdn.com/image/fetch/$s_!bLR_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp 1272w, https://substackcdn.com/image/fetch/$s_!bLR_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa880b0-40bd-42cb-a470-0dc75b796c3f_480x270.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here&#8217;s the rough breakdown of Spark&#8217;s memory management layers that you should keep in the back of your head:</p><ul><li><p>OS level &#8212; your Linux kernel doesn&#8217;t care about Spark. It just sees processes with memory requests.</p></li><li><p>JVM level &#8212; Java heap, off-heap, metaspace, GC overhead&#8230; the classics.</p></li><li><p>Cluster manager level &#8212; YARN, Kubernetes, or Mesos enforces resource limits.</p></li><li><p>Spark application level &#8212; where things get weird.</p></li></ul><p>We&#8217;re not going to deep dive into all of them, but you need to know that these layers exist because Spark memory errors often come from the boundaries between them, not just from the application layer</p><blockquote><p>&#128161; For example: you can allocate a giant heap in Spark, and still get killed by YARN for exceeding container memory.</p></blockquote><p>Let&#8217;s rewind for a second. What actually happens when you launch a Spark app?</p><p>When you submit a Spark job, the cluster spins up two kinds of processes  &#8212; <a href="https://luminousmen.com/post/spark-anatomy-of-spark-application">Drivers and Executors</a>:</p><p>The <strong>Driver</strong> is the control tower. It creates the <code>SparkContext</code>, parses your code, splits it into stages, schedules tasks, tracks metrics, fails things, retries things &#8230; you get the idea. The Driver is just a regular JVM process (usually with way too little memory), and its memory model isn&#8217;t special &#8212; so we&#8217;ll mostly ignore it here.</p><p><strong>Executors</strong> are the actual workers. 
They run your tasks, spill your data, and eat your memory. Each executor runs as a JVM inside a container (YARN, Kubernetes, etc), and this JVM is where Spark does both storage and compute &#8212; shuffles, caches, broadcasts, UDFs, Arrow memory, native code, GC, off-heap buffers&#8230; all fighting for the same scraps of RAM.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1SGb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1SGb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1SGb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1SGb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1SGb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1SGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg" width="1456" height="1091" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1091,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:417476,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/182333466?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!1SGb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1SGb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1SGb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1SGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc60306-9921-4096-a3d6-bcaadfec87b2_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Spark tries to manage all of it with some hardcoded fractions and a few memory pools. It&#8217;s amazing that everything works at all despite that. Let&#8217;s go deeper into the Executor memory.</p><h2>Executor Container</h2><p>When you submit a Spark job in a cluster &#8212; whether it&#8217;s YARN, Kubernetes, or something weird and/or custom &#8212; Spark spins up Executor containers on worker nodes. These are the JVMs that will actually do the work: running your tasks, spilling data, caching results, and shuffling data back and forth across the network.</p><p>Now, let&#8217;s talk memory. 
Who&#8217;s in charge of giving it out?</p><p>Well, that depends on the <a href="https://luminousmen.com/post/hadoop-yarn-spark/">cluster manager</a>.</p><h3>YARN</h3><p><a href="https://luminousmen.com/post/hadoop-yarn-spark/#deep-dive-into-yarn">YARN</a> has two components that control resource allocation:</p><p>ResourceManager &#8212; the cluster-level boss. It decides how much memory you&#8217;re allowed to ask for, based on <code>yarn.scheduler.maximum-allocation-mb</code>. If you ask for more than that, you get nothing or, worse, a silent cap and unexpected crashes.</p><p>NodeManager &#8212; the per-node enforcer. It manages memory allocation on each individual node, based on <code>yarn.nodemanager.resource.memory-mb</code>, which sets the upper physical memory limit per node.</p><h3>Kubernetes</h3><p>On <a href="https://luminousmen.com/post/kubernetes-101">Kubernetes</a>, Spark uses the pod model:</p><ul><li><p>Each executor is a pod.</p></li><li><p>Pod memory is controlled via resource requests/limits, which map to container-level cgroups.</p></li><li><p>If your process (heap + off-heap + Python + native) exceeds the pod limit, Kubernetes won&#8217;t hesitate &#8212; <code>OOMKilled</code>.</p></li></ul><p>So, when Spark talks about an Executor, it&#8217;s really talking about:</p><p><strong>A single JVM process running inside a container (YARN or Kubernetes pod), allocated a fixed amount of memory.</strong></p><p>That container is carved up into three main memory areas:</p><ul><li><p>Heap memory &#8212; the JVM heap, controlled via <code>--executor-memory</code>.</p></li><li><p>Off-heap memory &#8212; for native buffers like Arrow or Tungsten, outside the JVM.</p></li><li><p>Memory overhead &#8212; for JVM internals, Python workers, and native memory.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jB0C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jB0C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jB0C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!jB0C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jB0C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jB0C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg" width="1456" height="1091" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1091,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:516884,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/182333466?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!jB0C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!jB0C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jB0C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jB0C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2a469f8-6d8f-4623-804c-7295d5f1b15a_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We&#8217;ll walk through each of these in the following sections.</p><h2>Dive Into The Heap</h2><p>This is the big one &#8212; the memory most people think of. This is the memory size specified by <code>--executor-memory</code> during submission of the Spark application or by setting <code>spark.executor.memory</code>.</p><p>Internally, this becomes your JVM heap size (<code>-Xmx</code>). Spark uses this memory to hold most of your cached data, shuffle buffers, task results, UDF outputs, etc. And yes, it&#8217;s subject to the garbage collector (GC), which means you&#8217;ll spend a lot of time debugging mysterious GC pauses if you&#8217;re not careful.</p><p>Spoiler: it&#8217;s <em>not</em> all available for your code. Spark slices and dices it further.</p><p>To understand this madness, we&#8217;ll need to look into Spark&#8217;s <code>MemoryManager</code> (source: <a href="https://github.com/apache/spark/tree/master/core/src/main/scala/org/apache/spark/memory">org/apache/spark/memory</a>). That&#8217;s the interface Spark uses to track and control heap usage.</p><p><code>UnifiedMemoryManager</code> is the default <code>MemoryManager</code> in Spark since 1.6 (<code>StaticMemoryManager</code> was removed in 3.0). <code>UnifiedMemoryManager</code> manages the executor heap in three main regions, plus a hard-reserved chunk that Spark needs for internal bookkeeping.</p><p>Let&#8217;s break this down.</p><h3>Reserved Memory</h3><p>Before Spark gives your code anything, it quietly grabs 300MB from the executor heap just for itself &#8212; logs, metrics, internal structures, and whatever else it thinks is important.</p><p>This is hardcoded (<code>RESERVED_SYSTEM_MEMORY_BYTES</code> in the source) and not configurable in production. 
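</p><p>That 300MB comes off the top before any of the fractions below apply. To put numbers on it, here&#8217;s a back-of-the-envelope sketch in plain Python (the 4GB executor size is an arbitrary example; the fractions are the defaults unpacked in the next sections):</p><pre><code>RESERVED_MB = 300  # hardcoded RESERVED_SYSTEM_MEMORY_BYTES, in megabytes

def heap_layout_mb(executor_memory_mb, fraction=0.6, storage_fraction=0.5):
    """Rough split of the executor heap under the default fractions."""
    usable = executor_memory_mb - RESERVED_MB  # what Spark actually hands out
    user = usable * (1 - fraction)             # untracked "User Memory"
    unified = usable * fraction                # shared execution + storage pool
    storage = unified * storage_fraction       # caching / broadcast side
    execution = unified - storage              # shuffle / join / sort side
    return {"usable": usable, "user": round(user, 1),
            "execution": round(execution, 1), "storage": round(storage, 1)}

print(heap_layout_mb(4096))
# {'usable': 3796, 'user': 1518.4, 'execution': 1138.8, 'storage': 1138.8}</code></pre><p>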
You can only tweak it in tests via <code>spark.testing.reservedMemory</code>, but that&#8217;s irrelevant for real workloads.</p><pre><code>Usable memory = Executor memory - 300MB</code></pre><p>Everything else &#8212; execution, storage, user memory &#8212; comes out of this usable heap.</p><h3>User Memory</h3><p>This is the wild west of Spark memory. No tracking, no boundaries, no safety rails. Spark calls it &#8220;User Memory&#8221;, but it&#8217;s really just the chunk of heap left over for:</p><ul><li><p>UDFs doing who-knows-what</p></li><li><p>Python wrappers and runtime glue code</p></li><li><p>Native ML libs like XGBoost allocating their own buffers</p></li><li><p>Arrow buffers (which Spark ignores, but GC doesn&#8217;t)</p></li></ul><p>It&#8217;s calculated as:</p><pre><code>User memory = Usable memory * (1 - spark.memory.fraction)</code></pre><p>By default, <code>spark.memory.fraction = 0.6</code> &#8212; so User Memory is 40% of usable memory.</p><p>Spark does not manage this memory. It doesn&#8217;t track what you put here. If you go overboard, you&#8217;ll get GC pressure or random OOMs. Spark won&#8217;t even warn you; it just dies.</p><h3>Execution &amp; Storage Memory</h3><p>After Spark subtracts Reserved and User Memory, the remaining 60% of usable memory becomes a shared pool for:</p><ul><li><p><strong>Execution Memory</strong> &#8212; used for shuffle, joins, aggregations, sorts.</p></li><li><p><strong>Storage Memory</strong> &#8212; used for caching, broadcasts, and unrolled RDDs.</p></li></ul><p>They&#8217;re split like this:</p><pre><code>Execution memory =
  Usable memory
  * spark.memory.fraction
  * (1 - spark.memory.storageFraction)

Storage memory =
  Usable memory
  * spark.memory.fraction
  * spark.memory.storageFraction</code></pre><p>By default:</p><ul><li><p><code>spark.memory.fraction = 0.6</code></p></li><li><p><code>spark.memory.storageFraction = 0.5</code></p></li></ul><p>Which gives you:</p><ul><li><p>Execution = 30% of total heap</p></li><li><p>Storage = 30% of total heap</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vjTY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vjTY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vjTY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vjTY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vjTY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vjTY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg" width="1456" height="1091" 
class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!vjTY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vjTY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vjTY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vjTY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad474e17-4a86-4062-a09f-4116996203b7_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Execution and Storage live in a shared memory pool. They can borrow memory from each other. This is what Spark calls &#8212;</p><h3>Dynamic Occupancy Mechanism</h3><p>With <code>UnifiedMemoryManager</code>, execution and storage don&#8217;t have a hard wall between them.
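</p><p>Before any borrowing starts, the static split is easy to sanity-check. Here&#8217;s a back-of-the-envelope sketch in plain Python (just the arithmetic from the formulas above, not a Spark API):</p>

```python
# Where a 4 GB executor heap goes under the default settings.
# Pure arithmetic mirroring the formulas above; not Spark code.
executor_memory_mb = 4 * 1024          # --executor-memory 4g
reserved_mb = 300                      # hardcoded RESERVED_SYSTEM_MEMORY_BYTES
usable_mb = executor_memory_mb - reserved_mb

memory_fraction = 0.6                  # spark.memory.fraction (default)
storage_fraction = 0.5                 # spark.memory.storageFraction (default)

user_mb = usable_mb * (1 - memory_fraction)
execution_mb = usable_mb * memory_fraction * (1 - storage_fraction)
storage_mb = usable_mb * memory_fraction * storage_fraction

print(f"usable={usable_mb}  user={user_mb:.0f}  "
      f"execution={execution_mb:.0f}  storage={storage_mb:.0f}")
# usable=3796  user=1518  execution=1139  storage=1139
```

<p>So on a &#8220;4 GB&#8221; executor, only about 1.1 GB is guaranteed to Execution before any borrowing kicks in.</p><p>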
Their sizes are elastic, but not equal.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uIwE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uIwE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uIwE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uIwE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uIwE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uIwE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg" width="1456" height="1091" 
class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!uIwE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uIwE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uIwE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uIwE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa210d5ad-b22f-4413-9cad-3aa87b1b3326_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The creators of this mechanism decided that Execution memory always has priority over Storage memory.
They had reasons to do so &#8212; the execution of the task is generally more important than the cached data, and the whole job can crash if there is an OOM in the execution.</p><p>So the rules are simple:</p><ul><li><p>If Execution needs memory, it takes it.</p></li><li><p>If Storage is using that space (cached RDDs, broadcasts), Spark starts evicting blocks.</p></li><li><p>If Execution is idle, Storage can grow into that space &#8212; until Execution comes back.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gv0Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg" width="1456" height="1091" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Gv0Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa41e2a6b-ba41-43a4-92d6-0459a613c8c5_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Internally, Spark tracks the maximum Storage usage with a threshold called <code>onHeapStorageRegionSize</code>. By default, it&#8217;s equal to the full Storage region.
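</p><p>The policy is small enough to model. Here&#8217;s a toy sketch of the borrowing rules (an illustration of the mechanism as described here, not Spark&#8217;s actual <code>UnifiedMemoryManager</code> code):</p>

```python
# Toy model of the dynamic occupancy rules; NOT Spark's implementation.
# Execution may evict Storage, but only the part of Storage that grew
# beyond its protected region. Storage may only grow into free space.
class UnifiedPool:
    def __init__(self, total, storage_region):
        self.total = total
        self.storage_region = storage_region  # Storage's protected share
        self.execution = 0.0
        self.storage = 0.0

    @property
    def free(self):
        return self.total - self.execution - self.storage

    def acquire_execution(self, amount):
        shortfall = amount - self.free
        if shortfall > 0:
            # Evict cached blocks, but never below the protected region.
            evictable = max(0.0, self.storage - self.storage_region)
            self.storage -= min(shortfall, evictable)
        granted = min(amount, self.free)
        self.execution += granted
        return granted

    def acquire_storage(self, amount):
        # Storage never evicts Execution; it only takes what's free.
        granted = min(amount, self.free)
        self.storage += granted
        return granted

pool = UnifiedPool(total=100, storage_region=50)
pool.acquire_storage(80)       # cache fills 80 of 100 units
pool.acquire_execution(60)     # shuffle evicts 30 cached units, gets 50
print(pool.execution, pool.storage)   # 50.0 50.0
```

<p>The shuffle asked for 60 but got 50: in this toy version, cached blocks inside the protected region survive. Spark&#8217;s real bookkeeping is more dynamic than this.</p><p>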
The moment Execution asks for memory though, Spark shrinks that region and pushes Storage out of the way.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qmmr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qmmr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!qmmr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!qmmr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!qmmr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qmmr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg" width="1456" height="1091" 
class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!qmmr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!qmmr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!qmmr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!qmmr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367cf3c5-4c99-4501-b5de-3e498071638c_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This leads to an important consequence. <em>Your cache is not safe.</em></p><p>If you persist something with <code>.persist(StorageLevel.MEMORY_ONLY)</code> and later run a <a href="https://luminousmen.com/post/spark-core-concepts-explained/#narrow-vs-wide-transformations">wide transformation</a> &#8212; shuffle, join, aggregation &#8212; Spark may silently evict cached blocks to feed Execution memory. And yes, you&#8217;ll pay the cost of recomputation later. We will talk about that in detail in the <a href="https://luminousmen.com/post/explaining-the-mechanics-of-spark-caching/">next blog post</a> &#8212; stay tuned.</p><blockquote><p>&#128161; AQE shifts the failure mode. Adaptive Query Execution has been on by default since Spark 3.2 (<code>spark.sql.adaptive.enabled=true</code>).
It coalesces small post-shuffle partitions and splits skewed ones at runtime &#8212; so the classic &#8220;one task OOMs on a billion-row key&#8221; used to be the failure mode, and now it&#8217;s &#8220;AQE flips a sort-merge into a broadcast join you didn&#8217;t expect, and Storage memory eats it&#8221;. If you tuned <code>spark.memory.storageFraction</code> upward in Spark 2.x to give caching more room, revisit it. AQE may have shifted the balance under your feet. Partition-side details in <a href="https://luminousmen.com/post/spark-tips-partition-tuning/">Spark Tips: Partition Tuning</a>.</p></blockquote><blockquote><p>&#128161; Everything in Spark is physically grouped and stored as <strong>blocks</strong> &#8212; serialized chunks of data. These can live:</p><ul><li><p>in memory (on-heap or off-heap),</p></li><li><p>on disk,</p></li><li><p>or on remote executors (for broadcast).</p></li></ul><p>Blocks are managed by <a href="https://luminousmen.substack.com/i/170748125/blockmanagermaster">BlockManager</a>, which acts as a distributed key-value store. BlockManager works as a local cache that runs on every node of the Spark application, i.e. driver and executors. It handles:</p><ul><li><p>Caching</p></li><li><p>Shuffle inputs/outputs</p></li><li><p>Broadcasts</p></li><li><p>Temporary files</p></li></ul><p>This block abstraction allows Spark to move, cache, and evict data efficiently. And more importantly: multiple executors can fetch the same block concurrently, boosting throughput.</p><p>But blocks aren&#8217;t small. And their memory footprint isn&#8217;t always predictable. So caching a large dataset means betting on your eviction policy not screwing you over.</p></blockquote><h2>Off-heap Memory</h2><p>Most Spark jobs live happily inside the JVM heap, running under the watchful eye of the GC. 
But Spark also has the ability to allocate memory off the heap, bypassing the JVM entirely.</p><p><em>Why would anyone want to do that?</em></p><p>Simple: <strong>GC can be the problem</strong>. Especially when you&#8217;re working with:</p><ul><li><p>Giant datasets</p></li><li><p>Millions of tiny objects</p></li><li><p>Low-latency jobs</p></li><li><p>Python UDFs backed by Arrow</p></li><li><p>Or you&#8217;re just tired of tuning G1GC with 18 flags</p></li></ul><p>GC is expensive. When you&#8217;re dealing with large datasets, the overhead of object tracking, allocation, and collection inside the JVM can kill performance. Spark&#8217;s answer is to use off-heap memory to store raw data buffers and skip the JVM&#8217;s babysitting.</p><p>Under the hood, Spark does this using <code>sun.misc.Unsafe</code>, a low-level Java API that lets you manually allocate and free memory &#8212; kind of like <code>malloc()</code> in C. It&#8217;s fast. It&#8217;s risky. It&#8217;s entirely unmanaged. Which is why it&#8217;s usually disabled unless you know what you&#8217;re doing. (On JDK 17+ this routes through <code>jdk.internal.misc.Unsafe</code>, which is why some Spark builds need <code>--add-opens</code> flags at startup).</p><p>To enable it, you&#8217;d set:</p><pre><code>spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=1g</code></pre><p>Now your executors will be running with two memory pools: the traditional JVM heap, and a chunk of raw memory Spark can directly write into &#8212; no GC involved. But don&#8217;t get too excited: YARN (or Kubernetes) doesn&#8217;t care <em>where</em> you allocate memory. If your total memory usage (heap + off-heap + overhead) exceeds the container limit, the executor gets killed. Period.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tpqH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tpqH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tpqH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg" width="1456" height="1091" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!tpqH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa92d4925-af83-4397-8140-9d0e836cec94_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But! Off-heap memory isn&#8217;t just some exotic optimization. It&#8217;s actually core to how modern Spark works, thanks to a project you might have heard of ...</p><h3>Tungsten</h3><p>Tungsten is Spark&#8217;s long-running effort to optimize physical execution with a heavy focus on memory.
The idea is simple: Spark jobs shouldn&#8217;t pay the tax of Java objects. They&#8217;re heavy, they fragment the heap, and they make GC unpredictable. So Tungsten introduced:</p><ul><li><p>Custom encoders/decoders for rows (for tight binary layout)</p></li><li><p>Explicit memory management</p></li><li><p>CPU cache-aware execution</p></li><li><p>Vectorized processing</p></li><li><p>And yes &#8212; off-heap memory</p></li></ul><p>Now, when you turn on off-heap memory, you&#8217;re telling Tungsten to go full native. It will allocate all its memory pages directly off-heap. That gives you better GC performance and tighter control over memory &#8212; but it also means you&#8217;re now responsible for not blowing up your container limits. Spark won&#8217;t warn you and GC won&#8217;t save you. Dramatic pause.</p><blockquote><p>&#128161; Even when running in on-heap mode, Tungsten still tries to minimize object allocation. It uses byte arrays and memory pages to hold data in a compact format, allocating large chunks of memory only when needed. This is why Spark&#8217;s performance jumped massively around versions 1.6&#8211;2.0 &#8212; it stopped creating millions of tiny Java objects and started acting more like a proper database engine.</p></blockquote><blockquote><p>&#128161; If you&#8217;re on Databricks, Photon (their vectorized C++ engine) lives entirely off-heap &#8212; your <code>spark.executor.memory</code> mostly does bookkeeping while the real work happens in native land. The catch: Photon&#8217;s allocator sits in <code>memoryOverhead</code>, not heap. The default 10% overhead is almost never enough &#8212; Databricks K8s runtimes default <code>spark.executor.memoryOverheadFactor</code> to 0.40 for exactly this reason. Open-source equivalent: <a href="https://gluten.apache.org/">Apache Gluten</a> (Velox/ClickHouse backends). 
Same advice: bump overhead.</p></blockquote><p>Furthermore, container managers like YARN or Kubernetes don&#8217;t know how much off-heap memory you&#8217;re using. They monitor process-level memory (<a href="https://en.wikipedia.org/wiki/Resident_set_size">RSS</a>), not JVM heap. So if you go crazy with <code>spark.memory.offHeap.size</code> and forget to increase <code>spark.executor.memoryOverhead</code>, your job will be randomly killed with a vague container memory error.</p><p>Welcome to production.</p><blockquote><p>&#128161; Quick tip: If you&#8217;re using Arrow (e.g. in Pandas UDFs), you&#8217;re already in off-heap territory, even if you never explicitly enabled it. Arrow allocates native memory behind the scenes, and if your executor&#8217;s <code>memoryOverhead</code> is too low &#8212; &#128165;.</p></blockquote><p>Off-heap memory is conceptually simpler than on-heap: there&#8217;s no user memory, no GC, and Spark only divides it into Execution and Storage, just like it does for the heap. 
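</p><p>If you do turn it on, set the knobs together. A minimal sketch, with illustrative sizes rather than recommendations:</p><pre><code>spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=2g
# give the container headroom for off-heap and other native allocations
spark.executor.memoryOverhead=2g</code></pre><p>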
In <code>UnifiedMemoryManager</code>, the same borrowing rules apply &#8212; Execution has priority, Storage gets evicted.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uz-r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uz-r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Uz-r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Uz-r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Uz-r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uz-r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg" width="1456" height="1091" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1091,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:421696,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/182333466?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Uz-r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Uz-r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Uz-r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Uz-r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ca5f94d-93e5-43a5-81e0-90404fbb0e37_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But don&#8217;t let that simplicity fool you. The second you turn it on, you&#8217;re on your own. No GC means no safety net. Which is great &#8212; until it&#8217;s not.</p><div class="pullquote"><p><strong>Hold on, hold on&#8230; you&#8217;re still here?</strong></p><p>So you think this was good?<br>Thank you &#8212; genuinely.</p><p>I&#8217;m a simple man, my favorite holiday is New Year. Since we&#8217;re around that time, here&#8217;s a small gift:</p><p>If you like this mix of deeply technical stuff, rants, and career advice for data engineers and want the paid posts too, there&#8217;s <strong>30% off a yearly plan</strong> right now: <strong><a href="https://luminousmen.substack.com/129bfd67">here</a></strong>.</p><p>I keep some work paid so I can go deeper instead of chasing clicks. 
As I said <a href="https://luminousmen.substack.com/i/171686028/whats-next">before</a>: gated knowledge is where we&#8217;re heading &#8212; I&#8217;m trying to keep the gate cheap and honest.</p><p>ho-ho-ho-ho &#127876;</p></div><h2>Overhead Memory</h2><p>Let&#8217;s talk about the part of memory no one thinks about &#8212; until their executors start randomly dying.</p><p>When you set <code>--executor-memory</code>, you&#8217;re only specifying the JVM heap size. But Spark executors also need memory for everything outside the heap &#8212; thread stacks, JIT buffers, metaspace, JNI, Python processes, native libs, and whatever else the JVM decides to hoard.</p><p>That&#8217;s what Overhead Memory is for.</p><p>By default, Spark sets this via <code>spark.executor.memoryOverhead</code>. If you don&#8217;t specify it, Spark will default to 10% of executor memory, or 384 MB, whichever is greater.</p><p>Since Spark 3.3, you can also use <code>spark.executor.memoryOverheadFactor</code> instead of an absolute size &#8212; <code>0.10</code> by default, with a higher <code>0.40</code> default for non-JVM (read: PySpark) workloads on Kubernetes. It scales with executor size, so set it once and forget it. The deprecated <code>spark.yarn.executor.memoryOverhead</code> was removed in Spark 3.0 &#8212; if you still have it lying around in old configs, delete it.</p><p>Now, you might be thinking: <em>&#8220;Fine, that&#8217;s just some buffer zone. I don&#8217;t need to touch it.&#8221; </em>Not true, lemme give you an example.</p><p>Let&#8217;s say you set <code>--executor-memory=8G</code>. Spark will default to 10% overhead, so:</p><pre><code>spark.executor.memoryOverhead = max(0.1 * 8192MB, 384MB) = 819MB</code></pre><p>Total memory requested from YARN or K8s:</p><pre><code>Executor memory + Memory overhead = 8192MB + 819MB = 9011MB</code></pre><p>Okay, not too bad.</p><p>But now, you enable off-heap memory:</p><pre><code>spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=1g</code></pre><p>Guess what? That off-heap 1GB is not included in executor memory or overhead. So now your actual memory usage is:</p><pre><code>8192 MB (heap)
+ 819 MB (default overhead)
+ 1024 MB (off-heap)
= 10,035 MB</code></pre><p>But your container still only got 9011 MB, because Spark didn&#8217;t include off-heap in the overhead by default. It will be killed by YARN or OOMKilled by the K8s kubelet. Not a Spark error. No nice logs. Just dead.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZKRF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZKRF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp 424w, https://substackcdn.com/image/fetch/$s_!ZKRF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp 848w, https://substackcdn.com/image/fetch/$s_!ZKRF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZKRF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZKRF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp" width="434" height="480" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d58d8587-bfdf-49db-b231-dc6771189142_434x480.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:434,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1103662,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/182333466?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZKRF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp 424w, https://substackcdn.com/image/fetch/$s_!ZKRF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp 848w, https://substackcdn.com/image/fetch/$s_!ZKRF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZKRF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd58d8587-bfdf-49db-b231-dc6771189142_434x480.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>PySpark makes it worse</h3><p>And if you&#8217;re using PySpark, it&#8217;s even worse.</p><p>PySpark spins up a separate Python process per task, and it uses memory outside the JVM. If you don&#8217;t explicitly allocate space for it with <code>spark.executor.pyspark.memory</code>.</p><p>...then Spark will stuff it inside the overhead memory too. 
Which means that 819MB now has to cover:</p><ul><li><p>JVM internals (metaspace, thread stacks, JIT)</p></li><li><p>Off-heap allocations</p></li><li><p>Arrow buffers</p></li><li><p>Python workers</p></li></ul><p>That won&#8217;t end well.</p><p>So here&#8217;s how Spark actually <a href="https://github.com/apache/spark/blob/02c016fe4911a18b53212b25bc25f62dd5db3a06/core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala">calculates</a> container memory:</p><pre><code>val totalMemMiB =
    executorMemoryMiB
  + memoryOverheadMiB
  + memoryOffHeapMiB
  + pysparkMemToUseMiB</code></pre><p>If you&#8217;re running on YARN, that&#8217;s the amount YARN allocates. If you&#8217;re on Kubernetes, this becomes your pod memory limit.</p><blockquote><p>&#128161; If your executors are randomly getting killed &#8212; especially with vague YARN &#8220;container killed&#8221; or K8s OOMKilled &#8212; don&#8217;t just throw more memory at it, check these first:</p><ul><li><p>Are you using PySpark, Pandas UDFs / Arrow, GPU libraries?</p></li><li><p>Did you enable off-heap memory?</p></li><li><p>Did you bump <code>spark.executor.memoryOverhead</code> accordingly?</p></li></ul><p>If you answered &#8220;yes&#8221; to either of the first two and &#8220;no&#8221; to the last &#8212; that&#8217;s probably why you&#8217;re getting OOMed.</p><p>Real clusters often use 20&#8211;25% of executor memory as overhead for PySpark workloads. Monitor first, then adjust.</p></blockquote><h2>TaskMemoryManager</h2><p>By now, we know that Spark&#8217;s <code>MemoryManager</code> slices executor memory into regions &#8212; execution, storage, user, and so on. But what happens when multiple tasks are running on the same executor and they all want a slice of execution memory?</p><p>They fight over it, and Spark plays referee using a component called <code>TaskMemoryManager</code>.</p><p>Tasks don&#8217;t talk to <code>MemoryManager</code> directly. Instead, each task works with its own <code>TaskMemoryManager</code> instance, which acts as a middleman for memory allocation &#8212; both for on-heap and off-heap memory.
It keeps track of how much memory each task has used, how much it&#8217;s allowed to request, and whether it needs to wait or fail.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5oOT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5oOT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5oOT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5oOT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5oOT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5oOT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg" width="1456" height="1091" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1091,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:459921,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/182333466?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!5oOT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5oOT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5oOT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5oOT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55103ec-ea9a-4ad3-9667-6f2e08d0128b_2732x2048.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Why does this matter?</em></p><p>Because multiple tasks in the same executor <em>share</em> the execution memory pool. And there&#8217;s no hard isolation between them. That means:</p><ul><li><p>Task A can show up early and grab a big chunk of memory.</p></li><li><p>Task B shows up a few milliseconds later and finds the fridge empty.</p></li><li><p>Task B either blocks, spills to disk, or in worst cases, just dies.</p></li></ul><h3>So how does Spark try to make this fair?</h3><p>Spark tries to be &#8220;fair-ish&#8221; with memory by enforcing soft limits. The logic looks like this: if n tasks are running concurrently on an executor, each task is allowed to allocate between <code>1/2n</code> and <code>1/n</code> of the total execution memory.</p><p>That range is fuzzy by design. 
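</p><p>The <code>1/2n</code>&#8211;<code>1/n</code> rule is easy to sanity-check with a toy model (plain Python, not Spark&#8217;s actual code &#8212; the real logic lives in <code>ExecutionMemoryPool</code>):</p><pre><code>def task_memory_bounds(pool_mib, n_tasks):
    """Soft per-task bounds on Spark's shared execution pool:
    each of n concurrent tasks is capped at pool/n and guaranteed
    at least pool/(2n) before it has to wait for memory to free up."""
    return pool_mib / (2 * n_tasks), pool_mib / n_tasks

# A 4096 MiB execution pool shared by 4 concurrent tasks:
lo, hi = task_memory_bounds(4096, 4)  # guaranteed 512 MiB, capped at 1024 MiB
</code></pre><p>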
Each task can claim at most <code>1/n</code> of the pool, and Spark guarantees it at least <code>1/2n</code>; a task that can&#8217;t get that minimum blocks until memory frees up. But <code>n</code> changes as tasks start and finish, so a task that shows up early, while few others are running, can grab far more than its eventual fair share. This bites hardest in uneven workloads or poorly partitioned data.</p><p>It&#8217;s not a strict cap either &#8212; Spark doesn&#8217;t stop a task immediately if it goes over. It just starts blocking other tasks, triggers spills, or (worst case) throws an OOM.</p><p><code>TaskMemoryManager</code> doesn&#8217;t prevent these issues &#8212; it just tries to contain the blast radius. And that&#8217;s where tuning comes in:</p><ul><li><p>You can raise <code>spark.task.cpus</code> (or lower <code>spark.executor.cores</code>) to run fewer concurrent tasks per executor.</p></li><li><p>Or allocate more <code>spark.executor.memory</code> to give the execution pool some breathing room.</p></li><li><p>Or shrink partition sizes &#8212; fewer records per task = less memory pressure.</p></li></ul><p>But if you ignore this, your &#8220;optimized&#8221; job will randomly choke on 1 out of 200 stages &#8212; and then you&#8217;ll be back here, re-reading this post with tears in your eyes.</p><h2>Conclusion</h2><p>Today, with Spark 3.x and beyond, memory is more dynamic than ever &#8212; but that doesn&#8217;t mean it&#8217;s safer. Adaptive execution, columnar formats, off-heap buffers, native UDFs &#8212; all of it still shares the same physical RAM. If you don&#8217;t plan for that, you will get burned.</p><p>If you&#8217;re tired of manually tuning configs, tools like <a href="https://spark-configuration.luminousmen.com/">Spark Configuration Optimizer</a> can give you a decent starting point. But remember: no tool will save you if you don&#8217;t understand the system it&#8217;s tuning.</p><p>Spark is not magic.
It just looks like it &#8212; right up until the OOM.</p><h3>Additional materials</h3><ul><li><p><a href="https://github.com/apache/spark">Apache Spark codebase</a></p></li><li><p><a href="https://spark.apache.org/docs/latest/">Apache Spark documentation</a></p></li><li><p><a href="https://amzn.to/4qcnf9r">Spark: The Definitive Guide by Bill Chambers, Matei Zaharia</a></p></li><li><p><a href="https://amzn.to/3KEbNVh">Learning Spark by Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee</a></p></li><li><p><a href="https://amzn.to/4jfbnkT">High Performance Spark by Holden Karau, Adi Polak, Rachel Warren</a></p></li></ul><p>P.S. I think this also is a cool infographic to see the full picture of Spark memory management:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hlgo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hlgo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png 424w, https://substackcdn.com/image/fetch/$s_!hlgo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png 848w, https://substackcdn.com/image/fetch/$s_!hlgo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hlgo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hlgo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png" width="430" height="537.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1280,&quot;width&quot;:1024,&quot;resizeWidth&quot;:430,&quot;bytes&quot;:1330766,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/182333466?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hlgo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png 424w, https://substackcdn.com/image/fetch/$s_!hlgo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png 848w, 
https://substackcdn.com/image/fetch/$s_!hlgo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!hlgo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F967fb6cf-5e29-4c07-a94b-090bbd234e02_1024x1280.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">By <a href="https://www.linkedin.com/in/albertcampillo/">Albert 
Campillo</a></figcaption></figure></div>]]></content:encoded></item><item><title><![CDATA[Spark Partitions]]></title><description><![CDATA[How partitioning shapes Spark performance, and what to do when it doesn&#8217;t]]></description><link>https://luminousmen.substack.com/p/spark-partitions</link><guid isPermaLink="false">https://luminousmen.substack.com/p/spark-partitions</guid><dc:creator><![CDATA[luminousmen]]></dc:creator><pubDate>Tue, 14 Oct 2025 13:03:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5Sj_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Sj_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Sj_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png" width="800" height="405" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Sj_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Most engineers, even experienced ones, treat partitioning as a default &#8212; something Spark &#8220;just handles&#8221;. And when it breaks, it doesn&#8217;t throw a clean error. It just gets&#8230; slow. Or starts spilling. Or OOMs, or hits the network harder than it should. 
Or, it burns 5x the compute for the same workload.</p><p><a href="https://luminousmen.com/post/data-partitioning-slice-smart-sleep-better">Partitioning</a> isn&#8217;t just a Spark concept, it&#8217;s fundamental to data engineering in general. If you&#8217;re using Spark, Flink, Dask, or even writing your own distributed system &#8212; how you split the work <em>always</em> matters.</p><p>If you&#8217;re building data pipelines in Spark and you&#8217;re not controlling your partitions, you&#8217;re not in control of your performance. Spark is.</p><p>Let&#8217;s fix that.</p><h2>A Reminder on Spark and Partitions</h2><p>Let&#8217;s rewind for a second.</p><p>You may know this already, but Apache Spark is a distributed data processing engine. It lets you run computations across a cluster in parallel, splitting up work so you can chew through terabytes in minutes &#8212; not hours. But the real power here isn&#8217;t just Spark&#8217;s distributed magic, it&#8217;s how Spark structures distributed work.</p><p>When you launch a Spark job, you&#8217;re building a <a href="https://luminousmen.com/post/spark-core-concepts-explained#directed-acyclic-graph">DAG &#8212; a </a><strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#directed-acyclic-graph">D</a></strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#directed-acyclic-graph">irected </a><strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#directed-acyclic-graph">A</a></strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#directed-acyclic-graph">cyclic </a><strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#directed-acyclic-graph">G</a></strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#directed-acyclic-graph">raph</a> of transformations. Transformations such as <code>filter()</code>, <code>join()</code>, <code>groupBy()</code>, and <code>write()</code>. 
Spark compiles this <strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#logical-vs-physical-plans">logical</a></strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#logical-vs-physical-plans"> graph into a </a><strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#logical-vs-physical-plans">physical</a></strong><a href="https://luminousmen.com/post/spark-core-concepts-explained#logical-vs-physical-plans"> plan</a>. That plan gets broken down into <strong>stages</strong>, which are then sliced into <strong>tasks</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fs53!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fs53!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fs53!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fs53!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fs53!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!fs53!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg" width="601" height="419.498" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:1000,&quot;resizeWidth&quot;:601,&quot;bytes&quot;:71233,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/175720553?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fs53!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fs53!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fs53!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fs53!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff50c5397-7d6c-41f6-8b7b-b888a1d1b1be_1000x698.jpeg 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>No matter how complex your logic is, at the bottom of the stack, Spark is just executing <strong>tasks</strong>, and each task works on <strong>one partition</strong>. Period.</p><div class="pullquote"><p>Every partition = one task. One task = one thread.</p></div><p>Once your job starts running, each partition gets assigned to an <strong>executor</strong>. Executors run tasks one at a time. So if a task gets a huge partition, it&#8217;ll hog that executor for a while. If your partitions are too tiny, your cluster gets flooded with overhead. 
If they&#8217;re too big, a few slow tasks end up dragging down the whole job &#8212; causing memory pressure, long runtimes, or worse.</p><blockquote><p><strong>&#128161; </strong>Spark partitions have nothing to do with Hive partitions. Hive partitions are about how files are laid out on disk. Spark partitions are about how data is cut up <em>in memory</em> during execution. Same word, totally different game.</p></blockquote><p>That means your partition count directly controls how much parallelism Spark can squeeze out of your cluster.</p><p>Spark redefines partitions <em>multiple times</em> throughout the job. Read from files? That&#8217;s one way of partitioning. Shuffle after a <code>.groupBy()</code>? That&#8217;s another. </p><p>So let&#8217;s break it down.</p><h2>Input partitions</h2><p>As we already know, when you initially read data into Spark, the engine quietly breaks it up into partitions. A partition is a chunk of the dataset that Spark processes as a single unit of parallelism. Each task works on one partition, and together those partitions make up the full dataset. 
This is the bottom layer of Spark&#8217;s partitioning story.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VOch!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VOch!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VOch!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VOch!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VOch!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VOch!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg" width="1456" height="1017" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:294469,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/175720553?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VOch!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VOch!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VOch!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VOch!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5dafcd3-8605-4ddb-8e63-58a1b0d8d6f5_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Now, how Spark does this depends on a mix of file format and config values. And yeah, it&#8217;s a bit of a circus.</p><p>For structured formats like Parquet, ORC, or Avro, Spark actually reads the metadata &#8212; <a href="https://luminousmen.substack.com/p/why-parquet-is-the-go-to-format-for">footers, row groups</a>, that kind of thing &#8212; and tries to slice the file in a way that makes sense. Often you&#8217;ll see one partition per row group, though Spark may merge or split depending on file sizes and configs. Makes sense, right?</p><p>For plain-text formats like CSV, JSON, or raw text, Spark partitions data based on file size and the <code>spark.sql.files.maxPartitionBytes</code> setting (128MB by default). It won&#8217;t split a row in half, but it does chunk files using that size threshold. 
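</p>

<p>To make that slicing rule concrete, here&#8217;s a rough Python model of it. This is a sketch of the bookkeeping, not actual Spark code, and it ignores the small-file bin-packing Spark also does via <code>spark.sql.files.openCostInBytes</code>:</p>

```python
# Rough model of how Spark slices splittable files into input partitions,
# driven only by spark.sql.files.maxPartitionBytes (128MB by default).
MAX_PARTITION_BYTES = 128 * 1024 * 1024

def input_partitions(file_sizes, max_bytes=MAX_PARTITION_BYTES):
    """Return (file_index, offset, length) slices, one slice per partition."""
    slices = []
    for i, size in enumerate(file_sizes):
        offset = 0
        while offset < size:
            # Never split below the threshold; the last slice gets the remainder.
            length = min(max_bytes, size - offset)
            slices.append((i, offset, length))
            offset += length
    return slices

# A single 1GB CSV file gets chopped into 8 partitions of 128MB each.
print(len(input_partitions([1024 * 1024 * 1024])))  # 8

# A hundred 2MB files each fit under the threshold: one slice per file.
print(len(input_partitions([2 * 1024 * 1024] * 100)))  # 100
```

<p>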
JSON can be tricky: if it&#8217;s multi-line JSON, each file becomes a single partition (since it&#8217;s not splittable). But line-delimited JSON is splittable.</p><p>If you&#8217;re dealing with storage backends like HDFS or S3, you get even more layers of complexity. Block size starts playing a role (<code>fs.blocksize</code>), which you can&#8217;t always control, especially if you&#8217;re on managed cloud infra. So now your partitions are dictated by whatever the storage layer felt like doing that day.</p><p>Let me throw a couple examples at you. Say you&#8217;re reading a single Parquet file &#8212; 30GB total, and it&#8217;s got 300 row groups. Spark will split that into exactly 300 partitions. Predictable.</p><p>Now imagine you&#8217;re reading a thousand tiny JSON files, each around 2MB. Spark&#8217;s going to turn that into a thousand partitions. Why? Because it won&#8217;t split those files &#8212; one file, one partition, every time. Congratulations, you just bought yourself some overhead.</p><p>And then there&#8217;s Kafka and Cassandra, which are different beasts entirely. Partitioning here follows their own logic: Kafka will give you one partition per topic-partition, Cassandra will do it based on token ranges. </p><blockquote><p><strong>&#128161; </strong>For example, say you have a topic <code>user_events</code> with 12 Kafka partitions. Cool, now you connect Spark to this topic. You get... 12 Spark tasks, one per Kafka partition. Doesn&#8217;t matter if your Spark job has 100 executors or 500 cores available. You&#8217;re reading from 12 partitions? You&#8217;re doing 12 parallel reads.</p></blockquote><p>None of that matches how you probably want to process the data in Spark.</p><p>So, how Spark handles input partitioning depends on the data source. And if you don&#8217;t control this layer, you might end up with 3 partitions for your entire dataset &#8212; or 10,000. 
Either way, don&#8217;t expect efficient CPU usage.</p><h2>Shuffle partitions</h2><p>Now let&#8217;s talk about that <em>glorious</em> moment in Spark when everything slows down, fans spin up, and your cluster suddenly turns into a toaster. That moment is called <em>the shuffle</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!omI7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!omI7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!omI7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!omI7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!omI7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!omI7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg" width="1456" height="1017" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:489168,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/175720553?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!omI7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!omI7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!omI7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!omI7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd75f09ab-9efd-4c35-9650-143144b5fe39_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Shuffles happen during wide transformations &#8212; things like <code>groupBy()</code>, <code>join()</code>, <code>distinct()</code>, or <code>orderBy()</code>. Basically, whenever Spark needs to move data across executors so that rows with the same key land on the same node. It can&#8217;t just power through &#8212; it has to reshuffle everything. 
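</p>

<p>Mechanically, the routing rule behind a shuffle is simple: each row&#8217;s target partition is <code>hash(key) % numPartitions</code>, so rows with the same key always land together. A tiny Python sketch (using <code>crc32</code> as a stand-in for the Murmur3 hash Spark actually uses) shows why a skewed key turns into one overloaded task:</p>

```python
from collections import Counter
from zlib import crc32

def shuffle_target(key, num_partitions=200):
    """Route a row the way a hash partitioner would: same key, same partition.
    crc32 is just a stand-in for Spark's Murmur3 hash."""
    return crc32(str(key).encode()) % num_partitions

# 90% of the rows share one key: classic skew.
rows = ["US"] * 90 + ["DE"] * 5 + ["FR"] * 5
sizes = Counter(shuffle_target(k) for k in rows)

# At most 3 of the 200 partitions get any data at all, and the partition
# holding "US" gets at least 90 of the 100 rows.
print(max(sizes.values()))
print(len(sizes))
```

<p>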
That&#8217;s when new partitions are born.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zgiE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zgiE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png 424w, https://substackcdn.com/image/fetch/$s_!zgiE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png 848w, https://substackcdn.com/image/fetch/$s_!zgiE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png 1272w, https://substackcdn.com/image/fetch/$s_!zgiE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zgiE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png" width="366" height="490.46963562753035" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:662,&quot;width&quot;:494,&quot;resizeWidth&quot;:366,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Understanding Wide vs. Narrow Transformations in Apache Spark: Why It  Matters for Performance&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Understanding Wide vs. Narrow Transformations in Apache Spark: Why It  Matters for Performance" title="Understanding Wide vs. Narrow Transformations in Apache Spark: Why It  Matters for Performance" srcset="https://substackcdn.com/image/fetch/$s_!zgiE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png 424w, https://substackcdn.com/image/fetch/$s_!zgiE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png 848w, https://substackcdn.com/image/fetch/$s_!zgiE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png 1272w, https://substackcdn.com/image/fetch/$s_!zgiE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe835cc13-25f7-4b54-b903-469c0e3549d1_494x662.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And guess what the default is? 200 shuffle partitions.</p><p>Why? Because someone, somewhere at Spark HQ, once set the default:</p><pre><code>spark.sql.shuffle.partitions = 200</code></pre><p>It doesn&#8217;t matter if you&#8217;re dealing with 20MB or 500GB &#8212; Spark will happily slice your post-shuffle data into exactly 200 chunks unless you tell it otherwise. Hilarious.</p><p>If your data is tiny, say a few megabytes, then those 200 partitions will each get like ten rows. Tasks become microscopic. 
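</p>

<p>Some hypothetical arithmetic makes the mismatch obvious at both ends of the scale (the 20MB and 500GB figures are just the illustrative sizes from above):</p>

```python
# Per-task workload under the default 200 shuffle partitions,
# for the two extremes mentioned above.
parts = 200  # spark.sql.shuffle.partitions default

tiny_job = 20 * 1024 * 1024   # 20MB of post-shuffle data
huge_job = 500 * 1024 ** 3    # 500GB of post-shuffle data

print(tiny_job // parts // 1024)       # 102 -> ~102KB per task: microscopic
print(huge_job // parts // 1024 ** 2)  # 2560 -> 2.5GB per task: spill territory
```

<p>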
Most of your CPUs will just be sitting there doing nothing, wasting cluster hours while you pretend it&#8217;s &#8220;distributed computing&#8221;.</p><p>On the other hand, when you&#8217;re working with hundreds of gigabytes and still only have 200 shuffle partitions, each task has to chew through gigabytes on its own. Fewer tasks run at a time, the load on each individual executor climbs, and memory errors follow. </p><p>And if a partition grows larger than the memory available to its executor, you <em>will</em> get disk spills. Spilling is about the slowest thing a Spark job can do. During a spill, Spark writes the part of the data that doesn&#8217;t fit in memory out to disk, which is what lets a job finish on any size of dataset. It won&#8217;t break your pipeline, but the extra disk I/O and garbage collection make it painfully inefficient.</p><blockquote><p>&#128161; Sure, AQE (Adaptive Query Execution) helps a bit. It can adjust the number of shuffle partitions <em>after</em> Spark starts executing &#8212; but it&#8217;s not magic. And it won&#8217;t save you if you never tuned your config in the first place. We will talk about that a bit later.</p></blockquote><h2>Output partitions</h2><p>So you&#8217;re done with all your joins and filters. Now it&#8217;s time to finally write the results somewhere. But what happens when you write? Spark always writes <strong>one file per output partition</strong>. Period.</p><p>Got 200 partitions? You&#8217;re getting 200 files.</p><p>Got 3000? Congratulations, you just wrote 3000 files.</p><p>Even if half of those files contain like, three rows &#8212; Spark doesn&#8217;t care. It&#8217;ll dump them all anyway.</p><p>You think you just saved your dataset. In reality, you might have just killed your S3 bucket. Or your BigQuery load job. 
Or your downstream Spark job that now has to read ten thousand tiny files scattered like confetti across cloud storage.</p><p>So yeah &#8212; writing is not the end of the pipeline. It&#8217;s part of the pipeline.</p><h2>How to Actually Control Them</h2><p>Spark defaults to 200 shuffle partitions. Not because that&#8217;s a good number &#8212; but because it had to pick <em>something</em>. Because Spark has no clue what your data looks like. It doesn&#8217;t know that your <code>customer_id</code> is heavily skewed. It doesn&#8217;t know your files are a mess. It doesn&#8217;t know that your filters drop 95% of the rows, or that one of your columns is 99% null and turns into a bomb when you group by it.</p><p>All it sees is your code.</p><p>You write <code>groupBy("country")</code>, and Spark nods along: &#8220;HashPartitioner. 200 partitions. Carry on.&#8221; Except 90% of your data is <code>US</code>, and now one task is hauling a boulder while the rest twiddle their thumbs. That&#8217;s not parallelism. That&#8217;s just dumb.</p><p>And that&#8217;s why you have to take control.</p><h3>Shuffle partitions</h3><p><code>spark.sql.shuffle.partitions</code> &#8212; this little config controls how many partitions Spark creates during wide stages: <code>join()</code>, <code>groupBy()</code>, aggregations. If you don&#8217;t touch it, Spark sticks with 200. Doesn&#8217;t matter if you&#8217;re joining 5MB or 5TB.</p><p>You can crank it up or down with:</p><pre><code>spark.conf.set("spark.sql.shuffle.partitions", "600")</code></pre><p>How do you pick the right number? There&#8217;s no magic formula, but if your tasks are processing 3GB each and spilling all over the place, you probably need <em>more</em> partitions. If they&#8217;re running for 2 seconds each and writing tiny files, probably <em>fewer</em>. Somewhere around 100&#8211;200MB per task tends to work well &#8212; but again, <strong>you know your data. 
Spark doesn&#8217;t.</strong></p><h3>Repartition</h3><p>This is Spark&#8217;s &#8220;start over&#8221; button. It forces a full-on shuffle and redistributes the data however you tell it to &#8212; by number of partitions, or by key. It&#8217;s a full-blown network operation, with serialization, temp files, and a whole lot of I/O. You&#8217;re throwing the data into the air and letting Spark reshuffle the deck.</p><p>It&#8217;s expensive. It&#8217;s also sometimes necessary.</p><p>For example, after a heavy filter.</p><p>Let&#8217;s say you start with a massive dataset &#8212; Spark gives you 2000 partitions. Then you apply a filter and keep 5% of the rows. Spark doesn&#8217;t care. It keeps the same 2000 partitions, now mostly empty. You end up with thousands of tiny tasks doing nothing, or thousands of micro-files dumped into S3.</p><p>Fix it:</p><pre><code>val cleaned = df.filter(...).repartition(100)</code></pre><p>Now you have real partitions again &#8212; enough to parallelize, not enough to waste resources.</p><p>Another popular example: when you&#8217;re sending data downstream &#8212; to a model trainer, a multi-node write, a batch load.</p><p>If your partitioning is uneven, downstream parallelism collapses. One node gets 10GB, others get peanuts. Spark can&#8217;t fix this for you. If you want balance, you have to force it:</p><pre><code>val prepared = df.repartition(200)</code></pre><p>In both cases, <code>repartition()</code> gives you control where Spark won&#8217;t. You&#8217;re paying for predictability &#8212; and that&#8217;s usually a fair trade.</p><h3>Coalesce</h3><p>At the other end of the scale is <code>coalesce()</code>.</p><p>Where <code>repartition()</code> is a full reshuffle, <code>coalesce()</code> is a quiet merge. It takes existing partitions and fuses them &#8212; no full shuffle, minimal data movement. It&#8217;s almost always used at the end of a pipeline, right before writing to disk.</p><p>Say Spark left you with 3000 partitions. 
You write that to S3, and congrats &#8212; now you&#8217;ve got 3000 tiny files. Great if your goal is to make downstream analysts hate you.</p><p>So you do this:</p><pre><code>df.coalesce(100).write.parquet("s3://bucket/output/")</code></pre><p>Simple. Now you&#8217;ve got 100 reasonably sized files. S3 thanks you. Presto thanks you. Your cloud bill thanks you.</p><p>But don&#8217;t push it too far. People see <code>coalesce(1)</code> and think, &#8220;Nice &#8212; one output file&#8221;. Except now you&#8217;ve killed parallelism &#8212; one executor is doing all the work. The rest are idle.</p><p>Even worse, <code>coalesce()</code> doesn&#8217;t rebalance data. It just stacks partitions on top of each other. So if your original partitions were uneven, they still are &#8212; just lumped together.</p><p>There is a trick though &#8212; the shuffle flag lives on the RDD API, not on DataFrames:</p><pre><code>df.rdd.coalesce(50, shuffle=True)</code></pre><p>This acts more like a repartition, but <em>leaning</em> toward reduction. It&#8217;s not free &#8212; you pay for the shuffle cost &#8212; but you get better distribution and fewer partitions. On a DataFrame, <code>repartition(50)</code> does the same job.</p><h3>Custom Partitioning</h3><p>Now, there&#8217;s also <code>partitionBy()</code> &#8212; but don&#8217;t confuse it with actual Spark partitions. This one is about <em>physical layout on disk</em>. It&#8217;s what decides how your data gets saved into folders.</p><p>If you write:</p><pre><code>df.write.partitionBy("region").parquet(...)</code></pre><p>Spark splits the output into folders like <code>region=US/</code>, <code>region=EU/</code>, and so on. But this has <em>nothing</em> to do with memory partitions or parallelism. If you want those files to be cleanly separated &#8212; one file per region &#8212; you still need to do:</p><pre><code>df.repartition("region").write.partitionBy("region").parquet(...)</code></pre><p>Otherwise, you&#8217;ll get a mess: dozens of tiny files per region, randomly scattered. 
Which defeats the whole point.</p><h2>What about Adaptive Query Execution?</h2><p>Since Spark 3.0, there&#8217;s been this thing called <a href="https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution">Adaptive Query Execution (AQE)</a> &#8212; a smarter layer on top of the regular SQL planner. It can change the physical plan <em>while</em> your query is running, based on what&#8217;s actually happening.</p><p>Sounds amazing, right?</p><p>It can be &#8212; up to a point. But you still have to know how it works, and when it doesn&#8217;t.</p><p>AQE can merge tiny shuffle partitions, split skewed ones, and even switch join strategies on the fly. It&#8217;s Spark learning as it goes. For open-source Spark &#8804;3.1, you need to turn it on explicitly; from 3.2 onward (and in most Databricks environments) it&#8217;s enabled by default:</p><pre><code>spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
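# Optional knobs deciding when AQE treats a join partition as "skewed"
# (assumes Spark 3.x; values shown are the documented defaults):
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")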
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64MB")</code></pre><p>Once enabled, Spark starts adapting <em>at runtime</em> &#8212; not during planning, but while the job is running.</p><p>You filter 2TB down to 50MB, and instead of blindly running 200 tasks and writing 5000 files, Spark notices and gives you 3 tasks. You group by a skewed key and Spark slices that one overloaded partition into smaller pieces so your stage doesn&#8217;t hang for an hour. It&#8217;s the first time Spark feels like it&#8217;s paying attention.</p><p>But AQE only kicks in <strong>after the first shuffle</strong>. It won&#8217;t help you with garbage input partitioning, bad file layouts, or anything you&#8217;ve explicitly told Spark to do. If your source data has wrong stats &#8212; say, compressed JSON or Kafka streams &#8212; AQE might even make things worse. And since it adjusts plans at runtime, <code>explain()</code> might show you one thing, and the Spark UI might do something else entirely.</p><p>So no, AQE isn&#8217;t magic. It&#8217;s a <em>smart assistant</em>, not a mind reader.<br>It helps polish your plan &#8212; it doesn&#8217;t design it for you. You still need good partitioning fundamentals. AQE just makes the edges smoother.</p><h2>Wrapping it up</h2><ul><li><p><strong>Input partitions</strong>: Spark guesses how to split files based on format, size, and config. Sometimes it gets it right. Often it doesn&#8217;t. Especially with JSON, Kafka, and weird file layouts. You need to <em>know</em> what&#8217;s happening under the hood.</p></li><li><p><strong>Shuffle partitions</strong>: 200 is not a sacred number, it&#8217;s a default. And like all defaults, it&#8217;s wrong 80% of the time.</p></li><li><p><strong>Output partitions</strong>: Every partition = one output file. You write 3000 partitions? You get 3000 files. 
Coalesce or repartition before writing.</p></li><li><p><strong>repartition()</strong>: Use it when your data got filtered/skewed and you want a fresh, even split. It&#8217;s expensive but sometimes necessary.</p></li><li><p><strong>coalesce()</strong>: Use it when you want <em>fewer</em> files, and you trust your data is already nicely distributed. Use it before writes. Don&#8217;t overdo it.</p></li><li><p><strong>partitionBy()</strong>: Controls how data is <em>written to disk</em>. Not how it&#8217;s processed in memory. Don&#8217;t confuse the two.</p></li><li><p><strong>AQE</strong>: It&#8217;s smart. Smarter than the old planner. But it&#8217;s not a mind reader. Think of it like auto-correct &#8212; helpful, but you still need to know how to spell.</p></li></ul><h3>Materials</h3><ul><li><p><a href="https://spark.apache.org/docs/latest/index.html">Apache Spark docs</a></p></li><li><p><a href="https://amzn.to/3YVgKw3">Spark: The Definitive Guide by Bill Chambers, Matei Zaharia</a></p></li><li><p><a href="https://amzn.to/3YD2Pu8">Learning Spark by Jules S. 
Damji, Brooke Wenig, Tathagata Das, Denny Lee</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Anatomy of Apache Spark Application]]></title><description><![CDATA[Apache Spark job autopsy]]></description><link>https://luminousmen.substack.com/p/anatomy-of-apache-spark-application</link><guid isPermaLink="false">https://luminousmen.substack.com/p/anatomy-of-apache-spark-application</guid><dc:creator><![CDATA[luminousmen]]></dc:creator><pubDate>Tue, 12 Aug 2025 13:03:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5Sj_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Sj_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Sj_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png" width="800" height="405" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32378,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/170748125?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Sj_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 
1272w, https://substackcdn.com/image/fetch/$s_!5Sj_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa94bef8-db9d-4024-8d7c-cc3f9c37a559_800x405.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Where are we?</h2><p>Let's pause and take stock of where we are in this series.</p><p><a href="https://luminousmen.substack.com/p/cluster-managers-for-apache-spark">In the first post</a> we talked about the <em>playground</em> Spark runs on &#8212; clusters, resource managers - all that infrastructure you need before a single line of code moves 
data.</p><p><a href="https://luminousmen.substack.com/p/apache-spark-core-concepts-explained">In the second post</a> we talked about how Spark <em>thinks</em> &#8212; its core abstractions like <a href="https://luminousmen.com/post/spark-core-concepts-explained/#rdd--the-core-of-spark">RDDs (Resilient Distributed Datasets)</a>, DataFrames, transformations, and actions. Basically, the theory.</p><p>Now we finally get to the fun part: what actually happens when your job runs, the <em>Spark job's life cycle</em>. This is where all that abstract stuff suddenly becomes very real. Tasks start flying, executors spin up, and data shuffles across the cluster.</p><h2>Spark Application Architecture Essentials</h2><p>If you strip Spark down to its moving parts, it's basically a distributed to-do list with a coordinator. Your code says, "I want to crunch this giant pile of data", and Spark figures out who does <em>what</em>, <em>where</em>, and <em>when</em>.</p><p>If you've ever stared at the Spark UI wondering why stage 2 has 400 tasks and stage 3 has 12, these are the components making that happen:</p><ul><li><p><strong>Spark Driver</strong> &#8212; the brain of the operation. It turns your nice, innocent <code>map</code> and <code>reduceByKey</code> calls into a DAG (<a href="https://luminousmen.com/post/spark-core-concepts-explained/#directed-acyclic-graph">Directed Acyclic Graph</a>, if you don't recall) of tasks, sends them to the cluster, and keeps tabs on who's still alive.</p></li><li><p><strong>Executors</strong> &#8212; the workers. They're JVM processes living on the cluster nodes, running tasks in parallel, caching RDDs, and shuffling data around. Spark scales because you can have hundreds of these chewing through partitions at the same time.</p></li><li><p><strong>Cluster Manager / Resource Manager</strong> &#8212; the landlord that owns the machines (the land of the cluster). 
Popular ones are <a href="https://luminousmen.com/post/hadoop-yarn-spark/">YARN, Kubernetes, or Spark's standalone manager</a>.</p></li><li><p><strong>Spark Context</strong> &#8212; the entry point to all of Spark's features. It's the Spark Driver's (see below) hotline to the cluster, letting it request executors, create RDDs, and manage shared variables. You only get one per JVM.</p></li></ul><p>Each of these pieces is part of the pipeline that turns your "just run <code>df.groupBy().count()</code>" code into a swarm of distributed tasks.</p><p>Let's dive deeper into these components to understand their roles.</p><h2>Spark Driver</h2><p>The Driver is where a Spark application truly begins. Once the Driver spins up and takes charge, everything that happens in the cluster is dictated by this single process.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HdH1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HdH1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HdH1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HdH1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!HdH1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HdH1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg" width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:430469,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/170748125?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HdH1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HdH1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!HdH1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HdH1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7bdb0c1-69a8-4a66-88a5-c3afc6f07cfe_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the hierarchy of a Spark application (application &#8594; job &#8594; stage &#8594; task), the Driver is at the top &#8212; it 
manages the entire process from the top down in real time, orchestrating how work flows down from jobs to individual tasks on executors. Everything else in Spark exists to serve the plan that the Driver creates.</p><p>As your code runs and you define transformations, the Driver incrementally assembles them into a <strong><a href="https://luminousmen.com/post/spark-core-concepts-explained/#logical-vs-physical-plans">logical DAG</a></strong> of operations. Spark is lazy &#8212; these transformations only describe how to compute the data; they don't execute it. No actual computation happens until you call an action like <code>collect</code> or <code>count</code>. At that point, the Driver turns the logical DAG into a <strong>physical execution plan</strong> and starts running tasks on executors.</p><p>Execution isn't a single step. The Driver cuts the DAG into <strong>stages</strong> wherever a wide dependency appears. Each stage is then broken into <strong>tasks</strong>, one per data partition. These tasks are the actual work units that will eventually land on executors somewhere in the cluster.</p><p>Once the tasks exist, the Driver takes on the role of a scheduler. It talks to the Cluster Resource Manager to request resources, hands batches of tasks to available executors, and keeps a close eye on their progress. If an executor dies, the Driver knows which tasks were running there and can resubmit them elsewhere. This constant awareness is what allows Spark to tolerate node failures without losing your entire job.</p><p>At the same time, the <a href="https://luminousmen.com/post/spark-history-server-and-monitoring-jobs-performance/">Driver is the historian of the application</a>. 
It keeps lineage information for RDDs and partitions, maintains the logical-to-physical mapping of operations, and exposes this through the Spark UI.</p><blockquote><p>&#128161; If you've ever stared at a "Jobs" tab watching stages light up in green or red, that's the Driver narrating the story of your computation in real time.</p></blockquote><p>Everything in your application &#8212; jobs, stages, and tasks &#8212; flows downward from the Driver. It is the single point from which high&#8209;level logic becomes executable reality, quietly controlling the chaos of a distributed system.</p><p>In material terms, the Driver is just a JVM process. Where the Driver lives depends on how you launch the job. <em>In client mode</em>, it stays on your laptop or submission machine, orchestrating everything remotely. <em>In cluster mode</em>, it moves into the cluster itself, running on a dedicated node. In both cases, its existence is non&#8209;negotiable &#8212; kill the Driver, and the application is gone. There's no Spark without the Driver.</p><p>How can the Driver do so much? The Driver itself consists of several managers. Let's describe them one by one.</p><h3>DAGScheduler</h3><p>If the Driver is the brain, the <strong>DAGScheduler</strong> is the part that thinks in big pictures. It doesn't care about CPUs, threads, or executors. Instead, it stares at your logical plan &#8212; the DAG of RDD transformations &#8212; and figures out how to break it into stages.</p><p>One stage can run as long as its input partitions are available locally. The DAGScheduler's job is to translate logical operations into <strong><a href="https://luminousmen.com/post/spark-core-concepts-explained/#dag-construction">stage boundaries</a></strong>. It looks at your <strong>RDD lineage</strong>, identifies wide dependencies like <code>groupByKey</code> or <code>join</code>, and decides where data shuffles must occur. 
Every shuffle creates a stage boundary because data has to move between nodes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_r9F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_r9F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_r9F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_r9F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_r9F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_r9F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg" width="1456" height="1017" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/edfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:297864,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/170748125?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_r9F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_r9F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_r9F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_r9F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedfbcd26-f2f2-4a85-b512-549bf2fca8b9_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>From there, the DAGScheduler breaks each stage into a set of tasks (literally called a <strong>TaskSet</strong>), one task per partition, and sends them to the <strong>TaskScheduler</strong>. But it doesn't just generate tasks &#8212; it also tracks their lineage. This lineage allows Spark to recompute lost partitions in case of failure instead of starting your job from scratch.</p><p>The DAGScheduler is also the one that decides job success or failure. If a stage can't make progress after repeated attempts &#8212; say, due to corrupted input data &#8212; it's the DAGScheduler that ultimately declares the job to have failed.</p><p>It is important to note that the DAGScheduler makes <em>dynamic</em> decisions. It doesn't pre-schedule all stages at once. 
Instead, it reacts to stage completions and unlocks child stages on-the-fly.</p><h3>TaskScheduler</h3><p>Once the DAGScheduler has carved the DAG into stages and handed off a TaskSet for the next stage, the <strong>TaskScheduler</strong> takes over. This is the part of Spark that stops thinking in lineage and starts thinking in terms of machines and slots.</p><p>The TaskScheduler doesn't care about RDDs or shuffles. Its job is mechanical:</p><ul><li><p>Take each TaskSet from the DAGScheduler</p></li><li><p>Work with the SchedulerBackend to request executors and figure out where tasks can run</p></li><li><p>Assign tasks to executors, trying to keep them close to their data (see below) whenever possible</p></li><li><p>Retry failed tasks when necessary</p></li></ul><p>If an executor dies or a task throws an exception, the TaskScheduler resubmits the task to another executor, respecting Spark's task locality preferences &#8212; keeping tasks close to where the data resides whenever possible.</p><p>This handoff &#8212; DAGScheduler to TaskScheduler &#8212; is what converts a <em>logical</em> plan into a <em>physical</em> reality running across the cluster. Where the DAGScheduler thinks in stages, the TaskScheduler works in terms of executors and available CPU cores &#8212; the slots where tasks will run. It speaks the language of threads, resources, and task placement.</p><blockquote><p>&#128161; A slot is not a class, API, or config setting. It's just a term for a <strong>single task execution thread</strong> on an executor. If an executor has 4 cores, it has 4 slots &#8212; meaning it can run 4 tasks in parallel. Slots are Spark's way of describing how many tasks can run concurrently across the cluster. More cores = more slots = more parallelism. It's that simple.</p></blockquote><p>All that being said, the TaskScheduler itself does not launch tasks.
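</p><p>The slot arithmetic from the note above is easy to sketch in plain Python. This is just illustrative bookkeeping &#8212; the executor and core counts are made up, and none of it is Spark API:</p>

```python
# Toy slot accounting: a "slot" is one task-execution thread on an executor,
# so cluster-wide parallelism is simply executors * cores per executor.
def total_slots(num_executors: int, cores_per_executor: int) -> int:
    return num_executors * cores_per_executor

def scheduling_waves(num_tasks: int, slots: int) -> int:
    """How many 'waves' a TaskSet needs if every slot stays busy."""
    return -(-num_tasks // slots)  # ceiling division

slots = total_slots(num_executors=5, cores_per_executor=4)
print(slots)                         # 20 -> up to 20 tasks run in parallel
print(scheduling_waves(200, slots))  # a 200-task stage finishes in 10 waves
```
<p>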
It hands them to the SchedulerBackend.</p><h3>SchedulerBackend</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hOGy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hOGy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hOGy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hOGy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hOGy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hOGy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg" width="1456" height="558" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:489231,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/170748125?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hOGy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hOGy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hOGy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hOGy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4282fc-4c2d-4d56-b63e-3c09ab684dfc_4350x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Underneath the TaskScheduler lives the <strong>SchedulerBackend</strong>. The SchedulerBackend is the messenger between Spark and the cluster manager. 
It doesn't understand DAGs or stages &#8212; it just deals in resource requests and executor lifecycle management.</p><p>Its responsibilities include:</p><ul><li><p>Talking to the cluster manager to request new executors</p></li><li><p>Launching and killing executors as needed</p></li><li><p>Reporting executor availability back to the TaskScheduler</p></li></ul><p>When you see Spark dynamically scale executors up and down during a job, that's the SchedulerBackend negotiating with the cluster manager behind the scenes.</p><h3>BlockManagerMaster</h3><p>While the schedulers above focus on <em>when</em> and <em>where</em> tasks run, the <strong>BlockManager</strong> is all about <em>the data those tasks need.</em></p><p>Spark processes everything as partitions, and each partition becomes a <strong>block</strong> stored either in memory or on disk. And every RDD partition, every shuffle file, every broadcast variable goes through the BlockManager. When an executor needs a block of data, the BlockManager either hands it over locally or fetches it from a neighbor.</p><blockquote><p>&#128161; You might be wondering, <em>are we talking about a shuffle operation here?</em> Mostly, yes &#8212; this is what happens during a shuffle read. When a wide transformation like <code>reduceByKey</code> runs, tasks often need to fetch intermediate data (shuffle blocks) produced by other tasks on different executors. The BlockManager handles these lookups and transfers. But we're not just talking about shuffle here &#8212; the same mechanism is also used when tasks fetch cached RDDs or broadcast variables from remote nodes. If data isn't local, the BlockManager knows where to find it and how to pull it across the network.</p></blockquote><p>Every executor has its own BlockManager, but the Driver hosts the <strong>BlockManagerMaster</strong>, which keeps a global map of all block locations across the cluster.
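</p><p>That global map can be pictured as a tiny registry &#8212; a toy model for intuition only, not Spark's actual classes or API, with made-up block and executor names:</p>

```python
# Toy model of the BlockManagerMaster's global view: block id -> the set of
# executors holding a copy of that block.
class BlockRegistry:
    def __init__(self):
        self.locations = {}  # block_id -> set of executor ids

    def register(self, block_id, executor_id):
        self.locations.setdefault(block_id, set()).add(executor_id)

    def resolve(self, block_id, local_executor):
        holders = self.locations.get(block_id, set())
        if local_executor in holders:
            return ("local", local_executor)       # serve from local memory/disk
        if holders:
            return ("remote", sorted(holders)[0])  # fetch over the network
        return ("recompute", None)                 # block lost: replay lineage

registry = BlockRegistry()
registry.register("rdd_0_3", "executor-1")
print(registry.resolve("rdd_0_3", "executor-1"))  # ('local', 'executor-1')
print(registry.resolve("rdd_0_3", "executor-2"))  # ('remote', 'executor-1')
print(registry.resolve("rdd_0_9", "executor-2"))  # ('recompute', None)
```
<p>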
Whenever a task needs a block, the BlockManager can serve it directly if it's local, or fetch it from a remote executor if necessary. This is the mechanism that makes shuffles, caching, and recomputation possible.</p><p>The BlockManager handles:</p><ul><li><p>Storage and retrieval of blocks in memory or on disk</p></li><li><p>Replication for fault tolerance, so losing one executor doesn't mean losing the only copy of a partition</p></li><li><p>Data exchange between executors during shuffles or when <a href="https://luminousmen.com/post/explaining-the-mechanics-of-spark-caching/">fetching cached partitions</a></p></li></ul><p>It also powers Spark's <strong>caching and persistence</strong>. When you call <code>cache</code> or <code>persist</code>, the BlockManager decides whether the block stays in memory, spills to disk, or is evicted to make room for new data. This constant juggling is what keeps future stages fast without running out of memory.</p><p>If a BlockManager disappears, Spark may need to recompute lost partitions using RDD lineage or fetch replicas from surviving executors. Without this system, Spark would have no awareness of where its data lives, and every shuffle or cached computation would fall apart. Sadly.</p><div><hr></div><p>Together, these four components turn the Driver from a passive coordinator into a full-blown distributed engine. And when they all do their jobs well, you never have to think about them &#8212; you just see your Spark job finish.</p><h3>Spark Context</h3><p>The <strong>Spark Context</strong> serves as the primary entry point for all Spark operations, acting as a <em>bridge</em> between the Spark Driver and the cluster's resources. Through the Spark Context, the Driver communicates with the <strong>Cluster Resource Manager</strong>, requests executors, and coordinates the execution of distributed computations.
It's also the mechanism for creating RDDs, managing shared variables, and tracking the status of executors via regular heartbeat messages.</p><p>Every Spark application operates with its own dedicated Spark Context, instantiated by the Driver when the application is submitted. This context remains active throughout the application's lifecycle, serving as the glue that holds together the distributed components. Once the application completes, the context is terminated, releasing the associated resources.</p><blockquote><p>&#128161; A critical limitation of Spark Context is that only <em>one active context is allowed per JVM</em>. If you need to initialize a new Spark Context, you must explicitly call <code>stop</code> on the existing one. This keeps a single resource-management entity per application and prevents conflicting contexts within one JVM.</p></blockquote><p>The modern way to start a Spark application is through a <code>SparkSession</code>, which wraps around the lower-level <code>SparkContext</code>. While the <code>SparkSession</code> gives you access to DataFrames, SQL, and config management in a friendly API, it's the <code>SparkContext</code> underneath that actually drives the core engine &#8212; requesting executors, creating RDDs, and coordinating cluster resources. You can think of <code>SparkSession</code> as the front desk, and <code>SparkContext</code> as the operations center.</p><h3>Cluster Resource Manager</h3><p>In a distributed Spark application, the <strong>Cluster Resource Manager</strong> (or Resource Manager for short) is the component responsible for allocating compute resources across the cluster. Depending on your deployment, this could be YARN, <a href="https://luminousmen.com/post/kubernetes-101">Kubernetes</a>, or Spark's own standalone cluster manager.</p><p>When a Spark application starts, the <strong>Driver</strong> requests executor resources via the <strong>SchedulerBackend</strong>, which then communicates with the Cluster Resource Manager.
Based on available capacity and scheduling policies, the cluster manager launches executor processes on worker nodes &#8212; as containers, pods, or raw processes, depending on the setup.</p><p>Once executors are up, they register back with the Driver, and the Driver begins assigning tasks to them. From that point on, the cluster manager's job is mostly done &#8212; it keeps the executors running and watches for node failures, while Spark handles the actual data processing.</p><p>The SparkContext is the Driver's interface to the cluster manager. It abstracts the differences between backends and lets your application scale across environments without changing your code.</p><h3>Executors</h3><p><strong>Executors</strong> are the backbone of a Spark application, running on worker nodes to execute tasks assigned by the Driver. These processes handle the actual data processing and storage.</p><p>By default, executors are allocated statically, meaning their number remains fixed for the duration of the application. However, Spark also supports <em>dynamic allocation</em>, where executors can be added or removed to adapt to workload changes. While this flexibility can optimize resource utilization, it can also introduce <a href="https://en.wikipedia.org/wiki/Resource_contention">contention</a> with other applications sharing the same cluster.</p><p>In addition to task execution, executors manage data storage. They use the BlockManager to store intermediate RDD data, which can be cached <em>in memory</em> for quick access or spilled to <em>disk</em> when memory is insufficient (depending on the chosen storage level, e.g. <code>MEMORY_AND_DISK</code>).</p><h2>Spark Application Running Steps</h2>
      <p>
          <a href="https://luminousmen.substack.com/p/anatomy-of-apache-spark-application">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Apache Spark Core Concepts Explained]]></title><description><![CDATA[Let's deep dive into Apache Spark core abstractions]]></description><link>https://luminousmen.substack.com/p/apache-spark-core-concepts-explained</link><guid isPermaLink="false">https://luminousmen.substack.com/p/apache-spark-core-concepts-explained</guid><dc:creator><![CDATA[luminousmen]]></dc:creator><pubDate>Tue, 22 Jul 2025 13:03:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TrJw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TrJw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TrJw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!TrJw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!TrJw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png 1272w, 
https://substackcdn.com/image/fetch/$s_!TrJw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TrJw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png" width="800" height="405" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32378,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/165644813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TrJw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!TrJw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!TrJw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png 
1272w, https://substackcdn.com/image/fetch/$s_!TrJw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e4581d9-0eb4-4cbc-859b-782cc8bae696_800x405.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you've spent any time wrangling data at scale, you've probably heard of Apache Spark. Maybe you've even cursed at it once or twice &#8212; don't worry, you're in good company.</p><p>Spark has become the go-to framework for big data processing, and for good reason: it's fast, versatile, and (once you get the hang of it) surprisingly elegant. 
But mastering it? That's a whole other story. Spark is packed with features, and its architecture feels simple on the surface but quickly grows complex. If you've ever struggled with long shuffles, weird partitioning issues, or mysterious memory errors, you know exactly what I mean.</p><p>This article is the second in a series on Apache Spark, created to help you get past the basics and into the real nuts and bolts of how it works &#8212; and how to make it work for you.</p><h2>Apache Spark Key Abstractions</h2><p>At its core, Spark is built on two fundamental abstractions:</p><ol><li><p><strong>Resilient Distributed Dataset (RDD)</strong></p></li><li><p><strong>Directed Acyclic Graph (DAG)</strong></p></li></ol><p>These are the gears under the hood of everything Spark does. Let's unpack each one.</p><h3>RDD &#8212; The Core of Spark</h3><p>The beating heart of Apache Spark is the <strong>Resilient Distributed Dataset (RDD)</strong> &#8212; a collection of objects spread across cluster nodes for parallel processing. RDDs were Spark's original abstraction and still remain central to its design, even as DataFrames and Datasets have taken center stage.</p><blockquote><p>&#128161; While RDDs are still there under the hood &#8212; especially when it comes to low-level fault-tolerant mechanics &#8212; you shouldn't write modern Spark code directly with RDDs. Unless you're doing something super low-level (like custom serialization or working with graphs), <a href="https://luminousmen.com/post/spark-core-concepts-explained/#beyond-rdds-dataframes-and-datasets">stick with DataFrames or Datasets</a>. New releases of Apache Spark push even harder on DataFrames and Datasets.</p></blockquote><p>Physically, an RDD lives as an object in the JVM, pointing to data from external sources (HDFS, S3, Cassandra, etc.). Every RDD carries its own metadata to enable fault tolerance and distributed execution.
Key components include:</p><ul><li><p><strong>Partitions</strong> &#8212; chunks of data distributed across the cluster nodes. One partition = one unit of parallelism.</p></li><li><p><strong>Dependencies</strong> &#8212; lineage information, a list of parent RDDs and transformation history, forming a <strong>lineage graph</strong>. This lets Spark recompute lost data.</p></li><li><p><strong>Computation</strong> &#8212; the function used to compute this RDD's partitions from its parents.</p></li><li><p><strong>Preferred Locations</strong> &#8212; hints about where partitions are stored, enabling data-local execution.</p></li><li><p><strong>Partitioner</strong> &#8212; defines how data is split into partitions (e.g. the default <code>HashPartitioner</code> or <code>RangePartitioner</code>).</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3mwN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3mwN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3mwN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3mwN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg 1272w,
https://substackcdn.com/image/fetch/$s_!3mwN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3mwN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg" width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:424946,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/165644813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3mwN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3mwN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!3mwN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3mwN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75126615-ca86-4d3b-918d-8dec08ff013a_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>RDDs are <strong><a href="https://luminousmen.com/post/why-apache-spark-rdd-is-immutable/">immutable</a></strong> &#8212; you can 
never change an RDD in place; you simply derive a new one from an existing RDD using a transformation. That new RDD contains a pointer to its parent, and Spark keeps track of all the dependencies between RDDs via the lineage graph. In case of data loss, Spark replays the lineage to regenerate the data from the original source, making pipelines fault-tolerant.</p><p>Here's a quick example:</p><pre><code><code>&gt;&gt;&gt; from pyspark import SparkContext
&gt;&gt;&gt; sc = SparkContext.getOrCreate() # sc is the SparkContext, the entry point to Spark
&gt;&gt;&gt; rdd = sc.parallelize(range(20)) # Create RDD with numbers from 0 to 19
&gt;&gt;&gt; rdd
ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:195
&gt;&gt;&gt; rdd.collect() # Collects all data from nodes and returns it to the driver node
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]</code></code></pre><blockquote><p>&#128161; The Spark Driver orchestrates Spark jobs, manages tasks, and collects results from executors (like with <code>collect()</code>). For a deeper dive, check out <a href="https://luminousmen.com/post/spark-anatomy-of-spark-application#spark-driver">the next blog post</a>.</p></blockquote><p>RDDs can be <strong><a href="https://luminousmen.com/post/explaining-the-mechanics-of-spark-caching/">cached</a></strong> to optimize repeated computations. This is huge when your pipeline hits the same RDD multiple times. Without caching, Spark must traverse the lineage graph to recompute RDDs from scratch every time they're used in a computation.</p><p>You can <a href="https://luminousmen.com/post/spark-partitions/">control partitioning</a> manually to balance workloads or rely on Spark's defaults (typically based on cluster configuration). More (and smaller) partitions often mean better resource utilization (though there's a point of diminishing returns).</p><p>Let's inspect the number of partitions and their contents:</p><pre><code><code>&gt;&gt;&gt; rdd.getNumPartitions() # Returns the number of partitions (default behavior depends on cluster settings)
4
&gt;&gt;&gt; rdd.glom().collect() # Groups each partition's data into a list, then collects those lists to the driver node
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]</code></code></pre><p>All operations on RDDs fall into two categories:</p><ul><li><p><strong>Transformations</strong> (lazy)</p></li><li><p><strong>Actions</strong> (eager)</p></li></ul><h3>Transformations</h3><p>Transformations are the backbone of RDD processing in Spark. They're how you take a dataset and do something meaningful with it &#8212; filter it, map over it, join it, whatever. But there's a catch: <strong>transformations are lazy</strong>.</p><p>When you call a transformation (e.g. <code>map</code> or <code>filter</code>), Spark doesn't immediately execute it. It just takes notes &#8212; <em>"Okay, when someone asks for the result, I'll know what to do"</em>. And actual execution only kicks in when you trigger an <strong>action</strong> like <code>collect</code> or <code>count</code>. This approach &#8212; known as <strong>lazy evaluation</strong> &#8212; lets Spark analyze your entire workflow, optimize execution plans, and minimize data movement before doing any heavy lifting.</p><blockquote><p>&#9888;&#65039; Side effects in transformations? Bad idea. Since transformations are lazy, they might get evaluated multiple times or in weird orders. If you're relying on side effects (like writing to a file in <code>map()</code>), you're setting yourself up for trouble. Just don't.</p></blockquote><p>But not all transformations are created equal. 
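</p><p>The laziness described above is easy to picture with a plain-Python analogy &#8212; generators also defer work until something consumes them. A toy comparison for intuition only, not Spark's machinery:</p>

```python
# Generator analogy for lazy transformations: building the pipeline runs
# nothing; only the terminal "action" (list) forces evaluation.
evaluated = []

def lazy_double(xs):
    for x in xs:
        evaluated.append(x)  # side effect so we can see when work happens
        yield x * 2

pipeline = lazy_double(range(5))  # "transformation": no work done yet
print(evaluated)                  # [] -- nothing has run
result = list(pipeline)           # "action": forces the whole pipeline
print(result)                     # [0, 2, 4, 6, 8]
print(evaluated)                  # [0, 1, 2, 3, 4]
```
<p>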
Spark draws a very important line in the sand.</p><h3>Narrow vs Wide Transformations</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GqbK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GqbK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GqbK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GqbK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GqbK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GqbK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg" width="1456" height="1017" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:523474,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/165644813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GqbK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GqbK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GqbK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GqbK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2e3b63d-99ef-4ca7-ba92-2e914479307c_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Narrow transformations</strong> (e.g. <code>map</code>, <code>filter</code>, <code>flatMap</code>) don't require moving data between partitions. Each partition works with its own local data, independently, making these operations fast and highly parallelizable (aka <em>embarrassingly parallel</em>).</p></li><li><p><strong>Wide transformations</strong> (e.g. <code>groupByKey</code>, <code>reduceByKey</code>, <code>join</code>) require <strong>shuffling</strong> &#8212; a fancy way of saying Spark needs to move data between executors to regroup things to meet the transformation's requirements. 
These are slow, expensive, and should be treated with suspicion unless absolutely necessary.</p></li></ul><blockquote><p>&#128161;<strong>Note on shuffling (aka data shuffle)</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VFNi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VFNi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VFNi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VFNi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VFNi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VFNi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg" width="510" height="356.2293956043956" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:510,&quot;bytes&quot;:237561,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/165644813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VFNi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VFNi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VFNi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VFNi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18ed473f-55a5-4766-8b56-e527d69935f5_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></blockquote><blockquote><p>A shuffle happens when Spark has to move data between partitions across the cluster &#8212; usually to regroup, sort, or aggregate it. This is required when the output of a transformation depends on records from <em>multiple</em> partitions &#8212; for example, summing all values or grouping records by some key.</p></blockquote><blockquote><p>During a shuffle, Spark performs a full-on data exchange: it reads from all partitions, redistributes data according to some partitioning logic, and writes new partitions &#8212; often on different executors. 
That's expensive, as it involves network I/O, data serialization, and disk writes.</p></blockquote><blockquote><p><strong>Minimizing unnecessary shuffling is key to optimizing Spark jobs.</strong></p></blockquote><p><em>Narrow transformations</em> operate entirely within a single partition, avoiding the overhead of data shuffling.</p><p>For example:</p><pre><code><code>&gt;&gt;&gt; filteredRDD = rdd.filter(lambda x: x &gt; 10) # Filter elements greater than 10
&gt;&gt;&gt; print(filteredRDD.toDebugString()) # Print out the lineage/DAG
(4) PythonRDD[1] at RDD at PythonRDD.scala:53 []
|  ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:195 []
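&gt;&gt;&gt; filteredRDD.getNumPartitions() # (Illustrative) narrow transformations keep the parent's partitioning
4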
&gt;&gt;&gt; filteredRDD.collect() # Only local partition filtering, no shuffle
[11, 12, 13, 14, 15, 16, 17, 18, 19]</code></code></pre><p>There is no shuffle here &#8212; Spark doesn't move any data &#8212; each partition filters its own elements. That's what makes narrow transformations cheap and parallelizable. In fact, Spark often fuses a chain of narrow transformations into a single block (a stage), optimizing them behind the scenes.</p><p><em>Wide transformations</em>, by contrast, involve dependencies across multiple partitions, requiring Spark to shuffle data around. For example:</p><pre><code><code>&gt;&gt;&gt; groupedRDD = filteredRDD.groupBy(lambda x: x % 2) # Group data based on mod
&gt;&gt;&gt; print(groupedRDD.toDebugString()) # Print the lineage; the shuffle itself runs only when an action is called
(4) PythonRDD[6] at RDD at PythonRDD.scala:53 []
|  MapPartitionsRDD[5] at mapPartitions at PythonRDD.scala:133 []
|  ShuffledRDD[4] at partitionBy at NativeMethodAccessorImpl.java:0 []
+-(4) PairwiseRDD[3] at groupBy at &lt;ipython-input-5-a92aa13dcb83&gt;:1 []
|  PythonRDD[2] at groupBy at &lt;ipython-input-5-a92aa13dcb83&gt;:1 []
|  ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:195 []</code></code></pre><p>Here, Spark had to shuffle data based on the grouping key (in this case, odd/even values). Notice the <code>ShuffledRDD</code>? That's your performance tax right there.</p><blockquote><p>&#128161; So when do you <em>have</em> to use wide transformations? If you're grouping, aggregating, or joining across partition boundaries or your logic inherently needs a global view of the dataset. That's fine &#8212; just <strong>be intentional about it</strong>. Know when you're paying the shuffle tax, and make sure it is worth it.</p></blockquote><h3>Actions</h3><p>Unlike transformations, actions are eager. They will compute things. If transformations are the instructions, actions are the "Go!" button. They're what finally force Spark to roll up its sleeves, stop being lazy, and actually execute the computation.</p><p>Calling an action triggers execution of the entire DAG you've been building with transformations. 
Spark spins up tasks, shuffles data if needed, and materializes the result &#8212; whether that means dumping it to disk, loading it into a database, or just printing a sample to console.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QpuV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QpuV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QpuV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QpuV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QpuV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QpuV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg" width="1456" height="1017" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:308751,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/165644813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QpuV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QpuV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QpuV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QpuV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe3b1f48-4ff4-4930-a634-ffb0eff3d349_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Common actions include:</p><ul><li><p><code>collect</code>: Pulls all the data to the driver. Great for debugging, terrible for large datasets. Don't use it unless you know your data fits in memory.</p></li><li><p><code>reduce</code>: Aggregates all elements using a user-defined function.</p></li><li><p><code>count</code>: Counts all the elements in the RDD.</p></li></ul><p>For example:</p><pre><code><code>&gt;&gt;&gt; filteredRDD.reduce(lambda a, b: a + b)
135</code></code></pre><p>This line kicks off a full DAG execution. All previous transformations get evaluated, tasks are distributed across the cluster, and the final result is returned to the Spark driver.</p><h3>Beyond RDDs: DataFrames and Datasets</h3><p>While RDDs form the foundation of Spark, modern APIs like <strong>DataFrames</strong> and <strong>Datasets</strong> are preferred for handling structured data due to their optimizations and simplicity. Here's a quick comparison:</p><ul><li><p><strong>RDDs</strong>: Provide low-level control over distributed data but lack query optimizations. Best for unstructured data and fine-grained transformations.</p></li><li><p><strong>DataFrames</strong>: Represent data in a table-like format with named columns (like a SQL table). They leverage Spark's Catalyst Optimizer for efficient query planning and execution.</p></li><li><p><strong>Datasets</strong>: Offer the benefits of DataFrames with type safety, making them ideal for statically typed languages like Scala or Java.</p></li></ul><p>Since Spark 3.x, DataFrames aren't just preferred &#8212; they're practically mandatory for performance-sensitive or production Spark workloads. While RDDs give more control, if you're using RDDs for anything besides niche low-level operations, you're fighting the engine.</p><h2>Directed Acyclic Graph</h2><p>One of the biggest innovations that sets Apache Spark ahead of Hadoop MapReduce &#8212; aside from being 100x faster and 10x less painful &#8212; is the way it models your computation. 
Instead of rigid, hardcoded pipelines of Map &#8594; Shuffle &#8594; Reduce, Spark builds a <strong>Directed Acyclic Graph (DAG)</strong> as it ingests your transformations: a blueprint of operations and dependencies that spells out exactly how your job will execute.</p><h3>What Is a DAG, Anyway?</h3><p>At its core, a DAG in Spark is just a graph with two key properties:</p><ul><li><p><strong>Directed</strong>: every edge points one way &#8212; there's a clear "before" and "after"</p></li><li><p><strong>Acyclic</strong>: there are no cycles &#8212; once you move forward, you cannot loop back</p></li></ul><p>Why does this matter? Because it guarantees that your data processing pipeline has a clear beginning and end. There's no chance of infinite loops, no ambiguous backtracking &#8212; just a straight&#8208;line (albeit branching and merging) flow from raw data to final results.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9aWE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9aWE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9aWE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!9aWE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9aWE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9aWE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg" width="1456" height="1192" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1192,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:236177,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/165644813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9aWE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!9aWE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9aWE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9aWE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1261736-9d62-4a38-b376-d2dec3ef5eb7_2038x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Nodes</strong> in this graph are your RDD transformations and actions: <code>map</code>, <code>filter</code>, <code>flatMap</code>, <code>reduceByKey</code>, and so on.</p></li><li><p><strong>Edges</strong> represent the dependencies between those operations: "I can't run this <code>reduceByKey</code> until the map and filter that precede it have finished."</p></li></ul><p>That's it. Simple and elegant.</p><h3>Logical vs Physical Plans</h3><p>Before we get into the DAG construction, let's clarify two concepts:</p><ul><li><p><strong>Logical DAG</strong>: The abstract plan Spark builds when you define transformations but haven't yet called an action. It's a high&#8208;level blueprint that says "do <code>map</code>, then <code>filter</code>, then <code>reduceByKey</code>", but it hasn't decided how or where to execute anything.</p></li><li><p><strong>Physical execution plan</strong>: The concrete realization of that DAG once you call an action. Here, Spark chooses the actual algorithms, the number of partitions, which nodes to run on, and so forth.</p></li></ul><p>Spark keeps these two worlds separate to maintain laziness &#8212; it only pays the execution cost when you truly need a result.</p><h3>DAG Construction</h3><p>When you define a chain of transformations nothing executes yet. Spark's driver program records each call, building up that logical DAG in memory. 
These nodes and edges stack up until you finally call an action &#8212; the green light for Spark to convert the logical DAG into a physical execution plan and start processing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0X1G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0X1G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0X1G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0X1G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0X1G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0X1G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg" width="1456" height="646" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:646,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:222542,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/165644813?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0X1G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0X1G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0X1G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0X1G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F865f10f3-7f0a-4d69-a45e-f15777cd61ab_2302x1022.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But before Spark can start processing, the logical DAG must be split into stages. <strong>A stage</strong> is a sequence of operations that can be executed <em>without</em> a shuffle. Narrow transformations like <code>map</code>, <code>filter</code>, and <code>flatMap</code> &#8212; process data locally on each partition, so Spark fuses them into the same stage. The moment a wide transformation appears - such as <code>reduceByKey</code>, <code>groupByKey</code>, or <code>join</code> &#8212; a shuffle is required, and Spark cuts the DAG, starting a new stage.</p><p>Check out this classic word-count example:</p><pre><code><code>sc = SparkContext.getOrCreate()
text = sc.textFile("hdfs://...") # Read text file
words = text.flatMap(lambda line: line.split()) # Split lines into words on whitespace
pairs = words.map(lambda word: (word, 1)) # Map each word to (word, 1)
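# reduceByKey below is a wide transformation: it forces a shuffle,
# so Spark ends Stage 1 here and starts Stage 2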
counts = pairs.reduceByKey(lambda a, b: a + b) # Reduce by key (word) to get counts</code></code></pre><p>Here:</p><ul><li><p>Stage 1 will contain: <code>textFile</code> &#8594; <code>flatMap</code> &#8594; <code>map</code></p></li><li><p>Stage 2 will contain: <code>reduceByKey</code></p></li></ul><p>That's two stages, each of which Spark can schedule independently as soon as its dependencies are met.</p><h3>Why Spark Builds a DAG</h3><p>Imagine you're using classic Hadoop MapReduce. You write one Map job, it writes to disk, then a Reduce job reads from disk, does its work, and writes to disk again. At each step, the framework forces you to materialize data to HDFS, incurring heavy I/O and disk seeks. To make matters worse, you have zero visibility into the global structure of your computation &#8212; you only see one stage at a time.</p><p>Spark's approach &#8212; <em>don't run a thing until you have to</em>. Record your intentions, compile them into a DAG, then optimize across the whole graph before you fire off any tasks. That gives Spark two massive advantages:</p><ol><li><p><strong>Global optimization</strong>. The DAG allows Spark to treat your code as a full graph, not just a sequence of steps. It sees the entire dependency chain and can make both global and local optimizations &#8212; like reordering operations, collapsing stages, or adjusting partitioning dynamically (especially with <a href="https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution">AQE</a> enabled).</p></li><li><p><strong>Fault tolerance via lineage</strong>. 
Because Spark knows the entire history of your operations, if a partition goes missing, it recomputes only the lost data by re&#8208;applying the minimal set of needed transformations &#8212; no full job restart required.</p></li></ol><p>The DAG-based execution model is a big reason why Spark can handle complex workloads across massive datasets while minimizing the overhead typically associated with distributed processing. Without the DAG, Spark would lose much of its flexibility, speed, and fault tolerance.</p><h2>Conclusion</h2><p>By now you've seen how Spark's core abstractions &#8212; RDDs, transformations and actions, and the DAG execution model &#8212; work together to deliver both performance and resilience. You now know that lazy evaluation lets Spark optimize across your entire workflow, that narrow transformations avoid costly shuffles, and that lineage&#8208;based fault tolerance keeps your jobs reliable.</p><h4>Additional materials</h4><ul><li><p><a href="https://amzn.to/3YVgKw3">Spark: The Definitive Guide by Bill Chambers, Matei Zaharia</a></p></li><li><p><a href="https://amzn.to/3YD2Pu8">Learning Spark by Jules S. 
Damji, Brooke Wenig, Tathagata Das, Denny Lee</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Cluster Managers for Apache Spark: from YARN to Kubernetes]]></title><description><![CDATA[Deep dive into machinery that orchestrates Spark]]></description><link>https://luminousmen.substack.com/p/cluster-managers-for-apache-spark</link><guid isPermaLink="false">https://luminousmen.substack.com/p/cluster-managers-for-apache-spark</guid><dc:creator><![CDATA[luminousmen]]></dc:creator><pubDate>Tue, 08 Jul 2025 13:02:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UisY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UisY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UisY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!UisY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!UisY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UisY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UisY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png" width="800" height="405" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32378,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/167761990?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UisY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png 424w, https://substackcdn.com/image/fetch/$s_!UisY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png 848w, https://substackcdn.com/image/fetch/$s_!UisY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png 
1272w, https://substackcdn.com/image/fetch/$s_!UisY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc6af0d-dd04-4d74-887e-af2358d60f45_800x405.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you've spent any time wrangling data at scale, you've probably heard of Apache Spark. Maybe you've even cursed at it once or twice &#8212; don't worry, you're in good company. Spark has become the go-to framework for big data processing, and for good reason: it's fast, versatile, and (once you get the hang of it) surprisingly elegant. 
But mastering it? That's a whole other story.</p><p>Spark is packed with features and an architecture that feels simple on the surface but gets complex real quick. If you've ever struggled with long shuffles, weird partitioning issues, or mysterious memory errors, you know exactly what I mean.</p><p>This article is the first in a series on Apache Spark that I put together to help you get past the basics and into the real nuts and bolts of how it works and how to make it work for you.</p><h2><strong>What is Spark?</strong></h2><p>Think of splitting your laptop into a thousand tiny replicas, each with its own RAM, then running <code>.groupBy().sum()</code> across billions of events. </p><p>That, in spirit, is Apache Spark.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yAyl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yAyl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yAyl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yAyl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!yAyl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yAyl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg" width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:389542,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/167761990?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!yAyl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yAyl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!yAyl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yAyl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6233fa65-40e9-444a-8291-58a3f603d351_2388x1668.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Apache Spark is an open&#8209;source cluster&#8209;computing engine built to handle both batch and streaming workloads with an unapologetic focus 
on speed. It keeps data in memory whenever possible, spilling to disk only when datasets outgrow RAM. Add concise APIs in Scala, Python, Java, and SQL, and engineers get a tool that feels as productive as a notebook but scales to a data&#8209;center.</p><p>Beyond language support and in&#8209;memory execution, Spark brings two other pillars that matter day&#8209;to&#8209;day:</p><ol><li><p><strong>Lazy evaluation.</strong> Transformations are recorded as a <a href="https://luminousmen.com/post/spark-core-concepts-explained/#directed-acyclic-graph">directed acyclic graph (DAG) </a>and executed only when an action forces a result, letting Spark optimize the entire plan.</p></li><li><p><strong>Resilience.</strong> The same lineage graph allows partitions to be recomputed on another machine if a node dies, keeping long jobs from collapsing.</p></li></ol><p>Yet for all its intelligence, Spark intentionally stays out of one debate: <strong>where</strong> your code should run. It simply declares, &#8220;Give me 80 executors, each with 4&#8239;vCPU and 8&#8239;GB RAM, plus a driver&#8221;. Choosing the actual hosts, allocating resources, and recovering from failure is the job of a <strong>cluster manager</strong>. But before we meet those managers, let&#8217;s rewind to the platform that made space for them.</p><h2>The Evolution of Hadoop</h2><p>In 2006, Yahoo! turned Google&#8217;s MapReduce research project into an open-source reality and called the result <strong>Hadoop</strong>. 
Release 0.x shipped two core pieces:</p><ul><li><p><strong>HDFS (Hadoop Distributed File System)</strong> &#8211; a block-based, write-once distributed file system tuned for high-throughput, sequential I/O on clusters of commodity servers.</p></li><li><p><strong>MapReduce</strong> &#8211; a batch-processing engine that runs every job through three canonical phases: map &#8594; distributed shuffle &#8594; reduce.</p></li></ul><p>For the Big Data world, this was a revelation &#8212; terabytes could now be processed on racks of cheap servers instead of specialized premium hardware. </p><p>The next major revision, Hadoop 2, introduced a third component: <strong>YARN (Yet Another Resource Negotiator)</strong>. YARN split resource management out of MapReduce and became a cluster&#8209;wide operating system, handing out CPU and memory to any engine that asked politely. With YARN in place, processing engines like Spark, Tez, and Flink could run and flourish side-by-side on the same platform without touching Hadoop&#8217;s storage layer. 
That decoupling is what let Spark catch fire inside existing Hadoop deployments.</p><h2>Why Spark Won</h2><p>So, why did Spark become such a go-to tool for big data processing?</p><p>One word: <em>speed</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Flx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Flx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png 424w, https://substackcdn.com/image/fetch/$s_!1Flx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png 848w, https://substackcdn.com/image/fetch/$s_!1Flx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png 1272w, https://substackcdn.com/image/fetch/$s_!1Flx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Flx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png" width="570" height="350" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a47a2000-741c-4563-ae53-a5a12faddc95_570x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:570,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66121,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/167761990?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Flx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png 424w, https://substackcdn.com/image/fetch/$s_!1Flx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png 848w, https://substackcdn.com/image/fetch/$s_!1Flx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png 1272w, https://substackcdn.com/image/fetch/$s_!1Flx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa47a2000-741c-4563-ae53-a5a12faddc95_570x350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While MapReduce relies heavily on disk I/O &#8212; writing data to disk between each processing phase &#8212; Spark uses in-memory computing to keep data in RAM throughout the computation and spill to disk only when necessary. By explicitly<a href="https://luminousmen.com/post/explaining-the-mechanics-of-spark-caching/"> caching or persisting</a> an RDD/DataFrame, Spark dramatically increased processing speed, especially for iterative algorithms that need to access the same data multiple times. 
This shift from disk-based to in-memory processing makes Spark not just faster, but also more suitable for applications that require near real-time analytics or iterative machine learning models, which MapReduce struggles to handle efficiently.</p><p>On top of that, Spark&#8217;s friendly <em>API</em> (all cool stuff like Scala&#8217;s chained functions, Python&#8217;s lazy DataFrames, ANSI&#8209;compliant SQL) makes it a framework that&#8217;s not just useful, but one data engineers want to use. We prefer tools that minimize cognitive overhead so we can focus on solving problems, not wrestling with syntax, right?</p><p>Lastly, Spark is <a href="https://en.wikipedia.org/wiki/Polymodal">poly-modal</a> &#8212; batch ETL is built in, real-time pipelines can ride on Spark Structured Streaming, and machine-learning workloads plug straight into MLlib &#8212; all powered by the same execution engine.</p><p>Power and usability made engineers reach for Spark &#8212; as long as their jobs could find a slot to run. Leaving the comfort of a single machine means you still need a data&#8209;center operating system that decides <em>where</em> each executor lands and how to restart it after failure. 
That&#8217;s the job for cluster managers &#8212; YARN in the Hadoop world, Kubernetes in the container era, and a few others along the way.</p><h2>Cluster Managers 101</h2><p>To understand where YARN &#8212; and later Kubernetes &#8212; fit into the picture, let&#8217;s borrow a metaphor from everyday computing.</p><p>A single machine&#8217;s operating system does roughly two big jobs:</p><ul><li><p><strong>File System</strong>: Manages data storage and retrieval (FAT32, ext4, NTFS, etc.).</p></li><li><p><strong>Kernel &amp; Scheduler</strong>: Decides which process gets CPU, RAM, and I/O, and when.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h5PN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h5PN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h5PN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h5PN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!h5PN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!h5PN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg" width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:319443,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/167761990?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!h5PN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h5PN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h5PN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!h5PN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd751b537-04fa-41d5-8d3c-6281bc1030fa_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When we scale this concept to a cluster level and implement it within Hadoop, we see a similar division. But instead of a single-node file system, Hadoop uses <strong>HDFS</strong> (Hadoop Distributed File System). 
<strong>YARN</strong> provides cluster-level resource negotiation (CPU cores and memory) and launches containers for application execution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8xjR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8xjR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8xjR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8xjR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8xjR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8xjR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg" width="1456" height="1017" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:293877,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/167761990?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!8xjR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8xjR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8xjR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8xjR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bd2862d-3c02-4d8a-9433-952105a45a11_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>So Spark outsources the &#8220;OS&#8221;-style management tasks &#8212; it can focus purely on the actual data processing.</p><p>Out of the box, Spark can talk to:</p><ul><li><p><strong>Standalone &#8212; </strong>lightweight scheduler for laptops, CI pipelines, and small proofs&#8209;of&#8209;concept</p></li><li><p><strong>Apache Mesos &#8212; </strong>historically influential, but largely retired now</p></li><li><p><strong>YARN</strong> &#8212; the default in Hadoop&#8209;centric stacks thanks to tight HDFS and Kerberos integration</p></li><li><p><strong>Kubernetes</strong> &#8212; the default in container&#8209;first organizations, which by now is probably everybody</p></li></ul><blockquote><p>&#128161; We&#8217;ve used the term &#8220;container&#8221; a lot here, so let me clear up a possible confusion.
In <strong>YARN</strong> terminology, a <em>container</em> is simply a slice of resources &#8212; CPU cores, memory, and (optionally) GPUs &#8212; allocated on a worker node. It shares the host&#8217;s operating system and filesystem. <strong>Kubernetes</strong>, on the other hand, schedules <strong>Docker/OCI containers</strong>, which bundle an application&#8217;s entire filesystem, runtime libraries, and process isolation via namespaces and cgroups. Same word, two very different abstractions.</p></blockquote><p>We&#8217;ll deep&#8209;dive into YARN because it paved the runway for Spark&#8217;s early success &#8212; and it still powers petabytes of production data where HDFS and Hadoop security are non&#8209;negotiable.</p><h2><strong>Deep Dive into YARN</strong></h2><p>At the heart of YARN sits the <strong>ResourceManager</strong>. Think of it as the cluster&#8217;s traffic controller: every request for CPU or memory, from every application, flows through the ResourceManager first. Nothing starts, scales, or restarts without its say-so.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IajJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IajJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IajJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg 848w,
https://substackcdn.com/image/fetch/$s_!IajJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IajJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IajJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg" width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:324030,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/167761990?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!IajJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!IajJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IajJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IajJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbcc1991-2780-4368-a692-af0300d10fe5_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The ResourceManager consists of two key services:</p><ul><li><p><strong>ApplicationManager:</strong></p><ul><li><p>Accepts and registers new applications</p></li><li><p>Manages the application state&#8209;machine (NEW &#8594; RUNNING &#8594; FINISHED/FAILED/KILLED)</p></li><li><p>Requests the first container to launch the ApplicationMaster (described below)</p></li></ul></li><li><p><strong>Scheduler:</strong></p><ul><li><p>Maintains an up&#8209;to&#8209;date view of free CPU, memory, and GPUs on every node (via heartbeats)</p></li><li><p>Decides <em>where</em> and <em>when</em> to place containers</p></li><li><p>Enforces multi&#8209;tenant policies (<a href="https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity</a>, <a href="https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair</a>, or FIFO)</p></li><li><p>Preempts lower&#8209;priority work</p></li></ul></li></ul><p>Each physical or virtual host in the cluster runs a <strong>NodeManager</strong> daemon, which reports to the ResourceManager.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rmCX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rmCX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg 424w,
https://substackcdn.com/image/fetch/$s_!rmCX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rmCX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rmCX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rmCX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg" width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:447579,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/167761990?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!rmCX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rmCX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rmCX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rmCX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b53a42-57c7-448f-b38e-7177d392cd63_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Its jobs are straightforward but crucial:</p><ul><li><p><strong>Launch containers</strong> using local Linux cgroups/namespaces</p></li><li><p><strong>Monitor</strong> the health and resource usage of those containers</p></li><li><p><strong>Report</strong> status back to the ResourceManager every few seconds (heartbeats)</p></li></ul><p>Each cluster node hosts several <strong>Containers</strong> (YARN containers). A container is YARN&#8217;s atomic chunk of compute: a slice of memory, a share of vCPUs, plus the environment variables and files the task needs.</p><p>Containers are allocated to execute specific tasks for user applications, and the size and configuration of each container are negotiated at run time based on workload requirements.</p><blockquote><p>&#128161; Going a bit deeper &#8212; when a Spark job starts, the ResourceManager asks a NodeManager to create a container; the NodeManager forks a JVM, attaches it to the cgroup, and hands the container ID back. If that executor tries to exceed its quota, the kernel&#8217;s cgroup OOM-killer terminates it, protecting the rest of the node.</p></blockquote><p>The NodeManager daemon oversees the containers on its respective node. When a new application is submitted to the cluster, the ResourceManager allocates a container for the <strong>ApplicationMaster</strong>. The ApplicationMaster&#8217;s job is to negotiate resources with the ResourceManager and coordinate with the NodeManagers to execute and monitor tasks.</p><p>Once launched, the ApplicationMaster takes charge of the application lifecycle.
Its first task is to send resource requests to the ResourceManager to acquire additional containers for running the application's tasks. Each resource request typically includes:</p><ul><li><p>The amount of resources needed, specified in terms of memory and CPU shares</p></li><li><p>Preferred container locations (like hostname, rack name)</p></li><li><p>Priority within the application</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yt2n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yt2n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yt2n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yt2n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yt2n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yt2n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg" 
width="1456" height="1017" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:588341,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://luminousmen.substack.com/i/167761990?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!yt2n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yt2n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yt2n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yt2n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F866e6873-b949-4fa8-be2b-2d98427493d5_2388x1668.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>One thing worth highlighting: the ApplicationMaster runs in a container just like the other components of the application. If it crashes or becomes unavailable, the ResourceManager can allocate a new container and restart it, ensuring <a href="https://luminousmen.com/post/architecturally-significant-requirements/#high-availability-vs-fault-tolerance">high availability</a>. The ResourceManager maintains metadata about running applications and task progress in HDFS, so when the ApplicationMaster restarts, it has all the context it needs.
This allows the system to recover and restart only the incomplete tasks if the application fails.</p><p>In this architecture, the ResourceManager, NodeManagers, and containers are framework-agnostic &#8212; they don't need to understand the specifics of the application. Instead, the ApplicationMaster handles all framework-specific logic, enabling any distributed framework to run on YARN as long as it provides a suitable ApplicationMaster. We are focused on Spark, but other frameworks can run on YARN as well. Spark executors, Flink tasks, even legacy MapReduce tasks &#8212; all of them can run inside YARN containers. This design allows YARN to support diverse workloads efficiently and flexibly.</p><h2><strong>Submitting an Application to YARN</strong></h2><p>When you run <code>spark-submit --master yarn</code>, three daemons and a handful of containers jump into action. Here&#8217;s the workflow:</p><ol><li><p><strong>Client</strong> (your laptop or CI agent) sends a tiny &#8220;launch request&#8221; to the cluster&#8217;s ResourceManager. The payload says, essentially, <em>&#8220;Please start Spark&#8217;s ApplicationMaster &#8212; here&#8217;s the JAR, here&#8217;s the memory budget&#8221;.</em></p></li><li><p><strong>ResourceManager</strong> scans its bookkeeping tables, picks a free host, and asks that host&#8217;s <strong>NodeManager</strong> to carve out one fresh container for the <strong>ApplicationMaster.</strong></p></li><li><p>Inside that container, Spark&#8217;s <strong>ApplicationMaster</strong> starts the JVM, loads your job&#8217;s classpath, and registers itself back with the <strong>ResourceManager</strong>. From this point on, the <strong>ApplicationMaster</strong> is the single source of truth for everything your job will do.</p></li><li><p>Next, the <strong>ApplicationMaster</strong> requests executor containers from the <strong>ResourceManager</strong>: &#8220;Give me 50 containers, each 4 vCPU/8 GB, preferably close to data on rack A or B&#8221;.
The request can contain multiple sizes and priorities if your job mixes lightweight and heavyweight tasks, which is kinda cool.</p></li><li><p>The <strong>ResourceManager</strong> grants containers as capacity frees up. Each chosen <strong>NodeManager</strong> spins up a new process under cgroups/namespaces, sets environment variables, and hands control to Spark&#8217;s executor script.</p></li><li><p>Every executor reports back to the <strong>ApplicationMaster</strong> (which is also the Spark driver) to say, &#8220;I&#8217;m alive; send work&#8221;. The <strong>ApplicationMaster</strong> schedules tasks onto those executors &#8212; <code>map</code>, <code>shuffle</code>, <code>reduce</code>, whatever &#8212; while the NodeManagers monitor CPU, RAM, and exit codes.</p></li><li><p>Meanwhile, your spark-submit process (or a web UI) polls the <strong>ApplicationMaster</strong> for progress: completed stages, failed tasks, shuffle size, you name it. The <strong>ApplicationMaster</strong> keeps the <strong>ResourceManager</strong> updated with heartbeat traffic so the job stays visible in cluster dashboards.</p></li><li><p>When the application&#8217;s last task finishes, the <strong>ApplicationMaster</strong> tells the <strong>ResourceManager</strong>, &#8220;All done&#8221;. It writes final counters to HDFS, de-registers, and exits. The <strong>ResourceManager</strong> then marks remaining containers for garbage collection, and <strong>NodeManagers</strong> delete logs once retention rules say it&#8217;s safe to do so.</p></li></ol><p>In roughly a dozen round-trips and a few seconds of coordination, your Spark job goes from a CLI command to a fully managed, distributed application &#8212; ready to process terabytes until the last partition is joined, aggregated, or filtered away, all with the help of YARN.
Sweet.</p><h2><strong>Deep Dive into Kubernetes</strong></h2><p>A decade ago YARN was the obvious companion for Spark &#8212; both lived inside the Hadoop ecosystem, both assumed Java everywhere, and both were happiest when HDFS was the only distributed storage. Since then, the center of gravity has shifted. Modern platforms package everything &#8212; front-end apps, REST gateways, model servers &#8212; into containers and schedule them with <strong>Kubernetes</strong>.</p>
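<p>One practical number matters on both YARN and Kubernetes: how much memory Spark actually requests per executor. It is not just <code>spark.executor.memory</code> &#8212; Spark adds an off&#8209;heap overhead on top, by default the larger of 10% of the executor memory or 384&#160;MiB (see <code>spark.executor.memoryOverhead</code>; exact defaults vary by Spark version and runtime, so treat this as a back&#8209;of&#8209;the&#8209;envelope sketch). A hypothetical helper to do the arithmetic:</p>

```python
def container_request_mb(executor_memory_mb: int,
                         overhead_factor: float = 0.10,
                         min_overhead_mb: int = 384) -> int:
    """Approximate the container/pod memory Spark requests per executor.

    Mirrors the documented default of spark.executor.memoryOverhead:
    max(executor_memory * factor, 384 MiB) added on top of the JVM heap.
    """
    overhead = max(int(executor_memory_mb * overhead_factor), min_overhead_mb)
    return executor_memory_mb + overhead

# A 2 GiB executor gets the 384 MiB floor; an 8 GiB executor gets the 10% cut.
print(container_request_mb(2048))  # 2432
print(container_request_mb(8192))  # 9011
```

<p>So the &#8220;50 containers, each 4 vCPU/8 GB&#8221; request from the YARN walkthrough above actually reaches the scheduler as roughly 8.8&#160;GB per container; if that exceeds the cluster&#8217;s per&#8209;container ceiling (YARN&#8217;s <code>yarn.scheduler.maximum-allocation-mb</code>), the request is rejected.</p>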
      <p>
          <a href="https://luminousmen.substack.com/p/cluster-managers-for-apache-spark">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>