<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Neel Somani's Blog]]></title><description><![CDATA[Former Citadel quant Neel Somani publishes research & essays ranging from machine learning to longevity. Neel previously founded Eclipse (raised $65M) and has incubated several blockchain infrastructure projects.]]></description><link>https://www.neelsomaniblog.com</link><image><url>https://substackcdn.com/image/fetch/$s_!hQpT!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe6002d-878f-494b-abd0-78f82f7d3d87_412x412.png</url><title>Neel Somani&apos;s Blog</title><link>https://www.neelsomaniblog.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 04 Apr 2026 06:15:49 GMT</lastBuildDate><atom:link href="https://www.neelsomaniblog.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Neel Somani]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[njs@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[njs@substack.com]]></itunes:email><itunes:name><![CDATA[Neel Somani]]></itunes:name></itunes:owner><itunes:author><![CDATA[Neel Somani]]></itunes:author><googleplay:owner><![CDATA[njs@substack.com]]></googleplay:owner><googleplay:email><![CDATA[njs@substack.com]]></googleplay:email><googleplay:author><![CDATA[Neel Somani]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Autoformalization and the Future of Math Research]]></title><description><![CDATA[Formal methods reveal which informal concepts we rely on 
most.]]></description><link>https://www.neelsomaniblog.com/p/autoformalization-and-the-future</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/autoformalization-and-the-future</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Fri, 30 Jan 2026 04:50:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!l2e0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Last week, I gathered a group of bright undergraduate students to construct <a href="https://www.ocf.berkeley.edu/~neel/erdos.html">GPT-Erdos</a>. We ran a uniform procedure across the open Erd&#337;s problems, evaluating GPT-5.2 Pro, Deep Research, and, when possible, Aristotle by Harmonic. This produced 3 accepted solutions, 3 partial results, and 4 previously undocumented rediscoveries, all <a href="https://github.com/neelsomani/gpt-erdos">open-sourced</a>.</p><p>What I found is that the value of autoformalization goes beyond the raw tech. (By &#8220;autoformalization,&#8221; I mean the tooling that takes a human-readable math proof and converts it into a machine-checkable format like Lean or Coq.) By formalizing our work, we expose underspecified concepts that implicitly guide research, like novelty, progress, and correctness.</p><h2>The Most Common Failure is Underspecification</h2><p>Early experiments made it clear that failure cases can be nuanced. For example, GPT-5.2 Pro <a href="https://www.erdosproblems.com/397">produced an accepted solution</a> for problem #397, but shortly thereafter Terence Tao used <a href="https://chatgpt.com/share/69632932-7308-800e-81de-fa5ea2432d62">Deep Research</a> to find a closely related partial result. That partial result took a different path from GPT-5.2 Pro&#8217;s solution, but it could likely be extended to solve the same problem. 
Depending on who&#8217;s judging, this could be described as a novel result, a rediscovery, or an extension of existing literature.</p><p>So I became interested in the failure modes when GPT-5.2 Pro fails to give a novel result, and in when it might succeed. The methodology was simple: paste the exact LaTeX problem statement into GPT-5.2 Pro, then ask Deep Research to surface any previous solutions, mirroring the process that #397 followed. No human intervention was allowed during generation. Each response was independently reviewed and classified according to predefined categories. Our open-source <a href="https://github.com/neelsomani/gpt-erdos/blob/main/README.md">results</a> are consistent with what others have found:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l2e0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l2e0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!l2e0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!l2e0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!l2e0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l2e0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png" width="1200" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Sankey Diagram of GPT-Erdos Results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sankey Diagram of GPT-Erdos Results" title="Sankey Diagram of GPT-Erdos Results" srcset="https://substackcdn.com/image/fetch/$s_!l2e0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!l2e0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!l2e0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!l2e0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91404284-112b-4f28-9d6e-08ac9555a135_1200x1200.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Sankey Diagram of ChatGPT Responses to Erdos Problems. (Classification requires human judgment. 
Boundaries are debatable, and anyone is welcome to re-label the dataset.)</figcaption></figure></div><p>What stands out is that, among the non-trivial attempts that go beyond pure literature recitation, underspecification causes at least as many failures as outright errors.</p><p>Relatedly: finding existing literature, expanding on it, and producing structurally new proofs are all useful results, but I found that, psychologically, people want a clear &#8220;novel&#8221; result attributed to an LLM, with no historical literature or partial results. Disagreements over novelty aren&#8217;t purely epistemic. Novelty functions as a proxy for intellectual contribution, and it&#8217;s also a measure of which proving system is the most advanced. That creates pressure to draw clean delineations around what&#8217;s &#8220;actually novel.&#8221;</p><h2>Informal Goals Guiding Formal Methods</h2><p>Even top mathematicians disagree on whether a result is novel. For example, in <a href="https://www.erdosproblems.com/forum/thread/652">problem #652</a>, Tao classifies the GPT-5.2 Pro response as novel, even though the result heavily relies on Mathialagan&#8217;s bipartite theorem. One solution (#281) was interesting enough that Nat Sothanaphan wrote a great <a href="https://drive.google.com/file/d/1uejEN0DBX_BjUePNOddCW8w8k82n1B5a/view?usp=sharing">write-up</a> on it and Tao wrote about the method on his <a href="https://terrytao.wordpress.com/2026/01/19/rogers-theorem-on-sieving/">blog</a>.</p><p>Of course, in math almost everything builds in some way upon previous results. The question is whether a non-trivial insight, perspective, or approach was applied to the problem. One extreme interpretation is that nothing is novel, or that everything is. 
Neither reconciles with our intuitions around novelty.</p><p>Maybe the correct definition is something closer to the minimum complexity of expressing a proof, while being allowed to reference existing results with constant cost. In other words, if a proof is an existing theorem with new parameters, that&#8217;s pretty simple to express, but if a proof requires several new non-trivial theorems and there&#8217;s no way around expressing it that way, that&#8217;s more complex and probably novel.</p><p>Or another possible formalism might take inspiration from zero-knowledge research, where we show it&#8217;s possible to be convinced of the truth of a statement while not being able to reconstruct the underlying witness given polynomial time computation. In an analogous sense, we can define mathematical &#8220;knowledge&#8221; as the ability to reconstruct a proof using existing results in polynomial time. So retrieving an existing theorem and substituting new parameters wouldn&#8217;t count.</p><p>A formalism like that helps if we give the LLM the problem that we want to solve. But at some point the LLM needs to have some sense for which problems are the most &#8220;interesting&#8221; to solve, too. And that seems harder to formalize. Interestingness is sometimes a proxy for utility (does this help us solve other problems in math/physics?) but sometimes the utility is unclear.</p><p>The critique obviously extends beyond just math. The LLM has no way of knowing what business ideas are most interesting or novel, or which art pieces are the most meaningful. None of that is a part of the training process, and the post-training process is so heuristic that I wouldn&#8217;t bet all the properties we care about are emergent. So maybe these are important concepts to formalize. 
Applying formal methods has a way of revealing which informal concepts we rely on most.</p><h2>Where&#8217;s the Space Heading?</h2><p>I can&#8217;t find the tweet, but somewhere Daniel Litt says something along the lines of: &#8220;Even if LLM progress completely stalled, the existing technology would substantially impact the practice of mathematics.&#8221; I think that&#8217;s very true. Mathematicians spend time hand-verifying proofs that Aristotle could check; people spend time on &#8220;open&#8221; problems that are sometimes already solved in the literature; and quickly getting up to speed on existing approaches to a problem can save significant time. Of course, I think the models are going to get way better at math, though we might need new techniques.</p><p>Regardless, people keep asking me where this is going. What good is theorem proving? Sometimes you hear answers like quant finance, but we don&#8217;t really prove things formally in quant finance. We do, however, develop models that you&#8217;d want to be provably correct.</p><p>For example, can we say with certainty that a particular C/C++ CUDA kernel has no memory safety violations? Can we say that a program handles all exceptions gracefully? Are there outputs that a statistical model provably cannot internally represent?</p><p>These were all well-studied questions before autoformalization, but no one wants to use a super sophisticated type system, so I see autoformalization as a way of applying formal methods at scale. In that same vein, LLMs are producing so much slop code that no one&#8217;s checking it carefully. Provable guarantees become a lot more valuable when there&#8217;s no close supervision.</p><p>The other thing I hope emerges from autoformalization research is some concept of &#8220;closeness&#8221; to completion. No such metric is widely accepted, to my knowledge. 
I think Terence Tao has some old essay or video where he points out that experienced mathematicians make errors too, but the errors tend to &#8220;cancel out&#8221; because the intuition is sound. Conversely, a single error can throw a junior mathematician completely off course. In short, there&#8217;s a difference between a proof that&#8217;s trivially repairable and one that&#8217;s fatally flawed, but final formal verification is binary: either the proof verifies or it doesn&#8217;t. In an ideal world, that closeness function would be a differentiable surrogate so we could optimize it directly.</p><p>The concept of closeness to completion matters in a bunch of domains. First, it serves as a search oracle when you&#8217;re trying to find the right solution to a problem. Second, there are many domains where the binary &#8220;proves vs. doesn&#8217;t prove&#8221; would fail. The Einstein field equations were famously discovered via a variety of heuristics and metaphors, and they were only cleanly formalized by later physicists. In general, the process of discovering truth is heuristic, not formal. Coincidentally, that resembles the state of the art in autoformalization today: the gold standard is GPT-5.2 Pro for finding the proof, then Aristotle by Harmonic for verifying it. 
Almost all of the AI-solved Erdos problems were derived this way to my knowledge.</p><p>There are lots of other cool domains to apply autoformalization and formal methods to, so I&#8217;m curious to hear other people&#8217;s ideas!</p>]]></content:encoded></item><item><title><![CDATA[The Endgame for Mechanistic Interpretability]]></title><description><![CDATA[The endgame for mechanistic interpretability is formal methods.]]></description><link>https://www.neelsomaniblog.com/p/the-endgame-for-mechanistic-interpretability</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/the-endgame-for-mechanistic-interpretability</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Mon, 05 Jan 2026 23:53:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/278caba2-f5bf-462c-9f98-02c207a1c21a_4000x3003.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Mechanistic interpretability is currently pulled between competing visions. On one side, Neel Nanda argues for <a href="https://www.alignmentforum.org/posts/StENzDcD3kpfGJssR/a-pragmatic-vision-for-interpretability">pragmatic interpretability</a>: grounding work in real models and judging progress by empirical feedback, even when the underlying understanding is partial. On the other, Leo Gao defends <a href="https://www.alignmentforum.org/posts/Hy6PX43HGgmfiTaKu/an-ambitious-vision-for-interpretability">ambitious interpretability</a>: a long-term bet on building circuits that are necessary and sufficient for behavior, on the view that deeper mechanistic understanding is what will generalize across model changes. This disagreement is treated as a question of methods, but the deeper divide lies elsewhere.</p><p>The disagreement persists because mechanistic interpretability lacks an agreed-upon end goal. 
What, exactly, would count as success?</p><h2>The Telos of Mechanistic Interpretability</h2><p>Today, feature labeling, circuit discovery, probing, activation patching, and causal interventions coexist as productive methods, yet they are only loosely coordinated. The field lacks a shared ideal for what interpretability methods are ultimately meant to deliver.</p><p>One possible telos is legibility: producing explanations that are intelligible to humans. On this view, interpretability succeeds when it tells a coherent story about why a model behaves as it does. But explanations that fail under counterfactual intervention may sound plausible, yet provide no reliable handle for control. Even advocates of curiosity-driven interpretability typically hope their findings will eventually support some downstream use, and legibility alone does not guarantee this.</p><p>A second telos is scientific understanding. Intervention is used to test hypotheses, identify causal structure, and build general explanations. Mechanistic interpretability often operates successfully under this ideal. But LLMs are not natural objects. They are engineered software artifacts. Focusing on understanding alone leaves a powerful affordance unused. Interventions need not merely reveal structure, they can permanently modify the system while preserving formally specified properties. A purely scientific telos does not demand patchability, certification, or correctness under modification.</p><p>A third telos is capability enhancement. From this perspective, interpretability is valuable only insofar as it accelerates optimization. The natural equilibrium of this ideal favors systems that are maximally effective and maximally opaque.</p><p>These ideals are not mutually exclusive, but no single one provides a stable foundation for the field. 
From this perspective, mechanistic interpretability ought to orient itself toward debuggability: the ability to localize failures to specific mechanisms, intervene on those mechanisms predictably, and certify that the intervention preserves desired behavior on bounded domains. This telos subsumes legibility and scientific understanding, while resisting the drift toward opacity implicit in a purely capability-driven view.</p><p>In what follows, I make this notion precise.</p><h2>Desired Goals for LLM Debuggability</h2><p>In an idealized setting, debugging an LLM would proceed from localization, to intervention, to certification. Each stage places strictly stronger demands on our mechanistic understanding, and each rules out large classes of explanations.</p><h3>Localization: Identifying the Responsible Mechanism</h3><p>The first requirement of debuggability is localization: the ability to identify which internal mechanisms are responsible for a given behavior, and to distinguish mechanisms that generalize from those that merely correlate with it.</p><p>In the strongest form, a debuggable localization supports counterexample search. For a bounded LLM input domain D, this means being able to determine whether the behavior can occur without the mechanism being active, or whether the mechanism can be active without producing the behavior (and, when possible, to surface concrete inputs that witness such cases).</p><h3>Intervention: Surgical, Mechanism-Level Debugging</h3><p>Localization is only meaningful if it admits intervention. 
Once a failure is traced to a mechanism, we&#8217;d like to modify that mechanism in a way that is predictable and targeted:</p><ol><li><p>The responsible head, MLP, or subspace can be modified or constrained.</p></li><li><p>The intervention removes the undesired behavior on a specified domain.</p></li><li><p>The intervention does not induce collateral damage elsewhere in that domain.</p></li></ol><h3>Certification: Domain-Bounded Safety Guarantees</h3><p>The final goal of debuggability is certification: the ability to make exhaustive, falsifiable claims about model behavior on bounded domains.</p><p>For a formally specified domain D, this can mean proving that no harmful token is produced for any input in D, a claim that is in principle achievable for sufficiently constrained settings. Certification may also take the form of subcircuit-level bounds, for example structural invariants that rule out entire classes of behavior by construction, such as proving that a circuit cannot bypass a guard layer unless a specific feature is active.</p><h2>What a Debuggable Explanation Can (and Can&#8217;t) Promise</h2><p>First, some anti-goals:</p><ol><li><p>Debuggability does not imply that a trained Transformer can be cleanly de-compiled into a single symbolic program. Transformer models are not limited to discrete algorithmic control flow. They exploit continuous geometry in high-dimensional embedding spaces, superposition, and distributed representations. Any realistic abstraction must preserve this expressive freedom rather than erase it.</p></li><li><p>Debuggability does not entail identifying a single, privileged &#8220;cause&#8221; of an output or phenomenon. Model behavior is mediated by deep, branching causal pathways with redundancy, overlap, and compensatory mechanisms. A debuggable explanation need not be unique. 
What matters is that the identified mechanisms are enough to explain and control the behavior within scope, and that alternative bypasses would be surfaced by counterexample search rather than hidden by storytelling.</p></li><li><p>Debuggability does not aim at global safety proofs of the form &#8220;this model will never produce harmful output.&#8221; Even if &#8220;harmful&#8221; were formally defined, the unconstrained input space of frontier LLMs is beyond the reach of any existing or foreseeable verification technique. Any agenda that predicates success on global guarantees is doomed to either vacuity or false confidence.</p></li></ol><p>Instead, debuggability is about constructing a family of verified, compositional abstractions that faithfully reproduce model behavior on bounded domains and support predictable intervention. These abstractions are partial, local, and plural, but they are exact where they apply.</p><p>The relevant analogy is debugging a large, safety-critical software system. One cannot prove &#8220;Chrome will never crash.&#8221; But one can prove that specific routines are memory-safe, that sandboxing prevents certain classes of process escape, that critical invariants are preserved across refactors, and that a given patch eliminates a vulnerability without introducing regressions.</p><p>The same logic applies to LLMs. Meaningful debuggability consists in guarantees such as:</p><ul><li><p>This subcircuit cannot activate a forbidden feature on domain D.</p></li><li><p>This intervention removes a failure mode while preserving all other behaviors in scope.</p></li><li><p>This pathway is structurally incapable of bypassing a guard unless a specific internal condition is met.</p></li></ul><h2>The Necessity of Formal Methods</h2><p>Note that the above are universal claims over bounded domains. They are not distributional or probabilistic, and they do not rest on sampling. 
This is why a debuggability-oriented interpretability agenda is necessarily coupled to formal methods. SMT solvers, abstract interpretation, and neural verification frameworks are not optional add-ons. They are the only frameworks in which claims of impossibility, preservation, or closure under intervention can be made precise.</p><p>This vision does not require that today&#8217;s frontier LLMs be fully verifiable end-to-end. What matters is that the debuggability of Transformer models has precedent:</p><ul><li><p>Sparse circuit extraction shows that models contain relatively isolated, algorithmic subcircuits that remain stable under targeted intervention.</p></li><li><p>Symbolic Circuit Distillation is an early example of automated extraction, where neural mechanisms can be proven formally equivalent to symbolic programs.</p></li><li><p>Neural verification work (e.g. Reluplex and Marabou) establishes that exhaustive reasoning is possible once models are reduced to verification-friendly components on bounded domains.</p></li><li><p>Alternative attention mechanisms suggest that standard attention (the dominant barrier to SMT verification in Transformers) is an architectural choice rather than a theoretical necessity, opening the door to verification-aware model design with comparable performance.</p></li></ul><p>Taken together, these results shift the problem from conceptual impossibility to engineering integration and scale. Debuggability is about being able to say, with confidence: &#8220;This mechanism, on this domain, behaves this way, and if it didn&#8217;t, we would know.&#8221;</p><h2>What De-compiling Actually Looks Like</h2><p>Here&#8217;s what a possible &#8220;de-compilation&#8221; pipeline might look like:</p><h3>1. Identify stable linear regions (local programs)</h3><p>The smallest unit of analysis is a particular mechanism that has a stable branch structure on a bounded domain. 
Many verification-friendly components (affine maps, threshold gates, max/Top-K selection) behave like ordinary programs. Once you know which branch you&#8217;re in (e.g. which segment of a piecewise function is active, which items win a Top-K) the remaining computation is just affine arithmetic.</p><p>A &#8220;local program&#8221; is a region of inputs defined by explicit guard conditions (linear inequalities) together with the affine map executed under those guards. The stability part matters, because you want margins on the guards (e.g. thresholds not near zero, Top-K winners separated from runners-up) so that small perturbations or permissible interventions don&#8217;t flip the branch decisions and invalidate the explanation.</p><p>Here&#8217;s what a concrete example might look like:</p><blockquote><p>Head 31.2: Helps break text into paragraphs</p><p>Empirically verified: attention weight exceeds &#949; when a token from set {&#8216;\n&#8217;, &#8216;\n\n&#8217;} or a discourse marker token occurs at position t&#8722;1.</p></blockquote><h3>2. Factor into meaningful subspaces</h3><p>Factoring into meaningful subspaces is the step where you decompose a mechanism&#8217;s activations into low-dimensional directions that have stable semantics across inputs and contexts, such as syntactic markers, sentiment, or safety-relevant features. A single local program may operate over several subspaces, and the same subspace may participate in many different local programs.</p><p>Without subspaces, interventions are blunt (ablating whole heads or MLPs). With them, interventions can be surgical (editing or bounding specific directions while leaving others untouched).</p><p>Ideally, these subspaces exhibit &#8220;functional coherence,&#8221; where moving along the subspace produces predictable, monotonic changes in model behavior on a bounded domain.</p><h3>3. 
Extract formally verifiable causal circuits</h3><p>In this step, we compose local programs and subspaces into a single object that supports global, counterfactual claims about behavior on a bounded domain. Formally, this means specifying an interface, a domain, and a set of admissible interventions, and then proving that the neural subcircuit is equivalent to (or soundly approximated by) a symbolic specification on that domain.</p><p>My project <a href="https://github.com/neelsomani/symbolic-circuit-distillation">Symbolic Circuit Distillation</a> builds in this direction by providing formally verified functional abstractions on bounded domains. Achieving this level of debuggability places strong pressure to redesign core Transformer components that are poorly suited to formal verification.</p><p>Multiple circuits may implement overlapping or reconstructed features elsewhere in the model. What matters is that, within scope, the abstraction is correct and closed under counterfactuals. If a bypass exists, formal search will find a concrete counterexample. This is the point where mechanistic interpretability becomes robust to patching and refactoring, and where safety-relevant guarantees become possible. Circuits stop being explanatory stories and become objects you can edit, reason about, and certify.</p><p>How these verified control abstractions are surfaced to human operators (whether through explicit query languages, automated tooling, or learned interfaces) is an important but orthogonal problem, and not required for the core claim of debuggability.</p><h2>Interpretability as Control</h2><p>The central question is whether mechanistic insight can support reliable, bounded, and verifiable control over systems that matter.</p><p>The result is a patchwork of verified abstractions: local programs, meaningful subspaces, and formally specified circuits. Many such decompositions may exist. 
Verification removes arbitrariness not by enforcing uniqueness, but by enforcing sufficiency.</p><p>I am curious to hear thoughts from other researchers. You can reach me on X: <a href="https://x.com/neelsomani">@neelsomani</a></p><p><em>Thanks to <a href="https://x.com/mmaaz_98?s=21">Maaz</a> for giving feedback prior to posting.</em></p>]]></content:encoded></item><item><title><![CDATA[Intro to Routing: Mixture-of-Experts and Expert Choice]]></title><description><![CDATA[I derive MoE and EC from first-principles.]]></description><link>https://www.neelsomaniblog.com/p/intro-to-routing-mixture-of-experts</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/intro-to-routing-mixture-of-experts</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Fri, 14 Nov 2025 21:19:02 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/53e47187-8153-43b4-b683-3f673d2a97be_5359x3976.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, I&#8217;ll cover routing mechanisms for large language models. People often talk about Mixture-of-Experts and Expert Choice, so my goal is to give a first-principles walkthrough that explains how these methods arise naturally. You can think of this as how I would have derived them myself, or as an ex post facto explanation that clarifies the logic behind their design.</p><h2>Mixture-of-Experts (MoE)</h2><h3>Historical Roots of MoE</h3><p>The engineering motivation behind MoE is straightforward. The goal is to take N expert functions f<sub>i</sub> and compute a weighted average of their outputs based on how confident the model is in each expert.</p><p>The basic idea is simple. 
First, compute logits for each expert:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;z_i = (W_g x + b)_i&quot;,&quot;id&quot;:&quot;KJVHTFSAFX&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then convert these logits into a probability distribution:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;g_i(x) = \\text{softmax}(z(x))_i&quot;,&quot;id&quot;:&quot;TUYKGLZBTT&quot;}" data-component-name="LatexBlockToDOM"></div><p>Finally, compute the convex combination of expert outputs:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y = \\sum_i g_i(x) * f_i(x)&quot;,&quot;id&quot;:&quot;NJTDQOJCMA&quot;}" data-component-name="LatexBlockToDOM"></div><p>Training proceeds exactly as it does for a standard feedforward layer. This formulation matches the original work of Jacobs et al. (1991).</p><p>The drawback is that this approach requires evaluating every expert for every token, including experts with very low probability. This becomes expensive as N grows.</p><h3>Top-1 Gating</h3><p>Let&#8217;s say we want to perform &#8220;Top-1 gating,&#8221; where we select only the highest-scoring expert. Specifically, we want to compute all g<sub>i</sub>(x), pick the top expert s, run only that expert, and set:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y = g_s(x) * f_s(x)&quot;,&quot;id&quot;:&quot;MGPIKWVRPQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>This has a property you might find unexpected. Even if all f<sub>i</sub> point in roughly the same direction, the magnitude of y is smaller than the raw output of f<sub>s</sub>(x), since f<sub>s</sub> is scaled by g<sub>s</sub>. You can argue that this is sometimes desirable, because lower confidence often corresponds to smaller updates in many other ML architectures.
But this is a post hoc justification, and the real reason is just that attempts to renormalize y tend to perform worse in practice.</p><p>Backpropagation follows from the product rule. For the expert parameters &#952;<sub>f</sub>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{dL}{d\\theta_{f_s}} = \\frac{dL}{dy} g_s(x) \\frac{df_s(x)}{d\\theta_{f_s}}&quot;,&quot;id&quot;:&quot;LZTCGMXCLP&quot;}" data-component-name="LatexBlockToDOM"></div><p>By symmetry, the gradient for the router parameters &#952;<sub>g</sub> is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{dL}{d\\theta_{g_s}} = \\frac{dL}{dy} f_s(x) \\frac{dg_s(x)}{d\\theta_{g_s}}&quot;,&quot;id&quot;:&quot;RWPHLJBGTJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>For i != s, we have dL/d&#952;<sub>f_i</sub> = 0, since those experts do not run. But dL/d&#952;<sub>g_i</sub> is not zero, because the softmax couples all logits z<sub>i</sub>, so each g<sub>i</sub> influences the scaling of f<sub>s</sub>.</p><p>The gradients are undefined at the exact boundaries where the identity of the top expert changes. But that&#8217;s typically fine, just like it&#8217;s not an issue for ReLU or other piecewise differentiable functions.</p><p>The major problem is that unused experts never improve. Training collapses to a solution where g<sub>s</sub> ~= 1 for whichever expert happened to win early in training. 
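</p><p>For concreteness, the dense forward pass and top-1 gating described above can be sketched in a few lines of numpy. This is an illustrative toy with random linear experts and a single token, not any particular production implementation:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 4, 3
W_g = rng.normal(size=(n_experts, d))  # router ("gating") weights
b = np.zeros(n_experts)
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy linear experts

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.normal(size=d)
g = softmax(W_g @ x + b)  # gate probabilities g_i(x)

# Dense MoE: every expert runs, outputs combined by confidence
y_dense = sum(g[i] * (experts[i] @ x) for i in range(n_experts))

# Top-1 gating: only the highest-scoring expert runs, scaled by its gate
s = int(np.argmax(g))
y_top1 = g[s] * (experts[s] @ x)
```

<p>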
In an ideal setting, we would want all experts f<sub>j</sub> for j != s to receive at least some tokens so they continue to learn.</p><p>So we define the proportion of tokens routed to expert i in a batch:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p_i = \\frac{1}{|B|} \\sum_{t \\in B} \\mathbf{1}[s_t = i]&quot;,&quot;id&quot;:&quot;FMHAVQZJQT&quot;}" data-component-name="LatexBlockToDOM"></div><p>You might attempt to regularize this by optimizing something like:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L&#8217; = L + \\lambda * \\mathrm{KL}(p || U)&quot;,&quot;id&quot;:&quot;GFTGOLHVEO&quot;}" data-component-name="LatexBlockToDOM"></div><p>where U is a uniform distribution. This would flatten the token allocation, and in principle could prevent collapse.</p><p>But since p<sub>i</sub> above depends on an argmax (through s<sub>t</sub>), the gradient of p<sub>i</sub> is zero almost everywhere. The model receives no useful signal from this penalty.</p><p>As a result, we need some other differentiable penalty that discourages any single expert from dominating the routing. The goal is to reduce the confidence g<sub>i</sub> for experts that receive too many tokens and increase it for experts that receive too few.</p><h3>Flattening the Argmax Distribution</h3><p>Since p<sub>i</sub> isn&#8217;t useful for differentiation, we try using the Gumbel max trick, which provides a way to make the argmax behave like a soft, differentiable sampling process. As a general rule, if z<sub>j</sub> are logits and &#949;<sub>j</sub> ~ Gumbel(0, 1), then:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\arg\\max_j { z_j + \\varepsilon_j } \\sim \\text{Categorical}(\\text{softmax}(z))&quot;,&quot;id&quot;:&quot;ZZXJCHOKSX&quot;}" data-component-name="LatexBlockToDOM"></div><p>This distribution gives us Pr[s<sub>t</sub> = i] for each expert i. 
(In principle, we could even use the noisy sampling in the forward pass.) More importantly, this lets us compute an expected load for each expert and attempt to flatten it.</p><p>In our case, let z<sub>j</sub> be the logit that produces the gating probability g<sub>j</sub>(x<sub>t</sub>) = softmax(z(x<sub>t</sub>))<sub>j</sub>. Then we assume:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;s_t = \\arg\\max_j { z_j(x_t) + \\varepsilon_j }&quot;,&quot;id&quot;:&quot;GIHATVRCYR&quot;}" data-component-name="LatexBlockToDOM"></div><p>This implies:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Pr[s_t = i] = \\text{softmax}(z(x_t))_i&quot;,&quot;id&quot;:&quot;SSXAJAPHUJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>and therefore the expected load for expert i is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{E}[\\text{load}_i] = \\sum_{t \\in B} \\text{softmax}(z(x_t))_i = \\sum_{t \\in B} g_i(x_t)&quot;,&quot;id&quot;:&quot;AJPEGLEBYU&quot;}" data-component-name="LatexBlockToDOM"></div><p>With this expected load vector, we can now try to flatten it. 
Possible approaches include:</p><ul><li><p>KL(E[load] || U)</p></li><li><p>L2 distance to uniform</p></li><li><p>Entropy maximization</p></li></ul><p>A common alternative is to minimize the coefficient of variation (or a similar quantity), which flattens the distribution without the numerical instability of KL or entropy when some components get small:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{CV}(\\text{E}[\\text{load}]) = \\frac{\\text{std_dev}(\\text{E}[\\text{load}])}{\\text{mean}(\\text{E}[\\text{load}])}&quot;,&quot;id&quot;:&quot;UDPVBYXHGH&quot;}" data-component-name="LatexBlockToDOM"></div><p>We add an auxiliary term:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L_{\\text{aux}} = \\text{CV}(\\text{E}[\\text{load}])^2&quot;,&quot;id&quot;:&quot;ZBFPSVAJIO&quot;}" data-component-name="LatexBlockToDOM"></div><p>where CV is squared for optimization convenience. So the full loss becomes:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L&#8217; = L + \\lambda * L_{\\text{aux}}&quot;,&quot;id&quot;:&quot;HAEJNXNYIG&quot;}" data-component-name="LatexBlockToDOM"></div><h3>Practical Implementation Today</h3><p>In modern MoE implementations, a simpler surrogate auxiliary loss is used. 
It&#8217;s not statistically derived or theoretically clean, but it works well in practice:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L_{\\text{aux}} = \\sum_i \\frac{1}{|B|} \\left[ p_i(B) \\sum_{t \\in B} g_i(x_t) \\right]&quot;,&quot;id&quot;:&quot;CGPBVLMCQN&quot;}" data-component-name="LatexBlockToDOM"></div><p>In theory, you might reach this form by starting from the objective:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_i {p_i}^2&quot;,&quot;id&quot;:&quot;LMEWLGHMRZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Since the p<sub>i</sub> sum to 1, minimizing this objective encourages a uniform allocation via the method of Lagrange multipliers.</p><p>Then, assuming we really are sampling with Gumbel noise, and given a sufficiently large batch, the empirical load fraction can be approximated via the soft probabilities g<sub>i</sub>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p_i \\approx q_i = \\left( \\frac{1}{|B|} \\sum_{t \\in B} g_i(x_t) \\right)&quot;,&quot;id&quot;:&quot;TEKEINCQVZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>where q<sub>i</sub> is a differentiable surrogate for the observed load proportion. Using this approximation on one of the factors:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_i {p_i(B)}^2  \\approx \\sum_i p_i(B) * \\left( \\frac{1}{|B|} \\sum_{t \\in B} g_i(x_t) \\right)&quot;,&quot;id&quot;:&quot;RYQOPBHJZN&quot;}" data-component-name="LatexBlockToDOM"></div><p>which is the surrogate used above.</p><p>The effect is straightforward. If an expert receives too many tokens (that is, if p<sub>i</sub> is too large), the loss increases, which pushes the model to reduce g<sub>i</sub>.</p><p>The key detail is that the derivative of p<sub>i</sub>(B) is locally zero, since p<sub>i</sub> depends on an argmax that does not change in a small neighborhood.
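</p><p>As a concrete illustration, here is a minimal numpy sketch of this surrogate on random router logits. Numpy has no autograd, so the no-gradient treatment of the hard counts is indicated in comments rather than enforced:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, n_experts = 512, 4
logits = rng.normal(size=(n_tokens, n_experts))  # router logits z(x_t)

# Row-wise softmax: g[t, i] = g_i(x_t)
e = np.exp(logits - logits.max(axis=1, keepdims=True))
g = e / e.sum(axis=1, keepdims=True)

s = g.argmax(axis=1)                                 # hard top-1 assignment s_t
p = np.bincount(s, minlength=n_experts) / n_tokens   # empirical load fraction p_i(B)
q = g.mean(axis=0)                                   # soft surrogate q_i

# Surrogate auxiliary loss: sum_i p_i * mean_t g_i(x_t).
# In an autograd framework, p would be detached, so the gradient
# flows only through the soft factor q.
L_aux = float(np.sum(p * q))
```

<p>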
For this reason, implementations mark p<sub>i</sub>(B) as a &#8220;no-grad&#8221; quantity.</p><p>Compared to this heuristic approach, the formulation based on Gumbel noise and the coefficient of variation is mathematically cleaner. The stochastic forward pass aligns naturally with the probabilistic reasoning used in the backward pass, and the load penalties follow from that framework without ad hoc constructions. Despite this conceptual clarity, the surrogate loss above remains more widely used.</p><h3>Generalizing to Top-K</h3><p>If we want to use the Top-K experts rather than Top-1, first we need a constructive definition of the statistical process. As before, we compute g<sub>i</sub>(x<sub>t</sub>) for each expert i. But this time, we select the Top-K experts instead of a single one.</p><p>We&#8217;ll try to take the same approach as before. To compute the expected load for expert i, we need</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q_i(x_t) = \\Pr[i \\in S]&quot;,&quot;id&quot;:&quot;LOEQQEDXBZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>where S is the set of the K selected experts. The expected load is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{E}[\\text{load}_i] = \\sum_t q_i(x_t)&quot;,&quot;id&quot;:&quot;ACHBSXRFUL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Just like in the Top-1 case, we can minimize CV(E[load<sub>i</sub>])<sup>2</sup>. In theory, E[load<sub>i</sub>] is differentiable because Top-K sampling follows a Plackett-Luce distribution. But in practice, the gradient is computationally intractable. So we end up using the same surrogate as in the previous section.</p><p>Finally, it&#8217;s worth noting that the original formulation in Shazeer et al. (2017) used a different approach. Instead of Gumbel noise, the authors added normal noise to the logits, and they renormalized the weights g<sub>i</sub> after selecting the Top-K elements. 
The methodology and derivation differ from the conceptual framework presented above.</p><h2>Expert Choice (EC)</h2><p>Expert Choice (Zhou et al. 2022) observes that the biggest pitfall of MoE is that an expert can get overloaded. If too many tokens want the same expert, that expert overloads and gets a huge share of the gradient. If too few tokens go to an expert, that expert&#8217;s weights collapse and it fails to train. That is why we needed that hacky regularization term earlier.</p><p>Imagine you&#8217;re Google serving inference at massive scale. You don&#8217;t care about routing every token perfectly. You care about keeping all experts busy, avoiding hot spots, and guaranteeing predictable latency.</p><p>Rather than computing g<sub>i</sub>(x<sub>t</sub>) for each token and selecting argmax<sub>i</sub> g<sub>i</sub> (letting the tokens pick the experts), you can let the <em>experts</em> pick which tokens they want to serve. Each expert receives a fixed budget of M tokens and selects the M tokens for which it believes it is most useful.</p><p>You still evaluate g<sub>i</sub>(x<sub>t</sub>) for all experts i and all tokens t in the batch. For each expert i, you select:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;B_i = \\text{TopM}_{t \\in B} (g_i(x_t))&quot;,&quot;id&quot;:&quot;CIEYPZVGZU&quot;}" data-component-name="LatexBlockToDOM"></div><p>Each expert receives exactly M tokens (or up to M if the batch is small). How this affects backpropagation depends on what you do when multiple experts select the same token. For simplicity, assume the output is the sum:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y = \\sum_{i : x_t \\in B_i} g_i(x_t)* f_i(x_t)&quot;,&quot;id&quot;:&quot;NNXEOHVESQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>(Note that real EC implementations use more complicated aggregations.) 
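</p><p>A minimal numpy sketch of this selection rule and the sum aggregation, using hypothetical random linear experts:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
n_tokens, n_experts, M, d = 8, 3, 2, 4
x = rng.normal(size=(n_tokens, d))
W_g = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy linear experts

# Row-wise softmax: g[t, i] = g_i(x_t)
z = x @ W_g
e = np.exp(z - z.max(axis=1, keepdims=True))
g = e / e.sum(axis=1, keepdims=True)

# Each expert i picks the M tokens with the highest g[:, i]
picks = {i: np.argsort(-g[:, i])[:M] for i in range(n_experts)}

# Sum aggregation: token t's output sums over the experts that selected it
y = np.zeros_like(x)
for i, tokens in picks.items():
    for t in tokens:
        y[t] += g[t, i] * (experts[i] @ x[t])
```

<p>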
In this case, backpropagation for the expert parameters is exactly the same as in MoE. If an expert is not selected, it receives no gradient. If it is selected, then</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{dL}{d\\theta_{f_i}} = \\frac{dL}{dy} g_i(x_t) \\frac{\\partial f_i(x_t)}{\\partial \\theta_{f_i}}&quot;,&quot;id&quot;:&quot;ONKGSOWARE&quot;}" data-component-name="LatexBlockToDOM"></div><p>The gradient with respect to the router parameters is surprisingly simple:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{dL}{d\\theta_{g_i}} = \\sum_{x_t \\in B_i} \\frac{dL}{dy} f_i(x_t) \\frac{\\partial g_i(x_t)}{\\partial \\theta_{g_i}}&quot;,&quot;id&quot;:&quot;UEACOHWIOE&quot;}" data-component-name="LatexBlockToDOM"></div><p>Notice that we didn&#8217;t have to differentiate through the Top-M operator at all, since the gradients only flow through g<sub>i</sub> for the tokens actually selected by expert i.</p><p>There&#8217;s a glaring pitfall here. What if a token isn&#8217;t selected by any of the experts? In practice, implementations handle this by increasing M or by routing those stray tokens to the expert with the largest g<sub>i</sub>(x<sub>t</sub>).</p><h2>Future Directions</h2><p>I hope this post was informative. In the future, I plan to cover other routing mechanisms such as Mixture-of-Depths (MoD) or <a href="https://arxiv.org/pdf/2101.03961">Switch Transformers</a>, a paper by authors I respect a lot.</p><p>Conceptually, MoD pushes sparsity in a more radical direction. Instead of selecting which expert network should process a token, the router selects which layers of the transformer the token should flow through. 
The resulting interactions make MoD a significantly harder routing problem than MoE or Expert Choice.</p><p>When the research landscape matures further, I plan to revisit this topic with a dedicated analysis.</p><p><em>Note: I cover the content of this blog post in a <a href="https://www.youtube.com/watch?v=gnHNom6yokQ">YouTube explainer</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[A Minimal Route to Transformer Attention]]></title><description><![CDATA[Is attention inevitable?]]></description><link>https://www.neelsomaniblog.com/p/a-minimal-route-to-transformer-attention</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/a-minimal-route-to-transformer-attention</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Thu, 30 Oct 2025 00:25:32 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7820a78a-af15-4e9b-8577-7cbedfdd5fa4_600x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, I&#8217;ll show how a small set of reasonable assumptions can recover the Transformer attention mechanism. Some parts of attention are theoretically motivated, while others are arbitrary choices. I&#8217;ll explicitly call out which is which.</p><p>To see why attention exists, it helps to recall its predecessor: the recurrent neural network (RNN). Classic encoder-decoder RNNs process a sequence token by token. Each new hidden state incorporates the current token and the previous hidden state, producing a vector you can think of as an &#8220;accumulator&#8221; of everything seen so far. After ingesting the final token, that accumulated vector is repeatedly fed to the decoder, which predicts output tokens until it emits a STOP symbol.</p><p>The problem is long-range dependence. If an important token appeared far earlier in the sequence (say, the first of 10,000 tokens), its influence becomes diluted as the RNN processes additional tokens. 
The model simply forgets.</p><p>Ideally, the model should use all previously seen tokens to compute the information needed to predict the next token, weighting each earlier token by how relevant it is for that prediction. That suggests computing a relevance score between a position i and each other position j, and then combining some function of the embeddings accordingly.</p><p>Formally, define a scalar relevance function that takes the embeddings X along with indices i and j:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;u(i, j | X_1, &#8230;, X_n)&quot;,&quot;id&quot;:&quot;RTZWKQBXDB&quot;}" data-component-name="LatexBlockToDOM"></div><p>We work in embedding space rather than raw token IDs to avoid meaningless geometric assumptions (e.g., token 9 is not inherently &#8220;closer&#8221; to token 10). One-hot encodings would work, but are much more sparse.</p><p>Then the model&#8217;s output vector at position i can be written as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_i = G(\\{x_j, u(i, j | X)\\}_j)&quot;,&quot;id&quot;:&quot;PZFHKXLNUK&quot;}" data-component-name="LatexBlockToDOM"></div><p>where G aggregates some function of the embeddings x<sub>j</sub> alongside their relevance to position i. (During autoregressive generation, i corresponds to the most recently produced token.) We don&#8217;t yet know the form of G or u. Our goal is to characterize the simplest constraints that lead directly to Transformer-style attention.</p><h2>Observation: Enforce permutation symmetry</h2><p>We want to constrain the space of possible functions for G and u.</p><p>Once we have the relevance scores u(i, j), the output y<sub>i</sub> should not depend on the order in which the pairs (x<sub>j</sub>, u(i, j)) are provided. In other words, if we reorder the elements indexed by j, the result should remain the same. 
This requires G to be &#8220;permutation-invariant&#8221; over the set {(x<sub>j</sub>, u(i, j))}<sub>j</sub>.</p><p>The Deep Sets theorem (Zaheer et al., 2017) tells us that any such function can be written as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;G(\\{x_j, u(i, j)\\}_j) = &#961;\\left(\\sum_j &#966;(x_j, u(i, j))\\right)&quot;,&quot;id&quot;:&quot;KBWDGHMNQP&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here &#961; and &#966; are arbitrary differentiable functions. We fix the index i, since the invariance only applies over j. Differentiability ensures that the overall model can be trained with gradient-based methods.</p><p>At this point, &#961; and &#966; are still completely general, and we also need to define u. We will impose further assumptions to narrow down their form.</p><h2>Assumption 1: &#961; is the identity function</h2><p>&#961; could output many different types of objects. For example:</p><ul><li><p>It could output a scalar, but that would discard most of the information from the embeddings.</p></li><li><p>It could output an O(N)-dimensional vector, with one component per input element, but that would make the output scale with sequence length and defeat the purpose of summarizing information.</p></li><li><p>It could output a vector in some intermediate dimension, or even map into a different space/manifold entirely.</p></li></ul><p>All of these are technically possible. 
In practice, Transformers set &#961; to be the identity function, so:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;G(X) = \\sum_j &#966;(x_j, u(i, j))&quot;,&quot;id&quot;:&quot;RZLEKDPYLD&quot;}" data-component-name="LatexBlockToDOM"></div><p>This simplifies the structure of G and lets us focus on constraining &#966; and u.</p><h2>Assumption 2: Relevance-contribution proportionality</h2><p>Even with &#961; set to the identity, &#966; could be any function of the embedding x<sub>j</sub> and the relevance score u(i, j). To simplify the form, we assume that if a token&#8217;s relevance is scaled by a constant k, its contribution scales by the same factor:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#966;(x_j, k * u(i, j)) = k * &#966;(x_j, u(i, j))&quot;,&quot;id&quot;:&quot;POGEWWYKET&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is not the only possible relationship. For example, we could have chosen a quadratic or some other monotonic transformation in u(i, j). 
The key requirement is simply that &#966; should separate into:</p><ul><li><p>A scalar measuring how important x<sub>j</sub> is</p></li><li><p>A vector capturing what x<sub>j</sub> contributes</p></li></ul><p>Under the linear version of this assumption, we get:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#966;(x_j, u(i, j)) = u(i, j) * &#966;(x_j, 1)&quot;,&quot;id&quot;:&quot;OJDDWNNAGW&quot;}" data-component-name="LatexBlockToDOM"></div><p>Define v(x<sub>j</sub>) = &#966;(x<sub>j</sub>, 1), yielding:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#966;(x_j, u(i, j)) = u(i, j) * v(x_j)&quot;,&quot;id&quot;:&quot;KBEYTHXWYU&quot;}" data-component-name="LatexBlockToDOM"></div><p>This makes &#966; explicitly separable, where u(i, j) purely controls magnitude (relevance), and v(x<sub>j</sub>) determines the content being contributed.</p><h2>Assumption 3: Linear change of coordinates</h2><p>At this point, v(x<sub>j</sub>) could be any function of x<sub>j</sub>. To simplify the model and keep it efficient to compute, we assume v is a linear transformation of x<sub>j</sub>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;v(x_j) = W_V x_j&quot;,&quot;id&quot;:&quot;ENUTBTKFEQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Substituting this into the previous expression gives:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;G(X) = \\sum_j {u(i, j) * W_V x_j}&quot;,&quot;id&quot;:&quot;VBGILWDWMH&quot;}" data-component-name="LatexBlockToDOM"></div><p>This means each token contributes a linearly transformed version of its embedding, weighted by its relevance score u(i, j).</p><h2>Observation: Constrain u for efficient parallel computation</h2><p>We want u(i, j) to be computable efficiently on hardware like GPUs. 
Here, &#8220;efficient&#8221; refers to low sequential depth in the computational graph, not necessarily a low number of arithmetic operations. GPUs can execute many multiplications in parallel, but long chains of dependent operations create bottlenecks. For example, a recurrent computation with O(N) sequential steps is slow for long sequences, but a matrix multiply has O(1) sequential depth and is highly parallelizable.</p><p>If we allowed a fully general relevance function such as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;u(i, j | X) = g_\\theta(x_i, x_j, \\text{context}(X))&quot;,&quot;id&quot;:&quot;WNVJNGRTLN&quot;}" data-component-name="LatexBlockToDOM"></div><p>where context(X) examines all tokens at once, we would need to evaluate this network O(N<sup>2</sup>) times for a single layer, which is too slow.</p><p>Alternatively, we could define a single model:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;g_&#952;: X &#8594; &#8477;^{n \\times n}&quot;,&quot;id&quot;:&quot;UFQMMJUQLC&quot;}" data-component-name="LatexBlockToDOM"></div><p>that outputs all pairwise relevance values directly. But that would require storing and training parameters of size O(N<sup>2</sup>), which locks the model to a fixed input length and scales poorly.</p><p>To keep computation parallelizable and scalable, we restrict u to be built from tensor operations such as:</p><ul><li><p>Linear projections</p></li><li><p>Element-wise functions</p></li><li><p>Inner products</p></li><li><p>Reductions like sums</p></li></ul><p>and avoid control flow or long sequential recurrences.</p><h2>Assumption 4: Dot product similarity for u</h2><p>A simple way to score the interaction between x<sub>i</sub> and x<sub>j</sub> is with a dot product. 
However, we don&#8217;t necessarily want similarity in the embedding space - we want similarity in a space optimized for relevance.</p><p>So, as we did for v(x<sub>j</sub>), we first apply learned linear projections:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\nq_i &amp;= W_Q x_i \\\\\nk_j &amp;= W_K x_j\n\\end{aligned}&quot;,&quot;id&quot;:&quot;NJCWYXQOAD&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then we define the relevance score as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;u'(i, j | X) = &#10216;q_i, k_j&#10217;&quot;,&quot;id&quot;:&quot;SRYKSKIIZT&quot;}" data-component-name="LatexBlockToDOM"></div><p>We denote this version as u&#8217; because additional modifications will be applied later.</p><h2>Assumption 5: Pick a normalization for u</h2><p>Next, we want the relevance scores u&#8217;(i, j) to measure relative importance. If the same constant were added to all scores, or if they were scaled uniformly, the ranking of tokens should not change. This motivates applying a differentiable normalization function over j.</p><p>There are several possibilities (e.g., softmax, Gumbel-Softmax). In practice, Transformers use softmax.</p><p>One final issue: the dot product &#10216;q<sub>i</sub>, k<sub>j</sub>&#10217; tends to grow in magnitude with the key/query dimension d<sub>k</sub>. 
To prevent extremely large values from dominating the softmax, we scale the logits:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;u&#8217;(i, j) = \\frac{&#10216;q_i, k_j&#10217;}{\\sqrt{d_k}}&quot;,&quot;id&quot;:&quot;ZAHOXENOPA&quot;}" data-component-name="LatexBlockToDOM"></div><p>Applying softmax normalization over j then gives:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;G(x_i) = \n\\sum_j\n\\operatorname{softmax}_j\\!\\left(\n\\frac{\\langle W_Q x_i, W_K x_j \\rangle}{\\sqrt{d_k}}\n\\right)\n\\, W_V x_j&quot;,&quot;id&quot;:&quot;QXJXAAVCNM&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is exactly the scaled dot-product attention used in Transformers.</p><h2>Does a better attention mechanism exist?</h2><p>So there you have it. If we impose the following assumptions:</p><ol><li><p>&#961; is the identity function</p></li><li><p>Each token&#8217;s contribution scales proportionally with its relevance score</p></li><li><p>A linear transformation maps embeddings to the value, key, and query vectors</p></li><li><p>Relevance is based on a dot product</p></li><li><p>Relevance scores are normalized with a softmax</p></li></ol><p>we obtain the exact scaled dot-product attention used in Transformers.</p><p>While some of the choices were forced, they weren&#8217;t all theoretically required. There may be better options for &#961;, for the similarity measure, or for the normalization function. Even more fundamentally, the Deep Sets form at the beginning was from imposing permutation-invariance, but we end up reinjecting positional encodings in practice. 
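</p><p>For concreteness, here is a single-head numpy sketch of the mechanism these assumptions produce (no masking, no positional encodings, and hypothetical random weights), along with a check of the permutation symmetry we started from:</p>

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, d_k = 5, 8, 4
X = rng.normal(size=(n, d))
W_Q = rng.normal(size=(d, d_k))
W_K = rng.normal(size=(d, d_k))
W_V = rng.normal(size=(d, d))

def attention(X):
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    scores = (Q @ K.T) / np.sqrt(d_k)     # u'(i, j) = <q_i, k_j> / sqrt(d_k)
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)  # softmax over j
    return A @ V                          # y_i = sum_j A[i, j] * W_V x_j

Y = attention(X)

# With no positional encodings, the layer is permutation-equivariant:
# shuffling the input tokens just shuffles the outputs the same way.
perm = rng.permutation(n)
equivariant = np.allclose(attention(X[perm]), Y[perm])
```

<p>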
Exploring these variations could reveal new attention mechanisms with different computational or modeling advantages.</p><p><em>Note: I cover the content of this blog post in a <a href="https://www.youtube.com/watch?v=_Q57Ff_NNw4">YouTube explainer</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Killing the GIL: How To Use Python 3.14's Free-Threading Upgrade]]></title><description><![CDATA[The global interpreter lock (GIL) has been interfering with true parallelism in Python. That ends with Python 3.14.]]></description><link>https://www.neelsomaniblog.com/p/killing-the-gil-how-to-use-python</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/killing-the-gil-how-to-use-python</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Tue, 14 Oct 2025 23:14:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kWj7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For almost three decades, Python&#8217;s Global Interpreter Lock (GIL) has been the single mechanism standing between your CPU cores and real parallelism.</p><p>That changes with Python 3.14.</p><p>The new free-threaded build removes the GIL, allowing multiple threads to execute Python bytecode simultaneously. No multiprocessing, no pickle, no hacks.</p><p>In this post, I&#8217;ll:</p><ol><li><p>Explain why the GIL existed and what it was protecting</p></li><li><p>Compare Python&#8217;s old concurrency models (threading, multiprocessing, asyncio)</p></li><li><p>Build Python 3.14 with the GIL disabled</p></li><li><p>Run a short multithreaded benchmark that finally scales with cores</p></li><li><p>Explain the results</p></li></ol><h2>Why the GIL existed in the first place</h2><p>The GIL is a global mutex that historically allowed only one thread to execute Python bytecode at a time. 
It was there to protect CPython, the C implementation of the interpreter.</p><p>Every Python object in CPython lives on the heap as a C struct with a reference count. Each assignment or function call increments and decrements those counters constantly. If two threads updated the same object&#8217;s reference count simultaneously, you could get memory corruption or premature frees that crash the interpreter.</p><p>Adding locks around every Python object would have been complex and slow, so early CPython took the simple route: wrap the entire interpreter in one global lock. That made single-threaded execution safe, but prevented true multithreading for CPU-bound workloads.</p><h2>How concurrency previously worked</h2><p>Before 3.14, Python offered three main concurrency models, each with trade-offs:</p><ul><li><p>threading (old): uses real OS threads, but only one can execute Python bytecode at a time because of the GIL. Good for I/O, useless for parallel CPU work.</p></li><li><p>multiprocessing: spawns multiple processes, each with its own interpreter and GIL. True parallelism, but expensive, requiring separate memory, pickling overhead, and slower process startup.</p></li><li><p>asyncio (green threads): runs everything cooperatively on one thread. Excellent for high-concurrency I/O, but it never uses more than one core.</p></li></ul><p>With Python 3.14&#8217;s free-threaded build, threading becomes the best of all worlds: true parallelism across cores, shared memory without serialization, and minimal overhead.</p><h2>Building Python 3.14 without the GIL</h2><p>Compile it yourself:</p><pre><code><code>git clone https://github.com/python/cpython
cd cpython
git checkout v3.14.0
./configure --prefix=$HOME/.py-314-ft --disable-gil
make -j &amp;&amp; make install
$HOME/.py-314-ft/bin/python3 -V</code></code></pre><p>Or with pyenv:</p><pre><code><code>pyenv uninstall -f 3.14.0 || true
PYTHON_CONFIGURE_OPTS="--disable-gil" pyenv install 3.14.0
pyenv local 3.14.0</code></code></pre><p>Verify that you&#8217;re running a free-threaded build:</p><pre><code><code>python3 - &lt;&lt;'PY'
import sys
print("Free-threaded build:", not sys._is_gil_enabled())
PY</code></code></pre><p>You want: <code>Free-threaded build: True</code>.</p><h2>Running a realistic multithreaded benchmark</h2><p>Here&#8217;s a bit-optimized N-Queens solver. You can also download the <a href="https://github.com/neelsomani/python-freethreading">repo</a> on GitHub. It&#8217;s already efficient and CPU-bound, a good test of whether threads can finally scale.</p><pre><code><code># nqueens.py
import threading, time

def solve_row(n, cols=0, diags1=0, diags2=0, row=0):
    if row == n: return 1
    count = 0
    free = (~(cols | diags1 | diags2)) &amp; ((1 &lt;&lt; n) - 1)
    while free:
        bit = free &amp; -free
        free -= bit
        count += solve_row(
            n, cols|bit, (diags1|bit)&lt;&lt;1, (diags2|bit)&gt;&gt;1, row+1
        )
    return count

def solve_threaded(n, n_threads):
    first_row = [(1 &lt;&lt; c) for c in range(n)]
    chunks = [first_row[i::n_threads] for i in range(n_threads)]
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        local = 0
        for bit in chunk:
            local += solve_row(
                n, cols=bit, diags1=bit&lt;&lt;1, diags2=bit&gt;&gt;1, row=1
            )
        with lock:
            total += local

    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads: t.start()
    for t in threads: t.join()
    return total

if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        t0 = time.perf_counter()
        solve_threaded(14, threads)
        dt = time.perf_counter() - t0
print(f"threads={threads:&lt;2}  time={dt:.2f}s")</code></code></pre><p>Run it once with standard CPython 3.14 (GIL on) and once with your free-threaded build. With the GIL, all runs take about the same time. With the free-threaded build, performance improves almost linearly with thread count.</p><h2>Results</h2><p>Example benchmark:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kWj7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kWj7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png 424w, https://substackcdn.com/image/fetch/$s_!kWj7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png 848w, https://substackcdn.com/image/fetch/$s_!kWj7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png 1272w, https://substackcdn.com/image/fetch/$s_!kWj7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kWj7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png" width="1456" height="403" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:403,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58709,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.neelsomaniblog.com/i/176187677?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kWj7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png 424w, https://substackcdn.com/image/fetch/$s_!kWj7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png 848w, https://substackcdn.com/image/fetch/$s_!kWj7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png 1272w, https://substackcdn.com/image/fetch/$s_!kWj7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1573cb29-49ff-49d2-a469-7cb512969a5c_1656x458.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">14-Queens on an M1 Pro, Python 3.14</figcaption></figure></div><p>That&#8217;s an ~8x speed-up without changing a single line of logic - just running under the free-threaded interpreter.</p><p>Caveats:</p><ul><li><p>C extensions: any binary package must be recompiled for free-threading, or it might quietly re-enable the GIL.</p></li><li><p>Thread safety: without the GIL, race conditions are real. Protect shared state with locks, queues, or immutable data.</p></li><li><p>Single-thread overhead: expect a 5-10% slowdown for purely single-threaded scripts due to atomic ops and internal locks.</p></li></ul><h2>Closing thoughts</h2><p>The GIL made CPython simple and safe to implement, but it locked Python to a single core.</p><p>With Python 3.14, that trade-off is gone. 
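</p><p>The thread-safety caveat above is worth making concrete: without the GIL serializing bytecode, an unsynchronized read-modify-write like <code>total += 1</code> can silently lose updates. A minimal sketch of the locked pattern (my own illustration, not from the benchmark repo):</p>

```python
import threading

def add_with_lock(n_threads: int = 4, n_iters: int = 100_000) -> int:
    """Increment a shared counter from several threads, safely."""
    total = 0
    lock = threading.Lock()

    def work() -> None:
        nonlocal total
        for _ in range(n_iters):
            with lock:  # protects the read-modify-write on `total`
                total += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

print(add_with_lock())  # always n_threads * n_iters, GIL or no GIL
```

<p>Drop the <code>with lock:</code> line and the final count can come up short on a free-threaded build.</p><p>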
For the first time, standard Python threads can run in true parallel on modern CPUs.</p><p>So go ahead and kill the GIL, and let me know how it works for you.</p>]]></content:encoded></item><item><title><![CDATA[What You Didn't Learn in Berkeley CS 188 — Part 4]]></title><description><![CDATA[Is GRPO broken?]]></description><link>https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-242</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-242</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Sat, 11 Oct 2025 00:36:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-DOA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the fourth and final piece in my series on reinforcement learning. Previously, we covered <a href="https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley">classical RL</a>, <a href="https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-b29">continuous control</a>, and <a href="https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-9b3">off-policy methods</a>. 
The topic of LLM post-training is discussed all over X, so this primer should help anyone get up to speed.</p><p>Here&#8217;s how I like to think about post-training methodologies:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-DOA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-DOA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-DOA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-DOA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-DOA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-DOA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg" width="1456" height="933" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:933,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172997,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.neelsomaniblog.com/i/175844591?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-DOA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-DOA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-DOA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-DOA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ece17-5323-41aa-bea0-5fb9506a49a9_2498x1601.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">2x2 Quadrant of Post-Training Methods</figcaption></figure></div><p>SFT is simple. It&#8217;s just applying additional training iterations like the pre-training stage, but on a curated set of ideal (prompt, response) pairs. You might make this more efficient with a LoRA adapter.</p><p>In this post, we&#8217;ll focus on quadrant 2: DPO and offline GRPO. Along the way, I&#8217;ll point out how methods like online PPO and online GRPO fit in. 
Historically, online PPO came first, so understanding it helps explain DPO.</p><h2>Theory of Relative Scoring</h2><p>Before getting into the objective function of direct preference optimization (DPO), we need to motivate the idea of relative scoring.</p><p>We&#8217;re given lists of prompts x and pairwise responses a<sub>+</sub> and a<sub>-</sub>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;(x, a_{+}, a_{-})&quot;,&quot;id&quot;:&quot;ZZIOCQBZSL&quot;}" data-component-name="LatexBlockToDOM"></div><p>All we know is that a<sub>+</sub> is preferred to a<sub>-</sub>. A human may have rated them that way, or another signal might imply it (e.g. code that compiles &gt; code that fails).</p><p>That setup doesn&#8217;t immediately lend itself to the methods we&#8217;ve seen so far. On-policy methods don&#8217;t work because neither response may be likely under the current policy. Off-policy methods still don&#8217;t work because we lack a defined reward.</p><p>You might try to make the model more likely to output a<sub>+</sub> than a<sub>-</sub> by optimizing Pr[&#960;(a<sub>+</sub> | x) &gt; &#960;(a<sub>-</sub> | x)]. But that expression makes no sense. &#960;(a | x) are constants given the model. Unless we add tunable parameters (say, some &#947; where f<sub>&#947;</sub>(&#960;) produces a new model), those probabilities don&#8217;t change.</p><p>You could define f<sub>&#947;</sub>(&#960;)(x, a) and optimize Pr[f<sub>&#947;</sub>(&#960;)(x, a<sub>+</sub>) &gt; f<sub>&#947;</sub>(&#960;)(x, a<sub>-</sub>)], or build an even more general model f<sub>&#947;</sub>(&#960;, x, a<sub>+</sub>, a<sub>-</sub>) that directly outputs the likelihood that a<sub>+</sub> is better. But f is still abstract. It&#8217;s unclear how to parameterize it.</p><p>Instead, DPO (and the original online PPO post-training) take a simpler route by introducing a latent reward. 
The assumption is that if a human preferred a<sub>+</sub> to a<sub>-</sub>, then there exists some implicit reward function r such that</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;r(a_{+} | x) + &#949;_{+} > r(a_{-} | x) + &#949;_{-}&quot;,&quot;id&quot;:&quot;HXOCDHEXHZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>where &#949; represents human noise or ambiguity. If we can learn that reward function, we can optimize the model accordingly.</p><h2>Learning the Reward Function</h2><p>One approach is maximum likelihood estimation. We denote a<sub>+</sub> &#8827; a<sub>-</sub> if a<sub>+</sub> is preferred. We&#8217;d like a function g such that:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;g(r_\\phi(a_{+} | x), r_\\phi(a_{-} | x)) = Pr[a_{+} &#8827; a_{-}]&quot;,&quot;id&quot;:&quot;HHUMANBTYW&quot;}" data-component-name="LatexBlockToDOM"></div><p>and then optimize &#966; to maximize:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\ \\prod Pr[a_{+} &#8827; a_{-}]\\ &quot;,&quot;id&quot;:&quot;OSPVKVGWTF&quot;}" data-component-name="LatexBlockToDOM"></div><p>Let&#8217;s try to define g. Notice:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\nPr[a_{+} &#8827; a_{-}] &amp;= Pr[r(a_{+} | x) + &#949;_{+} > r(a_{-} | x) + &#949;_{-}] \\\\\n&amp;= Pr[r(a_{+} | x) &#8722; r(a_{-} | x) > &#949;_{-} &#8722; &#949;_{+}]\n\\end{align*}&quot;,&quot;id&quot;:&quot;WCAPQQRNJF&quot;}" data-component-name="LatexBlockToDOM"></div><p>So preference depends only on the difference between rewards. That implies translational invariance: g(u, v) = g(u + c, v + c). 
That property implies that g must be f(r(a<sub>+</sub> | x) - r(a<sub>-</sub> | x)) for some function f, since g(u, v) = g(u &#8722; v, 0) = f(u &#8722; v), where the first equality follows by translational invariance.</p><p>Second, if r(a<sub>++</sub> | x) &gt; r(a<sub>+</sub> | x) &gt; r(a<sub>-</sub> | x), the higher-reward response should never be less preferred. In other words, f must be non-decreasing: f&#8217;(x) &gt;= 0</p><p>Finally, f(r(a<sub>+</sub> | x) - r(a<sub>-</sub> | x)) + f(r(a<sub>-</sub> | x) - r(a<sub>+</sub> | x)) = 1, which along with the previous condition, implies f(0) = &#189;, lim <sub>t&#8594;&#8734;</sub> f(t) = 1, and lim <sub>t&#8594;-&#8734;</sub> f(t) = 0.</p><p>Many functions f satisfy these conditions. The choice depends on what noise distribution you assume. In practice, DPO uses the logistic sigmoid, which assumes Gumbel noise:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\n&#963;(x) &amp;:= \\frac{1}{1 + e^{-x}} \\vphantom{\\frac{1}{1 + e^{-x}}} \\\\\nU(a) &amp;= r(a) + &#949;,\\quad &#949; &#8764; \\text{Gumbel}(0, 1) \\vphantom{\\frac{1}{1 + e^{-x}}} \\\\\n\\implies Pr[U(a_{+}) > U(a_{-})] &amp;= &#963;(r(a_{+}) &#8722; r(a_{-})) \\vphantom{\\frac{1}{1 + e^{-x}}}\n\\end{aligned}&quot;,&quot;id&quot;:&quot;ERJADFCGRK&quot;}" data-component-name="LatexBlockToDOM"></div><p>If noise were Gaussian, you&#8217;d recover the probit model instead.</p><p>The final objective to optimize r<sub>&#966;</sub> is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\n\\max J(\\phi) &amp;= \\sum \\log \\sigma(r_{\\phi}(a_{+}) &#8722; r_{\\phi}(a_{-})) \\\\[4pt]\n\\nabla J(\\phi) &amp;= \\sum (1 &#8722; \\sigma(r_{\\phi}(a_{+}) &#8722; r_{\\phi}(a_{-}))) [\\nabla r_{\\phi}(a_{+}) &#8722; \\nabla r_{\\phi}(a_{-})]\n\\end{align*}\n&quot;,&quot;id&quot;:&quot;RXKYWFYHOD&quot;}" data-component-name="LatexBlockToDOM"></div><h2>The KL Divergence 
Penalty &amp; PPO</h2><p>Now we have a reward function. Just like the traditional methodology for REINFORCE, you can optimize your policy with respect to the objective function:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_\\theta J(\\theta) = E_{\\pi_{\\theta}}[r(x, a)]&quot;,&quot;id&quot;:&quot;WRMDNWJJEM&quot;}" data-component-name="LatexBlockToDOM"></div><p>It&#8217;s a bit different from REINFORCE since there&#8217;s no discounted sum of rewards across a trajectory. Instead, it&#8217;s just a single-step reward that we&#8217;re optimizing with respect to. The problem with this approach is that it&#8217;s going to completely alter your model. The optimization will force the policy to output a<sub>+</sub> with very high probability, at the cost of everything else.</p><p>So the actual optimization for online PPO and DPO actually adds a constraint to prevent the policy from diverging too much from the original policy, &#960;<sub>ref</sub>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\max_&#952; E_{\\pi_{\\theta}}[r(x, a)] \\ \\text{s.t. } KL(\\pi_\\theta | \\pi_\\text{ref}) < &#948;&quot;,&quot;id&quot;:&quot;BWQUCKHCWN&quot;}" data-component-name="LatexBlockToDOM"></div><p>That KL divergence constraint might make you think of PPO. But that similarity is completely superficial. 
Recall that the KL divergence constraint for PPO came from rewriting the objective function:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*} \\max_\\theta J(&#952;) &amp;= E_\\tau[g(\\tau)] = E_{x&#8764;d^{&#960;_{\\text{old}}},a&#8764;&#960;_{\\text{new}}}[A^{&#960;_{\\text{old}}}(x, a)] \\end{align*}&quot;,&quot;id&quot;:&quot;AHFMGIEDFY&quot;}" data-component-name="LatexBlockToDOM"></div><p>We needed to constrain d<sup>&#960;_old</sup> &#8776; d<sup>&#960;_new</sup> so we didn&#8217;t have to re-sample, and the best we could do was penalize KL(&#960;<sub>new</sub> || &#960;<sub>old</sub>) and establish an upper bound on the divergence of the state distributions.</p><p>The KL divergence constraint for online PPO and DPO is not fundamentally justified in the same way. It is simply the heuristic notion that we want &#960;<sub>new</sub> to be not too different from &#960;<sub>ref</sub>. You could theoretically derive this constraint if you think the true model follows a Boltzmann distribution, and you impose &#960;<sub>ref</sub> as a prior. This leads to the same objective function as above. 
But that&#8217;s not really where this KL divergence constraint comes from.</p><p>If you&#8217;re running online PPO, you&#8217;ll see the KL divergence penalty in the objective function to keep the policy &#960;<sub>new</sub> close to &#960;<sub>ref</sub>, and a clipping mechanism to keep the policy &#960;<sub>new</sub> within the trust region of &#960;<sub>old</sub>.</p><p>To finish the derivation for online PPO, we add the constraint that &#960;<sub>&#952;</sub>(a | x) must sum to 1:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\nJ(&#952;) &amp;= \\mathbb{E}_{\\pi_\\theta}[r(x, a)] \n    - \\frac{1}{&#946;} \\, KL(\\pi_\\theta \\,\\|\\, \\pi_\\text{ref}) \n    - &#955; \\left[\\sum_a \\pi_\\theta(a | x) - 1\\right] \\\\[6pt]\nJ(&#952;) &amp;= \\sum_a \\pi_\\theta(a | x) \n    \\left[r(x, a) - \\tfrac{1}{&#946;}\\big(\\log \\pi_\\theta(a | x) - \\log \\pi_\\text{ref}(a | x)\\big) - &#955;\\right] \n    - &#955;\n\\end{align*}\n&quot;,&quot;id&quot;:&quot;TWXFCDYHTC&quot;}" data-component-name="LatexBlockToDOM"></div><p>Taking the gradient:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\n\\nabla J(&#952;) &amp;= \\sum_a \\nabla \\pi_\\theta(a | x)\n    \\left[r(x, a) - \\tfrac{1}{&#946;}\\big(\\log \\pi_\\theta(a | x) - \\log \\pi_\\text{ref}(a | x)\\big) - &#955;\\right] \\\\[6pt]\n&amp;\\quad - \\tfrac{1}{&#946;} \\sum_a \\pi_\\theta(a | x) \\, \\nabla \\log \\pi_\\theta(a | x)\n\\end{align*}\n&quot;,&quot;id&quot;:&quot;RZPJBZNNAG&quot;}" data-component-name="LatexBlockToDOM"></div><p>You can go ahead and optimize &#960;<sub>&#952;</sub> using this gradient, and that&#8217;s exactly where methods like online PPO (or as we cover later, online GRPO) fit in.</p><h2>Direct Preference Optimization (DPO)</h2><p>DPO, on the other hand, attempts to turn this into a supervised learning problem, eliminating the need for rollouts or trajectories altogether. 
DPO starts by solving for the closed form solution of &#960;<sub>&#952;</sub>. Note that &#960;<sub>&#952;</sub>(a | x) * &#8711;log(&#960;<sub>&#952;</sub>(a | x)) = &#8711;&#960;<sub>&#952;</sub>(a | x) by the log-gradient trick, so that second summation sums to 1:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\sum &#8711;&#960;_&#952;(a|x)[r(x,a) - \\tfrac{1}{&#946;}(\\log &#960;_&#952;(a|x) - \\log &#960;_{\\text{ref}}(a|x)) - &#955;] - \\tfrac{1}{&#946;} = 0&quot;,&quot;id&quot;:&quot;WOUEKDTACK&quot;}" data-component-name="LatexBlockToDOM"></div><p>Simplifying,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\beta r(x,a) - [\\log &#960;_&#952;(a|x) - \\log &#960;_{\\text{ref}}(a|x)] - &#955; - 1 = 0&quot;,&quot;id&quot;:&quot;PIVOJUJVHO&quot;}" data-component-name="LatexBlockToDOM"></div><p>so:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\log &#960;_&#952;(a|x) = \\beta r(x,a) + \\log &#960;_{\\text{ref}}(a|x) + &#955; - 1&quot;,&quot;id&quot;:&quot;NLGJRVXDNT&quot;}" data-component-name="LatexBlockToDOM"></div><p>and exponentiating gives:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#960;_&#952;(a|x) = \\frac{&#960;_{\\text{ref}}(a|x) e^{\\beta r(x,a)}}{C(x)}&quot;,&quot;id&quot;:&quot;FRYYCKDIRM&quot;}" data-component-name="LatexBlockToDOM"></div><p>where C(x) is the normalization constant ensuring probabilities sum to one.</p><p>Then, DPO moves in the reverse direction, substituting this definition of &#960;<sub>&#952;</sub> to express r:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\ r(x, a)=\\frac{1}{&#946;}[ \\log C(x)+\\log &#960;_&#952;(a | x)&#8722;\\log &#960;_\\text{ref}(a | x) ]&quot;,&quot;id&quot;:&quot;CVBABDJPXC&quot;}" data-component-name="LatexBlockToDOM"></div><p>Previously we solved this expression, which we&#8217;ll use for MLE:</p><div 
class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Pr[U(a_{+}) > U(a_{-})] = &#963;(r(a_{+}) &#8722; r(a_{-}))&quot;,&quot;id&quot;:&quot;ZRXQYUSIQZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Plugging in r:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*} Pr[U(a_{+}) > U(a_{-})] &amp;= \\sigma\\Big(\\frac{1}{\\beta}[(\\log \\pi_{\\theta}(a_{+} | x) &#8722; \\log \\pi_{\\text{ref}}(a_{+} | x)) &#8722; (\\log \\pi_{\\theta}(a_{-} | x) &#8722; \\log \\pi_\\text{ref}(a_{-} | x))]\\Big) \\end{align*}&quot;,&quot;id&quot;:&quot;GUUWNJCGIQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then we maximize likelihood:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\begin{align*} J(&#952;) &amp;= &#8721; \\log \\sigma\\Big(\\frac{1}{\\beta}[(\\log \\pi_{\\theta}(a_{+} | x) &#8722; \\log \\pi_\\text{ref}(a_{+} | x)) &#8722; (\\log \\pi_{\\theta}(a_{-} | x) &#8722; \\log \\pi_\\text{ref}(a_{-} | x))]\\Big) \\end{align*}&quot;,&quot;id&quot;:&quot;ZGIUFWLGJB&quot;}" data-component-name="LatexBlockToDOM"></div><p>That&#8217;s the final DPO objective. It can be optimized via standard supervised learning on your dataset. Choosing &#946; controls the trade-off between imitation and divergence. But note that you no longer get additional signal beyond the dataset, unlike online PPO.</p><h2>Offline Group Relative Policy Optimization (GRPO)</h2><p>Now we reach the modern variant. GRPO was introduced by DeepSeek in 2024.</p><p>GRPO begins with the same pairwise setup as DPO. 
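</p><p>As a reference point, the final DPO objective fits in a few lines of plain Python. A toy sketch with made-up log-probabilities, using the 1/&#946; convention from this post (all names and numbers illustrative):</p>

```python
import math

def dpo_loss(pairs, beta=0.1):
    """Mean negative log-sigmoid DPO loss over toy preference pairs.

    Each pair is (logp_pos, logref_pos, logp_neg, logref_neg): log-probs of
    the preferred and rejected responses under the current policy and the
    frozen reference policy.
    """
    total = 0.0
    for logp_pos, logref_pos, logp_neg, logref_neg in pairs:
        # z(x) from the objective above, with the 1/beta convention
        z = ((logp_pos - logref_pos) - (logp_neg - logref_neg)) / beta
        total += -math.log(1 / (1 + math.exp(-z)))  # -log sigma(z)
    return total / len(pairs)

pairs = [(-1.2, -1.5, -2.0, -1.8), (-0.9, -1.0, -1.4, -1.1)]
print(dpo_loss(pairs))
```

<p>In practice the log-probabilities come from summing token log-likelihoods under the policy and the frozen reference model.</p><p>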
In fact, pairwise GRPO is mathematically identical to DPO, just rewritten.</p><p>To simplify notation, define:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;z(x):=\\frac{1}{&#946;}[(\\log \\pi_{\\theta}(a_{+} | x)&#8722;\\log \\pi_\\text{ref}(a_{+} | x))&#8722;(\\log \\pi_{\\theta}(a_{-} | x)&#8722;\\log \\pi_\\text{ref}(a_{-} | x))]&quot;,&quot;id&quot;:&quot;IGQRTPOSXJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then the objective function for DPO becomes:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\nJ(&#952;) &amp;= \\sum \\log &#963;(z(x)) \\\\[4pt]\n\\nabla J(&#952;) &amp;= \\sum (1 - &#963;(z(x))) \\, \\nabla z(x) \\\\[4pt]\n\\nabla z(x) &amp;= \\frac{1}{&#946;}\\left[\\nabla \\log \\pi_{\\theta}(a_{+} \\mid x) - \\nabla \\log \\pi_{\\theta}(a_{-} \\mid x)\\right]\n\\end{align*}\n&quot;,&quot;id&quot;:&quot;FNDZBTTHMA&quot;}" data-component-name="LatexBlockToDOM"></div><p>Define a shorthand:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;w(x,a_{+},a_{-}):=\\frac{1&#8722;&#963;(z(x))}{&#946;}&quot;,&quot;id&quot;:&quot;XSMNPXXYJL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#8711;J(&#952;)=&#8721; w(x,a_{+},a_{-})[&#8711; \\log \\pi_\\theta(a_{+} | x) &#8722; &#8711; \\log \\pi_\\theta(a_{-} | x)]&quot;,&quot;id&quot;:&quot;SCSCEFJROX&quot;}" data-component-name="LatexBlockToDOM"></div><p>Next, GRPO defines some synthetic reward function &#340;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\hat{R}(a) :=\n\\begin{cases}\n+w(x, a_{+}, a_{-}), &amp; a = a_{+} \\\\[4pt]\n-w(x, a_{+}, a_{-}), &amp; a = a_{-} \\\\[4pt]\n0, &amp; \\text{otherwise}\n\\end{cases}&quot;,&quot;id&quot;:&quot;NWDWLBUXLV&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then we might rewrite the gradient 
as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#8711;J(&#952;)=&#8721; \\hat{R}(a) &#8711; \\log &#960;_&#952;(a | x)&quot;,&quot;id&quot;:&quot;HTOTLKLRLT&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is exactly the REINFORCE gradient! It&#8217;s the basic formulation for offline GRPO in the pairwise case. As you can see, all we did was make a few substitutions, but we didn&#8217;t fundamentally change the optimization. Thus, offline GRPO (pairwise) &#8801; DPO &#8801; REINFORCE in disguise.</p><h2>Extending to Groups</h2><p>So that&#8217;s the pairwise case. But the &#8220;group&#8221; in &#8220;group relative policy optimization&#8221; implies that you can have more than two responses. To be clear, with a<sub>1</sub> &gt; a<sub>2</sub> &gt; a<sub>3</sub>, you could decompose that into pairs (a<sub>1</sub> &gt; a<sub>2</sub>, a<sub>2</sub> &gt; a<sub>3</sub>, &#8230;), but GRPO treats the group as a first-class citizen.</p><p>Here&#8217;s where the theory gets shaky. In DPO, the weights w<sub>i</sub> are strictly determined as &#177;(1&#8722;&#963;(z))/&#946;. GRPO merely observes that these weights satisfy &#8721; w<sub>i</sub> = 0 and generalizes: any set of scores with &#8721; w<sub>i</sub> = 0 is allowed.</p><p>The same supervised objective then applies:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#8711;J(&#952;)=&#8721; \\hat{R}(a_{i}) &#8711; \\log &#960;_&#952;(a_{i} | x)&quot;,&quot;id&quot;:&quot;XAXPNTCCIA&quot;}" data-component-name="LatexBlockToDOM"></div><p>where &#340; uses the custom group weights.</p><p>To adapt this to online GRPO, we reuse the same idea as online PPO. After generating k responses for a prompt, compute their scores, center them (subtract the mean), and treat those as the rewards.</p><h2>End of the Series</h2><p>So is GRPO broken? Many people report that it works for them empirically. 
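</p><p>The pairwise bookkeeping is easy to sanity-check numerically: the DPO weight w = (1 &#8722; &#963;(z))/&#946; turns into the synthetic GRPO rewards &#177;w, which sum to zero. A toy check with made-up log-probabilities (all numbers illustrative):</p>

```python
import math

beta = 0.1
# Toy log-probs of the preferred / rejected response under the
# current policy and the frozen reference policy (illustrative).
logp_pos, logp_neg = -1.2, -2.0
logref_pos, logref_neg = -1.5, -1.8

# z(x) as defined above, with the 1/beta convention from this post
z = ((logp_pos - logref_pos) - (logp_neg - logref_neg)) / beta
sigma = 1 / (1 + math.exp(-z))

# DPO gradient weight, and the synthetic GRPO rewards it induces
w = (1 - sigma) / beta
rewards = {"a_plus": +w, "a_minus": -w}

assert w > 0                               # preferred response is pushed up
assert abs(sum(rewards.values())) < 1e-12  # group rewards are centered
print(rewards)
```

<p>The group extension keeps only that zero-sum property, which is exactly where the theory loosens.</p><p>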
But it&#8217;s fair to say that GRPO&#8217;s theoretical foundations are weaker than many other methods. I&#8217;ll end this with a take I posted about GRPO: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fq3K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fq3K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png 424w, https://substackcdn.com/image/fetch/$s_!Fq3K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png 848w, https://substackcdn.com/image/fetch/$s_!Fq3K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png 1272w, https://substackcdn.com/image/fetch/$s_!Fq3K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fq3K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png" width="1199" height="2427" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2427,&quot;width&quot;:1199,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:723723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.neelsomaniblog.com/i/175844591?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fq3K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png 424w, https://substackcdn.com/image/fetch/$s_!Fq3K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png 848w, https://substackcdn.com/image/fetch/$s_!Fq3K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png 1272w, https://substackcdn.com/image/fetch/$s_!Fq3K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33cb1ba6-9208-4863-8a98-5a9121d890c8_1199x2427.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://twitter.com/neelsomani/status/1976690361553895711">https://x.com/neelsomani/status/1976690361553895711</a></figcaption></figure></div><p>I hope to cover other RL/ML topics in future posts, but that concludes my blog series on reinforcement learning. 
Feedback is appreciated!</p>]]></content:encoded></item><item><title><![CDATA[What You Didn't Learn in Berkeley CS 188 — Part 3]]></title><description><![CDATA[Off-policy methods, for better sample efficiency and scalability.]]></description><link>https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-9b3</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-9b3</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Thu, 09 Oct 2025 00:57:02 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a1429d12-1261-4de3-878c-f5abfcd3146e_1000x1000.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So far in this series on reinforcement learning, we&#8217;ve covered <a href="https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley">classical methods</a> and the foundations of <a href="https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-b29">continuous-control methods</a>.</p><p>Let&#8217;s say you want to use these methods at scale. Ideally, we&#8217;d have a method to run as many actors as we want, alongside some way to consolidate the results.</p><p>Basic knowledge of PPO tells us that we can do this as long as the actors are using a policy that isn&#8217;t &#8220;too far&#8221; from the latest policy &#960;<sub>current</sub>. But that restriction inherently caps how many actors we can run concurrently. If an update drifts &#960;<sub>current</sub> too far, then all of the other actors&#8217; work becomes much less valuable.</p><p>This naturally leads us to the &#8220;off-policy&#8221; methods: DDPG, TD3, and SAC.</p><h2><strong>Deep Deterministic Policy Gradient (DDPG)</strong></h2><p>One nice thing about Q-learning was that, in theory, you could fill out the state-action table in parallel. 
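</p><p>As a toy sketch of that parallel-friendly tabular update (the states and actions below are hypothetical, and &#945; here weights the old estimate, matching the convention used in this post):</p>

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.9, gamma=0.99):
    """One tabular Q-learning update:
    Q(s,a) <- alpha * Q(s,a) + (1 - alpha) * (r + gamma * max_a' Q(s',a'))."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = alpha * Q[(s, a)] + (1 - alpha) * target

Q = defaultdict(float)  # every unseen (s, a) pair starts at 0
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
print(round(Q[(0, "right")], 6))  # 0.1
```

<p>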
Work was never really wasted, because every visit to an (s, a) pair contributed information toward the optimal value function.</p><p>Of course, the argument for convergence of Q-learning relied on the contraction property of the Bellman update operator. That&#8217;s going to be harder to prove in a continuous action space, because we have to use something like a neural net to output Q-values (DQN), meaning we can&#8217;t guarantee that the policy is strictly improving like in the tabular method. In fact, there is no clean convergence proof for DQN.</p><p>Regardless, the question remains: is there a reasonable method that never &#8220;throws away&#8221; old samples and can use every (s, a) collected? Note that the reason we couldn&#8217;t use Q-learning in a continuous action space is that we couldn&#8217;t solve the max&#8336; in this update:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q(s, a) \\leftarrow &#945; Q(s, a) + (1 - &#945;)(r + &#947; \\max_{a&#8217;} Q(s&#8217;, a&#8217;))&quot;,&quot;id&quot;:&quot;KFYFIDTUAJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>What if we try just outputting the maximizing action arg max&#8336; Q(s&#8242;, a&#8242;) directly? Is it possible to build an algorithm around this?</p><h3><strong>Derivation of the Deterministic Policy Gradient Theorem</strong></h3><p>The deterministic policy gradient theorem is a way to differentiate our objective function by integrating over states rather than actions.
Let&#8217;s define a &#8220;deterministic policy&#8221; &#956;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#956;_&#952; : S &#8594; A&quot;,&quot;id&quot;:&quot;XXBDRGVBTZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>and its objective:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;J(&#952;) := &#120124;_{s&#8764;d^{&#956;_&#952;}}[Q^{&#956;_&#952;}(s, &#956;_&#952;(s))]&quot;,&quot;id&quot;:&quot;OAFTOYFDWT&quot;}" data-component-name="LatexBlockToDOM"></div><p>Our goal is to compute &#8711;<sub>&#952;</sub>J(&#952;) so we can perform gradient ascent. Starting from:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; J(&#952;) = \\sum_t &#947;^t r(s_t, &#956;_&#952;(s_t)) = \\sum_s \\left(\\sum_t &#947;^t \\Pr[s_t=s | &#956;_&#952;]\\right) r(s, &#956;_&#952;(s))&quot;,&quot;id&quot;:&quot;DFZIWEIHSQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Direct differentiation is messy because Pr[s<sub>t </sub>= s] depends on &#952;. We&#8217;d like to express this in terms of value functions instead, and eliminate the Pr[s<sub>t </sub>= s] term.</p><p>Intuitively, this discounted and probability-weighted summation of rewards across each state is equivalent to the expected value of the game from its starting state: &#120124;<sub>s_0</sub>[V(s&#8320;)]. Let&#8217;s prove that equivalence formally.</p><h4><strong>Step 1.
Relating reward and value</strong></h4><p>We relate r and V via the Bellman equation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^{&#956;_&#952;}(s) := r(s, &#956;_&#952;(s)) + &#947; &#120124;_{s&#8217;}[V^{&#956;_&#952;}(s&#8217;)]&quot;,&quot;id&quot;:&quot;UJHERWERND&quot;}" data-component-name="LatexBlockToDOM"></div><p>and define the &#8220;discounted state visitation&#8221;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; p_&#952;(s) := \\sum_t &#947;^t \\Pr[s_t=s]&quot;,&quot;id&quot;:&quot;HVUREAFUKR&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla J(&#952;) = \\nabla \\sum_s p_&#952;(s) r(s, &#956;_&#952;(s)) = \\nabla \\sum_s p_&#952;(s)[V^{&#956;_&#952;}(s) - &#947; &#120124;_{s&#8217;}[V^{&#956;_&#952;}(s&#8217;)]]&quot;,&quot;id&quot;:&quot;VYQNAUWFKI&quot;}" data-component-name="LatexBlockToDOM"></div><p>So we want an expression for &#8711;&#931;&#8347; p<sub>&#952;</sub>(s)V<sup>&#956;_&#952;</sup>(s) and/or &#8711;&#931;&#8347; p<sub>&#952;</sub>(s)&#947;&#120124;[V<sup>&#956;_&#952;</sup>(s&#8242;)].</p><h4><strong>Step 2. 
Recursive property of state visitation</strong></h4><p>We expand transitions:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Pr[s_{t+1}=s&#8217;] = \\sum_s \\Pr[s_t=s] \\Pr[s&#8217; | s, &#956;_&#952;(s)]&quot;,&quot;id&quot;:&quot;FOLSMQDHEB&quot;}" data-component-name="LatexBlockToDOM"></div><p>Sum over t:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\sum_t \\Pr[s_{t+1}=s&#8217;] = \\sum_t \\sum_s \\Pr[s_t=s] \\Pr[s&#8217; | s, &#956;_&#952;(s)]&quot;,&quot;id&quot;:&quot;EABSSOZYGZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Apply discounting by &#947;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_t &#947;^{t+1} \\Pr[s_{t+1}=s&#8217;] = \\sum_t &#947;^{t+1} \\sum_s \\Pr[s_t=s] \\Pr[s&#8217; | s, &#956;_&#952;(s)]&quot;,&quot;id&quot;:&quot;JQXZECPLNV&quot;}" data-component-name="LatexBlockToDOM"></div><p>Define:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;d_0(s) := \\Pr[s_0=s]&quot;,&quot;id&quot;:&quot;KDYLXRWOSU&quot;}" data-component-name="LatexBlockToDOM"></div><p>The left-hand side is p<sub>&#952;</sub>(s&#8242;) except it omits t = 0:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align}\n\\sum_t \\gamma^{t+1} \\Pr[s_{t+1}=s'] \n&amp;= \\sum_{u=1}^\\infty \\gamma^u \\Pr[s_u=s'] = p_\\theta(s') - d_0(s') \\\\\np_\\theta(s') \n&amp;= d_0(s') + \\gamma \\sum_s p_\\theta(s)\\, \\Pr[s' \\mid s, \\mu_\\theta(s)]\n\\end{align}\n&quot;,&quot;id&quot;:&quot;PMWHVTGEXQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>In words: Each state&#8217;s discounted occupancy p<sub>&#952;</sub>(s&#8242;) consists of two parts: the initial-state contribution d&#8320;(s&#8242;) and the &#947;-discounted flow of visitation mass from predecessor states, weighted by their transition probabilities under &#956;<sub>&#952;</sub>.</p><p>It&#8217;s starting to look 
pretty close to what we want above.</p><h4><strong>Step 3. Multiplying by V(s&#8242;) and summing over s&#8242;</strong></h4><p>Multiply both sides by V(s&#8242;) and sum over s&#8242;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{s&#8217;} p_&#952;(s&#8217;) V(s&#8217;) = \\sum_{s&#8217;} d_0(s&#8217;) V(s&#8217;) + &#947; \\sum_s p_&#952;(s)\\sum_{s&#8217;} \\Pr[s&#8217; | s, &#956;_&#952;(s)] V(s&#8217;)&quot;,&quot;id&quot;:&quot;YQNECDUYFA&quot;}" data-component-name="LatexBlockToDOM"></div><p>Recognize that the inner sum is an expectation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{s&#8217;} \\Pr[s&#8217; | s, &#956;_&#952;(s)] V(s&#8217;) = &#120124;_{s&#8217;}[V(s&#8217;)]&quot;,&quot;id&quot;:&quot;AGJGQGOAED&quot;}" data-component-name="LatexBlockToDOM"></div><p>So:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{s&#8217;} p_&#952;(s&#8217;) V(s&#8217;) = \\sum_{s&#8217;} d_0(s&#8217;) V(s&#8217;) + &#947; \\sum_s p_&#952;(s) &#120124;_{s&#8217;}[V(s&#8217;)]&quot;,&quot;id&quot;:&quot;XFKGMAOMNN&quot;}" data-component-name="LatexBlockToDOM"></div><p>Now recall our earlier expression for the gradient, and substitute our expression above back in:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align}\n\\nabla J(\\theta) &amp;= \\nabla \\sum_s p_\\theta(s)\\left[V^{\\mu_\\theta}(s) - \\gamma\\, \\mathbb{E}_{s'}[V^{\\mu_\\theta}(s')]\\right] \\\\\n\\nabla J(\\theta) &amp;= \\nabla \\left(\\sum_{s'} d_0(s')\\, V^{\\mu_\\theta}(s')\\right)\n\\end{align}\n&quot;,&quot;id&quot;:&quot;HLGOGUKZDP&quot;}" data-component-name="LatexBlockToDOM"></div><p>Or equivalently,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla \\left(\\sum_s p_&#952;(s)V(s) - &#947;\\sum_s p_&#952;(s)&#120124;[V(s&#8217;)]\\right) = \\nabla 
&#120124;_{s_0}[V(s_0)]&quot;,&quot;id&quot;:&quot;RRDHWOBFMI&quot;}" data-component-name="LatexBlockToDOM"></div><h4><strong>Step 4. Chain rule on Q</strong></h4><p>Now the expression is much friendlier. How do we differentiate V(s)? Since V(s) = Q(s, &#956;<sub>&#952;</sub>(s)) for deterministic policies, we apply the multivariable chain rule:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align}\n\\nabla_\\theta \\mathbb{E}_{s_0}[Q(s_0, \\mu_\\theta(s_0))] \n&amp;= \\sum_s \\Pr[s_0 = s] \\big[ \\nabla_\\theta Q(s, a)\\big|_{a=\\mu_\\theta(s)} \\\\\n&amp;\\quad + \\nabla_a Q(s, a)\\big|_{a=\\mu_\\theta(s)} \\nabla_\\theta \\mu_\\theta(s) \\big]\n\\end{align}\n&quot;,&quot;id&quot;:&quot;YCDTTZJCAE&quot;}" data-component-name="LatexBlockToDOM"></div><p>The first term, &#8711;<sub>&#952;</sub>Q(s,a), is recursive with respect to the original gradient &#8711;J via the Bellman equation, except now it&#8217;s over the distribution of s<sub>1</sub> rather than the starting distribution of s<sub>0</sub>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla_&#952; Q(s,a)_{|{a=&#956;_&#952;(s)}} = \\nabla_&#952;[r(s,a) + &#947; &#120124;_{s&#8217;}V^{&#956;_&#952;}(s&#8217;)] = &#947; \\nabla_&#952; &#120124;_{s&#8217;}[V^{&#956;_&#952;}(s&#8217;)]&quot;,&quot;id&quot;:&quot;CGYPNSUWDG&quot;}" data-component-name="LatexBlockToDOM"></div><p>Unrolling this recursion gives:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\n\\nabla_\\theta J(\\theta)\n&amp;= \\gamma\\,\\nabla_\\theta \\mathbb{E}_{s_1}[V^{\\mu_\\theta}(s_1)]\n  + \\nabla_a Q(s_0, a)\\big|_{a=\\mu_\\theta(s_0)}\\,\\nabla_\\theta \\mu_\\theta(s_0) \\\\[6pt]\n\\nabla_\\theta J(\\theta)\n&amp;= \\gamma^2\\,\\nabla_\\theta \\mathbb{E}_{s_2}[V^{\\mu_\\theta}(s_2)]\n  + \\gamma\\,\\nabla_a Q(s_1, a)\\big|_{a=\\mu_\\theta(s_1)}\\,\\nabla_\\theta \\mu_\\theta(s_1) \\\\[-2pt]\n&amp;\\quad\n 
 + \\nabla_a Q(s_0, a)\\big|_{a=\\mu_\\theta(s_0)}\\,\\nabla_\\theta \\mu_\\theta(s_0)\n\\end{align*}\n&quot;,&quot;id&quot;:&quot;QZSMWPKHJK&quot;}" data-component-name="LatexBlockToDOM"></div><p>Continuing indefinitely:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla_\\theta J(\\theta)\n= \\sum_{t=0}^{\\infty} \\gamma^t \\, \\mathbb{E}_{s_t}\n\\!\\left[\n  \\nabla_a Q(s_t, a)\\big|_{a=\\mu_\\theta(s_t)} \\,\n  \\nabla_\\theta \\mu_\\theta(s_t)\n\\right]&quot;,&quot;id&quot;:&quot;CUKBYZVVPC&quot;}" data-component-name="LatexBlockToDOM"></div><p>Equivalently, using the discounted state-visitation form:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\nabla_&#952; J(&#952;) = &#120124;_{s&#8764;p_&#952;}[\\nabla_a Q(s,a)|_{a=&#956;_&#952;(s)} \\nabla_&#952; &#956;_&#952;(s)]&quot;,&quot;id&quot;:&quot;RYFTVJFQAZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is the Deterministic Policy Gradient Theorem. The key difference from the traditional (stochastic) policy gradient theorem is that the expectation is taken over the state distribution rather than the action distribution.</p><p>That single shift, integrating over p<sub>&#952;</sub>(s) instead of &#960;(a|s), makes deterministic continuous control methods tractable and forms the foundation for DDPG and its successors.</p><h3><strong>Practical Considerations of DDPG</strong></h3><p>We can&#8217;t compute this expectation analytically, so we approximate Q and &#956; with neural networks.</p><p>The first relaxation that we need to make, in line with our effort to increase sample efficiency, is to allow ourselves to use samples that don&#8217;t necessarily come from the current state distribution p<sub>&#952;</sub>. 
The original DPG paper shows that for any sampling distribution p with sufficient coverage:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla_&#952; J(&#952;) &#8733; &#120124;_{s&#8764;p}[\\nabla_&#952; &#956;_&#952;(s) \\nabla_a Q^{&#956;_&#952;}(s,a)|_{a=&#956;_&#952;(s)}]&quot;,&quot;id&quot;:&quot;SMVVGJXICJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Thus, we can reuse samples from &#8220;replay buffers,&#8221; where we store all of the previous samples that we&#8217;ve observed, even after our gradient has had many updates. This is the key to off-policy learning.</p><p>Second, DDPG uses two networks: the actor &#956;<sub>&#952;</sub> and the critic Q<sub>&#966;</sub>. During exploration, Gaussian noise is added to &#956;<sub>&#952;</sub>(s). The critic is trained by minimizing:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L(&#966;) = \\frac{1}{N} \\sum (Q_&#966;(s_t,a_t) - y_t)^2&quot;,&quot;id&quot;:&quot;EOOFKTUGQS&quot;}" data-component-name="LatexBlockToDOM"></div><p>where</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_t = r_t + &#947; Q_{&#966;&#8217;}(s_{t+1}, &#956;_{&#952;&#8217;}(s_{t+1}))&quot;,&quot;id&quot;:&quot;JEWOANARSG&quot;}" data-component-name="LatexBlockToDOM"></div><p>and the actor loss is simply:</p><pre><code>actions = actor(states)
q_values = critic(states, actions)  # Q_phi(s, mu_theta(s))
actor_loss = -q_values.mean()       # maximize E[Q] by descending its negative
actor_loss.backward()</code></pre><p>In practice, DDPG maintains frozen target networks Q<sub>&#966;&#8217;</sub> and &#956;<sub>&#952;&#8217;</sub>. These networks are updated &#8220;softly&#8221;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#966;&#8217; &#8592; &#964;&#966; + (1-&#964;)&#966;&#8217;, \\quad &#952;&#8217; &#8592; &#964;&#952; + (1-&#964;)&#952;&#8217;&quot;,&quot;id&quot;:&quot;ZWRPGZTZMB&quot;}" data-component-name="LatexBlockToDOM"></div><p>for small &#964;. In practice, this reduces the variance of the network updates. But buyer beware: DDPG is still known to be very unstable and sensitive to hyperparameters.</p><h2><strong>Twin Delayed DDPG (TD3)</strong></h2><p>TD3 is directly a response to DDPG. It makes three changes to DDPG, all starting with the letter D, two of which are fairly simple:</p><ol><li><p>Double critics<strong>:</strong> Instead of one critic network, now we have two, and we use min(Q_{&#966;&#8242;&#8321;}, Q_{&#966;&#8242;&#8322;}). The reasoning is that Q-networks can be spiky and randomly assign too high of a value to some states. &#956;_&#952;(s) is computing arg max&#8336; Q(s, a), which implicitly relies on max&#8336; Q(s, a). This leads to systematic over-estimation of the true Q-value. By taking min(Q_{&#966;&#8242;&#8321;}, Q_{&#966;&#8242;&#8322;}), we counteract this overestimation bias and err toward underestimating the true objective function.</p></li><li><p>Delayed actor updates<strong>:</strong> The heuristic reasoning for convergence (not formally proven) is structurally identical to the convergence argument for policy iteration. We want the state values to converge first, and then we update the policy. 
TD3 solves this by only updating the actor once every two times the critic network is updated.</p></li></ol><p>The last change to DDPG, called deterministic target smoothing, is more nuanced.</p><h3><strong>Deterministic Target Smoothing Analysis</strong></h3><p>The critical insight here is that (s, a) is continuous, and therefore it&#8217;s extremely unlikely that you&#8217;ll ever hit the same (s, a) twice. That means that if you have a randomly high Q(s&#8242;, a&#8242;) from initialization, that spike will permanently distort that part of the Q-network. Worse, it propagates downstream through the Bellman update:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q(s, a) &#8592; r + &#947; Q(s', a')&quot;,&quot;id&quot;:&quot;PPAJRJGHLN&quot;}" data-component-name="LatexBlockToDOM"></div><p>This contamination affects nearby states too, since Q-networks generalize over continuous space. The double critics mitigate but don&#8217;t eliminate this problem.</p><p>We solve it by smoothing out that local, spurious (s, a) pair with its neighbors. Before, the target was:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_t = r_t + &#947; \\min_i Q_{&#966;'_{i}}(s_{t+1}, &#956;_{&#952;'}(s_{t+1}))&quot;,&quot;id&quot;:&quot;YFUPQONIXP&quot;}" data-component-name="LatexBlockToDOM"></div><p>Instead, TD3 uses a smoothed target:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_t = r_t + &#947; &#120124;_{\\epsilon}\\left[\\min_{i}Q_{\\phi'_i}(s_{t+1}, &#956;_{\\theta'}(s_{t+1}) + \\epsilon)\\right], \\quad \\epsilon &#8764; &#119977;(0, \\sigma^2)&quot;,&quot;id&quot;:&quot;ZJLGGFQAGE&quot;}" data-component-name="LatexBlockToDOM"></div><p>Maybe a single (s, a) pair has a random spike, but on average, the neighborhood of points is likely to be reasonable. Computing that expectation is intractable, but we can approximate it via Monte Carlo. 
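</p><p>A minimal sketch of that Monte Carlo approximation, in pure Python with hypothetical one-dimensional actor and critic functions (note that TD3 also clips the sampled noise):</p>

```python
import random

def td3_smoothed_target(r, s_next, actor_target, critic_targets,
                        gamma=0.99, sigma=0.2, noise_clip=0.5, n_samples=1):
    """Estimate y = r + gamma * E_eps[min_i Q_i(s', mu(s') + eps)], eps ~ N(0, sigma^2)."""
    total = 0.0
    for _ in range(n_samples):
        eps = max(-noise_clip, min(noise_clip, random.gauss(0.0, sigma)))
        a_next = actor_target(s_next) + eps
        total += min(q(s_next, a_next) for q in critic_targets)
    return r + gamma * total / n_samples

# Hypothetical targets: two critics that disagree; min() counteracts overestimation
actor = lambda s: 0.5 * s
critics = [lambda s, a: -(a - s) ** 2, lambda s, a: -(a - s) ** 2 + 1.0]
y = td3_smoothed_target(r=1.0, s_next=2.0, actor_target=actor, critic_targets=critics)
```

<p>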
In practice, a single &#949;-sample per update is sufficient to get a good estimate.</p><p>If we take a second-order Taylor expansion of Q around the mean action &#956;<sub>&#952;&#8242;</sub>(s<sub>t+1</sub>), we find that this expectation implicitly penalizes the Laplacian of Q with respect to its action input. That is, the added noise regularizes curvature, discouraging sharp spikes in Q-values that would otherwise destabilize learning.</p><p>Together, these three changes to DDPG make TD3 a stable and popular alternative.</p><h2><strong>Soft Actor-Critic (SAC)</strong></h2><p>Now it would be awesome if somehow SAC were a natural continuation of DDPG/TD3. That doesn&#8217;t do it justice, though, because SAC is actually philosophically and foundationally distinct from all of the methods that we&#8217;ve covered so far, even in previous posts.</p><h3><strong>Philosophical Justification</strong></h3><p>The thing about our original objective function (J(&#952;) := &#120124;<sub>&#964;</sub>[G(&#964;)]) is that it only cares about expected return, not diversity of exploration. All of the exploration we&#8217;ve done so far has been either by solving for the parameters of a stochastic policy (e.g. PPO) or by adding randomness to the actions (e.g. DDPG). The optimal policy is still allowed to collapse into a deterministic mapping.</p><p>The reality is that there are many possible policies that could explain our observations. Which one should we prefer?</p><p>The <a href="https://en.wikipedia.org/wiki/Principle_of_maximum_entropy">principle of maximum entropy</a> tells us that, out of all the distributions consistent with our observations, we should pick the one with maximum entropy. 
This is the distribution that makes the fewest assumptions about the underlying data.</p><p>For example, if we have a variable x such that &#120124;<sub>p</sub>[f(x)] = c, we should solve:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_p H(p) \\quad \\text{s.t.} \\quad &#120124;_p[f(x)] = c&quot;,&quot;id&quot;:&quot;TNDAQMGQJO&quot;}" data-component-name="LatexBlockToDOM"></div><p>The Lagrangian is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L(p, &#955;) = -\\sum_x p(x)\\log p(x) + &#955;(&#120124;_p[f(x)] - c)&quot;,&quot;id&quot;:&quot;KRVMBASFBP&quot;}" data-component-name="LatexBlockToDOM"></div><p>and setting its derivative to zero yields:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p^*(x) &#8733; e^{&#955; f(x)}&quot;,&quot;id&quot;:&quot;YKQSHKUXIZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is the Boltzmann distribution, the unique maximum-entropy distribution that satisfies the constraint.</p><h3><strong>Applying to Reinforcement Learning</strong></h3><p>Applying this idea to RL, if we assume we want some level of return R&#770;, then we should pick (in discrete form):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_q H(q) = -\\sum_{\\tau} q(\\tau)\\log q(\\tau)&quot;,&quot;id&quot;:&quot;AAMIWBZJLF&quot;}" data-component-name="LatexBlockToDOM"></div><p>subject to</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{\\tau} q(\\tau) = 1, \\qquad &#120124;_q[R(\\tau)] = \\hat{R}&quot;,&quot;id&quot;:&quot;JEYDXEJLPB&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here q(&#964;) is a hypothetical &#8220;best&#8221; trajectory distribution that reflects our observations, and &#120124;<sub>q</sub> denotes expectation over trajectories induced by q. 
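</p><p>You can see the trade-off between target return and entropy numerically. In this small sketch (the trajectory returns are hypothetical), a larger &#946; concentrates q on high-return trajectories, raising &#120124;[R] while lowering entropy:</p>

```python
import math

def boltzmann(returns, beta):
    """q(tau) proportional to exp(beta * R(tau)) over a finite set of trajectories."""
    weights = [math.exp(beta * r) for r in returns]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(q):
    return -sum(p * math.log(p) for p in q if p > 0)

R = [0.0, 1.0, 2.0, 3.0]  # hypothetical returns for four trajectories
for beta in [0.0, 1.0, 5.0]:
    q = boltzmann(R, beta)
    mean_return = sum(p * r for p, r in zip(q, R))
    print(f"beta={beta}: E[R]={mean_return:.2f}, H(q)={entropy(q):.2f}")
```

<p>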
Of course, if we set R&#770; arbitrarily high, the resulting entropy will approach 0.</p><p>Forming the Lagrangian and setting the derivative to 0:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L(q, &#955;, &#946;) = -\\sum_{&#964;} q(&#964;)\\log q(&#964;) + &#955;\\left(\\sum_{&#964;} q(&#964;) - 1\\right) + &#946;\\left(\\sum_{&#964;} q(&#964;)R(&#964;) - \\hat{R}\\right)&quot;,&quot;id&quot;:&quot;DYVZKESKTO&quot;}" data-component-name="LatexBlockToDOM"></div><p>Taking the derivative with respect to q(&#964;):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{&#8706;L}{&#8706;q(&#964;)} = -(\\log q(&#964;) + 1) + &#955; + &#946; R(&#964;) = 0&quot;,&quot;id&quot;:&quot;LVPWKXDTFL&quot;}" data-component-name="LatexBlockToDOM"></div><p>which gives:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q(&#964;) &#8733; e^{&#946; R(&#964;)}&quot;,&quot;id&quot;:&quot;IIOZDXCFGO&quot;}" data-component-name="LatexBlockToDOM"></div><p>Setting &#945; := 1/&#946;, we obtain the canonical maximum entropy form used in SAC:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q(&#964;) &#8733; \\exp\\left(\\tfrac{1}{&#945;}\\sum_t r(s_t, a_t)\\right)&quot;,&quot;id&quot;:&quot;OFSZFJXOPZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>The last missing piece is that we&#8217;ve defined what the optimal trajectory distribution looks like, but the agent only controls the policy distribution. 
The environment dynamics also affect the likelihood of any trajectory:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p_0(&#964;) = p(s_0)\\prod_t p(s_{t+1}|s_t, a_t)&quot;,&quot;id&quot;:&quot;BHRXNUKDIS&quot;}" data-component-name="LatexBlockToDOM"></div><p>Using Bayes&#8217; rule (Pr[A | B] = Pr[A and B] / Pr[B]), we include this prior over trajectories:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q^*(&#964;) &#8733; p_0(&#964;) \\exp\\left(\\tfrac{1}{&#945;}\\sum_t r(s_t, a_t)\\right)&quot;,&quot;id&quot;:&quot;HLUARWEYUI&quot;}" data-component-name="LatexBlockToDOM"></div><h3><strong>Properties of the Optimal Policy</strong></h3><p>So we&#8217;ve solved for the optimal trajectory distribution q*. But we can only control the policy &#960;<sub>&#952;</sub>(a|s), which induces its own trajectory distribution through the environment dynamics:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p_&#960;(&#964;) = p(s_0)\\prod_t &#960;(a_t|s_t)p(s_{t+1}|s_t,a_t)&quot;,&quot;id&quot;:&quot;ZGYSIUCAGQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>The question becomes: which policy-induced trajectory distribution p<sub>&#960;</sub>(&#964;) is closest to q*(&#964;)?
This leads naturally to minimizing the KL divergence:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#960;^* = \\arg\\min_{&#960;} D_{KL}(p_&#960;(&#964;) || q^*(&#964;))&quot;,&quot;id&quot;:&quot;OPXTXPGEYC&quot;}" data-component-name="LatexBlockToDOM"></div><p>Expanding:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align}\n\\pi^* \n&amp;= \\arg\\min_\\pi \\mathbb{E}_{p_\\pi}\\!\\left[\\log p_\\pi(\\tau) - \\log q^*(\\tau)\\right] \\\\[4pt]\n&amp;= \\arg\\min_\\pi \\mathbb{E}_{p_\\pi}\\!\\left[\n    \\log p_\\pi(\\tau) \n    - \\log p_0(\\tau)\n    - \\tfrac{1}{\\alpha} \\sum_t r(s_t, a_t)\n\\right]\n\\end{align}\n&quot;,&quot;id&quot;:&quot;BWRFMTLQVC&quot;}" data-component-name="LatexBlockToDOM"></div><p>Since log p_&#960;(&#964;) &#8722; log p&#8320;(&#964;) = &#8721;&#8348; log &#960;(a_t|s_t), this simplifies to:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#960;^* = \\arg\\min_&#960; &#120124;_{p_&#960;}\\sum_t [\\log &#960;(a_t|s_t) - \\tfrac{1}{&#945;}r(s_t,a_t)]&quot;,&quot;id&quot;:&quot;VYSLETZGGA&quot;}" data-component-name="LatexBlockToDOM"></div><p>Rewriting as a maximization gives:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#960;^* = \\arg\\max_&#960; &#120124;_{p_&#960;}\\sum_t [r(s_t,a_t) + &#945; H(&#960;(&#183;|s_t))]&quot;,&quot;id&quot;:&quot;YRWUWGQMFN&quot;}" data-component-name="LatexBlockToDOM"></div><p>since &#945; is a constant so multiplying by it doesn&#8217;t change the outcome of the argmax. Thus, SAC explicitly maximizes both expected reward and policy entropy, balancing exploitation with exploration in a single unified framework.</p><h3><strong>State Value and Q-Function Derivation</strong></h3><p>To implement this, we&#8217;re going to need expressions for both state values and Q-values. 
The state value function is just the objective above, with discounting:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^*(s) = \\max_&#960; &#120124;_{&#964;|s_0=s}\\left[\\sum_t &#947;^t(r(s_t,a_t) + &#945; H(&#960;(&#183;|s_t)))\\right]&quot;,&quot;id&quot;:&quot;LKQFJKHVUK&quot;}" data-component-name="LatexBlockToDOM"></div><p>By the Bellman equations:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^*(s) = \\max_&#960; &#120124;_{a,s&#8217;}[r(s,a) + &#945; H(&#960;(&#183;|s)) + &#947; V^*(s&#8217;)]&quot;,&quot;id&quot;:&quot;CGOJYATQGL&quot;}" data-component-name="LatexBlockToDOM"></div><p>and the corresponding Q-function:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q^*(s,a) = r(s,a) + &#947; &#120124;_{s&#8217;}[V^*(s&#8217;)]&quot;,&quot;id&quot;:&quot;CQLYELYTRA&quot;}" data-component-name="LatexBlockToDOM"></div><p>We can rewrite V* in terms of Q*:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^*(s) = \\max_&#960; &#120124;_{a}[Q^*(s,a) + &#945; H(&#960;(&#183;|s))]&quot;,&quot;id&quot;:&quot;LVFVKICOTG&quot;}" data-component-name="LatexBlockToDOM"></div><p>This form is friendlier because now we&#8217;re only taking an expectation over the action a. That expectation is equal to:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^*(s) = \\max_&#960; \\sum_a &#960;(a|s)[Q^*(s,a) - &#945; \\log &#960;(a|s)], \\quad \\text{s.t.}\\ \\sum_a &#960;(a|s)=1&quot;,&quot;id&quot;:&quot;IABWTSOFIN&quot;}" data-component-name="LatexBlockToDOM"></div><p>Let&#8217;s try to solve for that V*. 
We write the Lagrangian and set the derivative to 0:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;L(&#960;, &#955;) = \\sum_a &#960;(a|s)[Q^*(s,a) - &#945; \\log &#960;(a|s)] + &#955;(\\sum_a &#960;(a|s) - 1)&quot;,&quot;id&quot;:&quot;UVPQFOHLQO&quot;}" data-component-name="LatexBlockToDOM"></div><p>gives:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q^*(s,a) - &#945; \\log &#960;^*(a|s) - &#945; + &#955; = 0&quot;,&quot;id&quot;:&quot;DRRZHKMFNU&quot;}" data-component-name="LatexBlockToDOM"></div><p>Therefore:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\pi^*(a \\mid s) \\propto \\exp\\!\\left(\\frac{Q^*(s,a)}{\\alpha}\\right)\n&quot;,&quot;id&quot;:&quot;VFXTIDWYJB&quot;}" data-component-name="LatexBlockToDOM"></div><p>Normalizing yields the softmax policy:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\pi^*(a \\mid s)\n= \\frac{\\exp\\!\\left(\\tfrac{Q^*(s,a)}{\\alpha}\\right)}\n       {\\sum_{a'} \\exp\\!\\left(\\tfrac{Q^*(s,a')}{\\alpha}\\right)}\n&quot;,&quot;id&quot;:&quot;VMEKRYPGJL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Now we can substitute back into V:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^*(s)\n= \\alpha \\log \\sum_a \\exp\\!\\left(\\frac{Q^*(s,a)}{\\alpha}\\right)&quot;,&quot;id&quot;:&quot;XNHMOMVQRL&quot;}" data-component-name="LatexBlockToDOM"></div><h3><strong>Soft Policy Improvement</strong></h3><p>Finally, we need a way to computationally solve for the optimal policy pi*. We could theoretically fit a network to the definition of the optimal pi* above. But in continuous spaces that denominator is intractable. 
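In discrete toy problems, though, the softmax policy and the log-sum-exp value above are cheap to compute and sanity-check. Here is a minimal sketch (the Q-values and temperature below are made up, and the helper name is mine, not from the post's codebase):

```python
import math

def soft_value_and_policy(q_values, alpha):
    """Compute V*(s) = alpha * log sum_a exp(Q(s,a)/alpha) and the softmax
    policy pi*(a|s) proportional to exp(Q(s,a)/alpha) for one state.

    Subtracting the max before exponentiating is the standard log-sum-exp
    trick, which keeps the computation numerically stable for small alpha.
    """
    scaled = [q / alpha for q in q_values]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    value = alpha * (m + math.log(z))      # V*(s) = alpha * (m + log z)
    policy = [e / z for e in exps]         # normalized softmax over actions
    return value, policy

# Toy check: three actions with made-up Q-values
v, pi = soft_value_and_policy([1.0, 2.0, 0.5], alpha=0.5)
```

As &#945; shrinks toward 0, the policy concentrates on the argmax action and V* approaches max_a Q(s,a); as &#945; grows, the policy flattens toward uniform, which is exactly the exploration&#8211;exploitation dial the temperature controls.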
Instead, we define a &#8220;soft policy improvement&#8221; operator:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;&#960;_{new} = \\arg\\max_&#960; &#120124;_{s&#8764;D, a&#8764;&#960;}[Q_&#952;(s,a) + &#945; H(&#960;)]&quot;,&quot;id&quot;:&quot;NYPIHNVSOQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>This operator has the same fixed point as the optimal policy above. (You can try substituting it in.) In practice, we minimize its negative:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;J_&#960;(&#966;) = &#120124;_{a&#8764;&#960;}[&#945; \\log &#960;(a|s) - Q_&#952;(s,a)]&quot;,&quot;id&quot;:&quot;DQZRPZEXJO&quot;}" data-component-name="LatexBlockToDOM"></div><p>The critic still follows a soft Bellman target:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q^*(s,a) = r(s,a) + &#947; &#120124;_{s&#8217;}[V^*(s&#8217;)]&quot;,&quot;id&quot;:&quot;KMFYKLMNBR&quot;}" data-component-name="LatexBlockToDOM"></div><p>with:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^*(s) = \\alpha \\log \\sum_a \\exp\\!\\left(\\frac{Q^*(s,a)}{\\alpha}\\right)\n&quot;,&quot;id&quot;:&quot;RQAXZYZCJM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Finally:</p><ul><li><p>SAC uses double critics (like TD3) to mitigate bias, and</p></li><li><p>Updates the temperature &#945; automatically to maintain a target entropy.</p></li></ul><p>SAC&#8217;s soft Bellman operator is a &#947;-contraction in the tabular case, ensuring convergence under idealized assumptions.</p><h2><strong>Wrapping Up</strong></h2><p>That concludes our discussion of the off-policy methods. 
In the next and final post of the series, I&#8217;ll cover incorporating human feedback, which is relevant in post-training LLMs: DPO and GRPO.</p>]]></content:encoded></item><item><title><![CDATA[What You Didn’t Learn in Berkeley CS 188 — Part 2]]></title><description><![CDATA[Implementing the policy gradient methods: REINFORCE, A2C, TRPO, PPO.]]></description><link>https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-b29</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-b29</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Tue, 07 Oct 2025 03:15:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/85acc66d-d37e-493a-b067-4f456aa45297_800x391.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my last post, I covered classical reinforcement learning methods. Some of these appeared in CS 188, but not at the depth needed to understand why they work. In this post, I show how these basic methods can be rethought or extended to handle very large state spaces or continuous action spaces.</p><p>If you recall, Q-learning, value iteration, and other tabular methods require storing a full set of state&#8211;action values. The policy is implicitly a function of the Q-values: iterate over actions and pick the one that maximizes expected value.</p><p>Even in a continuous state space, the idea still applies. 
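As a quick refresher, the tabular version of that implicit policy, together with the Q-learning backup itself, fits in a few lines. This is a toy sketch with dictionary-based tables (not code from the course):

```python
def greedy_action(q_table, state):
    """The implicit policy of tabular methods: scan the stored Q-values
    for this state and pick the argmax action."""
    return max(q_table[state], key=q_table[state].get)

def q_learning_update(q_table, s, a, r, s_next, lr=0.1, gamma=0.99):
    """One tabular Q-learning backup toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(q_table[s_next].values())
    q_table[s][a] += lr * (target - q_table[s][a])
```

The point of what follows is that once the state space is continuous, the table `q_table[state]` has to become a function approximator.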
Define a parameterized Q-function Q<sub>&#952;</sub>(s,a) and an implicit greedy policy</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\pi(s) := \\arg\\max_a Q_\\theta(s,a)&quot;,&quot;id&quot;:&quot;GWLUPNUXWI&quot;}" data-component-name="LatexBlockToDOM"></div><p>Define a Bellman-type residual when you sample an action (a):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;J(\\theta) := \\big[r + \\gamma \\max_{a&#8217;} Q_\\theta(s&#8217;,a&#8217;)\\big] - Q_\\theta(s,a)&quot;,&quot;id&quot;:&quot;LFWHQGHGGY&quot;}" data-component-name="LatexBlockToDOM"></div><p>You can take gradient steps on Q to reduce this residual, typically by minimizing its square. In practice, the target term (r + &#947; max<sub>a&#8217;</sub>Q<sub>&#952;</sub>(s&#8217;,a&#8217;)) is computed with a stale copy &#952;<sup>-</sup> to reduce instability from target chasing due to stochastic rewards and transitions. This is a Deep Q-Network (DQN).</p><p>That works for discrete action spaces. In continuous action spaces, computing max<sub>a</sub>Q<sub>&#952;</sub>(s, a) is generally intractable. This motivates learning the policy directly rather than inferring it from Q-values. We introduce a policy over a continuous action space, that is, a probability density function. Let &#960;<sub>&#952;</sub>(a | s) be a policy parameterized by &#952;, for example the parameters of a Gaussian. If we can properly define a loss function, we can optimize &#952; using SGD or Adam.</p><p>Introducing the policy gradient methods. These are the methods you&#8217;ll often hear about if you scroll X. In this post, I implement a couple of these methods on the <em>Pendulum</em> environment.</p><p>Code: <a href="https://github.com/neelsomani/policy-gradient">https://github.com/neelsomani/policy-gradient</a></p><h2><strong>REINFORCE: Policy-Gradient Derivation</strong></h2><p>This is a common derivation which you can find in many places. 
Let:</p><ul><li><p>&#960;<sub>&#952;</sub> be the policy,</p></li><li><p>&#964; = [(s<sub>1</sub>, a<sub>1</sub>), &#8230;, (s<sub>n</sub>, a<sub>n</sub>)] be a trajectory,</p></li><li><p>G<sub>&#964;</sub> be the discounted return of &#964;,</p></li><li><p>&#981;(&#964;) = &#8719;<sub>t=1,&#8230;,n</sub>P(s<sub>t+1</sub> | s<sub>t</sub>, a<sub>t</sub>) &#8719;<sub>t=1,&#8230;,n</sub> &#960;<sub>&#952;</sub>(a<sub>t</sub> | s<sub>t</sub>) be the probability of &#964;.</p></li></ul><p>Then we can define our objective as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;J(\\theta) = \\mathbb{E}_\\tau[G_\\tau] = \\sum_\\tau \\phi(\\tau)G_\\tau&quot;,&quot;id&quot;:&quot;UPZODETXHD&quot;}" data-component-name="LatexBlockToDOM"></div><p>The gradient of the objective function is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\nabla_\\theta J(\\theta) = \\sum_\\tau \\Big(\\prod_{t=1}^{n} P(s_{t+1}\\mid s_t,a_t)\\Big) \\nabla_\\theta \\Big(\\prod_{t=1}^{n} \\pi_\\theta(a_t\\mid s_t)\\Big) G_\\tau.&quot;,&quot;id&quot;:&quot;FOGUHUIRQA&quot;}" data-component-name="LatexBlockToDOM"></div><p>But we don&#8217;t want to compute the product rule across &#8719;<sub>t=1,&#8230;,n</sub> &#960;<sub>&#952;</sub>(a<sub>t</sub> | s<sub>t</sub>). 
The classic way to get around that is using the log-trick, &#8711;f = f * &#8711;log(f):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\n\\nabla_\\theta J(\\theta)\n&amp;= \\sum_{\\tau}\n   \\Bigg(\\prod_{t=1}^{n} P(s_{t+1}\\mid s_t,a_t)\\Bigg)\n   \\Bigg(\\prod_{t=1}^{n} \\pi_\\theta(a_t\\mid s_t)\\Bigg)\n   \\nabla_\\theta \\sum_{t=1}^{n} \\log \\pi_\\theta(a_t\\mid s_t)\\, G_\\tau \\\\\n&amp;= \\mathbb{E}_{\\tau}\\Bigg[ \\sum_{t=1}^{n} \\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t)\\, G_\\tau \\Bigg]\n\\end{aligned}\n&quot;,&quot;id&quot;:&quot;KQRJKTTVWR&quot;}" data-component-name="LatexBlockToDOM"></div><p>Written as a single expectation over trajectories, this is the basic REINFORCE gradient estimator, which is unbiased but has high variance:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\nabla_\\theta J(\\theta) = \\mathbb{E}_\\tau \\Big[\\sum_{t=1}^{n} \\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t) G_\\tau\\Big]&quot;,&quot;id&quot;:&quot;OCMFBQHCIV&quot;}" data-component-name="LatexBlockToDOM"></div><h3><strong>The Causality Argument</strong></h3><p>We now justify focusing on the return from time t onward.
First expand the trajectory expectation as a tower of expectations:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\n\\mathbb{E}_{\\tau}[f(\\tau)]\n&amp;= \\sum_{\\tau}\n   \\Bigg(\\prod_{t=1}^{n} P(s_{t+1} \\mid s_t, a_t)\\Bigg)\n   \\Bigg(\\prod_{t=1}^{n} \\pi_\\theta(a_t \\mid s_t)\\Bigg)\n   f(\\tau) \\\\\n&amp;= \\mathbb{E}_{a_1 \\sim \\pi(\\cdot \\mid s_1)}\n   \\mathbb{E}_{s_2 \\sim P(\\cdot \\mid s_1, a_1)}\n   \\mathbb{E}_{a_2 \\sim \\pi(\\cdot \\mid s_2)} \\cdots\n   \\big[f(\\tau)\\big].\n\\end{aligned}&quot;,&quot;id&quot;:&quot;QBUBJIPYFS&quot;}" data-component-name="LatexBlockToDOM"></div><p>For any fixed t,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla_\\theta\\left[\\log \\pi_\\theta(a_t\\mid s_t)G_\\tau\\right] = \\nabla_\\theta\\left[\\log \\pi_\\theta(a_t\\mid s_t) \\big(\\text{const} + \\sum_{k=t}^{n} \\gamma^{k-1} r_k\\big)\\right]&quot;,&quot;id&quot;:&quot;DFPJFFTQSV&quot;}" data-component-name="LatexBlockToDOM"></div><p>and the &#8220;const&#8221; term depends only on (s<sub>1</sub>, a<sub>1</sub>), &#8230;, (s<sub>t-1</sub>, a<sub>t-1</sub>). Taking the conditional expectation over a<sub>t</sub> ~ &#960;<sub>&#952;</sub>( . | s<sub>t</sub>) and using the log-trick in reverse,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}_{a_t}\\big[\\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t)\\big] = \\nabla_\\theta \\sum_{a_t} \\pi_\\theta(a_t\\mid s_t) = \\nabla_\\theta 1 = 0&quot;,&quot;id&quot;:&quot;ZMCMGDVUCA&quot;}" data-component-name="LatexBlockToDOM"></div><p>so all terms prior to t vanish in expectation.
Define the Monte Carlo return from t:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; G_t := \\sum_{k=t}^{n} \\gamma^{k-1} r_k&quot;,&quot;id&quot;:&quot;DKTYZQEVUM&quot;}" data-component-name="LatexBlockToDOM"></div><p>then:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla_\\theta J(\\theta) = \\mathbb{E}_\\tau\\Big[\\sum_{t=1}^{n} \\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t) G_t\\Big]&quot;,&quot;id&quot;:&quot;ZQBYJZFNUF&quot;}" data-component-name="LatexBlockToDOM"></div><p>This argument is often called causality.</p><h3><strong>Implementing REINFORCE in PyTorch</strong></h3><p>In PyTorch, you don&#8217;t pass gradients directly. You define a loss built from PyTorch primitives. Anything that has parameters that you need to differentiate the loss with respect to must be written using PyTorch&#8217;s primitives. A common surrogate for the objective above is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; L(\\theta) := - \\sum_{t} \\log \\pi_\\theta(a_t\\mid s_t) G_t&quot;,&quot;id&quot;:&quot;VWKOCXPWIF&quot;}" data-component-name="LatexBlockToDOM"></div><p>which satisfies:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla_\\theta L(\\theta) = -\\nabla_\\theta J(\\theta)&quot;,&quot;id&quot;:&quot;RFQHAJEPFD&quot;}" data-component-name="LatexBlockToDOM"></div><p>As long as we sample trajectories in an unbiased way, we are optimizing with respect to an unbiased estimate of &#8711;<sub>&#952;</sub>&#8203;J(&#952;).</p><p>How do we represent &#960;(a | s) in continuous action spaces? We could try to build a model that takes (s, a) and outputs a probability, but a pdf must be non-negative and integrate to 1. Neural nets output arbitrary real numbers. With a discrete action set we could normalize with a softmax, but that does not extend to a continuum of actions. 
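Before turning to continuous actions, the surrogate loss above is a one-liner once you have returns-to-go. A plain-Python sketch (the helper names are mine; a real implementation would keep the log-probabilities as differentiable PyTorch tensors, and this uses the common &#947;^(k&#8722;t) returns-to-go convention, which rescales each term of the post's &#947;^(k&#8722;1) version by a constant):

```python
def returns_to_go(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed in one backward pass."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def reinforce_loss(log_probs, rewards, gamma):
    """Surrogate L(theta) = -sum_t log pi(a_t|s_t) * G_t.

    Minimizing this with autograd takes a step along the (estimated)
    policy gradient, since grad L = -grad J."""
    gs = returns_to_go(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, gs))
```

In training code, `log_probs` would come from the policy network's distribution for the sampled actions, so that backpropagating through the loss updates &#952;.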
Instead, we make the network output the parameters of a distribution, for example a Gaussian with mean &#956;<sub>&#952;</sub>&#8203;&#8203;(s) and scale &#963;<sub>&#952;</sub>&#8203;&#8203;(s), then sample from it.</p><p>For <em>Pendulum</em>, actions lie in (-2, 2). One method to output within those bounds: A tanh head gives (-1, 1), which we scale to (-2, 2).</p><p>We also need &#963; &gt; 0. Rather than predict &#963; directly, predict log(&#963;) and map it with exponentiation or softplus.</p><p>There are a ton of tricks like this to enforce bounds.</p><ul><li><p>Just use the raw head if you want to output across all of &#8477;</p></li><li><p>tanh or sigmoid if you want to keep it within a range and ensure it&#8217;s differentiable</p></li><li><p>Clip if you don&#8217;t care if it&#8217;s differentiable outside the range (common for log(&#963;))</p></li><li><p>Exponentiate or softplus to make it (0, inf)</p></li></ul><p>Typically in PyTorch, the module&#8217;s forward method returns deterministic parameters (&#956;, &#963;), and sampling happens in a separate method.</p><p>Still, even with a correct REINFORCE, convergence can be slow.</p><h2><strong>Baselines and the justification for A2C</strong></h2><p>From the original REINFORCE paper, subtracting a baseline B(s<sub>t</sub>) leaves the gradient unbiased:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla_\\theta J(\\theta) = \\mathbb{E}_\\tau \\Big[\\sum_t \\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t) (G_t - B(s_t))\\Big]&quot;,&quot;id&quot;:&quot;ZYWXLTFVJC&quot;}" data-component-name="LatexBlockToDOM"></div><p>Proof:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}_\\tau[\\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t) B(s_t)] = \\mathbb{E}_{s_t}\\left[ B(s_t) \\mathbb{E}_{a_t\\sim \\pi}[\\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t)] \\right] = 0&quot;,&quot;id&quot;:&quot;VGIYWGFVXF&quot;}" data-component-name="LatexBlockToDOM"></div><p>by the same reasoning as the causality argument above.</p><p>Baselines can reduce the variance of the gradient computation. Choosing B(s<sub>t</sub>) = V<sup>&#960;</sup>(s<sub>t</sub>) yields:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\nabla_\\theta J(\\theta) = \\mathbb{E}_\\tau \\Big[\\sum_t \\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t) (G_t - V^\\pi(s_t))\\Big]&quot;,&quot;id&quot;:&quot;WTZQGKSNET&quot;}" data-component-name="LatexBlockToDOM"></div><p>where A(s, a) = Q(s,a) - V<sup>&#960;</sup>(s) is called the &#8220;advantage&#8221;. Estimating V<sup>&#960;</sup> with another model, called a critic network, gives the actor&#8211;critic framework.</p><p>Practical notes from my final implementation for <em>Pendulum</em>:</p><ul><li><p>Full Monte Carlo returns had too much variance, so I used TD(0) targets for the critic.</p></li><li><p>I also found the algorithm was highly sensitive to &#947;. &#947;=0.99 did not converge, while &#947;=0.9 did. The learning rate for the optimizer barely mattered.</p></li><li><p>Finally, the log standard deviation was not learning properly, and the recommendation in this <a href="https://colab.research.google.com/github/MrSyee/pg-is-all-you-need/blob/master/01.A2C.ipynb">notebook</a> helped by using a softplus stabilization.</p></li></ul><h2><strong>From TRPO to PPO</strong></h2><h3><strong>Motivation: Reusing Data</strong></h3><p>Suppose you compute a batch of trajectories under &#960;<sub>old</sub> to estimate the policy gradient:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\nabla_\\theta J(\\theta) = \\mathbb{E}_\\tau\\Big[\\sum_t \\nabla_\\theta \\log \\pi_\\theta(a_t\\mid s_t) G_t\\Big]&quot;,&quot;id&quot;:&quot;JXYRCCTQJC&quot;}" data-component-name="LatexBlockToDOM"></div><p>All of that work gives you a single update to &#952;. After you update, the batch is no longer on-policy.
To reuse the data, you would need to reweight old observations so that expectations match those under &#960;<sub>new</sub>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\mathbb{E}_{\\tau\\sim \\pi_{\\text{new}}}[G_\\tau] = \\mathbb{E}_{\\tau\\sim \\pi_{\\text{old}}}\\Big[\\Big(\\prod_t \\frac{\\pi_{\\text{new}}(a_t\\mid s_t)}{\\pi_{\\text{old}}(a_t\\mid s_t)}\\Big) G_\\tau\\Big]&quot;,&quot;id&quot;:&quot;OZOREQGOIH&quot;}" data-component-name="LatexBlockToDOM"></div><p>but the product of ratios has high variance.</p><h3><strong>Performance Difference Lemma</strong></h3><p>To simplify things, we&#8217;re going to define the &#8220;discounted state visitation&#8221; distribution:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;d^\\pi(s) = (1-\\gamma) \\sum_{t=0}^\\infty \\gamma^t \\Pr(s_t=s \\mid \\pi)&quot;,&quot;id&quot;:&quot;IRJFKGRQLP&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then, as we&#8217;ll prove in this section, here&#8217;s what&#8217;s called the &#8220;performance difference lemma&#8221;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;J(\\pi_{\\text{new}}) - J(\\pi_{\\text{old}})\n= \\frac{1}{1 - \\gamma} \\,\n\\mathbb{E}_{s \\sim d^{\\pi_{\\text{new}}},\\, a \\sim \\pi_{\\text{new}}}\n\\big[ A^{\\pi_{\\text{old}}}(s, a) \\big].&quot;,&quot;id&quot;:&quot;GSNVOTUYWR&quot;}" data-component-name="LatexBlockToDOM"></div><p>Notice you&#8217;re taking an expectation over &#960;<sub>new</sub>, but you&#8217;re computing the advantages based on &#960;<sub>old</sub>.</p><p>First, note that:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;J(\\pi) = \\mathbb{E}_{s_0}\\big[ V^{\\pi}(s_0) \\big]&quot;,&quot;id&quot;:&quot;VNNYBRQOTM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\nJ(\\pi_{\\text{new}}) - 
J(\\pi_{\\text{old}})\n&amp;= \\mathbb{E}_{s_0}\\!\\left[ V^{\\pi_{\\text{new}}}(s_0) - V^{\\pi_{\\text{old}}}(s_0) \\right] \\\\\n&amp;= \\mathbb{E}_{s_0,\\,a_0 \\sim \\pi_{\\text{new}}(\\cdot \\mid s_0)}\\!\n   \\left[ r(s_0,a_0) + \\gamma \\,\\mathbb{E}_{s_1 \\sim P(\\cdot \\mid s_0,a_0)} \\big[ V^{\\pi_{\\text{new}}}(s_1) \\big]\n          - V^{\\pi_{\\text{old}}}(s_0) \\right]\n\\end{aligned}&quot;,&quot;id&quot;:&quot;GPLHXBUAUW&quot;}" data-component-name="LatexBlockToDOM"></div><p>Now, we&#8217;ll use an add &amp; subtract trick to expose the advantage within the expectation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\n&amp;= \\mathbb{E}_{s_0,\\,a_0 \\sim \\pi_{\\text{new}}}\\!\n   \\left[\n      \\underbrace{ r(s_0,a_0) + \\gamma \\,\\mathbb{E}_{s_1}[ V^{\\pi_{\\text{old}}}(s_1) ] - V^{\\pi_{\\text{old}}}(s_0) }_{=\\,A^{\\pi_{\\text{old}}}(s_0,a_0)}\n   \\right] \\\\\n&amp;\\qquad + \\gamma \\,\\mathbb{E}_{s_0,\\,a_0 \\sim \\pi_{\\text{new}},\\,s_1 \\sim P(\\cdot \\mid s_0,a_0)}\n   \\left[ V^{\\pi_{\\text{new}}}(s_1) - V^{\\pi_{\\text{old}}}(s_1) \\right] \\\\\n&amp;= \\mathbb{E}_{s_0,\\,a_0 \\sim \\pi_{\\text{new}}}\\!\\left[ A^{\\pi_{\\text{old}}}(s_0,a_0) \\right]\n   \\;+\\; \\gamma \\,\\mathbb{E}_{s_1,\\,a_1 \\sim \\pi_{\\text{new}}}\\!\\left[ V^{\\pi_{\\text{new}}}(s_1) - V^{\\pi_{\\text{old}}}(s_1) \\right]\n\\end{aligned}&quot;,&quot;id&quot;:&quot;WTWQLPGOLL&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is useful because we can write:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; J(\\theta_{\\text{new}}) = J(\\theta_{\\text{old}}) + \\frac{1}{1-\\gamma} \\mathbb{E}_{s\\sim d^{\\pi_{\\text{new}}}, a\\sim \\pi_{\\text{new}}}\\big[A^{\\pi_{\\text{old}}}(s, a)\\big]&quot;,&quot;id&quot;:&quot;HWCCPQMMNM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Maximizing J(&#952;<sub>new</sub>) amounts to maximizing:</p><div class="latex-rendered"
data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}_{s\\sim d^{\\pi_{\\text{new}}}, a\\sim \\pi_{\\text{new}}}[A^{\\pi_{\\text{old}}}(s,a)]&quot;,&quot;id&quot;:&quot;BODYDKXEJQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>since the first term is a constant and 1/(1-&#947;) is a scale. The issue is that this expectation relies on &#960;<sub>new</sub> (in the distributions of both the actions and the states), which would require resampling.</p><p>In theory we could importance weight both the state distribution and the action distribution:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\mathbb{E}_{s\\sim d^{\\pi_{\\text{new}}}, a\\sim \\pi_{\\text{new}}}[A^{\\pi_{\\text{old}}}(s,a)] = \\mathbb{E}_{s\\sim d^{\\pi_{\\text{old}}}, a\\sim \\pi_{\\text{old}}} \\left[\\frac{d^{\\pi_{\\text{new}}}(s)}{d^{\\pi_{\\text{old}}}(s)} \\cdot \\frac{\\pi_{\\text{new}}(a\\mid s)}{\\pi_{\\text{old}}(a\\mid s)} \\cdot A^{\\pi_{\\text{old}}}(s,a)\\right]&quot;,&quot;id&quot;:&quot;MBIOOQCNTS&quot;}" data-component-name="LatexBlockToDOM"></div><p>We do not have access to d<sup>&#960;</sup>. Instead, TRPO assumes:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;d^{\\pi_{\\text{new}}} \\sim d^{\\pi_{\\text{old}}}&quot;,&quot;id&quot;:&quot;HLMBTCODLU&quot;}" data-component-name="LatexBlockToDOM"></div><p>While we cannot enforce this directly, we can bound |d<sup>&#960;_new</sup> - d<sup>&#960;_old</sup>| by controlling a policy divergence. We bound the following expression:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_s D_{\\mathrm{KL}}(\\pi_{\\text{old}}(\\cdot\\mid s)|\\pi_{\\text{new}}(\\cdot\\mid s))&quot;,&quot;id&quot;:&quot;NICOJAQFLT&quot;}" data-component-name="LatexBlockToDOM"></div><p>which allows the authors to get a lower bound on J(&#960;<sub>new</sub>). 
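For discrete action distributions, the per-state KL term being bounded here is straightforward to compute directly; a minimal sketch (toy distributions, not from the post):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_a p(a) * log(p(a) / q(a)) for two discrete
    action distributions over the same support; terms with p(a) = 0
    contribute nothing."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0.0)
```

Note the asymmetry: D_KL(p || q) generally differs from D_KL(q || p), which is why the direction of the divergence in the trust-region constraint matters.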
In practice we constrain the expected KL under d<sup>&#960;_old</sup>, which is tractable.</p><p>With the state distribution approximated as unchanged, the remaining scaling &#960;<sub>new</sub>/&#960;<sub>old </sub>is called &#8220;importance sampling.&#8221; TRPO&#8217;s final surrogate and constraint become:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_{\\theta} \\;\n\\mathbb{E}_{s \\sim d^{\\pi_{\\text{old}}},\\, a \\sim \\pi_{\\text{old}}}\n\\!\\left[\n\\frac{\\pi_{\\theta}(a \\mid s)}{\\pi_{\\text{old}}(a \\mid s)} \\,\nA^{\\pi_{\\text{old}}}(s,a)\n\\right]\n\\quad\n\\text{s.t.} \\quad\n\\mathbb{E}_{s \\sim d^{\\pi_{\\text{old}}}}\n\\!\\big[\nD_{\\mathrm{KL}}\\big(\n\\pi_{\\text{old}}(\\cdot \\mid s)\n\\;\\|\\;\n\\pi_{\\theta}(\\cdot \\mid s)\n\\big)\n\\big]\n\\;\\le\\;\n\\delta&quot;,&quot;id&quot;:&quot;UBMADYAUPI&quot;}" data-component-name="LatexBlockToDOM"></div><h2><strong>Proximal Policy Optimization (PPO)</strong></h2><h3><strong>PPO-Penalty</strong></h3><p>The first variant of PPO comes directly from the Lagrangian of the TRPO objective with &#955; &gt; 0:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathcal{L}(\\theta,\\lambda) = \\mathbb{E}\\left[\\frac{\\pi_\\theta}{\\pi_{\\text{old}}} * A\\right] - \\lambda\\Big(\\mathbb{E}[D_{\\mathrm{KL}}(\\pi_{\\text{old}}|\\pi_\\theta)] - \\delta\\Big).&quot;,&quot;id&quot;:&quot;ZOLXZMBJZL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Dropping the constant &#955; * &#948; and defining &#946; := &#955;, we arrive at the standard form:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_\\theta \\mathbb{E}\\left[\\frac{\\pi_\\theta}{\\pi_{\\text{old}}} * A\\right] - \\beta *  \\mathbb{E}\\left[D_{\\mathrm{KL}}(\\pi_{\\text{old}}|\\pi_\\theta)\\right]&quot;,&quot;id&quot;:&quot;CZVQCXTHSM&quot;}" data-component-name="LatexBlockToDOM"></div><p>In practice, &#946; is adapted so the 
empirical KL stays close to &#948;.</p><h3><strong>PPO-Clip</strong></h3><p>PPO-Clip takes a slightly different approach to staying within the trust region. Consider the importance ratio for a single sample:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; r_t(\\theta) := \\frac{\\pi_\\theta(a_t\\mid s_t)}{\\pi_{\\text{old}}(a_t\\mid s_t)}&quot;,&quot;id&quot;:&quot;PWXCOCJMNN&quot;}" data-component-name="LatexBlockToDOM"></div><p>Instead of constraining the mean KL, we can remove the incentive for &#960;<sub>new</sub> to deviate wildly from &#960;<sub>old</sub> in the first place. Intuitively, if &#960;<sub>new</sub> stays close to &#960;<sub>old</sub> at every sample, a bound on the KL divergence follows. If we could enforce per-sample constraints, we would maximize:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;r_t A_t \\quad\\text{s.t.}\\quad 1-\\varepsilon \\le r_t \\le 1+\\varepsilon&quot;,&quot;id&quot;:&quot;LEQVLXNZEM&quot;}" data-component-name="LatexBlockToDOM"></div><p>But it&#8217;s hard to jointly impose that many constraints over a single &#952;. Instead, PPO modifies the <em>objective</em> so there is no incentive to push r<sub>t</sub> outside the interval.
A naive attempt is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}\\big[\\mathrm{clip}(r_t,1-\\varepsilon,1+\\varepsilon) * A_t\\big]&quot;,&quot;id&quot;:&quot;SJJGMFWZPD&quot;}" data-component-name="LatexBlockToDOM"></div><p>But this surrogate can overestimate the true objective in two cases:</p><ul><li><p>A<sub>t </sub>&lt; 0 and r<sub>t </sub>&gt; 1 + &#949; where the penalty is capped at (1 + &#949;) * A<sub>t</sub> but should be more negative, and</p></li><li><p>A<sub>t </sub>&gt; 0 and r<sub>t </sub>&lt; 1 - &#949; where the reward should be smaller than (1 - &#949;) * A<sub>t</sub>.</p></li></ul><p>We need the surrogate to only underestimate the true objective, because that ensures that maximizing the surrogate also maximizes a lower bound on the objective. The conservative surrogate fixes both by lower bounding the unclipped objective:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; L_{\\text{clip}}(\\theta) = \\mathbb{E}\\big[\\min\\big(r_t(\\theta)A_t, \\mathrm{clip}(r_t(\\theta),1-\\varepsilon,1+\\varepsilon) * A_t\\big)\\big]&quot;,&quot;id&quot;:&quot;GSPVMIUYVG&quot;}" data-component-name="LatexBlockToDOM"></div><p>This discourages large deviations from &#960;<sub>old</sub>. In practice we also track the mean KL over the batch and stop early if it exceeds the target &#948;. And that&#8217;s the second variant of PPO.</p><h2><strong>Scaling</strong></h2><p>The policies above assume trajectories are sampled on-policy from the current &#960;. At scale, actors may be lagged or the data may be offline.</p><p>In future posts, I plan to cover off-policy methods such as DDPG, TD3, and SAC. 
I also plan to write a primer on incorporating human feedback using GRPO and non-RL approaches like DPO.</p><p>If you liked this material and want a reference for these algorithms and more, I recommend: <a href="https://lilianweng.github.io/posts/2018-04-08-policy-gradient/">Lilian Weng&#8217;s overview of policy gradients</a></p>]]></content:encoded></item><item><title><![CDATA[What You Didn’t Learn in Berkeley CS 188 — Part 1]]></title><description><![CDATA[Why isn&#8217;t there model-free policy iteration?]]></description><link>https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Sat, 04 Oct 2025 02:00:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rxn4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Berkeley&#8217;s CS 188 covers many important foundations of reinforcement learning. But there&#8217;s still a gap between what&#8217;s taught in that undergraduate course and the baseline expected if you&#8217;re working in the field.</p><p><a href="https://inst.eecs.berkeley.edu/~cs188/su24/">Berkeley&#8217;s course</a> covers, in no particular order:</p><ul><li><p>Basic search algorithms</p></li><li><p>Constraint satisfaction problems (CSPs)</p></li><li><p>Minimax (with alpha-beta pruning)</p></li><li><p>Bayes nets</p></li><li><p>Markov Decision Process (MDP) definition</p></li><li><p>Policy iteration</p></li><li><p>Q-learning (and some variations)</p></li></ul><p>This material is foundational, but the way it&#8217;s taught often feels fragmented. My goal here is to reorganize the basics into a clearer ontology that naturally sets up modern, continuous-control methods. 
The information hierarchy, I&#8217;d argue, could be sharper than what&#8217;s presented in CS 188.</p><p>This post is the first part in a series on reinforcement learning. Later posts will cover <a href="https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-b29">continuous control</a>, <a href="https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-9b3">off-policy methods</a>, and <a href="https://www.neelsomaniblog.com/p/what-you-didnt-learn-in-berkeley-242">RL for post-training</a>.</p><h2>CS 188 Recap: Markov Decision Process Definition</h2><p>If you already remember the basics from 188, you can skip this section. Reinforcement learning is typically formalized as a Markov Decision Process (MDP). The MDP specifies the environment:</p><ul><li><p><strong>States S</strong>: possible configurations of the world.</p></li><li><p><strong>Actions A</strong>: moves the agent can take.</p></li><li><p><strong>Transitions P(s&#8217; | s, a)</strong>: probability of landing in s&#8217; after taking action a in s.</p></li><li><p><strong>Rewards R(s, a, s&#8217;)</strong>: immediate payoff for (s, a) &#8594; s&#8217;.</p></li><li><p><strong>Discount &#947; &#8712; [0, 1)</strong>: how much you value the future.</p></li></ul><p>Those five define the problem itself. On the agent side, we define constructs that depend on the MDP:</p><ul><li><p><strong>Policy &#960;</strong>: a mapping from states to actions.</p></li><li><p><strong>Value function V<sup>&#960;</sup>(s)</strong>: expected discounted return from state s under &#960;.</p></li><li><p><strong>Q-function Q<sup>&#960;</sup>(s, a)</strong>: expected return from (s, a) under &#960;. If no &#960; is specified, then Q refers to the Q-value estimates that we have established so far.</p></li></ul><h2>A Clearer Ontology</h2><p>CS 188 distinguishes &#8220;model-based&#8221; vs. 
&#8220;model-free&#8221; methods:</p><ul><li><p><strong>Model-based</strong>: assumes access to the transition probabilities and rewards (P and R).</p></li><li><p><strong>Model-free</strong>: learns from sampled experience without ever observing P or R directly.</p></li></ul><p>Another useful axis is &#8220;policy-based&#8221; vs. &#8220;value-based&#8221;:</p><ul><li><p><strong>Policy-based</strong>: directly solve for the optimal policy, then improve the value estimates by following that policy, repeating until convergence. (Many methods also use value estimates as baselines or for other purposes, e.g. actor-critic.)</p></li><li><p><strong>Value-based</strong>: solve V or Q directly until convergence, using the greedy policy that maximizes the expected value of the next state:</p></li></ul><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\pi(s) := \\arg\\max_a Q(s,a) = \\sum_{s&#8217;} P(s&#8217; \\mid s,a) *\\big(R(s,a,s&#8217;) + \\gamma V(s&#8217;)\\big)&quot;,&quot;id&quot;:&quot;GFBHAIJXZO&quot;}" data-component-name="LatexBlockToDOM"></div><p>So we have two orthogonal axes: model-based vs. model-free, and value-based vs. policy-based. 
Together they give us a 2-by-2 view of classical RL methods:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rxn4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rxn4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rxn4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rxn4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rxn4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rxn4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg" width="1456" height="909" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:909,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:159911,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.neelsomaniblog.com/i/175240248?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rxn4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rxn4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rxn4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rxn4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98848110-b17b-4c6b-9727-4416a6797ff7_2531x1580.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is the ontology I propose. The last quadrant, model-free policy iteration, is the most interesting, and we&#8217;ll work our way toward it.</p><h2>Value Iteration</h2><p>Value iteration iteratively updates the value function until convergence. When you know P and R, the &#8220;Bellman optimality update&#8221; is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V(s) \\leftarrow \\max_a \\sum_{s&#8217;} P(s&#8217; \\mid s,a) * \\big(R(s,a,s&#8217;) + \\gamma V(s&#8217;)\\big)&quot;,&quot;id&quot;:&quot;FBFCIHPTCD&quot;}" data-component-name="LatexBlockToDOM"></div><p>In other words, the value of a state is the maximum expected reward + discounted value of the next state. It&#8217;s implicitly summing an infinite discounted series. 
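To make the backup concrete, here is a minimal sketch of tabular value iteration in Python. The two-state, two-action MDP (and all of its transitions and rewards) is a toy example of my own, not one from the course:

```python
# Tabular value iteration on a hypothetical two-state, two-action MDP.
# P[s][a] maps next states to probabilities; R[s][a][s2] is the reward.
# Action 1 jumps to state 1 (reward 1 from state 0, reward 2 from state 1);
# action 0 returns to state 0 with reward 0.
GAMMA = 0.9

P = {0: {0: {0: 1.0}, 1: {1: 1.0}},
     1: {0: {0: 1.0}, 1: {1: 1.0}}}
R = {0: {0: {0: 0.0}, 1: {1: 1.0}},
     1: {0: {0: 0.0}, 1: {1: 2.0}}}

def bellman_update(V):
    # V(s) <- max_a sum_s' P(s'|s,a) * (R(s,a,s') + GAMMA * V(s'))
    return {s: max(sum(p * (R[s][a][s2] + GAMMA * V[s2])
                       for s2, p in P[s][a].items())
                   for a in P[s])
            for s in P}

V = {s: 0.0 for s in P}
for _ in range(1000):
    V = bellman_update(V)
```

With gamma = 0.9, V converges to V(0) = 19 and V(1) = 20: the self-loop in state 1 earns reward 2 forever, summing to 2 / (1 - 0.9) = 20.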
If V* is the fixed point of the update operator above, then:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^*(s) = \\mathbb{E}[R(s,a,s')] + \\gamma \\mathbb{E}[R(s',a',s'')] + \\gamma^2 \\mathbb{E}[R(s'',a'',s''')] + \\cdots&quot;,&quot;id&quot;:&quot;APLJIKBTJP&quot;}" data-component-name="LatexBlockToDOM"></div><h3>Bellman Operator and Contraction</h3><p>It makes sense that if we can solve for the fixed point V*, then we can define the optimal policy by greedily following whichever action maximizes the expected value of the next state. But how do we prove that the iterative process above actually converges?</p><p>To do so, we define the &#8220;Bellman optimality operator&#8221; (T) - the new value function if you perform a single iteration above:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;(TV)(s) = \\max_a \\sum_{s'} P(s' \\mid s,a) * \\big(R(s,a,s') + \\gamma V(s')\\big).&quot;,&quot;id&quot;:&quot;RLZAUTZYBM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then for any two value functions V and W:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\n|(TV)(s) - (TW)(s)|\n&amp;= \\Big| \\max_{a} \\mathbb{E}_{s'}[R(s,a,s') + \\gamma V(s')] \n    - \\max_{a} \\mathbb{E}_{s'}[R(s,a,s') + \\gamma W(s')] \\Big| \\\\\n&amp;\\leq \\max_{a} \\Big| \\mathbb{E}_{s'}[R(s,a,s') + \\gamma V(s')] \n                 - \\mathbb{E}_{s'}[R(s,a,s') + \\gamma W(s')] \\Big| \\\\\n&amp;= \\max_{a} \\Big| \\mathbb{E}_{s'}[\\gamma (V(s') - W(s'))] \\Big| \\\\\n&amp;\\leq \\gamma \\max_{a} \\mathbb{E}_{s'}\\big[|V(s') - W(s')|\\big] \\\\\n&amp;\\leq \\gamma \\max_{s'} |V(s') - W(s')| \\\\\n&amp;= \\gamma \\, \\|V - W\\|_\\infty.\n\\end{align*}&quot;,&quot;id&quot;:&quot;QQNCWUATZY&quot;}" data-component-name="LatexBlockToDOM"></div><p>And since we didn&#8217;t specify which s:</p><div class="latex-rendered"
data-attrs="{&quot;persistentExpression&quot;:&quot;\\quad \\|TV - TW\\|_\\infty \\;\\leq\\; \\gamma \\, \\|V - W\\|_\\infty.&quot;,&quot;id&quot;:&quot;BILCEHCEFC&quot;}" data-component-name="LatexBlockToDOM"></div><p>where that infinity operator refers to the maximum distance between any two states. In other words, when you apply the Bellman update operator to any two value functions, the resulting value functions are closer together.</p><p>That matters because you can now show that T must converge to a single fixed point. The proof is simple. Assume that there are two possible fixed points. Then the update above moves them closer together, meaning that there cannot be any non-zero distance between the points. To show existence, we need to know that if you keep getting closer and closer to some limit, then that limit is still a valid value function. For finite state spaces this is obvious because value functions are just real vectors, and in R<sup>n</sup> every Cauchy sequence converges. This is called the &#8220;Banach fixed point theorem&#8221;.</p><h3>Iterative Expansion</h3><p>Notice that after one application:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;TV(s) = \\mathbb{E}_{s&#8217;}[R(s,a_1,s&#8217;)] + \\gamma \\mathbb{E}_{s&#8217;}[V(s&#8217;)]&quot;,&quot;id&quot;:&quot;AJVCKCBNDP&quot;}" data-component-name="LatexBlockToDOM"></div><p>where a<sub>1 </sub>is the action recommended by the greedy policy given V. After two iterations:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;T^2V(s) = \\mathbb{E}_{s''}[R(s,a_2,s'')] + \\gamma \\mathbb{E}_{s'}[R(s,a_1,s')] + \\gamma^2 \\mathbb{E}_{s'}[V(s')]&quot;,&quot;id&quot;:&quot;XZJAEXEGQH&quot;}" data-component-name="LatexBlockToDOM"></div><p>Each step adds one more discounted term. Early actions can be wrong, but their contribution shrinks geometrically. Eventually you converge to V*. 
In practice, you can truncate after k terms.</p><h2>Policy Iteration</h2><p>Now the policy-based, model-based quadrant. Unlike value iteration, here we separate policy evaluation from policy improvement:</p><p>1. <strong>Policy evaluation</strong>: solve</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^{\\pi}(s) = \\mathbb{E}_{s&#8217;}\\big[R(s,\\pi(s),s&#8217;) + \\gamma V^{\\pi}(s&#8217;)\\big]&quot;,&quot;id&quot;:&quot;PHCFZLMIVL&quot;}" data-component-name="LatexBlockToDOM"></div><p>Unlike value iteration, this is a linear system of equations that you can invert directly, since &#960; is known.</p><p>2. <strong>Policy improvement</strong>: set</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\pi&#8217;(s) = \\arg\\max_a \\mathbb{E}_{s&#8217;}\\big[R(s,a,s&#8217;) + \\gamma V^{\\pi}(s&#8217;)\\big]&quot;,&quot;id&quot;:&quot;BVTEABXJKL&quot;}" data-component-name="LatexBlockToDOM"></div><p>And repeat until convergence.</p><h3>Proof of Convergence</h3><p>The first step is showing that V(s) can only increase for any given s.</p><p>Define T<sup>&#960;</sup> as the &#8220;Bellman expectation operator,&#8221; which updates a state-value function V in the &#8220;direction of&#8221; &#960;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;(T^\\pi V)(s) = \\mathbb{E}_{s&#8217;}[R(s,\\pi(s),s&#8217;) + \\gamma V(s&#8217;)]&quot;,&quot;id&quot;:&quot;NSWCEQPDVA&quot;}" data-component-name="LatexBlockToDOM"></div><p>(Note that V might not have been generated by following &#960;.) By definition, V<sup>&#960;</sup> = T<sup>&#960;</sup> V<sup>&#960;</sup>. Notice that if you apply T<sup>&#960;&#8217;</sup> enough times to any value function V, you&#8217;ll end up with V<sup>&#960;&#8217;</sup>. If &#960;&#8217; is greedy w.r.t.
V<sup>&#960;</sup>, then</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^\\pi \\le T^{\\pi&#8217;}V^\\pi \\le (T^{\\pi&#8217;})^2 V^\\pi \\le \\dots \\to V^{\\pi&#8217;}.&quot;,&quot;id&quot;:&quot;RONLDRDRYX&quot;}" data-component-name="LatexBlockToDOM"></div><p>since after enough iterations of T<sup>&#960;&#8217;</sup>, the policy becomes &#960;&#8217;. The first inequality holds because we are using the same state-values, and only deviating if the new action leads to a higher state-value. The second inequality is more subtle, but it relies on the monotonicity of the Bellman update operator: if you have two value functions with V(s) &#8805; U(s) for every s, then T<sup>&#960;</sup>V(s) &#8805; T<sup>&#960;</sup>U(s) for every s. This result comes directly from the Bellman operator definition if you compare each term of the expression. We&#8217;re just applying this property to the last two terms of the inequality to produce the next term, infinitely many times.</p><p>So each greedy improvement can only increase value. Since there are finitely many deterministic policies (|A|<sup>|S|</sup>), this process must terminate eventually.</p><h2>Model-Free Value Iteration (Q-Learning)</h2><p>In the real world, we might not know P and R. We just have to start taking actions and figuring it out empirically. This leads us to the &#8220;model-free&#8221; methods.
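For contrast with the model-free methods that follow, the model-based policy iteration loop above can be sketched in plain Python. The two-state MDP is a hypothetical toy of my own; the evaluation step solves the linear system (I - gamma * P_pi) V = r_pi exactly, as described above:

```python
# Model-based policy iteration on a hypothetical two-state, two-action MDP.
# Action 1 jumps to state 1 (reward 1 from state 0, reward 2 from state 1);
# action 0 returns to state 0 with reward 0.
GAMMA = 0.9
N = 2  # number of states
P = [[{0: 1.0}, {1: 1.0}], [{0: 1.0}, {1: 1.0}]]  # P[s][a] -> {s2: prob}
R = [[{0: 0.0}, {1: 1.0}], [{0: 0.0}, {1: 2.0}]]  # R[s][a] -> {s2: reward}

def evaluate(pi):
    # Exact policy evaluation: solve (I - GAMMA * P_pi) V = r_pi directly.
    A = [[(1.0 if i == j else 0.0) - GAMMA * P[i][pi[i]].get(j, 0.0)
          for j in range(N)] for i in range(N)]
    b = [sum(p * R[s][pi[s]][s2] for s2, p in P[s][pi[s]].items())
         for s in range(N)]
    for col in range(N):               # Gaussian elimination (fine at toy sizes)
        for row in range(col + 1, N):
            f = A[row][col] / A[col][col]
            A[row] = [x - f * y for x, y in zip(A[row], A[col])]
            b[row] -= f * b[col]
    V = [0.0] * N
    for row in reversed(range(N)):     # back substitution
        V[row] = (b[row] - sum(A[row][j] * V[j]
                               for j in range(row + 1, N))) / A[row][row]
    return V

def improve(V):
    # Greedy policy improvement with respect to the current values.
    return [max(range(2), key=lambda a: sum(p * (R[s][a][s2] + GAMMA * V[s2])
                                            for s2, p in P[s][a].items()))
            for s in range(N)]

pi = [0, 0]
while True:
    V = evaluate(pi)
    new_pi = improve(V)
    if new_pi == pi:
        break
    pi = new_pi
```

On this toy problem the loop terminates after a single improvement step, with pi = [1, 1] and V = [19, 20].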
We can&#8217;t directly compute the fixed point from before, because it relies on knowing P and the true reward function R:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;V^{*}(s) = \\max_a \\sum_{s&#8217;} P(s&#8217; \\mid s,a) * \\big(R(s,a,s&#8217;) + \\gamma V^{*}(s&#8217;)\\big)&quot;,&quot;id&quot;:&quot;KWACJIOHNV&quot;}" data-component-name="LatexBlockToDOM"></div><p>So instead we define a new fixed point, which relies on a value function for each state-action pair:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q^*(s,a) = \\mathbb{E}[R(s,a,s&#8217;)] + \\gamma \\mathbb{E}[\\max_{a&#8217;} Q^*(s&#8217;,a&#8217;)]&quot;,&quot;id&quot;:&quot;YXULIBIRMR&quot;}" data-component-name="LatexBlockToDOM"></div><p>In other words, the value of a state-action pair is the expected reward, plus the discounted value of the best state-action pair available in the next state.</p><p>Naively you might try approximating this by gathering observations:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q(s,a) \\leftarrow r + \\gamma \\max_{a&#8217;} Q(s&#8217;,a&#8217;)&quot;,&quot;id&quot;:&quot;OUQZTFWUXH&quot;}" data-component-name="LatexBlockToDOM"></div><p>But that doesn&#8217;t quite work. r is noisy - it varies based on which s&#8217; you land in. You need an averaging scheme. The natural instinct might be to use a sample mean:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q(s,a) \\leftarrow \\tfrac{1}{t+1}[r+\\gamma\\max_{a&#8217;}Q(s&#8217;,a&#8217;)] + \\tfrac{t}{t+1}Q(s,a)&quot;,&quot;id&quot;:&quot;PJFLWBHGHR&quot;}" data-component-name="LatexBlockToDOM"></div><p>This would work if the learning target were stationary, but it isn&#8217;t: Q itself is changing, so early samples are inaccurate.
Instead, we use an exponential moving average (EMA):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q(s,a) \\leftarrow \\alpha[r+\\gamma\\max_{a&#8217;}Q(s&#8217;,a&#8217;)] + (1-\\alpha)Q(s,a)&quot;,&quot;id&quot;:&quot;ZEJRYSWOQM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Iterate, slowly lower alpha, and the process converges. Each update adds another discounted term, just like value iteration.</p><h3>Proof of Convergence</h3><p>This is trickier than proving convergence for the model-based methods, because now it&#8217;s stochastic. The standard proof defines each update as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q_{t+1}(s_t,a_t)-Q_t(s_t,a_t) = \\alpha [TQ_t + M_{t+1} - Q_t(s_t,a_t)]&quot;,&quot;id&quot;:&quot;EVASGJECEC&quot;}" data-component-name="LatexBlockToDOM"></div><p>where M is zero-mean noise. Ignoring noise, this looks like:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{Q_{t+1}-Q_t}{\\alpha} = TQ_t - Q_t&quot;,&quot;id&quot;:&quot;VINWVLQJGP&quot;}" data-component-name="LatexBlockToDOM"></div><p>Then they define a continuous version of Q called q, expressed with respect to a new time variable tau:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\tau=\\sum\\alpha&quot;,&quot;id&quot;:&quot;CGLCUDHGOJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>In the limit as alpha goes to 0,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{dq}{d\\tau} = Tq - q&quot;,&quot;id&quot;:&quot;HWEUWBIVDD&quot;}" data-component-name="LatexBlockToDOM"></div><p>This ODE is used to demonstrate convergence.
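Setting the convergence machinery aside, the EMA update itself is only a few lines. The two-state environment, the constants, and the epsilon-greedy exploration below are all illustrative assumptions of mine:

```python
import random

# Tabular Q-learning with the EMA update, on a hypothetical two-state MDP:
# action 1 jumps to state 1 (reward 1 from state 0, reward 2 from state 1);
# action 0 returns to state 0 with reward 0. Constants are illustrative.
random.seed(0)
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.2

def step(s, a):
    # Environment dynamics: returns (next_state, reward). Deterministic here,
    # but the agent never reads P or R directly - it only samples transitions.
    return (1, 1.0 if s == 0 else 2.0) if a == 1 else (0, 0.0)

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
s = 0
for _ in range(20000):
    # Epsilon-greedy behavior so every state-action pair keeps being visited.
    if random.random() < EPS:
        a = random.choice((0, 1))
    else:
        a = max((0, 1), key=lambda act: Q[(s, act)])
    s2, r = step(s, a)
    target = r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] = ALPHA * target + (1 - ALPHA) * Q[(s, a)]  # the EMA update
    s = s2
```

Under these assumed dynamics the true fixed point has Q*(1, 1) = 20 and Q*(0, 1) = 19, and the estimates land near those values without the agent ever reading P or R.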
The full analysis is beyond the scope of this post.</p><h2>Model-Free Policy Iteration</h2><p>Now let&#8217;s try to apply the same methodology to policy iteration, just as Q-learning stochastically approximated value iteration.</p><p>In principle, we could evaluate Q<sup>&#960;</sup> by sampling, then improve &#960;, and repeat. But exact evaluation by sampling is very slow and requires waiting for convergence each round. Worse, unlike the model-based case, you can&#8217;t invert the system of equations for V<sup>&#960;</sup>. So the advantages of policy iteration vanish in the model-free setting.</p><p>A practical compromise is approximate policy iteration: run only k evaluation steps before improving the policy. This weakens convergence guarantees. With k=1, you get a popular method called SARSA.</p><p>SARSA follows the action that Q recommends most of the time, but with probability epsilon takes a random action - an &#8220;epsilon-greedy policy&#8221;:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\pi(a \\mid s) =\n\\begin{cases}\n1 - \\varepsilon + \\dfrac{\\varepsilon}{|A|}, &amp; a = \\arg\\max_{a'} Q(s, a') \\\\[6pt]\n\\dfrac{\\varepsilon}{|A|}, &amp; \\text{otherwise.}\n\\end{cases}&quot;,&quot;id&quot;:&quot;RVXLITCVAR&quot;}" data-component-name="LatexBlockToDOM"></div><p>We update with:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Q(s_t,a_t) \\leftarrow (1-\\alpha)Q(s_t,a_t) + \\alpha\\big[r(s_t,a_t,s_{t+1}) + \\gamma Q(s_{t+1},a_{t+1})\\big]&quot;,&quot;id&quot;:&quot;VTJZAJATHN&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is called &#8220;TD(0)&#8221;, or temporal-difference learning with lambda equal to 0. Temporal-difference learning allows us to extend further.
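As a minimal sketch of the epsilon-greedy SARSA update above (the two-state environment and constants are my own toy assumptions, not from the post):

```python
import random

# SARSA (the k = 1 approximate policy iteration described above) on a
# hypothetical two-state MDP: action 1 jumps to state 1 (reward 1 from
# state 0, reward 2 from state 1); action 0 returns to state 0 with reward 0.
random.seed(0)
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1

def step(s, a):
    # Environment dynamics, sampled rather than known to the agent.
    return (1, 1.0 if s == 0 else 2.0) if a == 1 else (0, 0.0)

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

def policy(s):
    # Epsilon-greedy policy derived from the current Q estimates.
    if random.random() < EPS:
        return random.choice((0, 1))
    return max((0, 1), key=lambda act: Q[(s, act)])

s = 0
a = policy(s)
for _ in range(30000):
    s2, r = step(s, a)
    a2 = policy(s2)  # on-policy: bootstrap with the action we will actually take
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * Q[(s2, a2)])
    s, a = s2, a2
```

Because SARSA is on-policy, it converges toward the value of the epsilon-greedy policy itself, so the learned Q-values sit slightly below the optimal ones - but the greedy policy they induce still picks action 1 in both states.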
Instead of just one-step lookahead, you can mix multiple n-step returns G<sub>n</sub>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;G_t^{(\\lambda)} = (1-\\lambda)\\sum_{n=1}^\\infty \\lambda^{n-1} G_n&quot;,&quot;id&quot;:&quot;XALAGQMDMI&quot;}" data-component-name="LatexBlockToDOM"></div><p>In practice, computing this infinite sum is approximated dynamically. Lots of simple algebra goes into deriving the recursion (eligibility traces), but I&#8217;ll skip it here.</p><h2>Wrap-Up</h2><p>This is why you don&#8217;t really see a neat &#8220;model-free policy iteration.&#8221; Without P and R, exact evaluation is gone, and requiring near-converged sample evaluation before every improvement is prohibitively inefficient.</p><p>Is that all we need to know for reinforcement learning? Unfortunately the methods above don&#8217;t work in many domains. They require not &#8220;too big&#8221; of a state-action table, and there&#8217;s no way to handle continuous action spaces. I&#8217;ll cover how you can solve these more complex scenarios in my next post.</p><p>For a comprehensive overview that overlaps CS 188 material and this post, see <a href="https://lilianweng.github.io/posts/2018-02-19-rl-overview/">Lilian Weng&#8217;s excellent writeup</a>.</p>]]></content:encoded></item><item><title><![CDATA[A Free Market for Eyeballs]]></title><description><![CDATA[Many takes on the attention economy, with very few attention economists.]]></description><link>https://www.neelsomaniblog.com/p/a-free-market-for-eyeballs</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/a-free-market-for-eyeballs</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Tue, 29 Jul 2025 02:52:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0dd0767b-71b6-4a5b-a335-2933fbb44827_512x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have a friend who blew up on Twitter/X a few years ago. 
His account started like mine, just a couple thousand followers, mostly his friends. But by the end of the year, he had so many viral tweets that he was at 100K+.</p><p>I brought this up to my mom, a first-generation immigrant. She said, "So? What has he gotten out of that?" I remember thinking that she was completely out of touch, but I had no strong argument to defend my intuition.</p><p>Since then, this kid has met billionaires like Sam Altman and Elon Musk; he's working at a top company making seven figures per year; and he manages a small fund on the side.</p><p>I think this anecdote says a lot about X and why exactly it's valuable. I used to work at a hedge fund, so I'm cursed to always think in finance terms. <strong>To me, posting on X is a form of regulatory arbitrage, because attention is capital that isn't taxed.</strong></p><h2>The Optics-to-Economics Pipeline</h2><p>Finance moguls often preach about the "Section 1031 exchange," a tax feature that allows you to repeatedly swap your investment properties for more valuable ones, and continuously defer capital gains tax. What I've observed is that the same phenomenon happens with attention on X.</p><p>In some ways, my entrepreneurial career started on X. I was a quant at Citadel at the time, and a shitposter on the side.
I left in 2022 at the calling of my wise friends, who advised me to build on the Terra blockchain.</p><p>Just two months later, Terra had collapsed:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9MeI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9MeI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png 424w, https://substackcdn.com/image/fetch/$s_!9MeI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png 848w, https://substackcdn.com/image/fetch/$s_!9MeI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png 1272w, https://substackcdn.com/image/fetch/$s_!9MeI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9MeI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png" width="1202" height="738" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:738,&quot;width&quot;:1202,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9MeI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png 424w, https://substackcdn.com/image/fetch/$s_!9MeI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png 848w, https://substackcdn.com/image/fetch/$s_!9MeI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png 1272w, https://substackcdn.com/image/fetch/$s_!9MeI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3c637d-d6ba-4567-864d-213810967e03_1202x738.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://x.com/neelsomani/status/1525172426803380225">https://x.com/neelsomani/status/1525172426803380225</a></figcaption></figure></div><p>I received condolences via iMessage and DMs. But I wasn't alarmed, because I intuitively knew that I was better off. I had gained an asset: attention.</p><p>Just a few months later, I had raised $15 million for my next project. And I'm not the only one. Roy Lee was a student at Columbia who was expelled for using AI to cheat on his software engineering interviews; the incident went <a href="https://x.com/im_roy_lee/status/1905063484783472859">viral</a>, and he went on to raise a $15 million round from a16z.</p><h2>Reversal of Fortune</h2><p>There's a famous documentary where a homeless man, Ted Rodrigue, is given $100K. 
Within six months, he unfortunately went back to being broke and living in a tent.</p><p>Examples like the poly-employed software engineer <a href="https://techcrunch.com/2025/07/03/who-is-soham-parekh-the-serial-moonlighter-silicon-valley-startups-cant-stop-hiring/">Soham Parekh</a> are the Ted Rodrigues of X. Not everyone is built for viral attention, and when you don't know how to manage it, you squander it. Soham's interview on TBPN was <a href="https://x.com/Austen/status/1940947073261858887">widely criticized</a>. His story was inconsistent, and the attention wasn't directed toward anything greater.</p><p>That's not the only way things go wrong. Sometimes people have a huge following, but they channel that attention toward horrible ideas. Even as a founder with attention, you can pick the wrong idea. That's probably the most common reason why some influencers never convert their attention capital: they just can't figure out how to monetize it.</p><h2>This Scroll Could Change Your Life</h2><p>The value of posting is obvious. You get to build this valuable intangible, untaxable asset called attention. But why are we still scrolling? Who's doing the consuming?</p><p>I'm reminded of a time I was skiing in Aspen with my good friend from college, and we were talking about how, when we're with our families, we find it tempting to scroll on our phones. That's obviously a very sad thing - those are our loved ones!
But this chart explains it:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MLnr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8431672e-355f-49d3-ba63-a754775ff491_1306x772.png"><img src="https://substackcdn.com/image/fetch/$s_!MLnr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8431672e-355f-49d3-ba63-a754775ff491_1306x772.png" width="1306" height="772" alt="TRUMP memecoin price chart" loading="lazy"></a><figcaption class="image-caption">TRUMP memecoin chart: <a href="https://coinmarketcap.com/currencies/official-trump/">https://coinmarketcap.com/currencies/official-trump/</a></figcaption></figure></div><p>Donald Trump's token (TRUMP) traded at just $2 to $3 for almost an hour after it launched. It was dinnertime in California, and people weren't on their phones. Word took a while to spread, but over the next 24 hours, people moved their capital over and drove the price up to $60-70.</p><p>Who's to say whether the TRUMP coin would have kept climbing from there. But I use it as an example of the extreme returns that come from being first to act on asymmetric information. In some sense, every post we read is a search for a metaphorical "TRUMP coin." Every so often, we come across a post so valuable that it justifies reading all of the slop: a job opportunity, the release of a hugely time-saving app, a Luma page for an awesome event happening in our city.</p><p>That's our opportunity cost when we're not scrolling.
And that's why we're still on X.</p><h2>A Tool To Save You Time</h2><p>I hacked together a small, open-source project called <a href="https://github.com/neelsomani/tweet-insight-daily">Today On Tech Twitter</a>.</p><p>I created an X account that follows what I think is a representative sample of accounts on "Tech Twitter". The website scrapes that account's feed every evening and passes the posts to ChatGPT, which summarizes the day's events.</p><p>I built the website because the underlying primitive is useful in many ways. For the casual scroller, it offers an easy way to take a break and later catch up on what you missed. My intention was for the representative sample to include diverse views, to mitigate the possibility of users getting stuck in a "bubble."</p><p>For engineers, this data can be the input to a process that uses AI to generate content tied to current events: viral videos, say, or automated posts from your company account on X. The JSON API is publicly available: <a href="https://www.todayontechtwitter.com/api/s3-data?utc_date=2025-07-29">https://www.todayontechtwitter.com/api/s3-data?utc_date=2025-07-29</a></p><p>X isn't for everyone, but for those who are here, you might as well exploit the arbitrage.</p>]]></content:encoded></item><item><title><![CDATA[The BLAST Playbook]]></title><description><![CDATA[I argue that the $500B-1T in annual software investments should reallocate to BLAST: assets that monetize boredom, loneliness, and scarcity.]]></description><link>https://www.neelsomaniblog.com/p/the-blast-playbook</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/the-blast-playbook</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Mon, 24 Feb 2025 23:53:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9130f087-f151-4139-a1d6-0bd10b1b1b5e_420x300.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A growing consensus is forming:
software investing is in a horrible position.</p><p>The software VC model is getting squeezed from both ends:</p><ol><li><p>AI tools like Cursor and Devin have made software fast and cheap to build, meaning great builders don't need to take external capital.</p></li><li><p>Any software product is easily copied, so software ARR is less defensible by the day.</p></li></ol><p>The implication? The $500B-1T+ annual software investing machine now lacks investable companies. <strong>Over the next 3-5 years, software funds will underperform relative to their peers, and many traditional VCs will eventually close shop.</strong></p><p>Two big questions remain:</p><ol><li><p>Where should capital be deployed instead?</p></li><li><p>What should a founder build today?</p></li></ol><p>Spoiler alert: The answers don&#8217;t match.</p><h2>Where To Deploy Capital: BLAST</h2><p>Smart capital already sees the problem. But what assets can scalably absorb hundreds of billions of dollars?</p><p>Introducing BLAST: the <strong>Boredom, Loneliness, and Scarcity Thesis</strong>:</p><ul><li><p>Boredom &#8594; People still need distractions (see: TikTok addiction, memecoin speculation).</p></li><li><p>Loneliness &#8594; People still want to feel special and seen (e.g. social communities).</p></li><li><p>Scarcity &#8594; People will pay more for things that others can&#8217;t have (e.g. Birkins, natural resources).</p></li></ul><p>But while there's potential to profit in these sectors, it's difficult to deploy hundreds of billions of dollars into incumbents while achieving outsized returns. Capital needs a way to access BLAST assets at scale, in novel investments.</p><p><strong>The ultimate BLAST investment is an entirely new city or country. </strong>Land is inherently scarce, particularly oceanfront property. New city development opens the door not only to luxury housing but also to luxury services.
Elite private schools, exclusive gyms, and high-end detox clinics absorb capital while entertaining residents and fostering community.</p><p>Key questions:</p><ul><li><p>Where should these cities be built? Possibilities include land near existing cities, expensive oceanfront locations, or cheap land in the middle of nowhere.</p></li><li><p>What makes them unique? These cities could be friendly to select industries.</p></li><li><p>What productive assets within them can absorb venture-scale capital?</p></li></ul><h2>Software Founders: Castles Without Moats</h2><p>Thin AI wrappers should not be raising venture at all. Raising a $10M Series A with $5M ARR is a bad deal for the founder: it constrains optionality, forcing them to shoot for a billion-dollar exit instead of extracting as much cash as possible during the (likely limited) shelf life of the product.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k7GO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c5d8785-5793-4143-b112-265f0506a19e_1188x1408.png"><img src="https://substackcdn.com/image/fetch/$s_!k7GO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c5d8785-5793-4143-b112-265f0506a19e_1188x1408.png" width="1188" height="1408" alt="" loading="lazy"></a><figcaption class="image-caption">Source: <a href="https://x.com/arfurrock/status/1892977861604036863?s=46">https://x.com/arfurrock/status/1892977861604036863?s=46</a></figcaption></figure></div><p>Making commitments to keep growing in today&#8217;s software environment is too risky, because as <a
href="https://docs.google.com/document/d/103cGe8qixC7ZzFsRu5Ww2VEW5YgH9zQaiaqbBsZ1lcc/edit?tab=t.0">Chris Paik</a> puts it, the "end of software" is near. The new playbook:</p><ol><li><p>Use AI tooling to quickly and cheaply build projects that spit out cash immediately.</p></li><li><p>Forget about long-term defensibility, and as a result, <strong>don't take venture capital</strong>.</p></li><li><p>Move on to the next project.</p></li></ol><p>Many founders are avoiding these "castles without moats." But moats only matter when building is hard and expensive. If software takes days and &lt;$10K to launch, you have so many shots on goal that your probability of succeeding is much higher, so smaller projects become positive EV.</p><p>This arbitrage only exists for so long, because soon, AI agents themselves will be the ones rapidly spitting out new cash-generating software projects.</p><p>The closest thing to a "moat" for software built today: <strong>there are so many ideas available that it&#8217;s easier for competitors to build something new than to copy you.</strong> (At least at first.)</p><p>Alternatively, if you insist on raising venture, you need to go full moonshot. That means pursuing ideas so outlandish that they seem irrational: projects demanding massive capital, breakthrough research, or other forms of strong defensibility.</p><p>SaaS wasn't contrarian enough anyway.
The next wave will be weirder, riskier, and hopefully more interesting.</p>]]></content:encoded></item><item><title><![CDATA[Privatize the FDA]]></title><description><![CDATA[I advocate for privatizing the FDA by replacing its drug approval monopoly with a competitive market to reduce delays, lower costs, and foster innovation.]]></description><link>https://www.neelsomaniblog.com/p/privatize-the-fda</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/privatize-the-fda</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Sat, 11 Jan 2025 01:34:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/40cd1649-ea00-4cb6-bb8d-ed90ed7fae8a_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The FDA, among other goals, is mandated by Congress to ensure that drugs are safe and effective. However, this directive has come at an unforeseen cost: <strong>delays that have resulted in preventable deaths, and drugs that are difficult for informed patients to obtain.</strong></p><p>Examples include vaccines such as Fluad, which became available in the US <a href="https://www.cato.org/regulation/fall-2019/fda-needs-more-accountability-not-more-independence">18 years</a> after gaining widespread use in Europe, and peptides that demonstrate promising early results but face FDA distribution <a href="https://www.fda.gov/news-events/press-announcements/fda-roundup-december-17-2024">warnings</a>.</p><p>These delays have potentially cost hundreds of thousands, if not millions, of <a href="https://ascopost.com/issues/october-25-2015/delays-in-drug-approval-are-deadly-highlighting-the-need-for-improved-regulatory-efficiency/">life-years</a> annually.
Moreover, the FDA's centralized, inefficient processes stifle innovation, discouraging biotech investment and leaving society with fewer life-saving and life-enhancing options.</p><p>This essay advocates for dismantling the FDA's drug approval function and transferring this responsibility to privatized entities known as "Drug Certification Bodies" (DCBs).</p><h2><strong>Problem: The FDA holds a monopoly on drug approvals.</strong></h2><p>While the FDA does not explicitly make unapproved drugs illegal, its regulatory powers over manufacturing, marketing, and distribution render many drugs de facto impossible to obtain.</p><p>There is no natural counterbalance to the FDA's power. It has no economic competitors, and legal action against the FDA is impractical for bio companies, who risk retaliation and future delays. The major issues are as follows:</p><p>1. Time and Cost of Approval</p><p>Submitting a drug for review costs over <a href="https://www.clinicaltrialsarena.com/news/fda-cost-revealed-2025-application-drug/">$4 million</a> in the United States, compared to less than <a href="https://www.ema.europa.eu/en/documents/other/explanatory-note-general-fees-payable-european-medicines-agency-1-april-2024_en.pdf">$250,000</a> in Europe. On average, it takes 10 years and <a href="https://www.genengnews.com/gen-edge/the-unbearable-cost-of-drug-development-deloitte-report-shows-15-jump-in-rd-to-2-3-billion/">$1-2 billion</a> to bring a new drug to market. This timeline includes not only the FDA's own review process but also the extensive trials and data collection required by the agency. These burdens result in fewer drugs funded and developed.</p><p>2. Economic Inefficiency</p><p>Since Congress has granted the FDA a monopoly, the agency has no incentive to maximize its ROI. Pharmaceutical companies must work with the FDA, allowing the agency to manually set "user fees" without market competition. 
These user fees&#8212;payments from drug companies for reviewing their products&#8212;make up ~45% of the FDA's $7.2 billion annual spend, which supports 18,000 employees.</p><p>While user fees have led to <a href="https://www.ncbi.nlm.nih.gov/books/NBK603243/">faster drug approvals</a>, the lack of a competitive market means there's no standard for what these fees should be. For example, pharmaceutical companies might be willing to pay even higher fees to receive quicker reviews.</p><p>3. A One-Size-Fits-All Approach</p><p>The FDA's approval process generally applies the same standards to all patients, regardless of each patient's individual risk tolerance or demographic profile. This binary system, where a drug is either approved for all or none, ignores the diverse needs of patients and leaves many without viable options. While Congress allows the FDA to make limited exceptions for terminally ill patients and orphan drugs, these pathways are insufficient to address the broader systemic issues.</p><h2><strong>We should not eliminate all drug regulations.</strong></h2><p>A variety of approval models could be explored, but the first step is for Congress to relax the constraint that the FDA has final authority over all approvals. On the other hand, complete deregulation isn't likely to end well:</p><p>1. Private actors have demonstrated a poor track record in self-regulating, such as promoting <a href="https://www.fda.gov/files/about%20fda/published/The-Sulfanilamide-Disaster.pdf">unsafe</a> drugs or <a href="https://x.com/ktkadakia/status/1612847108523958280?s=46">inadequately testing</a> devices before market release when not mandated. The FDA was initially created to address these misaligned incentives.</p><p>2. Laypeople are often unequipped to interpret statistical data on their own, so the laissez-faire market suffers from inefficiencies similar to those caused by imperfect information. The FDA acts as a trusted resource that the public relies on.</p><p>3. 
Eliminating the FDA runs the risk of enabling drug abuse. Addictive drugs are a national security and public health risk, e.g. the Opium Wars.</p><p>For these reasons, it is inadvisable to remove the FDA drug approval process with no alternative in its place.</p><h2><strong>Solution: Build a privatized FDA alternative.</strong></h2><p>My recommendation is for a competitive, privatized system of <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1503162">Drug Certification Bodies</a> to be established. A Drug Certification Body (DCB) is a private sector entity that conducts drug approvals and subjects itself to relevant regulations.</p><p>DCBs should handle all drug approvals and adopt the following three reforms:</p><p>1. Separate the safety and efficacy approvals, where safety is the only requirement for usage: Safety and efficacy are inherently connected, as approvals weigh a drug's benefits against its risks. However, even when efficacy has not yet been demonstrated, drugs should be approved if they meet a sufficient safety threshold. This approach mirrors current off-label drug use practices, where <a href="https://www.cato.org/commentary/abolish-fda">20-30%</a> of prescriptions involve drugs prescribed for conditions beyond FDA approval. In such cases, physicians must assess whether the potential benefits justify the risks. Additionally, we might require that patients consuming a yet-to-be-proven drug consent to their physician sharing their medical records to build an argument for efficacy.</p><p>For instance, Tafamidis, a treatment for transthyretin amyloid cardiomyopathy (ATTR), could have reached the US market sooner. Initially, the FDA rejected Tafamidis as a safe but ineffective treatment for polyneuropathy. Later, it was found effective for a different condition, ATTR, and approved in the United States in 2019. During the interim, Tafamidis was only available in Europe pending its second FDA review. 
Such cases are common, as pharmaceutical companies investing billions in a safe drug are motivated to identify conditions where it proves efficacious.</p><p>DCBs can separately offer efficacy certifications. Efficacy will remain a desirable approval to receive, since insurers and payors will prefer to cover drugs that are proven to be effective. Safety approvals should allow for greater side effects when a drug's benefits outweigh its risks.</p><p>2. Adopt a heterogeneous review process, allowing for demographic-specific approvals: The current FDA review process is cumbersome and inconsistent, with multiple pathways. The timeline typically consists of preliminary tests and three phases of clinical trials (~6 years). This all leads to a 100,000+ page new drug application, which takes 1+ year to review alongside a manufacturing inspection.</p><p>Ironically, Congress implicitly acknowledges that a faster approval process is possible via the FDA's emergency pathways, such as the process for COVID-19 vaccines. Under a privatized system, DCBs will develop voluntary standards tailored to each drug's unique risks and benefits.</p><p>In some cases, DCBs might approve drugs for only <a href="https://www.neelsomaniblog.com/race-science-for-non-racists.php">subsets of the population</a> based on available data, allowing faster access for targeted groups. 
New studies should include diverse demographic groups in line with the DEPICT Act of 2023, promoting equitable access, but this relaxation allows DCBs to utilize historical data, data from non-US jurisdictions like Honduras, or more unusual data that wouldn't meet the standard of a full clinical trial, such as human challenge studies for small target populations.</p><p>Some might argue that a lack of diversity in patient trials has led to failures in fields like <a href="https://www.amazon.co.uk/Malignant-Policy-Evidence-People-Cancer/dp/1421437635">oncology</a> where studies did not generalize, but the real error is that the FDA granted efficacy approvals when only safety was properly supported.</p><p>3. Establish public credibility by bearing the cost of unsafe approvals &amp; publishing results: A chief concern is that DCBs will only focus on minimizing approval times or offering competitive user fees, without prioritizing safety.</p><p>To align incentives, DCBs should offer liability insurance (up to a reasonable limit) to pharmaceutical companies for health hazards resulting from the use of approved drugs. The minimum required amount of liability insurance is to be determined.</p><p>All drug reviews should be published for public auditability. 
The results might be posted somewhere as simple as Arxiv.</p><h2><strong>How can this get congressional support?</strong></h2><p>The FDA should reduce its role to ensuring DCBs are run properly, similar to the regulation of credit rating agencies:</p><ul><li><p>Conflict-of-interest protections: DCBs must adhere to stringent rules, like those used by the Department of Defense, to prevent bribery or undue influence.</p></li><li><p>Enforcement of labeling: Relevant marketing claims made by drug manufacturers must be approved by DCBs.</p></li><li><p>Fraud prevention: Any data submitted to a DCB must be accurate.</p></li></ul><p>With a more limited scope, the FDA can focus its resources on accomplishing the above efficiently, while DCBs focus on approvals.</p><p>To get widespread support, the model must first be trialed successfully. Precedents, such as the expansion of the Third-Party Review Program for Class II medical devices under the 1997 Modernization Act, offer a roadmap for scaling privatized reviews. This involved selecting a representative sample of Class II devices and providing access to FDA databases and review templates.</p><p>The logical starting point might be "wellness therapies." Congress defines a drug as anything intended to (a) treat or prevent disease or (b) otherwise affect the body's structure or function. Privatized approvals may be more suitable for drugs that solely affect the body's structure or function ("wellness therapies"), like peptide injections.</p><p>Wellness therapies often involve novel mechanisms, which means the FDA must commit substantial resources to their review, unlike generics. This category of drugs is underserved, since there are already existing pathways for expedited approvals and expanded access for terminal illness drugs or orphan drugs. 
Wellness therapies have lower risks associated with an erroneous approval or rejection, since the therapies do not directly treat diseases, and their consumers are often high-paying and informed. Lastly, this category naturally expands to cover other drugs like homeopathic treatments, which have a fraught history with the FDA.</p><p>I am interested in collaborating with others who are working toward FDA privatization. While this possibility has been discussed for <a href="https://www.jstor.org/stable/26659541">decades</a>, the next four years under the Trump administration present a rare opportunity to finally reform this broken process. This might involve spinning up the first DCB that functions as a true alternative to the FDA.</p>]]></content:encoded></item><item><title><![CDATA[The New Economy]]></title><description><![CDATA[We have massively overallocated our youth to roles like software engineering and medicine - which will be shortly replaced en masse by AI. I propose how the working population should reallocate.]]></description><link>https://www.neelsomaniblog.com/p/the-new-economy</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/the-new-economy</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Fri, 25 Oct 2024 00:59:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5973729f-4f80-4de8-838b-e73c98cd6d70_225x225.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this essay, I propose that the labor force will reallocate away from jobs like software engineering &amp; medicine, and instead toward construction in the short-term and entertainment in the long-term.</p><p>Homelessness &amp; joblessness is obviously a rampant problem in San Francisco; India's youth population (~400 million people) is 20%+ unemployed; and AGI will render millions of high-earners even in the United States (e.g. 
software engineers) without work.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gbit!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F106a6bf0-12bc-4bd7-9eaf-1621e4f99947_1112x948.png"><img src="https://substackcdn.com/image/fetch/$s_!Gbit!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F106a6bf0-12bc-4bd7-9eaf-1621e4f99947_1112x948.png" width="1112" height="948" alt="" loading="lazy"></a><figcaption class="image-caption">UC Berkeley Grads with 4.0 GPAs Cannot Find Jobs (<a href="https://www.linkedin.com/posts/jamesfobrien_tech-jobs-have-dried-upand-arent-coming-activity-7242613292479696897-gCyT?utm_source=share&amp;utm_medium=member_desktop">LinkedIn</a>)</figcaption></figure></div><p>These are all instances of the same problem. What I am interested in is a human equivalent of Bitcoin mining. Bitcoin mining allows us to take idle compute resources and contribute them toward something valuable. <strong>What can we do with idle people?</strong></p><h2>Refining the Problem Statement</h2><p>AGI poses many interesting questions, but this is a separate question from:</p><p>- What goals should we align humanity around in the long-term? It may be arrogant to suggest that such alignment is even possible or desirable. Instead, the problem here describes a medium-term issue facing the labor market.
AGI's role may not be to guide humanity toward one grand objective, but rather to support a diversity of purposes and individual goals.</p><p>- If universal basic income is implemented, how should we capture and re-distribute it? Unrelated, but a useful lens for the profiteer.</p><p>- How should we keep people entertained, so they don't go insane? Entertainment/fulfillment is valuable, but not the only thing to optimize for.</p><p>- What should we use "extra" (~$0 marginal cost) inference power for? The problem above refers to the "extra" people, not compute.</p><p>Even without full-blown AGI, a solution for economically unproductive people is already useful today, as automation continues to displace workers.</p><p>When AGI fully arrives, access to AGI might not be globally democratized, and the economic surplus from AI won't be re-distributed purely to the same individuals who held those high-paying jobs. People might either prefer to operate at their previous level of income/wealth, or they might find it fulfilling to contribute toward something greater than themselves, so this question remains interesting to me.</p><h2>What constitutes a valid solution?</h2><p>1. Is the work valuable? "Value" might not be measured in dollars generated; it might be measured in fulfillment, and it will sometimes be debatable whether something is valuable. This criterion eliminates meaningless redistributions of wealth or <a href="https://www.economist.com/buttonwoods-notebook/2010/07/19/keynes-at-work">digging holes to fill them up again</a>.</p><p>2. Is the job scalable? The solution must be able to employ hundreds of thousands, if not millions, of people.</p><p>3. Is it ethical? That said, unethical solutions are still worth highlighting, if only because they might lead to an ethical one.</p><p>4. Is it AGI-resistant? The solution should still be useful even in a post-AGI world. This is difficult to predict and might not be possible. 
Solutions that aren't AGI-resistant are still useful in the interim period where masses are unemployed but AGI cannot yet produce everything desired.</p><p>To satisfy the last criterion, a valid solution cannot rely on superior thinking ability to produce value. The timeframe matters here, since certain tasks might take years longer to replace than others.</p><h2>What are the possible solutions?</h2><p>I've ranked the following categories alongside my view of their AGI-resistance:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j-tQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j-tQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png 424w, https://substackcdn.com/image/fetch/$s_!j-tQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png 848w, https://substackcdn.com/image/fetch/$s_!j-tQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png 1272w, https://substackcdn.com/image/fetch/$s_!j-tQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!j-tQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png" width="1152" height="425" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:425,&quot;width&quot;:1152,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67294,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j-tQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png 424w, https://substackcdn.com/image/fetch/$s_!j-tQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png 848w, https://substackcdn.com/image/fetch/$s_!j-tQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png 1272w, https://substackcdn.com/image/fetch/$s_!j-tQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F700c9e17-ad94-4b12-b08e-49a70efeced2_1152x425.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Below, I dive into each of these categories to illustrate what the reallocation of labor resources might look like:</p><h3>Medium-Term Solutions</h3><p>While fields like construction and elderly care are fulfilling, even these tasks can likely be replaced by agents/robotics on a reasonable timeframe. At the same time, construction in particular offers a compelling vision that can align and employ millions in the near-term.</p><p>1. Construction: Build something grand that justifies the use of manual labor.</p><p>An example here would be if India were to commission the development of a Taj Mahal 2. 
The Indian government would guarantee a living wage to anyone who contributed, and the development of such magnificent structures is worthwhile as works of art.</p><p>The United States administration should institute a "new" <a href="https://en.wikipedia.org/wiki/New_Deal">New Deal</a>, where the government employs millions of Americans to develop bleeding-edge public infrastructure. This is the most compelling vision to me in the short term.</p><p>This might involve directing resources toward the necessary construction for interplanetary exploration and colonization: space ports, launch infrastructure, and habitats for human life on Mars.</p><p>This work would be valuable, scalable, and ethical, but not fully AGI-resistant.</p><p>2. Community work: Care for other humans, in capacities where humans are preferred.</p><p>I am interested in someone defining what it means to be healthy, and then designing the surrounding environment and checks to promote healthy child rearing. This comes from a concern that children who lack real human interaction in their upbringing will be emotionally and socially stunted.</p><p>This category might also include elderly care or running in-person, human-only communities.</p><p>3. Biological utility: Use your human body to produce biological data or materials.</p><p>Individuals like Bryan Johnson generate a tremendous amount of high-fidelity biological data that could theoretically be used to improve drug development. A more dystopian expression would be financially incentivizing experimental drug testing, since real human bodies will be superior to biological simulations or models for some period of time.</p><p>Another instance of this might be surrogates, if babies from surrogates are superior to lab-grown babies. 
We can reject unethical examples like organ donation or <a href="https://en.wikipedia.org/wiki/Fifteen_Million_Merits">using the human metabolism for energy production</a>, which is too metabolically inefficient anyway.</p><p>These roles are not particularly scalable, and they provoke ethical questions.</p><h3>Long-Term Solutions</h3><p>1. Bias mitigation: Some decisions inherently require bias, and we prefer that these decisions be made by biased humans rather than biased models.</p><p>Examples include the interpretation of law and ethics by judges and lawyers, as well as governance of systems related to AGI itself. This might also include oversight of some types of AGI output.</p><p>I don't view this as particularly scalable, and I'm concerned that human error rates might be too high to be useful.</p><p>2. Entertainment: Self-explanatory; entertain/serve other humans.</p><p>Unethical examples abound in this category, from prostitution to real-life Squid Games, gladiators, and so forth. But entertainment has so far proven resistant to AI alternatives. For example, while <a href="https://en.wikipedia.org/wiki/Human%E2%80%93computer_chess_matches">chess bots are definitively stronger than human players</a>, we still prefer watching humans play, though widespread sentiment can of course change over time.</p><p>This includes the service industry, where it's higher status to have human labor over machines, or the Olympics, where we even limit drugs that could potentially interfere with natural human performance. This work is valuable, scalable, and there are sufficient ethical examples.</p><h2>How do we act on this?</h2><p>I am concerned about a potential shock to the labor market where large segments of the population are rendered unemployed very quickly. I have serious doubts that governments would adapt quickly enough to issue UBI, for a variety of reasons. In my view, it is wise for us to proactively institute the relevant legislation (e.g. 
a "new" New Deal) or large-scale private funding to incentivize a shift toward roles that are sustainable in the medium-term.</p><p>I'm interested in hearing feedback and connecting with others who are interested in this problem space. You can reach me on Twitter at <a href="https://twitter.com/neelsomani?lang=en">@neelsomani</a>.</p>]]></content:encoded></item><item><title><![CDATA[A Year Around The Sun With Eclipse]]></title><description><![CDATA[I reflect on the Eclipse team's journey in building the Eclipse Mainnet, Ethereum's fastest L2. A reflection on rollups and challenges for the app-specific rollup thesis.]]></description><link>https://www.neelsomaniblog.com/p/a-year-around-the-sun-with-eclipse</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/a-year-around-the-sun-with-eclipse</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Tue, 03 Oct 2023 01:07:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c0a0fb28-5a0e-4562-a085-b8a7df29fc1b_400x400.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Eclipse team has been hard at work for over a year now, so I thought I'd take this opportunity to reflect a bit on how we got here.</p><p>For those who have been following Eclipse, you'll know that our focus has zeroed in on building <a href="https://x.com/EclipseFND/status/1704178668543824309?s=20">Eclipse Mainnet</a>: Ethereum's fastest L2, powered by the SVM. Its architecture represents the culmination of our learnings from deploying our rollup framework for a variety of applications.</p><p>Before I get into where we came from, I want to touch on where we are now. I don't want to oversell where Eclipse is today. We're very much at the beginning of our journey. We still need to launch mainnet, grow an active ecosystem, decentralize our proofs, strengthen bridge contract upgradability, and many other things. That won't happen overnight. 
We're heads down focused on a successful mainnet launch over the coming months and will continue to build for years after.</p><p>Nonetheless, we have made some great progress already. This is what we've learned along the way:</p><h2>The Dawn of Eclipse</h2><p>We started talking to app developers a bit over a year ago, offering to spin up <a href="https://mirror.xyz/neelsalami.eth/rvhK5mEcFTOjyu_DFsqS2cYR7U6Fjvbw3nf8tI-pr-Q">customizable rollups</a> using the Solana Virtual Machine (SVM). A <a href="https://ethereum-magicians.org/t/a-rollup-centric-ethereum-roadmap/4698">rollup-centric roadmap</a> seemed to imply a world with thousands of rollups, so there was a lot of interest. We ended up running 30+ testnet chains alongside the application teams trying them out.</p><p>The operational burden was non-trivial. When chains went down at 2AM, our Head of Engineering David would receive the call. When teams had issues with their infrastructure integrations, our core engineers were expected to act as liaisons and intermediaries. When apps wanted to launch a native token with the chain, we coordinated with all necessary parties. This workload wasn't scalable or sustainable.</p><p>Even for our customers, app-specific rollups weren't optimal. It's more difficult to onboard users, bridge between rollups, compose with apps on the L1, and bootstrap meaningful economic activity. Each additional rollup added complexity and reduced interoperability.</p><p>We dove into the myriad proposed solutions. Self-service bridges, shared sequencers, indexers-as-a-service, new "settlement layers" as liquidity hubs. Dozens of companies have been founded to service the purportedly imminent influx of thousands of app-specific rollups. Trying to solve self-engineered complexity by adding new layers of complexity didn't strike us as convincing. 
We started to re-evaluate our position toward app-specific rollups.</p><h2>More Rollups, More Problems</h2><p>What the world truly needs is just one more rollup &#8211; especially if it's ours.</p><h3>Problem 1: App-specific rollups are uneconomical for most applications.</h3><p>We discovered the open secret that most app-specific rollups have a very high fixed cost. I even gave a talk at the Modular Summit about it: <a href="https://www.youtube.com/watch?v=EIekN6przb0">Rollups-as-a-Service Are Going To Zero</a>.</p><p>After running 30+ testnet chains ourselves, we quickly realized the magnitude of these fixed costs. Even a bare minimum rollup configuration demands significant expenses, including:</p><ul><li><p>Sequencer</p></li><li><p>Full nodes for the executor, verifier(s), fast finality bridge</p></li><li><p>Indexers</p></li><li><p>Engineering support</p></li><li><p>Posting state commitments and sometimes additional sequencer data to the L1</p></li></ul><p>...before considering additional infrastructure integrations. These expenses are higher for mainnet chains.</p><p>Aside from the costs above, app-specific rollup developers face high startup costs from infrastructure partners. Major RaaS providers charge on the order of $60K-$100K+ annually and take a percentage of all sequencer fees. Additionally, these teams face the implicit costs of increased developer complexity and user friction.</p><p>It's also worth noting that certain popular rollup stacks make the economics for smaller app-chains (which lack very high transaction throughput) even more challenging today. For example, <a href="https://subscriptions.theinformation.com/newsletters/slow-burn/archive/a-dapp-developers-guide-to-appchains">OP Stack chains carry particularly high fixed costs</a> because they routinely post to the L1 regardless of L2 activity. 
(Note that this specific inefficiency <a href="https://twitter.com/liamihorne/status/1690790715037470720">can be changed in the future</a>, and not all stacks have this issue.)</p><p>Overall, it's far more economically efficient to reduce overhead, deduplicate work, and amortize these high infrastructure costs across a single shared chain.</p><h3>Problem 2: The customizations offered by app-specific rollups are largely unnecessary.</h3><p>This lesson hurt, because customizability was one of the original motivations for Eclipse.</p><p>Customizing your own chain sounds nice in theory, but the reality is that most apps don't need or want it. These changes are generally far more trouble than they're worth, since they must be audited from both a technical and cryptoeconomic perspective. Each novel customization means increased complexity and potentially worse interoperability. <a href="https://www.neelsomani.com/blog/rollups-as-a-service-are-going-to-zero.php">This equally applies to L1 app chains</a>:</p><p><em>"The Cosmos SDK is incredibly generic and yet it never inspired the plethora of diverse chains that you might expect. This could be because customization requires too much technical sophistication, or more likely because the long tail of applications is well-suited by a handful of architectures."</em></p><p>To be fair, there are some cases where we think app chains <a href="https://forum.makerdao.com/t/explore-a-fork-of-the-solana-codebase-for-newchain/21822/24?u=neelsomani">make sense</a>, but it's not really about customization. 
These cases are driven by <a href="https://x.com/0xSydney/status/1692355457611129129?s=20">ownership</a>, sovereignty, and the community's ability to control forks and upgrades.</p><h3>Problem 3: Non-Ethereum settlement layers are a nerd trap.</h3><p>The original idea for Eclipse was to launch our own settlement layer with the other app-specific Eclipse rollups deployed as "L3s":</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zc7W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zc7W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png 424w, https://substackcdn.com/image/fetch/$s_!zc7W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png 848w, https://substackcdn.com/image/fetch/$s_!zc7W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png 1272w, https://substackcdn.com/image/fetch/$s_!zc7W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zc7W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png" width="1456" height="509" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:509,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:383777,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zc7W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png 424w, https://substackcdn.com/image/fetch/$s_!zc7W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png 848w, https://substackcdn.com/image/fetch/$s_!zc7W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png 1272w, https://substackcdn.com/image/fetch/$s_!zc7W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98dd7c0-c37c-4334-94c9-46f9c6ded20c_1600x559.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Old and Not-So-Great Eclipse Architecture</figcaption></figure></div><p>We did this primarily because it made the settlement process easier. For a naive implementation of optimistic SVM settlement, a custom settlement layer gave us the optionality to introduce custom precompiles or other operations to facilitate the settlement of our rollups. It also would've been cheaper than using Ethereum as a settlement layer.</p><p>But we always wanted to use Ethereum. A good settlement layer has a lot of native liquidity, high security (both safety and liveness), easy verifiability, and credible neutrality. Ethereum checks all boxes. ETH is the lingua franca of crypto: it's how we pay our gas, denominate our trades, and purchase our NFTs. 
Bitcoin is the only chain that's competitive on those properties, and Bitcoin doesn't have the functionality needed to support enshrined settlement.</p><p>Our engineering team made quick progress on our <a href="https://mirror.xyz/eclipsemainnet.eth/me7bXLWJDS177V6nl8j1uzF1mxpX6nbGOLNeyBAwXgs">zk-VM</a>, which made zk-fault proofs on Ethereum feasible. And it turns out settlement is pretty cheap, even on Ethereum. An optimistic rollup pays on the order of ~<a href="https://x.com/neelsalami/status/1688718618660622336?s=20">$5 per day</a> to Ethereum. For these reasons, we abandoned our L2 settlement layer, and instead we opted to use Ethereum for settlement.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KaxN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KaxN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png 424w, https://substackcdn.com/image/fetch/$s_!KaxN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png 848w, https://substackcdn.com/image/fetch/$s_!KaxN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png 1272w, https://substackcdn.com/image/fetch/$s_!KaxN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KaxN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png" width="1456" height="336" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:336,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:253814,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KaxN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png 424w, https://substackcdn.com/image/fetch/$s_!KaxN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png 848w, https://substackcdn.com/image/fetch/$s_!KaxN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png 1272w, https://substackcdn.com/image/fetch/$s_!KaxN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b0e6c9-6e12-4f23-bef4-f038d6b3ed55_1600x369.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Eclipse Mainnet Architecture</figcaption></figure></div><p>And as 
mentioned above, it's difficult for a non-Ethereum settlement layer to be economically sustainable, because settlement layers generate very little revenue directly. Settlement transactions are super cheap, especially for optimistic rollups. It's just writing a handful of bytes (a state commitment) to the settlement layer periodically.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZRPa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZRPa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png 424w, https://substackcdn.com/image/fetch/$s_!ZRPa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png 848w, https://substackcdn.com/image/fetch/$s_!ZRPa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png 1272w, https://substackcdn.com/image/fetch/$s_!ZRPa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZRPa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png" width="1182" height="1174" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1174,&quot;width&quot;:1182,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:416170,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZRPa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png 424w, https://substackcdn.com/image/fetch/$s_!ZRPa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png 848w, https://substackcdn.com/image/fetch/$s_!ZRPa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png 1272w, https://substackcdn.com/image/fetch/$s_!ZRPa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5781de-5987-44f3-8f4b-97f7c5d96a44_1182x1174.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://twitter.com/effortcapital/status/1688907679933014016?s=46&amp;t=dLgKzm8V9vF6gP2FmfUQdw">@zmanian on Twitter</a></figcaption></figure></div><p>The only way for a settlement layer to be economically sustainable is by indirect value capture. Most importantly, ETH becomes the de facto money everyone holds. (Ethereum also has native L1 transactions which generate gas fees. But I suspect this is not nearly as important for ETH as its "moneyness.")</p><p>Trying to build a new settlement layer at this point feels like a complicated and unnecessary form of lock-in. 
Just use Ethereum.</p><h2>We Can "Have Our Cake And Eat It Too"</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dfSJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dfSJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png 424w, https://substackcdn.com/image/fetch/$s_!dfSJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png 848w, https://substackcdn.com/image/fetch/$s_!dfSJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png 1272w, https://substackcdn.com/image/fetch/$s_!dfSJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dfSJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png" width="1194" height="1252" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1252,&quot;width&quot;:1194,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:416082,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dfSJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png 424w, https://substackcdn.com/image/fetch/$s_!dfSJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png 848w, https://substackcdn.com/image/fetch/$s_!dfSJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png 1272w, https://substackcdn.com/image/fetch/$s_!dfSJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ed1463-75c5-45f8-a732-1d016a081a70_1194x1252.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://x.com/cburniske/status/1598043017071718400?s=20">@cburniske on Twitter</a></figcaption></figure></div><p>Finally, our learnings from this past year coupled with several technical advancements brought us to the Eclipse Mainnet architecture. 
It's a shared general-purpose L2 that addresses the challenges app developers actually face without sacrificing UX or fragmenting liquidity.</p><p>We're excited to build in public and support the cutting-edge apps that developers build, kicking off a new wave of innovation on Ethereum.</p>]]></content:encoded></item><item><title><![CDATA[Rollups-as-a-Service Are Going To Zero]]></title><description><![CDATA[The app-specific rollup space is poorly defined, so I have taken it upon myself to define the market landscape and explain the economics for the uninitiated.]]></description><link>https://www.neelsomaniblog.com/p/rollups-as-a-service-are-going-to</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/rollups-as-a-service-are-going-to</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Wed, 09 Aug 2023 01:13:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ca973768-3629-4430-8c8e-8a0326c879b8_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Long live Rollups-as-a-Service.</h2><p><em>This blog post is adapted from a <a href="https://www.youtube.com/watch?v=EIekN6przb0">presentation</a> that I gave at Modular Summit.</em></p><p>At Eclipse, we're building customizable app-specific rollup infrastructure to support verticals like gaming &amp; social, DePIN, and DeFi.</p><p>Since we've been working on this for ~10 months, I feel compelled to push back on misconceptions in the space. Here are some thoughts about app-specific rollups:</p><h2>Existing market segmentations have it wrong.</h2><h3>Rollup frameworks aren't charities.</h3><p>Rollup frameworks like OP Stack are codebases that implement the key components of a rollup. They're not going to charge you to use their code, but they need to capture value somehow.
At a high level, there are three places to capture value:</p><ol><li><p>Execution: sequencing transactions, executing, and (for a zk-rollup) proving</p></li><li><p>Settlement: bridging and verifying validity proofs or fault proofs</p></li><li><p>Data availability: publishing the order of transactions</p></li></ol><p>But only execution is suitable as the rollup framework's business model:</p><ul><li><p>Settlement: Post-Bedrock, Optimism only pays <a href="https://dune.com/optimismfnd/optimism-l1-batch-submission-fees-security-costs">~$5 a day</a> to Ethereum for settlement. The rest of the OP Stack costs are from posting blocks and the associated overhead. A competitive settlement layer would likely earn even less.</p></li><li><p>Data availability: A fragmented DA layer will have less stake securing the network compared to a shared DA layer such as Celestia. Many rollups don't want to move their DA off of Ethereum anyway because they would sacrifice their Ethereum-alignment.</p></li></ul><p>Any market segmentation should also include rollup frameworks in at least one category related to execution, and any product that offers execution is competitive with the rollup framework.</p><h3>Isolated Rollups-as-a-Service aren't defensible.</h3><p>The naive interpretation of RaaS is actually <strong>isolated Sequencers-as-a-Service</strong> (iSaaS). These are companies who have no protocol of their own, but they're deploying existing open-source rollup frameworks and running a sequencer. OP Stack has a partnership with an iSaaS.</p><p>The business model for iSaaS is to charge some recurring fiat amount in addition to some percent of sequencer fees. (Additional support services, consulting, or custom feature development don't represent scalable business models.) 
To be clear, this would be a direct competitor to shared sequencer networks such as Espresso, Astria, Radius, and more, but iSaaS providers have some fatal disadvantages.</p><p>A big problem with iSaaS is that it is at odds with the rollup framework. As described above, an optimistic rollup framework like OP Stack has to monetize via sequencer fees. (A zk-rollup framework might be okay with forgoing sequencer fees and keeping only prover fees.)</p><p>Other high-level problems with such a business are that it is commoditized, the market is easy to enter, and, unlike a shared sequencer, it has no network effects. iSaaS lacks the economies of scale of a shared sequencer since each sequencer is isolated.</p><h3>Optimistic rollup frameworks must offer their own sequencer-as-a-service.</h3><p>To play nice, the iSaaS might return sequencer fees to the optimistic rollup framework, keeping only the recurring fiat payment for itself.</p><p>But now the iSaaS and the rollup framework must both independently be profitable. For a large enterprise, the ideal pricing would be a high recurring fiat payment but a low sequencer fee. But the iSaaS doesn't have the flexibility to decrease sequencer fees, since the sequencer fees aren't theirs to begin with; they're passed back to the rollup framework. If the iSaaS doesn't share revenue with the rollup framework, the rollup framework can deploy its own iSaaS and likely penetrate the market more deeply due to established trust.</p><p>The reason so many iSaaS are popping up is that the model seems attractive to the unsophisticated reader. It looks like SaaS, so a non-crypto investor might find it easier to reason about the fiat revenue. But iSaaS will have difficulty competing with a rollup framework that runs its own sequencer-as-a-service, which has protocol-native revenue and a token.
The latter has more optionality in pricing, and the token can be used to subsidize customer acquisition costs and <a href="https://mirror.xyz/electriccap.eth/SD0wT7qSSfis9gLT_Ki1gY6_oTYEqgwcGE0hDw7kMDY">fixed costs</a> of running a chain (described below) for promising projects, which pays itself off as protocol native revenue.</p><p>Protocol-native network effects and amortized fixed costs will create stronger unit economics for protocols with traction, making rollup providers somewhat winner-takes-all.</p><h3>Refined Market Maps</h3><p>Now I can show how I'd adjust the graphic in this <a href="https://messari.io/report/the-rollups-as-a-service-ecosystem">Messari piece</a>, which I thought looked reasonable at the time:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!em-v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!em-v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png 424w, https://substackcdn.com/image/fetch/$s_!em-v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png 848w, https://substackcdn.com/image/fetch/$s_!em-v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png 1272w, 
https://substackcdn.com/image/fetch/$s_!em-v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!em-v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png" width="799" height="673" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:517226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!em-v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png 424w, https://substackcdn.com/image/fetch/$s_!em-v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png 848w, https://substackcdn.com/image/fetch/$s_!em-v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png 1272w, 
https://substackcdn.com/image/fetch/$s_!em-v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b97248e-61c5-4107-99f2-1cd15db9677c_799x673.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Messari Market Map</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!P1t7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P1t7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png 424w, https://substackcdn.com/image/fetch/$s_!P1t7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png 848w, https://substackcdn.com/image/fetch/$s_!P1t7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png 1272w, https://substackcdn.com/image/fetch/$s_!P1t7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P1t7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png" width="1350" height="950" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:950,&quot;width&quot;:1350,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60493,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P1t7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png 424w, https://substackcdn.com/image/fetch/$s_!P1t7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png 848w, https://substackcdn.com/image/fetch/$s_!P1t7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png 1272w, https://substackcdn.com/image/fetch/$s_!P1t7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12de9d87-f4f1-433e-bd4a-47943d7891d7_1350x950.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Refined Messari Market Map</figcaption></figure></div><p>I'd rename the No Code Deployment category, and I would rename Rollup SDKs to Rollup Frameworks, because many rollup frameworks don't provide a full SDK to developers. 
I would also modify this Celestia ecosystem diagram:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3S6Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3S6Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3S6Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3S6Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3S6Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3S6Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg" width="1000" height="582" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:582,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:203129,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3S6Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3S6Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3S6Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3S6Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73f9e3-15a7-4d5b-8ef2-601f2ab8dacc_1000x582.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Celestia Ecosystem Map</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x8KY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x8KY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg 424w, https://substackcdn.com/image/fetch/$s_!x8KY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!x8KY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!x8KY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x8KY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg" width="1100" height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44365,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x8KY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg 424w, https://substackcdn.com/image/fetch/$s_!x8KY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!x8KY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!x8KY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2c3cd1b-745e-4dbc-a7d9-7fce416410ce_1100x450.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Refined Celestia Ecosystem Map</figcaption></figure></div><p>I'd remove Rollups-as-a-Service, Settlement Layers, and 
Virtual Machines. Projects in the Rollup Framework bucket will almost certainly have to place themselves in another category as well, because otherwise they can't monetize.</p><h2>No Free Lunch: Economic and Technical Limits</h2><h3>Most apps should not have their own rollup.</h3><p>The easiest way to demonstrate the economics of app-specific rollups is by looking at a live rollup: Optimism (post-Bedrock). Props to the Optimism team for making this <a href="https://dune.com/optimismfnd/optimism-l1-batch-submission-fees-security-costs">Dune dashboard</a>.</p><p>The following assumes a ~25 gwei gas price on Ethereum:</p><ol><li><p>One-time cost of deployment for an OP Stack mainnet chain: ~1 ETH</p></li><li><p>Fixed cost of an OP Stack chain, even if 0 transactions are run: ~0.5 ETH a day</p></li><li><p>Variable cost: 7.5 * 10^-5 ETH per transaction</p></li></ol><p>To get the fixed cost, I took the average overhead cost per transaction, multiplied by the number of transactions run that day, and confirmed the result by running an OP Stack chain on mainnet.</p><p>This variable cost is cheap but not quite Solana-level cheap, and the fixed cost can be amortized over many transactions. In the future with EIP-4844, we might generously assume this cost comes down by 10x. Still, assuming a $2,000 ETH price, this represents something like a $0.015 lower bound per transaction plus some amortized fixed cost.</p><p>We might consider something like 0.00001 ETH (~$0.02 at the time of writing) as a reasonable transaction markup to cover this fixed cost, so we need 50,000 transactions per day for an app-specific rollup to make sense. The price for each transaction is roughly $0.17 before EIP-4844, and optimistically $0.03 after.
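</p><p>As a sanity check, these figures can be recomputed directly. The following is a rough sketch using only the estimates stated above (the variable cost, markup, and daily fixed cost); none of these inputs are new measurements:</p>

```python
# Sanity check of the app-specific rollup unit economics described above.
# All inputs are this post's rough estimates, not measured data.
ETH_USD = 2_000                  # assumed ETH price
FIXED_COST_ETH_PER_DAY = 0.5     # OP Stack chain overhead, even with zero usage
VARIABLE_COST_ETH = 7.5e-5       # per-transaction cost, pre-EIP-4844
MARKUP_ETH = 1e-5                # per-transaction markup to cover the fixed cost

# Daily transactions needed for the markup to pay for the fixed cost
break_even_tx_per_day = FIXED_COST_ETH_PER_DAY / MARKUP_ETH

# All-in per-transaction price, before and (assuming a generous 10x
# cost reduction) after EIP-4844
price_pre_4844 = (VARIABLE_COST_ETH + MARKUP_ETH) * ETH_USD
price_post_4844 = (VARIABLE_COST_ETH / 10 + MARKUP_ETH) * ETH_USD

print(f"break-even: {break_even_tx_per_day:,.0f} tx/day")
print(f"pre-4844:  ${price_pre_4844:.2f} per tx")
print(f"post-4844: ${price_post_4844:.3f} per tx")
```

<p>At 50,000 transactions a day, the 0.00001 ETH markup exactly covers the 0.5 ETH daily fixed cost, which is where the break-even figure comes from.</p><p>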
We might add a small premium so it's economical for a (shared) sequencer to support the chain.</p><p>So as cool as something like <a href="https://twitter.com/doganeth_en/status/1640062610161688577">Opclave</a> is (I really like the idea, we're chatting with Dogan's team and we might incorporate this feature into Eclipse rollups), it doesn't make sense as a mainnet OP Stack chain. The constraint here is that OP Stack chains are anchored to Ethereum, which has expensive blockspace, and Optimism is intent on Ethereum-alignment.</p><p>With these unit economics in mind, small DeFi dApps and NFT projects don't make much sense as their own chains. These dApps might instead subsidize gas costs if the long-run unit economics of an Ethereum L2 work for them, or knowingly take a loss on their app chain.</p><p>If an app requires very high transaction volume, then an Ethereum-anchored rollup doesn't work either, because a transaction fee greater than $0.01 is likely too high.
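</p><p>The unit economics above can be sketched in a few lines of Python (a rough back-of-envelope script; the constants are this post's estimates, not live data):</p>

```python
# Back-of-envelope OP Stack rollup economics, using the estimates above.
ETH_PRICE_USD = 2_000
FIXED_COST_ETH_PER_DAY = 0.5       # batch-posting cost even with zero transactions
VARIABLE_COST_ETH_PER_TX = 7.5e-5  # L1 cost per transaction
MARKUP_ETH_PER_TX = 1e-5           # per-transaction premium to cover the fixed cost

# Daily transactions needed for the markup to cover the fixed cost (~50,000)
break_even_txs = FIXED_COST_ETH_PER_DAY / MARKUP_ETH_PER_TX

# Per-transaction price before EIP-4844 (~$0.17), and optimistically after,
# assuming the variable cost drops 10x (~$0.035)
price_usd = (VARIABLE_COST_ETH_PER_TX + MARKUP_ETH_PER_TX) * ETH_PRICE_USD
price_4844_usd = (VARIABLE_COST_ETH_PER_TX / 10 + MARKUP_ETH_PER_TX) * ETH_PRICE_USD
```

<p>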
These kinds of apps would require a novel approach such as what Eclipse is building with our highly parallelized virtual machine and sovereign rollup architecture.</p><h3>Customizable rollups must be constrained.</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!re5B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!re5B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png 424w, https://substackcdn.com/image/fetch/$s_!re5B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png 848w, https://substackcdn.com/image/fetch/$s_!re5B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png 1272w, https://substackcdn.com/image/fetch/$s_!re5B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!re5B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png" width="1456" height="471" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:471,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180395,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!re5B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png 424w, https://substackcdn.com/image/fetch/$s_!re5B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png 848w, https://substackcdn.com/image/fetch/$s_!re5B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png 1272w, https://substackcdn.com/image/fetch/$s_!re5B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7733b42f-ba35-4c41-a7bd-47aed5fcc128_1700x550.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: <a href="https://stack.optimism.io/docs/build/hacks/">Introduction to OP Stack Hacks</a></figcaption></figure></div><p>As mentioned in the screenshot above, OP Hacks won't be included as part of Optimism's Superchain. That makes sense: in order to properly settle or provide stateful sequencing for a rollup, we need some invariants to hold. Any modifications also need an audit before they can support real economic value.</p><p>Another good reason to constrain app-specific rollups comes from the adoption of Cosmos. The Cosmos SDK is incredibly generic, and yet it never inspired the plethora of diverse chains that you might expect. This could be because customization requires too much technical sophistication, or more likely because the long tail of applications is well served by a handful of architectures.
On the other hand, sector-specific templates can solve popular pain points for different verticals and provide repeatable architectures.</p><p>I'm interested to hear the community's thoughts. Feel free to reach out via Twitter <a href="https://twitter.com/neelsomani">@neelsomani</a> or <a href="https://twitter.com/EclipseFND">@EclipseFND</a>.</p>]]></content:encoded></item><item><title><![CDATA[An Alternate Interchain Security Proposal]]></title><description><![CDATA[This post outlines a Cosmos interchain security proposal that involves a CLI tool and a novel fee market, streamlining the process to launch an app chain.]]></description><link>https://www.neelsomaniblog.com/p/an-alternate-interchain-security</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/an-alternate-interchain-security</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Wed, 06 Jul 2022 01:20:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/789a398e-1e28-4458-a6d5-5c86f418c67b_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, I give two recommendations to the Cosmos ecosystem:</p><ol><li><p>A CLI script to easily deploy a new app chain</p></li><li><p>A novel fee market for interchain security</p></li></ol><h2>Where Is Everyone?</h2><p>After the Terra de-peg, I put my <a href="https://www.neelsomani.com/blog/future-of-terra-defi.php">Terra EVM</a> project on pause. I was thinking about where to build next, and I found I have issues with almost every ecosystem:</p><ul><li><p>Ethereum is incredibly slow at 15 TPS</p></li><li><p>Solana has known <a href="https://cointelegraph.com/news/solana-suffers-7th-outage-in-2022-as-bots-invade-the-network">stability issues</a></p></li><li><p>Cosmos has low TVL ($600m) and its activity is dwarfed by Ethereum's</p></li></ul><p>But I like Cosmos the best architecturally.
The app chain thesis has been vindicated by the recent <a href="https://dydx.exchange/blog/dydx-chain">deployment of dYdX</a>, an indictment of the theory that everything would soon become an Ethereum rollup. There are several advantages to making a Cosmos app chain vs. a dApp elsewhere:</p><ul><li><p>Tendermint has <a href="https://galois.com/blog/2021/07/formally-verifying-the-tendermint-blockchain-protocol/">formally verified</a> liveness.</p></li><li><p>If a serious error occurs, your governance can vote to roll back the chain or take remedial action. This isn't possible if a tragedy happens on a chain like Ethereum, such as the $30m <a href="https://blog.openzeppelin.com/on-the-parity-wallet-multisig-hack-405a8c12e8f7/">Parity multisig hack</a>.</p></li><li><p>There is no congestion or competition for block space with other dApps.</p></li><li><p>With ABCI, you can theoretically use whatever language you want.</p></li><li><p>App chains avoid state bloat, which every monolithic L1 will have to deal with.</p></li></ul><p>So it raises the question: Why isn't everyone deploying as a Cosmos app chain? Moreover, why do people still prefer deploying their dApps as smart contracts on general-purpose chains?</p><h2>The Status Quo</h2><h3>It's too complicated to launch an app chain.</h3><p>A developer who writes Solidity dApps must now <a href="https://tutorials.cosmos.network/academy/1-what-is-cosmos/">learn about</a> Tendermint, ABCI, and the Cosmos SDK. Even with the help of <a href="https://docs.ignite.com/guide/hello#say-hello-ignite-cli">Ignite CLI</a>, just to instantiate a "Hello World" application, we need to modify protocol buffers and learn about <a href="https://docs.cosmos.network/master/building-modules/keeper.html">Keeper</a>.
And deploying to production is a beast of its own: there is no <code>ignite deploy</code> command.</p><h3>You can't bootstrap a validator set.</h3><p>After implementing your app with the Cosmos SDK and launching your chain, now you need to bootstrap your validator set. The difficulty here is that no one wants to validate for a token with no known economic value.</p><p>The <a href="https://interchainsecurity.dev/">current proposal</a> for interchain security: you basically apply for interchain security, and the governance for the "provider chain" votes on whether they want to validate for you. $ATOM delegators and validators are rewarded via additional fees from your chain via the <a href="https://github.com/cosmos/gaia/blob/main/docs/interchain-security.md#provider-chain-distribution-module">distribution module</a>.</p><p>What's good about the current proposal is that it creates a use case for $ATOM. $ATOM certainly needs use cases beyond governance. Some issues:</p><ul><li><p>Security: Security for your chain is actually better if you use your own token. You can simply hold a large percentage of tokens yourself and make an attack prohibitively expensive or impossible.</p></li><li><p>Flexibility: Once interchain security is turned on for a chain, an individual validator cannot opt out. They're stuck validating for this new chain. (I've been told ICS v2 will allow individual validators to opt in or out.)</p></li><li><p>Resource efficiency: Some consumer chains are resource-intensive to validate, and some validator nodes have greater resources than others. Since all validators from the provider chain must now validate for the consumer chain, we fail to capture this property.</p></li><li><p>Startup time: A governance vote takes time to process.</p></li></ul><h2>The Alternate Proposal</h2><h3>Simplify launching an app chain.</h3><p>The first step is to greatly simplify the process of deploying an app chain with a script. 
It has to be as easy as deploying a Solidity smart contract to Ethereum, similar to what Informal is describing with their <a href="https://informal.systems/2022/05/09/building-with-interchain-security/">CLI tool</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!suSQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!suSQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png 424w, https://substackcdn.com/image/fetch/$s_!suSQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png 848w, https://substackcdn.com/image/fetch/$s_!suSQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png 1272w, https://substackcdn.com/image/fetch/$s_!suSQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!suSQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png" width="557" height="340" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:340,&quot;width&quot;:557,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:35500,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!suSQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png 424w, https://substackcdn.com/image/fetch/$s_!suSQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png 848w, https://substackcdn.com/image/fetch/$s_!suSQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png 1272w, https://substackcdn.com/image/fetch/$s_!suSQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86efaa1a-3259-4f77-9cb4-24688fac4183_557x340.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Make a fee market for interchain security.</h3><p>We can solve the issues with the current interchain security proposal by making it a market, creating an <strong>even better use case for $ATOM</strong>, although theoretically this works for any token.</p><p>From a validator's perspective, you make an ask of the form: "In exchange for you bonding $x of $ATOM to me, I will provide 1 validator on your app chain."</p><p>For a consumer chain that needs interchain security, the bid structure is more interesting. I would model the problem as each consumer chain requiring k_c validators and being willing to pay any price to get them (inelastic demand). In this case, what should be the price for each validator?</p><p>The competitive market structure here is a uniform clearing-price (UCP) auction.
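</p><p>A minimal sketch of that clearing rule in Python (the function and example numbers are illustrative, not from any existing implementation):</p>

```python
# Uniform clearing-price auction for validator slots: validators post asks
# (price per slot, in ATOM), and consumer chains demand a total number of slots.
def ucp_clearing_price(asks: list[float], demand: int) -> float:
    """Sort asks ascending; the demand-th cheapest ask sets the single
    price paid to every validator whose ask was at or below it."""
    if demand > len(asks):
        raise ValueError("not enough validators to fill demand")
    return sorted(asks)[demand - 1]

# Five validators post asks; consumer chains collectively want 3 slots.
# The three cheapest asks (5.0, 7.0, 9.0) are filled at the uniform price 9.0.
print(ucp_clearing_price([12.0, 5.0, 9.0, 20.0, 7.0], demand=3))
```

<p>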
We order the validators from cheapest to most expensive, and if there is demand for k* total validators, then the price that the k*th validator offered is the price for all validators. This market structure incentivizes validators to make their ask price as low as possible, and it enables orders to be filled quickly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6KGA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6KGA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png 424w, https://substackcdn.com/image/fetch/$s_!6KGA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png 848w, https://substackcdn.com/image/fetch/$s_!6KGA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png 1272w, https://substackcdn.com/image/fetch/$s_!6KGA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6KGA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png" width="500" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11699,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6KGA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png 424w, https://substackcdn.com/image/fetch/$s_!6KGA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png 848w, https://substackcdn.com/image/fetch/$s_!6KGA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png 1272w, https://substackcdn.com/image/fetch/$s_!6KGA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271c0a60-5c05-4a01-9e67-ddecfab7c9d4_500x500.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We could modify the structure to allow consumer chains to include a bid price: "In exchange for $x of $ATOM, you will serve as a validator for my app chain." This scheme allows validators and consumer chains to place orders consistent with their resource specifications. The market clearing price for validators is simply where the supply and demand curves intersect. The result is something like a decentralized Amazon Web Services for validators to provide app-specific compute.</p><p>When you think your token has enough economic value to get your own validator set, you can simply unbond your $ATOM, and your old validators will fill the next orders placed to the fee market consistent with their ask. As the fee market becomes more popular, we might include a devops environment for validators to quickly spin up new nodes for networks without eventual resource starvation.</p><h2>Who's In Charge?</h2><p>Where would we run a fee market like this?
The most natural option might be the Hub itself, but dApps like Gravity Bridge and Gravity DEX have proven to be controversial. The right answer might be a dedicated chain that supports a central limit order book.</p><p>I wanted to keep this post short, but I am interested to hear the Cosmos community's thoughts. Let me know what you think on Twitter <a href="https://twitter.com/neelsomani">@neelsomani</a> or email at neeljaysomani [at] gmail.com.</p>]]></content:encoded></item><item><title><![CDATA[The Future of Terra DeFi]]></title><description><![CDATA[Terra is a rapidly growing blockchain offering UST, the largest uncollateralized stablecoin ever. In this post, I give recommendations to the Terra ecosystem and propose Terranova: an EVM on Terra.]]></description><link>https://www.neelsomaniblog.com/p/the-future-of-terra-defi</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/the-future-of-terra-defi</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Wed, 06 Apr 2022 01:24:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4805cfe0-d9f8-4ebb-abdc-62632a5fb26b_650x650.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>tl;dr: 75% of $UST is held in Anchor because of its high fixed yield. When the yield drops, in order for $UST to hold its peg, there need to be organic use cases for $UST. 
I propose a Terra EVM (<a href="https://www.terranova.finance/">Terranova</a>) in addition to some novel DeFi infrastructure below.</em></p><p>History will view algorithmic stablecoins as either a) a disaster as inevitable as the subprime mortgage crisis or b) the greatest recent innovation in financial history.</p><p>In this post, I will give an economic analysis of Terra's $UST, leading to concrete recommendations for the Terra ecosystem.</p><h2>A Brief History of Terra</h2><p>If you're already familiar with Terra, you can skip this section.</p><p>Context for the uninitiated: $UST is the largest <a href="https://www.coindesk.com/tech/2021/07/06/the-quest-for-a-truly-decentralized-stablecoin/">algorithmic stablecoin</a>, with other examples being <a href="https://bean.money/">Beanstalk</a> and Basis (rest in peace). Much of the draw of $UST comes from the high fixed yield available to $UST holders via Anchor Protocol, Terra's lending platform: if you leave your money in $UST for one year, Anchor will guarantee 20% APY. And this isn't 20% of imaginary Monopoly money, because $UST is pegged to the US dollar.</p><p>So what's the problem? Well, some people say that 20% can't last forever. Anchor can only afford to pay out that 20% yield by loaning out $UST, and if it can't make the 20% through cash flows, the difference comes from its <a href="https://thedefiant.io/anchor-all-time-high-tvl-savings-luna/">reserve</a>, which is being depleted. Eventually, the Anchor Protocol will not be able to provide this 20% APY. Recently, Anchor adopted a proposal for a <a href="https://forum.anchorprotocol.com/t/dynamic-anchor-earn-rate/3042">dynamic rate</a>, which will gracefully reduce the APY.</p><p>When the APY drops, people might sell their $UST. This is a big deal, since about <strong>75% of <a href="https://coinmarketcap.com/currencies/terrausd/">all $UST</a> is locked in <a href="https://app.anchorprotocol.com/">Anchor</a></strong>!
While Terra will attempt to maintain the $UST peg by minting $LUNA, this <a href="https://mirror.xyz/damsondao.eth/OVeBrmrfcWm7uKLlA2Q4W1XTVkFU3cMKfNWhgf7mQuM">great article</a> by <a href="https://twitter.com/damsondao">@damsondao</a> explains the risk of a death spiral:</p><p><em>"UST redemptions in favor of LUNA that is being sold on the market by arbitrageurs leads to a significant decrease in its price, which necessitates more LUNA being minted for each UST burned, creating a hyper-inflationary loop in LUNA's supply. This then trigger [sic] a crisis of confidence in LUNA's ability to retain value that further reduces demand for UST until the mechanism implodes as it fails to adequately reduce supply and UST's peg inevitably breaks."</em></p><p>It leads to the question: <strong>How can we incentivize people to hold their $UST once the Anchor yields decline?</strong> (We can ask the same question about $LUNA, since it is used to stabilize $UST. A simplified explanation is that when $UST falls below peg, the system issues $LUNA and buys back $UST until it re-pegs. When $UST rises above peg, the system sells $UST and burns the $LUNA.)</p><h2>Make Something People Want</h2><p>A natural starting point is to ask how the US dollar does it. 
It all comes down to supply and demand:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yqyw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yqyw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yqyw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yqyw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yqyw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yqyw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg" width="341" height="313" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:313,&quot;width&quot;:341,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11828,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yqyw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yqyw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yqyw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yqyw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5041708-8467-49e2-af98-f81efc1ca7a8_341x313.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At any given "price" for dollars, some people decide to continue holding dollars. The US dollar must provide sufficient incentives for people to hold this currency over all others, or else the currency will fall in "price" (depreciate).</p><p>Moreover, the US dollar must be especially good at incentivizing people to hold, since it qualifies as a <a href="http://thinking.farm/essays/2021-01-17-beware-the-coupon-clipper/">coupon coin</a>. While the Federal Reserve can temporarily decrease the money supply by selling securities through open market operations, eventually the principal must be repaid in addition to any interest, so the money supply only increases in the long run.
So the supply curve in the diagram above is continually shifting right, and yet the dollar doesn't have a depreciation crisis.</p><p>In fact, the US does such a good job incentivizing us to hold $USD that most of us don't even think about whether we're going to trade our dollars for some other currency. <strong>The primary reason to hold the US dollar is that we use it in our day-to-day life: to receive our paychecks, to pay for our groceries, to buy our stocks.</strong></p><p>Which leads me to this tweet by Do Kwon:</p><blockquote><p>8/ New algo stablecoin developers need to remember that their challenge is economy building &gt; mech design.<br><br>The only way to stability [is] sustainable use cases around the stablecoins, and stability will increase as these use cases become more sticky, distributed and uncorrelated.</p><p>&#8212; Do Kwon &#127765; (@stablekwon) <a href="https://twitter.com/stablekwon/status/1405742215960219659?ref_src=twsrc%5Etfw">June 18, 2021</a></p></blockquote><p>How can we think about the sources of demand for a currency?
Here is a (very incomplete) framework:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XD4G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6992cc4-2475-4fed-8aa0-614710124e48_960x540.png"><img src="https://substackcdn.com/image/fetch/$s_!XD4G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6992cc4-2475-4fed-8aa0-614710124e48_960x540.png" width="960" height="540" alt="" loading="lazy"></a></figure></div><p>In the diagram above, the recently introduced <a href="https://www.coindesk.com/markets/2022/02/23/terras-luna-jumps-15-as-ust-stablecoin-gets-1b-bitcoin-reserve/">Bitcoin reserve</a> increases the intrinsic value of $UST. While Anchor yields will decline over time, staking yields will climb as more protocols launch on Terra.</p><h2>The Terra DeFi Ecosystem Must Grow</h2><p>Recommendations:</p><h3>1. EVM compatibility</h3><p>Composability drives network effects: the more that is built on Terra, the stickier it becomes.</p><p>In summer 2021, Binance Smart Chain usage exploded with a variety of applications that resembled popular Ethereum dApps: SushiSwap vs. Uniswap, Ellipsis vs. Curve, etc. <strong>This is why I'm building <a href="https://www.terranova.finance/">Terranova</a>: an Ethereum Virtual Machine (EVM) on Terra.</strong></p><p>EVM compatibility on Terra will win for several reasons.
In a multichain future, EVM-compatible L1s will capture the growing transaction volume and mitigate Ethereum network congestion. Terra becomes a contender as the dominant EVM-compatible chain in the Cosmos ecosystem (cf. Evmos). Ethereum dApps will interoperate easily with Terra dApps through native "cross-chain messaging" between Terranova and Terra, giving access to yields through Anchor, synthetics through Mirror, etc. And Ethereum dApps will unlock more use cases for $UST.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cqdp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7310041-e4eb-4129-9009-93707ac212fd_960x817.png"><img src="https://substackcdn.com/image/fetch/$s_!cqdp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7310041-e4eb-4129-9009-93707ac212fd_960x817.png" width="960" height="817" alt="" loading="lazy"></a></figure></div><p>An EVM on Terra brings the top Ethereum dApps (e.g., money markets, options protocols, DEXes) to Terra and empowers developers who are already familiar with Ethereum tooling. Terra also gains access to the tremendous amount of liquidity on EVM-compatible chains, since some of that liquidity is likely to avoid chains that are not EVM-compatible. If you are interested in working on this problem, email me at neeljaysomani [at] gmail.com.</p><h3>2. 
Build novel DeFi infrastructure</h3><p>I see Terra as a high-potential contender to have the most sophisticated DeFi infrastructure in crypto, creating network effects that both usher in and retain institutional capital.</p><p>Some novel areas we should explore:</p><ul><li><p>A wider variety of oracles: Given my background as a quant in power pricing, some oracles that are interesting to me involve the weather, power prices, and gas prices. Such oracles would enable institutional investors to participate in popular commodities trading instruments, such as <a href="https://quant.stackexchange.com/questions/1687/what-is-a-heat-rate-option">heat rate call options</a>, <a href="https://www.eia.gov/todayinenergy/detail.php?id=9911">spark spreads</a>, and <a href="https://www.cmegroup.com/education/articles-and-reports/weather-options-overview.html">weather options</a>. The large institutional electricity trading market is a great fit for trading on the blockchain because <a href="https://www.cnbc.com/2021/12/04/bitcoin-miners-say-theyre-fixing-texas-electric-grid-ted-cruz-agrees.html">crypto is becoming increasingly relevant for power traders</a> anyway. For example, Talen Energy (owner of the coal plant Brandon Shores) recently announced a <a href="https://talenenergy.investorroom.com/2021-08-03-Talen-Energy-Corporation-Announces-Zero-Carbon-Bitcoin-Mining-Joint-Venture-with-TeraWulf-Inc">Bitcoin mining venture</a>.</p></li><li><p>More sophisticated financial instruments: First we had Mirror for synthetics; soon we'll have <a href="https://sig.finance/sigma_wp.pdf">Sigma</a> for options. Next we should develop <a href="https://en.wikipedia.org/wiki/Futures_contract">futures</a> and forwards on Terra. These would likely rely on oracles like those described above. By combining forwards on the price of Bitcoin and the electricity price, a Bitcoin miner can hedge out their price exposure.
In addition, Terra's native stablecoins are perfect for constructing derivatives like <a href="https://en.wikipedia.org/wiki/Currency_swap">currency swaps</a>.</p></li></ul><h2>What's next?</h2><p>We're about to witness the outcome of a groundbreaking experiment in DeFi. $UST's success depends on whether we can collectively build the necessary network effects to prevent it from collapsing. I'm actively building a <a href="https://www.terranova.finance/">Terra EVM</a> to support the ecosystem and would love to talk with others who are interested in working on some of the related problems. I'm curious to hear the Terra community's thoughts, so let me know what you think on Twitter (<a href="https://www.twitter.com/neelsomani">@neelsomani</a>, <a href="https://www.twitter.com/TerranovaEVM">@TerranovaEVM</a>) or at neeljaysomani [at] gmail.com.</p>]]></content:encoded></item><item><title><![CDATA[Explaining to My Parents What I Do]]></title><description><![CDATA[My parents don't understand what I do as a quant, so I wrote this to help explain to them. This post goes into the basics of power pricing and the electricity market.]]></description><link>https://www.neelsomaniblog.com/p/explaining-to-my-parents-what-i-do</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/explaining-to-my-parents-what-i-do</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Sun, 09 Jan 2022 02:29:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RcH8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3be2a10-9901-4135-a9cf-bbc89474b10e_1282x1036.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Albert Einstein said: "If you can't explain it to a six year old, you don't understand it yourself." I like to think this is true about explaining things to my parents.</p><p>As is the case for many people my age, my parents don't understand what I do. 
They're both in medicine and have no background in math or economics, while I'm a quantitative researcher at a hedge fund. In this post I will attempt to give a high-level explanation of one of the problems that I work on: <strong>power pricing</strong>. This is a simplified explanation that neglects details and edge cases.</p><h2>Intro: the power balance</h2><p>Power (or electricity) must be generated by power plants. That power is consumed by "load" (demand), often a load-serving entity (LSE) that distributes that power to other people. My parents are in northern California, where the LSE is Pacific Gas &amp; Electric. Your electricity bill comes in kilowatt-hours (kWh), but I typically think about things in megawatt-hours (MWh).</p><p>At any given moment, the amount of power produced must exactly equal the amount of power consumed. This is called the "balance." If the grid does not balance, then the frequency of the grid will drop (or rise) from the 60 Hz at which it must operate. If the frequency of the grid is not exactly right, then it can cause serious damage to the power generators. This is why when the grid does not balance, generators prefer to shut down completely, causing brownouts or blackouts.</p><h2>Why does the price for power change?</h2><p>For some supply, each additional MWh of power is very cheap to produce. This cost is called the "marginal cost": the cost to produce an additional unit of power. Fuel types with very low marginal cost are solar, wind, and nuclear. Gas and coal have higher marginal costs.</p><p>If demand is very low (such as in the middle of the night, called off-peak hours), then the cheapest producers will supply the electricity, and therefore the price is not very high. When demand increases, we must incentivize the less efficient producers to supply that power.
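</p><p>This merit-order logic can be sketched in a few lines of Python. This is a toy model: the generator fleet, the costs, and the <code>clearing_price</code> helper are made up for illustration, and real dispatch involves many more constraints.</p>

```python
# Toy merit-order dispatch: generators run from cheapest to most
# expensive until demand is met; the clearing price is the marginal
# cost of the last (most expensive) unit that had to run.

def clearing_price(generators, demand_mw):
    """generators: list of (marginal_cost_per_mwh, capacity_mw) tuples."""
    supplied = 0.0
    for cost, capacity in sorted(generators):
        supplied += capacity
        if supplied >= demand_mw:
            return cost  # this generator sets the price for everyone
    raise ValueError("not enough capacity to meet demand")

fleet = [
    (5.0, 300),   # solar/wind: near-zero marginal cost
    (12.0, 400),  # nuclear
    (35.0, 500),  # efficient gas
    (80.0, 400),  # peaker
]

print(clearing_price(fleet, 600))   # off-peak demand -> 12.0
print(clearing_price(fleet, 1300))  # peak demand -> 80.0
```

<p>At low demand the cheap units cover everything, so the price stays low; once demand forces the expensive peakers online, the clearing price jumps.</p><p>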
It costs them more money to make the power, so the price increases.</p><p>The power is always supplied by the producer that minimizes the total cost to the system. This is ensured by a central organization called a regional transmission organization (RTO), to which all of the producers submit their marginal costs and LSEs submit their demand. The RTO in California is called CAISO, and my parents are in northern California, so they are specifically in the region called NP-15. Other RTOs include PJM, which covers where I live in Chicago, and ERCOT, which you might have heard about in Texas.</p><h2>How is power traded?</h2><p>Power is traded in many ways (you can trade the average price of power, the theoretical profitability of a coal-powered plant, the number of hot or cold days in a month, the ratio or difference between power and gas prices, etc.). I'm going to just give a couple of examples in this section.</p><p>A "forward" is an agreement to buy or sell something at a certain price in the future. So let's say I buy a forward on a banana for $10 one year from now. One year from now the contract settles and the market is pricing the banana at $25. That means I buy the banana at $10 (because of my forward) and I get to sell at $25, so I made $15 profit. If the price went down, then I would have lost money.</p><p>Now onto power: there is a "day-ahead" market for power, and a "real-time" market. I am not going to go into the details of the day-ahead market, but you can buy forwards on the day-ahead price.</p><p>The real-time market recalculates the price for power every 5 minutes throughout the day. The price is calculated using an optimization that is simplified below.</p><p>One of the most basic instruments is a "bal-day," short for balance of the day. The bal-day represents the average price for power over the peak hours in the real-time market. 
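</p><p>Settlement for both instruments is just the difference between the contract price and the realized price. A quick sketch (the <code>forward_pnl</code> helper and the 5-minute prices are hypothetical):</p>

```python
def forward_pnl(contract_price, settlement_price, quantity=1):
    """Profit to the buyer of a forward: (settle - contract) * quantity."""
    return (settlement_price - contract_price) * quantity

# The banana example: buy a forward at $10, market settles at $25.
print(forward_pnl(10, 25))  # 15

# A bal-day: buy at $30/MWh, and it settles against the realized
# average of the real-time peak-hour prices.
real_time_peak_prices = [28, 35, 42, 55, 40]  # hypothetical 5-minute prices
avg = sum(real_time_peak_prices) / len(real_time_peak_prices)
print(forward_pnl(30, avg))  # 10.0
```

<p>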
So in the morning you might see that you can buy the bal-day at $30, but you think the average price in the day will be $40. You would buy the bal-day, wait until the end of the day when the contract settles, and make or lose the difference.</p><p>Another distinction is that most people will trade the average price of power over a given area called a "zone." But there are thousands of physical buses that people connect to, each of which has its own power price, and you can trade those via nodal trading (with instruments like FTRs and ARRs, again not worth going into).</p><h2>How does the RTO figure out the market price for power?</h2><p>You might have seen graphs like this before:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RcH8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3be2a10-9901-4135-a9cf-bbc89474b10e_1282x1036.png"><img src="https://substackcdn.com/image/fetch/$s_!RcH8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3be2a10-9901-4135-a9cf-bbc89474b10e_1282x1036.png" width="1282" height="1036" alt="" loading="lazy"></a></figure></div><p>The line sloping downwards represents demand. As quantity increases (bottom axis), the price people are willing to pay decreases. The upward sloping line represents supply. As we discussed above, for small quantities, the power does not cost much to produce, so people are willing to supply it for very cheap prices.
As the quantity increases, the price that producers require also increases.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TrXe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55f0aa9-0eff-46a0-8385-496bc9520405_1282x1036.png"><img src="https://substackcdn.com/image/fetch/$s_!TrXe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55f0aa9-0eff-46a0-8385-496bc9520405_1282x1036.png" width="1282" height="1036" alt="" loading="lazy"></a></figure></div><p>The area in green is called "economic surplus." It is good! We want to maximize that area, because it represents all of the people who are getting a good deal. The people who demand power got it for much less than they were willing to pay, and the producers are selling power for more than they were willing to sell it for. In fact, the "efficient market price" is the price that maximizes this area.</p><p>For power specifically, demand is described as "inelastic," meaning it doesn't really change much even if the price changes. So the downward sloping line is <em>almost</em> a straight line down. Therefore maximizing this area is equivalent to just picking the cheapest suppliers to supply the power!</p><p>The problem is that we have "constraints."
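</p><p>That equivalence is easy to check numerically. In this hypothetical sketch (the fleet and costs are made up), demand is perfectly inelastic, so total willingness-to-pay is a constant and maximizing surplus is the same as minimizing total production cost:</p>

```python
# With perfectly inelastic demand, surplus = (constant willingness-to-pay)
# minus production cost, so the surplus-maximizing dispatch is exactly
# the least-cost (merit-order) one.

fleet = [(5.0, 300), (35.0, 500), (80.0, 400)]  # (cost $/MWh, capacity MW)
demand = 700.0

def production_cost(dispatch):
    """dispatch: MW produced by each generator, in fleet order."""
    assert sum(dispatch) == demand  # the grid must balance
    return sum(mw * cost for mw, (cost, _cap) in zip(dispatch, fleet))

cheap = production_cost([300, 400, 0])     # merit order: fill cheapest first
wasteful = production_cost([300, 0, 400])  # skip the mid-cost plant

print(cheap, wasteful)  # 15500.0 33500.0 -- merit order is cheaper
```

<p>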
Power lines can only transmit so much power and producers can only supply so much. There are actually tons of other constraints that I won't go into. But in short, the problem statement looks something like this:</p><p>Maximize the economic surplus (minimize the cost to produce power), subject to:</p><ol><li><p>(power supplied) = (demand)</p></li><li><p>The amount of power transmitted across each line cannot exceed the line's capacity</p></li><li><p>Each producer cannot produce more power than its capacity</p></li></ol><p>Or the formulation in the ISO New England (ISONE) slides:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i1wW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i1wW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png 424w, https://substackcdn.com/image/fetch/$s_!i1wW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png 848w, https://substackcdn.com/image/fetch/$s_!i1wW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!i1wW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!i1wW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png" width="1456" height="973" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:973,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:209619,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!i1wW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png 424w, https://substackcdn.com/image/fetch/$s_!i1wW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png 848w, https://substackcdn.com/image/fetch/$s_!i1wW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!i1wW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7adad1ee-b3e3-4e04-99ac-08cb35dcd168_1592x1064.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This model is called a production cost model. One thing that makes this more complicated is that generators have a "start cost": it costs them some amount of money to even turn on. Another complication is that power generators must either be fully on or fully off, and when they are on, they must pay a "no load cost," the cost of just existing while you're on.</p><p>This problem becomes complicated enough that it is a well-known type of problem, called a mixed integer program (MIP). MIPs are NP-hard, which puts them among the hardest classes of problems in computer science. What's nice about this model is that if you solve it, one of the outputs ends up being the price at each location. 
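</p><p>To make that structure concrete, here is a toy sketch of the on/off decision in Python. The generator numbers are invented, and a brute-force search over commitments stands in for the MIP solver a real market operator would use:</p>

```python
from itertools import product

# Toy "unit commitment" sketch: each generator is either fully on or off.
# If on, it pays a start cost plus a no-load cost, and can produce up to its
# capacity at a marginal cost per MWh. With a handful of units we can
# brute-force the on/off choices; real production cost models are large MIPs.
# All numbers are made up for illustration.

def total_cost(units, on_flags, demand):
    committed = sorted((u for u, on in zip(units, on_flags) if on),
                       key=lambda u: u["marginal"])
    cost = sum(u["start"] + u["no_load"] for u in committed)
    remaining = demand
    for u in committed:  # dispatch committed units cheapest-first
        take = min(u["capacity"], remaining)
        cost += take * u["marginal"]
        remaining -= take
    return cost if remaining <= 0 else float("inf")  # inf = can't meet demand

def cheapest_commitment(units, demand):
    flags = min(product([0, 1], repeat=len(units)),
                key=lambda f: total_cost(units, f, demand))
    return flags, total_cost(units, flags, demand)

units = [
    {"start": 100, "no_load": 20, "marginal": 10, "capacity": 80},
    {"start": 10, "no_load": 5, "marginal": 40, "capacity": 100},
]
print(cheapest_commitment(units, demand=90))
# ((1, 1), 1335)
```

<p>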
It takes hours to solve the complete version of this problem.</p><h2>So what do I do exactly?</h2><p>I develop various models to estimate the possible outcomes for a variety of different investments. I hope this is helpful!</p>]]></content:encoded></item><item><title><![CDATA[How To Derive Useful Financial Approximations]]></title><description><![CDATA[Many useful financial rules of thumb can be derived using Taylor approximations. In this blog post, I solve for common approximation formulas.]]></description><link>https://www.neelsomaniblog.com/p/how-to-derive-useful-financial-approximations</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/how-to-derive-useful-financial-approximations</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Fri, 16 Oct 2020 02:58:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/eb7431b5-6563-48a7-9de1-a657e84c2021_1526x1526.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was recently doing some finance training for my new job, and I thought it would be interesting to demonstrate how you can derive useful financial approximations using Taylor polynomials. These rules of thumb can get you close enough to the truth to do napkin math. In this post, I'll work through some examples.</p><h2>Calculating Bond Yields</h2><h3>Prerequisite Knowledge</h3><p>For the purpose of this section, a bond (loan) is a financial instrument where you pay some price P, and every year you receive dollar amounts C (called coupons). At the end of the loan (n years) you receive back what's called the par value V.</p><p>One metric that investors are interested in is called the yield y of the bond. This is essentially a measure of the bond's rate of return, or the percent yearly return you'd have to make on another investment in order to be indifferent between that other investment and this bond. 
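</p><p>As a quick numerical illustration (with made-up bond numbers), you can solve for the yield by bisection, since the discounted value of the cash flows falls as y rises, and compare the result to the rule of thumb derived below:</p>

```python
# Solve for a bond's yield y: find the y at which the discounted coupons plus
# the discounted par value equal the price P. Bond numbers are hypothetical.

def price(y, C, V, n):
    return sum(C / (1 + y) ** t for t in range(1, n + 1)) + V / (1 + y) ** n

def yield_bisect(P, C, V, n, lo=0.0, hi=1.0, tol=1e-10):
    # price() is decreasing in y, so bisect until the bracket is tiny
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if price(mid, C, V, n) > P:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

P, C, V, n = 95.0, 5.0, 100.0, 10
exact = yield_bisect(P, C, V, n)
approx = C / V + ((V - P) / V) / n  # the rule of thumb from this section
print(round(exact, 4), round(approx, 4))
```

<p>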
We find the yield by solving for y in:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P = \\frac{C}{(1 + y)^1} +\\ ...\\ + \\frac{C}{(1 + y)^n} + \\frac{V}{(1 + y)^n}&quot;,&quot;id&quot;:&quot;EPHJBAZBBQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Since in general y &gt; 0, we are valuing payments lower the further they are in the future (called "discounting").</p><h3>The Approximation</h3><p>Here is <a href="https://www.iotafinance.com/en/Formula-Bond-Yield-Quick-Approximation.html">an approximation for the bond's yield</a>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y \\approx \\frac{C}{V} + \\frac{\\frac{V - P}{V}}{n}&quot;,&quot;id&quot;:&quot;KBPZSRNGAE&quot;}" data-component-name="LatexBlockToDOM"></div><p>The intuitive understanding is that the coupon payments contribute C/V to the yield every year. To explain the other term, you initially paid P and you ultimately receive V, contributing about (V-P)/V to the yield over the entire course of the bond. So for each year, it's about [(V - P)/(V)]/n in yield.</p><p>Why does this approximation work? We start with the normal equation for calculating bond yield, but we don't discount the coupon payments - that is, we pretend like we don't care about when the coupon payments are made:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P \\approx \\underbrace{C +\\ ...\\ + C}_{n\\ \\text{times}} + \\frac{V}{(1 + y)^n}\n= n * C + V * \\frac{1}{(1 + y)^n}&quot;,&quot;id&quot;:&quot;PTVPBLHSSY&quot;}" data-component-name="LatexBlockToDOM"></div><p>To simplify the remainder of the equation, we recall the <a href="https://en.wikipedia.org/wiki/Linear_approximation">first-order Taylor approximation</a> centered on 0. 
That is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;f(y) \\approx f(0) + f'(0) * y&quot;,&quot;id&quot;:&quot;OYMPZPVEWR&quot;}" data-component-name="LatexBlockToDOM"></div><p>We'll just use the Taylor approximation as a tool in this post rather than discussing it at length. We see that:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{d}{dy}\\frac{1}{(1 + y)^n} = \\frac{-n}{(1 + y)^{n + 1}}\n\n\\implies \\frac{1}{(1 + y)^n} \\approx 1 + \\frac{-n}{(1 + 0)^{n + 1}} * y = 1 - n * y\n\n&quot;,&quot;id&quot;:&quot;DHWMLJFZPV&quot;}" data-component-name="LatexBlockToDOM"></div><p>So after simplifying and isolating for y:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;(P \\approx n * C + V * (1 - n * y))\n\n\\implies (y \\approx \\frac{C}{V} + \\frac{\\frac{V - P}{V}}{n})&quot;,&quot;id&quot;:&quot;TRBNMWMATX&quot;}" data-component-name="LatexBlockToDOM"></div><h2>Time For Money To Double: The Rule Of 72</h2><h3>Prerequisite Knowledge</h3><p>Let's say that you have an investment that grows by r% every year. So for some initial investment P, after n years, you'll have P * (1 + r)^n. The question: after how many years does your investment double?</p><h3>The Approximation</h3><p>The common approximation given is that <a href="https://en.wikipedia.org/wiki/Rule_of_72">you divide 72 by your rate of return</a>, and that gives you the number of years it takes to double. So for an investment that gives a 6% return, it should take about 12 years to double.</p><p>We start with the exact equation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;2 * P = P * (1 + r)^n\n\n\\implies 2 = (1 + r)^n\n\n\\implies ln(2) = n * ln(1 + r)&quot;,&quot;id&quot;:&quot;IXFRQRMPHR&quot;}" data-component-name="LatexBlockToDOM"></div><p>First we observe that ln(2) ~= .693. 
Then we calculate the first-order Taylor approximation centered on 0 of the right-hand side:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{d}{dr}ln(1 + r) = \\frac{1}{1 + r} \\implies ln(1 + r) \\approx \\frac{1}{1 + 0} * r + 0 = r&quot;,&quot;id&quot;:&quot;PVKLIEBUCW&quot;}" data-component-name="LatexBlockToDOM"></div><p>So we finally get:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;n \\approx \\frac{.693}{r}&quot;,&quot;id&quot;:&quot;JRDLQGKCMW&quot;}" data-component-name="LatexBlockToDOM"></div><p>where .693 is frequently substituted with .72 to keep the math simple, since 72 is easily divisible by many numbers. To see when it is most accurate, we can see when this function is equal to the <a href="https://en.wikipedia.org/wiki/Taylor%27s_theorem#Statement_of_the_theorem">second-order Taylor approximation</a>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{d}{dr}\\frac{1}{1 + r} = -\\frac{1}{(1 + r)^2} \\implies ln(1 + r) \\approx r - \\frac{r^2}{2}&quot;,&quot;id&quot;:&quot;YIBFJYHKIX&quot;}" data-component-name="LatexBlockToDOM"></div><p>so we solve for:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{ln(2)}{r - \\frac{r^2}{2}} = \\frac{.72}{r}&quot;,&quot;id&quot;:&quot;PQQFKGIFJA&quot;}" data-component-name="LatexBlockToDOM"></div><p>which simplifies to a linear equation. By solving for r, we see that the rule is most accurate for rates around 7.46%.</p><h2>Chance Of Winning A Poker Hand: The Rule Of Fours</h2><h3>Prerequisite Knowledge</h3><p>This is a fun one. In a standard game of Texas hold'em, you're holding on to two cards, and there are some cards on the table. Your goal is to make the best hand possible out of the cards in your hand plus the cards on the table. Once five cards are on the table, no more cards will be drawn. 
So you might be curious what your chance of winning is if there are three cards out (known as the flop) and two to go.</p><h3>The Approximation</h3><p>The common rule is that you count the number of cards that would lead to your win (called "outs") and <a href="https://www.pokerstarsschool.com/lessons/the-rule-of-two-and-four/689/">multiply that number by 4%</a> for your approximate chance of winning.</p><p>For example, if you have a 3, 4, 5, and 6, then you are hoping for a 2 or a 7 to complete your <a href="https://www.pagat.com/poker/rules/ranking.html#standard">straight</a>. There are four 2's and four 7's, so you have roughly an 8 * 4% = 32% chance of getting the straight.</p><p>We start with the exact probability as usual. Since you have two cards in your hand and three on the table, there are 52 - 5 = 47 cards remaining. The easiest way to calculate your chance of winning is to calculate 1 - Pr[you lose]. Since there are k cards that would lead to your win, there are 47 - k cards that would not lead to you winning after the next card is drawn, or a (47 - k)/47 chance. The final card has a (46 - k)/46 chance of you not winning by the same reasoning. 
So:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Pr[you win]} = 1 - \\frac{47 - k}{47} * \\frac{46 - k}{46} = 1 - \\frac{47 * 46 - 93 * k + k^2}{47 * 46} = \\frac{93 * k}{47 * 46} - \\frac{k^2}{47 * 46}&quot;,&quot;id&quot;:&quot;LQNXPPADTA&quot;}" data-component-name="LatexBlockToDOM"></div><p>Since this is a polynomial, the first-order Taylor approximation is just the first term of the polynomial:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\approx \\frac{93 * k}{47 * 46} \\approx .043 * k \\approx .04 * k&quot;,&quot;id&quot;:&quot;SKRUOPYUYK&quot;}" data-component-name="LatexBlockToDOM"></div><p>To see for how many outs k the approximation 4% * k is most accurate, we compare it to the exact solution and solve for k:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;.04 * k = \\frac{93 * k}{47 * 46} - \\frac{k^2}{47 * 46}&quot;,&quot;id&quot;:&quot;WCQOLUCISJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>which gives about k = 6.52.</p>]]></content:encoded></item><item><title><![CDATA[Three Controversial Beliefs About Living Things]]></title><description><![CDATA[I describe a few of my more unusual beliefs about evolution and the nature of living things. I think that evolution is frequently misunderstood.]]></description><link>https://www.neelsomaniblog.com/p/three-controversial-beliefs-about</link><guid isPermaLink="false">https://www.neelsomaniblog.com/p/three-controversial-beliefs-about</guid><dc:creator><![CDATA[Neel Somani]]></dc:creator><pubDate>Thu, 24 Oct 2019 03:06:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ec83ccb5-fa76-487d-9ddb-28e503d54999_532x532.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've always been interested in philosophy of science, and earlier today I was thinking a bit about some commonly-held ideas in the field of biology which I disagree with. 
Here are a few controversial views that I have about living things and evolution. They're strong beliefs that are loosely held. What are your thoughts?</p><h2>1. There is no essential difference between a living thing and a non-living thing.</h2><h3>We cannot define life.</h3><p>Below are a few proposed definitions of "life," but they don't work. I'll explain how they either encompass too much or too little (<a href="http://www.aim.univ-paris7.fr/enseig/exobiologie_PDF/Biblio/Cleland%20and%20Chyba%20_2002.pdf">Cleland and Chyba 2002</a>).</p><p><em>Life is matter that can reproduce itself and evolve as survival dictates</em> (<a href="https://web.archive.org/web/20120322185054/http://www.etsu.edu/physics/lutter/courses/astr1020/a1020chap12.pdf">source</a>). This common definition stems from Darwinian evolution. Let's consider someone who is infertile. They cannot reproduce, yet we still clearly consider them to be living, so the definition doesn't work. (In the <a href="https://www.neelsomaniblog.com/p/three-controversial-beliefs-about">next section</a>, I'll demonstrate that "evolve as survival dictates" is meaningless.)</p><ul><li><p>That definition seems to stem from a misunderstanding of evolution anyway. Survival and reproduction are just terms that come up in the description of natural selection. That is, organisms that have a better chance of survival and reproduction are going to be overrepresented in the next generation (i.e., have greater reproductive fitness). Those aren't essential features of life.</p></li></ul><p>For any other definition, I can come up with exceptions just as easily. Something with a metabolism? Look at a car. Something with <a href="https://en.wikipedia.org/wiki/Entropy_and_life#Negative_entropy">negative entropy</a>? 
Here's my refrigerator.</p><p>In <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC516796/#pbio-0020302-McKay1">McKay's 2004 article</a> on how to search for life, he acknowledges (and cites) the difficulties associated with defining life, but ultimately concludes that we should search for "energy, carbon, liquid water, and a few other elements such as nitrogen, sulfur, and phosphorus." I don't see the reason to search for any of those things if we're not even confident that we would identify "life" if we saw it.</p><p>I'll admit that even if we can't define what it means for an individual thing to be alive, there might still be notable characteristics about a <em>group</em> of things that are alive. My point is that there is nothing fundamentally different about an individual living thing.</p><p>I'll also admit that just because we can't define something doesn't mean that there's nothing that fundamentally distinguishes it. As Cleland and Chyba suggest, we had no acceptable definition of water before the development of molecular theory. To be fair, a similar definition for life would require discovering some sort of fundamental "life force" that falls outside of our current framework for explaining things. That seems unlikely to me and almost more like a religious belief.</p><h3>What about the argument that life is a spectrum?</h3><p>One solution is that life is a spectrum from "undeniably non-living" to "undeniably living." That would explain why there's no hard and fast rule as to whether something is alive.</p><p>If that's the case, then the extreme of "living" still needs to be described. I think that if it is a spectrum, then the side of "living" is very human-centric (or anthropocentric). That is, when we say something is "living," we really mean "more like humans." No one would deny that a gorilla is alive (it looks so much like us), a sponge is questionable at first glance, and a virus barely doesn't meet the cutoff. 
Of course we'd place humans on the extreme of "undeniably living," which is suspicious to me and makes me think that it's still an arbitrary spectrum.</p><h3>Why do people feel like there's something special about life?</h3><p>You might wonder why we have such a strong urge to believe that there's something fundamentally different about living things if it's not actually the case. My argument is that our propensity for identifying things as "living" just served evolutionary utility. To identify something as living allowed us to see it as predator or prey and respond accordingly. The organisms that didn't have this sense were at a huge disadvantage.</p><h2>2. Evolution is just a description of how a group of entities, whether living or non-living, can change over time. It is not a "force."</h2><p>Example of the misconception:</p><p>"Evolution is the single greatest force in the universe; it is the only thing that is permanent and it drives everything." - Ray Dalio's <a href="https://inside.bwater.com/publications/principles_excerpt">"Principles"</a></p><p>In the quote, Dalio fails to understand that evolution is not a force, and it does not drive anything. (In Dalio's defense, he might not have been using "evolution" in the traditional biological sense.)</p><h3>None of the mechanisms of evolution require things to be living.</h3><p>Evolution happens in a few different ways, but there's nothing about those mechanisms that's specific to living things. They're just logical statements that basically amount to describing the only ways that a population of things (anything, not just life) could possibly change. Evolution isn't a "force."</p><ul><li><p><a href="https://en.wikipedia.org/wiki/Natural_selection">Natural selection</a> is almost circular. It's saying that if something has characteristics that make it more likely to exist in the following generation, then it will be overrepresented in the next generation. 
There's nothing about that process that's specific to living things. For example, maybe we're looking at a bowl of Starburst candies (red, pink, orange, and yellow) each day. If people tend to eat the red and pink candies more frequently, then we're going to be left with a bowl of yellow and orange candies. That's essentially the mechanism of natural selection. (I'm sort of clumping artificial selection in this same group.)</p></li><li><p>Genetic drift is the random fluctuation of the proportions of different groups. For example, I might have a group of shirts. One day, I accidentally spill something on one of my blue shirts, so I have to throw it out. The proportion of blue shirts in the group has decreased, but it had nothing to do with the color or characteristics of those shirts. It just happened by chance, as opposed to natural selection.</p></li><li><p>Gene flow is like another group joining the first one and affecting the proportions. Let's say I get my brother's wardrobe when he moves out. 
If he owned a higher proportion of white shirts than I did, then my wardrobe is going to have a higher proportion of white shirts than before.</p></li></ul><p>Although the names "genetic drift" and "gene flow" appear to be tied to living things, the underlying concepts are not.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!v5zm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!v5zm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png 424w, https://substackcdn.com/image/fetch/$s_!v5zm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png 848w, https://substackcdn.com/image/fetch/$s_!v5zm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png 1272w, https://substackcdn.com/image/fetch/$s_!v5zm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!v5zm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png" width="893" height="635" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e59285ce-7fec-48f0-a79d-92398f45e161_893x635.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:635,&quot;width&quot;:893,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72385,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!v5zm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png 424w, https://substackcdn.com/image/fetch/$s_!v5zm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png 848w, https://substackcdn.com/image/fetch/$s_!v5zm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png 1272w, https://substackcdn.com/image/fetch/$s_!v5zm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe59285ce-7fec-48f0-a79d-92398f45e161_893x635.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>3. Evolution does not help us understand the "purpose" of humans in any way whatsoever.</h2><p>Here's a random blog post that makes reference to this misconception: <a href="https://blogs.scientificamerican.com/guest-blog/is-the-meaning-of-your-life-to-make-babies">https://blogs.scientificamerican.com/guest-blog/is-the-meaning-of-your-life-to-make-babies</a></p><p>"So is making babies -- and having genes survive through the generations -- the meaning of life? The answer is yes -- from an evolutionary gene's eye view&#8230; This is modern knowledge that is not to be taken lightly."</p><p>I disagree. Evolution doesn't imply that it's "good" for your genes to survive or that your genes have a "meaning" to perpetuate themselves. It's just that the genes that do happen to perpetuate themselves will be the genes that exist next generation. That's all there is to it. 
To evolution, it's not a good thing, it's not a bad thing, it's just necessarily the case.</p><p>To add on, the only reason why we have a will to survive and reproduce is because the organisms without that drive were at a reproductive disadvantage. Fewer of them made it to the following generations, so we're all basically left with the will to survive and reproduce. It still doesn't make it good or bad.</p><p>That means that evolution doesn't imply that the "purpose" of living things is to survive and reproduce. It's not your purpose, and it's not your genes' purpose.</p><p>It also means that evolution doesn't justify any other behavior. For example, someone might argue that hoarding resources makes them more likely to survive, so it's justified by evolution. That's not the case, since evolution does not imply that our purpose is to increase our reproductive fitness. If you want that to be your purpose, that's fine, but that's not what evolution says.</p>]]></content:encoded></item></channel></rss>