Data Engineering Weekly
Sebastian Raschka: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design
This article examines the structural and architectural developments in modern Large Language Models (LLMs) such as DeepSeek-V3, OLMo 2, Gemma 3, and Llama 4, rather than focusing on benchmark performance or training algorithms. The author details key innovations, including Multi-Head Latent Attention (MLA), Mixture-of-Experts (MoE), various normalization layer placements (Pre-Norm, Post-Norm, and QK-Norm), and sliding window attention, which primarily aim to improve computational efficiency, reduce memory usage, and stabilize training.
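To make one of these techniques concrete, here is a minimal PyTorch sketch of QK-Norm: queries and keys are RMS-normalized before the attention dot product, which bounds the attention logits and helps training stability. The single-head setup, class names, and the choice of RMSNorm here are illustrative assumptions for the sketch, not the exact implementation from any of the models named above.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization, the variant typically used for QK-Norm."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each vector by the reciprocal of its RMS, then apply a learned gain.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class QKNormAttention(nn.Module):
    """Single-head attention sketch with QK-Norm applied to queries and keys.
    Causal masking is omitted for brevity."""
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.q_norm = RMSNorm(dim)  # QK-Norm: normalize queries...
        self.k_norm = RMSNorm(dim)  # ...and keys before the dot product
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

# Usage: x = torch.randn(2, 16, 64); QKNormAttention(64)(x).shape -> (2, 16, 64)
```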
Paul Levchuk: The Metric Tree Trap
The article defines a Metric Tree as a hierarchical decomposition of a top-level business goal into measurable drivers, acknowledging its value primarily for visualization and for aligning teams on key performance indicators. However, the author argues that Metric Trees are unreliable for making robust decisions: contradictory metric definitions, inconsistent granularity, hidden trade-offs, and confounding factors frequently obscure crucial operational insights, making it difficult to identify key drivers, perform root cause analysis, and prioritize accurately. To avoid these “traps” and reach reliable conclusions, the author advises pairing Metric Tree insights with rigorous root cause analysis, scenario testing, and a thorough cost-benefit assessment.
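As a toy illustration of the hidden trade-off problem, consider a one-level tree decomposing revenue into drivers. The metric names and all numbers below are hypothetical, invented for this sketch rather than taken from the article:

```python
# Hypothetical tree: Revenue = Sessions * ConversionRate * AvgOrderValue.
# Two periods where the root metric is flat while its drivers move in
# opposite directions -- the kind of trade-off a tree view alone can hide.

periods = {
    "Q1": {"sessions": 100_000, "conversion_rate": 0.020, "avg_order_value": 50.0},
    "Q2": {"sessions": 125_000, "conversion_rate": 0.016, "avg_order_value": 50.0},
}

def revenue(drivers: dict) -> float:
    """Recompose the top-level metric from its leaf drivers."""
    return drivers["sessions"] * drivers["conversion_rate"] * drivers["avg_order_value"]

for name, drivers in periods.items():
    print(f"{name}: revenue = {revenue(drivers):,.0f}")

# Both periods print 100,000: the root looks healthy even though sessions
# rose 25% while conversion fell 20% -- exactly the signal that calls for
# the root cause analysis and scenario testing the author recommends.
```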