[{"content":"","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/categories/ai/ml/","section":"Categories","summary":"","title":"AI/ML","type":"categories"},{"content":"","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/categories/","section":"Categories","summary":"","title":"Categories","type":"categories"},{"content":"","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/tags/deep-learning/","section":"Tags","summary":"","title":"Deep-Learning","type":"tags"},{"content":"","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/tags/machine-learning/","section":"Tags","summary":"","title":"Machine-Learning","type":"tags"},{"content":" Abstract\nThis piece explains neural networks from the intuition upward: what a neuron is, why layers help, how gradient descent changes weights, and why backpropagation is less mystical than it sounds. It is written for readers who want the concepts to feel concrete before diving deeper into the math.\nI remember the first time I tried to understand neural networks. It was like 2am, I was three YouTube videos deep, and this guy was drawing circles and arrows on a whiteboard saying \u0026ldquo;and then the network just learns\u0026rdquo; while waving his hands around. Like, okay thanks dude, super helpful.\nI\u0026rsquo;d bounce between articles that treated me like a five-year-old (\u0026ldquo;neural networks are like brains!\u0026rdquo;) and papers that opened with equations that looked like someone sneezed on a keyboard. There was no middle ground.\nSo this is my attempt to explain neural networks the way I wish someone had explained them to me back then. Actual intuition, not hand-waving. Not gonna lie though, there\u0026rsquo;s still some math—but the kind that actually helps you understand what\u0026rsquo;s going on.\nWhy Most Explanations Suck # Look, I\u0026rsquo;ve seen approximately a thousand neural network tutorials at this point. 
They basically all do the same thing:\nOption A: \u0026ldquo;It\u0026rsquo;s just like your brain! Neurons! Synapses! Magic!\u0026rdquo; Cool metaphor bro, but my brain doesn\u0026rsquo;t use gradient descent and I still can\u0026rsquo;t build anything with this information.\nOption B: Opens with this absolute unit of an equation:\n∂L/∂w = ∂L/∂a * ∂a/∂z * ∂z/∂w And I\u0026rsquo;m supposed to just\u0026hellip; know what that means? Where did \u0026lsquo;z\u0026rsquo; even come from?\nWhat I actually needed was someone to say \u0026ldquo;here\u0026rsquo;s what\u0026rsquo;s happening, here\u0026rsquo;s why it works, and oh by the way here\u0026rsquo;s the math if you want to go deeper.\u0026rdquo; That middle ground barely exists, which is weird because that\u0026rsquo;s where most people actually need help.\nWhat Neural Networks Actually Do (For Real) # Forget everything you\u0026rsquo;ve heard. At their core, neural networks are just function approximators. That\u0026rsquo;s literally it.\nYou give them inputs → they do some math → you get outputs. The interesting part is how they figure out which math to do.\nHere\u0026rsquo;s the problem they solve: let\u0026rsquo;s say you want to tell if a picture is a cat or a dog. Old school programming would be like \u0026ldquo;okay, IF pointy_ears AND whiskers THEN cat.\u0026rdquo; But\u0026hellip; how do you code \u0026ldquo;whiskers\u0026rdquo;? What if the cat is fat and its ears look round? What if it\u0026rsquo;s a picture from behind and you can\u0026rsquo;t see the ears at all?\nYou\u0026rsquo;d go insane trying to write all the rules.\nNeural networks are basically like \u0026ldquo;screw it, show me 10,000 cat pictures and 10,000 dog pictures, I\u0026rsquo;ll figure it out myself.\u0026rdquo; And somehow it works. Not because of magic—it\u0026rsquo;s just really really determined trial and error.\nThe Building Block: One Stupid Neuron # Okay, let\u0026rsquo;s start with the simplest possible thing: a single neuron. 
This is gonna seem almost too simple, but stay with me.\nImagine you\u0026rsquo;re deciding whether to go outside. You check two things: temperature and if it\u0026rsquo;s raining. A neuron is basically just a really simple decision-maker that looks at those inputs and tells you yes or no.\ndef simple_neuron(temperature, is_raining): # Each input gets a weight - how much do we care about this? temp_weight = 0.7 rain_weight = -0.9 # negative because rain sucks # Just multiply and add decision = (temperature * temp_weight) + (is_raining * rain_weight) # If positive, go outside return decision \u0026gt; 0 That\u0026rsquo;s it. That\u0026rsquo;s a neuron. I\u0026rsquo;m not even joking.\nIt\u0026rsquo;s literally just:\nTake some inputs Multiply each one by a weight Add them all up If the total is positive, say \u0026ldquo;yes\u0026rdquo;, otherwise say \u0026ldquo;no\u0026rdquo; The weights are what the neuron \u0026ldquo;learns.\u0026rdquo; Big positive weight = this input really matters and pushes toward yes. Negative weight = this pushes toward no. Close to zero = who cares about this input.\nWhen I first saw this I was like \u0026ldquo;wait, that\u0026rsquo;s it?\u0026rdquo; Yes. That\u0026rsquo;s it. One neuron is almost embarrassingly simple. The magic happens when you stack a bunch of them together, but we\u0026rsquo;ll get there.\nThe One Extra Thing: Activation Functions # Real neurons have one more piece: an activation function. Instead of just checking \u0026ldquo;is it positive?\u0026rdquo;, you pass the number through a function first.\nWhy? Because hard cutoffs (positive = yes, negative = no) are really hard to optimize. Smooth curves are way easier to work with when you\u0026rsquo;re trying to learn. 
It\u0026rsquo;s like the difference between a light switch and a dimmer—the dimmer gives you way more control.\nThe most popular activation function is called ReLU, and when I first saw it I literally laughed because it\u0026rsquo;s so simple:\ndef relu(x): return max(0, x) That\u0026rsquo;s it. If the number is negative, return 0. If it\u0026rsquo;s positive, return the number. This absurdly simple function powers like 90% of modern neural networks. Sometimes the simplest things just work.\n(There are other ones like sigmoid and tanh that you\u0026rsquo;ll see, but ReLU won the popularity contest because it\u0026rsquo;s fast and doesn\u0026rsquo;t have some annoying math problems the others have. You can Google \u0026ldquo;vanishing gradient\u0026rdquo; if you want to go down that rabbit hole, but honestly you don\u0026rsquo;t need to worry about it yet.)\nOkay But One Neuron Is Boring # One neuron can make one simple decision. Cool. Not very useful.\nThe magic happens when you stack a bunch of neurons in layers. The output of one layer feeds into the next layer as input. And this is where things get actually interesting.\nHere\u0026rsquo;s what blew my mind when I finally understood it: each layer automatically learns to detect more and more complex stuff.\nLike in image recognition:\nFirst layer learns edges. Just basic lines—horizontal, vertical, diagonal. Boring but necessary. Second layer combines those edges into shapes. Circles, corners, that kind of thing. Third layer combines shapes into recognizable parts. Eyes, ears, noses. Fourth layer is like \u0026ldquo;oh yeah that\u0026rsquo;s definitely a cat\u0026rsquo;s face.\u0026rdquo; And here\u0026rsquo;s the crazy part: you don\u0026rsquo;t program any of this. You don\u0026rsquo;t tell it \u0026ldquo;layer 1, you learn edges.\u0026rdquo; It just\u0026hellip; figures it out on its own. Each layer is just trying to minimize errors, and somehow these hierarchical features emerge naturally. 
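Mechanically, a layer is nothing special: it is a bunch of neurons all reading the same inputs, which makes it one weight matrix. Stacking layers is just feeding one matrix product into the next. Here is a throwaway NumPy sketch of that stacking; the sizes and random weights are purely illustrative, and nothing here is trained:

```python
import numpy as np

def relu(x):
    # Same ReLU idea: negatives become 0, positives pass through
    return np.maximum(0, x)

def forward(x, layers):
    # Each layer is a (weights, biases) pair; its output feeds the next layer
    for weights, biases in layers:
        x = relu(x @ weights + biases)
    return x

rng = np.random.default_rng(42)
# A tiny stack: 4 inputs -> 8 neurons -> 8 neurons -> 2 outputs
layers = [
    (rng.standard_normal((4, 8)), np.zeros(8)),
    (rng.standard_normal((8, 8)), np.zeros(8)),
    (rng.standard_normal((8, 2)), np.zeros(2)),
]
output = forward(rng.standard_normal(4), layers)
print(output.shape)  # (2,)
```

Untrained, the output is meaningless noise; training is what turns those random matrices into edge detectors, shape detectors, and so on.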
It\u0026rsquo;s almost creepy how well it works.\nHow Does It Learn Though? # Alright, so we\u0026rsquo;ve got this network full of neurons with weights. But how do we find the right weights? Like, there are thousands or millions of possible weight combinations. How do you find good ones?\nThis is where the learning actually happens. And honestly? It\u0026rsquo;s way less magical than people make it sound.\nHere\u0026rsquo;s the whole process:\nStep 1: Start with completely random weights. Just roll some dice. Your network will suck. That\u0026rsquo;s fine.\nStep 2: Show it an image. Ask it to predict. It\u0026rsquo;ll probably say something stupid like \u0026ldquo;that dog is definitely a boat.\u0026rdquo;\nStep 3: Calculate how wrong it was. This is called the \u0026ldquo;loss.\u0026rdquo; Big wrong = big loss.\nStep 4: Figure out which direction to adjust each weight. If you made this weight bigger, would the loss go up or down? This is called the gradient, and it\u0026rsquo;s just calculus (derivatives, specifically).\nStep 5: Nudge all the weights a tiny bit in the direction that reduces loss.\nStep 6: Repeat this like 50,000 times.\nEventually, after seeing tons of examples and making tons of tiny adjustments, the weights settle into something that actually works.\nIt\u0026rsquo;s basically like: make a guess, see how bad it is, adjust a little bit, repeat until it stops being terrible. That\u0026rsquo;s machine learning. There\u0026rsquo;s no secret sauce, it\u0026rsquo;s just very persistent trial and error.\nThe \u0026ldquo;gradient\u0026rdquo; part is just asking \u0026ldquo;which way is downhill?\u0026rdquo; because we\u0026rsquo;re trying to walk downhill on this imaginary landscape where height = how wrong you are. 
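That downhill walk is concrete enough to sketch. Here is a toy version with a made-up one-variable loss whose lowest point is at w = 3; the real thing does the exact same nudge for millions of weights at once:

```python
def loss(w):
    # Toy loss landscape: a parabola with its bottom at w = 3
    return (w - 3) ** 2

def gradient(w):
    # Derivative of the loss: positive means uphill to the right
    return 2 * (w - 3)

w = 0.0              # Step 1: start somewhere arbitrary
learning_rate = 0.1
for step in range(100):                   # Step 6: repeat a lot
    w -= learning_rate * gradient(w)      # Steps 4-5: nudge w downhill

print(round(w, 4))  # ends up very close to 3.0
```

Subtracting the gradient is the whole trick: the gradient points uphill, so stepping against it walks you toward the bottom of the valley.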
Get to the bottom of the valley = minimize your errors = good network.\nBackpropagation (Or: How to Distribute Blame) # Okay so everyone treats backpropagation like it\u0026rsquo;s this super complicated algorithm that only geniuses understand. I\u0026rsquo;m gonna let you in on a secret: it\u0026rsquo;s just the chain rule from calculus. That\u0026rsquo;s it. That\u0026rsquo;s the big mystery.\nHere\u0026rsquo;s the idea: your network made a mistake at the end. The output was wrong. Now you need to figure out which weights screwed up so you can adjust them.\nThe last layer\u0026rsquo;s weights obviously contributed—they directly made that wrong output. But the layer before that also contributed, because it fed into the last layer. And the layer before that contributed to the layer before that. It\u0026rsquo;s turtles all the way down.\nBackpropagation is literally just walking backwards through the network going \u0026ldquo;okay, how much was THIS weight responsible for the mistake?\u0026rdquo; It\u0026rsquo;s distributing blame. That\u0026rsquo;s why it\u0026rsquo;s called backprop—you\u0026rsquo;re propagating the error backward.\nThe math for this is the chain rule. Remember that from calculus? Probably not, but that\u0026rsquo;s fine. The point is, it\u0026rsquo;s not some mysterious black magic. It\u0026rsquo;s just calculus doing accounting to figure out who messed up and by how much.\nEvery time I see someone write a 10-page explanation of backprop I want to scream because it can literally be summed up as \u0026ldquo;use the chain rule to distribute blame backwards.\u0026rdquo; Done.\nBut Like\u0026hellip; Why Does This Work? # Good question. 
Honestly, the full answer is \u0026ldquo;we don\u0026rsquo;t completely know,\u0026rdquo; but there are some things we DO know.\nFirst: there\u0026rsquo;s this thing called the Universal Approximation Theorem that basically says \u0026ldquo;with enough neurons, you can approximate any continuous function as closely as you want.\u0026rdquo; (Fine print: the inputs have to live in a bounded range, and the theorem says nothing about how many neurons you\u0026rsquo;ll need.) When I first heard this I was like \u0026ldquo;wait, ANY function?\u0026rdquo; Yeah. Any continuous one. That\u0026rsquo;s wild.\nBut (and this is a big but), that doesn\u0026rsquo;t mean a neural network will automatically find the right function. It just means the right function EXISTS somewhere in the space of all possible weight configurations. Actually finding it? That\u0026rsquo;s the hard part.\nThis is why deep learning needs so much data and computing power. We\u0026rsquo;re basically searching through this unbelievably massive space of possible weights, trying to stumble onto a combination that works. More data = more clues about where to search. More compute = we can search faster and try more combinations.\nIt\u0026rsquo;s kind of like saying \u0026ldquo;somewhere in this library of infinite books is the exact book you need.\u0026rdquo; Cool, but you still gotta find it. That\u0026rsquo;s what training is—the search.\nThings People Get Wrong # \u0026ldquo;Neural networks are complete black boxes\u0026rdquo; Eh, sort of? We\u0026rsquo;re getting better at peeking inside. You can visualize what different layers are detecting, you can use attention maps to see what the network is looking at. We might not understand every single weight (there are millions of them), but we can understand the general patterns it learned. It\u0026rsquo;s not completely opaque.\n\u0026ldquo;More layers = better\u0026rdquo; Nope. I fell for this one early on. I kept adding layers thinking my network would get smarter. Sometimes it just got worse. 
More layers CAN help with really complex problems, but they\u0026rsquo;re also way harder to train and can overfit like crazy. Sometimes a simple shallow network beats a deep one. It depends.\n\u0026ldquo;Neural networks learn like brains\u0026rdquo; Okay this one annoys me. Yeah, the name comes from neurons in brains. But that\u0026rsquo;s where the similarity ends. Your brain doesn\u0026rsquo;t learn by having someone show you a million labeled examples while you do gradient descent. The learning process is completely different. It\u0026rsquo;s like saying a plane flies like a bird—inspired by birds, sure, but actually works nothing like them.\nBuilding Your First Network # If you want to try this yourself, here\u0026rsquo;s a dead-simple neural network in PyTorch:\nimport torch import torch.nn as nn class SimpleNet(nn.Module): def __init__(self): super().__init__() # Two hidden layers with 128 and 64 neurons self.layer1 = nn.Linear(784, 128) # Input: 28x28 image flattened self.layer2 = nn.Linear(128, 64) self.layer3 = nn.Linear(64, 10) # Output: 10 classes (digits 0-9) def forward(self, x): x = torch.relu(self.layer1(x)) # Hidden layer 1 with ReLU x = torch.relu(self.layer2(x)) # Hidden layer 2 with ReLU x = self.layer3(x) # Output layer (no activation) return x This network can classify handwritten digits (MNIST). Not state-of-the-art, but it\u0026rsquo;ll get 95%+ accuracy, which is honestly pretty good for like 20 lines of code.\nThe Point of All This # Neural networks aren\u0026rsquo;t magic. They\u0026rsquo;re honestly kind of boring when you break them down:\nMultiply some numbers Add them up Pass through an activation function Do this a bunch of times in layers Adjust the weights based on how wrong you were Repeat until it stops sucking That\u0026rsquo;s it. The \u0026ldquo;intelligence\u0026rdquo; comes from doing this stupid-simple process millions of times until something useful emerges. 
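To see the whole guess-measure-nudge loop in one place, here is a minimal PyTorch training sketch with the same 784-128-64-10 shape as SimpleNet. Random tensors stand in for real MNIST batches so it runs anywhere; swap in a torchvision DataLoader for actual digit training:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the run reproducible

# Same shape as the SimpleNet class, just written with nn.Sequential
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Fake batch standing in for a real MNIST DataLoader batch
images = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

first_loss = criterion(model(images), labels).item()  # how wrong at the start

for step in range(200):
    optimizer.zero_grad()                    # clear old gradients
    loss = criterion(model(images), labels)  # guess, then measure wrongness
    loss.backward()                          # backprop: distribute blame
    optimizer.step()                         # nudge every weight downhill

final_loss = loss.item()
print(first_loss, final_loss)  # the loss should drop a lot
```

Every serious training loop is this same skeleton with extra plumbing: real data, a validation check, maybe a fancier optimizer.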
It\u0026rsquo;s like how ants are individually dumb but collectively build complex colonies. Except here it\u0026rsquo;s math instead of ants.\nThe reason they seem magical is scale. When you have millions of these simple operations happening together, you get emergent behavior that can recognize faces, translate languages, or beat humans at chess. But zoom in and it\u0026rsquo;s all just multiplication and addition.\nWhere To Go From Here # If you actually understood all this, you\u0026rsquo;re in good shape. Seriously. It took me way longer to get here.\nFrom here you can learn about:\nConvolutional layers (for images) Recurrent layers (for sequences) Attention mechanisms (what powers ChatGPT and friends) Regularization (how to stop your network from cheating) Better optimizers (faster ways to train) But all of these are just fancy variations on what we talked about. The core idea is the same: adjust weights using gradient descent until the thing works.\nThe math will start to matter more as you go deeper. But having the intuition first makes the math actually make sense instead of just being symbols on a page.\nActually Go Build Something # I\u0026rsquo;m serious. Reading this is cool and all, but you won\u0026rsquo;t really GET it until you build a model, watch it fail in weird ways, debug it for three hours, and finally get it working.\nThat\u0026rsquo;s where real understanding comes from. Not from reading tutorials (even good ones), but from the painful process of making it work yourself.\nSo yeah. Go break some stuff. It\u0026rsquo;s fun, I promise.\nRelated Reading # Build Your First ML Model: A No-BS Guide 5 ML Mistakes I Made So You Don\u0026rsquo;t Have To Welcome to Neural Odyssey Questions? Think I\u0026rsquo;m wrong about something? 
Good, let\u0026rsquo;s argue about it—hit me up.\n","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/posts/neural-networks-intuition/","section":"Writing on Machine Learning, Systems, and Engineering","summary":"\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eAbstract\u003c/strong\u003e\u003cbr\u003e\nThis piece explains neural networks from the intuition upward: what a neuron is, why layers help, how gradient descent changes weights, and why backpropagation is less mystical than it sounds. It is written for readers who want the concepts to feel concrete before diving deeper into the math.\u003c/p\u003e","title":"Neural Networks: Building Intuition Beyond the Math","type":"posts"},{"content":"","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/tags/neural-networks/","section":"Tags","summary":"","title":"Neural-Networks","type":"tags"},{"content":"","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"},{"content":"","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/tags/tutorial/","section":"Tags","summary":"","title":"Tutorial","type":"tags"},{"content":"","date":"20 November 2025","externalUrl":null,"permalink":"/myblog/categories/tutorials/","section":"Categories","summary":"","title":"Tutorials","type":"categories"},{"content":" Abstract\nThis post is a checklist of failure modes that quietly ruin ML projects: bad data inspection, leakage, weak evaluation, class imbalance, and irreproducible experiments. The point is not to be dramatic about mistakes, but to make the debugging habits explicit before they cost days of work.\nMachine Learning tutorials are all lies.\nOkay not really, but they make everything look so smooth. Load data, build model, train, boom—95% accuracy! Ship it!\nNobody shows you the part where you get 99% accuracy in training and then deploy it and it\u0026rsquo;s complete garbage. 
Or when you spend three entire days trying to figure out why your experiments aren\u0026rsquo;t reproducible and you\u0026rsquo;re slowly losing your mind. Or that moment when you realize you\u0026rsquo;ve been cheating the whole time and all your results are meaningless.\nI\u0026rsquo;ve done all of these. Multiple times. Some of these mistakes cost me hours. Some cost me days. One of them cost me a presentation where I had to stand in front of people and explain why my \u0026ldquo;working\u0026rdquo; model was actually completely broken.\nSo here are my greatest hits of ML failure. Learn from my pain.\nMistake #1: Not Checking Your Data First # The mistake: Jumping straight into building your model without actually looking at the data first.\nHow I learned this: So there I was, two days into debugging this image classifier that was performing like absolute trash. I tried everything. Different architectures. Different learning rates. Different optimizers. Batch normalization. Dropout. More layers. Fewer layers. I was grasping at straws.\nFinally, at like 11pm on day two, completely out of ideas, I did something I should\u0026rsquo;ve done on day zero: I actually looked at the images.\nAbout 30% of them were just\u0026hellip; black. Completely black. Empty.\nMy data loading script had been failing silently on certain files and just returning arrays of zeros. And I\u0026rsquo;d been trying to train a model on this garbage for two days. The model was actually doing pretty well considering it was trying to learn patterns from random noise and black squares.\nWhat to actually do:\nLook. At. Your. Data. First.\nBefore you even think about models:\n# Load your data import pandas as pd df = pd.read_csv(\u0026#39;data.csv\u0026#39;) # Actually look at it print(df.head(20)) # Not just 5 rows—check more print(df.describe()) # Statistics for each column print(df.isnull().sum()) # Missing values? 
# Check distributions import matplotlib.pyplot as plt df.hist(figsize=(12, 10)) plt.show() # For images: plot a batch for i in range(16): plt.subplot(4, 4, i+1) plt.imshow(images[i]) plt.show() I know, I know, you want to jump straight to the fun part. But trust me, spending 10 minutes looking at your data will save you literal days of debugging later.\nStuff to watch out for:\nMissing values where you weren\u0026rsquo;t expecting them Completely insane outliers (people aged 250 years, prices of -$500) Severe class imbalance (10,000 examples of \u0026ldquo;normal\u0026rdquo;, 5 examples of \u0026ldquo;fraud\u0026rdquo;) Corrupted or unparseable files that got loaded as garbage Data that just\u0026hellip; doesn\u0026rsquo;t look right If something looks weird, it probably is. Fix it now, not after three days of failed training runs.\nMistake #2: Data Leakage (The Silent Killer) # The mistake: Including information in your training data that you won\u0026rsquo;t have when you actually need to make predictions.\nMy story: I built a customer churn prediction model. Got 94% accuracy in cross-validation. I was SO proud. Showed it to my team. They were impressed. Deployed it. Disaster.\nIn production it performed at basically random chance. Like flipping a coin. What the hell happened?\nTook me way too long to figure it out, but eventually I found the problem: I\u0026rsquo;d included a feature called \u0026ldquo;days_since_last_login.\u0026rdquo;\nSeems reasonable, right? Except\u0026hellip; for customers who HAD churned, this number was always huge. Because they\u0026rsquo;d stopped using the service. The model learned \u0026ldquo;oh, if days_since_last_login is high, they definitely churned.\u0026rdquo; Which is\u0026hellip; technically correct, but also completely useless.\nWhen you\u0026rsquo;re trying to predict if someone WILL churn, you can\u0026rsquo;t use information that only exists AFTER they churn. That\u0026rsquo;s not a prediction. 
That\u0026rsquo;s just reading the answer key.\nHow to not do this:\nFor every single feature, ask: \u0026ldquo;Will I actually have this information at prediction time?\u0026rdquo;\nIf the answer is no, or even \u0026ldquo;maybe not,\u0026rdquo; get rid of it.\nWays people accidentally cheat:\nUsing the target (or something that\u0026rsquo;s basically the target) as a feature Using information from the future Using stuff that only exists after the thing you\u0026rsquo;re trying to predict This is insidious because your model will look amazing in training and then completely fall apart in production. Your metrics will all look great right up until they don\u0026rsquo;t.\nfrom sklearn.preprocessing import StandardScaler # WRONG: Using information from test set scaler = StandardScaler() X_scaled = scaler.fit_transform(X) # This looks at ALL the data! # Then you split into train/test... but you already cheated # CORRECT: Only learn from training data X_train_scaled = scaler.fit_transform(X_train) X_test_scaled = scaler.transform(X_test) # Just transform, don\u0026#39;t refit The test set is supposed to simulate future data you haven\u0026rsquo;t seen. Any operation that looks at the test set is cheating.\nMistake #3: Not Having a Proper Train/Val/Test Split # The mistake: Not having proper data splits, or (worse) checking your test set over and over.\nWhat I used to do: Train model. Check test set. \u0026ldquo;Hmm, only 82%, not good enough.\u0026rdquo; Tweak something. Check test set again. \u0026ldquo;83%, better but not quite\u0026hellip;\u0026rdquo; Tweak more. Check again. Again. Again.\nSeemed fine at the time.\nBut here\u0026rsquo;s the problem: every time you look at the test set and make a decision based on it, you\u0026rsquo;re kind of\u0026hellip; training on it. Not directly, but you\u0026rsquo;re using that information to guide what you do next.\nAfter checking the test set 30 times and making changes based on what you see, your model isn\u0026rsquo;t evaluated on truly unseen data anymore. 
It\u0026rsquo;s optimized for that specific test set. And then you deploy it and find out your \u0026ldquo;95% accuracy\u0026rdquo; was a lie.\nWhat to do instead:\nYou need three sets:\nfrom sklearn.model_selection import train_test_split # First split: separate out test set X_temp, X_test, y_temp, y_test = train_test_split( X, y, test_size=0.15, random_state=42 ) # Second split: create train and validation sets X_train, X_val, y_train, y_val = train_test_split( X_temp, y_temp, test_size=0.176, random_state=42 # ~15% of total ) # Now you have: 70% train, 15% validation, 15% test The actual rules:\nTraining set: Train your model on this Validation set: Check this as much as you want while developing Test set: Look at this ONCE at the very end. Then never again. Think of it like school. Training set = your textbook. Validation set = practice problems. Test set = the actual final exam.\nYou don\u0026rsquo;t study from the final exam. That\u0026rsquo;s cheating. Same idea here. The test set is supposed to tell you how well you\u0026rsquo;ll do on real data you\u0026rsquo;ve never seen. If you\u0026rsquo;ve already seen it 50 times, it\u0026rsquo;s not really \u0026ldquo;unseen\u0026rdquo; anymore.\nMistake #4: Ignoring Class Imbalance # The mistake: Training on imbalanced classes and celebrating your 99% accuracy.\nStory time: Built a fraud detection model. Got 99.5% accuracy! I was pumped. This was going to save the company so much money. Deployed it on Friday afternoon feeling like a rockstar.\nMonday morning: \u0026ldquo;Hey, uh, your model isn\u0026rsquo;t detecting any fraud. Like, at all. It just says everything is fine.\u0026rdquo;\nShit.\nWent back and checked. The model had learned a very clever strategy: just predict \u0026ldquo;not fraud\u0026rdquo; for everything. 
Since only 0.3% of transactions were actually fraudulent, this lazy approach was correct 99.7% of the time.\nThe model basically went \u0026ldquo;detecting fraud is hard and I\u0026rsquo;m rarely right when I try, but if I just say \u0026rsquo;looks good!\u0026rsquo; every time, I\u0026rsquo;ll be correct 99% of the time and everyone will think I\u0026rsquo;m doing great.\u0026rdquo;\nIt technically had high accuracy. It was also completely useless.\nWhat to actually do:\nWhen your classes are imbalanced (which they almost always are in real problems), accuracy is a trap. Use literally anything else:\nfrom sklearn.metrics import classification_report, confusion_matrix # Get predictions y_pred = model.predict(X_test) # Much better than just accuracy print(classification_report(y_test, y_pred)) # See exactly where you\u0026#39;re failing print(confusion_matrix(y_test, y_pred)) Strategies for imbalanced data:\nResampling: Oversample the minority class or undersample the majority Class weights: Tell your model to care more about minority class errors Different metrics: Use F1-score, precision, recall, or AUC-ROC Ensemble methods: Random forests and XGBoost handle imbalance better # Example: Using class weights in PyTorch from torch import nn import torch # If class 0 has 1000 examples and class 1 has 100 pos_weight = torch.tensor([1000.0 / 100.0]) criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight) Mistake #5: Not Setting Random Seeds # The mistake: Not setting random seeds and then going crazy trying to figure out why your results keep changing.\nMy descent into madness: Got amazing results in my notebook one day. 94% accuracy! Wrote it all up in my report with nice tables and everything.\nNext day, tried to verify the results before sharing. Ran the same notebook. 89% accuracy. Wait, what?\nRan it again. 91%.\nAgain. 87%.\nSpent the next hour convinced I\u0026rsquo;d somehow introduced a bug. Checked git diffs. Nothing changed. Restarted kernel. Cleared outputs. 
Prayed to various deities.\nFinally realized: I just\u0026hellip; hadn\u0026rsquo;t set any random seeds. The data split was random. The weight initialization was random. Every single run was completely different.\nI\u0026rsquo;d just gotten lucky on that first run and written a whole report about it like those were the real numbers. Whoops.\nFix it:\nSet your damn random seeds. Not just for other people. For your own sanity.\nimport random import numpy as np import torch def set_seed(seed=42): random.seed(seed) np.random.seed(seed) torch.manual_seed(seed) torch.cuda.manual_seed_all(seed) # For full reproducibility (slower but deterministic) torch.backends.cudnn.deterministic = True torch.backends.cudnn.benchmark = False # Call this at the start of your script set_seed(42) Also set seeds in specific operations:\n# In sklearn train_test_split(X, y, random_state=42) # In PyTorch DataLoader DataLoader(dataset, shuffle=True, generator=torch.Generator().manual_seed(42)) Yeah, it\u0026rsquo;s a few extra lines. But the alternative is spending hours debugging problems that aren\u0026rsquo;t actually problems, they\u0026rsquo;re just randomness. Not worth it.\nBonus Round: Overfitting (And Not Even Noticing) # This one is so common I have to mention it.\nTraining accuracy: 99% Validation accuracy: 85%\nYou: \u0026ldquo;Pretty good!\u0026rdquo;\nMe: \u0026ldquo;That\u0026rsquo;s overfitting, my friend.\u0026rdquo;\nYour model didn\u0026rsquo;t learn patterns. It memorized the training data. It\u0026rsquo;s like a student who memorized all the practice problems but doesn\u0026rsquo;t actually understand the concepts. 
Works great on homework, bombs the test.\nSigns of overfitting:\nTraining accuracy much higher than validation accuracy Training loss keeps decreasing but validation loss starts increasing Model performs great on training data, terrible on anything else Quick fixes:\nAdd dropout layers Use L2 regularization (weight decay) Collect more data Use data augmentation Early stopping (stop training when validation loss stops improving) # Example: Adding dropout in PyTorch class BetterNet(nn.Module): def __init__(self): super().__init__() self.layer1 = nn.Linear(784, 128) self.dropout1 = nn.Dropout(0.3) # Drop 30% of neurons self.layer2 = nn.Linear(128, 64) self.dropout2 = nn.Dropout(0.3) self.layer3 = nn.Linear(64, 10) def forward(self, x): x = torch.relu(self.layer1(x)) x = self.dropout1(x) x = torch.relu(self.layer2(x)) x = self.dropout2(x) x = self.layer3(x) return x The Pattern # Here\u0026rsquo;s what all these mistakes have in common: they\u0026rsquo;re silent.\nYour code runs fine. Your model trains. You get numbers. The numbers look good. Everything seems great.\nAnd then you deploy it and it\u0026rsquo;s a disaster. Or you try to reproduce it and can\u0026rsquo;t. Or you present to stakeholders and someone asks one question that makes your entire analysis fall apart.\nThe scary part is that you can make all of these mistakes and not know until it\u0026rsquo;s too late. That\u0026rsquo;s what makes them so dangerous.\nThe solution isn\u0026rsquo;t \u0026ldquo;be more careful\u0026rdquo; or \u0026ldquo;be smarter.\u0026rdquo; It\u0026rsquo;s to build these checks into your workflow automatically:\nLook at your data first, always Set random seeds by default Use train/val/test splits correctly Check multiple metrics Question your features These aren\u0026rsquo;t best practices. 
They\u0026rsquo;re the bare minimum for not shooting yourself in the foot.\nWhat I Do Now (That Actually Works) # My current workflow has all these checks baked in:\nLoad data Actually look at the data (histograms, sample images, whatever) Set random seed at the top of the file Split into train/val/test properly Check class distribution Pick metrics that make sense (not just accuracy) Train a simple baseline Check for data leakage Monitor train vs validation metrics Only at the very end: check test set ONCE It feels like more work upfront. It\u0026rsquo;s not. It\u0026rsquo;s way less work than debugging for three days because you skipped step 2 and didn\u0026rsquo;t notice your data was garbage.\nPlus, you know, your models actually work. Which is kind of the point.\nRelated Reading # Build Your First ML Model: A No-BS Guide Neural Networks: Building Intuition Beyond the Math Welcome to Neural Odyssey Got your own horror stories? I\u0026rsquo;d love to hear them. Bonus points if they\u0026rsquo;re worse than mine.\n","date":"19 November 2025","externalUrl":null,"permalink":"/myblog/posts/ml-mistakes-to-avoid/","section":"Writing on Machine Learning, Systems, and Engineering","summary":"\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eAbstract\u003c/strong\u003e\u003cbr\u003e\nThis post is a checklist of failure modes that quietly ruin ML projects: bad data inspection, leakage, weak evaluation, class imbalance, and irreproducible experiments. 
The point is not to be dramatic about mistakes, but to make the debugging habits explicit before they cost days of work.\u003c/p\u003e","title":"5 ML Mistakes I Made So You Don't Have To","type":"posts"},{"content":"","date":"19 November 2025","externalUrl":null,"permalink":"/myblog/tags/best-practices/","section":"Tags","summary":"","title":"Best-Practices","type":"tags"},{"content":"","date":"19 November 2025","externalUrl":null,"permalink":"/myblog/tags/debugging/","section":"Tags","summary":"","title":"Debugging","type":"tags"},{"content":"","date":"19 November 2025","externalUrl":null,"permalink":"/myblog/tags/lessons-learned/","section":"Tags","summary":"","title":"Lessons-Learned","type":"tags"},{"content":"","date":"18 November 2025","externalUrl":null,"permalink":"/myblog/tags/beginner-friendly/","section":"Tags","summary":"","title":"Beginner-Friendly","type":"tags"},{"content":" Abstract\nThis post walks through a first end-to-end ML workflow using a real image classification task. The goal is not just to train a model, but to build the habits that matter in practice: checking data, splitting correctly, choosing a simple baseline, and evaluating results without fooling yourself.\nMost ML tutorials start with MNIST (handwritten digits) or some perfectly cleaned dataset. That\u0026rsquo;s fine for learning, but it doesn\u0026rsquo;t teach you what building a real model actually feels like. Real projects have messy data, unclear requirements, and a dozen decisions you have to make without clear \u0026ldquo;right\u0026rdquo; answers.\nSo let\u0026rsquo;s build something real: a model that can classify images of different types of vehicles. 
We\u0026rsquo;ll use a subset of real data, deal with actual problems that come up, and make real decisions along the way.\nBy the end, you\u0026rsquo;ll have a working model and—more importantly—you\u0026rsquo;ll understand the process of building one.\nWhat You\u0026rsquo;ll Need # Before we start, make sure you have:\npip install torch torchvision numpy matplotlib pillow scikit-learn And a basic understanding of Python. You don\u0026rsquo;t need to be an expert, but you should be comfortable with functions, loops, and basic concepts.\nStep 1: Get Some Data # We\u0026rsquo;ll use the CIFAR-10 dataset, which has 60,000 small images across 10 categories: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.\nWhy CIFAR-10? Because it\u0026rsquo;s:\nReal images (not synthetic) Small enough to train quickly (even without a GPU) Complex enough to be interesting Already cleaned (we\u0026rsquo;ll mess it up ourselves for realism) import torch import torchvision import torchvision.transforms as transforms import matplotlib.pyplot as plt import numpy as np # Download the data transform = transforms.Compose([ transforms.ToTensor(), # Convert images to tensors transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) # Normalize ]) trainset = torchvision.datasets.CIFAR10( root=\u0026#39;./data\u0026#39;, train=True, download=True, transform=transform ) testset = torchvision.datasets.CIFAR10( root=\u0026#39;./data\u0026#39;, train=False, download=True, transform=transform ) First decision: Why normalize the images? Because neural networks train better when inputs are roughly centered around zero. The first (0.5, 0.5, 0.5) is the mean subtracted from each color channel; the second is the standard deviation each channel is divided by. Here 0.5 is just a convenient constant, not CIFAR-10\u0026rsquo;s actual per-channel statistics.
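You can sanity-check what Normalize does by applying the same arithmetic to the endpoints by hand, independent of the dataset:

```python
# Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) computes (x - mean) / std
# per channel. With mean = std = 0.5 the [0, 1] range maps onto [-1, 1].
mean, std = 0.5, 0.5

def normalize(x: float) -> float:
    return (x - mean) / std

print(normalize(0.0))  # darkest pixel   -> -1.0
print(normalize(0.5))  # mid-gray pixel  ->  0.0
print(normalize(1.0))  # brightest pixel ->  1.0
```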
This transformation maps pixel values from [0, 1] to approximately [-1, 1].\nStep 2: Actually Look at Your Data # This seems obvious, but I can\u0026rsquo;t count how many times I\u0026rsquo;ve seen people skip this step.\nclasses = (\u0026#39;plane\u0026#39;, \u0026#39;car\u0026#39;, \u0026#39;bird\u0026#39;, \u0026#39;cat\u0026#39;, \u0026#39;deer\u0026#39;, \u0026#39;dog\u0026#39;, \u0026#39;frog\u0026#39;, \u0026#39;horse\u0026#39;, \u0026#39;ship\u0026#39;, \u0026#39;truck\u0026#39;) # Function to show an image def imshow(img): img = img / 2 + 0.5 # Unnormalize npimg = img.numpy() plt.imshow(np.transpose(npimg, (1, 2, 0))) plt.show() # Get some random training images dataloader = torch.utils.data.DataLoader( trainset, batch_size=16, shuffle=True ) dataiter = iter(dataloader) images, labels = next(dataiter) # Show images imshow(torchvision.utils.make_grid(images)) print(\u0026#39;Labels:\u0026#39;, \u0026#39; \u0026#39;.join(f\u0026#39;{classes[labels[j]]}\u0026#39; for j in range(16))) Look at those images. Are they what you expected? Are any corrupted? This is your sanity check.\nStep 3: Split Your Data Properly # CIFAR-10 comes pre-split into train and test sets, but we need a validation set too.\nfrom torch.utils.data import random_split # Split training data into train and validation train_size = int(0.85 * len(trainset)) val_size = len(trainset) - train_size train_dataset, val_dataset = random_split( trainset, [train_size, val_size], generator=torch.Generator().manual_seed(42) # Reproducibility! 
) # Create data loaders trainloader = torch.utils.data.DataLoader( train_dataset, batch_size=64, shuffle=True, num_workers=2 ) valloader = torch.utils.data.DataLoader( val_dataset, batch_size=64, shuffle=False, num_workers=2 ) testloader = torch.utils.data.DataLoader( testset, batch_size=64, shuffle=False, num_workers=2 ) print(f\u0026#39;Training samples: {len(train_dataset)}\u0026#39;) print(f\u0026#39;Validation samples: {len(val_dataset)}\u0026#39;) print(f\u0026#39;Test samples: {len(testset)}\u0026#39;) Second decision: Why batch_size=64? It\u0026rsquo;s a balance. Larger batches train faster but use more memory. Smaller batches can sometimes generalize better. 64 is a reasonable default.\nStep 4: Define Your Model # Let\u0026rsquo;s build a simple CNN (Convolutional Neural Network). CNNs are the standard for image tasks because they\u0026rsquo;re designed to recognize spatial patterns.\nimport torch.nn as nn import torch.nn.functional as F class SimpleCNN(nn.Module): def __init__(self): super(SimpleCNN, self).__init__() # Convolutional layers self.conv1 = nn.Conv2d(3, 32, 3, padding=1) # 3 input channels (RGB), 32 output self.conv2 = nn.Conv2d(32, 64, 3, padding=1) self.conv3 = nn.Conv2d(64, 128, 3, padding=1) # Pooling layer self.pool = nn.MaxPool2d(2, 2) # Fully connected layers self.fc1 = nn.Linear(128 * 4 * 4, 512) self.fc2 = nn.Linear(512, 10) # 10 classes # Dropout for regularization self.dropout = nn.Dropout(0.3) def forward(self, x): # Conv block 1 x = self.pool(F.relu(self.conv1(x))) # 32x32 -\u0026gt; 16x16 # Conv block 2 x = self.pool(F.relu(self.conv2(x))) # 16x16 -\u0026gt; 8x8 # Conv block 3 x = self.pool(F.relu(self.conv3(x))) # 8x8 -\u0026gt; 4x4 # Flatten x = x.view(-1, 128 * 4 * 4) # Fully connected layers x = self.dropout(F.relu(self.fc1(x))) x = self.fc2(x) return x model = SimpleCNN() print(model) This architecture is simple but effective:\nConv layers extract features (edges, shapes, patterns) Pooling reduces spatial dimensions Dropout 
prevents overfitting Fully connected layers make the final classification Third decision: Why 3 conv layers? Honestly? It\u0026rsquo;s a starting point. You might need more for complex tasks or fewer for simple ones. This is where experimentation comes in.\nStep 5: Training Setup # Now we need to define how the model will learn.\nimport torch.optim as optim # Move model to GPU if available device = torch.device(\u0026#34;cuda:0\u0026#34; if torch.cuda.is_available() else \u0026#34;cpu\u0026#34;) model = model.to(device) print(f\u0026#39;Training on: {device}\u0026#39;) # Loss function and optimizer criterion = nn.CrossEntropyLoss() # Standard for classification optimizer = optim.Adam(model.parameters(), lr=0.001) # Adam is a good default # Learning rate scheduler (optional but helpful) scheduler = optim.lr_scheduler.ReduceLROnPlateau( optimizer, mode=\u0026#39;min\u0026#39;, factor=0.5, patience=3 ) Fourth decision: Why Adam optimizer? It\u0026rsquo;s generally more forgiving than SGD and requires less hyperparameter tuning. 
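The scheduler deserves a word of its own: ReduceLROnPlateau watches the metric you pass to step() and multiplies the learning rate by factor once it has failed to improve for more than patience epochs. A toy sketch with a flat, never-improving loss (the dummy parameter exists only so the optimizer has something to manage):

```python
import torch
import torch.optim as optim

# A throwaway parameter so the optimizer has state to hold.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.Adam([param], lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=3
)

# Feed a validation loss that never improves; once it has been flat
# for more than `patience` epochs, the lr is multiplied by `factor`.
for epoch in range(6):
    scheduler.step(1.0)  # pretend val loss stayed at 1.0
    print(epoch, optimizer.param_groups[0]['lr'])
```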
The learning rate of 0.001 is a standard starting point.\nStep 6: The Training Loop # Here\u0026rsquo;s where the actual learning happens:\ndef train_model(model, trainloader, valloader, epochs=20): best_val_loss = float(\u0026#39;inf\u0026#39;) train_losses = [] val_losses = [] for epoch in range(epochs): # Training phase model.train() running_loss = 0.0 epoch_loss = 0.0 for i, (inputs, labels) in enumerate(trainloader): inputs, labels = inputs.to(device), labels.to(device) # Zero the gradients optimizer.zero_grad() # Forward pass outputs = model(inputs) loss = criterion(outputs, labels) # Backward pass loss.backward() optimizer.step() running_loss += loss.item() epoch_loss += loss.item() # Print progress every 100 batches if i % 100 == 99: print(f\u0026#39;[Epoch {epoch+1}, Batch {i+1}] loss: {running_loss/100:.3f}\u0026#39;) running_loss = 0.0 # Validation phase model.eval() val_loss = 0.0 correct = 0 total = 0 with torch.no_grad(): for inputs, labels in valloader: inputs, labels = inputs.to(device), labels.to(device) outputs = model(inputs) loss = criterion(outputs, labels) val_loss += loss.item() _, predicted = torch.max(outputs.data, 1) total += labels.size(0) correct += (predicted == labels).sum().item() val_loss = val_loss / len(valloader) val_acc = 100 * correct / total print(f\u0026#39;Epoch {epoch+1}: Val Loss: {val_loss:.3f}, Val Acc: {val_acc:.2f}%\u0026#39;) # Save best model if val_loss \u0026lt; best_val_loss: best_val_loss = val_loss torch.save(model.state_dict(), \u0026#39;best_model.pth\u0026#39;) print(\u0026#39;Saved new best model!\u0026#39;) # Adjust learning rate scheduler.step(val_loss) train_losses.append(epoch_loss / len(trainloader)) # average train loss this epoch val_losses.append(val_loss) return train_losses, val_losses # Train the model train_losses, val_losses = train_model(model, trainloader, valloader, epochs=20) This is a lot of code, but here\u0026rsquo;s what it does:\nLoop through epochs For each epoch, loop through training batches Make predictions, calculate loss, update weights After each epoch, check validation 
performance Save the model if it\u0026rsquo;s the best so far Step 7: Evaluate on Test Set # Only after training is completely done do we check the test set:\n# Load the best model model.load_state_dict(torch.load(\u0026#39;best_model.pth\u0026#39;)) model.eval() correct = 0 total = 0 class_correct = [0] * 10 class_total = [0] * 10 with torch.no_grad(): for inputs, labels in testloader: inputs, labels = inputs.to(device), labels.to(device) outputs = model(inputs) _, predicted = torch.max(outputs, 1) total += labels.size(0) correct += (predicted == labels).sum().item() # Per-class accuracy for i in range(len(labels)): label = labels[i] class_correct[label] += (predicted[i] == label).item() class_total[label] += 1 print(f\u0026#39;Overall Test Accuracy: {100 * correct / total:.2f}%\u0026#39;) print(\u0026#39;\\nPer-class accuracy:\u0026#39;) for i in range(10): acc = 100 * class_correct[i] / class_total[i] print(f\u0026#39;{classes[i]}: {acc:.2f}%\u0026#39;) If you\u0026rsquo;ve followed along, you should get around 70-75% accuracy. That\u0026rsquo;s pretty good for a simple model!\nStep 8: See Where It Fails # Let\u0026rsquo;s look at some mistakes:\ndef show_predictions(model, dataloader, num_images=8): model.eval() images, labels = next(iter(dataloader)) images, labels = images.to(device), labels.to(device) outputs = model(images) _, predicted = torch.max(outputs, 1) # Find some mistakes mistakes = (predicted != labels).nonzero(as_tuple=True)[0] if len(mistakes) \u0026gt; 0: plt.figure(figsize=(15, 3)) for idx, i in enumerate(mistakes[:num_images]): plt.subplot(1, num_images, idx+1) img = images[i].cpu() img = img / 2 + 0.5 plt.imshow(np.transpose(img, (1, 2, 0))) plt.title(f\u0026#39;True: {classes[labels[i]]}\\nPred: {classes[predicted[i]]}\u0026#39;) plt.axis(\u0026#39;off\u0026#39;) plt.show() show_predictions(model, testloader) This shows you where the model is confused. 
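Eyeballing individual mistakes is useful, but a confusion matrix summarizes all of them at once. Here is a sketch using scikit-learn (already in the install list) on hypothetical labels; in practice you would collect y_true and y_pred by iterating over testloader exactly as in the accuracy loop above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ['plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

# Hypothetical true/predicted class indices standing in for real
# test-set predictions.
y_true = [3, 3, 5, 5, 1, 9]  # cat, cat, dog, dog, car, truck
y_pred = [3, 5, 5, 3, 1, 1]  # one cat->dog, one dog->cat, one truck->car

# cm[i, j] counts images of true class i predicted as class j.
cm = confusion_matrix(y_true, y_pred, labels=list(range(10)))

# The off-diagonal entries are the confusions.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
for i, j in zip(*np.nonzero(off_diag)):
    print(f'{classes[i]} mistaken for {classes[j]}: {off_diag[i, j]}')
```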
Often you\u0026rsquo;ll find it confuses cats and dogs, or trucks and cars—mistakes that make sense.\nWhat\u0026rsquo;s Next? # You\u0026rsquo;ve just built a working image classifier. It\u0026rsquo;s not state-of-the-art, but it\u0026rsquo;s real.\nWays to improve it:\nAdd more conv layers Use data augmentation (flip, rotate images) Try transfer learning with pretrained models Adjust hyperparameters (learning rate, batch size) Train for more epochs More importantly, you now understand the workflow:\nGet data Split it properly Build a model Train while monitoring validation performance Evaluate on test set once This process applies whether you\u0026rsquo;re classifying images, predicting stock prices, or detecting fraud. The specifics change, but the workflow stays the same.\nThe Real Lesson # Building ML models isn\u0026rsquo;t about knowing the perfect architecture or the best hyperparameters. It\u0026rsquo;s about:\nUnderstanding what you\u0026rsquo;re trying to do Checking your assumptions Iterating on what works Debugging when things fail You\u0026rsquo;ll make mistakes. Your first model will probably underperform. You\u0026rsquo;ll spend hours debugging stupid errors. That\u0026rsquo;s normal. That\u0026rsquo;s the process.\nThe difference between someone who\u0026rsquo;s good at ML and someone who isn\u0026rsquo;t isn\u0026rsquo;t intelligence or math skills. It\u0026rsquo;s the willingness to iterate, debug, and learn from what doesn\u0026rsquo;t work.\nNow go build something. Start simple, make sure it works, then make it better.\nRelated Reading # 5 ML Mistakes I Made So You Don\u0026rsquo;t Have To Neural Networks: Building Intuition Beyond the Math Welcome to Neural Odyssey Code not working? Find a bug? 
Let me know—I\u0026rsquo;m always happy to help debug.\n","date":"18 November 2025","externalUrl":null,"permalink":"/myblog/posts/build-your-first-ml-model/","section":"Writing on Machine Learning, Systems, and Engineering","summary":"\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eAbstract\u003c/strong\u003e\u003cbr\u003e\nThis post walks through a first end-to-end ML workflow using a real image classification task. The goal is not just to train a model, but to build the habits that matter in practice: checking data, splitting correctly, choosing a simple baseline, and evaluating results without fooling yourself.\u003c/p\u003e","title":"Build Your First ML Model: A No-BS Guide","type":"posts"},{"content":"","date":"18 November 2025","externalUrl":null,"permalink":"/myblog/tags/hands-on/","section":"Tags","summary":"","title":"Hands-On","type":"tags"},{"content":"","date":"18 November 2025","externalUrl":null,"permalink":"/myblog/tags/pytorch/","section":"Tags","summary":"","title":"Pytorch","type":"tags"},{"content":"Neural Odyssey is my technical blog about machine learning, GPU computing, and applied AI engineering.\nI use it to share tutorials, implementation notes, and lessons from building real systems, with an emphasis on clear explanations and practical work.\n","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/","section":"","summary":"\u003cp\u003eNeural Odyssey is my technical blog about machine learning, GPU computing, and applied AI engineering.\u003c/p\u003e\n\u003cp\u003eI use it to share tutorials, implementation notes, and lessons from building real systems, with an emphasis on clear explanations and practical work.\u003c/p\u003e","title":"","type":"page"},{"content":" Hey, I\u0026rsquo;m Danial Jafarzadeh # I work at the intersection of machine learning and systems. 
Most of my time goes into understanding how models behave, how training pipelines fail, and how to make performance-critical code less wasteful.\nMy Journey # My path into software started with curiosity, but what kept me here was the engineering reality underneath the abstractions. I like the part where ideas meet constraints: memory limits, latency budgets, numerical stability, and code that needs to survive contact with production.\nWhat Drives Me # I am particularly interested in:\nMachine learning systems: training loops, evaluation, reproducibility, and failure modes CUDA and performance engineering: kernels, memory movement, and practical optimization Model intuition: building mental models for why methods work, not just how to call them Technical writing: turning debugging experience into material other engineers can use Why This Blog? # This blog exists for a few reasons:\nThink clearly in public: writing forces me to close gaps in my own understanding. Document hard-won lessons: especially the ones that only show up after things break. Build a durable body of work: practical posts are more useful than generic hot takes. Share implementation detail: the real work is usually in the edge cases and tradeoffs. 
What You Will Find Here # ML tutorials that explain both the mechanics and the tradeoffs Notes on debugging model training and data pipelines Systems-focused posts on performance, tooling, and implementation choices Occasional project writeups when something is worth documenting properly Contact # If you want to talk about ML engineering, CUDA, or anything I have written here:\nEmail: Danialj999@gmail.com Check out my projects to see what I\u0026rsquo;ve been building Browse the blog for tutorials and insights Thanks for reading.\n","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/about/","section":"","summary":"\u003ch2 class=\"relative group\"\u003eHey, I\u0026rsquo;m Danial Jafarzadeh \n    \u003cdiv id=\"hey-im-danial-jafarzadeh\" class=\"anchor\"\u003e\u003c/div\u003e\n    \n    \u003cspan\n        class=\"absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100\"\u003e\n        \u003ca class=\"group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline\" href=\"#hey-im-danial-jafarzadeh\" aria-label=\"Anchor\"\u003e#\u003c/a\u003e\n    \u003c/span\u003e        \n    \n\u003c/h2\u003e\n\u003cp\u003eI work at the intersection of machine learning and systems. Most of my time goes into understanding how models behave, how training pipelines fail, and how to make performance-critical code less wasteful.\u003c/p\u003e","title":"About Danial Jafarzadeh | ML and Systems Engineering","type":"page"},{"content":"","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/categories/announcements/","section":"Categories","summary":"","title":"Announcements","type":"categories"},{"content":"","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/tags/blog-launch/","section":"Tags","summary":"","title":"Blog Launch","type":"tags"},{"content":" Resume # I am a software engineer focused on machine learning systems, performance-sensitive implementation, and technical writing. 
My work sits at the point where models, tooling, and engineering constraints meet.\nProfile # I am most interested in work that requires both technical depth and engineering discipline: understanding model behavior, debugging failure modes, improving runtime performance, and explaining complex systems clearly. I tend to work from first principles, prioritize reproducibility, and prefer implementation that is measurable rather than performative.\nAreas of Focus # Machine learning systems: model training, evaluation, debugging, and reproducible experimentation Performance-oriented engineering: practical optimization, profiling, and reasoning about bottlenecks CUDA and low-level ML infrastructure: learning the systems side of modern AI workloads Technical communication: writing tutorials and implementation notes that stay concrete Technical Skills # Languages: Python, C++, CUDA, SQL, Bash ML and Data: PyTorch, NumPy, scikit-learn, experiment design, model evaluation Systems and Tooling: Git, Linux, profiling, debugging, reproducible workflows Development Practices: testing, documentation, benchmarking, code review, performance analysis Current Work # I am currently building a body of work through technical writing, hands-on ML implementation, and systems-focused experiments. 
This site serves as a public record of that work: tutorials, engineering notes, and project writeups that show how I think through problems and how I implement solutions.\nRepresentative strengths include:\nturning broad ideas into runnable experiments and clear implementation steps debugging training instability, data problems, and evaluation mistakes writing technical explanations that connect intuition to code working across high-level ML workflows and lower-level performance concerns Selected Strengths # Clear written communication for technical audiences Strong bias toward measurement, reproducibility, and debugging Comfort working across modeling, tooling, and systems details Consistent focus on practical engineering rather than surface-level demos Education # My strongest training so far has been project-driven: building, measuring, and documenting systems until the theory connects to the implementation. I prefer to present only concrete, verifiable credentials here rather than filler.\nFor a detailed resume or to discuss opportunities, please contact me.\n","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/resume/","section":"","summary":"\u003ch2 class=\"relative group\"\u003eResume \n    \u003cdiv id=\"resume\" class=\"anchor\"\u003e\u003c/div\u003e\n    \n    \u003cspan\n        class=\"absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100\"\u003e\n        \u003ca class=\"group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline\" href=\"#resume\" aria-label=\"Anchor\"\u003e#\u003c/a\u003e\n    \u003c/span\u003e        \n    \n\u003c/h2\u003e\n\u003cp\u003eI am a software engineer focused on machine learning systems, performance-sensitive implementation, and technical writing. 
My work sits at the point where models, tooling, and engineering constraints meet.\u003c/p\u003e","title":"Danial Jafarzadeh | Resume","type":"page"},{"content":"","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/categories/general/","section":"Categories","summary":"","title":"General","type":"categories"},{"content":"","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/tags/intro/","section":"Tags","summary":"","title":"Intro","type":"tags"},{"content":" Projects # This page will collect the projects worth documenting in depth: model-building work, systems experiments, and implementation-heavy side projects.\nWhat Will Show Up Here # ML projects: training experiments, evaluation pipelines, and model-focused tooling Systems work: performance experiments, low-level debugging, and infrastructure notes Technical writeups: project breakdowns that explain the engineering choices behind the result Open source work: contributions that are interesting enough to unpack I am keeping this page intentionally small until each project has enough substance to be useful on its own.\nIf a project here overlaps with your work or you want to compare approaches, reach out by email.\n","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/projects/","section":"","summary":"\u003ch2 class=\"relative group\"\u003eProjects \n    \u003cdiv id=\"projects\" class=\"anchor\"\u003e\u003c/div\u003e\n    \n    \u003cspan\n        class=\"absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100\"\u003e\n        \u003ca class=\"group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline\" href=\"#projects\" aria-label=\"Anchor\"\u003e#\u003c/a\u003e\n    \u003c/span\u003e        \n    \n\u003c/h2\u003e\n\u003cp\u003eThis page will collect the projects worth documenting in depth: model-building work, systems experiments, and implementation-heavy side projects.\u003c/p\u003e\n\n\u003ch2 
class=\"relative group\"\u003eWhat Will Show Up Here \n    \u003cdiv id=\"what-will-show-up-here\" class=\"anchor\"\u003e\u003c/div\u003e\n    \n    \u003cspan\n        class=\"absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100\"\u003e\n        \u003ca class=\"group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline\" href=\"#what-will-show-up-here\" aria-label=\"Anchor\"\u003e#\u003c/a\u003e\n    \u003c/span\u003e        \n    \n\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eML projects\u003c/strong\u003e: training experiments, evaluation pipelines, and model-focused tooling\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eSystems work\u003c/strong\u003e: performance experiments, low-level debugging, and infrastructure notes\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eTechnical writeups\u003c/strong\u003e: project breakdowns that explain the engineering choices behind the result\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eOpen source work\u003c/strong\u003e: contributions that are interesting enough to unpack\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eI am keeping this page intentionally small until each project has enough substance to be useful on its own.\u003c/p\u003e","title":"Projects by Danial Jafarzadeh | ML and Systems Work","type":"page"},{"content":"","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/tags/welcome/","section":"Tags","summary":"","title":"Welcome","type":"tags"},{"content":" Abstract\nThis opening post explains what Neural Odyssey is for: practical writing about machine learning, systems work, debugging, and the messy parts of learning in public. It sets the tone for the blog and the kind of posts that will be worth following.\nSo here we are. After months of thinking \u0026ldquo;I should start a blog\u0026rdquo; and never actually doing it, I finally pulled the trigger. 
Welcome to Neural Odyssey—my attempt to document what I\u0026rsquo;m learning, share what I\u0026rsquo;ve figured out, and hopefully help someone avoid the mistakes I\u0026rsquo;ve made.\nWhy Another Tech Blog? # Fair question. The internet doesn\u0026rsquo;t exactly lack programming blogs. But here\u0026rsquo;s the thing: most technical content falls into one of two categories.\nThere\u0026rsquo;s the super polished, everything-works-perfectly tutorial that makes you feel dumb when your code doesn\u0026rsquo;t run exactly like theirs. And then there\u0026rsquo;s the overly basic \u0026ldquo;here\u0026rsquo;s what a variable is\u0026rdquo; content that doesn\u0026rsquo;t really help once you\u0026rsquo;re past the absolute beginner stage.\nI want to find the middle ground. The stuff I write here will be:\nActually useful - I\u0026rsquo;m going to focus on the problems I\u0026rsquo;ve genuinely encountered and solutions that actually worked. Not toy examples, not contrived scenarios, but real challenges I\u0026rsquo;ve faced while building things.\nTechnically solid but human - I\u0026rsquo;ll explain the concepts properly, but I\u0026rsquo;ll also tell you when I spent three hours debugging something stupid, or when I still don\u0026rsquo;t fully understand why something works. Because that\u0026rsquo;s reality.\nHonest about failure - Most tutorials only show you the success path. But you learn more from what breaks than what works. So I\u0026rsquo;ll talk about both.\nWhat You\u0026rsquo;ll Find Here # I\u0026rsquo;m primarily focused on machine learning and AI right now, but my interests wander. 
You\u0026rsquo;ll probably see posts about:\nDeep learning tutorials - Neural networks, transformers, training strategies, and the inevitable debugging stories Practical ML - Data preprocessing, model evaluation, avoiding common pitfalls, and making models that actually work in production Projects and experiments - Things I\u0026rsquo;m building, trying out, or learning from Tools and workflows - The boring-but-essential stuff that makes development smoother I\u0026rsquo;m not going to pretend I\u0026rsquo;m an expert at any of this. I\u0026rsquo;m learning as I go, making mistakes, figuring things out. If that sounds useful to you, stick around.\nA Quick Promise # I won\u0026rsquo;t waste your time. Every post will either teach you something specific, show you how to build something concrete, or save you from a mistake I\u0026rsquo;ve already made. No filler, no fluff, no \u0026ldquo;10 ways to be a better developer\u0026rdquo; listicles.\nIf I write something that isn\u0026rsquo;t useful, please call me out. The goal is to create content that\u0026rsquo;s actually worth reading, not just to hit a publishing schedule.\nLet\u0026rsquo;s Do This # I\u0026rsquo;ve already got a few posts lined up:\nUnderstanding neural networks without drowning in math Common ML mistakes and how to avoid them Building your first real ML model (not MNIST) If there\u0026rsquo;s something specific you want me to cover, or if you have questions about anything I write, reach out. I\u0026rsquo;m here to learn as much as I\u0026rsquo;m here to teach.\nThanks for being here at the start of this. 
Let\u0026rsquo;s see where this goes.\nRelated Reading # Neural Networks: Building Intuition Beyond the Math 5 ML Mistakes I Made So You Don\u0026rsquo;t Have To Build Your First ML Model: A No-BS Guide — Danial\n","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/posts/welcome/","section":"Writing on Machine Learning, Systems, and Engineering","summary":"\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eAbstract\u003c/strong\u003e\u003cbr\u003e\nThis opening post explains what Neural Odyssey is for: practical writing about machine learning, systems work, debugging, and the messy parts of learning in public. It sets the tone for the blog and the kind of posts that will be worth following.\u003c/p\u003e","title":"Welcome to Neural Odyssey","type":"posts"},{"content":"This page collects the blog\u0026rsquo;s tutorials, implementation notes, and debugging lessons on machine learning, systems work, and engineering tradeoffs.\nIf you are new here, start with one of the guided paths below. If you already know what you want, use the archive to browse everything by date.\n","date":"8 August 2025","externalUrl":null,"permalink":"/myblog/posts/","section":"Writing on Machine Learning, Systems, and Engineering","summary":"\u003cp\u003eThis page collects the blog\u0026rsquo;s tutorials, implementation notes, and debugging lessons on machine learning, systems work, and engineering tradeoffs.\u003c/p\u003e\n\u003cp\u003eIf you are new here, start with one of the guided paths below. If you already know what you want, use the archive to browse everything by date.\u003c/p\u003e","title":"Writing on Machine Learning, Systems, and Engineering","type":"posts"},{"content":"","externalUrl":null,"permalink":"/myblog/authors/","section":"Authors","summary":"","title":"Authors","type":"authors"},{"content":"","externalUrl":null,"permalink":"/myblog/series/","section":"Series","summary":"","title":"Series","type":"series"}]