1. My Story: How I Got Into This
Okay so let me be real with you.
Three months ago my typical morning looked like this: coffee in one hand, seventeen browser tabs open, ChatGPT in one, Gmail in another, some random Reddit thread somehow always in a third one. I was “using AI” every day but I wasn’t actually getting more done. I was just busier.
The breaking point came on a Tuesday. I had a client article due, a proposal to write, and three emails I’d been avoiding for four days. I opened ChatGPT, typed half a prompt, got distracted by a notification, ended up on Twitter for twenty minutes, came back, forgot what I was doing, and started over.
Three hours later I had written maybe 400 words. That’s genuinely embarrassing.
My friend Omar, who does independent research and content work, mentioned he’d moved almost his entire workflow to local AI. No browser tabs. No subscriptions. Everything running on his laptop.
I thought he was being extra about it. But then he screen-shared his setup and showed me his actual daily output. The numbers were hard to argue with.
I spent that weekend copying his setup. Then I tracked everything for the next 30 days: words written, tasks completed, hours of focused work. I compared it to the 30 days before.
The difference was big enough that I’m still using it six months later. And now I’m writing this so you don’t have to figure it all out yourself.
2. Why Local AI Actually Beats Scattered Cloud AI Use
This took me a while to understand. And once I got it, everything clicked.
The Real Problem Isn’t the AI, It’s the Friction
When you use ChatGPT or Claude normally, here’s what actually happens every single time:
You’re working → you need AI help → you open a browser → you’re now surrounded by every distraction on the internet → you type your prompt → you wait → you read the answer → you try to carry that answer back to your actual work
That whole process takes anywhere from 60 seconds to “oh it’s been 25 minutes and I’m reading about something completely unrelated.”
Local AI cuts that chain. The AI is already where you’re working. You stay in your writing app, your notes app, your files. There’s no browser involved. No tab switching. No Gmail sitting one click away.
Here’s a simple way to see the difference:
Cloud AI Workflow (What Most People Do)
| Step | What Happens | Time Lost |
|---|---|---|
| Need AI help | Open new browser tab | 5 sec |
| Navigate to ChatGPT | Log in if session expired | 10–30 sec |
| Type prompt | Also see Gmail notification | Focus broken |
| Wait for response | Check Twitter “quickly” | 5–20 min |
| Read response | It’s in browser, work is elsewhere | Context switch |
| Go back to work | Try to remember what you were doing | 2–5 min |
| Total cost | Per AI interaction | ~30 min easily |
Local AI Workflow (What I Do Now)
| Step | What Happens | Time Lost |
|---|---|---|
| Need AI help | Press keyboard shortcut | 1 sec |
| AI opens in same app | Already in my workspace | 0 sec |
| Type prompt | No other tabs visible | Focus kept |
| Get response | On my machine, fast | 10–15 sec |
| Use response | Already in my work context | 0 sec |
| Total cost | Per AI interaction | ~20 seconds |
That difference, 30 minutes versus 20 seconds, sounds extreme. But when you actually track how many times you reach for AI in a day, it adds up to hours.
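If you want to see how that adds up for your own day, here is a rough sketch of the arithmetic. The per-interaction costs are the totals from the two tables above; the number of interactions per day is a made-up example, so plug in your own.

```python
# Rough daily cost of reaching for AI, using the per-interaction totals
# from the two tables above. The interaction count is an assumed example.
CLOUD_SECONDS = 30 * 60  # "~30 min easily" per cloud interaction
LOCAL_SECONDS = 20       # "~20 seconds" per local interaction


def daily_hours(seconds_per_interaction: int, interactions_per_day: int) -> float:
    """Total time spent per day, in hours."""
    return seconds_per_interaction * interactions_per_day / 3600


if __name__ == "__main__":
    interactions = 6  # hypothetical: how often you reach for AI in a day
    print(f"Cloud: {daily_hours(CLOUD_SECONDS, interactions):.1f} h/day")
    print(f"Local: {daily_hours(LOCAL_SECONDS, interactions):.2f} h/day")
```

The cloud figure is a worst case (not every interaction ends in a 20-minute detour), so treat the result as an upper bound rather than a measurement.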
Internet Off = Deep Work On
Here’s a bonus nobody really talks about.
Local AI works with zero internet. So I started doing two-hour work blocks with my WiFi completely off. No distractions possible. But I still have full AI capability on my laptop.
This combo alone probably accounts for half my productivity improvement. The local AI made it possible. The internet-off habit made it powerful.
Your Data Stays With You
This isn’t the main reason I switched, but it’s worth saying. When I type client names, project details, money amounts, or sensitive business stuff into ChatGPT, that’s leaving my machine. With local AI it never does.
3. My Exact Local AI Setup
No vague recommendations. Here’s literally what I run.
Hardware
My machine: MacBook Pro M2, 16GB RAM, 512GB storage
The M2 chip is genuinely good for local AI. Apple Silicon handles these models efficiently: better battery life than you’d expect, no overheating, and the unified memory architecture means the GPU and CPU share RAM, which helps a lot with model performance.
If you’re on Windows: An Nvidia RTX 3060 or better will actually outperform my setup for larger models. CUDA acceleration on Nvidia cards is very fast for AI inference.
Minimum to get started: 8GB RAM (you’re limited to smaller models but it works), any modern CPU from the last four years.
My Full Tool Stack
| Tool | What It Does | Cost | Link |
|---|---|---|---|
| Ollama | Runs AI models locally (the engine) | Free | ollama.com |
| Msty | Desktop chat interface for local models | Free | msty.app |
| Obsidian | Local notes app (my second brain) | Free | obsidian.md |
| Smart Connections | AI search across all my Obsidian notes | Free plugin | — |
| MacWhisper | Local voice-to-text transcription | Free/Paid | goodsnooze.com |
| Whisper Desktop | Same but for Windows | Free | GitHub |
| MarkDownload | Browser extension to clip articles to Obsidian | Free | Chrome/Firefox |
| Continue | Local AI coding assistant inside VS Code | Free | continue.dev |
Models I Use and When
| Model | Size | Best For | Speed on M2 |
|---|---|---|---|
| Llama 3.1 8B | 4.7GB | Quick edits, emails, short tasks | Very fast |
| Qwen2.5 14B | 8.2GB | Writing, organising, longer content | Medium |
| DeepSeek-R1 7B | 4.1GB | Step-by-step reasoning, problem solving | Fast |
| Phi-3 Mini | 2.3GB | Super quick tasks, low RAM machines | Very fast |
| LLaVA 7B | 4.5GB | Reading screenshots, image description | Medium |
My daily rule: Use the smallest model that can do the job. Save the bigger models for complex tasks. This keeps response times fast and doesn’t drain battery.
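If you end up scripting any of this, the "smallest model that can do the job" rule is easy to encode. This is only a sketch: the task labels are invented for the example, and the model tags follow Ollama's naming as I understand it, so check them against `ollama list` on your own machine.

```python
# Hypothetical helper: map a task type to the smallest model from the
# table above that handles it. The task labels are made up for this sketch.
MODEL_FOR_TASK = {
    "quick": "phi3:mini",            # super quick tasks, low-RAM machines
    "edit": "llama3.1:8b",           # quick edits, emails, short tasks
    "reasoning": "deepseek-r1:7b",   # step-by-step problem solving
    "image": "llava:7b",             # screenshots, image description
    "longform": "qwen2.5:14b",       # writing, organising, longer content
}


def pick_model(task: str, default: str = "llama3.1:8b") -> str:
    """Return the model tag for a task, falling back to a small default."""
    return MODEL_FOR_TASK.get(task, default)


if __name__ == "__main__":
    print(pick_model("edit"))
    print(pick_model("longform"))
```

Unknown task types fall back to the small general-purpose model, which matches the rule: only go bigger when you have a reason to.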
4. How Local AI Actually Boosted My Daily Output 2x–3x
Writing Workflow: The Biggest Win
Before local AI, starting an article looked like this:
Open doc → stare at blank page → write a bad opening sentence → delete it → check phone → write another bad sentence → repeat for an hour.
Now it looks like this:
Step 1: Brain dump (10 minutes). Open a blank Obsidian note. Type everything in my head about the topic. No structure. No editing. Just raw thoughts, half-ideas, questions, random points. It looks like notes from a confused person on a phone call.
Step 2: Get structure from AI (2 minutes). Paste my mess into Msty and type:
“These are my rough notes on [topic]. Don’t write any content. Just organise these into clear sections and bullet points for an article structure.”
Qwen2.5 14B gives me back a skeleton. It’s usually 75% right. I fix what’s wrong.
Step 3: Write from the outline (fast). Writing from a solid outline is much faster than writing while figuring out structure at the same time (my first drafts went from 2–3 hours to 60–80 minutes). This is the single biggest change.
Step 4: Edit with AI (quick passes). After the draft is done, quick prompts for editing:
- “Make this paragraph shorter and more direct”
- “This sounds confusing. Rewrite it simply”
- “Too many filler words here. Clean it up”
Llama 3.1 8B handles all of this quickly.
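I do all of this through Msty, but the outline step can also be scripted against Ollama directly. A minimal sketch, assuming Ollama is running on its default port and that `qwen2.5:14b` has been pulled; the function names are mine, not part of any tool.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

OUTLINE_PROMPT = (
    "These are my rough notes on {topic}. Don't write any content. "
    "Just organise these into clear sections and bullet points "
    "for an article structure.\n\n{notes}"
)


def build_request(topic: str, notes: str, model: str = "qwen2.5:14b") -> dict:
    """Build the JSON payload for a single non-streaming generation."""
    return {
        "model": model,
        "prompt": OUTLINE_PROMPT.format(topic=topic, notes=notes),
        "stream": False,  # ask for one complete JSON reply instead of chunks
    }


def outline(topic: str, notes: str) -> str:
    """Send the notes to the local model and return its outline text."""
    data = json.dumps(build_request(topic, notes)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `outline("local AI", open("braindump.md").read())` would return the skeleton as plain text (the file name is just an example). Nothing leaves your machine, since the request goes to localhost.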
Before vs After: Writing an Article
| Task | Before Local AI | After Local AI |
|---|---|---|
| Getting started | 30–45 min (blank page paralysis) | 10 min (brain dump) |
| Creating structure | 20–30 min | 2 min (AI does it) |
| Writing first draft | 2–3 hours | 60–80 min |
| Editing passes | 45 min | 20 min |
| Total time | 3.5–5 hours | 1.5–2 hours |
Note Taking and Second Brain
I read a lot. Articles, research, newsletters, documentation. The old problem: I’d read something interesting, save it somewhere vague, and never find it again when I actually needed it.
Now every article I want to keep goes into Obsidian as a plain text file. I use the MarkDownload Chrome extension: one click saves any webpage as clean markdown text directly into my Obsidian folder.
Over three months I’ve saved around 340 articles and notes.
When I need something, I open Smart Connections and ask:
“What have I saved about content marketing? Summarise the main ideas.”
It searches my actual notes, not the internet, and gives me a summary of what I’ve already read. It’s like having a research assistant who read everything you read and remembers all of it.
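I don’t know exactly how Smart Connections works internally, so take this as an illustration of the general idea rather than its implementation: each note and each question becomes a vector of numbers, and the notes whose vectors point in the most similar direction come back first. The vectors below are toy values and the file names are invented; in practice they would come from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_notes(
    query_vec: list[float], note_vecs: dict[str, list[float]], top_k: int = 3
) -> list[str]:
    """Return the names of the top_k notes most similar to the query."""
    ranked = sorted(
        note_vecs,
        key=lambda name: cosine(query_vec, note_vecs[name]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" for three hypothetical notes.
    notes = {
        "content-marketing.md": [0.9, 0.1, 0.0],
        "sourdough.md": [0.0, 0.2, 0.9],
        "seo-basics.md": [0.7, 0.3, 0.1],
    }
    question = [1.0, 0.2, 0.0]  # pretend embedding of a marketing question
    print(rank_notes(question, notes, top_k=2))
```

This is also why an almost empty vault gives weak results: with only a handful of vectors to compare against, the "closest" note may not be close at all.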
Note System Comparison
| Feature | Old System (Bookmarks + Notion) | New System (Obsidian + Local AI) |
|---|---|---|
| Saving articles | Browser bookmarks (never found again) | One-click to Obsidian folder |
| Finding old notes | Manual search, usually failed | AI query in plain English |
| Connecting ideas | Manually linking (rarely did it) | Smart Connections finds links |
| Works offline | No | Yes, completely |
| Privacy | Cloud synced | All local |
| Monthly cost | Notion: $8–16/month | Free |
Voice to Actionable Notes: The Hidden Gem
This one I didn’t expect to love as much as I do.
I think faster than I type. When I’m trying to work through a problem or capture ideas, typing slows me down and I lose the thought. So I started recording voice memos and using local Whisper to transcribe them.
My exact process:
- Open MacWhisper, hit record
- Just talk for 3 to 5 minutes: ideas, thinking out loud, notes after a meeting, a rough draft of something
- Whisper transcribes it locally, super accurate, no internet
- I paste the transcript into Msty:
“This is a voice memo I recorded. Remove filler words, clean up the transcript, and pull out the key points as a clear numbered list.”
What comes back is clean and immediately usable.
I replaced maybe 40% of my typing with this. It’s faster and honestly my thinking comes out better when I’m just talking than when I’m staring at a screen.
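I use the MacWhisper app for this, but if you prefer a script, the same two steps can be sketched in Python. This assumes the open-source `openai-whisper` package (and ffmpeg) is installed, which is a different route from the apps in my tool stack; the function names and the `"base"` model choice are just examples.

```python
CLEANUP_PROMPT = (
    "This is a voice memo I recorded. Remove filler words, clean up the "
    "transcript, and pull out the key points as a clear numbered list.\n\n"
)


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file locally with the openai-whisper package."""
    # Imported here so the rest of the script works without the package.
    import whisper  # assumed installed: pip install openai-whisper

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"].strip()


def memo_prompt(transcript: str) -> str:
    """Wrap a raw transcript in the cleanup prompt from the steps above."""
    return CLEANUP_PROMPT + transcript


if __name__ == "__main__":
    # "memo.m4a" is a placeholder path for your own recording.
    print(memo_prompt(transcribe("memo.m4a")))
```

The printed text is what you would paste into Msty, or send to a local model from a script.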
Research and Idea Generation
When I’m starting a new article or project and need angles and ideas, I used to just Google things and hope something sparked.
Now I do this:
Paste in what I already know about a topic and ask:
“I’m writing about [topic] for [audience]. Based on what I’ve shared, give me 10 specific article angles that would actually be useful. Not generic: specific and interesting.”
Then I go through the list and pick what resonates. Usually two or three are genuinely good. Takes five minutes instead of twenty.
For deeper research, I use Perplexica, an open-source local alternative to Perplexity AI that can actually search the web but runs on your machine. It’s a bit technical to set up but worth it if you do a lot of research work.
5. Real Results After 30 Days: Actual Numbers
I tracked everything in a simple spreadsheet. Here’s what I found comparing the 30 days before vs the 30 days after setting up local AI.
My 30-Day Output Comparison
| Metric | Before Local AI | After Local AI | Change |
|---|---|---|---|
| Articles written | 6 | 14 | +133% |
| Average article time | 4.1 hours | 1.8 hours | −56% |
| Words written per day | ~800 | ~2,100 | +162% |
| Deep work hours per day | 1.5 hrs | 3.2 hrs | +113% |
| Tasks completed per week | 23 | 41 | +78% |
| Days I felt “productive” | 12/30 | 24/30 | +100% |
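The Change column is plain percent change, (after − before) / before. If you track your own numbers in a spreadsheet, a few lines of Python will reproduce it; the figures below are the ones from my table.

```python
def pct_change(before: float, after: float) -> int:
    """Percent change from before to after, rounded to a whole number."""
    return round((after - before) / before * 100)


# (before, after) pairs taken from the table above.
METRICS = {
    "Articles written": (6, 14),
    "Average article time (hours)": (4.1, 1.8),
    "Words written per day": (800, 2100),
    "Deep work hours per day": (1.5, 3.2),
    "Tasks completed per week": (23, 41),
    "Productive days out of 30": (12, 24),
}

if __name__ == "__main__":
    for name, (before, after) in METRICS.items():
        print(f"{name}: {pct_change(before, after):+d}%")
```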
I want to be straight with you though. The first week was not great. I was slower because I was learning the setup. Week two I started finding my rhythm. Week three it started clicking. Week four the numbers above happened.
This isn’t instant. Give it three weeks before you judge it.
6. Complete Step-by-Step Setup Guide
Clear, simple, no skipping steps.
Step 1: Install Ollama (15 minutes)
Go to ollama.com → download for your OS → install it.
Open your terminal (Mac: press Cmd+Space, type “terminal”) and run these one by one:
ollama pull llama3.1
ollama pull qwen2.5:14b
ollama pull deepseek-r1:7b
Each one downloads a model. They’re a few GB each so use good WiFi. You only download once.
To test it’s working, type: ollama run llama3.1 and say hello. If it responds, you’re good.
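If you would rather check from a script that all three downloads finished, Ollama exposes a local HTTP endpoint that lists installed models. A small sketch, assuming the default port; the helper names are mine. Note that a model pulled without a tag (like `llama3.1`) shows up as `llama3.1:latest`, so the comparison normalises for that.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # lists locally installed models
WANTED = ["llama3.1", "qwen2.5:14b", "deepseek-r1:7b"]


def normalise(tag: str) -> str:
    """Models pulled without an explicit tag are stored as ':latest'."""
    return tag if ":" in tag else tag + ":latest"


def missing_models(installed: list[str], wanted: list[str]) -> list[str]:
    """Return the wanted models that are not in the installed list."""
    have = {normalise(t) for t in installed}
    return [w for w in wanted if normalise(w) not in have]


def installed_models() -> list[str]:
    """Ask the local Ollama server which models it has."""
    with urllib.request.urlopen(TAGS_URL) as resp:
        return [m["name"] for m in json.loads(resp.read())["models"]]


if __name__ == "__main__":
    missing = missing_models(installed_models(), WANTED)
    print("All set" if not missing else f"Still need: {', '.join(missing)}")
```

If the script cannot connect at all, Ollama is probably not running yet; start the app and try again.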
Step 2: Install Msty (10 minutes)
Go to msty.app → download → install.
Open it. It should automatically find your Ollama models. If it doesn’t, go to Settings → Local AI → set the URL to http://localhost:11434.
Select Qwen2.5 14B and send a test message. If it responds, you’re done.
Set up a keyboard shortcut to open Msty from anywhere on your computer. This is important: you want to access it without clicking around.
Step 3: Set Up Obsidian (20 minutes)
Go to obsidian.md → download → install → create a new vault (this is just a folder on your computer).
Inside Obsidian:
- Go to Settings → Community Plugins → turn off Restricted Mode (called Safe Mode in older Obsidian versions)
- Click Browse → search “Smart Connections” → install → enable
- Go to Smart Connections settings → set model to your local Ollama URL
Now install MarkDownload in Chrome or Firefox. Go to any article and click the extension icon; it saves a clean text version directly to your Obsidian folder.
Start by saving five articles you’d normally just bookmark. That’s your second brain starting.
Step 4: Set Up Voice Transcription (10 minutes)
Mac: Download MacWhisper from goodsnooze.com. Free version works fine to start.
Windows: Download Whisper Desktop from GitHub (search “Whisper Desktop Const-me”; it’s the main one). Also free.
Open it, record yourself talking for two minutes about anything. Hit transcribe. See how accurate it is. You’ll be impressed.
Step 5: Create Your Prompt Library (30 minutes, ongoing)
Make a new note in Obsidian called “My AI Prompts.” This becomes more valuable over time as you add and refine prompts.
Start with the ones in the next section.
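A plain Obsidian note is all you need, but if you ever want to reuse prompts from a script, the same library works as a small dictionary of templates. This is a sketch with made-up names; the placeholders are simplified to single words like [notes] so they are easy to fill in automatically.

```python
import re

# Two prompts from the next section, with single-word [placeholders].
PROMPTS = {
    "outline": (
        "Here are my rough notes on [topic]: [notes]. Don't write any content. "
        "Just organise these into a clear article structure with main sections "
        "and bullet points under each."
    ),
    "summarise": (
        "Summarise this article in 5 bullet points. Focus on what's actually "
        "useful and actionable, not just what it's about: [article]"
    ),
}


def fill(name: str, **values: str) -> str:
    """Replace each [placeholder] in the named prompt; fail if one is missing."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing value for [{key}]")
        return values[key]

    return re.sub(r"\[(\w+)\]", substitute, PROMPTS[name])


if __name__ == "__main__":
    print(fill("outline", topic="local AI", notes="tabs bad, shortcuts good"))
```

Failing loudly on a missing placeholder is deliberate: a prompt that still contains a literal [topic] tends to produce confusing output.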
7. Best Prompts I Use Every Single Day
This section alone is worth saving. These are the exact prompts I actually use, not made-up examples.
Writing Prompts
For getting structure from messy notes:
“Here are my rough notes on [topic]: [paste notes]. Don’t write any content. Just organise these into a clear article structure with main sections and bullet points under each.”
For editing a paragraph:
“Make this more direct and cut anything unnecessary. Keep the meaning exactly the same: [paste paragraph]”
For rewriting something simpler:
“Rewrite this so a normal person can understand it easily. No jargon, short sentences: [paste text]”
For writing an introduction:
“Write three different opening paragraphs for an article about [topic]. Each should start with a different angle: a story, a surprising fact, and a direct problem statement. Keep each under 80 words.”
Note and Research Prompts
For summarising a long article:
“Summarise this article in 5 bullet points. Focus on what’s actually useful and actionable, not just what it’s about: [paste article]”
For finding connections in your notes:
“I’ve been reading about [topic A] and [topic B]. What’s an interesting angle that connects both of these that most people miss?”
For generating article ideas:
“I write for [your audience] about [your topic]. Give me 10 specific article ideas that would actually help them. Not generic titles: specific, useful angles.”
Voice Memo Prompts
For cleaning up a transcript:
“This is a voice memo transcript. Clean it up, remove filler words, and pull out the main points as a clear numbered list: [paste transcript]”
For turning a ramble into a plan:
“I recorded myself thinking out loud about a problem. Turn this into a clear plan with specific next steps: [paste transcript]”
Daily Planning Prompts
Morning planning:
“Here’s everything on my mind for today: [brain dump]. Help me turn this into a simple prioritised task list. Put the most important stuff first.”
End of day review:
“Here’s what I did today: [list]. What should I carry forward to tomorrow and what can I drop?”
Prompt Comparison: Vague vs Specific
| Vague Prompt | Specific Prompt | Quality Difference |
|---|---|---|
| “Write an intro for my article” | “Write 3 intro options for [topic]: one story-based, one stat-based, one question-based. Under 80 words each.” | Dramatically better |
| “Summarise this” | “Summarise in 5 bullets focused on what’s actionable, not just what it covers” | Much more useful |
| “Help me write better” | “Rewrite this paragraph to be shorter and more direct. Keep the meaning.” | Actually usable |
| “Give me ideas” | “10 specific article angles for [audience] about [topic]. Not generic.” | Night and day |
8. Common Mistakes and How to Avoid Them
Using big models for everything: I used Qwen2.5 14B for tasks that Llama 3.1 8B handles easily. It was just slower for no reason. Rule: start with the small model. Only go bigger if the output isn’t good enough.
Setting up Smart Connections with an empty vault: The AI search is only as good as what’s in your notes. If you have ten notes, it’s not very useful. Save articles consistently for two or three weeks first, then the AI search starts becoming genuinely powerful.
Expecting it to be perfect from day one: The first week is slower. You’re learning a new workflow. Don’t judge local AI by week one. Judge it by week four.
Not saving prompts that work: When you write a prompt that gets a great response, save it immediately. I lost probably fifteen really good prompts in the first month because I didn’t save them and couldn’t remember them later.
Trying to replace everything at once: I tried to move every single tool to local AI in the first week. It was overwhelming. Start with just the writing workflow. Add voice transcription after a week. Add Obsidian after two weeks. Layer it in.
Ignoring keyboard shortcuts: If you have to click around to open your AI tool, you won’t use it enough. Set up the Msty keyboard shortcut on day one. Seriously, do this first.
9. Who Should Try This? And Who Shouldn’t
This setup works really well for:
- Writers and bloggers: the writing workflow alone is worth it
- Researchers: the Obsidian second brain is genuinely powerful
- Freelancers: handle sensitive client info without it leaving your machine
- People who work offline a lot: planes, travel, bad internet situations
- Anyone paying for multiple AI subscriptions: this is free after setup
- People who get distracted easily: no browser means fewer distractions
This setup is probably not right for:
- People who need cutting-edge AI quality: GPT-4o and Claude are still smarter for complex tasks
- People on low-spec hardware: under 8GB RAM will be a frustrating experience
- Non-technical people who hate any setup: there’s some initial configuration involved
- Teams who collaborate on AI tools: local AI is personal, not collaborative
- Anyone who needs real-time information: local models don’t browse the internet by default
Quick Decision Table
| Your Situation | Recommendation |
|---|---|
| Writer or content creator | Definitely try it |
| Handles sensitive client data | Strongly recommended |
| Works offline regularly | Yes |
| Needs best possible AI quality | Stick with cloud AI |
| 8GB RAM or less | Start with Phi-3 Mini only |
| Wants zero setup | Not for you yet |
| Paying $40+/month on AI tools | Very worth trying |
10. Final Thoughts and Where This Is All Going
Six months in, I’m not fully local. I want to be honest about that.
For client-sensitive work, offline work, daily writing, and note processing, I’m fully local. For really complex research, or when I need the absolute best output quality, I still use Claude occasionally.
The hybrid approach is probably where most people land. And that’s completely fine.
What surprised me most was that switching to local AI made me more intentional about how I use AI generally. When there’s no friction in accessing it, you use it without thinking. When you’re deliberate about it, you use it better.
The models are also improving incredibly fast. Llama 3.1 running locally today is better than ChatGPT 3.5 was two years ago. In another year, the gap between local and cloud AI will be even smaller.
The setup takes one weekend afternoon. The cost is zero once it’s running. The privacy benefit is real. And the productivity improvement, if you actually build the habits around it, is genuinely significant.
Whether you go all-in or just try the writing workflow first, something in this setup will probably stick. Start small. See what happens.
