# I Built a Fully Automated Daily Blog That Publishes Itself at Midnight
tl;dr: Claude Code Daily is a fully automated blog that publishes every night at midnight. one Python script scans 5 subreddits, generates a late-night-show-format digest with Opus, validates it against an anti-slop engine, and pushes to production. 8 consecutive issues. zero manual edits. the full architecture is open source.
## the question nobody asks
everyone's building with AI. agents that write code, agents that do research, agents that draft emails. but almost nobody lets the agent publish.
the last mile is trust. you'll let Claude write 10,000 lines of code into your repo, but you won't let it publish a 2,000-word blog post? something doesn't add up.
I wanted to find out what it actually takes to trust automated content enough to skip the review step entirely. not as a stunt. as a production system that runs every night while I sleep.
## what Claude Code Daily is
a daily digest of the Claude Code ecosystem. think late night TV meets dev news. it scans r/ClaudeCode, r/ClaudeAI, r/vibecoding, and two GTM subreddits. tracks 180+ posts per day. generates sections like "the pulse" (the day's vibe), "hottest thread" (what blew up), "troll of the day" (roasted with love), and "best comment award" (the single best Reddit comment, quoted verbatim).
here's what the March 24 edition opened with:
> tuesday in the Claude Code cinematic universe and the community chose violence. two massive stories collided today: Anthropic quietly shipped /dream while simultaneously half the subreddit is ready to cancel their Max subscriptions over a usage limit crisis that's now entering day two.
that was generated by Opus at midnight. pushed to production at 12:03 AM. live on shawnos.ai/claude-daily before anyone woke up.
## the 5-phase pipeline
the whole system is one Python script (~1,500 lines) triggered at midnight by a macOS launchd job.
- **Phase 1: COLLECT** scan Reddit's public JSON API (5 subs, ~250 posts)
- **Phase 2: ANALYZE** score content angles with Claude CLI (Sonnet)
- **Phase 3: GENERATE** create Reddit/X/LinkedIn versions
- **Phase 4: BLOG** generate the daily digest with Opus
- **Phase 5: PUBLISH** git push + Vercel deploy + Slack + LinkedIn
each phase is independently runnable. if the blog generation looks off, I can re-run just phase 4 without re-collecting. if I want to test a voice change, I run --phase blog --dry-run and see the output without publishing.
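the phase selection plumbing is just argument parsing. a minimal sketch of how the `--phase` / `--dry-run` flags could dispatch to phase handlers; the phase names come from the diagram above, everything else here is illustrative, not the real script:

```python
# sketch of a phase dispatcher; handlers is a dict of hypothetical
# phase functions (collect, analyze, ...) supplied by the caller
import argparse

PHASES = ["collect", "analyze", "generate", "blog", "publish"]

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="daily blog pipeline")
    parser.add_argument("--phase", choices=PHASES, help="run a single phase")
    parser.add_argument("--dry-run", action="store_true",
                        help="generate output without publishing")
    return parser.parse_args(argv)

def run_pipeline(args, handlers):
    """Run every phase in order, or just the one named by --phase."""
    selected = [args.phase] if args.phase else PHASES
    ran = []
    for name in selected:
        if args.dry_run and name == "publish":
            continue  # dry runs never push to production
        handlers[name]()
        ran.append(name)
    return ran
```

because each phase is a plain function behind a name, re-running phase 4 alone is just a different `selected` list.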
## data collection (no auth needed)
Reddit's public JSON API is underrated. append .json to any subreddit URL and you get structured data back. no OAuth, no API keys, no rate limit headaches (if you're polite about it).
```python
url = f"https://www.reddit.com/r/{sub_name}/hot.json?limit=50&raw_json=1"
```
for each post: title, score, comment count, selftext preview, URL. for the top posts: also fetch the 5 best comments with author and score. calculate a velocity score that favors posts gaining upvotes quickly over older posts with higher totals.
the whole collection takes about 90 seconds across 5 subreddits.
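a sketch of the collection step under stated assumptions: urllib instead of whatever HTTP client the real script uses, field names taken from Reddit's public JSON schema, and a velocity formula that's my guess at the idea described above:

```python
# minimal collection sketch; the User-Agent string and velocity
# formula are illustrative assumptions, not the real script's
import json
import time
import urllib.request

def fetch_hot(sub_name, limit=50):
    url = f"https://www.reddit.com/r/{sub_name}/hot.json?limit={limit}&raw_json=1"
    req = urllib.request.Request(url, headers={"User-Agent": "daily-blog/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # each child carries title, score, num_comments, selftext, url, created_utc
    return [child["data"] for child in data["data"]["children"]]

def velocity(post, now=None):
    """Roughly upvotes per hour, so a fresh 200-point thread can
    outrank a day-old 500-point one. Age is floored at 30 minutes."""
    now = now or time.time()
    age_hours = max((now - post["created_utc"]) / 3600, 0.5)
    return post["score"] / age_hours
```

sorting the merged post list by `velocity` instead of raw score is what keeps the digest about today instead of yesterday's leftovers.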
## scoring and analysis
Claude CLI (Sonnet for speed) scores each post on viral potential, voice fit, gap score, and effort level. outputs the top 10 content angles with suggested platforms and post types.
this phase exists because the pipeline also generates standalone Reddit, X, and LinkedIn content from the same data. the blog phase uses both the raw data and the scored angles.
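the scoring call can be sketched as a subprocess wrapper around the Claude CLI's print mode. the rubric wording and the injectable `run` parameter (there so the sketch is testable) are mine, not the real script's:

```python
# sketch of phase 2; assumes the Claude CLI's `-p` print mode and a
# model alias like "sonnet". `run` defaults to subprocess.run but can
# be swapped out in tests.
import json
import subprocess

def score_angles(posts, model="sonnet", run=subprocess.run):
    prompt = (
        "Score each post on viral potential, voice fit, gap score, and "
        "effort level. Return a JSON array of the top 10 content angles "
        "with suggested platform and post type.\n\n" + json.dumps(posts)
    )
    result = run(
        ["claude", "-p", prompt, "--model", model],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

asking for a JSON array (and parsing it strictly) is what lets the downstream phases consume the angles without any prompt-output cleanup.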
## the blog generation
this is the core. Opus gets a system prompt that defines the show format:
> This is THE daily show for Claude Code builders. Think late night TV meets dev news. You are the anchor. Confident, witty, a little unhinged but always informed. Think of yourself as the Jon Stewart of Claude Code news.
the user prompt includes every post collected that day, the top 20 comments, any GitHub repos linked, the content angles, and a continuity context block (more on that in a minute).
Opus generates the full digest in one pass. 2,000-3,000 words. real thread titles, real usernames, real upvote counts. specific enough that if you go check the subreddit, you can find every thread it references.
## how it remembers yesterday
each issue is generated from scratch. without memory, every issue reads like the pilot episode. someone who followed the usage limit saga on Monday wouldn't see any acknowledgment on Tuesday.
the fix: story_tracker.json. after each issue is generated, the pipeline parses the blog with regex (no extra API calls) and extracts:
- today's troll of the day (user + quote)
- today's best comment (user + quote + upvotes)
- today's repo of the day
- active multi-day stories (usage limits, model launches, etc.)
- running community gags (like Opus telling everyone to go to sleep)
before the next generation, this tracker gets injected into the prompt as continuity context. with rules:
- if an active story has new posts, reference it naturally
- if yesterday's troll posted again, call it out
- running gags get one line max, light touch
- if nothing new, stay silent. forced callbacks are worse than none
the result is a show that feels like it has memory without any expensive context management.
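a sketch of what that tracker round-trip could look like. the section-header regexes are hypothetical (I don't know the exact markdown the digest emits), but the shape matches the description above: regex extraction after generation, plain-text injection before the next one.

```python
# continuity tracker sketch; the "troll of the day" / "best comment
# award" header patterns are assumptions about the digest's markdown
import re

def extract_tracker(blog_text):
    """Pull yesterday's recurring bits out of the generated digest."""
    tracker = {}
    troll = re.search(r"troll of the day:\s*u/(\w+)", blog_text, re.I)
    if troll:
        tracker["troll"] = troll.group(1)
    best = re.search(r"best comment award:\s*u/(\w+)", blog_text, re.I)
    if best:
        tracker["best_comment"] = best.group(1)
    return tracker

def continuity_block(tracker):
    """Render the tracker as a prompt section for tonight's run."""
    lines = ["## continuity context (yesterday)"]
    for key, value in tracker.items():
        lines.append(f"- {key}: {value}")
    return "\n".join(lines)
```

the tracker dict is what gets written to story_tracker.json between runs; no model call is involved at either end.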
## the anti-slop engine
this is the part I'm most proud of and the reason auto-publishing works.
16 regex patterns that catch the most common AI writing tells. em-dashes (Opus loves them). authority signaling ("the uncomfortable truth is..."). narrator setups ("here's where it gets interesting"). dramatic rhetorical framing ("want to know the crazy part?"). hype words ("game changer", "unleash", "supercharge").
```python
import re

def validate_anti_slop(text):
    violations = []
    fixed = text
    for pattern, desc, auto_fixable, fix_func in ANTI_SLOP_RULES:
        matches = list(re.finditer(pattern, fixed))
        if matches:
            violations.append(f"{desc}: {len(matches)} occurrence(s)")
            if auto_fixable and fix_func:
                fixed = re.sub(pattern, fix_func, fixed)
    score = max(0.0, 100.0 - len(violations) * 10.0)
    return score, violations, fixed
```
every generated piece (blog, Reddit post, LinkedIn draft) passes through this. the blog gets a second chance: if the first pass scores under 80%, the pipeline regenerates with the violation list injected as explicit constraints.
in 8 issues, the average anti-slop score has been 90%+. Opus rarely triggers more than 1-2 violations per generation. the retry path has fired exactly once.
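to make the gate concrete: a few representative rule entries (my guesses at patterns matching the tells named above, in the `(pattern, description, auto_fixable, fix_func)` shape `validate_anti_slop` consumes) plus the regenerate-once loop, with the validator passed in so the sketch stands alone:

```python
# three illustrative rule entries; the real table has 16 patterns
import re

ANTI_SLOP_RULES = [
    ("\u2014", "em-dash", True, lambda m: ", "),
    (r"(?i)here's where it gets interesting", "narrator setup", False, None),
    (r"(?i)game[- ]?changer", "hype word", False, None),
]

def generate_with_retry(generate, validate, threshold=80.0):
    """One retry: regenerate with the violation list as explicit constraints."""
    score, violations, fixed = validate(generate(constraints=None))
    if score < threshold:
        score, violations, fixed = validate(generate(constraints=violations))
    return fixed, score
```

note the asymmetry: em-dashes are auto-fixable (a deterministic substitution), but a narrator setup isn't, so it can only count against the score and feed the retry.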
but the real trust comes from the voice system.
## the voice system
two markdown files loaded into every prompt:
core-voice.md defines who I am and how I write. builder-first perspective. casual competence. lowercase first-line style. specific tool references. pop culture mixed with technical depth. the file is 130 lines of specific rules, not vague guidelines.
anti-slop.md is the 29-pattern detection system. split into "always catch" (auto-delete on sight) and "context-dependent" (flag but consider). it also lists natural patterns that look like AI tells but are actually my voice (ellipses for trailing thoughts, arrows for workflow steps, emoji section markers).
these files evolve independently from the pipeline code. I update core-voice.md when I notice a new pattern I want to codify. the next midnight run picks it up automatically.
this is the key insight: the voice files are the product, not the pipeline. the pipeline is plumbing. the voice files are what make the output sound like a person instead of a summarizer.
## the numbers
| metric | value |
|---|---|
| issues published | 8 (consecutive, no gaps) |
| average anti-slop score | 90%+ |
| manual edits | 0 |
| pipeline runtime | ~5 minutes |
| cost per issue | ~$0.15 (one Opus + a few Sonnet calls) |
| posts scanned per day | 180+ |
| subreddits tracked | 5 |
| total upvotes tracked (per day) | 7,000-10,000 |
## what I learned
voice DNA is the leverage, not the model. Opus is powerful but generic. the 130-line voice file is what makes the output specific. anyone running the same pipeline with a different voice file would get a completely different publication.
anti-slop is a gate, not a filter. the goal isn't to fix bad content. it's to prevent bad content from shipping. the difference matters because fixing creates artifacts. prevention creates clean output.
continuity costs almost nothing. the story tracker is a single JSON file, updated with regex parsing. no extra API calls. no vector database. no embedding search. just structured data and a short list of rules for how to use it.
residential IP matters for Reddit. cloud IPs get blocked or rate-limited aggressively by Reddit's public API. running on a Mac Mini with a residential IP means consistent, reliable collection. this is an underrated advantage of local-first infrastructure.
the scoreboard keeps you honest. every issue includes exact numbers: posts tracked, total upvotes, velocity scores. if the model hallucinated a thread that doesn't exist, the numbers would be off and readers would catch it. specificity is a trust mechanism.
## build your own
the full pipeline architecture is documented at github.com/shawnla90/claude-code-daily. it's a reference implementation, not a clone-and-run repo (you'd need your own voice files, your own community to scan, your own hosting).
what you actually need:
- a community worth scanning daily (subreddits, forums, Discord)
- a voice DNA file specific enough to constrain the model
- an anti-slop file with patterns specific to your voice
- hosting with auto-deploy from git push
- a cron trigger (launchd, GitHub Actions, whatever runs reliably)
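for the launchd option, the midnight trigger is a single plist dropped in `~/Library/LaunchAgents`. a minimal sketch; the label, script path, and log path are placeholders, not the real repo's:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.daily-blog</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/Users/you/daily-blog/pipeline.py</string>
  </array>
  <!-- fire at 00:00 local time every day -->
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key><integer>0</integer>
    <key>Minute</key><integer>0</integer>
  </dict>
  <key>StandardErrorPath</key>
  <string>/tmp/daily-blog.err.log</string>
</dict>
</plist>
```

load it once with `launchctl load ~/Library/LaunchAgents/com.example.daily-blog.plist` and launchd handles the rest, including runs missed while the machine was asleep.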
the pipeline is model-agnostic. the voice files are the product. the anti-slop engine is the trust layer. everything else is plumbing.
shawn, the gtme alchemist