Share of work shipped that wouldn’t exist without AI: Aviv 75% vs. a 27% industry average, per Anthropic’s 2026 Agentic Coding Trends Report.
Front-end development
Very High
Usage of AI to design and develop front-end software — landing pages, product dashboards, game clients, interactive demos, mobile web, IDE extension UI.
Usage of AI to build and run a shipped game end-to-end — gameplay code, UI, economy, level design, admin tooling, live-ops analytics — plus game-industry intelligence products.
Usage of AI across core UX-research competencies — qualitative analysis, survey design, instrumentation, quantitative synthesis, and roadmap conversion.
Synthesis to roadmap: Edtech vision updates, Harmony affiliate-pivot decision
Product strategy
High
Usage of AI to define product direction — vision, ICP, business model, MVP scoping, roadmap, positioning — including pivot decisions grounded in shipped-data evidence.
Social listening across 7 platforms: Breakout covers Steam · SteamDB · Twitch · TikTok · YouTube · Reddit · Twitter/X (platform-specific scrapers feeding a unified GTM timeline). Also Harmony homeschool-forum listening.
Marketing
High
Usage of AI across the full ad lifecycle — programmatic campaign management, creative generation, copy, A-B testing, performance diagnosis — plus newsletter and social.
Ad lifecycle management: Harmony custom Meta-API campaign tooling
Creative generation: Harmony AI-generated ad images + visuals
Definition. The AI stack the operator wields — agent runtime, MCPs, retrieval infrastructure (RAG, vector DB, embeddings, rerankers), model routing — with attention to closed-loop leverage, not ghost installations.
Also Known As: Power user of agentic coding tools · tool/function calling · RAG · embeddings · vector search · MCP servers · LLM APIs
MCP roster — 10+ servers permissioned and actively called from agent loops: Granola, Gmail, Google Calendar, Notion, Playwright, Supabase, Vercel, Nanobanana, Typefully, RapidAPI-Twttr, chrome-mcp-server, user-sqlite.
Closed-loop workflows via MCP — meeting-processing skill chains Granola → transcript → wiki → CRM (user-sqlite) → Gmail draft → STATUS.md update in one pipeline.
Hybrid SQL + semantic-search agent — Breakout's ai-marketing-agent registers 10 Gemini function-calling tools including find_similar_games (ChromaDB semantic search + Cohere rerank) and query_database (read-only Postgres SELECTs); agent routes between SQL and vector retrieval inside one streaming chat.
Production RAG with retrieve-then-rerank — ChromaDB top-N retrieval followed by optional Cohere reranking across 4,352 Steam strategy games; comparison-mode UI shows "boosted" and "dropped" results so operator can eval rerank quality live (300-500ms no rerank / 800-1500ms with).
Multi-model evaluation router: the Steam page scoring engine routes 6 rubrics to 3 models from 2 providers by per-component strength, namely OpenAI o3 (copy), Gemini 2.5 Pro (hero trailer video), and GPT-4.1 (tags + screenshots + capsules). One scorer file per component under a shared llm_router.
Multi-model video pipeline with context caching — TechTree's video flow uploads full gameplay video to Gemini File API, creates a CachedContent, reuses across step-1 feature identification + step-2 per-feature captions; rate-limited ThreadPoolExecutor batches 5-10 concurrent under a 900 req/min guardrail (5× throughput).
Vector DB + embeddings across products — ChromaDB managers in both TechTree's VideoProcessingFlow and Frontend; embeddings feed similar-game discovery, canonical-feature matching, and dedup flows. Two shipped products.
CLI / package distribution — Motif CLI on PyPI (v0.7.9, Apache-2.0); Click-based command group, structured-JSON output contracts, versioned analysis prompt independent of package version, daily update check, motif setup installs companion skill into Cursor + Claude Code.
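The comparison mode in the retrieve-then-rerank item can be approximated as a diff between two ranked lists. A minimal sketch, assuming the retriever and the reranker each return an ordered list of game IDs over the same candidate pool; `compare_rankings` and its return shape are hypothetical and are not part of the ChromaDB or Cohere APIs.

```python
from typing import Dict, List, Optional, Tuple


def compare_rankings(retrieved: List[str], reranked: List[str], k: int) -> Dict[str, list]:
    """Diff a vector-search ranking against a reranked ordering of the same pool.

    boosted: items in the reranked top-k that now sit higher than in retrieval order,
             as (id, old_rank, new_rank) with 1-based ranks.
    dropped: items that were in the retrieval top-k but fell out of the reranked top-k.
    """
    retrieval_rank = {gid: i + 1 for i, gid in enumerate(retrieved)}
    top_after = reranked[:k]
    top_after_set = set(top_after)

    boosted: List[Tuple[str, Optional[int], int]] = []
    for new_rank, gid in enumerate(top_after, start=1):
        old_rank = retrieval_rank.get(gid)  # None if the reranker returned an unseen ID
        if old_rank is None or new_rank < old_rank:
            boosted.append((gid, old_rank, new_rank))

    dropped = [gid for gid in retrieved[:k] if gid not in top_after_set]
    return {"boosted": boosted, "dropped": dropped}
```

Surfacing both lists makes the rerank's effect visible item by item rather than as a single aggregate score.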
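The 900 req/min guardrail on the video pipeline can be enforced with a rolling-window limiter shared by every worker in a thread pool. A sketch under stated assumptions: `SlidingWindowLimiter` and `run_rate_limited` are hypothetical names, the mapped function stands in for a Gemini request, and context caching is not modeled.

```python
import threading
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class SlidingWindowLimiter:
    """Permit at most `max_calls` acquisitions in any rolling window of `period` seconds."""

    def __init__(self, max_calls: int, period: float = 60.0) -> None:
        self.max_calls = max_calls
        self.period = period
        self._stamps: deque = deque()  # monotonic times of recent acquisitions
        self._lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self._lock:
                now = time.monotonic()
                # Drop timestamps that have aged out of the rolling window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return
                wait = self.period - (now - self._stamps[0])
            # Sleep outside the lock so other workers can still check for a free slot.
            time.sleep(wait)


def run_rate_limited(
    fn: Callable[[T], R],
    items: Iterable[T],
    limiter: SlidingWindowLimiter,
    max_workers: int = 8,
) -> List[R]:
    """Fan `fn` out over a bounded pool; every call first waits for a limiter slot."""

    def guarded(item: T) -> R:
        limiter.acquire()
        return fn(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(guarded, items))  # results come back in input order
```

`SlidingWindowLimiter(max_calls=900, period=60.0)` with `max_workers` between 5 and 10 matches the numbers quoted above. The pool bounds concurrency while the limiter bounds rate, so the two knobs stay independent.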
05 Context Engineering
Proficiency: Very High
Definition. Deliberately managing what the LLM sees — via hierarchical rule files, custom skills, knowledge structures written for agent consumption, and programmatic hooks.
Also Known As: Context engineering · system prompts · prompt guides · behavioral specifications · AI-readable knowledge base · translating user feedback into prompts
Best Practices
Hierarchical CLAUDE.md files that act as routers
Custom, project-specific skills authored and iterated
Knowledge base designed for agent retrieval
Programmatic hooks that protect the knowledge substrate
Catching context leaks mid-session as a reflex
Evidence
Hierarchical rule files — 6 CLAUDE.md files across the tree: repo root + 5 sub-projects (Harmony, content-strategy, vibe-coding-portfolio, vibe-cli, Motif output artifact). Each scoped to its domain.
Router architecture — root CLAUDE.md (95 lines) is a dispatcher, not a knowledge dump: 9-row workflow-triggers table loads scoped files on demand.
User-written skills — 13 venture-specific SKILL.md files under .cursor/skills/. Zero off-the-shelf. Range from 30-line helpers to 352-line workflow drivers.
Skill iteration after real failure — twitter-engage SKILL.md was re-grounded against 20 real tweets (+187/−69 diff) after producing LinkedIn-flavored drafts.
Knowledge base shaped for agent retrieval — 44-page wiki with mandatory frontmatter (decay_class, weighted connections[]), hop-weighted retrieval protocol (hop 1 weight > 0.5, hop 2 > 0.7, cap 8 pages).
Persistent behavioral memory: 19 files in the user-level memory folder (10 feedback overrides + 5 project-state + index + profile). Some explicitly contradict skill defaults (e.g. the Opus-locked-subagents rule overrides a skill's cheaper-model recommendation).
Context-leak fence at trust boundaries — private/public repo boundary codified as a catch-yourself guard in vibe-cli/CLAUDE.md line 5.
Root CLAUDE.md is a living artifact — 17 commits from 2025-12-31 through 2026-04-20. Notable iterations: "make wiki retrieval mandatory", "add wiki workflow triggers and architecture layer", "add concurrent agent safety rules", "reorganize repo for context isolation".
06 Agent Concurrency
Proficiency: Very High
Definition. Running multiple AI sessions in parallel and deploying multiple agents per session as a cognitive default, not a special technique.
Also Known As: Agentic workflows · multi-agent workflows · orchestration patterns · sub-agents · agent skills · hierarchical orchestrators
Best Practices
Multiple Claude Code / Cursor sessions active in parallel
Routine subagent dispatch from a parent orchestrator
Worktree isolation for parallel-write safety
Long-running background agents while operator continues
Concurrent agents over time · daily hourly peak, Mar 2025 – Apr 2026
Busiest moment: 35 agents
Per-session max: 33 agents
Total work sessions: 2,324
Evidence
Concurrent sessions — 35 AI sessions simultaneously active at peak (2026-03-10), measured by sweep-line across Claude Code + Cursor session intervals.
AI agents per work session — long-tailed distribution: 62.8% single-agent · 14.4% 2–3 · 8.8% 4–7 · 8.4% 8–13 · 5.6% 14+ (12 sessions, max 33 agents).
Autonomy as throughput — operator frames concurrency as "agent runs unattended once a task is kicked off" — not "multiple windows open." Sets the brief for the autonomous-hooks catalog.
Subagents / agent teams — scoped-prompt specialist subagents dispatched from a parent orchestrator; results merged. The DATA-REFRESH-PLAN in-voice codifies the pattern.
Worktree isolation for parallel-write safety — root CLAUDE.md mandates worktree isolation for 5+ file changes; routine worktree-switch before plan-mode-then-dispatch.
Runtime split across agent clients: Claude Code 54.1% / Cursor 45.9% of messages over a 14-month window; CLI-first + GUI-second as parallel first-class tools.
Model routing for subagents: Opus locked as the default for all subagents via durable memory; the override rule was added after a Haiku-downgrade regression.
Structure Lock for subagent output — orchestrator rewrites subagent returns to fit existing sections rather than letting them grow the HTML ad hoc (P51).
Self-measurement loop — motif-cli parses Aviv's own ~/.claude/projects/**/*.jsonl + Cursor archive; the chart above is emitted by scripts/motif-concurrency.py, so the evidence is measured by the tool he ships.
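The sweep-line measurement behind the 35-session peak can be sketched as follows. This is an illustrative reimplementation, not the code in scripts/motif-concurrency.py; it assumes each session has already been reduced to a (start, end) pair of timestamps.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def peak_concurrency(sessions: List[Tuple[datetime, datetime]]) -> Tuple[int, Optional[datetime]]:
    """Return the maximum number of simultaneously active sessions and when it first occurred."""
    events = []
    for start, end in sessions:
        events.append((start, +1))  # session opens
        events.append((end, -1))    # session closes
    # Sort by time; at equal timestamps -1 sorts before +1, so a session that
    # ends exactly when another starts is not counted as overlapping it.
    events.sort()
    active = peak = 0
    peak_at: Optional[datetime] = None
    for ts, delta in events:
        active += delta
        if active > peak:
            peak, peak_at = active, ts
    return peak, peak_at
```

The sort dominates, so the sweep runs in O(n log n) over n sessions regardless of how many overlap.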
07 Planning
Proficiency: High
Definition. Thinking upfront — with sophistication proportional to the task's scope — before implementation begins. The signal is whether planning produces a reusable artifact the agent and operator can both execute against.
Also Known As: Spec-first discipline · prompt-as-spec · PRDs with AI · plan mode · scoping · rapid prototyping with agentic tools · templated deep research
Best Practices
Write acceptance criteria and constraints before invoking
Use plan mode (Claude / Cursor) before code edits
Prototype in Cursor, Bolt, v0, or Lovable
Iterate the plan itself — multiple revisions, not one-shot
Attach rich context: file paths, references, prior research
Reuse templated research patterns for recurring investigations
Evidence
Spec-first tool chain — superpowers:brainstorming → writing-plans → executing-plans skills loaded as first-class skills in every project session; produces paired spec + plan files before code ships.
Plan / spec files committed before implementation — 16 design specs + 13 matching implementation plans under docs/superpowers/ covering Mar 30 – Apr 17 2026 (multi-child support, plan sharing, scraping pipeline, frontend migration, OAuth, dashboard reactivity, subject-aware planner, questionnaire flow, others). Same-day commits with the matching feature branch.
Plan-file iteration before execution — frontend-migration spec landed, then a review-feedback commit absorbed 5 named design corrections (file paths, scope, replacement sources) before the implementation plan was committed.
Spec revised from real-data learnings — curriculum-scraping-pipeline spec rewritten after initial draft to require stage-by-stage founder review ("No building ahead — each stage may change based on data learnings"), then implementation plan shipped.
Wireframing — visual mockups during UX design — landing-page work generates product mockups as PNG assets before writing copy; interactive React mockup added for stakeholder review before the Platform section was finalized.
Pre-planning research subagents — scoped inventory / recon agents dispatched before design. This Data Refresh Plan sends a Phase 0 subagent to rank the full repo universe before any of the 9 Phase-1 research agents run.
Mid-session plan correction against reality checks — operator redirects the agent's proposed A/B structure with a design principle: "stages are defined by purpose, not methodology." Plan reshaped in-flight.
Plans as reusable artifacts across sessions — a brainstorm spec file from a prior session is resumed by pointing at the saved file; assistant summarizes the 5 open questions, operator picks which thread to continue.
Acceptance criteria + decision tables upfront — every design spec opens with a Decisions table enumerating constraints and rationale before architecture (e.g. frontend-migration spec names 9 decisions with explicit rationale before any implementation description).
08 Control & Verification
Proficiency: High
Definition. The operator's exercised judgment over AI output — overriding, correcting, verifying — plus the systems built so agents can run long without babysitting.
Also Known As: Control over the AI · verification habits · human-in-the-loop escalation · confidence thresholds · guardrails · prompt-injection defenses · safety by design
Best Practices
Overriding AI suggestions with explicit reasoning
Verification-before-completion patterns
Hooks that programmatically enforce constraints
Corrections codified into rules/memory immediately
Model-failure handling with proactive guardrails
Evidence
Verification rules in config — root CLAUDE.md §Verification names four post-write checks: SELECT after CRM insert/update, file-existence check after creation, date-diff after STATUS.md edit, em-dash/arrow scan before finalizing newsletter content.
Pre-completion checklists — root CLAUDE.md §Pre-Completion Checklist mandates STATUS.md / CRM / orphaned-TODO persistence before session close. "Rule of thumb: if the user closes this conversation, can they find every TODO? If not, persist it now."
Persistent behavioral-feedback memory as override layer — ~12 feedback_*.md files at the user-level memory folder, each overriding a specific AI default with named reasoning. Memory index is categorical: domain knowledge lives in the wiki, memory is for preferences + behavioral feedback + external references only.
Correction-to-rule loop with session provenance — 5 of the ~12 feedback files carry an originSessionId: frontmatter field tracing the rule back to the session where the AI's behavior was corrected. User corrects once → rule committed to memory → future sessions start with the override loaded.
Explicit override of AI defaults — memory entry overrides a named skill's recommendation: writing-plans suggests cheaper models for mechanical tasks; feedback rule forces Opus across all subagents.
Pre-mortem artifacts — autonomous-hooks catalog — 337-line spec cataloging 8 hooks (observability, auto-mode tuning, secret scanner, protected-branch guard, Stop verification, SessionStart orientation) with reference implementation, build-vs-buy verdict, success criterion, and explicit cut/defer rationale per principle "start with ≤8 hooks; add only when a real failure justifies it." Spec on main; implementation on a feature branch.
Hooks implemented with tests — 11 hook-related commits on a feature branch (H0 observability, H2 secret scanner + file-deny, H3 protect-branch with a self-redirection guard, H5 Stop verification, H7 SessionStart orientation) with a T15 smoke test at 7/7 pass. Observation window active; not yet 30-day-validated.
Prompts-vs-hooks design thesis — explicit principle in the hooks catalog: prompts achieve ~70-90% compliance (community consensus); hooks achieve 100% because they run outside model reasoning. Paired with a "use native features before building" rule — Auto mode + permissions.deny handle the permission layer; custom hooks only where native coverage is insufficient.
01
Eval design
Proficiency: High
Definition. Designing the evaluation system that decides whether an AI feature is shippable.
Best Practices
Closed-vocabulary failure-mode taxonomy with uncertain fallback
Output-diff harness gating every prompt promotion
Per-output confidence + reasoning on the audit trail
Structured-output contract enforced by unit tests
Multi-stage workflow where each stage's output is independently checkable
Golden datasets benchmarking live output against user intent
Evidence
Closed-vocabulary tagger schema — Harmony's 10-dim taxonomy with explicit uncertain fallback; schema miss breaks the build.
A/B output-diff harness — 50/50 split through parallel tagger pipelines; reports per-side counts and declares winner before a prompt ships.
Multi-stage prompt workflow as eval — Breakout runs chained prompts (MegaTimeline.md → per-game *MegaPrompt.md → EnrichMilestones.md); each stage produces a checkable artifact that the next stage verifies or replaces.
Hybrid SQL + semantic agent eval loop — TechTree chatbot answers grounded in vector+SQL store; behavior tuned via regression iterations against named failure modes.
Per-output confidence + reasoning audit trail — every tagged row carries confidence + reasoning; low-confidence rows are the review queue rather than silent drops.
What would move this to Very High: a checked-in golden dataset (20–50 hand-labeled items) + eval-diff script comparing current output against it, with a prompt-edit commit that references the eval delta.
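The A/B output-diff harness can be sketched as a deterministic 50/50 split plus per-arm counts. Assumptions, all hypothetical: items arrive as (id, text) pairs, each tagger is a function returning a label, and the winning criterion is the share of outputs accepted by a caller-supplied check (in the usage below, anything other than "uncertain").

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def assign_arm(item_id: str) -> str:
    """Deterministic 50/50 split: hashing the ID keeps each item in the same arm across runs."""
    return "A" if hashlib.sha256(item_id.encode("utf-8")).digest()[0] % 2 == 0 else "B"


def ab_diff(
    items: Iterable[Tuple[str, str]],
    tagger_a: Callable[[str], str],
    tagger_b: Callable[[str], str],
    accept: Callable[[str], bool],
) -> Tuple[Dict[str, Counter], Dict[str, float], str]:
    """Route each item to one prompt variant, count accepted outputs per arm, and name a winner."""
    counts = {"A": Counter(), "B": Counter()}
    for item_id, text in items:
        arm = assign_arm(item_id)
        label = (tagger_a if arm == "A" else tagger_b)(text)
        counts[arm]["total"] += 1
        counts[arm]["accepted" if accept(label) else "rejected"] += 1
    rates = {arm: c["accepted"] / c["total"] if c["total"] else 0.0 for arm, c in counts.items()}
    if rates["A"] == rates["B"]:
        winner = "tie"
    else:
        winner = "A" if rates["A"] > rates["B"] else "B"
    return counts, rates, winner
```

Hashing the item ID instead of randomizing per run keeps re-runs comparable: a prompt tweak changes only an arm's output, not which items that arm sees.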
02
Model management
Proficiency: Very High
Definition. Mapping product surfaces to a model-tier strategy and owning cost, latency, and quality tradeoffs.
Best Practices
Per-surface model routing (one model per component, not global)
Per-component cost and latency budgets
Cost escape-hatch wired in as a first-class mode
Cross-provider selection owned at the product layer
Evidence
Per-component routing config, externalized — steam_page_analyst's config/llm_routing.json maps 7 scoring components to model providers (capsule/tags/screenshots/about/json_parsing → GPT-4.1, description → o3, hero_trailer → Gemini 2.5 Pro video). Swapping a model is a config edit, not a code edit.
Multi-model video pipeline — TechTree's VideoProcessingFlow routes full-video reasoning to Gemini 2.5 Pro (context-cached) and per-frame feature extraction to a second-tier model; two-step prompt contract.
Tiered research-to-extraction pipeline — Harmony: Haiku for wide research, Sonnet for narrow structured extraction; tier matched to task difficulty.
Product-layer cross-provider selection — Breakout scorer routes per rubric across OpenAI o3, Gemini 2.5 Pro (video), and GPT-4.1; choice explicit in product docs, not hidden in an SDK call.
Batch-API cost escape-hatch — Anthropic Batch API wired as a documented ~50% cost fallback, invoked intentionally for large offline runs, not as a fire-drill.
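The externalized routing config can be sketched as a small JSON file parsed into typed routes. The file shape and provider labels below are assumptions; only the component-to-model mapping comes from the per-component routing item. The model IDs are the public API names for GPT-4.1, o3, and Gemini 2.5 Pro.

```python
import json
from dataclasses import dataclass
from typing import Dict

# Hypothetical shape for config/llm_routing.json; the mapping mirrors the routing item above.
ROUTING_JSON = """
{
  "capsule":      {"provider": "openai", "model": "gpt-4.1"},
  "tags":         {"provider": "openai", "model": "gpt-4.1"},
  "screenshots":  {"provider": "openai", "model": "gpt-4.1"},
  "about":        {"provider": "openai", "model": "gpt-4.1"},
  "json_parsing": {"provider": "openai", "model": "gpt-4.1"},
  "description":  {"provider": "openai", "model": "o3"},
  "hero_trailer": {"provider": "google", "model": "gemini-2.5-pro"}
}
"""


@dataclass(frozen=True)
class Route:
    provider: str
    model: str


def load_routes(raw: str) -> Dict[str, Route]:
    """Parse the routing file once; an unexpected key in a spec raises TypeError early."""
    return {component: Route(**spec) for component, spec in json.loads(raw).items()}


def route_for(routes: Dict[str, Route], component: str) -> Route:
    """Look up the model for one scoring component; fail loudly instead of defaulting."""
    if component not in routes:
        raise KeyError(f"no route configured for component {component!r}")
    return routes[component]
```

Swapping hero_trailer to a different video model is then a one-line JSON edit, and a typo in a component name fails at lookup instead of silently falling back to a default model.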
03
Safety & policy
Proficiency: Med
Definition. Defining what the product will and will not do — behavioral defaults, refusal boundaries, abuse-surface review.
Best Practices
Explicit non-claims written into product copy
Refusal language and fallback tested, not assumed
Behavioral defaults tied to product personality
Scoring posture declared (grade vs. signal vs. rubric)
Red-team sessions captured and redacted
Evidence
Scoring-posture decision — Motif vibe-report design work explicitly declared what the report will NOT claim: framework self-placements cut, composite scores removed, honest-pending over inflation.
Behavioral defaults codified as rules — "no tropes, no hyperbole, no opinion" written into the behavior-reset spec as a refusal the system enforces on itself (principles P30–P32).
Rubric authored from public corpora — Safety & Policy PM rubric itself grounded in Anthropic Model Behaviors PM JD language; anti-patterns list doubles as self-check.
What would move this to High: a documented red-team session + classifier-filter vs. human-in-the-loop design decision for one user-facing AI surface.
04
Prompt lifecycle management
Proficiency: Very High
Definition. Production prompts as versioned, tested, owned artifacts — not strings an engineer edits in place.
Best Practices
Versioned prompt contracts with explicit major-version bumps
Prompts stored in a database, not hard-coded in source
Dedicated admin UI for edit, test, diff, and promote
Structured-output validation with re-run on schema miss
Prompt changes tied to eval deltas before promotion
Public changelog for user-facing prompts
Evidence
Prompts in a database + live/draft versioning — steam_page_analyst stores system + user prompts in Supabase with multiple versions per prompt and a live/draft status flag; promoting a draft is a row-level action, not a git merge.
Dedicated prompt admin tool (Streamlit) — ~40-module UI to create/edit/version prompts, run against real game rows, diff outputs across versions, collect human + LLM feedback, swap providers per component.
Structured-output validation with re-run — Harmony tagger asserts JSON contract before rendering; schema miss triggers re-run, not silent degradation.
Public versioned prompt stamps: Motif CLI (PyPI) stamps its analysis prompt with a version (0.3 → 0.7) and a public changelog, so users know which prompt generated their report.
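Structured-output validation with re-run can be sketched as a strict parser that raises on any contract violation, wrapped in a bounded retry loop. The field names, the closed tag vocabulary, and the attempt limit are hypothetical stand-ins for the Harmony tagger's real schema; the `uncertain` value plays the role of the explicit fallback described under Eval design.

```python
import json
from typing import Callable, Optional

# Hypothetical closed vocabulary; "uncertain" is the sanctioned fallback label.
ALLOWED_TAGS = {"math", "reading", "science", "uncertain"}


class SchemaMiss(ValueError):
    """Raised when model output violates the JSON contract."""


def validate(raw: str) -> dict:
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaMiss(f"not JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise SchemaMiss("top level must be an object")
    if obj.get("tag") not in ALLOWED_TAGS:
        raise SchemaMiss(f"tag outside closed vocabulary: {obj.get('tag')!r}")
    conf = obj.get("confidence")
    if not isinstance(conf, (int, float)) or isinstance(conf, bool) or not 0.0 <= conf <= 1.0:
        raise SchemaMiss("confidence must be a number in [0, 1]")
    if not isinstance(obj.get("reasoning"), str):
        raise SchemaMiss("reasoning must be a string")
    return obj


def tag_with_retry(call_model: Callable[[], str], max_attempts: int = 3) -> dict:
    """Re-run the model on a schema miss instead of rendering degraded output."""
    last_error: Optional[SchemaMiss] = None
    for _ in range(max_attempts):
        try:
            return validate(call_model())
        except SchemaMiss as exc:
            last_error = exc
    raise RuntimeError(f"schema miss after {max_attempts} attempts: {last_error}")
```

Keeping confidence and reasoning mandatory in the contract is what lets low-confidence rows feed a review queue rather than disappear.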
05
AI UX
Proficiency: High
Definition. Designing the human-facing surface around probabilistic output — confidence, uncertainty, error recovery, refusal.
Best Practices
Explicit confidence + uncertainty signaling
Structured input when chat is the wrong UX
Error recovery and citation UX
Refusal / fallback language tested
Evidence
Structured-input onboarding over chat — Harmony rejected chat for onboarding; a 10-dimension questionnaire populates the recommender schema directly. 88% questionnaire completion (C02).
Real-time AI productivity dashboard (TUI) — Motif CLI renders a terminal dashboard with color-coded threshold bands (red / yellow / green / purple) for AIPM, concurrency, and per-agent efficiency; thresholds calibrated from lifetime distribution so users see where their session sits vs. their personal percentile.
IDE-embedded AI status surface — Motif VS Code / Cursor extension brings AIPM + concurrency into the editor status bar + sidebar; dual-source ingestion (hooks primary / SQLite poll fallback) keeps signal live when one source drops.
Operator dashboard for AI-driven data — Chess Arena admin dashboard (Streamlit) surfaces live product + landing-page analytics to non-technical operators; keeps a human reviewer in the loop without routing through a chatbot.
Funnel iteration on commit record — in-modal assessment → dedicated page → questionnaire-first trial; paywall moved after profile completion once session data showed 95% browse / 16% plan-build.
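The percentile-calibrated threshold bands can be sketched as a percentile rank against the user's own history, mapped onto colors. The band cutoffs (25th, 50th, 90th percentile) are assumptions for illustration; the source does not state Motif's actual thresholds.

```python
from bisect import bisect_left, bisect_right
from typing import List

# Hypothetical cutoffs: below the 25th percentile is red, and so on; 90th and above is purple.
BANDS = [(25.0, "red"), (50.0, "yellow"), (90.0, "green")]


def percentile_rank(history: List[float], value: float) -> float:
    """Percent of historical sessions below `value`, counting ties as half."""
    if not history:
        return 50.0  # no baseline yet: place the session mid-distribution
    ordered = sorted(history)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def band_for(history: List[float], value: float) -> str:
    pct = percentile_rank(history, value)
    for cutoff, color in BANDS:
        if pct < cutoff:
            return color
    return "purple"
```

Because the cutoffs are percentiles of the user's own lifetime distribution, the same raw value can land in different bands for different users, which is the point of personal calibration.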
06
Metrics & instrumentation
Proficiency: High
Definition. Defining what "working in production" means for a probabilistic system — AI-specific OEC, online quality sampling, and task-level acceptance tracked alongside engagement.
CPA tracking with root-cause diagnosis — $1.68 CAC at best; C03's higher CPC ($0.16) traced through to Meta auto-allocating 38% of budget to the less efficient ad variant. Root cause, not aggregate number.
AI-specific OEC baked into the product — Motif CLI tracks AIPM, concurrency, autonomy ratio, and output density as its core metric set, then exposes a Growth Scorecard comparing first-25% vs last-25% of sessions to detect drift. Engagement metrics wouldn't catch autonomy or density regressions.
Real-time telemetry with peak tracking — motif live records per-session peak AIPM, peak concurrency, session duration, and persists bests to records.json so users see personal distribution shift over time.
Optimization-event choice — Meta campaign optimization set to account_created over higher-volume profile_completed — the latter optimizes for tire-kickers; the former optimizes for committed users.
Campaign-level commit log — raw Meta-Ads data persisted for replay + retrospective eval.
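The Growth Scorecard comparison of the first and last quarter of sessions reduces to two means and a relative change. A sketch, assuming the metric values arrive in chronological order; the function name and the integer-division quarter size are hypothetical choices, and AIPM or any other per-session metric can be passed in.

```python
from statistics import mean
from typing import Dict, Sequence


def growth_scorecard(values: Sequence[float]) -> Dict[str, float]:
    """Compare the mean of the earliest 25% of sessions with the latest 25%.

    `values` must be in chronological order (oldest session first).
    """
    if len(values) < 4:
        raise ValueError("need at least 4 sessions to form quarters")
    q = len(values) // 4  # quarter size; sessions in the middle half are not used
    first = mean(values[:q])
    last = mean(values[-q:])
    # Relative change is undefined when the early baseline is zero.
    pct_change = 100.0 * (last - first) / first if first else float("nan")
    return {"first_25": first, "last_25": last, "pct_change": pct_change}
```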
07
Product sense
Proficiency: Very High
Definition. Turning a model affordance into a scoped, shippable product — PRDs with model assumptions, eval plans, kill-criteria.
Best Practices
PRD with model assumptions + eval plans + kill-criteria
Prototypes built personally to validate feasibility
Scope updates when model capability shifts mid-build
Ship in weeks, not quarters
Evidence
Model-affordance-to-product in under a month — Motif: conceived, built, published to PyPI as motif-cli with companion VS Code + Open VSX extension; beta pilot with Lutz Finger (Cornell Tech). Scope stayed tight (Python CLI, 2 dependencies, local-only data); release-notes + branching conventions codified in vibe-cli/CLAUDE.md.
Small-n pilot landing unit economics — Harmony C02: 35 signups at $1.67 CAC, 87.5% questionnaire completion, 16% plan-build rate, 15.4% affiliate click rate; clear diagnosis of which ad variants to keep vs. drop. Product sense + metric discipline in one pilot loop.
Deep-research → product demo in 2 weeks — Goldie: OpenAI o3-deep-research produced a 131KB market analysis that became the product's own demo content; low-code app (Lovable) + hero video (Clueso) shipped alongside research. Background-task polling kept long o3 jobs off the UI thread.
Public post-mortem + knowing when to kill — Breakout: 60+ validation calls including SVP-level at Ubisoft, $1M valuation offer received and declined, followed by a public LinkedIn post-mortem. Product sense here is the kill decision, transparently documented.
What this is. A behavior-based portfolio of AI-native working practice — complementing Aviv's resume with an assessment of AI proficiency derived from his actual chat transcripts and project setup.
How to read it. Each section measures one observable practice. Every section has the same shape:
Definition: one sentence on what the behavior is.
Also Known As: concepts used in JDs or frameworks that map to this skill.
Best Practices: what this looks like in practice, 4–6 bullets.
Proficiency: Low / Med / High / Very High, reflecting depth across shipped evidence.
Evidence: verifiable items, each backed by a source snippet.
Proficiency scale. Very High means distinctive mastery demonstrated across multiple projects with advanced technique. High means solid, lived practice. Med means present with gaps. Low means emerging.
Sources. Rubric vocabulary drawn from 18 assessment platforms (interviewing.io Meta, Anthropic AI Fluency, TestGorilla, VBeyond), 16 PM JDs (Anthropic, OpenAI, Figma, Glean, Ramp, Apollo, LinkedIn APB, Hebbia, Vercel v0), 15+ engineering JDs (Anthropic, OpenAI, Cursor, Vercel, Zapier, Palantir, Deloitte), and published frameworks: Eledath's 8 Levels of Agentic Engineering, Anthropic's 2026 Agentic Coding Trends Report, Duvall / InfoQ Agentic AI Patterns, Patil Sample AI-Native PM Stack 2026Q1, Productside, MarTech Five Orders of AI Maturity.
Instrumentation. Motif (PyPI) reads Aviv's Cursor + Claude Code conversations to produce the session counts and concurrency data; custom motif-concurrency.py derives timeline stats from raw JSONL.
This is a ~35-line excerpt. The full CLAUDE.md is 118 lines with 17 cross-references to skills and rules.
# CLAUDE.md
This file provides guidance to AI agents working in this repository.
## Project Overview
Personal command center for an early-stage EdTech venture focused on improving education in the age of AI.
**Mission:** Help every child find their special calling.
## Agent Behavior
### Subagent Delegation (IMPORTANT)
Prefer delegating to subagents for complex, multi-step tasks. This prevents overloading the main context window and improves quality.
### Pre-Completion Checklist
Before finishing ANY work session, verify:
- [ ] STATUS.md updated (if outreach sent, meetings processed, contacts added)
- [ ] CRM database updated (if contact interactions occurred)
- [ ] No orphaned TODOs left only in conversation (log to STATUS.md or TASKS.md)
### Concurrent Agent Safety
Multiple agents often work on this repo simultaneously. Follow these rules:
- Shared files (STATUS.md, CLAUDE.md, crm.db): append-only edits over rewrites
- Code changes (5+ files): use a worktree (isolation: "worktree")
- Motif CLI / public repo: always branch, never push to main
- CRM batch operations: note what you're updating in the commit message
### Verification
Always verify your own work after write operations:
- After CRM inserts/updates: run a SELECT to confirm the change landed
- After file creation: confirm the file exists
- After STATUS.md edits: confirm the "Last Updated" date changed
- For newsletter content: scan for em-dashes and arrows before finalizing
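The first verification rule (read back after a CRM write) can be sketched with Python's built-in sqlite3 module standing in for the sqlite3 CLI. The contacts schema here is a simplified assumption, not the real crm.db layout; INSERT OR IGNORE is used deliberately to model a write that silently does nothing on an ID collision, which is exactly the case a read-back catches.

```python
import sqlite3
from typing import Tuple


def insert_contact_verified(
    conn: sqlite3.Connection, contact_id: str, name: str, notes: str
) -> Tuple[str, str, str]:
    """Insert a contact, then read the row back before reporting success."""
    with conn:  # commits on success, rolls back if an exception escapes
        # OR IGNORE models a write that can silently no-op on an ID collision.
        conn.execute(
            "INSERT OR IGNORE INTO contacts (id, name, notes) VALUES (?, ?, ?)",
            (contact_id, name, notes),
        )
    row = conn.execute(
        "SELECT id, name, notes FROM contacts WHERE id = ?", (contact_id,)
    ).fetchone()
    if row != (contact_id, name, notes):
        raise RuntimeError(f"write to {contact_id!r} did not land; found {row!r}")
    return row
```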
This table makes the root CLAUDE.md a router: agents only load the referenced file when the trigger fires, keeping context lean.
## Cross-Project Workflow Triggers
These triggers apply regardless of which project you're in. Load the
referenced file BEFORE starting the task.
| When you're... | Load this file |
|---|---|
| Updating STATUS.md or persisting TODOs | docs/WORKFLOWS-STATUS.md |
| Processing a meeting | .cursor/skills/meeting-processing/SKILL.md |
| Preparing a meeting agenda | .cursor/skills/meeting-agenda-prep/SKILL.md |
| Building a landing page | .cursor/skills/build-landing-page-prompt/SKILL.md |
| Engaging on Twitter | .cursor/skills/twitter-engage/SKILL.md |
| Committing and pushing | .cursor/rules/commit-and-push.mdc |
| Creating a collaborative doc | .cursor/skills/proof/SKILL.md |
| Running wiki maintenance | wiki/schema.md (full document) |
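The router pattern can be sketched as: parse the trigger table once, then read only the files whose triggers fired. `parse_trigger_table` and `load_context` are hypothetical helpers; in practice the agent, not a parser, decides which trigger applies, so the `fired` argument stands in for that judgment.

```python
from pathlib import Path
from typing import Callable, Dict, List


def parse_trigger_table(markdown: str) -> Dict[str, str]:
    """Parse a two-column markdown table into {trigger: file path}.

    Assumes no cell contains a literal '|'.
    """
    rows: List[List[str]] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 or set(cells[0]) <= set("-: "):
            continue  # not a two-column row, or the |---|---| separator
        rows.append(cells)
    return {trigger: path for trigger, path in rows[1:]}  # rows[0] is the header


def load_context(
    fired: List[str],
    table: Dict[str, str],
    read: Callable[[str], str] = lambda p: Path(p).read_text(encoding="utf-8"),
) -> Dict[str, str]:
    """Read only the files whose triggers fired; everything else stays out of context."""
    return {table[t]: read(table[t]) for t in fired if t in table}
```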
One of 13 purpose-built skills. The frontmatter is the contract — name, description, and trigger phrases govern when the harness loads it.
---
name: meeting-processing
description: Automate post-meeting workflow — pull Granola transcript,
extract takeaways, create meeting notes, update CRM, update STATUS.md,
draft thank-you. Use when processing a meeting, after a call, or when
asked to "process meeting notes."
---
# Meeting Processing
End-to-end workflow for processing meetings in the EdTech command center.
Pulls transcript from Granola, extracts insights, creates structured notes,
updates CRM and STATUS.md, and optionally drafts a thank-you.
## Quick Start
Required inputs:
- Who: Person's name (and optionally organization) from the meeting
- Project: One of homeschools, assessment-pilot, vibe-coding-portfolio,
content-strategy
Tools: Granola MCP, user-sqlite MCP, Read/Write for files.
Read crm/Aviv's Voice.md before drafting thank-yous.
Skills iterate on real failure: the commit that grounded the voice in 20 real tweets and added pre-scoring filters.
commit 34bc83b7dd414811801cec7801c5557c0a9b1d47
Author: Aviv Sheriff <avivsheriff@gmail.com>
Date: Thu Mar 12 21:39:52 2026 -0400
Overhaul twitter-engage: voice from real tweets, content filtering
Analyzed 20 real @Avivsh tweets to extract actual voice patterns
(short questions, snarky quips, casual analogies) and replaced
generic reply frameworks with concrete examples. Added Step 3
content-type and account-type filtering to kill bad opportunities
(release notes, code demos, no-reply-surface accounts) before
scoring.
Before: drafts sounded like LinkedIn.
After: drafts match Aviv's real voice.
.cursor/skills/twitter-engage/SKILL.md | +187 -69
41 wiki pages across 6 types. Co-access recorded at SessionEnd strengthens edges; a decay pass geometrically weakens unused edges according to their decay class.
# Wiki Schema
These rules govern how the LLM interacts with the wiki.
## Retrieval Protocol (every session start)
1. Read wiki/index.md to orient
2. Semantic search (grep/glob) for the user's topic in wiki/
3. Read matched wiki pages
4. Follow connections in frontmatter where weight > 0.5 (hop 1)
5. For hop-1 pages, follow connections where weight > 0.7 (hop 2)
6. Stop. Maximum 8 pages loaded unprompted.
## Page Creation Rules
- Frontmatter is mandatory: title, type, sources, last_activated, decay_class
- Connections are managed by automated scripts — do NOT manually edit weights
- Every claim should cite a source file in /knowledge/ or a wiki page
- Page types: entity, concept, domain, decision, thread, operation
### Registering New Pages in graph.db
sqlite3 wiki/graph.db "INSERT INTO pages VALUES ('page_id', 'title',
'type', 'decay_class', date, 0, date);"
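The hop-weighted retrieval protocol can be sketched as a bounded two-hop graph walk. Assumptions: connections are available as a {page: {target: weight}} mapping, search hits count toward the 8-page cap, and within each page stronger edges are followed first. None of these names come from the wiki's actual scripts.

```python
from typing import Dict, List


def hop_retrieve(
    seeds: List[str],
    graph: Dict[str, Dict[str, float]],
    hop1_min: float = 0.5,
    hop2_min: float = 0.7,
    cap: int = 8,
) -> List[str]:
    """Expand search hits along weighted frontmatter connections, two hops max."""
    loaded: List[str] = []

    def add(page: str) -> None:
        if page not in loaded and len(loaded) < cap:
            loaded.append(page)

    def neighbours(page: str, min_weight: float) -> List[str]:
        # Strongest edges first within each page, so the cap trims weaker candidates.
        edges = sorted(graph.get(page, {}).items(), key=lambda kv: kv[1], reverse=True)
        return [target for target, weight in edges if weight > min_weight]

    for page in seeds:  # pages matched by search
        add(page)
    hop1 = [n for page in seeds for n in neighbours(page, hop1_min)]
    for page in hop1:
        add(page)
    for page in hop1:  # hop 2 uses the stricter threshold
        for n in neighbours(page, hop2_min):
            add(n)
    return loaded
```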
One of 11 behavioral-feedback files. The memory layer governs skills, not the other way around.
---
name: Use Opus for all subagents
description: Always use Opus 4.6 (default model) for subagents, not Sonnet or Haiku
type: feedback
---
Use Opus 4.6 for all subagents — do not downgrade to Sonnet or Haiku
even for "mechanical" tasks.
Why: User explicitly requested this. The writing-plans skill suggests
using cheaper models for simple tasks, but Aviv wants Opus quality
across the board.
How to apply: Do not pass model: "sonnet" or model: "haiku" to Agent
tool calls. Omit the model parameter entirely (defaults to Opus) or
explicitly set model: "opus".
Prompt-level fence around the public-repo boundary, until a hook can watch it.
# CLAUDE.md
Guidance for AI agents working in the motif-cli project.
**IMPORTANT: motif-cli/ is a PUBLIC GitHub repo.** Never create or edit
CLAUDE.md, .cursorrules, or any agent config files inside motif-cli/.
All agent guidance lives here in the parent vibe-cli/ folder, which is
tracked by the private Edtech repo. If you catch yourself about to
write to motif-cli/CLAUDE.md, stop and write here instead.
## Two-Repo Setup
The motif-cli source lives at vibe-cli/motif-cli/ but has its own .git
(public repo). The parent vibe-cli/ folder is tracked by the Edtech
repo (private).
Design spec for incident-driven, deterministic-first hooks — the catalog Aviv plans to roll out as quality gates.
# Autonomous Development via Claude Code Hooks — Catalog & Rollout
Date: 2026-04-19
Status: Design — pre-implementation
Owner: Aviv
## Principles
1. Hooks beat prompts for enforcement. Prompt-based guidance achieves
~70–90% compliance; hooks achieve 100% because they run outside
model reasoning. Use prompts for creative direction, hooks for
quality gates and safety.
2. Incident-driven, not planning-driven. Start with 3–8 hooks; add
only when a real failure justifies it.
3. Determinism first, LLM-judge second. Fast regex paths decide
90%+ of tool calls instantly. LLM judgment is the exception
handler for the gray zone, not the default.
4. No phone notifications. The goal is to remove the human from the
loop, not page them async.
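Principle 3 can be sketched as a decision function. Regex fast paths return allow or deny at once, and only commands that match neither list fall through to an LLM judge. This is a minimal sketch: the patterns are illustrative assumptions, not the catalog's rule set, and the LLM judge is represented only by a "judge" verdict for the caller to route.

```typescript
// "judge" means no fast-path rule matched: hand the call to an LLM judge.
type Verdict = "allow" | "deny" | "judge";

// Hypothetical fast-path rules. A real catalog would add these
// one incident at a time (principle 2).
const DENY_PATTERNS: RegExp[] = [
  /^git\s+push\b(?=.*--force\b)(?=.*\bmain\b)/, // force push involving main
  /\brm\s+-rf\s+\/(\s|$)/,                      // rm -rf /
  /(^|\s)cat\s+\S*\.env(\.\S+)?(\s|$)/,         // printing a .env file
];
const ALLOW_PATTERNS: RegExp[] = [
  /^git\s+(status|diff|log)\b/,
  /^npm\s+(test|run\s+build)\b/,
  /^ls\b/,
];

function decide(command: string): Verdict {
  const cmd = command.trim();
  // Deny rules run first so an allow rule can never mask them.
  if (DENY_PATTERNS.some((re) => re.test(cmd))) return "deny";
  if (ALLOW_PATTERNS.some((re) => re.test(cmd))) return "allow";
  return "judge"; // gray zone: the exception handler, not the default
}
```

Wiring this into a PreToolUse hook (reading the pending tool call and blocking on "deny") is left out; the point is that most calls never reach the judge.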
## Representative hooks
H2. Secret-scanner PreToolUse — catches .env leaks the static deny
rules miss. Success: zero secret-leak commits over 30 days.
H3. Protected-branch guard — prevents silent push to main on
rebased/dropped code. Success: no protected-branch force pushes.
H5. Stop hook verification — runs npm run build && npm test.
Claude literally cannot declare victory on a broken build.
H6. Stop hook task-gate — cannot mark a plan complete while
TodoWrite items remain open. Success: zero "claimed done but
plan incomplete" outcomes over 10 runs.
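H6 reduces to a check over the task list at stop time. A minimal sketch, assuming todo items carry a `status` of "pending", "in_progress", or "completed", and that the hook answers with a block decision plus a reason that is fed back to the agent. Both shapes are assumptions here, not a confirmed hook contract.

```typescript
type TodoStatus = "pending" | "in_progress" | "completed";

interface TodoItem {
  content: string;
  status: TodoStatus;
}

interface StopDecision {
  decision: "allow" | "block";
  reason?: string; // shown to the agent so it knows what is still open
}

// H6-style gate: refuse to stop while any plan item is still open.
function taskGate(todos: TodoItem[]): StopDecision {
  const open = todos.filter((t) => t.status !== "completed");
  if (open.length === 0) return { decision: "allow" };
  const list = open.map((t) => `- ${t.content} (${t.status})`).join("\n");
  return {
    decision: "block",
    reason: `${open.length} plan item(s) still open:\n${list}`,
  };
}
```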
The CRM-update verify rule firing in a real session — the constraint failure on harmony-03 is caught by the read-back SELECT before Claude narrates completion.
[Session: d563287f.jsonl · 2026-03-26]
Claude: Brad Saft / HBC is already in the CRM. OpenEd is not. Let me check
what's in the HBC record and add OpenEd.
[bash] sqlite3 crm/crm.db "SELECT * FROM contacts WHERE id='harmony-02';"
[bash] sqlite3 crm/crm.db "UPDATE contacts SET notes='...' WHERE
id='harmony-02';"
[bash] sqlite3 crm/crm.db "INSERT INTO contacts (...) VALUES
('harmony-03', ...);"
[bash] sqlite3 crm/crm.db "SELECT id, name, organization, notes FROM
contacts WHERE id='harmony-03';"
Claude: That ID is taken. Let me use a different one.
[bash] sqlite3 crm/crm.db "SELECT id FROM contacts WHERE id LIKE
'harmony-%' ORDER BY id;"
[bash] sqlite3 crm/crm.db "INSERT INTO contacts (...) VALUES
('harmony-57', ...);"
[bash] sqlite3 crm/crm.db "SELECT id, name, organization, priority,
outreach_status FROM contacts WHERE id IN
('harmony-02', 'harmony-57');"
Claude: Done. Both records updated/added. harmony-02 notes updated;
harmony-57 added [...].
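The recovery step here (list the `harmony-%` IDs, then pick an unused one) has a subtlety: `ORDER BY id` on a text column sorts as text, which only matches numeric order while every suffix has the same zero-padded width. A minimal sketch of picking the next ID numerically; `nextFreeId` is a hypothetical helper, not code from the session.

```typescript
// Next free "<prefix>-NN" id. Suffixes are compared as numbers, not text,
// and the zero-padded width of existing ids (e.g. "02") is preserved.
function nextFreeId(ids: string[], prefix: string, width = 2): string {
  let max = 0;
  for (const id of ids) {
    if (!id.startsWith(`${prefix}-`)) continue;
    const n = Number(id.slice(prefix.length + 1));
    if (Number.isInteger(n) && n > max) max = n;
  }
  return `${prefix}-${String(max + 1).padStart(width, "0")}`;
}
```

Picking a fresh ID does not replace the read-back: the final `SELECT ... WHERE id IN (...)` is still what confirms both rows landed as intended.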
A single-session friction becomes a durable rule: the cd X && git ... permission prompt gets diagnosed as an unsilenceable harness check, then written to memory in the same turn.
[Session: 014c6542.jsonl · 2026-04-18]
Aviv: the permissions are the issue. Accept edits makes sense though. but
that feels like a pretty minimal change. here's an example of a permission
request I get often: [cd X && git ... pattern pasted inline]
Claude: Got it — you're right, the specific-prompt pattern changes the
answer. That `cd X && git ...` case is a dedicated harness security check
("bare repository attack"), separate from the allow-list. No amount of
`"Bash(git *)"` or `"Bash(cd *)"` will silence it because it fires on the
*compound* shape.
Aviv: yes [authorizes the fix]
Claude: [tool_use: Write] memory/feedback_avoid_compound_cd.md —
"Use `git -C`, `npm --prefix`, absolute paths instead of `cd X && cmd`."
[tool_use: Edit] adds originSessionId: 014c6542-11e1-42a9-9596-10f1380789f9
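The memory rule is a rewrite from the compound shape to a single command that carries the directory as a flag. A minimal sketch covering only the two rewrites the rule names (`git -C` and `npm --prefix`); quoted paths, longer `&&` chains, and other tools are deliberately out of scope and return `null`.

```typescript
// Rewrite `cd DIR && git ...` / `cd DIR && npm ...` into one command that
// takes the directory as a flag. Returns null when no safe rewrite applies.
function rewriteCompoundCd(command: string): string | null {
  // DIR is a single unquoted token; quoted paths are not handled.
  const m = /^\s*cd\s+(\S+)\s*&&\s*(git|npm)\s+(.+)$/.exec(command);
  if (m === null) return null;
  const [, dir, tool, rest] = m;
  if (rest.includes("&&")) return null; // more than one chained step
  return tool === "git" ? `git -C ${dir} ${rest}` : `npm --prefix ${dir} ${rest}`;
}
```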
The operator overrides a skill-suggested default (Sonnet for "mechanical" subagents), and the override gets written to memory so it loads in every future session.
[Session: 7bf34428.jsonl]
Aviv: Why are subagents using sonnet 4.6 instead of opus 4.6? is that in
instructions anywhere? I want them to use opus 4.6
Claude: The writing-plans skill recommends using cheaper/faster models
for mechanical tasks. I was following that guidance by setting
model: "sonnet" on implementation subagents. But you're the boss — I'll
use Opus for all subagents going forward.
[tool_use: Write] memory/feedback_subagent_model.md —
"Use Opus 4.6 for all subagents — do not downgrade to Sonnet or
Haiku even for 'mechanical' tasks."
Claude: Saved. All future subagents will use Opus 4.6.
Breakout's multi-stage prompt workflow functions as its eval rubric: when Aviv finds a milestone-prioritization failure mode, he fixes the rubric by editing the stage template itself.
[Session: Breakout.json · 2025-10-27]
Aviv: Prompt includes making milestones or KPIs based on the monthly
rollup. However, what happens sometimes is that the metrics are not
prioritized correctly, and so we miss milestone cards for really
important metrics such as Steam metrics. The most important metrics
should be Steam metrics. In order of priority: (1) Steam revenue,
(2) CCU / wishlists, (3) followers, (4) number of reviews. Then social
media metrics. Ideally there should be a variety of milestone cards ...
if all the cards were TikTok engagement, that would be too much.
Claude: I've analyzed your feedback and will update
Prompts/MegaTimeline.md to improve the prioritization of milestone
metrics and incorporate the commentary guidelines. ... I am now applying
these changes.
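The ordering Aviv gives (Steam revenue, then CCU/wishlists, then followers, then review count, then social metrics, without one platform flooding the cards) can be expressed as a ranked selection with a cap on social platforms. This is a sketch under assumptions: the `Candidate` shape, the metric keys, the tier numbers, and the cap of 2 per social platform are illustrative, not taken from MegaTimeline.md. Steam metrics are left uncapped because they are the top priority.

```typescript
interface Candidate {
  metric: string;   // e.g. "steam_revenue", "tiktok_engagement"
  platform: string; // e.g. "steam", "tiktok"
  label: string;
}

// Lower tier = higher priority, following the order given in the session.
const TIER: Record<string, number> = {
  steam_revenue: 1,
  steam_ccu: 2,
  steam_wishlists: 2,
  steam_followers: 3,
  steam_reviews: 4,
};
const SOCIAL_TIER = 5; // any metric not listed above

function pickMilestones(cands: Candidate[], max: number, socialCap = 2): Candidate[] {
  const tierOf = (c: Candidate): number => TIER[c.metric] ?? SOCIAL_TIER;
  // Array sort is stable in ES2019+ engines, so input order holds within a tier.
  const ranked = [...cands].sort((a, b) => tierOf(a) - tierOf(b));
  const socialCount: Record<string, number> = {};
  const picked: Candidate[] = [];
  for (const c of ranked) {
    if (picked.length >= max) break;
    if (tierOf(c) === SOCIAL_TIER) {
      const used = socialCount[c.platform] ?? 0;
      if (used >= socialCap) continue; // variety: no all-TikTok card set
      socialCount[c.platform] = used + 1;
    }
    picked.push(c);
  }
  return picked;
}
```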
A gap-review subagent flags framework self-placements and composite scores as safety issues in the report — don't ask the reader to trust an unvalidated framework, don't publish a score you can't source, don't fake completeness with placeholder text.
[Session: edtech.json · 2026-04-18]
Subagent: Partial — structurally sound, but the voice and evidence
scaffolding is off-brief. The main report is organized around frameworks
the reader doesn't know (an internal level stepper, self-placement
percentiles, composite scoring) rather than around the three target
roles the brief names. ... Three dashed-border "Evidence slot" empty
cards in the most recruiter-relevant section is a visible red flag.
Either fill them from real artifacts or remove the rows and make it an
N-proficiency section labeled honestly. ... A standalone "questioning
profile" section with all three bullets marked `Sample: Pending
redaction` is exactly the soft-skill / archetype leak the brief
explicitly rules out. Cut the section.
Aviv writes a voice rule into the product as a refusal the system enforces on itself, then asks for it to be logged as durable principles so future sessions cannot silently drift.
[Session: edtech.json · 2026-04-18]
Aviv: We need to avoid any tropes or hyperbole. Everything should be
very succinct and to the point. E.g. "the retrieval layer literally"
— saying "literally" is a no-go in this setting. We need to be really
dry and descriptive. No opinion / conjecture / hyperbole. ... Make sure
to log whatever generalizable feedback I'm giving in our guidelines doc.
Claude: Principles logged (28-34) in PORTFOLIO-PRINCIPLES.md:
- P30 no tropes / hyperbole / opinion
- P31 evidence must be framed as evidence
- P32 don't re-label obvious things
- ...
No "literally" / hyperbole — stripped throughout.
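A rule like P30 is easier to hold when part of it is checked mechanically. A minimal sketch of a banned-word lint; the word list is illustrative (only "literally" comes from the session), and a word list can only catch the lexical part of "no hyperbole", not opinion or conjecture.

```typescript
interface Finding {
  line: number; // 1-based line number
  word: string;
}

// Illustrative list: only "literally" is named in the session.
const BANNED = ["literally", "incredibly", "revolutionary"];

function lintVoice(text: string, banned: string[] = BANNED): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (const word of banned) {
      // Whole-word, case-insensitive match.
      const re = new RegExp(`\\b${word}\\b`, "i");
      if (re.test(lineText)) findings.push({ line: i + 1, word });
    }
  });
  return findings;
}
```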
Aviv authors the Safety & Policy Design PM rubric itself, grounded in public 2026 JD language. The anti-patterns list doubles as a self-check for his own product work.
[Session: edtech.json · 2026-04-17]
Category 3 — Safety, Behavior & Policy Design
Definition. Defining what the product will and won't do — behavioral
defaults, refusal boundaries, steerability constraints, and abuse
surfaces — and validating them through red-teaming.
What "good" looks like in 2026. Writes behavior policy docs (what the
system says/does in edge cases, refusal taxonomy, escalation paths);
runs or commissions red-teaming passes with structured findings;
distinguishes safety issues that are model-layer vs. application-layer;
knows when to use system-prompt guardrails vs. classifier filters vs.
human-in-the-loop review. Per Anthropic's Model Behaviors PM JD, this
includes maintaining a taxonomy of model behaviors across capabilities.
Anti-patterns. Safety bolted on post-launch. Treating "safety" as one
interview round. Only blocking "bad words" with no behavioral framework.
No documented refusal policy.
Aviv reviews a plan for the prompt admin tool's versioning logic and catches two ambiguities before the DB schema ships.
[Session: steam_page_analyst.json · 2025-07-17]
Claude: Here are my thoughts, organized by the sections in your document:
1. Versioning Logic. You mention the new version number is
(MAX(version)+1) scoped to (component, prompt_type). Since a "version"
of a prompt for a component will always include both a system and user
part, would it be simpler to calculate the MAX(version) scoped only to
the component? This would ensure both the new system and user prompts
get the same, predictable version number in one step.
When a user clicks "+ New Version," does it duplicate the currently
viewed version or the live version? I assume it's the currently viewed
version, which makes the most sense for iterating on drafts, but it's
worth specifying.
2. Placeholder Extraction. A couple of points could make the
implementation even more robust: path-notation consistency (dot vs.
bracket), and handling non-primitive placeholders (lists of items, not
just scalars).
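Claude's first suggestion, scoping `MAX(version)+1` to the component rather than to `(component, prompt_type)`, keeps the system and user halves of a prompt on the same number. A sketch under assumptions: the `PromptRow` shape is hypothetical, and "+ New Version" is taken to duplicate the currently viewed version, which the review itself notes is not yet specified.

```typescript
type PromptType = "system" | "user";

interface PromptRow {
  component: string;
  promptType: PromptType;
  version: number;
  body: string;
}

// Next version scoped to the component only, so both halves share it.
function nextVersion(rows: PromptRow[], component: string): number {
  let max = 0;
  for (const r of rows) {
    if (r.component === component && r.version > max) max = r.version;
  }
  return max + 1;
}

// "+ New Version": copy the system and user rows of `fromVersion`
// (assumed to be the currently viewed version) under one new number.
function newVersion(rows: PromptRow[], component: string, fromVersion: number): PromptRow[] {
  const v = nextVersion(rows, component);
  const source = rows.filter((r) => r.component === component && r.version === fromVersion);
  if (source.length !== 2) throw new Error(`expected system+user rows for v${fromVersion}`);
  return source.map((r) => ({ ...r, version: v }));
}
```

With per-`prompt_type` scoping, editing only the system prompt would advance it to v3 while the user prompt stayed at v2; the component-level counter rules that drift out.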
Aviv designs Motif's terminal dashboard as color-coded threshold bands rather than raw numbers, with thresholds to be re-centered against real user distributions once data exists.
[Session: unknown.json · 2026-03-09]
Claude: Revised Tier 1: Three metrics
Concurrency | Active parallel AI sessions | Your scale
| Red: 0 / Yellow: 1 / Green: 2-3 / Purple: 4+
AIPM | Total tokens per minute | Your throughput
| Red: <5k / Yellow: 5-15k / Green: 15-40k / Purple: 40k+
AIPM/Agent | Tokens per minute per agent | Your efficiency
| Red: <3k / Yellow: 3-8k / Green: 8-15k / Purple: 15k+
(Thresholds are rough — we'd calibrate from real Motif user data.
The exact numbers matter less than the color psychology.) Concurrency
is the headline metric. AIPM is the throughput flex. AIPM/Agent is the
skill metric — this is what separates someone who opens 4 sessions
and context-switches poorly from someone who keeps 4 agents fully
productive.
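Each band maps a raw number to a color. A minimal sketch using the rough thresholds above. The table's ranges share their edges (5-15k and 15-40k both include 15k), so the sketch assumes a value on an edge belongs to the higher band. Edges are plain parameters so they can be re-centered once real user distributions exist.

```typescript
type Band = "red" | "yellow" | "green" | "purple";

// Lower edges of yellow, green, and purple. A value equal to an edge
// falls in the higher band (assumption: the table's ranges overlap at edges).
type Edges = [number, number, number];

const CONCURRENCY_EDGES: Edges = [1, 2, 4];
const AIPM_EDGES: Edges = [5000, 15000, 40000];
const AIPM_PER_AGENT_EDGES: Edges = [3000, 8000, 15000];

function band(value: number, [yellow, green, purple]: Edges): Band {
  if (value >= purple) return "purple";
  if (value >= green) return "green";
  if (value >= yellow) return "yellow";
  return "red";
}

// AIPM/Agent: total tokens per minute divided by active sessions.
function aipmPerAgent(aipm: number, concurrency: number): number {
  return concurrency > 0 ? aipm / concurrency : 0;
}
```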
During Harmony's C03 paid-acquisition pilot, aggregate CPC looks high. Instead of treating the aggregate as the diagnosis, Aviv decomposes it to ad-variant / age / placement — which surfaces the real product bottleneck downstream of the ads.
[Session: edtech.json · 2026-04-13]
Claude: Ad-Level Performance (Top Performers):
Ad 04 — Convention Meme: 10.75% CTR, $0.96/LPV
(48% of total spend, 64% of LPVs) — WINNER
Ad 11 — We Read Reviews: 8.82% CTR, $1.70/LPV
Ad 07 — Receipt: 6.59% CTR, $1.62/LPV
Ad 09 — Corkboard: 0.57% CTR, $3.19/LPV (WORST)
By Age: 35-44 is 51% of spend ($1.26/LPV). 55-64 is cheapest ($0.54
CPC, $0.62/LPV) — efficiency champion.
By Placement: Feed 94.4% of spend / 7.58% CTR. Reels 5.1% / 3.50% CTR.
Stories 0.6% / 0% CTR (dead).
Key finding: The ads performed exceptionally well (5-10x Facebook
education benchmarks) but converted 0 of 72 landing-page visitors to
the $79/month advisory booking. This triggered the pivot to the free
curriculum tool model.
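The decomposition is a group-by over ad-level rows. A minimal sketch, assuming each row carries spend, impressions, clicks, and landing-page views (LPVs) for one segment value (an ad, an age bracket, or a placement); the figures in the check are made-up round numbers, not the C03 data.

```typescript
interface AdRow {
  segment: string; // ad name, age bracket, or placement
  spend: number;   // USD
  impressions: number;
  clicks: number;
  lpv: number;     // landing-page views
}

interface SegmentStats {
  segment: string;
  spendShare: number; // fraction of total spend
  ctr: number;        // clicks / impressions
  costPerLpv: number; // spend / LPVs (Infinity when there are none)
}

function bySegment(rows: AdRow[]): SegmentStats[] {
  const totals = new Map<string, AdRow>();
  let totalSpend = 0;
  for (const r of rows) {
    totalSpend += r.spend;
    const t = totals.get(r.segment) ?? { segment: r.segment, spend: 0, impressions: 0, clicks: 0, lpv: 0 };
    t.spend += r.spend;
    t.impressions += r.impressions;
    t.clicks += r.clicks;
    t.lpv += r.lpv;
    totals.set(r.segment, t);
  }
  const out: SegmentStats[] = [];
  totals.forEach((t) => {
    out.push({
      segment: t.segment,
      spendShare: totalSpend > 0 ? t.spend / totalSpend : 0,
      ctr: t.impressions > 0 ? t.clicks / t.impressions : 0,
      costPerLpv: t.lpv > 0 ? t.spend / t.lpv : Infinity,
    });
  });
  return out;
}
```

The same funnel extends one step further: bookings divided by LPVs, 0 of 72 here, is the number no ad-level segment can move.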
Harmony has two plausible optimization events for the Meta algorithm — higher-volume profile_completed (pre-auth) or lower-volume account_created (post-auth). Pick the lower-volume one.
[Session: harmony.json · 2026-04-11]
Claude: Go with account_created as the optimization event, not
profile_completed. Reasoning:
1. It's the first event that represents real commitment — they
gave you an email.
2. It's still upstream enough to hit ~50/week faster than affiliate
clicks.
3. profile_completed fires before the auth wall, so it doesn't
distinguish tire-kickers from committed users. Meta would optimize
toward people who answer questions and bounce at the wall —
expensive noise.
4. It's a clean stepping stone: account_created now → swap to
affiliate_click later once volume supports it. No wasted learning.
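The reasoning generalizes to a rule: ignore events that do not signal commitment, optimize on the most downstream committed event that already clears the learning threshold, and otherwise use the first committed event, which (assuming volume falls at each funnel step) reaches the threshold soonest. A sketch using the session's ~50/week figure; the `committed` flag and the volumes in the check are hypothetical.

```typescript
interface FunnelEvent {
  name: string;
  weeklyVolume: number;
  committed: boolean; // e.g. post-auth: the user has given an email
}

// `funnel` is ordered from most upstream to most downstream.
function pickOptimizationEvent(funnel: FunnelEvent[], minWeekly = 50): string | null {
  const committed = funnel.filter((e) => e.committed);
  if (committed.length === 0) return null;
  // Most downstream committed event that already has enough volume...
  for (let i = committed.length - 1; i >= 0; i--) {
    if (committed[i].weeklyVolume >= minWeekly) return committed[i].name;
  }
  // ...otherwise the first committed event, the fastest to accumulate volume.
  return committed[0].name;
}
```

Re-running the rule as volume grows reproduces point 4: it moves from `account_created` to `affiliate_click` on its own once the later event clears the threshold.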
Mid-pilot, session data shows the paywall is the bottleneck. The fix isn't more engineering — it's three UI strings reframing what the paywall is for.
[Session: edtech.json · 2026-04-01]
Claude: The Stripe framing isn't about unlocking content — it's about
starting the relationship. The matches are a conversation tool for the
advisor, not a self-serve feature.
So the post-profile CTA and Stripe page should convey: "We know enough
to help you now. Start your free trial and book your call."
Updated framing:
- Profile results CTA: "Get personalized recommendations —
schedule a free advisor call"
- Stripe page: "Start your free trial to book your advisory call.
Your advisor will use your profile to prepare personalized
curriculum recommendations."
- The locked matches are a signal of value ("we found things for you")
but the unlock is the advisor conversation, not a paywall toggle.
The frontend-migration design spec absorbed five named design corrections from a review round in a single follow-up commit, before any implementation began.
[git show 7606036 · 2026-04-08]
commit 7606036f3623cd042273e027a77de1629edef52f
Author: Aviv Sheriff
Date: Wed Apr 8 11:13:18 2026
docs: address review feedback on frontend migration spec
- Anonymous answers stay in localStorage until merge (backend requires
family_id on answers table — no anonymous family concept)
- Peek panel uses generic unpersonalized recs during profile building
- Fix planStore.js path (src/lib/ not src/data/)
- Add esaStates.js to delete list (handled by questions.depends_on)
- Note tagColors.js replacement source (tag_definitions.color)
.../specs/2026-04-08-harmony-frontend-migration-design.md | 15 +++++-----
1 file changed, 9 insertions(+), 6 deletions(-)
Mid-build on the curriculum-scraping pipeline, the agent proposes collapsing an LLM classify step into Stage 2. Aviv overrides and forces the classify step to live in Stage 1.
[Session: pipeline.json · 2026-04-08]
Claude: [proposes A/B: move the classify step into Stage 2 alongside
breadcrumb normalization, since both are LLM work]
Aviv: No, we should generally separate LLM calls to specific tasks,
otherwise quality suffers. So keep this in stage 1 as a classify step.
The stages are defined by their purpose and what we're trying to
accomplish, not by the methodology.
Claude: Perfect — Stage 1 becomes "get clean, classified product data"
regardless of whether that uses HTTP parsing or LLM. The LLM classify
step is a natural part of scraping because you need it to filter
properly.
- Assign subjects from our taxonomy
- Flag core curriculum vs accessory
- Run on Haiku (cheap, fast, structured extraction)
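Keeping classify as its own Stage 1 call also makes its output easy to check in isolation. A minimal sketch of a validator for the structured result, assuming a JSON object with `subjects` and `isCore` fields and an illustrative taxonomy; the real field names and taxonomy are not shown in the session, and the Haiku call itself is omitted.

```typescript
// Illustrative taxonomy; the pipeline's actual taxonomy is not in the session.
const TAXONOMY = new Set(["math", "language arts", "science", "history"]);

interface Classification {
  subjects: string[];
  isCore: boolean; // core curriculum vs accessory
}

// Parse and validate one model response. Throws instead of passing
// malformed or off-taxonomy output downstream to Stage 2.
function parseClassification(raw: string): Classification {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("not an object");
  const { subjects, isCore } = data as { subjects?: unknown; isCore?: unknown };
  if (!Array.isArray(subjects) || subjects.length === 0) throw new Error("subjects missing");
  if (typeof isCore !== "boolean") throw new Error("isCore must be boolean");
  const offTaxonomy = subjects.filter((s) => typeof s !== "string" || !TAXONOMY.has(s));
  if (offTaxonomy.length > 0) throw new Error(`off-taxonomy: ${JSON.stringify(offTaxonomy)}`);
  return { subjects: subjects as string[], isCore };
}
```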
A brainstorm spec from a prior session is resumed by pointing at the saved file — Claude summarizes the open questions left on the table, and the operator picks which thread to continue.
[Session: harmony.json · 2026-04-14]
Aviv: I have an in-progress brainstorm I'd like to resume:
docs/superpowers/specs/2026-04-14-harmony-retention-brainstorm.md
Claude: Good, I can see where you left off. The brainstorm landed on the
"data flywheel / concierge first year" concept and ended with 5 open
questions for the next session. Where do you want to pick up?
1. What does "outcome" mean? — What signal closes the feedback loop
2. Concierge vs. intelligence platform — Individual guidance vs. data
3. Revenue model specifics — Pricing structure
4. What comes before curriculum? — How far upstream to go
5. [...]
The operator designs a game-marketing analyst agent that routes between SQL reads and ChromaDB semantic search inside one Gemini function-calling loop — hybrid SQL + vector retrieval shipped as a user-facing chat.
[Session: GameMarketer.json · 2025-11-13]
Aviv: I want to create an AI agent front-end in a new folder in this project.
We will use the GEMINI AI SDK [...] The agent will be a game marketing agent
that will use tools to call our database. The agent will use tool calls to
leverage the capabilities we have created in this project [...]
Aviv: [after clarifying qs] Yes you can create a node.js typescript app but
compartmentalize it in a separate folder. web UI with react/next.js. we
should use the existing infrastructure and avoid building new stuff. The
existing infra already has ways to poll for progress via the database.
Claude: ## Implementation complete
### All 10 tools implemented
1. find_similar_games — ChromaDB semantic search
2. get_game_timeline — Timeline orchestrator with progress tracking
3. get_monthly_metrics — Monthly rollup CSV parsing
4. get_steam_metadata — Game metadata lookup
[...and 6 more, mixing read-only Postgres SELECTs with vector search]
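At its core, the agent loop is a name-to-handler registry: the model emits a function call by name with JSON arguments, the loop dispatches it, and the result goes back to the model. A minimal sketch with hypothetical handlers: an in-memory cosine-similarity search stands in for ChromaDB, and `run_sql` (not one of the ten named tools) illustrates a read-only guard. None of this is the Gemini SDK's API.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => unknown;

// Stand-in embeddings; the real agent queries ChromaDB.
const GAME_VECTORS: Record<string, number[]> = {
  gameA: [1, 0, 0],
  gameB: [0.9, 0.1, 0],
  gameC: [0, 0, 1],
};

// Cosine similarity of two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Crude read-only guard: a single statement that starts with SELECT.
function isReadOnlySql(sql: string): boolean {
  const s = sql.trim().replace(/;\s*$/, "");
  return /^select\b/i.test(s) && !s.includes(";");
}

const TOOLS: Record<string, ToolHandler> = {
  find_similar_games: (args) => {
    const target = GAME_VECTORS[String(args.game)];
    if (target === undefined) throw new Error("unknown game");
    return Object.keys(GAME_VECTORS)
      .filter((g) => g !== args.game)
      .sort((x, y) => cosine(target, GAME_VECTORS[y]) - cosine(target, GAME_VECTORS[x]))
      .slice(0, Number(args.k ?? 3));
  },
  run_sql: (args) => {
    if (!isReadOnlySql(String(args.sql))) throw new Error("only SELECT is allowed");
    return `would execute: ${String(args.sql)}`; // placeholder for a Postgres client
  },
};

function dispatch(name: string, args: ToolArgs): unknown {
  const tool = TOOLS[name];
  if (tool === undefined) throw new Error(`unknown tool: ${name}`);
  return tool(args);
}
```

The string guard is easy to get past; a database role with only SELECT grants is the sturdier way to keep the agent's SQL read-only.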
The operator maps the six components of a Steam-store-page scoring engine to three models (two from OpenAI, one from Google) based on per-component testing.
[Session: steam_page_analyst.json · 2025-07-07]
Aviv: ## Feedback
Each one of my component scoring systems uses a different LLM, based on my
previous testing.
Capsules: GPT o3
Tags: GPT 4.1
Hero Trailer: Gemini 2.5 Pro
Screenshots: GPT 4.1
Short Description: GPT o3
About this page: GPT 4.1
Claude: Plan updated to reflect multi-LLM scoring:
• Added a routing table in M3 that maps each component to its specific
model (GPT-o3, GPT-4.1, Gemini 2.5 Pro) and the corresponding
environment-variable keys loaded from .env.local.
• Clarified that the scoring service dynamically selects the correct
client before invoking the prompts.
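The routing table can be sketched as a typed map from component to provider and model, with the route resolved before any prompt is sent. Assumptions: the model ID strings and environment-variable names below are placeholders, not the values in the plan. Typing the table as `Record<Component, Route>` makes the compiler reject a component with no route.

```typescript
type Component =
  | "capsules" | "tags" | "heroTrailer"
  | "screenshots" | "shortDescription" | "aboutThisPage";

type Provider = "openai" | "gemini";

interface Route {
  provider: Provider;
  model: string;     // assumed model IDs
  apiKeyEnv: string; // assumed env-var names, loaded from .env.local
}

const OPENAI_KEY = "OPENAI_API_KEY";
const GEMINI_KEY = "GEMINI_API_KEY";

// Per-component assignment from the 2025-07-07 feedback.
const ROUTES: Record<Component, Route> = {
  capsules:         { provider: "openai", model: "o3",             apiKeyEnv: OPENAI_KEY },
  tags:             { provider: "openai", model: "gpt-4.1",        apiKeyEnv: OPENAI_KEY },
  heroTrailer:      { provider: "gemini", model: "gemini-2.5-pro", apiKeyEnv: GEMINI_KEY },
  screenshots:      { provider: "openai", model: "gpt-4.1",        apiKeyEnv: OPENAI_KEY },
  shortDescription: { provider: "openai", model: "o3",             apiKeyEnv: OPENAI_KEY },
  aboutThisPage:    { provider: "openai", model: "gpt-4.1",        apiKeyEnv: OPENAI_KEY },
};

// Resolve the route and fail early if its API key is missing.
function routeFor(component: Component, env: Record<string, string | undefined>): Route {
  const route = ROUTES[component];
  if (!env[route.apiKeyEnv]) throw new Error(`missing ${route.apiKeyEnv} for ${component}`);
  return route;
}
```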
The operator specs a multi-step video annotation pipeline up front, with Gemini context caching reused across feature identification and per-feature captioning before screenshots hand off to GPT-4.1.
[Session: GameMarketer_VideoProcessingFlow.json · 2025-05-27]
Aviv: build a new annotation flow in the existing folder UnifiedFlow.
Unlike our existing annotation flow, this flow will use a holistic video
analysis before captioning screenshots and start from features, working
downwards to screenshots.
// Step 1
- Query the database using the game id, extracting game description,
game name, feature names+descriptions. Inject these into the step 1
prompt to Gemini 2.5 Pro. Also, take in a link to a youtube video.
Use context caching to reduce costs on subsequent calls.
- Gemini 2.5 will return a json with a list of features.
// Step 2
- Create a separate api call to gemini pro for each feature, continuing
the same conversation context. Inject the feature name from Step 1's
output. Gemini 2.5 will return a json.
// Step 3
- Extract screenshots at the timestamps using ffmpeg, tile them, inject
into GPT-4.1 via openAI API, save JSON.
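Step 3 is mechanical enough to sketch. The function below builds ffmpeg argument lists: one single-frame extraction per timestamp (`-ss` before `-i` for input seeking, `-frames:v 1` for one frame), then one call that tiles the numbered frames with the `tile` filter. File names and the 2-column grid are assumptions; the spec does not give a tile layout. When the frame count does not fill the grid, check how your ffmpeg build handles the partial last tile.

```typescript
interface FfmpegPlan {
  extract: string[][]; // one ffmpeg invocation per timestamp
  tile: string[];      // one invocation that tiles the extracted frames
}

function planScreenshots(video: string, timestampsSec: number[], cols = 2): FfmpegPlan {
  if (timestampsSec.length === 0) throw new Error("no timestamps");
  const frameName = (i: number): string => `shot_${String(i).padStart(3, "0")}.png`;
  const extract = timestampsSec.map((t, i) => [
    "-ss", t.toFixed(3), // seek before -i: input seeking
    "-i", video,
    "-frames:v", "1",    // grab a single frame
    "-y", frameName(i),
  ]);
  const rows = Math.ceil(timestampsSec.length / cols);
  const tile = [
    "-start_number", "0",
    "-i", "shot_%03d.png", // image-sequence pattern matching frameName()
    "-vf", `tile=${cols}x${rows}`,
    "-frames:v", "1",
    "-y", "tiled.png",
  ];
  return { extract, tile };
}
```

Each argument list would be passed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`, and `tiled.png` is what goes to GPT-4.1.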
Operator codifies the portfolio's structure rules before dispatching the next subagent wave.
[Session: 1dc4e9d2.jsonl · 2026-04-20]
Aviv: okay. let's continue the portfolio work. I think the overall structure
of the report is better now. It's important to me that we keep this structure.
did we document the decisions anywhere? I want to make sure that if we mine
more data using subagents, that it won't override the structure and the way
the content is presented.
Claude: Good concern. Honest inventory of what's documented vs. what's stale.
[proceeds to add a Structure Lock section to PORTFOLIO-PRINCIPLES.md, lock the
10-section order and mandatory shape, mark the v5 spec superseded]
Operator catches that work is on the wrong branch, moves it to a new worktree, then dispatches a scoped research subagent in a single turn.
[Session: d274490a.jsonl · 2026-04-18]
Aviv: What branch/worktree are we in?
Claude: [runs git branch + worktree list — on main, one other worktree exists]
Aviv: why are we in this branch? we should move this work to another worktree
Claude: [invokes using-git-worktrees skill; creates
.worktrees/hooks-catalog on feature/autonomous-hooks-catalog; moves
untracked spec; current worktree returns to clean state]
Aviv: Let's continue here. spin up another subagent. I'm especially interested
in our last agent's finding — that there are other implementations that do most
of the work for us, such as the claude skill/plugin, and the ralph example.
Claude: [tool_use: Agent] — dispatches research subagent to evaluate
session-report, ralph-loop, git-safe as potential drop-in coverage
for the H0/H3/H5 hooks the user is planning.
Operator frames concurrency as a throughput problem — he wants agents to run unattended once a task is kicked off, not just "multiple windows open."
[Session: 014c6542.jsonl · 2026-04-18]
Aviv: I'd like to consider what best practices we can adopt to improve the
amount of time agents can proceed on tasks without my involvement. Especially
once the "planning" phase is complete. I.e. in the planning phase, I'd like to
be closely involved, but after that, once I kick off a task, I'd like the
agent to proceed as long as possible without me. Some things to consider:
1. I turn off my PC at night [...]
Claude: I'll dispatch a claude-code-guide subagent to research this — they
specialize in exactly this space (hooks, settings, autonomy patterns, recent
releases). While they research, I'll hold off on brainstorming until we see
the landscape together.
[tool_use: Agent] [tool_use: Skill superpowers:brainstorming]
Operator orders a worktree switch; Claude invokes the writing-plans skill automatically before dispatching scoped subagents.
[Session: 431c7926.jsonl · 2026-04-21]
Claude: You're on the main worktree: C:\dev\Edtech at commit e05c3f0 on
branch main. There's one other worktree at .worktrees/hooks-catalog on branch
feature/autonomous-hooks-catalog.
Aviv: Let's go to the hooks-catalog worktree/branch and proceed to creating
an implementation plan
Claude: [cd into worktree; confirms branch]
Claude: Switching context to the hooks-catalog worktree. Since we're creating
an implementation plan, I need to invoke the writing-plans skill first.
[tool_use: Skill superpowers:writing-plans]
[tool_use: Read spec; tool_use: Bash git history; tool_use: Write plan]
Capability Expansion Ratio — a metric introduced in Anthropic's 2026 Agentic Coding Trends Report (Feb 2026). The fraction of work a person ships that they could not have shipped without AI assistance.
Industry benchmark. Anthropic's 2026 survey reports the average power-user ratio at ~27%.
Aviv’s estimate: 75% (conservative 54% · central 75% · aggressive 93%). Measured by project-level binary scoring across 17 substantive repos + 2 Motif orphans, weighted by AI-era commits, with a skill-stack-delta tie-break per project.
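The central figure is a weighted mean of per-project binary scores: the sum of wᵢ·bᵢ divided by the sum of wᵢ, where bᵢ is 1 if the project could not have shipped without AI and wᵢ is its AI-era commit count. A minimal sketch of that arithmetic; the skill-stack-delta tie-break is assumed to have already settled each `needsAi` flag, and the projects in the check are invented.

```typescript
interface ProjectScore {
  name: string;
  aiEraCommits: number; // weight
  needsAi: boolean;     // binary score after any tie-break
}

// Commit-weighted share of work that could not have shipped without AI.
function expansionRatio(projects: ProjectScore[]): number {
  let total = 0;
  let expanded = 0;
  for (const p of projects) {
    total += p.aiEraCommits;
    if (p.needsAi) expanded += p.aiEraCommits;
  }
  return total === 0 ? 0 : expanded / total;
}
```

Weighting by commits means a large project moves the figure more than a raw project count would.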
What drives the 75%. Three out of four things shipped in 2025–26 depend on AI-native capabilities the pre-AI stack could not have produced solo:
- Multi-model LLM routing in production (Breakout, steam_page_analyst)
- Hybrid SQL + vector-search AI agents (TechTree GameMarketer_Database)
- 10-dimension LLM tagging pipelines (Harmony)
- AI-generated ad creative + copy + A/B tournaments (Harmony Meta campaigns)
- Multi-model video-annotation pipelines with context caching (TechTree VideoProcessingFlow)
- Published AI tooling for end users (Motif CLI on PyPI, Motif VS Code extension)
Methodology note. Manual measurement. Motif does not yet expose this metric as a stat. Figure will be automated once Motif adds a capability-expansion-ratio extractor.
What's in this transcript? Aviv drives a Harmony landing-page session via external references and explicit data-model specs:
- Names an external reference (OpenSky) to lock in sticky-column scroll architecture for the Platform section.
- Diagnoses a misfiring IntersectionObserver and switches to a center-of-viewport scroll listener.
- Points Claude at a cross-repo V12 file to import a stronger Advisor card and Year Plan pattern.
- Redesigns Year Plan from "subjects as rows" to color-coded curriculum blocks with click-through advisor popups.
- Specifies the data model (CurriculumStatus, RecommendationPopup, CurriculumEntry) before code is written.
Stage 1 — External reference drives section architecture. Aviv points at openskyplanner.com and asks for the same sticky-right / scrolling-left pattern, plus a font swap and a layout fix on the Advisor section.
[claude-code · landing-page session · 2026-03-30 17:45]
Claude: The landing page is polished and ready. Dev server running at
localhost:5174 if you want to check it in browser.
Aviv: ## Feedback
# First you broke the rules of the front-end design prompt skill by
using Inter.
# Like your curriculum advisor section looks weird. What it looks like
is right underneath the advisor image. I think that should probably
be below because it is like two images for one section. And then
also the mobile interface itself is just very long and not scrollable.
I know if you meant to make it scrollable or something but it's kind
of really; it looks weird, like a really long phone.
# The platform section is not implemented correctly. Basically the left
column should be scrolling while the right column is held in place.
And the text should scroll within that column as opposed to all the
text being visible as you scroll down the page. Essentially you
should create the illusion that only the left column is scrolling
while the title of the section and the images are constant and the
images only switch when you're done scrolling. It's inspired by this
landing page. https://www.openskyplanner.com/
Claude: Good feedback. Three issues to fix:
1. Body font — Inter is generic per the frontend-design
skill. Need a more distinctive choice.
2. Advisor section — WhatsApp mockup should be below both
columns, and the phone frame is too tall/non-scrollable.
3. Platform section — scroll behavior is wrong. Need to
study the OpenSky reference.
Let me look at the reference site first, then fix everything.
Claude: [swaps Inter -> DM Sans globally ("warm, rounded humanist sans
that pairs well with Playfair Display and feels approachable for
a homeschool audience"); rewrites Advisor section so the
WhatsApp mockup sits below both columns with a fixed-height
scrollable phone frame; rebuilds Platform as a ~300vh section
with sticky right column holding the section header + mockup,
and an IntersectionObserver on three tall left-column blocks
swapping a block-index state]
Claude: [first IntersectionObserver implementation misfires on block 2;
diagnoses that rootMargin "-20% 0px -40% 0px" is too aggressive
and that the sticky right column makes all three blocks visible
in one viewport simultaneously; replaces observer with a scroll
listener that picks the block whose center is closest to
viewport center — Profile / Year Plan / History now swap
cleanly]
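The replacement listener picks the block whose vertical center is nearest the viewport's center. A minimal sketch of that selection as a pure function over block rectangles (as `getBoundingClientRect()` reports them, relative to the viewport); the React wiring is only shown as a comment, and `refs`/`setActive` there are hypothetical names.

```typescript
interface BlockRect {
  top: number;    // px from the viewport top (negative once scrolled past)
  height: number;
}

// Index of the block whose center is closest to the viewport center, or -1.
function activeBlock(blocks: BlockRect[], viewportHeight: number): number {
  const center = viewportHeight / 2;
  let best = -1;
  let bestDist = Infinity;
  blocks.forEach((b, i) => {
    const dist = Math.abs(b.top + b.height / 2 - center);
    if (dist < bestDist) {
      bestDist = dist;
      best = i;
    }
  });
  return best;
}

// Browser wiring (sketch):
// window.addEventListener("scroll", () => {
//   const rects = refs.map((el) => el.getBoundingClientRect());
//   setActive(activeBlock(rects, window.innerHeight));
// }, { passive: true });
```

Unlike an observer with a tuned `rootMargin`, this always yields exactly one winner, even when all three blocks intersect the viewport at once, which was the failure mode here.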
Stage 2 — Cross-repo taste transfer: V12 reference drives Year Plan redesign. Aviv points at a TheV12.tsx file in another repo, demands the exact JSX (not a summary), and specs the Year Plan pivot to status-colored blocks with advisor-popup recommendations.
[same session · 2026-03-30 19:25, after the scroll-listener fix lands]
Aviv: Go to https://github.com/Bulugulu/Asset-Harmony and look at V12.
We did a better job there on the advisor card (has a real advisor
photo and is greater length, which looks better).
# I think we should remove the whatsapp chat, instead include a
whatsapp icon in the advisor card to show that we support that.
# The year plan should be the first thing on the "the platform"
section.
# The title (the platform, your whole homeschool, in one place)
should be stickied to the top of the page as the user scrolls. As-is
the scroll lock starts below it.
# the advisor picture should be in the mockups as well.
I think that the year plan should be organized a bit better,
showing more subjects and color coding them better. instead of
subjects as rows (limits to a few subjects), we should just show
different curricula/subjects by color on the thing itself. Maybe
we color code by "planned" "recommended", etc. we can show that
when the user clicks on a recommendation, a popup comes up that
shows the advisor's explanation of that recommendation and how
it ties to the student profile, etc. And an option to learn more
and/or go chat with the advisor on it on Whatsapp.
This way we can show how the yearly plan can actually be used,
i.e. to show what is already decided, to plan ahead, to get
recommendations, and potentially to "swap" (i.e. report that
something isn't working out). Parents can also upload examples
and data here, like why is something working/not working.
Claude: Let me look at the reference implementation first, then plan
the changes.
Claude: [dispatches sub-agent via `gh api` to fetch TheV12.tsx from
Bulugulu/Asset-Harmony (83,596 bytes, the largest variant);
returns a structural dissection of the Advisor card —
vertical stack of three sub-cards inside a #FBF8F3 container:
Card 1 identity block (SarahPhoto w-24 h-24 double ring,
Playfair Display name, Curriculum Advisor title, "Available
now" pill, italic pull-quote); Card 2 next-call panel;
Card 3 credentials]
Aviv: Look at the file
artifacts/mockup-sandbox/src/components/mockups/harmony-variants/TheV12.tsx
in the GitHub repo Bulugulu/Asset-Harmony. Use `gh api` to fetch the file
content. I need the EXACT code for: the AdvisorCard_Mockup component, the
SarahPhoto export, the BrowserChrome wrapper, the Year Plan section. Focus
on extracting the actual code, not summarizing.
Claude: [re-fetches and returns full JSX verbatim for each component]
Claude: [then drafts the Year Plan redesign plan:
Data Model — fundamental shift from "subjects as rows"
to "curriculum entries as positioned blocks on a shared
timeline."
type CurriculumStatus =
'active' | 'recommended' | 'planned' | 'needs-swap';
type RecommendationPopup = {
advisorReasoning: string;
profileConnection: string;
swapReason?: string;
learnMoreUrl: string;
whatsappMessage: string;
};
type CurriculumEntry = {
id, subject, program, status,
startMonth, endMonth, color, popup
};
Visual — lanes grouped by subject, colored blocks with
status-driven border/fill (active solid, recommended dashed,
planned translucent, needs-swap amber stripe), click opens
popup anchored to the block]
Claude: [executes in parallel: rewrites Advisor.jsx to drop the
WhatsAppMockup + chatMessages data and inline a WhatsApp icon
on the real-photo advisor card in V12 style; rewrites
Platform.jsx — Year Plan becomes first block, section
title pinned above the scroll-lock, SarahPhoto appears inline
in each mockup's advisor note, timeline renders color-coded
curriculum blocks with click-to-explain popups wired to the
RecommendationPopup shape]
What's in this transcript? Two consecutive backend planning sessions for GameMarketer:
- Drafts a v1 plan for a Twitch Helix scraper before letting Claude write code.
- Iterates on cadence tiers, schema, and aggregation math via numbered feedback rounds.
- Interrogates the SQL-vs-pandas tradeoff for a 52k-game tag co-occurrence network.
- Refines normalization choices and weighting with cost/storage as a constraint.
- Corrects launch_status vs release_date logic using a real category breakdown.
Stage 1 — Planning a Twitch Helix scraper before writing code. Aviv hands Claude a v1 spec for an automated Helix scraper and pushes back through several rounds on schema, retention, and what "polish" and "backfill" actually mean.
[cursor · GameMarketer.json · 2025-09-22, turns 579-767]
Aviv: @TwitchHistoricalListener.md
I want to build an automated script that will use the twitch helix api
to get data for all games.
help me create a plan before developing. Write your plan in the
attached .md file.
the plan should include:
- deployment (I want to use railway)
- development of the script, including endpoints, etc.
- the database schema
- any calculations needed to aggregate the data from polled into
daily aggregate metrics
Keep it high-level, without any code (you can use pseudo-code if helpful)
## The plan (v1)
A) Snapshots you'll collect
Hourly full census (complete coverage + discovery)
Call GET /helix/streams with no filters and paginate (100 per
page) until the cursor ends; group by game_id.
Every 10 minutes: "head crawl" for accuracy on popular games
Keep just the first K pages (e.g., 20-50 pages => top 2,000-5,000
streams). /helix/streams is sorted by viewer_count desc.
Hourly top-games list (optional): GET /helix/games/top
B) What you compute (daily per game)
Avg CCV / Peak CCV / Hours watched (area under the CCV curve) /
distinct streams
C) Storage: per-game snapshots from both crawls; nightly rollups;
retain 10-min raw for 14-30 days
D) Call budget: ~28-30k req/day, well under ~800 req/min limit
[multi-page spec continues with aggregation formulas and endpoint notes]
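Here is a sketch of the daily rollup in section B. It computes peak CCV, hours watched as the area under the CCV curve (trapezoidal rule between snapshots), and a time-weighted average CCV. It assumes one game's snapshots for a single UTC day arrive sorted as (timestamp, viewer_count) pairs, and that the last reading is held flat until the day boundary. `daily_metrics` and the carry-forward rule are illustrative choices, not the plan's exact formulas.

```python
from datetime import datetime, timedelta, timezone


def daily_metrics(samples, day_start):
    """Roll one game's CCV snapshots for one UTC day into daily metrics.

    samples: (datetime, viewer_count) pairs sorted by time, all within the day.
    """
    if not samples:
        return {"avg_ccv": None, "peak_ccv": None, "hours_watched": 0.0}
    day_end = day_start + timedelta(days=1)
    area = 0.0  # viewer-hours, i.e. area under the CCV curve
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        area += (v0 + v1) / 2 * hours  # trapezoid between two snapshots
    # Assumption: hold the last reading flat until the UTC day boundary.
    last_t, last_v = samples[-1]
    area += last_v * (day_end - last_t).total_seconds() / 3600
    covered_hours = (day_end - samples[0][0]).total_seconds() / 3600
    return {
        "avg_ccv": area / covered_hours if covered_hours > 0 else None,  # time-weighted
        "peak_ccv": max(v for _, v in samples),
        "hours_watched": round(area, 3),
    }


day = datetime(2025, 9, 22, tzinfo=timezone.utc)
print(daily_metrics([(day, 100), (day + timedelta(hours=12), 300)], day))
# {'avg_ccv': 250.0, 'peak_ccv': 300, 'hours_watched': 6000.0}
```

Weighting the average by time means an extra burst of head-crawl samples during a busy hour does not skew the average toward that hour.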
Claude: [reads docs/templates/PRD_template.md and sibling templates to
align format; drafts full plan into
docs/backlog/TwitchHistoricalListener.md covering Railway deploy,
Helix endpoints, cadence tiers, schema for raw + head + rollup
tables, aggregation math, testing plan, rollout risks]
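The call budget in section D reduces to simple arithmetic over the two crawl cadences. The census page count below is a hypothetical input (roughly 100k live streams at 100 per page). The 35 head pages are just the midpoint of the plan's 20-50 range. Under those assumptions the total lands near the plan's ~28-30k/day estimate.

```python
HEAD_RUNS_PER_DAY = 24 * 6   # head crawl every 10 minutes
CENSUS_RUNS_PER_DAY = 24     # full census once an hour


def daily_requests(census_pages, head_pages):
    """Helix /streams requests per day for both crawls (100 streams per page)."""
    return CENSUS_RUNS_PER_DAY * census_pages + HEAD_RUNS_PER_DAY * head_pages


def census_minutes_at_limit(census_pages, per_minute_limit=800):
    """Minimum wall-clock minutes one census needs to stay under the rate limit."""
    return census_pages / per_minute_limit


# Hypothetical inputs: ~100k live streams -> 1,000 census pages; 35 head pages.
print(daily_requests(1000, 35))         # 29040
print(census_minutes_at_limit(1000))    # 1.25
```

The second function shows why a single census run still needs pacing. A thousand-page burst would exceed an 800-per-minute ceiling, so the run has to be spread over at least 1.25 minutes.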
Aviv: In the first phase, I want to create a local version of the script
and database to validate, before we deploy. twitch credentials are
in .env.local.
You can use the postgresql mcp to create the necessary tables in the
database I already created, called breakout_scrapers
Claude: [uses postgresql MCP to create enum, games, stream_snapshots,
stream_head_rows, daily_game_metrics, ingest_runs with PKs/FKs/
indexes]
Aviv: ok so now that we have the database set up, let's build the scraping
script and hook it up to the database
Claude: [builds CLI scraper, wires Helix auth, implements census + head
commands writing to the new tables]
Aviv: explain what is remaining exactly
Claude: - Top games job (poll /helix/games/top hourly, small snapshot table)
- Observability and resiliency (Sentry, structured logs,
rate-limit headers, retry/backoff)
- CI (GitHub Actions: ruff + pytest, gate integration tests on
secrets)
- Railway deployment (Dockerfile, env vars, schedules)
- Aggregates polish (coverage_ratio, distinct_streams fallback)
- Backfill and hygiene (backfill CLI, retention job for
stream_head_rows)
- Docs
Aviv: ## Feedback
- Drop the observability and CI stuff.
- Explain the polish. what polish is needed?
- What do you mean by backfill?
- What's the retention job?
Claude: - Polish (what's needed)
- Coverage ratio: populate coverage_ratio in daily_game_metrics
(share of day covered by head/census samples after merge).
- distinct_streams fallback: when no head rows exist for a
game/day, keep as NULL (don't estimate from stream_count);
rely on coverage_ratio to signal data quality.
- Day boundary and rounding: confirm UTC boundary, clamp last
interval to day_end, round hours_watched to 3 decimals.
- Idempotency: verify re-runs don't duplicate head rows
(composite PK already in place).
- Backfill
- We cannot fetch historical live data from Helix; "backfill"
means recomputing aggregates for past days from snapshots we
already collected.
- rollup-daily --day YYYY-MM-DD
- backfill --start YYYY-MM-DD --end YYYY-MM-DD
- Retention job
- Keep stream_head_rows 14-30 days; keep census/hourly snapshots;
keep daily rollups indefinitely.
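The two commands Claude lists can be sketched as `argparse` subcommands. This is a hypothetical skeleton: `rollup_daily` is a stub that stands in for whatever recomputes `daily_game_metrics` from stored snapshots, and the program name is a placeholder. Backfill only re-reads data that has already been collected, so it reduces to looping the single-day rollup over an inclusive date range.

```python
import argparse
from datetime import date, timedelta


def day_range(start, end):
    """Every day from start to end, inclusive."""
    if end < start:
        raise ValueError("--end must not be before --start")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def rollup_daily(day):
    # Hypothetical stub: recompute daily_game_metrics for `day` from stored
    # snapshots and upsert the result, so re-running a day is idempotent.
    return day


def build_parser():
    parser = argparse.ArgumentParser(prog="scraper")  # program name is a placeholder
    sub = parser.add_subparsers(dest="command", required=True)
    one = sub.add_parser("rollup-daily", help="recompute one UTC day")
    one.add_argument("--day", type=date.fromisoformat, required=True)
    many = sub.add_parser("backfill", help="recompute a range of UTC days")
    many.add_argument("--start", type=date.fromisoformat, required=True)
    many.add_argument("--end", type=date.fromisoformat, required=True)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "rollup-daily":
        days = [args.day]
    else:
        days = day_range(args.start, args.end)
    return [rollup_daily(d) for d in days]
```

For example, `main(["backfill", "--start", "2025-09-20", "--end", "2025-09-22"])` runs the rollup for three days in order.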
Aviv: ## Feedback
- Don't worry about backfilling ATM. We can build a backfill CLI
command though for when I'm ready to do that.
- Sounds good on retention policy. Just to confirm - what do we lose
by deleting stream-head-rows after 14 days?
Claude: - Per-channel detail: can't inspect which channels streamed a game
and when (started_at) beyond 14 days. Can't run deep-dive channel
analyses (language mix, channel distributions).
- Recomputability/audit: can't recompute distinct_streams for old
days from raw head rows.
- What we keep: daily aggregates already stored in
daily_game_metrics; head/census per-game snapshots remain, so
avg/peak/hours_watched unaffected.
Aviv: in that case, let's keep stream_head_rows indefinitely for now. I'll
implement retention policies later depending on database growth rates.
Stage 2 — Interrogating methodology before committing compute (tag co-occurrence network). Aviv asks for a SQL-vs-pandas split, weighting, and cost analysis on 52k games before any code runs, then narrows the launched/unreleased split using a real category × launch_status table.
[cursor · GameMarketer.json · 2025-10-19, turns 3038-3061]
Aviv: @steam_schema.sql
We have steam tag and revenue data for about 50k games.
I want to run a network analysis on these games (just the games we
have tag data about). See the plan below. I need your help figuring
out the methodology. For example, should we run any calculations in
the postgres database, or extract the data first? Is there a view
that can provide us with these calculations? Help me think through
this - don't code anything yet.
Goal: Build two tag-co-occurrence networks.
- Current (released games, weighted by reviews)
- Predicted (unreleased games, weighted by followers)
Then compare them to find momentum and gaps.
Popularity weights (log-normalized):
pop_cur(g) = log(1 + rev(g))
pop_pred(g) = log(1 + fol(g))
Build two co-occurrence matrices W_cur(A,B), W_pred(A,B), plus
per-tag totals T(A) and grand totals G.
Normalizations (three flavors for both Current & Predicted):
1) Jaccard (binary + popularity-weighted)
2) PMI / NPMI with epsilon smoothing
3) Raw weighted co-occurrence
Compare Current vs Predicted via absolute delta, relative ratio,
symmetric normalized difference. [full formulation continues]
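The transcript elides the full formulation, so the sketch below assumes common textbook definitions:

- weighted Jaccard: W(A,B) / (T(A) + T(B) - W(A,B))
- NPMI: PMI divided by -log p(A,B), with probabilities taken relative to G
- symmetric normalized difference: (pred - cur) / (pred + cur)

It builds W, T, and G in memory from a toy dict. `build_counts` and the epsilon placement are illustrative choices.

```python
import math
from itertools import combinations


def build_counts(games, weight=math.log1p):
    """games: {game_id: (set_of_tags, raw_count)}, raw_count = reviews or followers.

    Returns W[(a, b)] for a < b, per-tag totals T[a], and grand total G.
    weight=math.log1p gives pop(g) = log(1 + count); pass lambda c: 1.0 for binary.
    """
    W, T, G = {}, {}, 0.0
    for tags, count in games.values():
        w = weight(count)
        G += w
        for a in tags:
            T[a] = T.get(a, 0.0) + w
        for pair in combinations(sorted(tags), 2):
            W[pair] = W.get(pair, 0.0) + w
    return W, T, G


def jaccard(W, T, a, b):
    a, b = sorted((a, b))
    w = W.get((a, b), 0.0)
    union = T[a] + T[b] - w
    return w / union if union > 0 else 0.0


def npmi(W, T, G, a, b, eps=1e-9):
    """PMI normalized to roughly [-1, 1]; eps keeps unseen pairs finite."""
    a, b = sorted((a, b))
    p_ab = W.get((a, b), 0.0) / G + eps
    pmi = math.log(p_ab / ((T[a] / G) * (T[b] / G)))
    return pmi / -math.log(p_ab)


def sym_diff(cur, pred):
    """Symmetric normalized difference; positive when Predicted exceeds Current."""
    return (pred - cur) / (pred + cur) if pred + cur else 0.0
```

Passing a constant weight gives the binary Jaccard variant. The default `log1p` weight gives the popularity-weighted one. Both come from the same counting pass.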
Claude: ### High-Level Strategy: Hybrid Approach
1. PostgreSQL for Heavy Lifting: aggregate co-occurrence weights
(W) and per-tag totals (T) directly in the database. The
self-join on Steam.game_tags is exactly what PostgreSQL is
built for, with existing indexes on game_id and tag_id.
Aggregated pair counts are far smaller than raw rows --
minimal data transfer. Save as MATERIALIZED VIEW for reuse.
2. Python for Analysis & Modeling: load the small aggregated
tables into pandas for Jaccard, PMI, and momentum scores.
Easy to iterate on smoothing epsilon, support thresholds,
alternative formulas.
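The database half of the hybrid approach is a single self-join plus GROUP BY. The sketch uses an in-memory SQLite database so it runs without a Postgres server. The `game_tags(game_id, tag_id)` shape follows the transcript; the `games(game_id, reviews)` table is an assumed stand-in for wherever review counts live. In PostgreSQL the same query would use `ln(1 + reviews)` (PostgreSQL's `log()` is base 10) and be wrapped in `CREATE MATERIALIZED VIEW`.

```python
import math
import sqlite3

PAIR_WEIGHTS_SQL = """
SELECT a.tag_id AS tag_a,
       b.tag_id AS tag_b,
       COUNT(*)              AS n_games,   -- binary co-occurrence
       SUM(log1p(g.reviews)) AS w_cur      -- popularity-weighted co-occurrence
FROM game_tags AS a
JOIN game_tags AS b ON b.game_id = a.game_id AND a.tag_id < b.tag_id
JOIN games AS g ON g.game_id = a.game_id
GROUP BY a.tag_id, b.tag_id
ORDER BY a.tag_id, b.tag_id
"""


def pair_weights(conn):
    return conn.execute(PAIR_WEIGHTS_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.create_function("log1p", 1, math.log1p)  # SQLite may lack math functions
conn.executescript("""
CREATE TABLE games (game_id INTEGER PRIMARY KEY, reviews INTEGER NOT NULL);
CREATE TABLE game_tags (game_id INTEGER, tag_id INTEGER, PRIMARY KEY (game_id, tag_id));
INSERT INTO games VALUES (1, 99), (2, 9);
INSERT INTO game_tags VALUES (1, 10), (1, 20), (2, 10), (2, 20), (2, 30);
""")
for row in pair_weights(conn):
    print(row)
```

The `a.tag_id < b.tag_id` condition keeps each unordered pair once and drops self-pairs. The aggregated result then has at most one row per tag pair, which is what makes the pandas side cheap to load.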
Aviv: That makes sense. Should I be concerned about storage size or time
of execution? We have approximately 52k games with tag data in the
database.
Claude: [estimates: ~15 tags/game x 52k games = ~780k rows in game_tags;
self-join produces ~5.5M co-occurring pairs pre-aggregation. With
existing indexes the query runs in seconds to a few minutes. ~400
distinct tags => max (400*399)/2 ~= 80k pair rows, a few MB.
Suggests EXPLAIN ANALYZE as a pre-commit benchmark.]
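Claude's size estimates can be reproduced directly. The sketch assumes the self-join keeps only unordered pairs (a.tag_id < b.tag_id). A join that emits both orderings would double the pre-aggregation count.

```python
from math import comb


def join_size_estimates(n_games, tags_per_game, n_tags):
    tag_rows = n_games * tags_per_game                # rows in game_tags
    pre_agg_pairs = n_games * comb(tags_per_game, 2)  # self-join output, a.tag_id < b.tag_id
    max_pair_rows = comb(n_tags, 2)                   # upper bound after GROUP BY
    return tag_rows, pre_agg_pairs, max_pair_rows


print(join_size_estimates(52_000, 15, 400))  # (780000, 5460000, 79800)
```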
Aviv: Yes, let's plan the materialized view. Save it to a .sql file I can
execute in the database
Claude: [drafts four materialized views into
docs/sql/steam_tag_network_views.sql:
steam.tag_cooccurrence_current, steam.tag_totals_current,
steam.tag_cooccurrence_predicted, steam.tag_totals_predicted.
Self-contained, idempotent.]
Aviv: let's use release date instead of "launch_status" to determine
whether a game is launched or not.
also, filter for games after 2020 only
Claude: [updates views: release_date <= current_date for released;
release_date in future or NULL for unreleased; 2020-01-01 floor
on current network only]
Aviv: for unreleased games, sometimes the "release_date" is null, but
through previous scraping we determined that it is a launched game.
we should use a combination of launch_status + release_date for that.
the major group of interest is when release_date is null (70k+
games in this category).
We also care about betas/demos, etc.
Here is the breakdown
category, launch_status, count
launched_low_engagement, launched, 30709
unreleased_low_new, unreleased, 20021
unreleased_low_old, unreleased, 12963
launched_active, launched, 5168
unreleased_active_new, unreleased, 4368
unreleased_active_old, unreleased, 2907
soundtrack, unreleased, 324
beta, unreleased, 308
soundtrack, launched, 300
demo, unreleased, 149
demo, launched, 141
beta, launched, 12
dlc, launched, 1
Claude: This is a critical clarification. My previous change was too
simplistic.
### New Logic Proposal
I will revert the primary logic to use launch_status as the main
separator between "current" and "predicted" networks. Then I'll
add the date filter to the "current" network as you requested.
1. Current (Released) Network:
Primary: launch_status = 'launched'
Date: AND (release_date IS NULL OR release_date >= '2020-01-01')
2. Predicted (Unreleased) Network:
Primary: launch_status = 'unreleased'
Aviv: we don't want soundtrack/dlc in there. just demo/beta/normal game
Claude: [adds category NOT IN ('soundtrack','dlc') filter to all four
views]
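The final filter logic can be written as a small predicate and checked against Aviv's breakdown table. The table carries no release dates, so the tally below treats every row's `release_date` as NULL (an assumption). Under that assumption the 2020 floor removes nothing here. `network_for` and `tally` are illustrative names.

```python
from collections import Counter
from datetime import date

EXCLUDED_CATEGORIES = {"soundtrack", "dlc"}
CURRENT_FLOOR = date(2020, 1, 1)


def network_for(launch_status, category, release_date):
    """Which tag network a game feeds: 'current', 'predicted', or 'excluded'."""
    if category in EXCLUDED_CATEGORIES:
        return "excluded"
    if launch_status == "launched":
        # Date floor applies to the Current network only; NULL dates pass.
        if release_date is None or release_date >= CURRENT_FLOOR:
            return "current"
        return "excluded"
    if launch_status == "unreleased":
        return "predicted"
    return "excluded"


# (category, launch_status, count) rows from the breakdown above.
BREAKDOWN = [
    ("launched_low_engagement", "launched", 30709),
    ("unreleased_low_new", "unreleased", 20021),
    ("unreleased_low_old", "unreleased", 12963),
    ("launched_active", "launched", 5168),
    ("unreleased_active_new", "unreleased", 4368),
    ("unreleased_active_old", "unreleased", 2907),
    ("soundtrack", "unreleased", 324),
    ("beta", "unreleased", 308),
    ("soundtrack", "launched", 300),
    ("demo", "unreleased", 149),
    ("demo", "launched", 141),
    ("beta", "launched", 12),
    ("dlc", "launched", 1),
]


def tally(rows):
    totals = Counter()
    for category, status, count in rows:
        # Assumption: release_date is unknown for these rows, treated as NULL.
        totals[network_for(status, category, None)] += count
    return totals
```

Under that assumption, 36,030 games feed the Current network, 40,716 feed Predicted, and 625 soundtrack/DLC rows are dropped.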
Game design · Subagent orchestration · Config-driven architecture · Tutorial design · Retention systems · Asset spec
What's in this transcript? Aviv runs a Chess Arena GDD pivot session:
Pivots Chess Arena from 12 curated battles to a config-driven level/wave system.
Locks open rules: movement, capture, premove, pawn-back, bush stealth, match timer.
Designs the tutorial as an instrumented on-rails match against frozen, then live, AI.
Commissions a subagent GDD review and answers its 6-axis scorecard point-by-point.
Specifies retention systems (daily quests, victory chests, XP-gated unlocks) and asset manifests.
Stage 1 — Pivot the game model and lock open questions. Aviv rewrites the GDD to a level/wave model with config-driven layouts and difficulty, and resolves the open rules around movement, premove, terrain, AI memory, and tutorial scope.
[Stage 1 — GDD rewrite directive · ChessArenaTT_Phaser.json msg 0 · 2026-03-21]
Aviv: ## Issues
1. Spawn bar - it should be configurable. Start at 6 seconds per spawn.
3. Let's actually switch the main mode to be level (levels vs the enemy).
Enemy pieces spawn/appear from the top. Every level has a different
configuration (should be config driven), where there are different
starting pieces and spawning pieces. this will allow us to control
difficulty, i.e. through "waves", etc. the enemy king may not even
be visible initially.
4. board layouts should also be config-driven and there should be room
to add new layout elements in the future (in addition to rocks,
bushes, etc.). for now we can use procedural generation, i.e. levels
will be generated with random placement of rocks and bushes, with a
limit of 3 rocks and 3 bushes per level (but the number is randomized)
9. campaign progress only when winning, otherwise stuck at that level.
10. Localstorage for now
31. See procedural generation of levels. we should continue to generate
harder and harder levels and enemy configurations - more waves, more
difficult pieces, etc. As the enemy becomes really hard (bigger
waves, more challenging pieces), the player will start to get
abilities (bombs, freezes, piece rerolls, etc.)
Clarifications
5. Yes, AI gets reinforcements but not the same as the player. i.e. this
is configurable and we will use it to escalate difficulty over the
course of the campaign (faster reinforcements, more reinforcements, etc.)
15. pieces move to where the player dragged them, i.e. they stop when
they get there. BTW pawns can move horizontally as well (but only
capture diagonally)
16. any piece can have a queued premove.
22. no lives during onboarding. it starts once player starts the campaign.
by the way, we should also add on-rails tutorial step in the main
menu after the onboarding is done, to instruct the player to start
the first level in the campaign.
36. pawns can move back
42. AI should still try to move in bushes but it only remembers the
player's last position, so it will try to target that.
Nice to specify
17. The last tutorial step is a "real" match with the AI (but AI always
loses). before that, the enemy pieces are frozen.
26. When king is captured - show checkmate animation and then go to
match summary. @References/BattleScreen/MatchResults.webp
28. We should be relatively lenient with the drag. We need to "preview"
the move (player sees where it will go with a ghost piece) and
then when they drop it moves there. When the player "picks up" the
piece, they see a little green dot on viable squares for movement.
38. Premove gets cancelled if the piece can't move there anymore.
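The config-driven level and terrain rules above can be sketched as a dataclass plus a seeded generator. Several details are assumptions: the field names, the 0-to-max range for the randomized counts, and the 4 x 6 board used in the check. The 4-column width echoes the review's "4-wide board" note. Seeding the generator keeps a procedurally generated level reproducible.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LevelConfig:
    level: int
    spawn_interval_s: float = 6.0        # spawn bar: starts at 6 s per spawn
    max_rocks: int = 3                   # upper bound; actual count is randomized
    max_bushes: int = 3
    enemy_waves: list = field(default_factory=list)  # hypothetical wave definitions


def generate_terrain(cfg, cols, rows, reserved, seed=None):
    """Place a random number (0..max) of rocks and bushes on free squares.

    reserved: squares that must stay empty (e.g. starting pieces).
    The same seed always yields the same layout.
    """
    rng = random.Random(seed)
    free = [(c, r) for r in range(rows) for c in range(cols) if (c, r) not in reserved]
    n_rocks = rng.randint(0, cfg.max_rocks)
    n_bushes = rng.randint(0, cfg.max_bushes)
    picks = rng.sample(free, n_rocks + n_bushes)  # distinct squares, no overlap
    return {"rocks": picks[:n_rocks], "bushes": picks[n_rocks:]}
```

New terrain types could later get their own `max_*` fields and a slice of `picks`, which fits the requirement to leave room for future layout elements.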
Claude: [shifts game model from "12 curated battles" to level-based
wave defense with procedural generation; rewrites spawn, terrain, AI,
tutorial, onboarding sections of GameDesignDoc.md]
Aviv: RE: matchmaking - remember we pivoted to a level system. in this
system there is no matchmaking at all. However, we WILL add matchmaking
as a feature (PvP) at a certain point (e.g. unlocked at level 10). At
that point, we'll use the matchmaking screen and logic. We'll get to
that later.
RE: match victory, let's use this reference: @References/RoyalMatch_Victory.jpg
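Here is a minimal sketch of the premove rules: any piece may queue one, and it is cancelled if the move is no longer possible. The legality callback and the moment `pop_ready` is called are hypothetical hooks into the game loop. Replacing an earlier premove with a newer one is also an assumption.

```python
class PremoveQueue:
    """Queued premoves keyed by piece; any piece may hold one."""

    def __init__(self):
        self._targets = {}

    def queue(self, piece_id, target):
        # Assumption: a new premove replaces the piece's previous one.
        self._targets[piece_id] = target

    def pending(self):
        return dict(self._targets)

    def pop_ready(self, piece_id, is_legal):
        """Call when `piece_id` can act again (hypothetical trigger).

        Returns the queued target if is_legal(piece_id, target) still holds;
        otherwise the premove is cancelled and None is returned.
        """
        target = self._targets.pop(piece_id, None)
        if target is None or not is_legal(piece_id, target):
            return None
        return target
```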
Stage 2 — Commission a subagent GDD review, then make the calls. Aviv specifies the review axes, then walks the returned 6-axis scorecard top to bottom — accepting, modifying, or rejecting each finding and dispatching follow-up subagents on the ones that need design specs.
[Stage 2 — Review + decisions · ChessArenaTT_Phaser.json msgs 112, 116, 117]
Aviv: Okay, spin up another subagent to review the plan for anything
confusing and not clear. The agent should examine:
- Is game design clear?
- Are configurations clear for everything? are they separated from code
such that we can change configs without playing around with code?
I.e. the tutorial system should also be config-driven
- Is the tutorial clear in teaching how to play the game? Does it
adhere to mobile best practices?
- Are we missing any critical features to engage players in the first
day and get D1 retention metrics? Compare to the top mobile games
like royal match, clash royale, tennis clash. we are also taking
inspiration from words with friends, yahtzee with buddies, match masters.
- Is the UI and interface clear? Will it be usable and delightful?
- Are we clear on art style and how art will be used across the project?
Claude (review subagent output, verbatim):
GDD Review Results — 6-Axis Scorecard
1. Game Design Clarity 3.5/5 — movement speed, sliding capture rules,
lose condition missing
2. Config-Driven Architecture 4/5 — tutorial and AI aren't config-driven,
schemas undocumented
3. Tutorial & Onboarding 3/5 — only teaches basics; premove, bushes,
sliding need interactive steps
4. D1 Retention 2/5 — BIGGEST GAP. No daily rewards, chests,
tasks; coins have no purpose
5. UI & UX Clarity 3.5/5 — needs wireframes, screen-fit math
6. Art Style & Assets 3.5/5 — no asset manifest, placeholder strategy,
or terrain art spec
Top 5 priorities:
1. Add Daily Rewards + Victory Chests (D1 retention — critical)
2. Specify Movement Speed + Sliding Capture + Lose Condition (blocking)
3. Expand Tutorial Across 3 Levels + Interactive Steps
4. Make Tutorial Config-Driven + Document Config Schemas
5. XP Levels Should Unlock Things (premove at L2, reroll at L3, cosmetic
at L5, instant-spawn at L8)
Notable gaps: No timeout/stalemate/draw condition. Knight L-shape on
4-wide board is severely constrained. Pawn-diagonal-capture + bush
stealth interaction undefined. 6 ability slots won't fit small phones.
v1 gameplay placeholder strategy not defined.
Which of these do you want me to address in the doc now?
Aviv: 1. Okay, let's add daily tasks and victory chests with a timer.
Spin up subagents to research and design the features specs.
2. We should make solid assumptions initially and make it configurable.
3. You're right. The tutorial should make sure you interact with each
of those things. Also, we should require specific interaction during
the tutorial. I.e. the element the user needs to interact with is
highlighted, and we show them the action we want them to take. When
showing bushes, we can spawn the player inside a bush and tell them to
"wait". The enemy king will pass and when they do, the match stops,
and we say "they fell for it!". then the player is instructed (via
the highlight, game still paused) to capture the enemy king. Also,
there needs to be a hidden "state" in the bushes, i.e. the piece is
slightly transparent.
Last level of the tutorial is a real "match" against the computer.
computer can speak some banter too.
if we add new systems (daily quests, chests) those should be shown in
the on-rails onboarding too. I.e. when the player wins their first
match, we instruct them to collect a chest, etc. (we give them one, too)
4. Okay, add it.
5. Premove should be unlocked by default. As for the other stuff, agreed,
let's unlock them. but don't design new systems just for the rewards,
we should only use systems we already have and unlock those (i.e. daily
quests, chest, etc.)
Other findings
- yes, you can capture a hidden piece but pawns can't move diagonally
into a bush so they can't
- good point, let's add a timer to matches. start at 30 seconds. player
loses unless they checkmate the enemy king. this way player needs
to be on the offensive and AI can play defensive in some levels
- don't worry about knight movement and auto placement randomness
- create wireframe for battle hud using the reference I provided (clash royale)
- we don't need 6 ability slots. use clash royale reference.
- lock to portrait orientation
- agree on result popup
- add an asset manifest
- add terrain sprite manifest (we'll probably use SVG for all of these
atm. we should still maintain a manifest so we can replace later
with sprites if we want)
- for placeholder strategy - we should have a loading screen before
the match to avoid this.
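Two of the rules locked in this reply are small enough to state as code. The first is the 30-second match timer (lose unless the enemy king is captured first). The second is the bush-capture exception for pawns. The function names are assumptions, and so is treating a capture that lands exactly at the time limit as a win.

```python
MATCH_TIMER_S = 30.0  # configurable; 30 s is the agreed starting value


def match_outcome(elapsed_s, enemy_king_captured, timer_s=MATCH_TIMER_S):
    """'win', 'loss', or None while the match is still running."""
    if enemy_king_captured and elapsed_s <= timer_s:
        return "win"
    if elapsed_s >= timer_s:
        return "loss"  # time ran out without capturing the enemy king
    return None


def bush_allows_capture(piece_kind, move_is_diagonal, target_in_bush):
    """Hidden pieces in bushes can be captured, but a pawn may not move
    diagonally into a bush, so its diagonal capture is blocked there."""
    return not (piece_kind == "pawn" and move_is_diagonal and target_in_bush)
```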
What's in this transcript? Aviv runs the homeschool-research loop end-to-end:
Designs a 2026 homeschool-preferences survey and paid Reddit recruitment under a $500 budget.
Rewrites Claude's first pass to strip startup framing, kill leading questions, and replace intent-to-use with pain-point scales.
Wires SurveyMonkey completion to a redirect-page Reddit Pixel so campaigns optimize on completers, not clickers.
Directs segmentation across stress, ESA usage, curriculum mentions, and motivation to surface a starting ICP.
Prioritizes interviewees by fit and writes an ICP-differentiated 20-minute agenda with a referral incentive.
Stage 1 — Survey + Reddit-ad design. Aviv rejects the AI's first pass on framing, screener placement, question phrasing, and a "would you use this" question, and dictates the replacements.
[Session: Edtech.json · 2026-01-26]
Aviv: One of my todo items is to run a social media ad to recruit survey
participants. Help me with this:
1. Research the best way to recruit — what ad format (carousel?
text-only?). I was thinking keep it simple and make it feel like an
organic part of the community. Do we need incentives and how much?
2. Define targeting — which subreddits, how many people, budget
under $500.
3. Design the ad copy (we can test a few).
4. Design the survey. It should answer questions about our college
admissions product, project-based learning, challenges with
assessment of PBL — and take advantage of the opportunity to
understand homeschoolers more broadly and discover pain points.
Consider a screener; focusing on progressive homeschoolers (not
religious); social challenges.
Claude: [generates HOMESCHOOL-SURVEY-CAMPAIGN.md — full campaign plan
with ad variants, screeners, incentive research citing PMC study]
Aviv: Feedback:
- Redditors hate entrepreneurs and AI startups. We should absolutely
NOT disclose what we're using this for. Did you research what
actually works for survey recruitment? It's a specific niche. I was
thinking simpler like "Take the 2026 homeschool preferences survey
— get $5". Or frame it as a personality test: "What kind of
homeschooler are you?"
- Running it on SurveyMonkey (annual sub). No screener on grade level
— segment later.
- Don't ask questions where robust public data exists unless we need
it for our own analysis.
- Phrase questions colloquially; many parents won't understand the
lingo — give examples.
- Never ask "would you use this" — not predictive. Ask about pain
points: "is this a major pain point?" on a scale, and "how do you
solve it now?"
- Include 1-2 open-ended questions for richer data, not too many.
- For state: start with "does your state have ESA?" then ask state
— contextualizes so it doesn't feel nosy. Optional.
- Don't include a waitlist. We're masquerading as a survey/study,
not a startup.
Claude: [revises to neutral "2026 Homeschool Preferences Survey"
framing; drops startup mentions; reworks question set]
Aviv: After my conversation with [counselor], more hypotheses: many
homeschoolers feel pressure around high-school time to send kids back
to public school. Let's add conditional questions on high-school
transition pressure — only shown if grade = high school or graduated.
Aviv: I'm worried this question feels "planted". We're implying they're
missing out, which invokes that emotion and might be interpreted as a
leading question by a startup. Make it more objective.
Claude: [rewrites from "feel like you're missing out" to neutral "how
does your family think about these opportunities"]
Stage 2 — Instrumentation. Aviv asks how survey completion maps back to ad optimization and pushes off text-only ads toward image creatives with a generation prompt.
[Session: Edtech.json · 2026-01-26]
Aviv: How do I connect the survey to Reddit ads so we can optimize to
target for people who complete the survey?
Claude: Reddit Ads uses a Reddit Pixel for conversion tracking.
SurveyMonkey doesn't allow custom JavaScript on survey pages (on most
plans), so you can't embed the Pixel directly. The workaround is a
redirect-on-completion to a thank-you page on your own domain that
fires the Pixel — then optimize the campaign for that completion
event. [walks through full setup]
Aviv: We can't actually run text-only ads on Reddit — those just
lead to the post. Instead, let's use images. I'll need the generation
prompt. Guidelines: headline 50 chars or less, image 1200x628 [...]
Claude: [writes image-ad specs with generator prompts for three
variants]
Stage 3 — Analysis plan + ICP segmentation. Aviv specifies the segmentation dimensions, layers in ESA and curriculum-provider cuts, and orders an exploratory pass before the main analysis.
[Session: Edtech.json · 2026-02-01]
Aviv: Here is the data for the homeschool survey. Let's do some
analysis. First, plan out the analysis. I'm interested in identifying
the best-fit segments — those who experience the problem most
acutely, would benefit most from the solution, or are adopting methods
that fit our product. Cross-reference/segment the responses through
various dimensions — e.g. whether homeschoolers in microschools,
co-ops, pods experience greater stress around high-school,
transcription, documentation. We need the ideal "starting segment".
Ideally we come out with a specific audience definition. Separately
we're looking for validation of our idea — do homeschoolers
actually struggle with assessment, AI usage, college readiness?
Claude: [proposes multi-dimensional segmentation plan]
Aviv: Also incorporate the ESA data. Directions:
1. target audience mostly has/uses ESAs — great, they have money
to spend
2. they don't use them or aren't aware — also interesting, we
can help them with that part
3. is there a state association with our target users?
And aggregate all curriculum providers people mentioned. Do any
appear more frequently? Are they project-based? Could we partner?
Lots of homeschool curriculum is not the best quality — that could
be an opportunity.
Aviv: Before we get started, do exploratory high-level descriptive
analysis. This helps understand the dataset and guide additional
questions. The report should start with this. Consider that (1) we'll
use this survey for a white paper once we have in-depth interviews,
and (2) we may re-run once we have more responses — we should
refine the analysis plan for re-runs.
Aviv: Help me understand the people who answered 3+4 on documentation
stress. What's their primary motivation for homeschooling? What's
their AI attitude — different from the non-stressed group? Group
1+2 and 3+4 into two groups and do a comparison to identify key
differences.
Stage 4 — ICP-differentiated interview protocol. Aviv prioritizes participants by fit and writes a 20-minute agenda differentiated for ICP vs. non-ICP parents, with a referral hook and a same-day domain-setup ask.
[Session: Edtech.json · 2026-02-01]
Aviv: Now let's prioritize the list of participants for interviews,
since interviews are expensive. Prioritize based on "fit".
Aviv: Plan an interview agenda for 20 minutes. Leave time for
follow-ups, not entirely booked. For non-ICP parents, ask more general
questions like "top pain points in homeschooling". Hypotheses to dig
into:
1. Too much curriculum, hard to organize in one place.
2. Hard to find good curriculum for some subjects — PBL, or
subjects where the parent has high requirements.
Ask if there are other parents they recommend we talk to (incentive
for successful referrals). Ask if they want to participate in further
studies. Write a simple email outreach. Use surveys@howyougotthere.com
(it doesn't exist yet; we have Cloudflare). What's the
fastest setup?
What's in this transcript? Aviv reshapes a vision-doc rewrite session:
Imposes measurability and privilege constraints that reframe Claude's vision draft.
Stress-tests eleven anti-theory candidates across two rounds without picking any.
Scraps the curriculum-scoped frame and dictates a universal anti-theory from scratch.
Lands the three-hit: impossible goal, anti-theory, 25-year pitch.
Extends the anti-theory with two alternate phrasings for the doc.
Stage 1 — Setting the bar. Aviv adds two design constraints — goals must be measurable like “ending underemployment,” and finding your calling is currently a privilege.
Aviv (msg 1479): we need to update our lofty vision for
@projects/homeschools/harmony/. Help me brainstorm, given our prior
revisions @knowledge/philosophy/lofty-vision.md
Claude: [pulls the current doc; proposes a rework anchored on
"personalization isn't picking a curriculum."]
Aviv (msg 1489): First, I think we need goals that are measurable and
impactful. Having an advisor in itself is not a goal, but changing
some outcome for the learner or society is. for example "ending
underemployment" was [advisor]'s goal. is there something similar
that we can research or find to target? Something impactful and
instantly understandable. Ideally there is a concept tied to it,
like "underemployment". Second, the long-term vision is pretty much
the same here [...] replacing the resume and matching you
automatically to jobs - helping you find your calling passively,
no applications. that's the endgame. RE: contrarian belief. I
believe that no one is personalizing enough. People are fooling
themselves [...] buying curricula that someone else recommended
isn't personalizing. [...] we need to be chasing real fulfillment
and allowing children to thrive by doing anything and expanding the
definition of school altogether. school will encompass work,
entrepreneurship, sports, etc.
Claude: [runs research on "calling" / "purpose gap"; lands on the
measurable 2/3-never-find-their-calling statistic as the impossible-
goal anchor.]
Aviv (msg 1523): Okay. I like that. We're helping people find their
vocational calling. Most people DONT. However, one thought here
that could impede us [...] what if most people don't find their
calling not because they can't find it, but because they
financially can't afford to search for it? [...] I guess what we're
saying is - we will reduce the cost of finding the right thing, or
make it more accessible, and by doing that, we will open up
opportunities for those who can't afford to search right now.
searching for passion is now reserved for the rich, but it
shouldn't be.
Claude: [reframes: the anti-theory hiding in plain sight is "finding
your calling is a privilege." Impossible goal locks to "every child
finds their calling — not just the ones who can afford to search."]
Stage 2 — Stress-testing Claude's anti-theory drafts. Aviv reads eleven candidates across two rounds and replies “these are all solid” — which is not acceptance.
Aviv (msg 1540): This is solid but I think the anti-theory might not
be punchy enough. It needs to hit people in the gut, like an
uncomfortable truth.
Claude (msg 1541, verbatim for the draft surface):
You're right. Let me think about this.
The current anti-theory is: "Personalization isn't picking a
curriculum. It's expanding what counts as learning." That's
explanatory, not confrontational.
Candidates — from most confrontational to least:
1. "Your child's curriculum doesn't matter."
2. "You're not personalizing. You're just shopping."
3. "Passion can't be assigned."
4. "The curriculum is the easy part. Knowing your child is the
hard part."
5. "School got smaller when you left — not bigger."
My instinct: #1 or #5 are the gut-punch versions. Which direction
pulls you?
Aviv (msg 1542): The anti-theory should flip an accepted truth on
its head. it doesn't make a claim about the individual.
Claude (msg 1543): [regenerates around the accepted truth
"curriculum is the most important decision." New candidates:
"Curriculum doesn't matter" / "No curriculum has ever helped
someone find their calling" / "What you learn matters less than
who you become" / "Education isn't preparation. It's discovery."
Recommends #1.]
Aviv (msg 1544): these are all solid
Stage 3 — Aviv scraps the frame and dictates the real anti-theory. Aviv walks the scope from curriculum to work and writes the universal version himself in one paragraph.
Aviv (msg 1546, the three-hit landing, in full):
We strayed too far. I think it's simpler than this and it is tied
to our impossible goal. Here's the thing. Most people don't like
work. Most people assume that work isn't and cannot be fulfilling
and engaging. Most people don't even believe work is necessary.
My anti-theory is - work can be fulfilling for everyone. Work
should be fulfilling. Most people know this to be true and want
this to be true, but they've given up. they were sold on a system
where work is a means to an end. Where work is all toil and no
fun. Where work is a compromise, a necessity. We're saying - it's
not. And our goal (accordingly) is to prove to everyone it's not
by helping everyone find fulfilling work.
Claude (msg 1547): That's it. And it's much stronger because it's
universal — not about curriculum, not about homeschooling, not
about education methodology. It's about the deepest assumption
most people carry:
> "Work isn't supposed to be fulfilling."
Your anti-theory flips it:
> "Work should be fulfilling — for everyone."
Simple. Almost naive-sounding. That's what makes it powerful —
everyone wants it to be true but has been trained to believe it's
unrealistic. [...] Let me update the vision doc.
Claude: [writes V3 of knowledge/philosophy/lofty-vision.md. Three-hit
reads: Impossible Goal — "Every child finds their calling, not
just the ones who can afford to search" (1 in 3 baseline).
Anti-Theory — "Work should be fulfilling — for everyone."
25-Year Pitch — "Learning journey is your resume, opportunities
find you."]
Aviv (msg 1550): Good. Let's add another phrasing to the anti-theory,
which is: fulfilled employees -> compliant employees. And also
"Work doesn't have to be [insert negative side of coin]"
Idea evaluation · Rubric design · Competitive research · Traction analysis · Subagent orchestration · Vision-fit filtering
What's in this transcript? Aviv runs the Breakout idea-evaluation loop:
Defines the idea-evaluation rubric — execution difficulty as advantage, saturation as disqualifier, vision-fit as gate.
Scores three candidate ideas against the rubric; PBL wins 20/21.
Commissions competitive research on PBL platforms (Recess, Prism, Prisma, Sora).
Reads competitor traction to separate demand-signal from market-exists-signal.
Kills the B2B-to-service-providers angle after it scores low on vision fit, then dispatches 4 parallel subagents on high-stakes parent decisions.
Stage 1 — Aviv defines the evaluation rubric. Aviv writes the criteria himself before any idea gets scored, anchoring on unfair-advantage categories and user empathy.
[claude-code · edtech session · 2026-03-19]
Aviv: Let's talk about our criteria so that we can evaluate ideas better.
First off, execution difficulty isn't a drawback, it's an advantage.
We want to build something other people will have a hard time
replicating, either because the tech is hard to implement (less
likely here) or because distribution takes time to build, or because
we built strong distribution partnerships, or because we understood
the user in a deeper way, or because we have a lofty vision, or
because we gained access to special/unique data.
Second, I'm cautious of things that are saturated. for example,
there are a bunch of portfolio tools out there. it's also easy to
create a portfolio with AI today. so, what service are we really
providing there? the creation of a portfolio is not a differentiated
product. we need to be asking ourselves, with user empathy - what
are these homeschoolers trying to achieve, what are they using now
to achieve it (if anything), and how do we help them achieve it.
Other criteria for our ideas:
- needs to adhere to our lofty vision.
- check off boxes in @"knowledge/Tom_Counterpartos/7 Categories of
Unfair Advantage.md".
Claude: [builds evaluation framework; scores PBL platform 20/21, passive
portfolio lower, college-guidance lower; writes
idea-evaluation-framework-2026-03-19.md]
Claude: PBL wins on every criterion — execution difficulty filters the
right way. Matching kids + facilitating collaboration + outcome
data is hard to replicate. An AI that organizes photos is not.
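A minimal sketch of how a rubric with these three rules could be applied in code. It assumes seven criteria scored 0-3 (which would explain the 21-point maximum), treats saturation as a hard disqualifier and vision fit as a gate, and scores execution difficulty upward. The criterion names, scores, and gate threshold are hypothetical illustrations, not the contents of idea-evaluation-framework-2026-03-19.md.

```python
from dataclasses import dataclass

MAX_PER_CRITERION = 3  # assumption: each criterion scored 0-3 (7 criteria -> 21 max)


@dataclass
class Idea:
    name: str
    scores: dict[str, int]   # criterion -> 0..MAX_PER_CRITERION
    saturated: bool = False  # saturation is a hard disqualifier, not a low score


def evaluate(idea: Idea, gate: str = "vision_fit", min_gate: int = 2) -> tuple[str, int]:
    """Return (verdict, total) for one idea.

    Harder-to-replicate ideas score HIGHER on execution difficulty, since
    difficulty counts as an advantage. Saturation disqualifies outright, and
    an idea below the vision-fit gate is rejected before its other scores count.
    """
    for criterion, score in idea.scores.items():
        if not 0 <= score <= MAX_PER_CRITERION:
            raise ValueError(f"{criterion}: score {score} out of range")
    if idea.saturated:
        return ("disqualified: saturated", 0)
    if idea.scores.get(gate, 0) < min_gate:
        return ("rejected: fails vision-fit gate", 0)
    return ("pass", sum(idea.scores.values()))


if __name__ == "__main__":
    # Hypothetical ideas and scores, for illustration only.
    ideas = [
        Idea("peer-project platform", {"execution_difficulty": 3, "vision_fit": 3, "user_need": 2}),
        Idea("portfolio generator", {"execution_difficulty": 1, "vision_fit": 2, "user_need": 2}, saturated=True),
    ]
    for idea in ideas:
        print(idea.name, evaluate(idea))
```

Ordering the checks this way means a saturated or off-vision idea never gets a misleadingly high total.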
Stage 2 — Competitive research on the space. Aviv names a starting competitor and pushes back when Claude over-distinguishes "PBL-first" vs. "social-first" positioning.
Aviv: let's do some competitive research for the PBL platform idea.
one product I'm aware of is: https://prism.guide/
Claude: [competitive landscape across Recess.gg, Prism.guide, Prisma,
Sora Schools — pricing $10-$12,490/yr, funding, traction,
feature overlap; writes
pbl-platform-competitive-landscape-2026-03-19.md]
Aviv: I don't get the distinction between pbl first and social first.
seems like they're just a better positioning for our idea
Claude: You're right to challenge that. Recess IS close to what we're
describing. [concedes the distinction is positioning, not
structural; lists 4 honest gaps that might matter]
Stage 3 — Reading traction to separate demand-signal from market-exists-signal. Aviv reframes weak competitor traction as a market-exists signal rather than a PMF signal, and asks how Harmony would position differently.
Aviv: First, I think it's important to acknowledge that these startups
don't have big traction. Recess got some funding a few years ago
but haven't demonstrated significant traction so far. It seems to
me that they pivoted from an AI tutor to this idea recently, no?
Can you look that up? ... they have SOME users, it doesn't seem
to be meaningful scale. They will probably fail to raise another
round at this rate, so we can count them "dead". the greater
concern is if their startup shows that there is no market for
this. at the same time, it validates that there is some demand
for this (can see parent reactions) - but the question is, are we
positioned to compete with them in some way? How would we
position differently?
Claude: [pulls PitchBook — Recess founded 2024, 3 employees, seed from
Avalanche VC + Bloomberg Beta + Stand Together, running FB ads,
Prism is solo/tiny pre-funding]
Claude: Neither has meaningful traction. Both are 1-3 person teams at
seed or pre-seed. They validate interest but not product-market
fit.
Stage 4 — Pressure-testing the framework, then killing a direction. Aviv tightens the "matching" criterion to require outcome stakes, then runs the B2B-service-provider angle through the rubric and kills it on vision-fit.
Aviv: It depends on the value of the matching. "better" matching
doesn't matter, unless it leads to better outcomes of some sort.
for adults, matching to a job is good if it means
easier interviews/higher chance of getting the job. same for the
employer. matching in dating is good because it's such an
important decision and there are so many people to choose from.
Claude: [concedes matching isn't high-stakes for PBL peer groups, is
higher-stakes for curriculum selection and credential-to-
opportunity matching]
Aviv: Let's add the LMS idea. Now let's evaluate another direction of
brainstorming. what if instead of servicing homeschool parents
directly, we serviced some service provider to homeschools as a
way in? [...] can we provide a service to them that uses AI to
streamline their work? What are such professions at present?
Claude: [researches homeschool evaluators, umbrella schools, ESA
administrators, compliance services, transcript services,
co-op administrators — pricing, scale, AI-opportunity]
Claude: B2B here is a distribution strategy, not a product strategy.
None of these ideas embody the vision. They score 11-14/21 on
the framework vs. PBL's 20/21. The useful takeaway is: partner
with umbrella schools and co-ops for distribution.
Aviv: I think curriculum choices can add up. Wrong choices are costly.
They cost you money AND time, and eventually potentially your
child's education if you really make bad decisions. Other
decisions that matter a lot - taking your kid out of school in
the first place. Spin up a team of agents to explore this more
deeply and identify possible paths forward.
Claude: [launches 4 parallel sub-agents: curriculum decision costs,
transition-to-homeschool decision, full high-stakes decision
inventory, where matching actually creates value]
Directs creative generation off a reference mood-board, with per-image direction.
Builds a Facebook-simulation Elo-tournament web app to rank the 29 generated ads.
Launches Campaign 02 through his Meta Marketing API wrapper, extending the CLI when it lacks the right event type.
Prods the AI through post-run analysis across Meta + Supabase.
Stage 0 — Defining the campaign. Aviv kicks off a campaign rethink, dispatches parallel subagent evaluations, then recaps the decisions into a brief.
Aviv (msg 4764): we pivoted @projects/homeschools/harmony. now, we need
to review our meta ad campaign and see if anything needs revisions. Take
a look at it and evaluate: 1. are the images still good or need revisions?
2. does the text need revisions? 3. research whether we should use the
existing campaign with adjustments (it already has learning data) or a
new campaign entirely. What are the considerations.
## spin up subagents per task.
## also evaluate anything else you think is important.
## Don't make any changes to the campaign yet.
Claude: [spins up 4 parallel subagents: (a) image evaluation against
new affiliate flow, (b) ad-copy survivability, (c) keep-vs-new-campaign
research, (d) campaign-structure audit (objective, audience, budget,
optimization event). Synthesizes into a consolidated recommendation.]
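The dispatch-then-synthesize pattern in this exchange is a fan-out/fan-in. A minimal sketch over a thread pool, assuming a hypothetical `run_subagent` stand-in; in Claude Code the subagents are launched by the agent runtime itself, not by user code like this. The four task briefs paraphrase the ones listed above.

```python
from concurrent.futures import ThreadPoolExecutor

# The four evaluations from the transcript, paraphrased as task briefs.
TASKS = {
    "images": "Evaluate ad images against the new affiliate flow",
    "copy": "Assess which ad copy survives the pivot",
    "campaign": "Research keeping the existing campaign vs. starting a new one",
    "structure": "Audit objective, audience, budget, optimization event",
}


def run_subagent(task_id: str, brief: str) -> str:
    # Hypothetical stand-in: a real version would hand the brief to an agent.
    return f"[{task_id}] findings for: {brief}"


def fan_out(tasks: dict[str, str], max_workers: int = 4) -> dict[str, str]:
    """Run every task concurrently and collect results keyed by task id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {tid: pool.submit(run_subagent, tid, brief) for tid, brief in tasks.items()}
        return {tid: fut.result() for tid, fut in futures.items()}


def synthesize(results: dict[str, str]) -> str:
    """Fan-in: merge per-task findings into one report (here, simply joined)."""
    return "\n".join(results[tid] for tid in sorted(results))


if __name__ == "__main__":
    print(synthesize(fan_out(TASKS)))
```

Keeping the tasks independent is what makes them safe to run in parallel; the synthesis step is where cross-task tradeoffs get weighed.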
Aviv (msg 4788): Lots of good insights here. Recapping my decisions:
1. Let's do a new campaign.
2. RE: the ads, I agree we should consolidate. Take a look at the
performance data of the ads and use that to decide which of the ads
to kill. Let's keep 5 ads, ideally top-performing and don't require
large changes to the image.
3. Add your idea (send an e-mail if people don't sign up) to the
harmony backlog.
4. implement the text pivots you mentioned for ad copy.
5. RE: mid-funnel event, we are already tracking events in supabase, so
we'll have that data internally to track where people fell off. we
also have google analytics.
6. We'll run the campaign at $100/day but we'll pause after $50 if no
meaningful results. Given the high CTR we saw in the first campaign,
I'm hoping that the improved landing page with free value will do
better.
7. I also want to AB-test sending people straight to the questionnaire
and skipping the landing page home screen entirely. Let's add that
to our backlog and remember to set up the ad campaign this way (I
think we can just send them to the different URL to do this).
8. No need to tweak the audience.
## Am I missing anything?
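Decision 6 is effectively a stop-loss rule. A minimal sketch, assuming "meaningful results" means at least one tracked conversion; that threshold is a hypothetical placeholder. Amounts are kept in cents because the Meta Marketing API expresses budgets in the account currency's minor units, which is why the launch command passes `--daily-budget 10000` for $100/day.

```python
DAILY_BUDGET_CENTS = 100 * 100  # $100/day in cents
STOP_LOSS_CENTS = 50 * 100      # pause after $50 ...
MIN_CONVERSIONS = 1             # ... unless results are "meaningful" (assumption)


def should_pause(spend_cents: int, conversions: int,
                 stop_loss_cents: int = STOP_LOSS_CENTS,
                 min_conversions: int = MIN_CONVERSIONS) -> bool:
    """True once the stop-loss has been spent without enough conversions."""
    return spend_cents >= stop_loss_cents and conversions < min_conversions


if __name__ == "__main__":
    print(should_pause(5200, 0))  # spent $52, no sign-ups
    print(should_pause(5200, 4))  # spent $52, some sign-ups
```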
Stage 1 — Creative generation from a curated reference board. Aviv assembles the reference folder himself and annotates each image with how much to reuse.
Aviv (msg 1446): okay, I added a bunch of references in
@projects/homeschools/harmony/ad-images/references\ . Review them and
let's brainstorm some ideas together. I also want to use some of these
for visual style or just use the image and add some text on top.
# [Image #1] good just for the checklist concept (replace the content,
reduce the number of items to be readable at a glance).
# [Image #2] awesome as-is. Keep the concept, replace the URL. Text
"there's no app for this" is great but need to communicate our own
value prop. Maybe "Build their dream curriculum with Harmony" or
something about finding kids' passions.
# [Image #3] like this visual style. Use as visual style (still need
text overlay).
# [Image #4] almost good as-is. Add our branding. Maybe add something
curriculum related to the post-its.
# [Image #5] like this format. Change to "7 ways to stop worrying
about curriculum".
# [Image #6] great. Change "mess" to "curriculum". Like the "Reset".
...
# [Image #10] Everyone loves her! Maybe "cool girls use Harmony".
Claude: [iterates across ~29 generated variants per Aviv's per-image
direction; organizes into review/ and keep/ subfolders]
Stage 2 — Build the ranking tool instead of eyeballing. Aviv specs a Facebook-simulation web app to rank the 29 generated ads via Elo.
Aviv (msg 1558): okay now to test these, I'd like us to "simulate" them,
the way they would appear on facebook. so I want you to simulate them for
me and create a little web app (simple) where I'm shown two ads at a time
and pick my favorite. as I pick, you aggregate the ratings into a ranking
of all the ads (which ones won out).
@"Screenshot 2026-03-27 140251.png"
Claude (msg 1561): [builds local Elo-ranking web app at localhost:8089]
- Shows two ads side-by-side in Facebook-feed format
- Click preferred → Elo scoring updates relative rankings
- 58 matchups across 29 ads (vs 406 possible pairs)
- Live leaderboard + Skip button
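For reference, a standard Elo update of the kind the app's scoring presumably resembles. K=32, a 1500 starting rating, and random pairing are assumptions; the transcript does not give the app's parameters, though scores clustering around 1500 on the leaderboard are consistent with that start. With 29 ads there are C(29, 2) = 406 distinct pairs, so 58 matchups cover roughly 14% of them.

```python
import math
import random

K_FACTOR = 32        # assumption: the app's K-factor is not stated
START_RATING = 1500  # assumption: conventional Elo starting point


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def record_pick(ratings: dict[str, float], winner: str, loser: str, k: float = K_FACTOR) -> None:
    """Update both ratings in place after the user picks `winner` over `loser`."""
    delta = k * (1 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta  # zero-sum: the loser gives up exactly what the winner gains


def random_pair(ads: list[str], rng: random.Random) -> tuple[str, str]:
    """Pick two distinct ads to show side by side (random pairing is an assumption)."""
    a, b = rng.sample(ads, 2)
    return a, b


def leaderboard(ratings: dict[str, float]) -> list[tuple[str, float]]:
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    ads = [f"ad-{i:02d}" for i in range(29)]
    print(math.comb(len(ads), 2))  # distinct pairs: 406
    ratings = {ad: float(START_RATING) for ad in ads}
    rng = random.Random(0)
    for _ in range(58):
        a, b = random_pair(ads, rng)
        record_pick(ratings, a, b)  # stand-in for the user's click
    print(leaderboard(ratings)[:5])
```

Elo suits partial pairings like this: an upset against a highly rated ad moves ratings more than an expected win, so a reasonable order can emerge without playing all 406 pairs.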
Aviv (msg 1621): Okay, keep the top 15 as finalists and sideline the
others in a "needs work" folder:
Ad Rankings
#    Ad                         W-L   Score
1    No App For This            6-0   1585
2    Corkboard Post-its         4-0   1561
3    Curriculum Reset           5-1   1559
4    Convention Meme            3-0   1548
5    7 Ways to Stop Worrying    4-1   1540
...
15   The Receipt                1-1   1496
[others sidelined to needs-work/]
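Sidelining by rank is a small file operation. A sketch, assuming each ad is a single image file named by a slug and that `needs-work/` sits inside the same folder; the folder layout, `.png` suffix, and slugs are hypothetical.

```python
import shutil
from pathlib import Path


def split_by_rank(ranking: list[str], keep_top: int = 15) -> tuple[list[str], list[str]]:
    """`ranking` is ordered best-first; return (finalists, sidelined)."""
    return ranking[:keep_top], ranking[keep_top:]


def sideline(folder: Path, sidelined: list[str], suffix: str = ".png") -> None:
    """Move each sidelined ad image into folder/needs-work/, skipping missing files."""
    target = folder / "needs-work"
    target.mkdir(exist_ok=True)
    for slug in sidelined:
        src = folder / f"{slug}{suffix}"
        if src.exists():
            shutil.move(str(src), str(target / src.name))
```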
Stage 3 — Launch Campaign 02 through the Meta Marketing API wrapper. Aviv directs Claude to launch via the meta-ads-api skill. Claude hits two CLI limitations and works around them, extending the CLI in place and then writing the remaining ads via direct API calls.
Aviv (msg 4806): 1. You can edit the text overlays with the nano banana
mcp. 2. custom conversion is verified. 3. note the CTA change in our ad
campaign brief. 4. once we're good to go, you can set up the new
campaign directly using our skill @.cursor/skills/meta-ads-api\
Claude: [nanobanana MCP edits text overlays on 4 images → uploads 5
image hashes via upload_creatives.py → dry-runs create_campaign.py]
Claude (msg 4818-4820): Dry run looks good. $100/day budget,
homeschool_parents targeting, OUTCOME_SALES objective, SIGN_UP CTA,
Facebook Feed + Story + Reels + Marketplace + Search placements.
... The script hardcodes `LEAD` as the custom event type. For
OUTCOME_SALES we need `COMPLETE_REGISTRATION`. Let me check and fix.
Claude (msg 4821-4825): [confirms meta_client supports custom_event_type
but create_campaign.py doesn't expose it; adds --custom-event-type CLI
arg, threads it through run_create() and create_adset(), updates the
dry-run output path]
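The fix described here, exposing a hardcoded value as a CLI flag and threading it down to the ad set, might look like the sketch below. `run_create()` and `create_adset()` are named in the transcript, but the signatures here are hypothetical, as are the `--pixel-id` and `--dry-run` flags; the sketch only builds the payload rather than calling the API. `promoted_object` with `pixel_id` and `custom_event_type` is the Marketing API field for conversion-optimized ad sets.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Create a Meta ad campaign (sketch)")
    p.add_argument("--name", required=True)
    p.add_argument("--daily-budget", type=int, required=True,
                   help="in cents, e.g. 10000 for $100/day")
    p.add_argument("--pixel-id", required=True)  # hypothetical flag
    # Previously hardcoded to LEAD; now overridable, with LEAD kept as the default.
    p.add_argument("--custom-event-type", default="LEAD",
                   choices=["LEAD", "COMPLETE_REGISTRATION", "PURCHASE"])
    p.add_argument("--dry-run", action="store_true")  # hypothetical flag
    return p


def create_adset(name: str, daily_budget: int, pixel_id: str, custom_event_type: str) -> dict:
    """Build (not send) the ad-set payload, with the event type threaded through."""
    return {
        "name": f"{name} - Ad Set",
        "daily_budget": daily_budget,
        "promoted_object": {"pixel_id": pixel_id, "custom_event_type": custom_event_type},
    }


def run_create(args: argparse.Namespace) -> dict:
    adset = create_adset(args.name, args.daily_budget, args.pixel_id, args.custom_event_type)
    if args.dry_run:
        print(adset)  # inspect before anything is created
    return adset
```

Keeping the old value as the default means existing invocations behave exactly as before.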
Claude (msg 4827-4828): Dry run looks correct — COMPLETE_REGISTRATION
event, pixel attached.
$ python scripts/meta_ads/create_campaign.py \
--name "Harmony Affiliate Pivot - Campaign 02" \
--objective OUTCOME_SALES \
--daily-budget 10000 \
--targeting-preset homeschool_parents \
--image-hashes "e-switch-convention-meme-v1,\
a-research-we-read-reviews-v2-affiliate,\
a-research-receipt-v3-affiliate,\
no-app-for-this-harmony-v2-affiliate,\
dancer-girl-curriculum-v2-affiliate" \
--primary-text "..." \
--headlines 'Before you spend $300 on the wrong curriculum again.' \
--link "https://www.howyougotthere.com" \
--cta SIGN_UP \
--custom-event-type COMPLETE_REGISTRATION
Created campaign: 6971719889352
Created ad set: 6971719900352
Created creative 1: ...
Created ad 1: ...
[Campaign 02 live — PAUSED]
Aviv (msg 4830): Not sure I follow why I need to add [the other 4 ads]
manually? We didn't have this issue in the past.
Claude (msg 4831-4834): You're right — Campaign 01 had 17 ads with
unique copy each. The CLI is the limitation, not the API. Let me add
the remaining 4 ads directly via the client. [writes the 4 ads via
meta_client, each with its own primary_text from the brief]
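Campaign 01's pattern of unique copy per ad maps onto one creative per ad, each with its own `object_story_spec`. A sketch that only builds the payloads, using the Marketing API's link-ad fields (`message` for primary text, `name` for headline). The page ID, image hashes, and copy in the usage are hypothetical placeholders, and the actual `meta_client` calls are not shown in the transcript.

```python
def creative_spec(page_id: str, image_hash: str, primary_text: str,
                  headline: str, link: str, cta: str = "SIGN_UP") -> dict:
    """One link-ad creative: `message` is the primary text, `name` the headline."""
    return {
        "object_story_spec": {
            "page_id": page_id,
            "link_data": {
                "image_hash": image_hash,
                "link": link,
                "message": primary_text,
                "name": headline,
                "call_to_action": {"type": cta, "value": {"link": link}},
            },
        }
    }


def build_ads(page_id: str, link: str, headline: str,
              ads: list[tuple[str, str]]) -> list[dict]:
    """`ads` is [(image_hash, primary_text), ...]; each ad keeps its own copy."""
    return [creative_spec(page_id, h, text, headline, link) for h, text in ads]


if __name__ == "__main__":
    specs = build_ads("PAGE_ID", "https://www.howyougotthere.com", "HEADLINE",
                      [("HASH_1", "copy one"), ("HASH_2", "copy two")])
    print(len(specs))
```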
Aviv (msg 4836): you made an error. you used the howyougotthere parents
pixel instead of the harmony landing page pixel.
Claude (msg 4837-4839): [confirms wrong pixel baked into promoted_object;
Meta doesn't allow updating promoted_object on existing ad set → creates
new ad set with correct pixel 924580337025807, reassigns ads]
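Because `promoted_object` can't be changed once the ad set exists, the cheap defense is a pre-flight check before creation. A minimal sketch. The guard function and the idea of pinning an expected pixel per landing page are hypothetical; 924580337025807 is taken from the transcript as the Harmony landing-page pixel.

```python
HARMONY_PIXEL_ID = "924580337025807"


class PixelMismatch(ValueError):
    pass


def check_promoted_object(adset_payload: dict, expected_pixel_id: str) -> None:
    """Fail fast before creation, since promoted_object can't be edited afterwards."""
    actual = adset_payload.get("promoted_object", {}).get("pixel_id")
    if actual != expected_pixel_id:
        raise PixelMismatch(
            f"ad set would use pixel {actual!r}, expected {expected_pixel_id!r}"
        )
```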
Stage 4 — AI-assisted analysis with Aviv prodding to verify. Aviv pauses the campaign and directs the analysis himself, specifying what to compare, triangulate, and filter.
Aviv (msg 4880): so I ran the campaign for the harmony pivot for a short
time. Results seem really promising. I paused it so we can take a look
at the data. I'd like you to pull the data using the meta ads API and
run an analysis. In addition, we have rich data in the supabase database
as well that I want to dig into.
## Meta
# Compare our CTR, CPI, etc. to our metrics from the last ad campaign
(should be saved in this repo).
# Understand drop-off patterns.
## Supabase
# Look at the funnel and triangulate with meta — can we see where people
drop off? Build a funnel with % and drop-off %.
# Analysis of behavior after signup — interactions? most engaged user?
Did people click the buy links at all?
# Note — some events in the database may belong to prior test users so
we should filter by authenticated user ids (I deleted all the test
users before the ad campaign went live).
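The Supabase side of this request comes down to counting distinct users per step after filtering to real accounts. A sketch, assuming events arrive as `(user_id, event_name)` rows already pulled from the database; the step names are hypothetical. "% of top" is relative to the first step and "drop-off %" to the previous one. Pre-signup steps, such as landing views matched against Meta clicks, would need an anonymous session key instead, which this sketch leaves out.

```python
from collections.abc import Iterable

STEPS = ["signup", "questionnaire_complete", "buy_link_click"]  # hypothetical names


def build_funnel(events: Iterable[tuple[str, str]], valid_user_ids: set[str],
                 steps: list[str] = STEPS) -> list[dict]:
    """Distinct users per step, counting only authenticated user ids.

    Counts each step independently; it does not enforce that a user
    completed earlier steps first.
    """
    users_per_step: dict[str, set[str]] = {s: set() for s in steps}
    for user_id, event in events:
        if user_id in valid_user_ids and event in users_per_step:
            users_per_step[event].add(user_id)  # drops deleted test users

    rows: list[dict] = []
    top = prev = None
    for step in steps:
        n = len(users_per_step[step])
        if top is None:
            top = n
        rows.append({
            "step": step,
            "users": n,
            "pct_of_top": 100 * n / top if top else 0.0,
            "drop_off_pct": 100 * (1 - n / prev) if prev else 0.0,
        })
        prev = n
    return rows
```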