Phase 3+4 Completion Guide — Squish v9.0.0¶
Overview¶
Phase 3+4 requires hardware validation and community publication. This guide covers all remaining tasks.
Phase 3: Hardware Validation (Requires M-series Mac)¶
3.1 End-to-End Performance Benchmark (bench_eoe.py)¶
Goal: Measure real TTFT (time to first token) and throughput on live Squish server
Files:
- Script: dev/benchmarks/bench_eoe.py (ready to run)
- Output: results/eoe_${DATE}.json
- Destination: README (cold load metrics) + paper.md Section 4.1
Steps:
-
Start the Squish server:
-
Run the benchmark:
-
Expected output:
───────────────────────────────────────────────────────── Squish (squish) ───────────────────────────────────────────────────────── run 01 234.5 tok/s TTFT 47ms 256 toks run 02 245.2 tok/s TTFT 42ms 256 toks ... mean TTFT 43 ms ± 4 ms mean throughput 240.1 tok/s ± 5.8 best TTFT 42 ms peak throughput 245.2 tok/s -
Optional: Compare vs Ollama
-
Update documentation:
- Extract
mean TTFT,mean throughput,peak throughputfrom JSON - Update README.md Section 2.1 (cold start metrics)
- Update docs/paper.md Section 4.1 "Inference Performance"
- Commit:
git add results/eoe_*.json README.md docs/paper.md && git commit -m "benchmark: add real hardware TTFT/tok-s from bench_eoe.py"
3.2 MMLU Evaluation (lm-evaluation-harness)¶
Goal: Validate accuracy on MMLU (14,042 questions) using INT8-quantized weights
Files:
- Script: lm_eval CLI (install: pip install lm-eval)
- Output: accuracy metrics per subject and overall
- Destination: docs/RESULTS.md + paper.md Section 4.2
Steps:
-
Install lm-evaluation-harness:
-
Run MMLU evaluation:
(This may take 30–60 minutes on M-series hardware)
-
Expected output format:
-
Update documentation:
- Extract accuracy and per-subject scores
- Format as table in docs/RESULTS.md
- Add row to paper.md Section 4.2 "Accuracy Validation"
- Commit:
git add results/mmlu_*.json docs/RESULTS.md docs/paper.md && git commit -m "benchmark: add MMLU eval results (14042 questions, INT8)"
Phase 4: Community & Publication (Can Start Now)¶
4.1 Publish Pre-Squished Weights to Hugging Face¶
Goal: Publish ZIP with squish_weights.safetensors so users skip the one-time conversion
Prerequisites:
1. HuggingFace account (free, huggingface.co)
2. Write-access API token from https://huggingface.co/settings/tokens
3. Squish model already pre-compressed: ~/.cache/squish/Qwen2.5-1.5B-Instruct/squish_weights.safetensors (from squish pull)
Steps:
- Create HF token:
- Visit https://huggingface.co/settings/tokens
- Click "New token" → enter name "squish-publish" → select "Write" access → Create
-
Copy the token
-
Set environment variable:
-
Run publish script (example for Qwen2.5-1.5B):
-
Expected output:
Repository : squish-community/Qwen2.5-1.5B-Instruct-Int8 Model dir : ~/.cache/squish/Qwen2.5-1.5B-Instruct-Int8 Files : 4 Total size : 1.2 GB squish_weights.safetensors 1.1 GB tokenizer.json 2.3 MB ... Repo ready: https://huggingface.co/squish-community/Qwen2.5-1.5B-Instruct-Int8 Committed : https://huggingface.co/squish-community/.../commit/... Model page : https://huggingface.co/squish-community/Qwen2.5-1.5B-Instruct-Int8 Pull with : squish pull squish-community/Qwen2.5-1.5B-Instruct-Int8 -
Repeat for each pre-squished model (if multiple):
- Qwen2.5-1.5B-Instruct
- Llama-3.2-3B
-
Other key models
-
Document in README – add HF hub models section: Update README.md with links to published models
4.2 Create GitHub Release¶
✅ Already completed! (v9.0.0 released on 2026-03-12)
Release notes: https://github.com/wesleyscholl/squish/releases/tag/v9.0.0
4.3 Community Posts¶
Files: dev/community_posts.md (templates ready)
Platforms & Timing:
| Platform | When | Template | Est. Traffic |
|---|---|---|---|
| Hacker News | Day 1, 9–10 AM PT (Tue–Thu) | community_posts.md → HN section |
500–2K upvotes = 10K–50K views |
| Reddit r/LocalLLaMA | Day 1, 9–10 AM PT (same time as HN) | community_posts.md → Reddit section |
100–500 upvotes = 2K–10K views |
| Twitter/X | Day 1, staggered 2–3h after HN | community_posts.md → Tweet threads |
5K–50K impressions |
| Day 1–2, midday (professional hours) | community_posts.md → LinkedIn section |
500–5K impressions |
HN Submission:
1. Go to https://news.ycombinator.com/submit
2. Paste: https://github.com/wesleyscholl/squish/releases/tag/v9.0.0
3. Title: Squish – 54× faster local LLM cold-start on Apple Silicon (222 modules)
4. Submit
5. Monitor comment threads; respond to technical questions
Reddit Submission:
1. Go to https://reddit.com/r/LocalLLaMA
2. Click "Create Post"
3. Use template from community_posts.md → Reddit section
4. Engage in comments
Twitter/X Threads:
1. Post as thread (not individual tweets)
2. Space replies 30 min apart
3. Use templates from community_posts.md → Twitter section
4. Tag relevant accounts: @openai, @AnthropicAI, @ollama_ai, key ML researchers
4.4 arXiv Submission¶
Goal: Submit formal paper to arXiv.org
Files: - Source (Markdown): docs/paper.md (ready, just needs real numbers) - Output format: PDF (via LaTeX → pdflatex)
Steps:
- Fill in real numbers (from Phase 3):
- Section 4.1: TTFT from bench_eoe.py
- Section 4.2: Accuracy from MMLU eval
-
Any revised benchmark tables
-
Convert Markdown → LaTeX (use Pandoc):
-
Create LaTeX preamble (
docs/preamble.tex): (See template below) -
Compile to PDF:
-
Upload to arXiv:
- Visit https://arxiv.org/submit
- Create account (free)
- Upload
squish_paper.pdf - Fill metadata:
- Title: Squish: Sub-Second Model Loading and Modular Inference Optimisation for Apple Silicon
- Authors: Wesley Scholl (Independent Research)
- Abstract: (from docs/paper.md intro)
- Categories: cs.AI, cs.LG, cs.DC
- Submit
-
arXiv will assign ID (e.g.,
2403.12345) within 24h -
Update docs and README:
LaTeX Preamble Template (docs/preamble.tex)¶
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsfonts}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{siunitx}
\usepackage[margin=1in]{geometry}
\lstset{
language=python,
basicstyle=\ttfamily\small,
keywordstyle=\color{blue},
commentstyle=\color{gray},
stringstyle=\color{red},
breaklines=true,
breakatwhitespace=true,
showstringspaces=false,
}
\title{Squish: Sub-Second Model Loading and Modular Inference Optimisation for Apple Silicon}
\author{Wesley Scholl \\ Independent Research}
\date{\today}
\begin{document}
\maketitle
Checkoff Checklist¶
Phase 3: Hardware Validation¶
- [ ] Run bench_eoe.py (5+ runs), collect TTFT/tok-s metrics
- [ ] Update README.md with real cold-load times
- [ ] Run MMLU eval (14042 questions), collect accuracy
- [ ] Update docs/RESULTS.md with accuracy table
- [ ] Update paper.md Sections 4.1–4.2 with real numbers
- [ ] Commit:
git commit -m "benchmark: Phase 3 hardware validation complete"
Phase 4: Community & Publication¶
- [ ] Publish 1–3 pre-squished models to HuggingFace Hub
- [ ] GitHub release v9.0.0 ✅ (done)
- [ ] Post to Hacker News (monitor for 48h)
- [ ] Post to r/LocalLLaMA (engage in comments)
- [ ] Post tweet threads on Twitter/X (stagger over 3h)
- [ ] Post on LinkedIn (optional, for professional reach)
- [ ] Convert docs/paper.md → LaTeX via Pandoc
- [ ] Submit to arXiv.org
- [ ] Update README with arXiv link
- [ ] Commit:
git commit -m "papers: submit to arXiv [XXXX.XXXXX], announce v9.0.0"
Expected Timeline¶
| Task | Duration | Start | End |
|---|---|---|---|
| bench_eoe.py | 15 min (5 runs) | Day 1 | Day 1 |
| MMLU eval | 45 min – 1h | Day 1 | Day 1–2 |
| Publish HF weights | 10 min per model | Day 1–2 | Day 2 |
| HN + Reddit + Twitter | 30 min posting + 24h+ monitoring | Day 1 | Day 2+ |
| arXiv conversion + submission | 30 min | Day 2 | Day 2 |
| Total | 2–3 hours active work | Day 1 | Day 2 |
Support & Troubleshooting¶
bench_eoe.py not connecting?
- Verify server is running: curl http://localhost:11435/health
- Check port matches --squish-port flag
- Increase timeout: --timeout 120
MMLU eval slow?
- Reduce --limit 100 for quick test
- Monitor GPU with powermetrics (macOS):
HF token issues?
- Verify token has "write" permission (not "read" only)
- Test: huggingface-cli login (interactive)
arXiv submission rejected? - PDF must be <100 MB (compress images if needed) - Title + abstract must be <10KB each - See arXiv submission guidelines: https://arxiv.org/help/submit
Key Contacts & Links¶
- GitHub Issues: https://github.com/wesleyscholl/squish/issues
- HuggingFace Model Hub: https://huggingface.co/squish-community
- arXiv Submissions: https://arxiv.org/submit
- Hacker News: https://news.ycombinator.com/
- Reddit r/LocalLLaMA: https://reddit.com/r/LocalLLaMA
- Twitter/X: https://twitter.com/
Sample Results to Expect¶
bench_eoe.py (M2 Pro, Qwen2.5-1.5B)¶
MMLU (INT8 Squished)¶
HF Hub Upload¶
Enjoy the launch! 🚀