GitHub Trending Top 10: The agent skills ecosystem splits (Jun 8–15)

Last week's thesis was skills-as-software becoming a distribution primitive. This week the ecosystem starts differentiating internally. Four of the ten entries are still agent skill repos, but the problems they solve are diverging: Addy Osmani's agent-skills encodes engineering discipline; taste-skill injects design taste; pm-skills systematizes product management; graphify maps codebases into queryable graphs. Meanwhile, NVIDIA shipped the first security scanner built specifically for skill files, and Block's goose agent graduated to the Linux Foundation. The remaining four entries — apple/container, markitdown, supervision, and tolaria — span container runtimes, document conversion, computer vision, and knowledge management.

Two entries that appeared on the trending page this week (markitdown, supervision) are continuing momentum from prior weeks. They show enough new development to include: markitdown's Azure Content Understanding integration landed on May 26, and supervision's v0.29 release candidate appeared June 11.

Rankings are by weekly star gain for the June 8–15 window. Total star counts are as of June 15.

#1 · addyosmani/agent-skills — ~59,500 stars · +10,445 this week

What it solves: AI coding agents default to the shortest path, which means skipping specs, tests, security reviews, and anything else that slows down the first working version. addyosmani/agent-skills — by Addy Osmani (Google Chrome engineering lead, author of Learning JavaScript Design Patterns) — is a library of 24 SKILL.md files encoding the practices that distinguish production code from demo code. The README's stated premise: "Skills encode the workflows, quality gates, and best practices that senior engineers use when building software. 'Seems right' is never sufficient." 1

Stack and approach: Shell 77.9% + JavaScript 22.1%, MIT license. 24 SKILL.md files covering 23 lifecycle stages plus one meta-skill, 4 agent personas, 7 slash commands (/spec, /plan, /build, /test, /review, /code-simplify, /ship), and 4 reference checklists. Each skill embeds Google engineering concepts explicitly: Hyrum's Law (API surface management), the Beyonce Rule (test coverage), Chesterton's Fence (before removing code), trunk-based development, and shift-left CI. Each skill also includes an anti-rationalization table — a list of reasons an agent commonly uses to skip the skill, with counter-arguments for each. Compatible with Claude Code (recommended), Cursor, Gemini CLI, Codex, Windsurf, OpenCode, GitHub Copilot, and Kiro. v0.6.2 shipped June 11. 1

Differentiation: The key research finding comes from an independent O'Reilly/SkillsBench study: curated skills raised AI agent task completion by 16.2% on average across 84 tasks; model-written (self-generated) skills showed no consistent benefit across any tested configuration. 2 The same study found gains vary sharply by domain — healthcare tasks improved by ~52%, software engineering by only 4.5%. The repo has 33 contributors and 56 open issues, including a June 12 issue alleging phantom star activity (#256, currently open with no official response). 3

Verdict: ⭐ Star it if you use Claude Code or any of the eight supported platforms and care about code quality discipline. The anti-rationalization tables alone are worth reading — they document exactly how agents wriggle out of quality gates. Don't over-interpret the 16.2% headline; the domain breakdown matters more than the average.

#2 · apple/container — ~37,000 stars · +10,021 this week

What it solves: Docker Desktop runs all your containers inside one shared Linux VM on macOS. That means processes in different containers share kernel attack surface, and a container compromise can reach neighbors. apple/container runs each Linux container in its own lightweight VM using Apple's Virtualization.framework, isolating them at the hypervisor boundary. 4

Stack and approach: Swift, Apache 2.0, Apple silicon required. Uses macOS-native primitives throughout: Virtualization.framework for VM management, vmnet for networking, XPC for inter-process communication, launchd for lifecycle, Keychain for credential storage. The project reached 1.0.0 on June 9 — its first birthday. 5 The headline v1.0.0 feature is container machine: a persistent Linux development environment (OCI image–based) that auto-shares your macOS username and home directory, supports systemd, and behaves more like WSL for Mac than a traditional Docker replacement. 5 TOML configuration files replace the previous UserDefaults approach; container cp for file transfer and structured JSON/YAML/TOML output were also added. macOS 26 is the target; macOS 15 runs but lacks network isolation and multi-network support.

Differentiation: Docker Compose is not supported — Issue #66 requesting docker.sock exposure was closed as not planned. 6 A third-party container-compose project exists as a workaround. Memory ballooning is incomplete: memory freed by Linux processes doesn't return to macOS, so memory-intensive multi-container workloads may need occasional VM restarts. Twelve-plus new issues opened June 11–14 report DNS failures, ECR push 401 errors, and host-to-container network breakage. 7 Colima supports Docker Compose and custom VMs today and has signaled future support for the Containerization backend — teams dependent on Compose should stay on Colima or Docker Desktop for now. 8

Verdict: ⭐ Star it if you're on Apple silicon and the per-container VM isolation matters for your security posture. The container machine feature is the more interesting path — it's a better Mac-native Linux dev environment than a Docker Desktop replacement. Hold off on production container workflows until the 1.0.0 post-release bug reports stabilize.

#3 · Leonxlnx/taste-skill — ~43,700 stars · +7,591 this week

What it solves: AI-generated UIs look identical because every model trained on the same SaaS template pool. The output is Inter everywhere, muted palettes, hero + features + pricing layout, and cards nested three levels deep. taste-skill is an agent skill that teaches the AI why the defaults are wrong and gives it adjustable constraints to produce something more differentiated. 9

Stack and approach: Shell 100%, MIT license, installed via npx skills add. The default taste-skill (currently v2 experimental) works by reading a brief, inferring a design language, then adjusting three calibration knobs: VARIANCE (1–10, how far from convention), MOTION (1–10, animation intensity), and DENSITY (1–10, content density). The skill includes a strict em-dash prohibition (em dashes are a detectable AI writing tic that bleeds into UI copywriting), a GSAP animation skeleton, and a re-designed audit protocol. The repo ships ten specialized variants beyond the default: gpt-taste (strict GPT/Codex mode), image-to-code (screenshot → analysis → code), redesign (refactoring existing UIs), soft-skill, minimalist-skill, brutalist-skill, output-skill, stitch-skill (Google Stitch–compatible), plus brandkit. Three image generation skills for visual mock-ups round it out. 107 commits, 3,100 forks. 9

Differentiation: Taste-skill and impeccable (entry #6 last week) solve the same problem at different scopes. Taste-skill provides behavioral presets — dial settings that produce a specific aesthetic category. Impeccable is a design vocabulary that explains principles. Taste-skill's author describes the FAQ differentiation: "Multiple specialized variants, adjustable dials in key skills, anti-repetition rules informed by dedicated research." 9 The v2 rewrite is experimental and actively iterating (not yet stable). The author background is thin — no public professional profile, contact via [email protected] and X/@lexnlin.

taste-skill Floria example — dark editorial UI with botanical product photography and serif headlines, generated by taste-skill v2 — taste-skill Floria example: dark editorial layout with serif headlines and product photography — not the standard SaaS template 9

Verdict: ⭐ Star it if you build frontend UIs with AI agents. The variant collection is the main value — install the base skill and reach for redesign or image-to-code as the workflow calls for them. Use taste-skill-v1 for production if you need stability while v2 experimental settles.

#4 · microsoft/markitdown — ~153,000 stars · +6,280 this week

What it solves: Feeding documents to LLMs or RAG pipelines requires clean, structured text — not PDF binary blobs or DOCX XML. markitdown is a Python library that converts 25+ file formats (PDF, DOCX, PPTX, XLSX, HTML, CSV, JSON, XML, ZIP, EPUB, YouTube URLs, images, audio) to Markdown with a single convert() call, designed specifically for LLM ingestion rather than human-readable output. 10

Stack and approach: Python, MIT license, built by the Microsoft AutoGen team. Functions as a glue layer over mammoth (DOCX), pandas (tabular), python-pptx, pdfminer.six, and BeautifulSoup. v0.1.6 (May 26) added Azure Content Understanding integration — the only path to video conversion, and a higher-quality cloud option for audio transcription and structured field extraction (invoice numbers, contract terms as YAML front matter). An optional OpenAI client parameter enables GPT-4o vision descriptions of images found in documents. Currently 153K total stars (global rank ~#49), 10.6K forks, 309 commits. 11

Differentiation: The honest limitation: markitdown's PDF conversion ranks 11th out of 12 comparable tools on the OpenDataLoader Benchmark (total score 0.589/1.0), with a heading-level detection score of 0.000 — it cannot distinguish headings from body text in PDFs. 12 The default PDF backend (pdfminer.six) does text-stream extraction only, no layout analysis. Word, Excel, and PowerPoint conversions are genuinely solid. The LLBBL blog framing is accurate: "Pandoc is for publishing. MarkItDown is for feeding AI." 13 They aren't competing — Pandoc converts for human consumption (40+ output formats), markitdown converts for LLM consumption (Markdown only). 13

Verdict: ⭐ Star it for Office document conversion in LLM pipelines — DOCX, PPTX, and XLSX handling is reliable. Skip it as your primary PDF extraction tool; use pymupdf or marker-pdf instead. The 153K star count reflects the problem's universality more than the solution's completeness.

#5 · safishamsi/graphify — ~67,200 stars · +5,478 this week

What it solves: When an AI coding assistant doesn't understand how a codebase is structured, it reads files sequentially — burning context tokens re-parsing the same imports and relationships on every query. graphify runs /graphify . on any project folder and outputs a queryable knowledge graph (graph.html + GRAPH_REPORT.md + graph.json), so the agent navigates the codebase by querying nodes and edges rather than grepping files. 14

Stack and approach: Python (PyPI package graphifyy), 36 tree-sitter language grammars, NetworkX for graph construction, Leiden algorithm for community detection. Two-pass processing: Pass 1 is local AST extraction via tree-sitter (zero token cost), Pass 2 is AI semantic extraction applied only to non-code files like PDFs, images, and Markdown. Each graph edge carries a confidence label — EXTRACTED (1.0), INFERRED (0.7–0.9), or AMBIGUOUS (<0.7). Optional extensions add MCP server, Neo4j/FalkorDB export, Ollama local inference, and faster-whisper audio transcription. git hook integration auto-rebuilds the graph on commit. 14 YC S26 company (Graphify Labs), 91 contributors, 748 commits, 156 open issues. 15

Differentiation: A dev.to benchmark compared graphify against code-review-graph (CRG) on a large monorepo. Graphify needed 2 actual code lookups for targeted feature work; CRG needed 36 tool calls. CRG's advantage was speed (0.425s incremental updates vs. graphify's ~10s) and SQLite storage with semantic search. 16 The week's active issue queue shows real edge cases: Go cross-package calls (#1313), TypeScript workspace imports (#1308), PowerShell modules not indexed (#1315), injected field call edges missing (#1316). Community engagement is high but so is unresolved surface area. 15

Verdict: ⭐ Star it for large monorepos where AI agents repeatedly re-scan the same structural relationships. The tree-sitter pass is genuinely zero-cost and the Leiden community detection produces readable module clusters. Test incremental rebuild latency against your codebase size before committing — the ~10s rebuild may be acceptable on a post-commit hook, tight on a pre-save loop.

#6 · NVIDIA/SkillSpector — ~5,300 stars · +3,669 this week

What it solves: Agent skills are code that executes inside your AI agent's runtime. A malicious or poorly written skill can inject instructions, exfiltrate environment variables, escalate privileges, or poison the agent's memory. SkillSpector scans skill files — any SKILL.md, Python scripts, config YAMLs, or full skill repos — before you install them, scoring the risk on a 0–100 scale. 17

NVIDIA SkillSpector — security scanner for AI agent skills, showing risk scoring, key detection categories, and SARIF/CI integration — NVIDIA SkillSpector: 0–100 risk scoring across 64 patterns in 16 categories 17

Stack and approach: Python (97.3%) + YARA (2.2%), Apache 2.0. Two-stage detection: Stage 1 runs fast static analysis (regex + AST parsing) covering 64 vulnerability patterns across 16 risk categories including prompt injection, data exfiltration, privilege escalation, supply chain risks, excessive agency, memory poisoning, tool misuse, rogue agent behavior, and MCP tool poisoning. Stage 2 is optional LLM semantic analysis that raises precision to ~87%. 17 CVE lookups run against OSV.dev in real time (with offline fallback). Accepts git repos, URLs, zips, directories, or single files. Output formats: terminal, JSON, Markdown, SARIF — SARIF integrates directly with GitHub Code Scanning. 18 Based on Liu et al. (2026) research analyzing 42,447 skills: 26.1% contained at least one exploitable vulnerability; 5.2% showed malicious intent; skills containing executable scripts were 2.12× more likely to have vulnerabilities. 17

Differentiation: SkillSpector is the first tool targeting agent-specific risks that traditional SAST tools miss: hidden instructions, trigger abuse, and MCP tool poisoning don't appear in conventional vulnerability scanners. 19 The caveats are significant: 23 commits total, zero formal releases, no published CI/CD pipeline (Issue #58 just requested one from the community). 20 An open issue (#47) flags possible phantom star activity — treat the star trajectory as uncertain. No independent false-positive/false-negative rate has been published.

Verdict: ⭐ Star it to track the space and get early access to the SARIF integration — running skillspector scan before installing any community skill is low-friction audit hygiene. Don't gate production CI/CD on it yet: zero formal releases means the API surface can change without notice.

#7 · refactoringhq/tolaria — ~16,200 stars · +3,592 this week

What it solves: Notion and Obsidian store notes as proprietary formats or behind cloud sync dependencies. Tolaria is a desktop Markdown knowledge base where every note is a plain .md file, every vault is a git repository, and the whole thing runs offline — no account, no subscription, no export step. The AI integration is direct: a built-in MCP server lets Claude Code, Codex CLI, and Gemini CLI read and write vault notes directly. 21

Tolaria desktop app in light mode — showing file browser, block editor, and properties panel — Tolaria's three-panel layout: inbox/note list (left), block editor (center), note properties (right) 22

Stack and approach: Tauri 2 + React + TypeScript (70.3%) + Rust (13.7%), AGPL-3.0, by Luca Rossi (Refactoring.fm, 170K newsletter subscribers). Cross-platform: macOS (Intel + Apple Silicon), Windows x64, Linux x64. Install: brew install --cask tolaria. Each vault is a git repo with built-in diff visualization. The app opinionates note organization: typed notes (Essays, Projects, People, etc.), relationships between notes, wikilinks, a whiteboard view, and AI chat sidebar. v2026-06-14 stable shipped June 14, with menu/tooltip clarity improvements, IME punctuation fixes, and cross-platform path resolution. 3,095 commits, 1,284 releases (including Alpha builds). 23 The Show HN post gathered 318 points and 143 comments. 24

Differentiation: The git-first design is the real differentiator over Obsidian: every note change is a commit, branching works, and team knowledge bases can be shared via standard git workflows. Obsidian's Sync service costs extra and uses its own sync protocol. The AGPL-3.0 license means any cloud service built on Tolaria must open-source its modifications — a meaningful constraint if you plan to build a hosted offering. Active bugs at launch: create_note false-success reports, Windows wikilink path failures, macOS refresh loops (all reported June 11–14). 25

Verdict: ⭐ Star it if you want a self-hosted, git-backed knowledge base where your AI agents can read and write directly. The MCP integration is the practical differentiator for developers already using Claude Code or Codex. Wait on Windows adoption until the path resolution bugs clear.

#8 · roboflow/supervision — ~44,200 stars · +3,315 this week

What it solves: Every computer vision project re-implements the same scaffolding: loading detections, drawing bounding boxes, tracking objects across frames, evaluating mAP, managing datasets. supervision is a model-agnostic Python toolkit that handles all of this — you connect your model of choice, and supervision provides the reusable infrastructure. 26

Stack and approach: Python, MIT license, Roboflow. Compatible with Ultralytics YOLO, Transformers, MMDetection, RF-DETR, and any framework that outputs detections. Four modules: annotators (customizable bounding box, label, track path renderers), datasets (load/split/merge COCO/YOLO/Pascal VOC formats), trackers (ByteTrack, SORT, BotSort), metrics (mAP, mAR). v0.28.0 (April 30) added CompactMask — the same 28 instance segmentation masks that previously consumed ~55MB now use ~237KB (a 240× reduction) via RLE encoding on tight bounding-box crops. 27 SAM3 text-prompt segmentation support also landed in v0.28.0. v0.29.0rc0 appeared June 11 as a release candidate. 4,894 commits, 3,900 forks. 26

Differentiation: Supervision's position against OpenCV is complementary rather than competing — OpenCV handles image I/O and processing; supervision handles the higher-level detection/tracking/annotation layer above it. Against Ultralytics directly: Ultralytics bundles its own annotation utilities, but they're YOLO-specific. Supervision is framework-agnostic. The risk: Roboflow is a commercial company and has historically pushed its cloud platform through supervision. The library itself is MIT and works fully offline, but tutorials and tooling nudge toward Roboflow's paid services. 28

Verdict: ⭐ Star it if you write any computer vision code. The CompactMask optimization alone is worth it for any project running instance segmentation at scale. Wait for v0.29 stable before upgrading from v0.28 in production — the RC appeared June 11 and the stable release is imminent.

#9 · aaif-goose/goose — ~49,400 stars · +2,165 this week

What it solves: Most AI coding agents are tied to a single model and a single use case. goose (by Block — the Jack Dorsey company behind Square and Cash App) is a general-purpose AI agent that handles code, research, data analysis, and automation in one session. The organizational news this week: goose completed its migration from block/goose to the Linux Foundation's Agentic AI Foundation (AAIF), alongside Anthropic's MCP and OpenAI's AGENTS.md as the three founding projects. 29

Stack and approach: Rust (64.3%) + TypeScript (29.1%), Apache 2.0. Desktop app + CLI + API. 15+ LLM providers (Anthropic, OpenAI, Google, Ollama, Azure, Bedrock, OpenRouter, and others), 70+ MCP extensions. Recipes: shareable YAML workflow definitions that package multi-step agent tasks. v1.37.0 shipped June 3, adding xAI SuperGrok OAuth, Alibaba Qwen (DashScope), Perplexity declarative provider, a TUI command-line interface, /goal self-evaluation command, Hooks system, and Russian/Turkish language support. 30 Block reports 60% of its 12,000 employees use goose weekly, citing 50–75% development time savings on applicable tasks. 31

Differentiation: Tessera's assessment of the AAIF migration: "The joke is weak. The bet behind it is not." 32 The structural argument for the move — open-source agent infrastructure shouldn't be owned by one company — is sound. The identified risks: Rust is a thin ecosystem for AI tooling (most Python libraries need bridging), 15+ provider support risks lowest-common-denominator feature implementation, and AAIF governance is unproven. 32 Against Claude Code (which scores 80.9% on SWE-bench with Opus 4.5): goose doesn't win on raw accuracy, but delivers free + open-source + model-agnostic + 3,000+ MCP servers + Recipes + local offline capability as a bundle no commercial option matches. 33

Verdict: ⭐ Star it if you want a free, model-agnostic agent that can handle work outside pure coding — cross-service workflows involving cloud infra, databases, APIs, and code together in one session. The AAIF migration is table stakes for long-term trust; the real indicator to watch is whether non-Block contributors materially increase in the next two quarters.

What it solves: Asking an AI for product management help produces generic output — inconsistent formats, missing frameworks, and no institutional context. pm-skills is a library of 66 agent skill files covering the complete PM lifecycle from discovery through iteration, built by Jonathan Prisant (Product on Purpose). The pitch: "Stop prompt-fumbling. Start shipping. Every time you ask an AI to help with product management, you start from zero." 34

Stack and approach: Node.js, Apache 2.0, npm-published, with an Astro Starlight documentation site. 66 skills organized across the Triple Diamond: 30 phase skills (Discover/Define/Develop/Deliver/Measure/Iterate), 8 foundation skills, 10 utility skills, 15 tool skills (Foundation Sprint + Design Sprint), and one standalone. Additional infrastructure: 4 sub-agents (pm-critic, pm-skill-auditor, pm-changelog-curator, pm-release-conductor), 10 /workflow-* orchestration commands, 95+ real-world sample outputs, and a /chain temporary skill-chain runner. v2.26.0 is current, marking the end of a "quality-convergence" pass: all 26 original-generation skills now carry "When NOT to Use" boundary pointers and enumerated output contracts. 34 6 contributors, 716 commits.

Differentiation: A note on the data: pm-skills has 314 actual GitHub stars — the trending page figure overstates it significantly, likely conflating the repo with the broader topic. Use the 314 number. The engineering maturity here exceeds what the star count implies: CI enforces parity checking, cross-reference validation, em-dash scanning (em dashes are a known AI output artifact), and version badge consistency on every PR. The MCP server (pm-skills-mcp) entered maintenance mode on May 4 — security patches only, v2.9.x line. 34 pm-skills is currently the only agent skill collection covering the full PM lifecycle end-to-end; single-point PM tools exist but nothing with this scope.

Verdict: ⭐ Star it if you use AI agents for product work. The 66 skills is actually usable rather than overwhelming — the Triple Diamond organization makes it easy to reach for the right skill at the right phase. The /chain runner is the highest-leverage new feature for running multi-skill PM workflows without manually sequencing commands.

Three patterns this week

The agent skills ecosystem is adding a security layer. Last week the question was "which skills should I install?" This week NVIDIA shipped a tool for answering "is it safe to install these skills at all?" SkillSpector's underlying research finding — 26.1% of 42,447 analyzed skills contain exploitable vulnerabilities — is the kind of number that should change how teams treat community skill repos. The practical posture right now: run static analysis on any skill from outside your organization before installation, and treat executable script inclusion as a risk multiplier.

Open-source infrastructure is consolidating under neutral governance. goose's Linux Foundation move mirrors what happened to Kubernetes, Linux, and other infrastructure projects that grew too important to remain under any single company's control. AAIF now holds goose, MCP, and AGENTS.md — three of the most load-bearing standards in the current agent stack. Whether AAIF governance produces more innovation than Block alone could is genuinely unknown; the precedent from similar moves is mixed. Track the contributor graph, not the announcement.

The file-first, git-native pattern is spreading beyond code. apple/container runs each container in an isolated VM (not a shared VM), tolaria stores every note as a plain Markdown file (not proprietary format), and graphify exports the knowledge graph as graph.html + graph.json (inspectable text files). These aren't coincidences — they're converging on a preference for durable, version-controllable, human-readable outputs over platform-locked abstractions. The AI-native angle is practical: plain files can be read by any model without format translation.

Cover image: apple/container 1.0.0 — GitHub repository social preview. 4