# Adding llms.txt to Your Astro Blog

## What is llms.txt?
The llms.txt specification proposes a standard location for LLM-readable content. Think of it like robots.txt for crawlers or sitemap.xml for search engines, but designed for AI agents.
The problem it solves: when an AI agent visits your website, it has to parse HTML, navigate around headers, footers, and sidebars, and extract the actual content. This wastes tokens and often produces messy results.
The solution: provide a clean, structured text file at a known location. Agents fetch /llms.txt, get a table of contents, and can request individual pieces of content in plain markdown.
## The Architecture
We’ll build three endpoints that work together:
- `/llms.txt` → Index: "Here's what I have"
- `/llms-full.txt` → Everything: "Here's all of it at once"
- `/llms/[slug].txt` → Individual: "Here's just this one post"

Why three? Different agents have different needs:
- A quick lookup might only need the index to find one relevant post
- A RAG system might want everything in one request
- A focused query might want just one article without the overhead of the full dump
## File Structure
Here’s where everything lives in your Astro project:
```
src/
├── utils/
│   └── llms.ts            # All the generation logic
├── pages/
│   ├── llms.txt.ts        # Index endpoint
│   ├── llms-full.txt.ts   # Full content endpoint
│   └── llms/
│       └── [slug].txt.ts  # Per-post endpoints (dynamic route)
```

The `utils/llms.ts` file contains all the logic. The page files are thin wrappers that call into it. This separation keeps the endpoints clean and the logic testable.
## Prerequisites
Before we start, you’ll need these project-specific pieces:
- `siteConfig`: an object with `name`, `description`, `url`, and `author` properties
- `getAllPosts()`: a function that returns your content collection posts
- `BlogPost`: the type from Astro's content collections with `slug`, `body`, and `data`
The complete gist shows the full implementation with all type definitions.
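If your project doesn't already have these, here is a minimal sketch of the assumed shapes. The file paths, field names, and the empty `getAllPosts()` body are placeholders, not part of the original implementation; adapt them to your own setup:

```typescript
// Hypothetical src/site-config.ts
export const siteConfig = {
  name: "My Blog",
  description: "Notes on web development",
  url: "https://example.com",
  author: "Jane Doe",
};

// A minimal BlogPost shape matching what the utilities in this post expect
export interface BlogPost {
  slug: string;
  body?: string;
  data: {
    title: string;
    description: string;
    pubDate: Date;
    category: string;
  };
}

// Hypothetical src/utils/posts.ts -- in a real project this would wrap
// Astro's getCollection() and sort by date
export async function getAllPosts(): Promise<BlogPost[]> {
  return []; // placeholder
}
```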
## Part 1: Type Definitions
Let’s start by defining the shapes of our data. Good types make the rest of the code self-documenting.
```ts
// src/utils/llms.ts

// Basic item for the index - just enough to create a link
interface LlmsItem {
  title: string;
  description: string;
  link: string;
}

// Extended item for full content - includes the actual post data
interface LlmsFullItem extends LlmsItem {
  pubDate: Date;
  category: string;
  body: string;
}
```

Why two types? The index only needs titles and links. The full content dump needs everything. By extending `LlmsItem`, we ensure consistency while allowing the richer type where needed.
Now the configuration types for each generator:
```ts
// Config for the index endpoint
interface LlmsTxtConfig {
  name: string;
  description: string;
  site: string;
  items: LlmsItem[];
  optional?: LlmsItem[]; // Links that agents can skip if context is tight
}

// Config for the full content endpoint
interface LlmsFullTxtConfig {
  name: string;
  description: string;
  author: string;
  site: string;
  items: LlmsFullItem[];
}

// Config for individual post endpoints
interface LlmsPostConfig {
  post: BlogPost;
  site: string;
  link: string;
}
```

The `optional` field in `LlmsTxtConfig` is part of the spec. It signals to agents: "these links are nice-to-have, skip them if you're running low on context window."
## Part 2: The Document Builder
Every endpoint needs to return a plain text Response. Instead of repeating this logic, we create one builder that handles it all:
```ts
function doc(...sections: (string | string[])[]): Response {
  const content = sections
    .flat() // Flatten nested arrays
    .join("\n") // Join with newlines
    .replace(/\n{3,}/g, "\n\n") // Normalize multiple blank lines to just one
    .trim(); // Clean up edges
  return new Response(content + "\n", {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Why rest parameters with arrays? This lets us compose documents flexibly:
```ts
// These all work:
doc("# Title", "Some text");
doc(["# Title", "", "Some text"]);
doc(headerArray, bodyArray, footerArray);
```

Why normalize newlines? When composing from multiple arrays, you might accidentally get three or four blank lines in a row. The regex `/\n{3,}/g` catches any run of 3+ newlines and replaces it with exactly 2 (one blank line). Clean output, no matter how messy the input.
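To see what the normalization does, here's the same chain of string operations applied to a deliberately messy input. This is a standalone sketch of the pipeline, not the `doc()` function itself, so the Response wrapper is omitted:

```typescript
// Three composed sections that accidentally produce a run of blank lines
const content = [["# Title"], ["", "", ""], ["Body text"]]
  .flat()
  .join("\n")
  .replace(/\n{3,}/g, "\n\n") // collapse 3+ newlines to exactly 2
  .trim();

console.log(JSON.stringify(content)); // "# Title\n\nBody text"
```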
## Part 3: Helper Functions
Small, focused functions that each do one thing:
### Formatting Dates
```ts
function formatDate(date: Date): string {
  return date.toISOString().split("T")[0];
}
```

Takes a `Date`, returns `"2025-11-30"`. The `split("T")[0]` trick extracts just the date part from an ISO string like `"2025-11-30T00:00:00.000Z"`.
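A quick sanity check, plus one caveat worth knowing: `toISOString()` works in UTC, so a `Date` constructed in another timezone can shift by a day near midnight.

```typescript
function formatDate(date: Date): string {
  return date.toISOString().split("T")[0];
}

// The time and timezone suffix are discarded
console.log(formatDate(new Date("2025-11-30T23:59:59Z"))); // "2025-11-30"
```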
### Building Headers
```ts
function header(name: string, description: string): string[] {
  return [`# ${name}`, "", `> ${description}`];
}
```

Returns an array of lines. The empty string creates a blank line between the title and the blockquote description. This matches the llms.txt spec format.
### Building Link Lists
```ts
function linkList(title: string, items: LlmsItem[], site: string): string[] {
  return [
    "",
    `## ${title}`,
    ...items.map((item) => `- [${item.title}](${site}${item.link}): ${item.description}`),
  ];
}
```

Creates a section with an H2 heading and a markdown list of links. Each link includes a description after the colon. The leading empty string ensures a blank line before the section.
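For example, a single post (the title and slug here are made up) renders like this:

```typescript
function linkList(
  title: string,
  items: { title: string; description: string; link: string }[],
  site: string,
): string[] {
  return [
    "",
    `## ${title}`,
    ...items.map((item) => `- [${item.title}](${site}${item.link}): ${item.description}`),
  ];
}

const lines = linkList(
  "Posts",
  [{ title: "Hello World", description: "A first post", link: "/llms/hello-world.txt" }],
  "https://example.com",
);

console.log(lines.join("\n"));
// Output (after a leading blank line):
// ## Posts
// - [Hello World](https://example.com/llms/hello-world.txt): A first post
```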
### Building Post Metadata
```ts
function postMeta(site: string, link: string, pubDate: Date, category: string): string[] {
  return [`URL: ${site}${link}`, `Published: ${formatDate(pubDate)}`, `Category: ${category}`];
}
```

Three lines of metadata for each post. This keeps the format consistent across the full dump and individual post endpoints.
## Part 4: Stripping MDX Syntax
If you use MDX, your post bodies contain things agents don’t need:
```mdx
import TldrBox from '@components/TldrBox.astro';

<TldrBox>
This is a summary component.
</TldrBox>

The actual content starts here...
```

We need to strip the import and the JSX component, but keep the markdown content:
```ts
const MDX_PATTERNS = [
  /^import\s+.+from\s+['"].+['"];?\s*$/gm, // import statements
  /<[A-Z][a-zA-Z]*[^>]*>[\s\S]*?<\/[A-Z][a-zA-Z]*>/g, // JSX blocks like <Component>...</Component>
  /<[A-Z][a-zA-Z]*[^>]*\/>/g, // Self-closing JSX like <Icon />
] as const;

function stripMdx(content: string): string {
  return MDX_PATTERNS.reduce((text, pattern) => text.replace(pattern, ""), content).trim();
}
```

How the patterns work:
1. **Import pattern**: matches lines starting with `import`, followed by anything, then `from` and a quoted path. The `m` flag makes `^` match line starts.
2. **JSX block pattern**: matches a tag starting with a capital letter (the JSX convention) and captures everything up to the matching closing tag. The `[\s\S]*?` is a non-greedy match for any character, including newlines.
3. **Self-closing pattern**: matches `<CapitalLetter ... />` for components without children.
Why PascalCase? JSX components use PascalCase by convention, while HTML elements are lowercase. So `<TldrBox>` gets stripped, but `<div>` or `<a href="...">` passes through. One caveat: the patterns run on the raw body text, so a PascalCase tag inside a fenced code example would be stripped as well; for prose-heavy posts this is rarely an issue, but it's a tradeoff of the simple regex approach.
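Here's the stripper in action on a small sample body, reusing the patterns above (the component names are made up):

```typescript
const MDX_PATTERNS = [
  /^import\s+.+from\s+['"].+['"];?\s*$/gm, // import statements
  /<[A-Z][a-zA-Z]*[^>]*>[\s\S]*?<\/[A-Z][a-zA-Z]*>/g, // JSX blocks
  /<[A-Z][a-zA-Z]*[^>]*\/>/g, // self-closing JSX
] as const;

function stripMdx(content: string): string {
  return MDX_PATTERNS.reduce((text, pattern) => text.replace(pattern, ""), content).trim();
}

const sample = [
  "import TldrBox from '@components/TldrBox.astro';",
  "",
  "<TldrBox>A summary.</TldrBox>",
  "",
  "Real content with <em>inline html</em> and <Icon name=\"x\" />.",
].join("\n");

// The import, the <TldrBox> block, and the <Icon /> are removed;
// the lowercase <em> tag survives.
console.log(stripMdx(sample));
```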
## Part 5: The Generators
Now we combine everything into the three main functions:
### Index Generator
```ts
export function llmsTxt(config: LlmsTxtConfig): Response {
  const sections = [
    header(config.name, config.description),
    linkList("Posts", config.items, config.site),
  ];
  if (config.optional?.length) {
    sections.push(linkList("Optional", config.optional, config.site));
  }
  return doc(...sections);
}
```

Builds an array of sections, conditionally adds the optional section, then passes everything to `doc()`. The spread operator `...sections` unpacks the array into separate arguments.
Output looks like:

```text
# Site Name

> Site description

## Posts
- [Post Title](https://site.com/llms/post-slug.txt): Post description

## Optional
- [About](https://site.com/about): About the author
```

### Full Content Generator
```ts
export function llmsFullTxt(config: LlmsFullTxtConfig): Response {
  const head = [
    ...header(config.name, config.description),
    "",
    `Author: ${config.author}`,
    `Site: ${config.site}`,
    "",
    "---",
  ];
  const posts = config.items.flatMap((item) => [
    "",
    `## ${item.title}`,
    "",
    ...postMeta(config.site, item.link, item.pubDate, item.category),
    "",
    `> ${item.description}`,
    "",
    stripMdx(item.body),
    "",
    "---",
  ]);
  return doc(head, posts);
}
```

Why `flatMap`? Each item produces an array of lines. Using `map` would give us an array of arrays; `flatMap` maps and flattens in one step, giving us a single array of all lines.
The horizontal rules (---) separate posts visually and give agents clear boundaries between content pieces.
### Individual Post Generator
```ts
export function llmsPost(config: LlmsPostConfig): Response {
  const { post, site, link } = config;
  const { title, description, pubDate, category } = post.data;
  return doc(
    `# ${title}`,
    "",
    `> ${description}`,
    "",
    ...postMeta(site, link, pubDate, category),
    "",
    stripMdx(post.body ?? ""),
  );
}
```

The simplest generator. Destructures the config and post data, then builds a single document. The `post.body ?? ""` handles the edge case of a post without body content.
## Part 6: Data Transformers
We need functions to convert Astro’s content collection format into our types:
```ts
export function postsToLlmsItems(
  posts: BlogPost[],
  formatUrl: (slug: string) => string,
): LlmsItem[] {
  return posts.map((post) => ({
    title: post.data.title,
    description: post.data.description,
    link: formatUrl(post.slug),
  }));
}

export function postsToLlmsFullItems(
  posts: BlogPost[],
  formatUrl: (slug: string) => string,
): LlmsFullItem[] {
  return posts.map((post) => ({
    ...postsToLlmsItems([post], formatUrl)[0],
    pubDate: post.data.pubDate,
    category: post.data.category,
    body: post.body ?? "",
  }));
}
```

Why the callback for URLs? Different endpoints need different URL formats:
- The index links to `/llms/post-slug.txt` (the plain text version)
- The full content dump links to `/post-slug` (the HTML version)
By passing the formatter as a callback, the same transformer works for both cases.
Why does postsToLlmsFullItems call postsToLlmsItems? DRY principle. The full item includes everything from the basic item, plus extra fields. Instead of duplicating the mapping logic, we reuse it and spread the result.
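A quick illustration with a hypothetical post (the `BlogPost` shape here is simplified to just the fields the transformer reads):

```typescript
interface BlogPost {
  slug: string;
  body?: string;
  data: { title: string; description: string; pubDate: Date; category: string };
}

function postsToLlmsItems(posts: BlogPost[], formatUrl: (slug: string) => string) {
  return posts.map((post) => ({
    title: post.data.title,
    description: post.data.description,
    link: formatUrl(post.slug),
  }));
}

const post: BlogPost = {
  slug: "hello-world",
  body: "Hi!",
  data: { title: "Hello", description: "First post", pubDate: new Date("2025-01-01"), category: "meta" },
};

// Same transformer, two URL shapes:
const txtItems = postsToLlmsItems([post], (s) => `/llms/${s}.txt`);
const htmlItems = postsToLlmsItems([post], (s) => `/${s}`);

console.log(txtItems[0].link);  // "/llms/hello-world.txt"
console.log(htmlItems[0].link); // "/hello-world"
```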
## Part 7: The Endpoints
Now we wire everything up in Astro page files. These are intentionally thin.
### Index Endpoint
```ts
// src/pages/llms.txt.ts
import type { APIRoute } from "astro";
import { siteConfig } from "@/site-config";
import { llmsTxt, postsToLlmsItems } from "@utils/llms";
import { getAllPosts } from "@utils/posts";

export const GET: APIRoute = async () => {
  const posts = await getAllPosts();
  return llmsTxt({
    name: siteConfig.name,
    description: siteConfig.description,
    site: siteConfig.url,
    items: postsToLlmsItems(posts, (slug) => `/llms/${slug}.txt`),
    optional: [
      { title: "About", link: "/about", description: "About the author" },
      { title: "Full Content", link: "/llms-full.txt", description: "All posts in one file" },
    ],
  });
};
```

The `APIRoute` type tells Astro this is an API endpoint, not an HTML page. The `.txt.ts` filename means it generates `/llms.txt`.
### Full Content Endpoint
```ts
// src/pages/llms-full.txt.ts
import type { APIRoute } from "astro";
import { siteConfig } from "@/site-config";
import { llmsFullTxt, postsToLlmsFullItems } from "@utils/llms";
import { getAllPosts } from "@utils/posts";

export const GET: APIRoute = async () => {
  const posts = await getAllPosts();
  return llmsFullTxt({
    name: siteConfig.name,
    description: siteConfig.description,
    author: siteConfig.author,
    site: siteConfig.url,
    items: postsToLlmsFullItems(posts, (slug) => `/${slug}`),
  });
};
```

Almost identical structure. The URL formatter now points to HTML pages, since agents reading the full dump might want to reference the original.
### Dynamic Per-Post Endpoints
```ts
// src/pages/llms/[slug].txt.ts
import type { GetStaticPaths } from "astro";
import { siteConfig } from "@/site-config";
import { llmsPost } from "@utils/llms";
import { getAllPosts, type BlogPost } from "@utils/posts";

export const getStaticPaths: GetStaticPaths = async () => {
  const posts = await getAllPosts();
  return posts.map((post) => ({
    params: { slug: post.slug },
    props: { post },
  }));
};

export const GET = ({ props }: { props: { post: BlogPost } }) => {
  return llmsPost({
    post: props.post,
    site: siteConfig.url,
    link: `/${props.post.slug}`,
  });
};
```

What's `getStaticPaths`? Astro needs to know at build time which pages to generate. This function returns an array of all valid slugs. Each entry includes `params` (the URL parameters) and `props` (data passed to the page).
Why [slug] in the filename? Square brackets denote a dynamic route in Astro. The file [slug].txt.ts generates /llms/post-one.txt, /llms/post-two.txt, etc.
## Part 8: Discovery
Agents need to find your llms.txt. The spec says to put it at the root (/llms.txt), similar to robots.txt. But you can also advertise it in HTML:
```html
<link rel="alternate" type="text/plain" href="/llms.txt" title="LLMs.txt" />
```

Add this to your base layout or head component, wherever you define other `<link>` tags like RSS or favicon.
This isn’t part of the official spec, but follows web conventions. You can also register your site on directories like llmstxt.site.
## Limitations
This implementation works for content collections with markdown or MDX bodies. It reads post.body directly, which is raw text.
For component-based pages (React, Vue, Svelte, or plain .astro files), there’s no markdown body to extract. You’d need a different strategy:
- Render to HTML and strip tags (lossy, messy)
- Maintain separate content files (duplicate effort)
- Use a headless CMS where content exists independently
For most blogs, content collections are the right choice anyway.
## Why Not Use a Library?
There are Astro integrations for llms.txt. They auto-generate from all pages at build time. Sounds convenient, but:
- You get everything, including pages you might not want exposed
- No per-post endpoints
- No control over the output format
- Another dependency to maintain
This implementation is ~150 lines of TypeScript. You control exactly what’s included. You understand every line. For something this simple, the DIY approach wins.
## Bonus: An SVG Icon
The llms.txt logo is four rounded squares in a plus pattern. Here’s a simple SVG version you can use in your navigation:
```html
<svg
  xmlns="http://www.w3.org/2000/svg"
  width="24"
  height="24"
  viewBox="-4 -4 32 32"
  fill="currentColor"
  aria-hidden="true"
>
  <rect x="8" y="1" width="8" height="8" rx="2" opacity="0.6" />
  <rect x="1" y="8" width="8" height="8" rx="2" opacity="0.8" />
  <rect x="15" y="8" width="8" height="8" rx="2" opacity="0.7" />
  <rect x="8" y="15" width="8" height="8" rx="2" />
</svg>
```

Design notes:
- `viewBox="-4 -4 32 32"` adds padding so the icon matches the visual weight of stroke-based icons like Lucide
- `fill="currentColor"` inherits from CSS, so it works with any color scheme
- Varying opacity (0.6, 0.7, 0.8, 1.0) gives depth without using multiple colors
- `rx="2"` rounds the corners to match the original logo style
For Astro, wrap it in a component so you can pass size as a prop and reuse it across your site.
## The Result
After deploying, you have:
- `/llms.txt`: index listing all posts with descriptions
- `/llms-full.txt`: complete content for RAG systems or full context
- `/llms/post-slug.txt`: individual posts for focused queries
Agents fetch the index, pick what they need, and get clean markdown. No HTML parsing, no navigation noise, no wasted tokens. That’s the point of the standard.