# Adding llms.txt to Your Astro Blog

> How to implement the llms.txt standard in Astro without dependencies. Three endpoints, ~150 lines of TypeScript, and your content becomes agent-accessible.

URL: https://kumak.dev/adding-llms-txt-to-astro/
Published: 2025-11-30
Category: tutorial

## What is llms.txt?

The [llms.txt specification](https://llmstxt.org/) proposes a standard location for LLM-readable content. Think of it like `robots.txt` for crawlers or `sitemap.xml` for search engines, but designed for AI agents.

The problem it solves: when an AI agent visits your website, it has to parse HTML, navigate around headers, footers, and sidebars, and extract the actual content. This wastes tokens and often produces messy results.

The solution: provide a clean, structured text file at a known location. Agents fetch `/llms.txt`, get a table of contents, and can request individual pieces of content in plain markdown.
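Here is how a client might read the index. This sketch tracks which `## ` section it is in and pulls out each `- [title](url): description` entry. The name `parseLlmsIndex` and the line format it accepts are assumptions based on the output this post generates. It is not a reference parser for the spec.

```typescript
interface IndexLink {
  section: string;
  title: string;
  url: string;
  description: string;
}

// Matches "- [Title](url)" with an optional ": description" suffix.
const LINK_LINE = /^- \[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$/;

function parseLlmsIndex(text: string): IndexLink[] {
  const links: IndexLink[] = [];
  let section = "";
  for (const line of text.split("\n")) {
    if (line.startsWith("## ")) {
      section = line.slice(3).trim(); // remember which H2 we are under
      continue;
    }
    const match = LINK_LINE.exec(line.trim());
    if (match) {
      links.push({
        section,
        title: match[1],
        url: match[2],
        description: match[3] ?? "", // group 3 is undefined when there is no description
      });
    }
  }
  return links;
}

const sample = [
  "# Example Blog",
  "",
  "> Notes on web development",
  "",
  "## Posts",
  "- [Hello](https://example.com/llms/hello.txt): First post",
].join("\n");

console.log(parseLlmsIndex(sample));
```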

## The Architecture

We'll build three endpoints that work together:

```
/llms.txt           → Index: "Here's what I have"
/llms-full.txt      → Everything: "Here's all of it at once"
/llms/[slug].txt    → Individual: "Here's just this one post"
```

**Why three?** Different agents have different needs:

- A quick lookup might only need the index to find one relevant post
- A RAG system might want everything in one request
- A focused query might want just one article without the overhead of the full dump

## File Structure

Here's where everything lives in your Astro project:

```
src/
├── utils/
│   └── llms.ts            # All the generation logic
├── pages/
│   ├── llms.txt.ts        # Index endpoint
│   ├── llms-full.txt.ts   # Full content endpoint
│   └── llms/
│       └── [slug].txt.ts  # Per-post endpoints (dynamic route)
```

The `utils/llms.ts` file contains all the logic. The page files are thin wrappers that call into it. This separation keeps the endpoints clean and the logic testable.

## Prerequisites

Before we start, you'll need these project-specific pieces:

- **`siteConfig`** - An object with `name`, `description`, `url`, and `author` properties
- **`getAllPosts()`** - A function that returns your content collection posts
- **`BlogPost`** - The type from Astro's content collections with `slug`, `body`, and `data` (with Astro 5's Content Layer API, entries expose `id` instead of `slug`, so adjust accordingly)

The [complete gist](https://gist.github.com/szymdzum/a6db6ff5feb0c566cbd852e10c0ab0af) shows the full implementation with all type definitions.

## Part 1: Type Definitions

Let's start by defining the shapes of our data. Good types make the rest of the code self-documenting.

```typescript
// src/utils/llms.ts

// Basic item for the index - just enough to create a link
interface LlmsItem {
  title: string;
  description: string;
  link: string;
}

// Extended item for full content - includes the actual post data
interface LlmsFullItem extends LlmsItem {
  pubDate: Date;
  category: string;
  body: string;
}
```

Why two types? The index only needs titles and links. The full content dump needs everything. By extending `LlmsItem`, we ensure consistency while allowing the richer type where needed.
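`LlmsFullItem` is a structural subtype of `LlmsItem`, so an array of full items can go anywhere an array of index items is expected. In the sketch below, `renderLinks` is a hypothetical stand-in for any helper that only needs the index fields.

```typescript
interface LlmsItem {
  title: string;
  description: string;
  link: string;
}

interface LlmsFullItem extends LlmsItem {
  pubDate: Date;
  category: string;
  body: string;
}

// Hypothetical helper that reads only the fields every LlmsItem has.
function renderLinks(items: LlmsItem[]): string[] {
  return items.map((item) => `- [${item.title}](${item.link}): ${item.description}`);
}

const fullItems: LlmsFullItem[] = [
  {
    title: "Hello",
    description: "First post",
    link: "/hello",
    pubDate: new Date("2025-11-30"),
    category: "tutorial",
    body: "Post body",
  },
];

// No cast needed: LlmsFullItem[] is assignable to LlmsItem[].
console.log(renderLinks(fullItems));
```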

Now the configuration types for each generator:

```typescript
// Config for the index endpoint
interface LlmsTxtConfig {
  name: string;
  description: string;
  site: string;
  items: LlmsItem[];
  optional?: LlmsItem[];  // Links that agents can skip if context is tight
}

// Config for the full content endpoint
interface LlmsFullTxtConfig {
  name: string;
  description: string;
  author: string;
  site: string;
  items: LlmsFullItem[];
}

// Config for individual post endpoints
interface LlmsPostConfig {
  post: BlogPost;
  site: string;
  link: string;
}
```

The `optional` field in `LlmsTxtConfig` is part of the spec. It signals to agents: "these links are nice-to-have, skip them if you're running low on context window."
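An agent with a tight context budget can honor `Optional` by cutting that section. This sketch assumes the section starts with a line that is exactly `## Optional` and runs until the next `## ` heading or the end of the file. The name `dropOptional` is hypothetical.

```typescript
// Removes the "## Optional" section (heading plus its lines) from an llms.txt index.
function dropOptional(text: string): string {
  const kept: string[] = [];
  let skipping = false;
  for (const line of text.split("\n")) {
    if (line.startsWith("## ")) {
      // Entering a new H2 section: skip it only if it is the Optional one.
      skipping = line.trim() === "## Optional";
    }
    if (!skipping) kept.push(line);
  }
  return kept.join("\n").trimEnd() + "\n";
}

const index = [
  "# Example Blog",
  "",
  "## Posts",
  "- [Hello](https://example.com/llms/hello.txt): First post",
  "",
  "## Optional",
  "- [About](https://example.com/about): About the author",
  "",
].join("\n");

console.log(dropOptional(index));
```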

## Part 2: The Document Builder

Every endpoint needs to return a plain text `Response`. Instead of repeating this logic, we create one builder that handles it all:

```typescript
function doc(...sections: (string | string[])[]): Response {
  const content = sections
    .flat()                          // Flatten nested arrays
    .join("\n")                      // Join with newlines
    .replace(/\n{3,}/g, "\n\n")      // Normalize multiple blank lines to just one
    .trim();                         // Clean up edges

  return new Response(content + "\n", {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

**Why rest parameters with arrays?** This lets us compose documents flexibly:

```typescript
// These all work:
doc("# Title", "Some text");
doc(["# Title", "", "Some text"]);
doc(headerArray, bodyArray, footerArray);
```

**Why normalize newlines?** When composing from multiple arrays, you might accidentally get three or four blank lines in a row. The regex `/\n{3,}/g` catches any run of 3+ newlines and replaces it with exactly 2 (one blank line). Clean output, no matter how messy the input.
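Here is the same join-and-normalize pipeline as a pure function. It shows what happens to messy input without constructing a `Response`. `normalizeDoc` is a hypothetical name; it mirrors the body of `doc()` above.

```typescript
// Mirrors doc(): flatten one level, join with newlines, collapse 3+ newlines, trim.
function normalizeDoc(...sections: (string | string[])[]): string {
  return (
    sections
      .flat()
      .join("\n")
      .replace(/\n{3,}/g, "\n\n")
      .trim() + "\n"
  );
}

// Two trailing blank lines in the header plus a leading one in the body
// would otherwise leave three blank lines before "## Posts".
const head = ["# Title", "", "> Description", "", ""];
const body = ["", "## Posts", "- [Hello](/hello): First post"];

console.log(JSON.stringify(normalizeDoc(head, body)));
```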

## Part 3: Helper Functions

Small, focused functions that each do one thing:

### Formatting Dates

```typescript
function formatDate(date: Date): string {
  return date.toISOString().split("T")[0];
}
```

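Takes a Date, returns `"2025-11-30"`. The `split("T")[0]` trick extracts just the date part from an ISO string like `"2025-11-30T00:00:00.000Z"`. Note that `toISOString()` always reports UTC. Date-only strings such as `"2025-11-30"` are parsed as UTC midnight, so they come back unchanged. A `Date` created in local time, however, can land on the neighboring day.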
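If your dates are created in local time rather than parsed from date-only strings, a variant that reads the local calendar fields avoids the off-by-one-day problem. `formatLocalDate` is a hypothetical name. Whether you want the local or the UTC version depends on how your frontmatter dates are parsed.

```typescript
// UTC-based, as in the post: exact for dates parsed from "YYYY-MM-DD" strings.
function formatDate(date: Date): string {
  return date.toISOString().split("T")[0];
}

// Local-time variant: reads the calendar fields in the runtime's timezone.
function formatLocalDate(date: Date): string {
  const y = date.getFullYear();
  const m = String(date.getMonth() + 1).padStart(2, "0"); // getMonth() is 0-based
  const d = String(date.getDate()).padStart(2, "0");
  return `${y}-${m}-${d}`;
}

// Date-only ISO strings are parsed as UTC midnight, so the UTC formatter is exact here.
console.log(formatDate(new Date("2025-11-30"))); // "2025-11-30"
// new Date(year, monthIndex, day) uses local time, so pair it with the local formatter.
console.log(formatLocalDate(new Date(2025, 10, 30))); // "2025-11-30"
```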

### Building Headers

```typescript
function header(name: string, description: string): string[] {
  return [`# ${name}`, "", `> ${description}`];
}
```

Returns an array of lines. The empty string creates a blank line between the title and the blockquote description. This matches the llms.txt spec format.

### Building Link Lists

```typescript
function linkList(title: string, items: LlmsItem[], site: string): string[] {
  return [
    "",
    `## ${title}`,
    ...items.map((item) => `- [${item.title}](${site}${item.link}): ${item.description}`),
  ];
}
```

Creates a section with an H2 heading and a markdown list of links. Each link includes a description after the colon. The leading empty string ensures a blank line before the section.
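Putting `header()` and `linkList()` together shows exactly which lines end up in the index. The helpers are repeated so the snippet runs on its own. The site URL and post are made-up sample data.

```typescript
interface LlmsItem {
  title: string;
  description: string;
  link: string;
}

function header(name: string, description: string): string[] {
  return [`# ${name}`, "", `> ${description}`];
}

function linkList(title: string, items: LlmsItem[], site: string): string[] {
  return [
    "",
    `## ${title}`,
    ...items.map((item) => `- [${item.title}](${site}${item.link}): ${item.description}`),
  ];
}

const lines = [
  ...header("Example Blog", "Notes on web development"),
  ...linkList(
    "Posts",
    [{ title: "Hello", description: "First post", link: "/llms/hello.txt" }],
    "https://example.com",
  ),
];

console.log(lines.join("\n"));
```

The link is built by plain concatenation. That means `site` should not end in a slash, and every `link` should start with one.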

### Building Post Metadata

```typescript
function postMeta(site: string, link: string, pubDate: Date, category: string): string[] {
  return [`URL: ${site}${link}`, `Published: ${formatDate(pubDate)}`, `Category: ${category}`];
}
```

Three lines of metadata for each post. This keeps the format consistent across the full dump and individual post endpoints.

## Part 4: Stripping MDX Syntax

If you use MDX, your post bodies contain things agents don't need:

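```mdx
import TldrBox from "../components/TldrBox.astro";

<TldrBox>
  A short summary for human readers.
</TldrBox>

The actual content starts here...
```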

We need to strip the import and the JSX component, but keep the markdown content:

```typescript
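const MDX_PATTERNS = [
  // import statements
  /^import\s+.+from\s+['"].+['"];?\s*$/gm,
  // JSX blocks like <TldrBox>...</TldrBox>. The \1 backreference requires a closing tag
  // with the same name, and the attribute part may not end in "/", so a self-closing tag
  // is never mistaken for an opener (which would swallow text up to the next closing tag).
  /<([A-Z][a-zA-Z0-9]*)\b(?:[^>]*[^/>])?>[\s\S]*?<\/\1>/g,
  // Self-closing JSX like <Component />
  /<[A-Z][a-zA-Z]*[^>]*\/>/g,
] as const;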

function stripMdx(content: string): string {
  return MDX_PATTERNS.reduce((text, pattern) => text.replace(pattern, ""), content).trim();
}
```

**How the patterns work:**

1. **Import pattern**: Matches lines starting with `import`, followed by anything, then `from` and a quoted path. The `m` flag makes `^` match line starts.

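2. **JSX block pattern**: Matches a PascalCase opening tag, everything inside it, and its closing tag, as in `<TldrBox>...</TldrBox>`. `[\s\S]` matches any character including newlines, and the trailing `?` makes the match lazy, so it ends at the earliest closing tag that completes it.

3. **Self-closing pattern**: Matches tags like `<Component />` for components without children.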

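**Why PascalCase?** In JSX and MDX, capitalized tags are components and lowercase tags are HTML elements. So `<TldrBox>` gets stripped, but `<div>` or `<a href="...">` passes through. One caveat: the regexes run on raw text and know nothing about markdown structure, so fenced code blocks are not protected. An `import ... from` line or a PascalCase tag inside a code example is stripped just like real MDX.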
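The patterns see raw text, so code samples inside fences need explicit protection. One approach: swap each fenced block for a placeholder, strip the MDX, then restore the blocks. This sketch rests on several assumptions:

- Fences open and close with three backticks at the start of a line.
- Tilde fences and nested fences are not handled.
- The placeholder token `@@FENCE_n@@` is hypothetical and must never occur in real content.

`stripMdxOutsideFences` is a hypothetical name. Its JSX block pattern only pairs an opening tag with a closing tag of the same name, and it never treats a self-closing tag as an opener.

```typescript
const MDX_PATTERNS: RegExp[] = [
  /^import\s+.+from\s+['"].+['"];?\s*$/gm, // import statements
  /<([A-Z][a-zA-Z0-9]*)\b(?:[^>]*[^/>])?>[\s\S]*?<\/\1>/g, // <Component>...</Component>
  /<[A-Z][a-zA-Z]*[^>]*\/>/g, // <Component />
];

// A fenced block: a line starting with three backticks, through the next such line.
const FENCE = /^`{3}[^\n]*\n[\s\S]*?^`{3}[ \t]*$/gm;

function stripMdxOutsideFences(content: string): string {
  const fences: string[] = [];
  // 1. Replace each fenced block with a numbered placeholder.
  const masked = content.replace(FENCE, (block) => {
    fences.push(block);
    return `@@FENCE_${fences.length - 1}@@`;
  });
  // 2. Strip MDX syntax from everything outside the fences.
  const stripped = MDX_PATTERNS.reduce((text, pattern) => text.replace(pattern, ""), masked);
  // 3. Put the original fenced blocks back, untouched.
  return stripped.replace(/@@FENCE_(\d+)@@/g, (_match, i: string) => fences[Number(i)]).trim();
}

const tick3 = "`".repeat(3); // built at runtime so this snippet has no literal fence lines
const post = [
  'import TldrBox from "./TldrBox.astro";',
  "",
  "<TldrBox>Short summary</TldrBox>",
  "",
  "Real content.",
  "",
  `${tick3}tsx`,
  'import { Card } from "./Card";',
  "<Card />",
  tick3,
].join("\n");

console.log(stripMdxOutsideFences(post));
```

The blocks are restored through a replacer function rather than a replacement string. A replacement string would interpret `$` sequences inside the code samples.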

## Part 5: The Generators

Now we combine everything into the three main functions:

### Index Generator

```typescript
export function llmsTxt(config: LlmsTxtConfig): Response {
  const sections = [
    header(config.name, config.description),
    linkList("Posts", config.items, config.site),
  ];

  if (config.optional?.length) {
    sections.push(linkList("Optional", config.optional, config.site));
  }

  return doc(...sections);
}
```

Builds an array of sections, conditionally adds the optional section, then passes everything to `doc()`. The spread operator `...sections` unpacks the array into separate arguments.
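The spread matters because `doc()` flattens only one level. Called as `doc(sections)`, it would receive a single array of arrays. One `flat()` would leave the inner arrays intact, and `join()` would then stringify each of them with commas. TypeScript already rejects that call against `doc`'s signature, but a loosely typed stand-in shows the runtime difference. `joinLines` is a hypothetical name for `doc()`'s flatten-and-join step.

```typescript
// Same flatten-and-join steps doc() performs, minus normalization and the Response.
// Typed loosely (unknown[]) so the "wrong" call below compiles for the demo.
function joinLines(...sections: unknown[]): string {
  return sections.flat().join("\n");
}

const sections = [
  ["# Title", "", "> Description"],
  ["", "## Posts"],
];

// Spread: each inner array is its own argument, so flat() yields plain strings.
const withSpread = joinLines(...sections);

// No spread: one argument holding both arrays; flat() only removes the outer layer,
// and join() then stringifies each inner array with commas.
const withoutSpread = joinLines(sections);

console.log(JSON.stringify(withSpread)); // "# Title\n\n> Description\n\n## Posts"
console.log(JSON.stringify(withoutSpread)); // "# Title,,> Description\n,## Posts"
```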

**Output looks like:**

```markdown
# Site Name

> Site description

## Posts
- [Post Title](https://site.com/llms/post-slug.txt): Post description

## Optional
- [About](https://site.com/about): About the author
```

### Full Content Generator

```typescript
export function llmsFullTxt(config: LlmsFullTxtConfig): Response {
  const head = [
    ...header(config.name, config.description),
    "",
    `Author: ${config.author}`,
    `Site: ${config.site}`,
    "",
    "---",
  ];

  const posts = config.items.flatMap((item) => [
    "",
    `## ${item.title}`,
    "",
    ...postMeta(config.site, item.link, item.pubDate, item.category),
    "",
    `> ${item.description}`,
    "",
    stripMdx(item.body),
    "",
    "---",
  ]);

  return doc(head, posts);
}
```

**Why `flatMap`?** Each item produces an array of lines. Using `map` would give us an array of arrays. `flatMap` maps and flattens in one step, giving us a single array of all lines.

The horizontal rules (`---`) separate posts visually and give agents clear boundaries between content pieces.
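A small illustration with two made-up posts. `map` keeps one array per post, while `flatMap` concatenates them into the single list of lines that `doc()` expects.

```typescript
const posts = [
  { title: "First", body: "One" },
  { title: "Second", body: "Two" },
];

// map: one nested array per post -> string[][]
const nested = posts.map((post) => [`## ${post.title}`, post.body, "---"]);

// flatMap: the same per-post arrays, concatenated -> string[]
const lines = posts.flatMap((post) => [`## ${post.title}`, post.body, "---"]);

console.log(nested.length); // 2
console.log(lines.length); // 6
console.log(lines.join("\n"));
```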

### Individual Post Generator

```typescript
export function llmsPost(config: LlmsPostConfig): Response {
  const { post, site, link } = config;
  const { title, description, pubDate, category } = post.data;

  return doc(
    `# ${title}`,
    "",
    `> ${description}`,
    "",
    ...postMeta(site, link, pubDate, category),
    "",
    stripMdx(post.body ?? ""),
  );
}
```

The simplest generator. Destructures the config and post data, then builds a single document. The `post.body ?? ""` handles the edge case of a post without body content.

## Part 6: Data Transformers

We need functions to convert Astro's content collection format into our types:

```typescript
export function postsToLlmsItems(
  posts: BlogPost[],
  formatUrl: (slug: string) => string,
): LlmsItem[] {
  return posts.map((post) => ({
    title: post.data.title,
    description: post.data.description,
    link: formatUrl(post.slug),
  }));
}

export function postsToLlmsFullItems(
  posts: BlogPost[],
  formatUrl: (slug: string) => string,
): LlmsFullItem[] {
  return posts.map((post) => ({
    ...postsToLlmsItems([post], formatUrl)[0],
    pubDate: post.data.pubDate,
    category: post.data.category,
    body: post.body ?? "",
  }));
}
```

**Why the callback for URLs?** Different endpoints need different URL formats:

- Index links to `/llms/post-slug.txt` (the plain text version)
- Full content links to `/post-slug` (the HTML version)

By passing the formatter as a callback, the same transformer works for both cases.

**Why does `postsToLlmsFullItems` call `postsToLlmsItems`?** DRY principle. The full item includes everything from the basic item, plus extra fields. Instead of duplicating the mapping logic, we reuse it and spread the result.
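To see the callback at work, the sketch below runs the basic transformer twice over the same posts, once with each formatter from this post. The `Post` shape is a simplified stand-in for `BlogPost` that contains only the fields the transformer reads. It is not Astro's actual type.

```typescript
interface LlmsItem {
  title: string;
  description: string;
  link: string;
}

// Simplified stand-in for BlogPost: only the fields used here.
interface Post {
  slug: string;
  data: { title: string; description: string };
}

function postsToLlmsItems(posts: Post[], formatUrl: (slug: string) => string): LlmsItem[] {
  return posts.map((post) => ({
    title: post.data.title,
    description: post.data.description,
    link: formatUrl(post.slug),
  }));
}

const posts: Post[] = [{ slug: "hello", data: { title: "Hello", description: "First post" } }];

const indexItems = postsToLlmsItems(posts, (slug) => `/llms/${slug}.txt`); // plain-text version
const fullItems = postsToLlmsItems(posts, (slug) => `/${slug}`); // HTML version

console.log(indexItems[0].link); // "/llms/hello.txt"
console.log(fullItems[0].link); // "/hello"
```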

## Part 7: The Endpoints

Now we wire everything up in Astro page files. These are intentionally thin.

### Index Endpoint

```typescript
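// src/pages/llms.txt.ts
import type { APIRoute } from "astro";
import { llmsTxt, postsToLlmsItems } from "../utils/llms";
// Also import siteConfig and getAllPosts from wherever your project defines them.
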
export const GET: APIRoute = async () => {
  const posts = await getAllPosts();

  return llmsTxt({
    name: siteConfig.name,
    description: siteConfig.description,
    site: siteConfig.url,
    items: postsToLlmsItems(posts, (slug) => `/llms/${slug}.txt`),
    optional: [
      { title: "About", link: "/about", description: "About the author" },
      { title: "Full Content", link: "/llms-full.txt", description: "All posts in one file" },
    ],
  });
};
```

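This is a `.ts` file in `src/pages` that exports a `GET` function, so Astro treats it as an endpoint rather than an HTML page. The `APIRoute` type only adds type checking. Astro drops the `.ts` extension when building the route, so `llms.txt.ts` generates `/llms.txt`.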

### Full Content Endpoint

```typescript
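// src/pages/llms-full.txt.ts
import type { APIRoute } from "astro";
import { llmsFullTxt, postsToLlmsFullItems } from "../utils/llms";
// Also import siteConfig and getAllPosts from wherever your project defines them.
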
export const GET: APIRoute = async () => {
  const posts = await getAllPosts();

  return llmsFullTxt({
    name: siteConfig.name,
    description: siteConfig.description,
    author: siteConfig.author,
    site: siteConfig.url,
    items: postsToLlmsFullItems(posts, (slug) => `/${slug}`),
  });
};
```

Almost identical structure. The URL formatter now points to HTML pages since agents reading the full dump might want to reference the original.

### Dynamic Per-Post Endpoints

```typescript
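// src/pages/llms/[slug].txt.ts
import type { GetStaticPaths } from "astro";
import { llmsPost } from "../../utils/llms";
// Also import siteConfig, getAllPosts, and the BlogPost type from your project.
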
export const getStaticPaths: GetStaticPaths = async () => {
  const posts = await getAllPosts();
  return posts.map((post) => ({
    params: { slug: post.slug },
    props: { post },
  }));
};

export const GET = ({ props }: { props: { post: BlogPost } }) => {
  return llmsPost({
    post: props.post,
    site: siteConfig.url,
    link: `/${props.post.slug}`,
  });
};
```

**What's `getStaticPaths`?** Astro needs to know at build time which pages to generate. This function returns one entry per post. `params` holds the URL parameters (here, the `slug`), and `props` holds data passed to the endpoint's `GET` handler.

**Why `[slug]` in the filename?** Square brackets denote a dynamic route in Astro. The file `[slug].txt.ts` generates `/llms/post-one.txt`, `/llms/post-two.txt`, etc.

## Part 8: Discovery

Agents need to find your llms.txt. The spec says to put it at the root (`/llms.txt`), similar to `robots.txt`. But you can also advertise it in HTML:

```html
<link rel="alternate" type="text/plain" href="/llms.txt" title="LLMs.txt" />
```

Add this to your base layout or head component, wherever you define other `<link>` tags like RSS or favicon.

This isn't part of the official spec, but follows web conventions. You can also register your site on directories like [llmstxt.site](https://llmstxt.site).
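On the agent side, discovery might mean checking a page's `<link>` tags before falling back to the root path. This sketch uses regexes instead of a real HTML parser, so treat it as illustrative only. It reads attributes in any order but assumes they are quoted. `findLlmsTxt` is a hypothetical helper.

```typescript
// Returns the absolute URL of an advertised llms.txt, or the root fallback.
function findLlmsTxt(html: string, pageUrl: string): string {
  const linkTags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of linkTags) {
    // Read a quoted attribute value regardless of its position in the tag.
    const attr = (attrName: string): string | undefined =>
      new RegExp(`\\b${attrName}\\s*=\\s*["']([^"']*)["']`, "i").exec(tag)?.[1];
    if (attr("rel") === "alternate" && attr("type") === "text/plain") {
      const href = attr("href");
      if (href) return new URL(href, pageUrl).toString(); // resolve relative hrefs
    }
  }
  // Nothing advertised: fall back to the conventional root location.
  return new URL("/llms.txt", pageUrl).toString();
}

const html = '<head><link rel="alternate" type="text/plain" href="/llms.txt" title="LLMs.txt" /></head>';
console.log(findLlmsTxt(html, "https://example.com/some-post/")); // "https://example.com/llms.txt"
```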

## Limitations

This implementation works for **content collections with markdown or MDX bodies**. It reads `post.body` directly, which is raw text.

For component-based pages (React, Vue, Svelte, or plain `.astro` files), there's no markdown body to extract. You'd need a different strategy:

- Render to HTML and strip tags (lossy, messy)
- Maintain separate content files (duplicate effort)
- Use a headless CMS where content exists independently

For most blogs, content collections are the right choice anyway.

## Why Not Use a Library?

There are Astro integrations for llms.txt. They auto-generate from all pages at build time. Sounds convenient, but:

1. You get everything, including pages you might not want exposed
2. No per-post endpoints
3. No control over the output format
4. Another dependency to maintain

This implementation is ~150 lines of TypeScript. You control exactly what's included. You understand every line. For something this simple, the DIY approach wins.

## Bonus: An SVG Icon

The llms.txt logo is four rounded squares in a plus pattern. Here's a simple SVG version you can use in your navigation:

```html
<svg
  xmlns="http://www.w3.org/2000/svg"
  width="24"
  height="24"
  viewBox="-4 -4 32 32"
  fill="currentColor"
  aria-hidden="true"
>
  <rect x="8" y="1" width="8" height="8" rx="2" opacity="0.6" />
  <rect x="1" y="8" width="8" height="8" rx="2" opacity="0.8" />
  <rect x="15" y="8" width="8" height="8" rx="2" opacity="0.7" />
  <rect x="8" y="15" width="8" height="8" rx="2" />
</svg>
```

**Design notes:**

- **`viewBox="-4 -4 32 32"`** adds padding so the icon matches the visual weight of stroke-based icons like Lucide
- **`fill="currentColor"`** inherits from CSS, so it works with any color scheme
- **Varying opacity** (0.6, 0.7, 0.8, 1.0) gives depth without using multiple colors
- **`rx="2"`** rounds the corners to match the original logo style

For Astro, wrap it in a component so you can pass `size` as a prop and reuse it across your site.

## The Result

After deploying, you have:

- `/llms.txt` - Index listing all posts with descriptions
- `/llms-full.txt` - Complete content for RAG systems or full context
- `/llms/post-slug.txt` - Individual posts for focused queries

Agents fetch the index, pick what they need, and get clean markdown. No HTML parsing, no navigation noise, no wasted tokens. That's the point of the standard.