---
title: 'Build a Docs Agent Without Vector Search'
description: 'Build a LangGraph docs agent that searches read-only documentation files with Bashkit shell tools instead of a vector database or embeddings.'
date: 2026-05-22
tags:
  - ai
  - agents
  - bashkit
  - python
  - search
---

Classic agentic applications almost always include some version of semantic search over data. In most cases it is implemented as vector search over embeddings, such as OpenAI embeddings.

In this post, let's build an alternative approach, where data is represented as files in a virtual filesystem and the agent figures out how to search it.

## Basic Idea

Imagine that all data is represented as files in a filesystem, and that the agent has the means to work with these files: traverse, search, grep, list, and print selected ranges. Now, when someone asks the agent a question, it just does what it is trained for: it uses its own knowledge to orchestrate reading files, extract data, and answer.

On top of that, let's give the agent the ability to run shell commands against these files. Then semantic search becomes unnecessary in many cases. The agent can do what a developer does:

```bash
rg -i -n 'bashkit-cli' /docs/public /docs/rustdoc | head -20
grep -R -i -n -C 1 -m 3 -- 'readonly' /docs/public /docs/rustdoc /docs/examples
sed -n '40,90p' /docs/public/cli.md
find /docs/examples -maxdepth 2 -type f -name '*.md'
```

This looks too primitive. It is not.

The useful thing is not `grep` itself. The useful thing is that the agent has an inspectable environment. It can try one query, see nothing, retry with a different word, narrow down by filename, read nearby lines with `sed`, combine two searches, and only then answer.

That is search as a workflow, not search as one API call.
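To make the loop concrete, here is a minimal Python sketch of the same workflow outside any shell: try a query, fall back to a broader one when nothing matches, then read the lines around the first hit. The function names (`grep_lines`, `read_range`, `progressive_search`) and the tiny corpus are made up for illustration; they are not part of Bashkit or the example agent.

```python
import re
import tempfile
from pathlib import Path


def grep_lines(root: Path, pattern: str) -> list[tuple[Path, int, str]]:
    """Return (file, line_number, line) for every case-insensitive match under root."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob("*.md")):
        for number, line in enumerate(path.read_text().splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line))
    return hits


def read_range(path: Path, start: int, end: int) -> list[str]:
    """Return lines start..end (1-indexed, inclusive), similar to `sed -n 'start,endp'`."""
    return path.read_text().splitlines()[max(start, 1) - 1 : end]


def progressive_search(root: Path, queries: list[str], context: int = 1):
    """Try each query in order; stop at the first one that matches and read around the hit."""
    for query in queries:
        hits = grep_lines(root, query)
        if hits:
            path, number, _ = hits[0]
            return query, read_range(path, number - context, number + context)
    return None, []


# Hypothetical three-line corpus, only to exercise the loop.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "mounts.md").write_text(
        "# Mounts\nPass writable=False to mount a folder.\nWrites then fail.\n"
    )
    # The first query finds nothing, so the search retries with a simpler word.
    matched_query, snippet = progressive_search(docs, ["read-only mount", "writable"])
```

In the real agent the model picks the next query itself after reading the previous output. The fixed query list here only stands in for that decision.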

![Agent loop searches mounted documentation files with Bashkit shell commands](./docs-shell-search-cycle.svg)

## The Example

The example is a small console [docs search agent](https://github.com/everruns/bashkit/tree/main/examples/docs-grep-agent). It is a LangGraph agent created through LangChain 1.0, with one Bashkit tool attached.

If you build your own version outside the Bashkit repo, the first dependency is just:

```bash
uv add bashkit
```

For this exact shape you also need the LangChain and OpenAI packages:

```bash
uv add bashkit langchain langchain-openai
```

Run the example from the Bashkit repo:

```bash
cd examples/docs-grep-agent
export OPENAI_API_KEY=sk-...
uv run docs-grep-agent "what is bashkit"
uv run docs-grep-agent "give me example on how to use bashkit cli"
```

Here it is answering from the docs, with no vector index behind it:

![Bashkit docs search agent answering a question in terminal](./docs-grep-agent.png)

And if you want to see what the agent actually runs:

```bash
uv run docs-grep-agent --show-tools "how do read-only mounts work?"
```

## The Whole Trick

There are three parts.

First, docs are mounted as read-only folders into Bashkit:

```python
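# root is the Bashkit repository root as a pathlib.Path.
# Each pair is (host path, path inside the virtual filesystem).
docs_mounts = [
    (root / "docs", "/docs/public"),
    (root / "crates/bashkit/docs", "/docs/rustdoc"),
    (root / "examples/bashkit-pi", "/docs/examples/bashkit-pi"),
    (root / "examples/browser", "/docs/examples/browser"),
]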
```

Then these mounts are passed to `bashkit.langchain.create_bash_tool()`:

```python
return create_bash_tool(
    username="agent",
    hostname="docs",
    max_commands=120,
    max_loop_iterations=1000,
    timeout_seconds=3,
    mounts=[
        {"host_path": str(host_path), "vfs_path": vfs_path, "writable": False}
        for host_path, vfs_path in docs_mounts
    ],
    files=build_example_files(root),
    allowed_mount_paths=[str(host_path) for host_path, _ in docs_mounts],
    readonly_filesystem=True,
    max_output_length=MAX_CONTEXT_CHARS,
)
```

This gives the LangChain agent one tool: run bash inside Bashkit.

Second, the tool is attached to the agent. This is the whole instantiation:

```python
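from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

# llm_kwargs, create_docs_bash_tool, and SYSTEM_PROMPT are defined elsewhere in the example.
agent = create_agent(
    model=ChatOpenAI(**llm_kwargs),
    tools=[create_docs_bash_tool(root)],
    system_prompt=SYSTEM_PROMPT,
)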
```

LangChain's `create_agent` builds a LangGraph agent under the hood. Bashkit provides the shell tool. The model writes bash scripts, LangGraph runs the loop, and Bashkit executes each script in a virtual filesystem.

But not host bash. Bashkit bash. In-process, sandboxed, read-only filesystem, resource limits, capped tool output, no `cp docs /tmp` escape hatch. The example even blocks scratch files, because I wanted to force the agent to answer from the mounted docs and not build some private cache.

The third part is the prompt. Not long, but opinionated:

```text
Call the bash tool before answering Bashkit documentation questions.
Keep tool output compact.
Search progressively with rg, grep, sed, and targeted find.
Prefer grep when you need context flags.
Use only facts present in bash tool output.
Do not treat a failed command as proof that something is absent;
retry with a simpler command when needed.
If the bash output does not answer the question, say the docs snippets do not show it.
```

That is more or less it.

No semantic index hidden somewhere. The model writes shell search scripts. Bashkit runs them. The model answers from the output.

## Why Not Just Vector Search

Because vector search is not free architecture.

Once you add it, you have a few annoying things to deal with:

1. **Index freshness.** Docs changed. Did the embeddings update? Did the deployment pick the right index? Is local dev the same as prod?
2. **Chunk boundaries.** The answer is often in the lines before or after the retrieved chunk. Then you add bigger chunks. Then precision drops.
3. **Exact strings.** CLI flags, error codes, filenames, function names. Semantic search can find them, but shell search is much more honest here: the literal string matches or it does not.
4. **Debuggability.** With shell search, a failed answer shows the failed command. With vectors, a failed answer often shows "similarity was weird".
5. **Permissions.** Filesystem mounts are easy to reason about. This folder is mounted read-only. This one is absent. Done.
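
The chunk-boundary and exact-string points can be shown with a toy example. The sketch below assumes a naive fixed-size character chunker (real pipelines usually add overlap and smarter splitting) and compares it with reading a line window around a literal match, which is roughly what `grep -C` does. `fixed_chunks`, `line_window`, and the sample text are hypothetical.

```python
def fixed_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters, with no overlap."""
    return [text[i : i + size] for i in range(0, len(text), size)]


def line_window(text: str, needle: str, context: int) -> list[str]:
    """Return the first line containing `needle` plus `context` lines on each side."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if needle in line:
            return lines[max(index - context, 0) : index + context + 1]
    return []


# Contrived doc: the flag and its explanation sit on adjacent lines.
doc = "## Flags\n--readonly\nMounts the folder without write access.\n"

# A 20-character chunk ends right after the flag, so the explanation lands in the next chunk.
chunks = fixed_chunks(doc, 20)
chunk_with_flag = next(chunk for chunk in chunks if "--readonly" in chunk)

# A literal match with one line of context keeps the flag and the explanation together.
window = line_window(doc, "--readonly", context=1)
```

The chunk size is picked to make the split obvious. The point is only that a line window is anchored on the match, while a chunk boundary is anchored on nothing in particular.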

I am not saying "never use vector search". I am saying defaulting to it is lazy.

For docs, code, logs, configs, tickets exported as markdown, product manuals, generated API specs, and many internal knowledge bases, files + shell tools are already a strong baseline.

## What Shell Search Gives Agent

There is a small but important shift here.

With vector search, the agent asks an external system "give me relevant stuff." The search system decides relevance.

With a shell, the agent controls the search plan:

```bash
# discover likely files
find /docs/public /docs/rustdoc -type f -name '*cli*'
find /docs/public /docs/rustdoc -type f -name '*security*'

# broad discovery
rg -i -n 'mount' /docs/public /docs/rustdoc | head -20

# search exact phrase
grep -R -i -n -C 2 -- 'read-only mount' /docs/public /docs/rustdoc

# read selected range after finding the file and line
sed -n '40,90p' /docs/public/security.md

# broaden after no result
grep -R -i -n -C 2 -- 'readonly\|writable\|mount' /docs/public /docs/rustdoc
```

And yes, the prompt in the example says Bashkit `rg` is intentionally simpler than full ripgrep, and Bashkit `find` does not support every GNU expression. That is exactly the kind of local tool knowledge you want in the system prompt. The agent does not need perfect Unix. It needs accurate constraints for this runtime.

## Safety Is Not Optional

If you give an agent a real shell over a real docs directory, it will eventually do something stupid.

Bashkit makes this much more boring:

- docs are mounted read-only at `/docs/public`, `/docs/rustdoc`, `/docs/examples`;
- the full Bashkit filesystem is read-only;
- execution has command count, loop, timeout, and output limits;
- a self-test checks that write attempts fail.
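
Of these limits, the output cap is the easiest to picture. Below is a rough sketch of what capping tool output means, assuming a simple character budget with a truncation marker. It is not Bashkit's implementation, and `cap_output` and the marker text are invented for illustration.

```python
def cap_output(text: str, max_chars: int, marker: str = "\n[truncated]") -> str:
    """Return text unchanged if it fits, otherwise cut it so the result is max_chars long."""
    if len(text) <= max_chars:
        return text
    if max_chars <= len(marker):
        # Budget too small to fit the marker; fall back to a plain cut.
        return text[:max_chars]
    return text[: max_chars - len(marker)] + marker


# A command that prints far too much gets cut to the budget, marker included.
long_output = "a" * 100
capped = cap_output(long_output, 20)

# Short output passes through untouched.
short = cap_output("short", 20)
```

The marker matters more than it looks. Without it the model cannot tell a complete result from a cut one, and may treat missing lines as proof that something is absent.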

## Where Semantic Search Still Fits

I would still use semantic search when the corpus is messy, natural-language-heavy, or not file-shaped. Customer support conversations. Long prose. Lots of synonyms. No stable keywords. Then embeddings earn their place.

But for many agent systems, the better first step is different:

1. Put the source of truth into files.
2. Mount the files into a safe virtual environment.
3. Give the agent shell tools it already understands.
4. Let it inspect, search, and compose commands.
5. Add semantic search only if the shell-search baseline is not enough.

This also works nicely with structured data. JSON files, CSV exports, OpenAPI specs, markdown docs, logs. The agent can use `rg`, `grep`, `sed`, `awk`, `jq` when available. It can build a search as a sequence of concrete observations.
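
For structured files, one such observation is "where does this key appear?". Here is a small Python sketch of that lookup, as a rough analogue of asking `jq` for matching paths. The helper `find_key_paths` and the OpenAPI-like snippet are assumptions made up for this illustration.

```python
import json


def find_key_paths(node, key, prefix=""):
    """Yield a dotted path for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            path = f"{prefix}.{name}" if prefix else name
            if name == key:
                yield path
            yield from find_key_paths(value, key, path)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key_paths(value, key, f"{prefix}[{index}]")


# Hypothetical spec fragment, shaped loosely like an OpenAPI document.
spec = json.loads(
    '{"paths": {"/mounts": {"get": {"summary": "List mounts"},'
    ' "post": {"summary": "Add mount"}}}}'
)

# First observation: which operations carry a summary at all?
summary_paths = list(find_key_paths(spec, "summary"))
```

Each path is something the agent can follow up on with a narrower read, the same way a `grep` hit leads to a `sed` range.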

## Small But Useful Rule

Do not let the agent be a file browser.

The example prompt says that if a user asks to list or dump directories, the agent should refuse the raw listing and explain that it answers Bashkit docs questions. Listing files is allowed only as internal discovery for a specific question.

This avoids turning the docs bot into a slow `tree` command. It also reduces accidental data exposure in larger systems.

This is where a virtual filesystem is nice. You decide what exists for the agent. The prompt decides how it can use it. The runtime enforces what the prompt cannot.

## The Point

I like this pattern because it is stupid simple:

```text
question -> agent -> bash search script -> docs snippets -> answer
```

And because it is simple, it is easy to test. Easy to inspect. Easy to constrain. Easy to run locally.

Semantic search is an optimization. Sometimes a good optimization. Sometimes a needed one.

But if your agent can safely access data as files, start with shell search.

That is 80% of the value. Everything else is retrieval engineering.

## Code

[Bashkit docs search agent](https://github.com/everruns/bashkit/tree/main/examples/docs-grep-agent)
