What Caching Does

In FAQ Ally, caching stores the AI agent’s answers for specific questions so that the next time the same (or a very similar) question is asked, the system can return the stored response instead of running a new search and generation. Each cached entry is tied to the question text and the agent’s knowledge at the time the answer was created.

Caching is applied automatically for chat and API requests: when a query matches an existing cache entry and that entry is still within its TTL (Time to Live), the cached response is returned. When the TTL expires or the cache is cleared (for example during retraining), the next request for that question will get a fresh answer from the current model and documents.
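FAQ Ally’s cache internals aren’t exposed, but the lookup described above can be sketched roughly like this. All names here (`get_cached_answer`, `normalize`, the key layout) are hypothetical stand-ins, not the product’s API:

```python
import time

# Hypothetical in-memory cache keyed by (agent_id, normalized question).
# value: (answer, stored_at, ttl_seconds)
_cache = {}

def normalize(question):
    """Collapse case and whitespace so near-identical questions share a key."""
    return " ".join(question.lower().split())

def store_answer(agent_id, question, answer, ttl_seconds, now=None):
    """Cache an answer with its creation time and TTL."""
    now = time.time() if now is None else now
    _cache[(agent_id, normalize(question))] = (answer, now, ttl_seconds)

def get_cached_answer(agent_id, question, now=None):
    """Return the cached answer if it exists and is within its TTL, else None."""
    now = time.time() if now is None else now
    key = (agent_id, normalize(question))
    entry = _cache.get(key)
    if entry is None:
        return None
    answer, stored_at, ttl = entry
    if now - stored_at >= ttl:
        del _cache[key]   # TTL expired: treat as a miss so a fresh answer is generated
        return None
    return answer
```

On a miss (no entry, or TTL expired), the system would fall through to a full retrieval-and-generation run and store the result for next time.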

What Caching Is Good For

  • Faster responses: Cached answers are returned with lower latency than a full retrieval-and-generation run.
  • Lower cost and load: Fewer calls to the underlying AI and vector search mean lower usage and more headroom for traffic. Cached answers do not count against your site’s query limits, so your plan goes further.
  • Stable answers for common questions: Repeated questions get the same answer while the cache is valid, which can be useful for consistency and support.

Caching works best when many users ask similar questions (e.g. common FAQs). You can also warm the cache by pre-populating it with answers to your top questions so that the first real user to ask them already gets a fast, cached response.

Cache Warnings When Retraining

When you retrain an AI agent (e.g. add or remove documents, change settings), the agent’s knowledge changes. FAQ Ally clears that agent’s cache so that all future answers reflect the updated content. Before you confirm retraining, the dashboard shows a cache warning so you’re aware of the impact.

The warning may say something like:

  • “This will clear X cached response(s) for this agent.” when the system knows how many entries will be removed, or
  • “Cached responses for this agent will be cleared during retraining.” as a general notice when the count isn’t available.

This is intentional: cached answers were built from the old knowledge, so they must be invalidated. After retraining, new requests will repopulate the cache over time, or you can use Warm Cache to pre-fill it again (see below).
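As a rough illustration of this invalidation step (not FAQ Ally’s actual code; the function names and the in-memory dict keyed by agent are assumptions), clearing one agent’s entries and choosing between the two warning messages might look like:

```python
def retrain_warning(cache, agent_id):
    """Build the retrain confirmation warning. When the cache can't be
    inspected (modeled here as cache=None), fall back to the generic notice."""
    if cache is None:
        return "Cached responses for this agent will be cleared during retraining."
    count = sum(1 for (aid, _question) in cache if aid == agent_id)
    return f"This will clear {count} cached response(s) for this agent."

def clear_agent_cache(cache, agent_id):
    """Invalidate every entry for one agent; other agents' entries are untouched."""
    for key in [k for k in cache if k[0] == agent_id]:
        del cache[key]
```

The key point the sketch shows: invalidation is scoped to the retrained agent, so other agents keep their caches.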

How to Use Cache Warning and Warm Cache

1 See the cache warning when retraining

In the dashboard, when you start a retrain (e.g. from the agent card or training flow), the confirmation step will include the cache warning. Read it and confirm only when you’re ready for that agent’s cache to be cleared. There’s no separate “cache warning” screen—the warning is part of the retrain confirmation.

2 Warm the cache after training or when you want faster first answers

From the AI Agents list, open the agent and use the option to open the Warm Cache dialog. There you can:

  • Optionally set a cache TTL override for that agent (how long each cached response stays valid).
  • Run Warm Cache to send the top questions through the agent and store the answers in the cache.

Warming uses the same TTL as normal queries (or your override). After warming, the first user to ask one of those questions will get the cached answer instead of waiting for a full run.
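Conceptually, a warming pass is a loop over your top questions that runs each one through the agent and stores the result with the effective TTL. This sketch uses hypothetical names (`warm_cache`, `generate_answer` standing in for the full retrieval-and-generation run):

```python
def warm_cache(cache, agent_id, top_questions, generate_answer, ttl_seconds, now=0):
    """Pre-populate the cache so the first real user gets a cached answer.
    ttl_seconds is the normal TTL or the per-agent override."""
    for question in top_questions:
        answer = generate_answer(question)          # full run, done ahead of time
        cache[(agent_id, question)] = (answer, now, ttl_seconds)
    return len(top_questions)                       # number of entries warmed
```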

3 Rely on cache warnings to avoid surprises

Whenever an action would clear the cache (such as retraining), the UI will show the cache warning in the confirmation message. Use it to decide when to retrain and whether to run a warm cache afterward so high-traffic questions stay fast.

Summary

Caching stores answers for repeated or similar questions to improve speed and reduce load. It’s good for common FAQs and high-traffic agents. Cache warnings appear when retraining (or other actions) are about to clear the cache, and the Warm Cache dialog lets you pre-populate the cache with your top questions. Use the warning to plan retrains and use warming to keep important questions fast after updates.