MistByteAI

Practical AI tooling for local LLM workflows: a Search API, RAG integrations, time-aware prompting, and reproducible setups built around real hardware constraints.

Node.js
Linux
Text Generation WebUI
Search API
RAG
Time-aware prompts

Search API V1 / WebSearcher

Explicit web search integration for Text Generation WebUI. Built for deterministic, debuggable retrieval: search → fetch → extract → cite.
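The search → fetch → extract → cite flow can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the WebSearcher code. `run_pipeline`, `Source`, `RetrievalResult` and `format_citations` are hypothetical names, and the search backend, HTTP fetcher and HTML extractor are passed in as plain callables. Keeping each stage separate and recording why a URL was dropped is one way to keep retrieval deterministic and easy to debug.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Source:
    index: int  # 1-based citation number, assigned in result order
    url: str
    text: str  # extracted plain text passed to the model as context


@dataclass
class RetrievalResult:
    context: str  # numbered passages to put into the prompt
    sources: List[Source] = field(default_factory=list)
    skipped: List[Tuple[str, str]] = field(default_factory=list)  # (url, reason)


def run_pipeline(
    query: str,
    search: Callable[[str], List[str]],
    fetch: Callable[[str], str],
    extract: Callable[[str], str],
    max_results: int = 3,
) -> RetrievalResult:
    """Run search -> fetch -> extract and number each kept source for citing."""
    result = RetrievalResult(context="")
    for url in search(query)[:max_results]:
        try:
            raw = fetch(url)
        except Exception as exc:  # keep going, but record why the URL was dropped
            result.skipped.append((url, f"fetch failed: {exc!r}"))
            continue
        text = extract(raw).strip()
        if not text:
            result.skipped.append((url, "no text extracted"))
            continue
        result.sources.append(Source(len(result.sources) + 1, url, text))
    result.context = "\n\n".join(f"[{s.index}] {s.text}" for s in result.sources)
    return result


def format_citations(sources: List[Source]) -> str:
    """Render a reference list matching the [n] markers in the context."""
    return "\n".join(f"[{s.index}] {s.url}" for s in sources)


if __name__ == "__main__":
    # Hypothetical stand-ins for a real search backend, HTTP client and extractor.
    index = {"local llm": ["https://a.example", "https://b.example", "https://c.example"]}
    pages = {"https://a.example": "<p>Alpha</p>", "https://b.example": "<p>Beta</p>"}
    res = run_pipeline(
        "local llm",
        search=lambda q: index.get(q, []),
        fetch=lambda url: pages[url],  # c.example is missing -> KeyError -> skipped
        extract=lambda html: re.sub(r"<[^>]+>", "", html),
    )
    print(res.context)
    print(format_citations(res.sources))
    print(res.skipped)
```

Because every kept source gets a stable [n] number in result order, the model can cite passages by marker and the same numbers map back to URLs in the reference list.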

Time Injector

A tiny plugin that injects CURRENT_DATE / CURRENT_TIME into prompts so smaller local models keep a sane timeline. Designed to be machine-readable (no fluff, no “source”).
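A minimal sketch of the injection idea, assuming a bare `KEY: value` header prepended to the prompt. The exact format the Time Injector emits is not shown here, so `build_time_header` and `inject_time` are hypothetical helpers. In a Text Generation WebUI extension, a function like this could be called from one of the extension hooks (for example `input_modifier`); check the extension documentation for the exact hook signatures.

```python
from datetime import datetime
from typing import Optional


def build_time_header(now: Optional[datetime] = None) -> str:
    """Return bare KEY: value lines for the date and time (hypothetical format)."""
    if now is None:
        now = datetime.now()  # local wall-clock time of the machine running the model
    return (
        f"CURRENT_DATE: {now.strftime('%Y-%m-%d')}\n"
        f"CURRENT_TIME: {now.strftime('%H:%M')}"
    )


def inject_time(prompt: str, now: Optional[datetime] = None) -> str:
    """Prepend the header to the prompt, separated by one blank line."""
    return f"{build_time_header(now)}\n\n{prompt}"


if __name__ == "__main__":
    print(inject_time("How many days until the end of the month?"))
```

Passing `now` explicitly keeps the function deterministic for tests; leaving it out uses the current local time.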

What’s next (short roadmap)

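I also share honest performance notes. For example, Qwen 80B with a 256k context on 6GB of VRAM can generate ~8–9 tok/s once the context is warmed up, but the first response after resuming a long chat can be slow, because the whole chat history has to be processed again (prefill) before the first new token appears.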
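The cold-start delay is mostly arithmetic: time to first token is roughly the number of uncached prompt tokens divided by the prefill rate, and the reply then streams at the decode rate. The sketch below uses the ~8–9 tok/s decode figure from the note; `estimate_latency` is a hypothetical helper, and the 200 tok/s prefill rate and token counts are assumptions for illustration only, not measurements.

```python
from typing import Tuple


def estimate_latency(
    prompt_tokens: int,
    new_tokens: int,
    prefill_tps: float,
    gen_tps: float,
    cached_tokens: int = 0,
) -> Tuple[float, float]:
    """Rough (time_to_first_token_s, total_s) for one response.

    Only prompt tokens that are not already in the KV cache need prefill;
    generation then runs at the decode speed (gen_tps).
    """
    if prefill_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughputs must be positive")
    uncached = max(prompt_tokens - cached_tokens, 0)
    ttft = uncached / prefill_tps  # ignores the small cost of the first decoded token
    total = ttft + new_tokens / gen_tps
    return ttft, total


if __name__ == "__main__":
    # Assumed numbers: 100k-token history plus a 200-token new message,
    # a hypothetical 200 tok/s prefill rate, and 8.5 tok/s decode (midpoint of ~8-9).
    cold = estimate_latency(100_200, 500, prefill_tps=200, gen_tps=8.5)
    warm = estimate_latency(100_200, 500, prefill_tps=200, gen_tps=8.5, cached_tokens=100_000)
    print(f"cold: first token after {cold[0]:.0f}s, done after {cold[1]:.0f}s")
    print(f"warm: first token after {warm[0]:.0f}s, done after {warm[1]:.0f}s")
```

With these assumed numbers, a cold resume waits about 501 s before the first token, while a warm cache (only the 200-token new message to prefill) starts in about 1 s; the ~59 s spent decoding the reply is the same in both cases.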

Support

If you find these tools useful, support helps me keep building, testing, and maintaining this work. Crypto is currently the simplest option.

Bitcoin (BTC)
bc1qvshupepuvwwtmgnzspfsqh3up72k8e3pme50te
Solana (SOL / USDT / USDC)
4MMz9PvPrmRR88RvB59JhdQrzqnMkj73ByYd5DM3wxZa
EVM (Ethereum / BNB / USDT / USDC)
0x50f859193a7f314df994b8be152661f0fa7064c8
Stablecoins (USDT / USDC) are preferred, and any amount helps. You can also help by starring the repos, testing, and reporting issues.