How to Build a Context Builder App with BisenseAI

Data Engineering • Difficulty: Intermediate • Time to Implement: 1–2 hours

Who This Guide Is For

Product engineers, solutions architects, and ops leads who need a branded knowledge base—not a one-off ChatGPT upload. You have manuals, contracts, or wiki exports and want semantic search with citations inside your own app or MCP clients.

Prerequisites

BisenseAI workspace with BisenseFlow backend canvas and Weaver frontend access
Vector database account (Pinecone, Weaviate, or Chroma) with API key in BisenseAI secrets
Embedding provider credentials (OpenAI, Google Generative AI, or compatible endpoint)
Sample corpus: 3–5 PDFs or Markdown files under 20 MB each for playground testing
Basic understanding of chunk size vs retrieval quality (800–1200 token targets)
Optional: Google Drive node for automated re-ingest
Optional: LangSmith or LangFuse project for production tracing

Key Outcomes

→ Ingest workflow: File Input → Text Splitter → Embeddings → Vector Store
→ Query workflow with multi-query expansion, top-K retrieval, and grounded LLM answers with citations
→ Weaver admin upload UI and end-user search/chat bound to workflow I/O
→ Metadata schema for document_id, source, and role-based filtered retrieval
→ Re-index strategy that deletes stale vectors before re-embedding updated files
→ Deployed REST API or MCP server validated in the interactive playground

Core Challenge

Teams need AI that answers from their own documents—but pasting files into chat hits token limits and causes hallucinations when the model guesses beyond the corpus.

Production RAG requires intelligent chunking, consistent embeddings, vector storage with access-control metadata, and retrieval that returns only relevant passages at query time.

Operators must add and replace documents without duplicating vectors or breaking citations—something spreadsheets and one-off scripts cannot maintain.

BisenseAI addresses this with BisenseFlow for ingest/retrieval graphs and Weaver for upload/search UI, tested in the playground and deployed as API or MCP.

Research published from 2024 through 2026 indicates that naive chunk-and-embed RAG underperforms on enterprise corpora: Anthropic's contextual retrieval reduces retrieval failures by 49-67% when paired with BM25 and reranking, hybrid BM25+dense retrieval with reciprocal rank fusion fixes acronym and SKU queries that pure embeddings miss, and Cohere Rerank 3 adds a final precision layer before the LLM sees context. Teams evaluating with RAGAS treat faithfulness below 0.80 as unacceptable for customer-facing support. BisenseFlow Text Splitter, dual Retriever branches, and grounded LLM nodes map directly to this stack without custom orchestration code. Embedding upgrades to text-embedding-3-large or Voyage voyage-3 require planned re-index windows but can deliver measurable context_recall gains on technical documentation.

What You Will Build

A Context Builder product: Weaver admin upload wired to a BisenseFlow workflow that chunks, embeds, and writes to your vector store.

End users search or chat with answers constrained to retrieved chunks and citation IDs linking to source passages.

Two backend entry points—ingest and query—share Vector Store config and metadata schema, with optional Google Drive sync and webhook re-index.

Production deploy exposes REST endpoints for your SaaS or MCP resources for Claude Desktop and other MCP hosts.

Platform Architecture on BisenseAI

BisenseFlow hosts LangChain-compatible Text Splitter, Embeddings, and Vector Store nodes (Pinecone, Weaviate, Chroma). Control-flow nodes branch on ingest success and loop batch uploads.

Weaver provides App Nodes for upload, search, and chat; I/O configuration binds UI fields to workflow inputs. Real-time execution and time-travel debugging inspect chunk boundaries before bulk embed.

Deploy ingest and query separately or as subgraphs. Enable LangSmith/LangFuse on both paths for retrieval latency, embedding failures, and token usage per tenant.

┌──────────────┐     ┌─────────────────────────────────────────┐
│ Weaver UI    │     │ BisenseFlow Backend                      │
│ Upload/Chat  │────▶│ File Input → Text Splitter → Embeddings │
└──────┬───────┘     │                    ↓                     │
       │             │              Vector Store (Pinecone)     │
       │             │                    ↑                     │
       └────────────▶│ Text Input → Retriever → LLM + Citations │
                     └─────────────────────────────────────────┘
                              │ Deploy: API / MCP
                              ▼
                     ┌─────────────────┐
                     │ LangSmith trace │
                     └─────────────────┘

Semantic Chunking on BisenseFlow

Text Splitter nodes use markdown-aware or recursive strategies so chunks respect headings and code blocks rather than cutting at arbitrary offsets. Set chunk_size 800–1200 and overlap 100–150 in the node panel. Attach metadata (source_file, department, effective_date) per chunk for filtered search. Enable Anthropic-style contextual retrieval by inserting an LLM preprocessing node that situates each chunk with document title and section path before embedding. Inspect chunk boundaries in time-travel debug before bulk ingest; markdown-aware splitting at 1000 tokens with 120 overlap is a sensible starting point for policy and wiki corpora.
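The splitting behavior described above can be sketched in plain Python. This is a minimal illustration, not the Text Splitter node's implementation: split_markdown is a hypothetical helper, and whitespace-separated words stand in for tokens.

```python
import re

def split_markdown(text, chunk_size=1000, overlap=120):
    """Split at markdown headings, then window each section with overlap.

    Assumption: whitespace-separated words approximate tokens.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    # Zero-width split in front of every heading line (# through ######).
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    step = chunk_size - overlap
    chunks = []
    for section in sections:
        words = section.split()
        if not words:
            continue
        first_line = section.strip().splitlines()[0]
        title = first_line.lstrip("#").strip() if first_line.startswith("#") else ""
        for start in range(0, len(words), step):
            window = words[start:start + chunk_size]
            chunks.append({"section_title": title, "text": " ".join(window)})
            if start + chunk_size >= len(words):
                break  # the last window already reached the end of the section
    return chunks
```

Because every window starts inside a single heading section, no chunk straddles two headings, and consecutive windows in a section share their last and first overlap words.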

Multi-Provider Vector Stores

Swap Pinecone, Weaviate, or Chroma via one Vector Store node without rebuilding Weaver UI. Map index, namespace, and metadata filters in secrets. Self-hosted teams use Chroma/pgvector; SaaS teams use managed Pinecone.

Grounded Answers with Citations

A Retriever with top_k 8–12 feeds an LLM with a strict system prompt: answer only from CONTEXT, cite chunk_id per claim. An optional Query Transform LLM expands vague questions. A Logic branch returns insufficient_context when similarity scores are below threshold. Add Cohere Rerank 3 after the hybrid RRF merge to lift precision on the top-8 passages fed to the grounded LLM. Log rerank scores in LangSmith; drop chunks below rerank threshold 0.3, and route to the insufficient_context branch when none remain instead of letting the model hallucinate.
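The threshold branch can be sketched as a small function. The field name rerank_score and the helper select_context are assumptions for illustration; the 0.3 threshold and the top-8 cap come from the text above.

```python
def select_context(reranked, min_score=0.3, max_chunks=8):
    """Drop weak chunks; signal insufficient_context when nothing survives."""
    kept = [c for c in reranked if c["rerank_score"] >= min_score]
    kept.sort(key=lambda c: c["rerank_score"], reverse=True)
    if not kept:
        # Nothing cleared the threshold: refuse rather than let the LLM guess.
        return {"status": "insufficient_context", "chunks": []}
    return {"status": "ok", "chunks": kept[:max_chunks]}
```

A downstream Logic branch would then route on the status field, calling the grounded LLM only when it is ok.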

Weaver Upload and Search Experience

File I/O on App Node triggers ingest; show chunks_indexed on success. Chat panels call query workflow with optional streaming. Trigger Nodes schedule nightly Google Drive re-sync for ops-managed folders.

Backend Logic Canvas (BisenseFlow)

File Input or Google Drive node loads PDF, DOCX, Markdown
Text Splitter (recursive/markdown) chunk_size 1000, overlap 120
Embeddings node (OpenAI text-embedding-3-small or Google)
Vector Store upsert with document_id, source_url, content_hash, allowed_roles
Logic: delete vectors by document_id before re-upload upsert
Query: Text Input → Query Transform LLM (optional) → retriever top_k=10
Similarity threshold Logic before grounded LLM
LLM JSON Output: answer, citations[], scores
HTTP webhook for CMS-triggered re-index
LangSmith/LangFuse tracing on ingest and query
Deploy ingest and query as separate API routes
MCP resource for search_knowledge_base

Frontend Canvas (Weaver Studio)

App Node: document upload with File I/O and progress
App Node: search bar and chat bound to query workflow
Logic Node: loading, empty, and error states from status codes
Citation list with source_filename and excerpt
Trigger Node: scheduled Google Drive re-sync
Admin route gating upload vs public query
Filter dropdown passing metadata filters to retriever
Real-time execution status during long ingests
Time-travel debug link for support roles
Playground embed for QA before deploy
Optional React import for branded layout
Deploy Weaver with API base URL from project settings

Node Configuration Reference

File Input (ingest)

MIME allowlist: PDF, DOCX, Markdown; max 25 MB in I/O panel. Output raw_document → Text Splitter.

Disable image extract for text-only RAG in v1.

Text Splitter

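RecursiveCharacterTextSplitter, or MarkdownHeaderTextSplitter followed by a recursive split; chunk_size 1000, overlap 120. In LangChain, RecursiveCharacterTextSplitter counts characters unless a token-based length function is configured, so confirm the unit in the node panel before relying on token targets.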

Enable add_start_index metadata when store supports offset citations.

Embeddings

Provider + model in secrets; batch 64. Confirm embedding_dimensions in playground match index.

Connect output to Vector Store upsert port.

Vector Store

Index, namespace per tenant, metadata document_id, allowed_roles, content_hash.

Retriever: top_k=10, include_metadata=true, filter from user_roles input.
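The role filter on the Retriever can be illustrated in memory. This sketch assumes each hit carries a score and a metadata dict with allowed_roles; visible_chunks is a hypothetical helper, and a real vector store would apply the equivalent filter server-side using its own filter syntax.

```python
def visible_chunks(chunks, user_roles, top_k=10):
    """Keep chunks whose allowed_roles overlap the caller's roles, best first."""
    roles = set(user_roles)
    allowed = [
        c for c in chunks
        # Chunks without allowed_roles are hidden (deny by default).
        if roles & set(c["metadata"].get("allowed_roles", []))
    ]
    allowed.sort(key=lambda c: c["score"], reverse=True)
    return allowed[:top_k]
```

Denying chunks that lack allowed_roles is a design choice made here so that a missing metadata field cannot expose restricted documents.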

LLM (grounded answer)

System: answer ONLY from CONTEXT; cite [chunk_id]; refuse if insufficient. Temperature 0.2, max_tokens 1024.

Template: CONTEXT:\n{chunks}\n\nQUESTION:\n{question}.
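A minimal sketch of how the template might be filled before the LLM node runs. The chunk_id and text field names and the build_prompt helper are assumptions; the CONTEXT/QUESTION layout follows the template above.

```python
SYSTEM_PROMPT = (
    "Answer ONLY from CONTEXT. Cite [chunk_id] after each claim. "
    "If the context is insufficient, say so and do not guess."
)

def build_prompt(chunks, question):
    """Render retrieved chunks with their IDs so the model can cite them."""
    context = "\n\n".join(f"[{c['chunk_id']}] {c['text']}" for c in chunks)
    return f"CONTEXT:\n{context}\n\nQUESTION:\n{question}"
```

Prefixing each passage with its chunk_id in brackets gives the model a literal token to copy into citations, which makes citation checks against retrieved chunks straightforward.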

Query Transform LLM

Output JSON queries[] with 3 search variants from user question.

Use Gemini Flash or Haiku to limit latency before retrieval.
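The Query Transform output needs defensive parsing before it fans out to the retriever. This sketch assumes the LLM returns a JSON object with a queries array, as described above; parse_queries is a hypothetical helper that falls back to the original question on malformed output.

```python
import json

def parse_queries(llm_output, original, max_variants=3):
    """Return the original question plus up to max_variants distinct variants."""
    try:
        parsed = json.loads(llm_output)
    except json.JSONDecodeError:
        parsed = {}
    variants = parsed.get("queries", []) if isinstance(parsed, dict) else []
    if not isinstance(variants, list):
        variants = []
    result, seen = [], set()
    for query in [original, *variants]:
        if not isinstance(query, str) or not query.strip():
            continue
        key = query.strip().lower()  # case-insensitive de-duplication
        if key in seen:
            continue
        seen.add(key)
        result.append(query.strip())
        if len(result) == max_variants + 1:
            break
    return result
```

Keeping the original question first means retrieval still works when the transform model returns nothing usable.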

Chunking strategy and retrieval quality

Retrieval quality depends more on chunk boundaries than embedding model. Mid-header splits produce fragments that match queries but lack context. Use MarkdownHeaderTextSplitter for wikis; for PDFs convert to markdown then recursive split with overlap. Time-travel debug Text Splitter output before bulk embed; tune chunk_size and overlap from trace feedback.

Store page_number, section_title, document_version in metadata. Pass allowed_roles into Vector Store filters. Wrong answers often trace to splitter settings or missing Query Transform synonyms—not model size.

Re-indexing and vector lifecycle

Compute content_hash at ingest; on upload delete vectors where document_id matches before upsert. Log delete_count and insert_count per run for ops dashboards.

Weekly Trigger reconciles Google Drive hashes vs index metadata. CMS webhooks trigger ingest on publish. GDPR delete workflow removes vectors and File Output blobs in one audited run.
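The delete-before-upsert rule and the hash check can be sketched against an in-memory stand-in for the vector store. InMemoryIndex and reindex are hypothetical names; a real store would expose its own delete-by-filter and upsert calls.

```python
import hashlib

class InMemoryIndex:
    """Toy stand-in for a vector store: vector_id -> metadata."""

    def __init__(self):
        self.vectors = {}

    def delete_by_document(self, document_id):
        stale = [vid for vid, meta in self.vectors.items()
                 if meta["document_id"] == document_id]
        for vid in stale:
            del self.vectors[vid]
        return len(stale)

    def upsert(self, document_id, chunks, content_hash):
        for i, text in enumerate(chunks):
            self.vectors[f"{document_id}#{i}"] = {
                "document_id": document_id,
                "content_hash": content_hash,
                "text": text,
            }
        return len(chunks)

def reindex(index, document_id, raw_bytes, chunks):
    """Skip unchanged files; otherwise delete stale vectors, then upsert."""
    content_hash = hashlib.sha256(raw_bytes).hexdigest()
    existing = {m["content_hash"] for m in index.vectors.values()
                if m["document_id"] == document_id}
    if existing == {content_hash}:
        return {"skipped": True, "delete_count": 0, "insert_count": 0}
    deleted = index.delete_by_document(document_id)
    inserted = index.upsert(document_id, chunks, content_hash)
    return {"skipped": False, "delete_count": deleted, "insert_count": inserted}
```

The explicit delete matters even when vector IDs are deterministic: if an updated file yields fewer chunks than before, an upsert alone would leave the trailing old chunks in the index.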

Implementing Reciprocal Rank Fusion on BisenseFlow

Reciprocal rank fusion merges ranked lists from dense and sparse retrievers without requiring score normalization, which matters because cosine similarities and raw BM25 scores sit on very different scales. For each chunk_id appearing in either list, compute RRF_score = sum(1 / (k + rank_i)), where k=60 is the commonly used constant and rank_i is the 1-based position in each list that contains the chunk. Sort by RRF_score descending and take top-N for reranking.

Implement in a Custom Python node receiving two JSON arrays from parallel Retriever branches. Deduplicate by chunk_id, preserve metadata from the higher-ranked occurrence, and emit fused results to the Cohere Rerank HTTP node. LangSmith spans should tag dense_rank, sparse_rank, and fused_rank per chunk for debugging recall gaps.
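A sketch of what such a Custom Python node might contain, assuming each retriever branch emits a list of dicts with chunk_id and metadata in rank order. The function name and field names other than chunk_id, dense_rank, sparse_rank, and fused_rank are illustrative.

```python
def reciprocal_rank_fusion(dense, sparse, k=60, top_n=15):
    """Fuse two ranked lists with RRF and tag per-list ranks for tracing."""
    fused = {}
    for rank_key, results in (("dense_rank", dense), ("sparse_rank", sparse)):
        seen = set()
        for rank, item in enumerate(results, start=1):
            cid = item["chunk_id"]
            if cid in seen:
                continue  # count a chunk once per list
            seen.add(cid)
            entry = fused.setdefault(cid, {
                "chunk_id": cid,
                "rrf_score": 0.0,
                "best_rank": rank,
                "metadata": item.get("metadata", {}),
            })
            entry["rrf_score"] += 1.0 / (k + rank)
            entry[rank_key] = rank
            if rank < entry["best_rank"]:
                # Keep metadata from the higher-ranked occurrence.
                entry["best_rank"] = rank
                entry["metadata"] = item.get("metadata", {})
    ranked = sorted(fused.values(), key=lambda e: e["rrf_score"], reverse=True)
    for position, entry in enumerate(ranked, start=1):
        entry["fused_rank"] = position
    return ranked[:top_n]
```

With k=60 and two lists, the highest possible fused score is 2/61, roughly 0.033, so any threshold applied to RRF scores has to be chosen on that scale rather than the 0 to 1 scale of cosine or rerank scores.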

Latest Research & Industry Context (2025–2026)

Anthropic Contextual Retrieval Cuts RAG Failures by 49-67%

Anthropic published contextual retrieval in September 2024, demonstrating that prepending chunk-specific context to each segment before indexing reduces retrieval failure rates by 35% with contextual embeddings alone, by 49% when contextual embeddings are combined with contextual BM25, and by 67% when reranking is added. The technique addresses a core RAG failure mode: isolated chunks lose document-level meaning, so queries about cross-section topics retrieve wrong passages or nothing at all. Production teams on BisenseFlow implement this as a preprocessing LLM node that generates 50-100 token situating summaries per chunk before the Embeddings node runs, trading modest upfront token cost for better recall on policy manuals, contracts, and multi-chapter wikis.

BisenseAI Context Builder workflows should run contextualization on ingest, not at query time, so embeddings and BM25 sparse indexes both benefit from enriched text. Store the original chunk text separately in metadata for citation display while embedding the contextualized version. Benchmark on a held-out question set using RAGAS faithfulness and context_recall metrics before and after enabling contextual retrieval; teams typically see context_recall lift from the mid-60s to high-80s on technical corpora.

Pair contextual retrieval with metadata filters (department, effective_date, allowed_roles) on the Vector Store node so contextualized chunks remain RBAC-scoped. When re-indexing, delete vectors by document_id first, regenerate contextual summaries only for changed sections, and upsert atomically to avoid duplicate or stale embeddings in Pinecone or Chroma namespaces.
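The ingest-time contextualization step can be sketched as follows. The summarize callable is a hypothetical wrapper around the preprocessing LLM node, and stub_summarize only imitates its output shape; field names such as embed_text and original_text are assumptions.

```python
def contextualize(chunks, doc_title, summarize):
    """Build the text to embed (prefix + chunk) and keep the original for citations."""
    records = []
    for i, chunk in enumerate(chunks):
        prefix = summarize(doc_title, chunk["section_path"], chunk["text"])
        records.append({
            "chunk_id": f"{doc_title}#{i}",
            # Embedded and BM25-indexed text includes the situating prefix.
            "embed_text": f"{prefix}\n\n{chunk['text']}",
            # Citations display the untouched original passage.
            "metadata": {
                "original_text": chunk["text"],
                "section_path": chunk["section_path"],
            },
        })
    return records

def stub_summarize(title, section_path, text):
    """Placeholder for the LLM call; returns a fixed-format situating sentence."""
    return f"From '{title}', section {section_path}."
```

Passing the summarizer in as a parameter keeps the function testable without an LLM call and lets the model be swapped without touching the ingest logic.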

Sources: https://www.anthropic.com/news/contextual-retrieval · https://docs.cohere.com/docs/rerank-2 · OpenAI Embeddings API docs

Hybrid BM25+Dense Retrieval and Cohere Rerank 3 in Production RAG

Hybrid retrieval combining dense vector search with BM25 sparse keyword matching via reciprocal rank fusion (RRF) has become the 2025-2026 default for production RAG, outperforming either method alone on acronym-heavy, SKU-coded, and legal corpora. Dense embeddings from OpenAI text-embedding-3-large (3072 dimensions) or Voyage voyage-3 capture semantic paraphrase, while BM25 excels when users query exact product codes, statute numbers, or error strings that embedding models smooth over.

On BisenseFlow, implement hybrid search by running parallel Retriever branches: one Vector Store similarity query with top_k=20 and one HTTP or custom Python node calling your store's hybrid or keyword endpoint (Pinecone hybrid, Weaviate hybrid, or Elasticsearch with kNN). Merge results with RRF (score = sum(1/(k+rank))) in a Logic or Custom Python node before passing the fused top-8 to the grounded LLM. Cohere Rerank 3 (model ID rerank-english-v3.0) as a final reranking step on the fused set typically adds another 10-15% nDCG improvement on enterprise benchmarks.

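Monitor hybrid latency in LangSmith: reranking 20 candidates adds 100-200ms but saves far more in downstream LLM tokens by eliminating irrelevant context. Set thresholds in Logic nodes to return insufficient_context when the top rerank relevance score falls below 0.35, preventing hallucinated answers on out-of-corpus questions. Apply this threshold to rerank or cosine scores rather than raw RRF scores: with k=60 and two lists, a fused RRF score cannot exceed 2/61 (about 0.033), so a 0.35 cutoff on it would reject every query.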

RAGAS Eval and Embedding Model Selection for Knowledge Bases

RAGAS (Retrieval Augmented Generation Assessment) provides metrics including faithfulness, answer_relevancy, context_precision, and context_recall. Faithfulness and answer_relevancy are reference-free, so teams can run them nightly against golden question sets without human labels for every query; context_recall needs a reference answer or expected source per question. BisenseAI operators export 50-100 representative questions from Weaver search logs (PII-redacted), attach expected source document_ids where known, and schedule a Trigger Node workflow that runs the query graph, scores outputs with RAGAS via a Custom Python node, and posts regressions to Slack when faithfulness drops below 0.85.

Embedding model choice materially affects retrieval quality: text-embedding-3-large consistently outperforms text-embedding-3-small on MTEB retrieval subsets at roughly 2x storage cost (3072 vs 1536 dimensions at default settings), while Voyage voyage-3 offers competitive multilingual performance for global support portals. Never mix embedding models in the same index; plan a full re-embed migration with blue-green namespaces when upgrading.
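The storage trade-off is simple arithmetic. This sketch counts raw float32 vector values only, which is an assumption: metadata and index overhead vary by store and are excluded. The dimensions used are the defaults, 1536 for text-embedding-3-small and 3072 for text-embedding-3-large.

```python
def index_size_gb(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector payload in GB (float32 by default), excluding metadata."""
    return num_vectors * dimensions * bytes_per_value / 1e9

small = index_size_gb(1_000_000, 1536)  # text-embedding-3-small default size
large = index_size_gb(1_000_000, 3072)  # text-embedding-3-large default size
ratio = large / small
```

For one million chunks this gives about 6.1 GB versus 12.3 GB of raw vectors, which is a useful lower bound when sizing a blue-green namespace migration where both indexes exist at once.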

Chunking strategy remains the highest-leverage tuning knob: markdown-aware recursive splitting at 800-1200 tokens with 100-150 overlap preserves heading hierarchy for contextual retrieval prompts. Overlap prevents boundary artifacts where a table header lands in one chunk and rows in the next. Use BisenseFlow time-travel debugging to inspect chunk boundaries on 5 representative documents before bulk ingest.

Sources: https://docs.ragas.io/ · MTEB leaderboard 2025

Step-by-Step: Build in BisenseAI

1. Create ingest workflow on BisenseFlow
New project → BisenseFlow canvas → workflow context-builder-ingest. Add File Input with MIME allowlist.
Connect output to Text Splitter input port.
2. Chain splitter, embeddings, vector store
Text Splitter chunk_size 1000, overlap 120. Embeddings node with secrets. Vector Store upsert with metadata fields.
Run one PDF in the playground; verify chunks in time-travel debug.
3. Add delete-before-reindex branch
Logic: if document_id exists, Vector Store delete by filter then upsert.
Re-upload same file; vector count must not double.
4. Build query workflow
Workflow context-builder-query: question, roles → retriever → grounded LLM → JSON citations.
Add similarity threshold insufficient_context branch.
5. Validate playground end-to-end
Ingest samples; query with doc-only questions. Confirm citations match chunks in time-travel.
Tune top_k and chunk_size from traces.
6. Weaver admin upload UI
App Node file upload → ingest I/O. Show chunks_indexed from JSON output.
Restrict route to admin role.
7. Weaver search and chat UI
Search/chat bound to query workflow. Display answer + citations.
Use AI-assisted linking for I/O ports.
8. Optional Google Drive sync
Google Drive Input + nightly Trigger. Same splitter pipeline for new files.
Log failures to Sheet via Composio.
9. Enable observability
Connect LangSmith/LangFuse; tag ingest vs query. Alert on embedding errors.
Review first 50 production traces.
10. Deploy REST API
Deploy panel → separate ingest and query endpoints. Store API keys in gateway.
Configure per-tenant rate limits.
11. Optional MCP deploy
Deploy query as MCP Server; test Claude Desktop config snippet.
Document internal connection steps.
12. Production smoke test
Run the Production Checklist items. Upload, query, update, delete test document.
Verify RBAC filter blocks cross-department retrieval.

Production Checklist

Chunk boundaries reviewed in playground
Metadata includes document_id, content_hash, allowed_roles
Delete-before-upsert tested
Citations on every factual claim
Similarity threshold prevents weak answers
Secrets in BisenseAI secrets manager only
LangSmith/LangFuse enabled with tenant_id
Ingest API rate limited
PII review documented
Admin upload separated from user query
Vector index backup procedure documented
GDPR delete runbook assigned

Common Pitfalls

Arbitrary chunk sizes

Fixed small splits break tables and clauses. Validate header-aware splitting in playground before bulk ingest.

Skipping delete on re-upload

Duplicate vectors cause contradictory answers. Always delete by document_id before upserting updates.

Unfiltered retrieval

Without metadata filters, restricted docs leak across teams. Map JWT roles to Vector Store filters on every query.

Mega-chunks in LLM context

Too many large chunks blow tokens. Cap top_k and truncate chunk text before the LLM node.

No observability

Bad answers are undebuggable without traces. Enable LangSmith from day one.
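For the mega-chunks pitfall above, capping and truncating can be sketched as a budget check before the LLM node. The helper fit_to_budget and its limits are illustrative, and words stand in for tokens.

```python
def fit_to_budget(chunks, top_k=10, max_chunk_words=300, budget_words=2000):
    """Cap the number of chunks, truncate each, and stop at the total budget."""
    selected, used = [], 0
    for chunk in chunks[:top_k]:
        words = chunk["text"].split()[:max_chunk_words]
        if used + len(words) > budget_words:
            break  # adding this chunk would exceed the context budget
        selected.append({**chunk, "text": " ".join(words)})
        used += len(words)
    return selected
```

Chunks are assumed to arrive in relevance order, so stopping at the budget drops the weakest passages first.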

Frequently Asked Questions

Should I use text-embedding-3-small or text-embedding-3-large for my Context Builder?

Start with text-embedding-3-small for prototypes and playgrounds where index size and cost matter more than recall. Move to text-embedding-3-large or Voyage voyage-3 when RAGAS context_recall on your golden set falls below 0.80 or when users frequently report missing answers that exist in the corpus. On BisenseFlow, swap the Embeddings node provider in one place; Weaver UI bindings stay unchanged. Budget roughly 2x storage in Pinecone for text-embedding-3-large (3072 vs 1536 dimensions) and plan a re-index window since vectors are not interchangeable across models.

How do I implement Anthropic-style contextual retrieval on BisenseAI?

Add an LLM node between Text Splitter and Embeddings that takes each chunk plus document title and section path, outputting a brief contextual prefix. Concatenate prefix + original chunk for embedding; store original text in metadata for citations. Use a fast cheap model (GPT-4o-mini or Claude Haiku) for contextualization during ingest. Enable LangSmith tracing on this node to monitor token spend; contextual retrieval typically adds 15-25% ingest cost but reduces query-time failures enough to lower overall LLM spend.

What is the recommended hybrid retrieval setup on BisenseFlow?

Run dense retrieval (top_k=20) and BM25 or sparse retrieval (top_k=20) in parallel, merge with reciprocal rank fusion in a Logic node, then rerank the top 15 with Cohere Rerank 3 before sending top-8 to the grounded LLM. Pinecone and Weaviate both expose hybrid endpoints; self-hosted Chroma teams often add an Elasticsearch sidecar for BM25. The Weaver search UI stays bound to a single query workflow endpoint regardless of internal retrieval complexity.

How do I handle document updates without duplicate vectors?

Before re-ingest, run a Vector Store delete filtered by document_id metadata. Upsert new chunks atomically in the same workflow transaction. Store content_hash in metadata and skip re-embed when hash unchanged. Google Drive Trigger Nodes on BisenseFlow can poll folders nightly; Logic branches compare hash to last indexed value. Weaver admin UI should show last_indexed_at per document so operators trust freshness.

Can I expose my Context Builder as an MCP resource for Claude Desktop?

Yes. Deploy the query workflow as an MCP server from BisenseAI project settings. Register a search_knowledge_base tool with query and optional metadata filter parameters matching your retriever schema; in MCP terms this is a tool rather than a resource, because it takes arguments and runs a search. Keep ingest as a separate REST webhook or admin-only MCP tool with stronger auth. MCP clients discover tools through the protocol's tools/list request (this guide targets the 2025-11-25 specification revision); test with Claude Desktop before publishing to customers.

How do I evaluate RAG quality before launch?

Build a golden set of 50+ questions with known source passages. Run RAGAS faithfulness, context_recall, and answer_relevancy nightly via a scheduled BisenseFlow workflow. Block production deploy when faithfulness drops more than 0.05 (5 points on a 0-100 scale) week-over-week. Use the interactive playground with time-travel to inspect retrieval scores per query during tuning. Export traces to LangFuse for tenant-level dashboards once live.
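The deploy gate can be sketched as a comparison between two evaluation runs. This does not call the RAGAS library; it assumes the nightly workflow has already produced metric dicts on a 0 to 1 scale. The 0.85 floor and the 0.05 weekly drop mirror the thresholds in this guide; deploy_blockers is a hypothetical helper.

```python
def deploy_blockers(previous, current, floor=0.85, max_drop=0.05):
    """Return the reasons a deploy should be blocked; an empty list means pass."""
    reasons = []
    if current["faithfulness"] < floor:
        reasons.append(f"faithfulness {current['faithfulness']:.2f} below floor {floor:.2f}")
    drop = previous["faithfulness"] - current["faithfulness"]
    if drop > max_drop:
        reasons.append(f"faithfulness dropped {drop:.2f} week-over-week")
    return reasons
```

Returning reasons instead of a boolean lets the same output feed a Slack alert and a Logic branch that halts the deploy.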
