Knowledge Base Guide

Learn how to configure and optimize your Knowledge Base for better camelAI performance

The Knowledge Base gives camelAI context-specific information about your data, so that it can analyze the data accurately and deliver consistent, relevant insights tailored to your business domain.

What is the Knowledge Base?

The knowledge base is a text area where you define important context about your data, business logic, and terminology. This information helps camelAI:

  • Maintain consistent metric definitions across all queries
  • Navigate complex schemas by understanding which tables to prioritize or avoid
  • Interpret ambiguous column names and relationships
  • Apply proper data formatting and display preferences
  • Handle time periods and date calculations correctly
  • Understand locale-specific requirements (currency, language, regional formats)

Persistent vs Session-Specific Knowledge Base Entries

camelAI supports two types of knowledge base entries, each designed for a different use case:

Persistent (Stateful) Entries

Persistent entries are created through the /api/v1/knowledge-base/ API endpoint and are tied to your connection IDs. These entries:

  • Persist across all iframes that use the associated connection ID
  • Apply globally to all users and sessions
  • Are ideal for context that applies universally across your organization

Use persistent entries for:

  • Dataset descriptions and schema information
  • Company-wide terminology and metric definitions
  • Standard table relationships and joins
  • Data quality notes that affect all users
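
As a rough illustration of the persistent flow, the sketch below builds (but does not send) a POST request to the /api/v1/knowledge-base/ endpoint. The host is assumed to be the same one used by the iframe endpoint, and the body field names connection_id and content are hypothetical placeholders, not confirmed by this guide; check the API reference for the actual request schema.

```python
import json
import urllib.request

# Assumption: the knowledge base endpoint lives on the same host as the iframe endpoint.
API_BASE = "https://api.camelai.com"


def build_persistent_entry_request(token: str, connection_id: str, content: str) -> urllib.request.Request:
    """Build a POST request that would create one persistent entry.

    NOTE: "connection_id" and "content" are hypothetical field names used
    only for illustration; the real request schema may differ.
    """
    body = json.dumps({"connection_id": connection_id, "content": content}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/api/v1/knowledge-base/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_persistent_entry_request(
    token="<token>",
    connection_id="<connection-id>",
    content="This dataset is a replica of our production e-commerce database.",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the API.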

Session-Specific (Stateless) Entries

Session-specific entries are provided directly in the iframe creation request via the knowledge_base_entries parameter. These entries:

  • Only apply to that specific iframe instance
  • Do not persist beyond the iframe's lifecycle
  • Work alongside any persistent entries you've already created

Use session-specific entries for:

  • User-specific instructions (e.g., "This user prefers non-technical explanations")
  • Organization-specific context when serving multiple tenants
  • Temporary overrides or custom behavior for specific sessions
  • Locale preferences (e.g., "Please respond in Spanish")

Example: Using Session-Specific Entries

When creating an iframe, you can include temporary knowledge base entries that apply only to that session:

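import requests

# Replace the "<string>" and "<token>" placeholders with real values.
payload = {
    "uid": "<string>",
    "srcs": ["<string>"],
    "ttl": 900,
    # Session-specific entries: they apply only to this iframe instance.
    "knowledge_base_entries": [
        "This user is new to the tool and has never used SQL before. Please keep answers non-technical",
        "Please speak in Spanish"
    ],
    "model": "gpt-5",
    "response_mode": "full",
    "show_sidebar": True
}

response = requests.post(
    "https://api.camelai.com/api/v1/iframe/create",
    headers={
        "Authorization": "Bearer <token>",
        "Content-Type": "application/json"
    },
    json=payload,
    timeout=30  # avoid hanging indefinitely on network problems
)
response.raise_for_status()  # raise an exception on 4xx/5xx responses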

These session-specific entries complement, rather than replace, any persistent knowledge base entries associated with your connection IDs.
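
In a multi-tenant application, the session-specific entries usually come from per-user settings. The helper below is a minimal sketch of that idea: build_iframe_payload and its parameters (language, non_technical, tenant_notes) are hypothetical names introduced for illustration, while the payload keys uid, srcs, ttl, and knowledge_base_entries mirror the request shown above.

```python
def build_iframe_payload(uid, srcs, *, language=None, non_technical=False,
                         tenant_notes=(), ttl=900):
    """Assemble an iframe creation payload with per-session entries.

    Hypothetical helper: only the payload keys come from the example
    request in this guide.
    """
    entries = list(tenant_notes)  # tenant-level context comes first
    if non_technical:
        entries.append("This user prefers non-technical explanations")
    if language:
        entries.append(f"Please respond in {language}")
    return {
        "uid": uid,
        "srcs": list(srcs),
        "ttl": ttl,
        "knowledge_base_entries": entries,
    }


payload = build_iframe_payload(
    "user-123",
    ["conn-abc"],
    language="Spanish",
    non_technical=True,
    tenant_notes=["Fiscal year begins April 1st"],
)
```

The resulting dictionary can be passed as the json argument of the iframe creation request. Persistent entries tied to the connection IDs still apply alongside it.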

Best Practices

1. Always Include a Dataset Description

Every knowledge base should start with a clear description of what your dataset represents. This foundational context helps camelAI understand the overall purpose and structure of your data.

This dataset is a replica of our production e-commerce database. 
It contains customer orders, product inventory, and shipping information 
from our online retail platform serving the US market.

2. Specify Standard Schemas

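If your data follows a well-known schema or is a replica of a standard system, explicitly state this. camelAI can leverage its understanding of common schemas to provide better insights. For example, an entry could read: "This database is a replica of our Salesforce instance and follows the standard Salesforce object schema."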

3. Define Company-Specific Terminology

Document any terms that have specific meanings within your organization, especially when they differ from industry standards or could be ambiguous.

Entry 1: "Active User": A user who has logged in within the last 30 days AND completed at least one transaction
Entry 2: "LTV" (Lifetime Value): The sum of total_purchases + subscription_revenue + addon_revenue columns
Entry 3: "Churn": When a customer has no activity for 90+ days (not the standard 30-day definition)
Entry 4: "Region": Refers to our custom sales territories, not geographic regions (see regions_mapping table)
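
If your glossary already lives in code or configuration, one entry per term can be generated from it. This is a minimal sketch: GLOSSARY and glossary_to_entries are hypothetical names, and the output only reproduces the entry format shown above.

```python
# Hypothetical glossary; the terms and definitions are the ones from the example above.
GLOSSARY = {
    "Active User": "A user who has logged in within the last 30 days AND completed at least one transaction",
    "LTV": "The sum of total_purchases + subscription_revenue + addon_revenue columns",
    "Churn": "When a customer has no activity for 90+ days",
    "Region": "Refers to our custom sales territories, not geographic regions (see regions_mapping table)",
}


def glossary_to_entries(glossary: dict[str, str]) -> list[str]:
    """Turn each term/definition pair into its own knowledge base entry."""
    return [f'"{term}": {definition}' for term, definition in glossary.items()]


entries = glossary_to_entries(GLOSSARY)
for entry in entries:
    print(entry)
```

Each string in entries can then be stored as a separate knowledge base entry.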

4. Clarify Complex Relationships

Help camelAI navigate joins and relationships by explaining non-obvious connections between tables.

Table Relationships:
- orders.user_id links to users.id (primary relationship)
- orders.promo_code links to both promotions.code AND partner_promotions.code
- Use product_variants instead of the products table for inventory queries
- Always join transactions through transaction_items, never directly to orders

5. Specify Data Preferences

Include preferences for how data should be formatted, calculated, or displayed.

Data Handling Preferences:
- When calculating percentages, round to 1 decimal place
- Week starts on Monday for all weekly aggregations
- Fiscal year begins April 1st
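
To make the date preferences concrete, the sketch below shows what "week starts on Monday" and "fiscal year begins April 1st" mean for a given date. The function names are hypothetical and the code illustrates only the arithmetic implied by the example entries, not how camelAI evaluates them.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())


def fiscal_year_start(d: date) -> date:
    """Return the April 1st that opens the fiscal year containing d."""
    start_this_year = date(d.year, 4, 1)
    return start_this_year if d >= start_this_year else date(d.year - 1, 4, 1)


print(week_start(date(2024, 5, 15)))         # a Wednesday, so the result is Monday 2024-05-13
print(fiscal_year_start(date(2024, 3, 31)))  # still in the fiscal year that began 2023-04-01
print(fiscal_year_start(date(2024, 4, 1)))   # first day of the fiscal year that began 2024-04-01
```

Without such entries, a date like March 31st could be placed in the wrong reporting period, which is why stating these conventions explicitly is worthwhile.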

6. Document Data Quality Issues

Be transparent about known data limitations or quality issues to prevent misleading analyses.

Data Quality Notes:
- Revenue data before March 2022 may be incomplete due to migration
- The user_demographics table has ~15% missing values for the age field
- Avoid using the legacy_orders table - use orders_v2 instead
- Product categories were restructured in June 2023; use category_mapping for historical comparisons

Structuring Knowledge Base Entries

Use Multiple Focused Entries

Because the Knowledge Base is implemented with retrieval-augmented generation (RAG), multiple small, focused entries perform better than one large entry.

✅ Good Practice

Entry 1: "Customer segments: Premium (>$1000/year), Standard ($100-999), Basic (<$100)"
Entry 2: "Subscription tiers: Starter ($29/mo), Professional ($99/mo), Enterprise (custom)"
Entry 3: "Geographic regions: NA (US/Canada), EU (European Union), APAC (Asia-Pacific)"

❌ Poor Practice

Customer and Business Context: Our customer segments include Premium customers who spend over $1000 per year, Standard customers who spend $100-999 annually,
and Basic customers under $100. We also have subscription tiers with Starter at $29/month, Professional at $99/month, and Enterprise with custom pricing. 
Our geographic regions cover NA which includes US and Canada, EU covering the European Union, and APAC for Asia-Pacific. Additionally, our fiscal year 
starts April 1st, we calculate LTV as total_purchases plus subscription_revenue plus addon_revenue, and active users must have logged in within 30 days AND
completed a transaction. Our churn definition is 90+ days of inactivity, and regions refer to sales territories not geographic areas.
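
A simple length check can catch entries that have grown into the kind of catch-all paragraph shown above. This is a sketch only: find_oversized_entries is a hypothetical helper, and the 200-character threshold is an arbitrary assumption rather than a documented camelAI limit.

```python
def find_oversized_entries(entries, max_chars=200):
    """Return the entries longer than max_chars, which are candidates for splitting.

    The default threshold is an arbitrary illustration, not a camelAI limit.
    """
    return [entry for entry in entries if len(entry) > max_chars]


focused = [
    "Customer segments: Premium (>$1000/year), Standard ($100-999), Basic (<$100)",
    "Subscription tiers: Starter ($29/mo), Professional ($99/mo), Enterprise (custom)",
    "Geographic regions: NA (US/Canada), EU (European Union), APAC (Asia-Pacific)",
]
catch_all = " ".join(focused) + " Additionally, our fiscal year starts April 1st."

flagged = find_oversized_entries(focused + [catch_all])
print(len(flagged))  # only the combined catch-all entry is flagged
```

Length is a crude proxy for focus. An entry can be short and still mix unrelated topics, so treat the result as a prompt for review rather than a rule.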

Managing Your Knowledge Base

You can create, read, update, and delete knowledge base entries through the API or the developer console. Changes take effect immediately for all new conversations.