Key Benefits
- Semantic Intelligence: Hybrid keyword and semantic search returns highly relevant results for natural-language queries without complex query engineering.
- Context Control: Granular filters (domain, time, content) reduce noise and align results with specific use cases.
- Cost & Latency Optimization: Retrieving highlights instead of full content reduces token consumption and latency, which is essential for agentic workflows.
Search Configuration
- Safety (`safesearch`):
  - `"strict"` (default): Excludes adult content. Standard for public-facing chatbots, where explicit material must be kept out of generated responses.
  - `"off"`: Unfiltered results. Use only for specialized research agents (e.g., medical or biological studies) where standard safety filters might incorrectly flag necessary anatomical or scientific content.
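The safety rule above (strict by default, off only for specialized research agents) can be kept in one place in client code. This is a minimal sketch under assumptions: the `safesearch` values come from this page, but the role names, the `"query"` key, and the flat-dict request shape are hypothetical.

```python
from typing import Literal

SafeSearch = Literal["strict", "off"]

# Hypothetical agent roles allowed to disable filtering. Per the guidance above,
# "off" is reserved for specialized research agents (medical, biological, ...).
RESEARCH_AGENTS = {"medical_research", "biology_research"}


def safesearch_for(agent_role: str) -> SafeSearch:
    """Return "off" only for allow-listed research roles; otherwise "strict"."""
    return "off" if agent_role in RESEARCH_AGENTS else "strict"


def build_search_params(query: str, agent_role: str) -> dict:
    # Assumed request shape: a flat dict of parameters sent to the search API.
    return {"query": query, "safesearch": safesearch_for(agent_role)}
```

Defaulting to `"strict"` means an unknown or misspelled role fails safe rather than silently disabling the filter.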
Precision Filtering
- Domain Filtering (`include_domains` / `exclude_domains`): Functions as a whitelist or blacklist.
  - Trusted Knowledge Base: Using `include_domains` to restrict retrieval to high-authority sources (e.g., official documentation, `.gov`, or `.edu` sites) creates a “walled garden” that reduces the risk of hallucinated or poorly sourced answers in professional contexts.
  - Noise Reduction: Using `exclude_domains` to filter out user-generated content platforms or content farms prevents the LLM from ingesting colloquial or unverified information.
- Content Constraints (`include_text` / `exclude_text`): Requires or forbids specific keywords in the page content. For example, require “quarterly earnings” when searching for financial reports, or exclude “rumor” to filter out speculative content.
- Time Sensitivity:
  - Breaking News Mode: Combining `time_basis: "published"` with a strict `start_time` (e.g., the past 24 hours) filters out older, SEO-optimized evergreen content. This strategy is essential for news summarization and market analysis agents.
- Result Count (`count`): Defaults to 5. For direct Q&A tasks, retrieving 3-5 results typically offers the best balance between context coverage and latency. Higher counts (10+) are recommended for broad topic aggregation tasks.
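The filters above combine naturally into reusable request presets. The sketch below builds two of them, a “walled garden” knowledge-base query and a breaking-news query, as plain dicts. The parameter names (`include_domains`, `exclude_text`, `time_basis`, `start_time`, `count`) come from this page; the `"query"` key, the list types, and the ISO 8601 format for `start_time` are assumptions, so check them against the actual API reference.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def trusted_kb_params(query: str, domains: list[str], count: int = 5) -> dict:
    """Walled-garden preset: only retrieve from the listed high-authority domains."""
    return {
        "query": query,  # hypothetical key name for the query string
        "include_domains": domains,  # assumed to accept a list of domain strings
        "count": count,
    }


def breaking_news_params(
    query: str,
    hours: int = 24,
    exclude_text: list[str] | None = None,
    count: int = 5,
    now: datetime | None = None,
) -> dict:
    """Breaking-news preset: only pages published within the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours)
    params = {
        "query": query,
        "time_basis": "published",
        "start_time": start.isoformat(),  # assumed: ISO 8601 timestamp string
        "count": count,
    }
    if exclude_text:
        params["exclude_text"] = exclude_text  # e.g. ["rumor"]
    return params
```

Passing `now` explicitly keeps the window deterministic in tests; in production the default uses the current UTC time.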
Response Content & Format
- Highlights vs. Full Content:
  - Highlights (default): Returns relevant, concise snippets. This is the most token-efficient format for fact-checking and Q&A, where the answer is likely contained in a single paragraph.
  - Full Content: Returns the parsed page text. Necessary for “Reading Assistant” agents that need to summarize entire articles, analyze writing style, or extract scattered data points from a long report.
  - Hybrid Strategy (Highlight-First): A cost-effective pattern is to request highlights first to assess relevance, then issue a second request for `full_content` only for the specific high-value URLs.
- Output Format (`format`): Controls the format of the highlight snippets.
  - `"text"` (default): Returns plain text.
  - `"markdown"`: Returns snippets with basic formatting (e.g., bolding of matching terms) where supported.
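The Highlight-First strategy is essentially a two-pass pipeline. This sketch assumes a response shaped like `{"results": [{"url": ..., "highlights": [...]}]}` and scores relevance with a crude keyword count; both are assumptions, as is how the second request targets specific URLs, so that step is left to a caller-supplied `fetch_full` function (hypothetical).

```python
from __future__ import annotations

from typing import Callable


def score_highlights(result: dict, keywords: list[str]) -> int:
    """Crude relevance score: total keyword hits across a result's highlights."""
    text = " ".join(result.get("highlights", [])).lower()
    return sum(text.count(k.lower()) for k in keywords)


def highlight_first(
    search: Callable[[dict], dict],
    fetch_full: Callable[[list[str]], dict[str, str]],
    query: str,
    keywords: list[str],
    max_deep: int = 2,
) -> dict[str, str]:
    """Pass 1: cheap highlights. Pass 2: full content for the best URLs only."""
    # Pass 1: highlights are the default response content, so no extra flag.
    response = search({"query": query, "count": 5})
    ranked = sorted(
        response.get("results", []),
        key=lambda r: score_highlights(r, keywords),
        reverse=True,
    )
    # Keep results with at least one keyword hit, capped at max_deep URLs.
    urls = [r["url"] for r in ranked if score_highlights(r, keywords) > 0][:max_deep]
    # Pass 2: pay for full content only when something looked relevant.
    return fetch_full(urls) if urls else {}
```

In practice the keyword heuristic could be replaced by an LLM relevance check; the cost saving comes from the structure, since `fetch_full` is never called for results whose snippets look irrelevant.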
Performance and Usage Considerations
- Token Economy: The `meta.usage` field reports consumption. To minimize operational costs, applications should default to `highlights` and request `full_content` only when user intent explicitly requires deep reading.
- Metadata Utilization: Use fields like `time_published` for secondary ranking on the client side (e.g., promoting the newest article among the top 5 results).
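Both considerations can be handled client side. The sketch below makes several assumptions not stated on this page: that `meta.usage` is a flat dict of numeric counters (its actual keys are unknown), and that `time_published` is an ISO 8601 string (a trailing `Z` is normalized for older Python versions). It re-sorts only the engine's top results by recency, so relevance ranking still decides which articles make the cut.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime


def _published_ts(result: dict) -> float:
    """Parse time_published to a POSIX timestamp; missing dates sort last."""
    ts = result.get("time_published")
    if not ts:
        return float("-inf")
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp()


def rerank_by_recency(results: list[dict], top_n: int = 5) -> list[dict]:
    """Reorder the first top_n results newest-first; leave the rest untouched."""
    head, tail = results[:top_n], results[top_n:]
    # sorted() is stable, so equal timestamps keep the engine's original order.
    return sorted(head, key=_published_ts, reverse=True) + tail


class UsageTracker:
    """Accumulates numeric counters from each response's meta.usage field."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def record(self, response: dict) -> None:
        usage = response.get("meta", {}).get("usage", {})
        for key, value in usage.items():
            if isinstance(value, (int, float)):  # skip non-numeric fields
                self.totals[key] += value
```

Converting to timestamps avoids comparing timezone-aware and naive datetimes, which would raise a `TypeError` if the API mixed formats.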