feat(search): Semantic Tool Search#149

Open
shashi-stackone wants to merge 47 commits into main from semantic_search_12111

@shashi-stackone shashi-stackone commented Feb 19, 2026

Problem

Following up from #142

StackOne has more than 10,000 actions across all connectors, and the number is growing; some connectors alone have 2,000+ actions.
Keyword matching breaks down when someone searches for "onboard new hire" but the action is called hris_create_employee. The SDK
already supports keyword-based search, and this PR adds semantic search using the action search service.

Implementation Details

  • SemanticSearchClient that calls StackOne's /actions/search API for natural language tool discovery
  • Three ways to use it:
    1. search_tools(): search by intent and get a Tools collection ready for OpenAI, LangChain, or any framework
    2. search_action_names(): lightweight lookup returning action names and scores without full tool definitions
    3. Utility tools: pass a SemanticSearchClient to utility_tools() and the tool_search tool becomes semantic-aware inside
      agent loops
  • Per-connector parallel search so results are scoped to only the connectors the user has linked
  • Automatic fallback to local BM25+TF-IDF hybrid search when the semantic API is unavailable
  • Action name normalization that strips version prefixes (e.g. bamboohr_1.0.0_bamboohr_create_employee_global becomes
    bamboohr_create_employee)
  • Connector helpers (StackOneTool.connector, Tools.get_connectors()) for connector-aware filtering
  • Benchmark suite with 94 evaluation tasks across 8 categories: semantic search achieves 76.6% Hit@5 vs 66.0% for local search
    (+10.6 percentage points)
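
The action name normalization in the list above could look roughly like the following. This is a minimal sketch, not the SDK's implementation: the function name normalize_action_name, the regular expression, and the handling of the _global suffix are assumptions inferred from the single example in the description.

```python
import re

# Hypothetical pattern: "<connector>_<major.minor.patch>_<action>[_global]".
# Connector names containing underscores are not handled by this sketch.
_VERSIONED = re.compile(r"^[a-z0-9]+_\d+\.\d+\.\d+_(?P<action>.+?)(?:_global)?$")


def normalize_action_name(name: str) -> str:
    """Strip an assumed '<connector>_<semver>_' prefix and '_global' suffix."""
    match = _VERSIONED.match(name)
    # Names that do not look versioned are returned unchanged.
    return match.group("action") if match else name
```

With this sketch, normalize_action_name("bamboohr_1.0.0_bamboohr_create_employee_global") returns "bamboohr_create_employee", and a name without a version segment, such as hris_create_employee, is returned unchanged.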

Summary by cubic

Adds semantic tool search so users can find and execute actions with natural language. Searches are scoped to connectors in the fetched tools (and optional project_ids), with a local BM25+TF‑IDF fallback.
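
The scoping and fallback behavior summarized above could be sketched as follows. This is an illustration under stated assumptions rather than the SDK's code: search_scoped, the semantic_search and local_search callables, and the (action_name, score) result shape are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Result = tuple[str, float]  # hypothetical shape: (action_name, score)


def search_scoped(
    query: str,
    connectors: set[str],
    semantic_search: Callable[[str, str], list[Result]],
    local_search: Callable[[str], list[Result]],
    top_k: int = 5,
) -> list[Result]:
    """Query each linked connector in parallel; fall back to local search on failure."""
    if not connectors:
        return []  # nothing is linked, so nothing would be executable
    try:
        with ThreadPoolExecutor(max_workers=len(connectors)) as pool:
            per_connector = list(
                pool.map(lambda c: semantic_search(query, c), sorted(connectors))
            )
    except Exception:
        # Stand-in for "semantic API unavailable"; a real client would catch a narrower error.
        return local_search(query)[:top_k]
    # Merge the per-connector hits and keep the best-scoring ones overall.
    merged = [hit for hits in per_connector for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:top_k]
```

The sketch merges by raw score, which assumes scores from different connectors are comparable; whether the real service guarantees that is not stated in this PR.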

  • New Features

    • StackOneToolSet.search_tools() and a callable SearchTool (via get_search_tool()) for agent loops; fully replaces Utility Tools (module removed).
    • SemanticSearchClient with search_action_names(); per‑connector parallel search; respects server ranking/min_similarity unless top_k is set; supports optional project_ids; can be passed into StackOneToolSet (otherwise created lazily).
    • Local keyword fallback moved to local_search.ToolIndex.
    • StackOneTool is directly callable; README and examples updated (search_tool_example.py, semantic_search_example.py).
  • Bug Fixes

    • Scoped searches to available connectors so agents don’t discover tools they can’t execute.
    • Fixed CI and lint issues; restored lazy semantic client creation and cleaned up outdated docs references.

Written for commit f6920c8. Summary will update on new commits.

shashi-stackone and others added 30 commits February 18, 2026 09:51
When utility_tools(semantic_client=...) is used, tool_search now
searches only the connectors available in the fetched tools collection
instead of the full StackOne catalog. This prevents agents from
discovering tools they cannot execute.

- Add available_connectors param to create_semantic_tool_search
- Pass connectors from Tools.utility_tools() to scope searches
- Update docs, examples, and README to reflect scoping
- Add 4 new tests for scoping behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 3 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="stackone_ai/utility_tools.py">

<violation number="1" location="stackone_ai/utility_tools.py:342">
P2: An empty `available_connectors` set now falls back to a full-catalog search, which can surface tools the user doesn’t have access to. This contradicts the scoping behavior (“only the user’s own connectors are searched”) and likely returns incorrect results for accounts with no connectors.</violation>
</file>

<file name="stackone_ai/toolset.py">

<violation number="1" location="stackone_ai/toolset.py:407">
P2: Passing top_k directly to the local tool_search limits results before connector filtering, so fallback can return fewer than requested even when matching tools exist for the allowed connectors. Consider keeping an expanded fallback limit (e.g., top_k * N or a safe default) before filtering.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
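
Both findings above concern the order of scoping and truncation. One way to address them is to over-fetch from the local index, filter by connector, and only then truncate, while treating an empty connector set as "no results". The sketch below is an assumption-laden illustration: scoped_local_search, the overfetch factor, the local_search callable, and the rule that the connector is the first underscore-separated token of the action name are all hypothetical.

```python
from typing import Callable


def scoped_local_search(
    query: str,
    allowed_connectors: set[str],
    local_search: Callable[[str, int], list[str]],
    top_k: int = 5,
    overfetch: int = 3,
) -> list[str]:
    """Over-fetch from the local index, filter by connector, then truncate."""
    if not allowed_connectors:
        return []  # an empty scope must not widen to the full catalog
    candidates = local_search(query, top_k * overfetch)
    # Assumption: the connector is the first underscore-separated token of the name.
    scoped = [
        name for name in candidates if name.split("_", 1)[0] in allowed_connectors
    ]
    return scoped[:top_k]
```

Filtering before truncating means a request for top_k results is only short-changed when fewer than top_k matching tools exist within the over-fetched window.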

Comment on lines 227 to 229
utility = tools.utility_tools(semantic_client=toolset.semantic_client)

search_tool = utility.get_tool("tool_search")

"It feels like the nicer experience is like my_search_tool = tools.search(client=semantic_search) or my_search_tool = tools.search(client=local_search) and the my_execute_tool = tools.execute()"

what was the reason for not implementing it like this?

Author

Yes, agreed, that approach is clean. We can implement a UtilityTools(Tools) subclass, which looks fully backwards compatible with the existing local BM25 search. Since it is a subclass of Tools, nothing should break, but I am not confident enough to make this change as part of this PR. Should I do it in this PR or in another one?

Let me give it a go.

Author

@willleeney Thanks for this suggestion. I went ahead and made this change now rather than later: I added a UtilityTools subclass with typed search_tool and execute_tool property accessors.
The pattern is now:

  utility = tools.utility_tools(semantic_client=toolset.semantic_client)
  result = utility.search_tool.call(query="onboard new hire")

However, I kept utility_tools() as the configuration point rather than tools.search(client=...) to separate
concerns: backend choice (semantic vs local) is a one-time decision, while tool access
happens repeatedly. This avoids passing the client every time you access the tool.

Also adopted the min_similarity from the server and removed the client side parsing.

Comment on lines 565 to 613
 """Return utility tools for tool discovery and execution

-Utility tools enable dynamic tool discovery and execution based on natural language queries
-using hybrid BM25 + TF-IDF search.
+Utility tools enable dynamic tool discovery and execution based on natural language queries.
+By default, uses local hybrid BM25 + TF-IDF search. When a semantic_client is provided,
+uses cloud-based semantic search for higher accuracy on natural language queries.

 Args:
-    hybrid_alpha: Weight for BM25 in hybrid search (0-1). If not provided, uses
-        ToolIndex.DEFAULT_HYBRID_ALPHA (0.2), which gives more weight to BM25 scoring
-        and has been shown to provide better tool discovery accuracy
-        (10.8% improvement in validation testing).
+    hybrid_alpha: Weight for BM25 in hybrid search (0-1). Only used when
+        semantic_client is not provided. If not provided, uses DEFAULT_HYBRID_ALPHA (0.2),
+        which gives more weight to BM25 scoring.
+    semantic_client: Optional SemanticSearchClient instance. Pass
+        toolset.semantic_client to enable cloud-based semantic search.

 Returns:
-    Tools collection containing tool_search and tool_execute
+    UtilityTools collection with search_tool and execute_tool accessors

 Note:
     This feature is in beta and may change in future versions
+
+Example:
+    # Semantic search (pass semantic_client explicitly)
+    toolset = StackOneToolSet()
+    tools = toolset.fetch_tools()
+    utility = tools.utility_tools(semantic_client=toolset.semantic_client)
+    result = utility.search_tool.call(query="onboard new hire")
+
+    # Local BM25+TF-IDF search (default, no semantic_client)
+    utility = tools.utility_tools()
+    result = utility.search_tool.call(query="onboard new hire")
 """
-from stackone_ai.utility_tools import (
-    ToolIndex,
-    create_tool_execute,
-    create_tool_search,
-)
+from stackone_ai.utility_tools import create_tool_execute

-# Create search index with hybrid search
-index = ToolIndex(self.tools, hybrid_alpha=hybrid_alpha)
+if semantic_client is not None:
+    from stackone_ai.utility_tools import create_semantic_tool_search
+
+    search_tool = create_semantic_tool_search(
+        semantic_client, available_connectors=self.get_connectors()
+    )
+    execute_tool = create_tool_execute(self)
+    return UtilityTools([search_tool, execute_tool])

-# Create utility tools
+# Default: local BM25+TF-IDF search
+from stackone_ai.utility_tools import ToolIndex, create_tool_search
+
+index = ToolIndex(self.tools, hybrid_alpha=hybrid_alpha)
 filter_tool = create_tool_search(index)
 execute_tool = create_tool_execute(self)

-return Tools([filter_tool, execute_tool])
+return UtilityTools([filter_tool, execute_tool])

I think that we should have the argument that is passed to utility_tools be either search_method: str or search_client: SearchClient. I think that search_method: str = "bm25" works best. We know that if they pass "semantic" then we can just create the semantic search client here or inside the create semantic search tool function

Author

That is a great idea. I have added search_method with the values "bm25" and "semantic". It makes things way simpler.

Comment on lines 610 to 613
 filter_tool = create_tool_search(index)
 execute_tool = create_tool_execute(self)

-return Tools([filter_tool, execute_tool])
+return UtilityTools([filter_tool, execute_tool])

filter_tool should be search_tool to match the semantic definition

Author

I used filter_tool to stay in sync with the BM25 code, but I think it is the right time to move to search_tool, as it is becoming the standard.

-# Search for employee management tools
-result = filter_tool.call(query="manage employees create update list", limit=5, minScore=0.0)
+# Search for employee management tools
+result = utility_tools.search_tool.call(query="manage employees create update list", limit=5)

why do we need search_tool.call() not search_tool()?

Author

StackOneTool did not have __call__, but I added it since it makes sense and makes the code read much better. Updated the examples to use tool(query="...") instead of tool.call(query="...").
.call() and .execute() still work as before.
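
The change described in this reply amounts to a __call__ method that delegates to the existing entry point. The following is a minimal sketch under assumptions: CallableTool is a stand-in class, not the SDK's StackOneTool, and its execute() simply echoes its arguments instead of calling an API.

```python
from typing import Any


class CallableTool:
    """Stand-in for a tool class; not the SDK's StackOneTool."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, arguments: dict[str, Any]) -> dict[str, Any]:
        # Placeholder behavior: echo the arguments instead of calling an API.
        return {"tool": self.name, "arguments": arguments}

    def call(self, **kwargs: Any) -> dict[str, Any]:
        # Keyword-argument convenience wrapper around execute().
        return self.execute(kwargs)

    def __call__(self, **kwargs: Any) -> dict[str, Any]:
        # Makes tool(query="...") equivalent to tool.call(query="...").
        return self.call(**kwargs)
```

Because __call__ only delegates, the older .call() and .execute() paths keep working unchanged.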

"""Utility tools collection with typed accessors for search and execute tools."""

@property
def search_tool(self) -> StackOneTool:

With respect to the last comment, which says search_tool() should make the call:

Should we make this get_search_tool(), which would remove the tool = self.get_tool("tool_search") call and put the logic in here?

Author

Done. Renamed the search_tool property to a get_search_tool() method with inline lookup. Did the same for execute_tool to align the naming.

Comment on lines 659 to 663
"""
tool = self.get_tool("tool_execute")
if tool is None:
    raise StackOneError("tool_execute not found in this UtilityTools collection")
return tool

same here as with search

Author

Done as well

"default": 5,
"nullable": True,
},
"minSimilarity": {

should this parameter not be min_similarity to match the api? + same with top_k?

Author

Good catch. Renamed all tool parameters to snake_case to match the API: minSimilarity to min_similarity, minScore to min_score, limit to top_k. I also updated the existing BM25 and semantic variants so that they remain consistent.


 # Search for tools
-result = filter_tool.execute(
+result = search_tool.execute(

search_tool.execute() or search_tool.search() ?

Author

I am not entirely sure about this. .execute() is the standard StackOneTool interface and all tools use it. We just added __call__, which lets users write get_search_tool()(query="..."), so they won't interact with .execute() directly.

cool

Contributor

i think .execute is the standard for executing a specific tool

@cubic-dev-ai cubic-dev-ai bot left a comment

3 issues found across 8 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="stackone_ai/utility_tools.py">

<violation number="1" location="stackone_ai/utility_tools.py:228">
P2: The new argument parsing drops backward compatibility for `limit`/`minScore`, so existing callers will silently get default values instead of their requested limits/scores. Consider accepting the legacy keys as fallbacks.</violation>

<violation number="2" location="stackone_ai/utility_tools.py:344">
P2: The semantic tool search now ignores legacy `limit`/`minSimilarity` arguments, so existing integrations will fall back to defaults. Add backward-compatible fallbacks to preserve behavior.</violation>
</file>

<file name="stackone_ai/models.py">

<violation number="1" location="stackone_ai/models.py:642">
P2: Removing the `search_tool`/`execute_tool` properties is a breaking API change: `utility.search_tool` now returns a method rather than a StackOneTool, so existing integrations calling `.search_tool.call(...)` will fail. Consider keeping property aliases that return `get_search_tool()`/`get_execute_tool()` to preserve compatibility.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
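
The backward-compatible argument parsing suggested in the utility_tools.py findings above could be sketched like this. It is a hedged illustration: parse_search_args and _first_present are hypothetical helpers, and the defaults of 5 and 0.0 are assumptions (5 appears as a schema default in this thread, 0.0 only in an example call). The same pattern would apply to min_similarity and minSimilarity in the semantic variant.

```python
from typing import Any


def _first_present(arguments: dict[str, Any], keys: tuple[str, ...], default: Any) -> Any:
    """Return the first non-None value among keys, else the default."""
    for key in keys:
        value = arguments.get(key)
        if value is not None:
            return value
    return default


def parse_search_args(arguments: dict[str, Any]) -> tuple[int, float]:
    """Prefer the new snake_case keys, then fall back to the legacy keys."""
    top_k = _first_present(arguments, ("top_k", "limit"), 5)
    min_score = _first_present(arguments, ("min_score", "minScore"), 0.0)
    return int(top_k), float(min_score)
```

Checking the new key first means callers who send both spellings get the snake_case value, while callers who only send limit or minScore keep their previous behavior.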

@shashi-stackone
Author

Thanks @willleeney for the great feedback on this PR and for suggesting some fundamental changes to the SDK's API that align with its future direction. I think it is a good time to make these changes. I am also making a note to update the integrations where the SDK is already used (or will be used in future), e.g. ADK, Pydantic, and other places.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="stackone_ai/toolset.py">

<violation number="1" location="stackone_ai/toolset.py:632">
P2: Passing the uninitialized `_semantic_client` breaks semantic utility search after `fetch_tools()`. `utility_tools(search_method="semantic")` now raises because `_semantic_client` remains `None` unless the property was accessed earlier. Restore lazy initialization here so Tools gets a valid client.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
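
The lazy initialization the finding above asks to restore is commonly written as a property that creates the client on first access. The sketch below only illustrates that pattern: ToolSetSketch and SemanticClientStub are placeholders, not the SDK's classes, and fetch_tools() returns a plain dict purely for demonstration.

```python
from typing import Optional


class SemanticClientStub:
    """Placeholder for the real semantic search client."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


class ToolSetSketch:
    def __init__(self, api_key: str, semantic_client: Optional[SemanticClientStub] = None) -> None:
        self._api_key = api_key
        self._semantic_client = semantic_client

    @property
    def semantic_client(self) -> SemanticClientStub:
        # Create the client on first access so later callers never see None.
        if self._semantic_client is None:
            self._semantic_client = SemanticClientStub(self._api_key)
        return self._semantic_client

    def fetch_tools(self) -> dict:
        # Hand on the property, not the private attribute, so the client is always initialized.
        return {"semantic_client": self.semantic_client}
```

Passing self.semantic_client rather than self._semantic_client is the point of the sketch: the private attribute may still be None, while the property never is.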


 # Search for tools
-result = filter_tool.execute(
+result = search_tool.execute(

cool

Comment on lines 332 to 341
from stackone_ai import StackOneToolSet

toolset = StackOneToolSet()

# Search by intent — returns Tools collection ready for any framework
tools = toolset.search_tools("manage employee records", account_ids=["your-account-id"], top_k=5)
openai_tools = tools.to_openai()

# Lightweight: inspect results without fetching full tool definitions
results = toolset.search_action_names("time off requests", top_k=5)

This is good. This is the abstraction that we want.

I like that we can inject the semantic client at toolset = StackOneToolSet(semantic_client).
I also think it's good that we have account_ids as a parameter for search_tools.

My last area of confusion is that in the examples we have the ability to do it like this (and this is what the examples suggest):

toolset = StackOneToolSet()
all_tools = toolset.fetch_tools(account_ids=_account_ids)
utility = all_tools.utility_tools(search_method="semantic")
search_result = utility.get_search_tool()(query="list all employees", top_k=1)
tools_found = search_result.get("tools", [])

This feels chaotic. We have tools, which are the available tools given the account ids (this makes sense). But then we have a separate abstraction for utility_tools and for search and execute. Can we just get rid of the utility_tools layer and instead do all_tools.get_search_tool(search_method='semantic')?

Also, why is search_result here not the same as tools from toolset.search_tools(), so that we then have to do some get("tools") on the search result?

My suggestion is to have this instead:

all_tools = toolset.fetch_tools(account_ids=_account_ids)
search_tool = toolset.get_search_tool(semantic_client=...)
tools = search_tool("manage employee records")

hence removing the extra utility tools abstraction, as it doesn't feel needed on top of the StackOneToolSet abstraction, and aligning search_tool.call() with toolset.search_tools()

@shashi-stackone
Author

Hey @willleeney, thanks for suggesting the above change. After removing the utility_tools abstraction, the SDK looks much cleaner. Please have a look at the latest changes and let me know if there is still room for improvement.

As we are planning a further refactor, I am also thinking of including the following, either as part of this PR or in further PRs:

  • Adding the tool parameter schema needed for each agent framework (I can explain later if needed)
  • Adding more conversions to the SDK natively (to_pydantic, to_adk, to_dspy, etc.) so that conversion becomes a one-liner
  • Exploring more enhancements as we go

4 participants