reasoning_engines.ReasoningEngine.create unable to pickle LangchainAgent using a VertexAISearchRetriever retriever tool #3710

Open
steve-ahlswede opened this issue Apr 30, 2024 · 2 comments
Labels: api: vertex-ai

When I try to use ReasoningEngine to deploy my LangchainAgent, creation fails because the agent cannot be pickled. The problem seems to stem from the tool I am using: if I remove the tool from the agent, deployment works fine.

Environment details

  • OS type and version: MacOS Monterey v 12.6.8
  • Python version: 3.9.13
  • pip version: Using poetry
  • google-cloud-aiplatform version: 1.48.0

Steps to reproduce

  1. Create a structured Datastore in your GCP project via Terraform using the google_discovery_engine_data_store resource.
  2. Add some structured data about movies to the datastore via the UI (or via the API if you want).
  3. Run the Python code below to deploy a LangChain agent that uses the datastore as a LangChain retriever tool.

Code example

from langchain_google_community import VertexAISearchRetriever  # type: ignore
from langchain.tools.retriever import create_retriever_tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
import vertexai  # type: ignore
from vertexai.preview import reasoning_engines  # type: ignore

PROJECT_ID = "TEMP"  # Set to your Project ID
LOCATION_ID = "global"  # Set to your data store location
DATA_STORE_ID = "example-datastore-id"  # Set to your data store ID
DATASTORE_TYPE = 1  # 1 for structured data
BUCKET_NAME = "bucket-name"
BUCKET_LOCATION = "us-central1"
model = "gemini-1.5-pro-preview-0409"

vertexai.init(
    project=PROJECT_ID,
    location=BUCKET_LOCATION,
    staging_bucket=f"gs://{BUCKET_NAME}",
)

retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    location_id=LOCATION_ID,
    data_store_id=DATA_STORE_ID,
    max_documents=3,
    engine_data_type=DATASTORE_TYPE,
)
tool = create_retriever_tool(retriever, "search_movie_data", "Searches and returns information about movies.")


prompt = ChatPromptTemplate.from_messages(
    [
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

remote_app = reasoning_engines.ReasoningEngine.create(
    reasoning_engines.LangchainAgent(
        model,
        prompt=prompt,
        tools=[tool],
        model_kwargs={"convert_system_message_to_human": True},
    ),
    requirements=[
        "google-cloud-aiplatform[reasoningengine,langchain]",
        "langchain_google_community",
    ],
    display_name="My Test App 50110",
)

Stack trace

Traceback (most recent call last):
  File "/Users/steveahlswede/Repos/genai/genai-builder-temp/examples/deploy_structured.py", line 40, in <module>
    remote_app = reasoning_engines.ReasoningEngine.create(
  File "/Users/steveahlswede/Repos/genai/genai-builder-temp/.venv/lib/python3.9/site-packages/vertexai/reasoning_engines/_reasoning_engines.py", line 228, in create
    _prepare(
  File "/Users/steveahlswede/Repos/genai/genai-builder-temp/.venv/lib/python3.9/site-packages/vertexai/reasoning_engines/_reasoning_engines.py", line 379, in _prepare
    cloudpickle.dump(reasoning_engine, f)
  File "/Users/steveahlswede/Repos/genai/genai-builder-temp/.venv/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 55, in dump
    CloudPickler(
  File "/Users/steveahlswede/Repos/genai/genai-builder-temp/.venv/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
  File "stringsource", line 2, in grpc._cython.cygrpc.Channel.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__
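
For reference, the failure seems to be reproducible without the agent at all: the retriever itself holds a Discovery Engine client whose gRPC channel cloudpickle cannot serialize. A minimal check (an assumption based on the stack trace above, reusing the retriever from the code example):

import cloudpickle

# Pickling the retriever alone raises the same TypeError, since the underlying
# grpc Channel defines a non-trivial __cinit__ and no __reduce__.
cloudpickle.dumps(retriever)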

yeesian commented May 10, 2024

Thank you for filing an issue and sorry for the slow response!

Unfortunately this isn't going to work out of the box yet; as a workaround, you'll have to wrap the tool in a Python function like:

def search_movie_data(query: str) -> str:
    """Searches and returns information about movies."""
    from langchain_google_community import VertexAISearchRetriever

    retriever = VertexAISearchRetriever(
        project_id=PROJECT_ID,
        data_store_id=DATA_STORE_ID,
        location_id=LOCATION_ID,
        engine_data_type=DATASTORE_TYPE,
        max_documents=3,
    )

    result = str(retriever.invoke(query))
    return result

# ...

remote_app = reasoning_engines.ReasoningEngine.create(
    reasoning_engines.LangchainAgent(
        ...
        tools=[search_movie_data],
        ...
    ),
    ...,
)
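
Filled in with the values from your code above (a sketch; the model, prompt, and requirements are assumed to be unchanged), the full call would look like:

remote_app = reasoning_engines.ReasoningEngine.create(
    reasoning_engines.LangchainAgent(
        model,
        prompt=prompt,
        tools=[search_movie_data],  # plain Python function instead of the retriever tool
        model_kwargs={"convert_system_message_to_human": True},
    ),
    requirements=[
        "google-cloud-aiplatform[reasoningengine,langchain]",
        "langchain_google_community",
    ],
    display_name="My Test App 50110",
)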

You can find an end-to-end example at https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/reasoning-engine/tutorial_vertex_ai_search_rag_agent.ipynb

Can you try it and let us know if the workaround works for you in the meantime?

akos-sch commented May 17, 2024

Found a workaround for this by defining a custom tool with a class:

from typing import Optional, Type

from langchain.callbacks.manager import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool
from langchain_google_community import VertexAISearchRetriever


class SearchInput(BaseModel):
    query: str = Field(description="Query for the retriever")


class MyRetriever(BaseTool):
    name = "name"
    description = "description"
    args_schema: Type[BaseModel] = SearchInput
    project_id: str = "<GCP-PROJECT>"
    location_id: str = "<data-store-location>"
    data_store_id: str = "<data-store-id>"
    engine_data_type: int = 0

    def _run(self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None) -> str:
        """Use the tool."""
        project_id = self.project_id
        location_id = self.location_id
        data_store_id = self.data_store_id
        engine_data_type = self.engine_data_type
        retriever = VertexAISearchRetriever(
            project_id=project_id,
            data_store_id=data_store_id,
            location_id=location_id,
            engine_data_type=engine_data_type,
            max_documents=2,
        )
        return str(retriever.invoke(input=query))

    async def _arun(self, query: str, run_manager: Optional[AsyncCallbackManagerForToolRun] = None) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("custom_search does not support async")

And then just instantiate with MyRetriever() and pass it as a tool to the agent.
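
For example (a sketch, assuming the constants and agent setup from the original report):

tool = MyRetriever(
    project_id=PROJECT_ID,
    location_id=LOCATION_ID,
    data_store_id=DATA_STORE_ID,
    engine_data_type=DATASTORE_TYPE,
)

agent = reasoning_engines.LangchainAgent(
    model,
    prompt=prompt,
    tools=[tool],
    model_kwargs={"convert_system_message_to_human": True},
)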

It is based on this: https://python.langchain.com/v0.1/docs/modules/tools/custom_tools/#subclass-basetool
