
Allow users to provide Vertex AI submit-related parameters through _EvaluatableLanguageModel.evaluate when using Vertex AI Model Evaluation #3691

Open
hsuyuming opened this issue Apr 29, 2024 · 2 comments
Labels: api: vertex-ai Issues related to the googleapis/python-aiplatform API.

Comments

@hsuyuming

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Is your feature request related to a problem? Please describe.
Normally, when we create or submit a Vertex AI pipeline job [1], we want to provide our own service account and network settings (e.g. network, reserved_ip_ranges). Unfortunately, when using Vertex AI Model Evaluation [2], model.evaluate does not allow us to pass those submit-related parameters so that they can be forwarded to the underlying submit function (see the sketch after the references below).

[1] https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/pipeline_jobs.py#L104-L383
[2] https://github.com/googleapis/python-aiplatform/blob/main/vertexai/language_models/_evaluatable_language_models.py#L586-L675
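
For comparison, a standalone pipeline submission already accepts these settings. A minimal sketch, assuming an already-compiled pipeline template at a hypothetical GCS path and placeholder project/network values:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Submitting a PipelineJob directly lets us choose the runtime identity and networking.
job = aiplatform.PipelineJob(
    display_name="my-eval-pipeline",
    template_path="gs://my-bucket/templates/eval_pipeline.json",  # hypothetical compiled template
    parameter_values={},
)
job.submit(
    service_account="pipeline-runner@my-project.iam.gserviceaccount.com",
    network="projects/1234567890/global/networks/my-vpc",
    reserved_ip_ranges=["my-reserved-range"],
)
```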

Describe the solution you'd like
I hope the Python SDK can allow users to provide these submit-related parameters when they call the evaluate function, along the lines of the hypothetical call sketched below.
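
The desired call would look something like this (hypothetical; the last three parameters are the proposed additions, and the import path for the spec class assumes the public vertexai.language_models exports):

```python
from vertexai.language_models import TextGenerationModel, EvaluationTextGenerationSpec

model = TextGenerationModel.from_pretrained("text-bison@001")
eval_metrics = model.evaluate(
    task_spec=EvaluationTextGenerationSpec(
        ground_truth_data="gs://my-bucket/ground-truth.jsonl",
    ),
    # Proposed additions, forwarded to the internal PipelineJob.submit() call:
    service_account="pipeline-runner@my-project.iam.gserviceaccount.com",
    network="projects/1234567890/global/networks/my-vpc",
    reserved_ip_ranges=["my-reserved-range"],
)
```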
Describe alternatives you've considered

class _EvaluatableLanguageModel:
    """Mixin class for LLMs that support model evaluation."""

    # TODO (b/282975912): convert training job specific args to a TrainingConfig
    def evaluate(
        self,
        *,
        task_spec: _EvaluationTaskSpec,
        only_summary_metrics: Optional[bool] = True,
        machine_type: Optional[str] = None,
        reserved_ip_ranges: Optional[List[str]] = None,
        service_account: Optional[str] = None,
        network: Optional[str] = None,
    ) -> Union[
        EvaluationMetric,
        EvaluationClassificationMetric,
        EvaluationSlicedClassificationMetric,
    ]:
        """Runs model evaluation using the provided input and ground truth data.

        This creates an evaluation job and blocks until the job completes, about
        10 - 20 minutes.

        Example:
        ```
        model = TextGenerationModel.from_pretrained("text-bison@001")
        eval_metrics = model.evaluate(
            task_spec=EvaluationTextGenerationSpec(
                ground_truth_data="gs://my-bucket/ground-truth.jsonl",
            )
        )
        ```

        Args:
            task_spec (_EvaluationTaskSpec):
                Required. The configuration spec for your model evaluation job. Choose the spec corresponding
                with the evaluation task you are performing, one of: EvaluationClassificationSpec, EvaluationTextGenerationSpec,
                EvaluationTextSummarizationSpec, EvaluationQuestionAnsweringSpec.

                For example, a valid classification `task_spec` is:
                EvaluationTextClassificationSpec(
                    ground_truth_data=["gs://bucket/path/to/your/data.jsonl"],
                    class_names=["cheddar", "gouda", "camembert"],
                    target_column_name="cheese_type",
                )
            only_summary_metrics (bool):
                Optional. Setting this field to False only affects the metrics returned for text classification tasks.
                When False, text classification metrics will include additional sliced metrics fields, with metrics for
                each label slice in the data.
            machine_type (str):
                Optional. The type of the machine to run the evaluation job on. The default value is "e2-highmem-16". For
                tasks with a large evaluation dataset, a bigger machine type may be required.
                For more details about this input config, see
                https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types.
            reserved_ip_ranges (List[str]):
                Optional. Proposed addition. Names of reserved IP ranges under the VPC network, forwarded to the
                evaluation PipelineJob's submit call.
            service_account (str):
                Optional. Proposed addition. The service account the evaluation PipelineJob runs as, forwarded to
                the PipelineJob's submit call.
            network (str):
                Optional. Proposed addition. The full name of the Compute Engine network to peer the evaluation
                PipelineJob to, forwarded to the PipelineJob's submit call.

        Returns:
            Union[EvaluationMetric, EvaluationClassificationMetric, List[EvaluationClassificationMetric]]
                The evaluation metrics from this evaluation job. When `only_summary_metrics=False` is passed
                and the evaluation task type is 'text-classification', the return type will be List[EvaluationClassificationMetric],
                where each value in the list is the metrics associated with a particular classification label.
        """
        model_info = _model_garden_models._get_model_info(
            self._model_id,
            schema_to_class_map={self._INSTANCE_SCHEMA_URI: type(self)},
        )
        model_name = _get_model_resource_name_and_validate(
            model_name=self._model_resource_name, model_info=model_info
        )

        # TODO(b/296402511): get service_account from aiplatform_initializer and pass it to the template here and to PipelineJob after cl/539823838 is submitted
        template_params = _populate_eval_template_params(
            task_spec=task_spec,
            model_name=model_name,
            machine_type=machine_type,
            network=aiplatform_initializer.global_config.network,
            encryption_spec_key_name=aiplatform_initializer.global_config.encryption_spec_key_name,
        )

        template_path = _get_template_url(task_spec.task_name)

        pipeline_job = aiplatform.PipelineJob(
            template_path=template_path,
            parameter_values=template_params,
            display_name=f"llm-eval-sdk-{aiplatform_utils.timestamped_unique_name()}",
        )
        # Proposed change: forward the new parameters to submit().
        pipeline_job.submit(
            network=network,
            service_account=service_account,
            reserved_ip_ranges=reserved_ip_ranges,
        )

        eval_job = _LanguageModelEvaluationJob(pipeline_job=pipeline_job)

        _LOGGER.info(
            "Your evaluation job is running and will take 15-20 minutes to complete. Click on the PipelineJob link to view progress."
        )

        # NOTE: only_summary_metrics is passed because getting metrics from the artifact is faster than downloading from GCS
        # GCS is only needed for additional metrics for text-classification tasks
        return eval_job.result(only_summary_metrics=only_summary_metrics)

Additional context
Nope

product-auto-label bot added the api: vertex-ai label on Apr 29, 2024
@Ark-kun
Contributor

Ark-kun commented May 4, 2024

we would like to provide our own service account, and network setting (e.g: network

These parameters are already supported. They can be specified in vertexai.init(...).
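
For example, something along these lines should route the configured identity and network into the evaluation pipeline (a sketch; whether init accepts service_account and network may depend on the SDK version):

```python
import vertexai
from vertexai.language_models import TextGenerationModel, EvaluationTextGenerationSpec

# Per the comment above, set the identity and VPC network once in init();
# evaluate() then reads them from the global config when it submits the pipeline.
vertexai.init(
    project="my-project",
    location="us-central1",
    service_account="pipeline-runner@my-project.iam.gserviceaccount.com",
    network="projects/1234567890/global/networks/my-vpc",
)

model = TextGenerationModel.from_pretrained("text-bison@001")
eval_metrics = model.evaluate(
    task_spec=EvaluationTextGenerationSpec(
        ground_truth_data="gs://my-bucket/ground-truth.jsonl",
    )
)
```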

@hsuyuming
Author

Ark-kun self-assigned this on May 8, 2024