
Allow users to provide Vertex AI submit-related parameters through _EvaluatableLanguageModel.evaluate when using Vertex AI Model Evaluation #3691

Open
hsuyuming opened this issue Apr 29, 2024 · 2 comments
Labels: api: vertex-ai Issues related to the googleapis/python-aiplatform API.

Comments

@hsuyuming

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Is your feature request related to a problem? Please describe.
Normally, when we create or submit a Vertex AI pipeline job [1], we want to provide our own service account and network settings (e.g. network, reserved_ip_ranges). Unfortunately, when using Vertex AI Model Evaluation [2], model.evaluate does not allow us to pass those submit-related parameters so that they can be forwarded to the underlying submit function (see the sketch after the references below).

[1] https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/pipeline_jobs.py#L104-L383
[2] https://github.com/googleapis/python-aiplatform/blob/main/vertexai/language_models/_evaluatable_language_models.py#L586-L675
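
For comparison, a standalone pipeline submission already accepts these settings. A minimal sketch, assuming an already-compiled pipeline template at a hypothetical GCS path and placeholder project/network values:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Submitting a PipelineJob directly lets us choose the runtime identity and networking.
job = aiplatform.PipelineJob(
    display_name="my-eval-pipeline",
    template_path="gs://my-bucket/templates/eval_pipeline.json",  # hypothetical compiled template
    parameter_values={},
)
job.submit(
    service_account="pipeline-runner@my-project.iam.gserviceaccount.com",
    network="projects/1234567890/global/networks/my-vpc",
    reserved_ip_ranges=["my-reserved-range"],
)
```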

Describe the solution you'd like
I hope the Python SDK can allow users to provide these submit-related parameters when they call the evaluate function, along the lines of the hypothetical call sketched below.
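
The desired call would look something like this (hypothetical; the last three parameters are the proposed additions, and the import path for the spec class assumes the public vertexai.language_models exports):

```python
from vertexai.language_models import TextGenerationModel, EvaluationTextGenerationSpec

model = TextGenerationModel.from_pretrained("text-bison@001")
eval_metrics = model.evaluate(
    task_spec=EvaluationTextGenerationSpec(
        ground_truth_data="gs://my-bucket/ground-truth.jsonl",
    ),
    # Proposed additions, forwarded to the internal PipelineJob.submit() call:
    service_account="pipeline-runner@my-project.iam.gserviceaccount.com",
    network="projects/1234567890/global/networks/my-vpc",
    reserved_ip_ranges=["my-reserved-range"],
)
```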
Describe alternatives you've considered

class _EvaluatableLanguageModel:
    """Mixin class for LLMs that support model evaluation."""

    # TODO (b/282975912): convert training job specific args to a TrainingConfig
    def evaluate(
        self,
        *,
        task_spec: _EvaluationTaskSpec,
        only_summary_metrics: Optional[bool] = True,
        machine_type: Optional[str] = None,
        reserved_ip_ranges: Optional[List[str]] = None,
        service_account: Optional[str] = None,
        network: Optional[str] = None,
    ) -> Union[
        EvaluationMetric,
        EvaluationClassificationMetric,
        EvaluationSlicedClassificationMetric,
    ]:
        """Runs model evaluation using the provided input and ground truth data.

        This creates an evaluation job and blocks until the job completes, about
        10 - 20 minutes.

        Example:
        ```
        model = TextGenerationModel.from_pretrained("text-bison@001")
        eval_metrics = model.evaluate(
            task_spec=EvaluationTextGenerationSpec(
                ground_truth_data="gs://my-bucket/ground-truth.jsonl",
            )
        )
        ```

        Args:
            task_spec (_EvaluationTaskSpec):
                Required. The configuration spec for your model evaluation job. Choose the spec corresponding
                with the evaluation task you are performing, one of: EvaluationClassificationSpec, EvaluationTextGenerationSpec,
                EvaluationTextSummarizationSpec, EvaluationQuestionAnsweringSpec.

                For example, a valid classification `task_spec` is:
                EvaluationTextClassificationSpec(
                    ground_truth_data=["gs://bucket/path/to/your/data.jsonl"],
                    class_names=["cheddar", "gouda", "camembert"],
                    target_column_name="cheese_type",
                )
            only_summary_metrics (bool):
                Optional. Setting this field to False only affects the metrics returned for text classification tasks.
                When False, text classification metrics will include additional sliced metrics fields, with metrics for
                each label slice in the data.
            machine_type (str):
                Optional. The type of the machine to run the evaluation job on. The default value is "e2-highmem-16". For
                tasks with a large evaluation dataset, a bigger machine type may be required.
                For more details about this input config, see
                https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types.
            reserved_ip_ranges (List[str]):
                Optional. Proposed addition. Names of reserved IP ranges under the VPC network, forwarded to the
                evaluation PipelineJob's submit call.
            service_account (str):
                Optional. Proposed addition. The service account the evaluation PipelineJob runs as, forwarded to
                the PipelineJob's submit call.
            network (str):
                Optional. Proposed addition. The full name of the Compute Engine network to peer the evaluation
                PipelineJob to, forwarded to the PipelineJob's submit call.

        Returns:
            Union[EvaluationMetric, EvaluationClassificationMetric, List[EvaluationClassificationMetric]]
                The evaluation metrics from this evaluation job. When `only_summary_metrics=False` is passed
                and the evaluation task type is 'text-classification', the return type will be List[EvaluationClassificationMetric],
                where each value in the list is the metrics associated with a particular classification label.
        """
        model_info = _model_garden_models._get_model_info(
            self._model_id,
            schema_to_class_map={self._INSTANCE_SCHEMA_URI: type(self)},
        )
        model_name = _get_model_resource_name_and_validate(
            model_name=self._model_resource_name, model_info=model_info
        )

        # TODO(b/296402511): get service_account from aiplatform_initializer and pass it to the template here and to PipelineJob after cl/539823838 is submitted
        template_params = _populate_eval_template_params(
            task_spec=task_spec,
            model_name=model_name,
            machine_type=machine_type,
            network=aiplatform_initializer.global_config.network,
            encryption_spec_key_name=aiplatform_initializer.global_config.encryption_spec_key_name,
        )

        template_path = _get_template_url(task_spec.task_name)

        pipeline_job = aiplatform.PipelineJob(
            template_path=template_path,
            parameter_values=template_params,
            display_name=f"llm-eval-sdk-{aiplatform_utils.timestamped_unique_name()}",
        )
        # Proposed change: forward the new parameters to submit().
        pipeline_job.submit(
            network=network,
            service_account=service_account,
            reserved_ip_ranges=reserved_ip_ranges,
        )

        eval_job = _LanguageModelEvaluationJob(pipeline_job=pipeline_job)

        _LOGGER.info(
            "Your evaluation job is running and will take 15-20 minutes to complete. Click on the PipelineJob link to view progress."
        )

        # NOTE: only_summary_metrics is passed because getting metrics from the artifact is faster than downloading from GCS
        # GCS is only needed for additional metrics for text-classification tasks
        return eval_job.result(only_summary_metrics=only_summary_metrics)

Additional context
Nope

product-auto-label bot added the api: vertex-ai label on Apr 29, 2024
@Ark-kun
Contributor

Ark-kun commented May 4, 2024

we would like to provide our own service account, and network setting (e.g: network

These parameters are already supported. They can be specified in vertexai.init(...).
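
For example, something along these lines should route the configured identity and network into the evaluation pipeline (a sketch; whether init accepts service_account and network may depend on the SDK version):

```python
import vertexai
from vertexai.language_models import TextGenerationModel, EvaluationTextGenerationSpec

# Per the comment above, set the identity and VPC network once in init();
# evaluate() then reads them from the global config when it submits the pipeline.
vertexai.init(
    project="my-project",
    location="us-central1",
    service_account="pipeline-runner@my-project.iam.gserviceaccount.com",
    network="projects/1234567890/global/networks/my-vpc",
)

model = TextGenerationModel.from_pretrained("text-bison@001")
eval_metrics = model.evaluate(
    task_spec=EvaluationTextGenerationSpec(
        ground_truth_data="gs://my-bucket/ground-truth.jsonl",
    )
)
```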

@hsuyuming
Author

Ark-kun self-assigned this on May 8, 2024