Allow load_table_from_dataframe to Ignore Extra Schema Fields #1812
Labels
api: bigquery
priority: p3
type: feature request
Description:
Environment details
OS: macOS Sonoma 14.1.1
Python version: 3.10
google-cloud-bigquery version: 3.17.1
Steps to reproduce
1. Define a BigQuery schema that includes fields not present in the DataFrame.
2. Pass that schema to load_table_from_dataframe (through LoadJobConfig(schema=...)) to load the DataFrame into BigQuery.
Current behavior
Currently, if the schema passed to load_table_from_dataframe in the Python BigQuery client contains fields that are not present in the DataFrame, the call raises a ValueError:
ValueError: bq_schema contains fields not present in dataframe: {'field_not_present'}.
Expected behavior
The Python client currently requires every field in the provided schema to have a matching DataFrame column. The command-line tool is less strict when loading JSON data into a BigQuery table: it does not require each record to contain every schema field, and nullable fields absent from a record are loaded as NULL.
I propose relaxing schema matching in load_table_from_dataframe to mirror the command-line tool: it should not raise an error when the schema contains fields that are absent from the DataFrame. This would support loads in which the DataFrame holds only a subset of the fields defined in the BigQuery table schema.
Use case
This would be particularly useful when the DataFrame is generated dynamically and may not contain every field in the BigQuery schema. If the function ignored the extra schema fields, callers could pass the table's full schema on every load instead of first trimming it to each DataFrame's columns.