Passing number of stream in case of using bqstorage client #1695

Open
nitishxp opened this issue Oct 23, 2023 · 0 comments
Labels
api: bigquery - Issues related to the googleapis/python-bigquery API.
priority: p3 - Desirable enhancement or fix. May not be included in next release.
type: feature request - ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments


nitishxp commented Oct 23, 2023

Hi,

I am using GCP Batch to download data from a BigQuery table using the BigQuery Storage (bqstorage) library:

import json

num_consumers = 2
iteratable = client.list_rows(bq_table_id)
iteratable._preserve_order = True  # private attribute; ensures that only 1 read stream is created
for page in iteratable.to_arrow_iterable(bqstorage_client=client, max_queue_size=num_consumers * 2):
    # Round-trip through JSON (default=str) so each row becomes a plain, serializable dict.
    data: list[dict] = [json.loads(json.dumps(d, default=str)) for d in page.to_pylist()]
    # ... do something with the Python objects

The problem I am facing is the number of threads spawned when _preserve_order is not set: up to 128 threads are created, which hangs the program on small machine types (e2-medium, e2-small). Can we add an argument to the function to control the number of read streams that are created?
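
To make the thread growth easier to confirm on a given machine, a small stdlib-only helper like the one below could be wrapped around the page iterator. This is only a sketch: `peak_thread_count` is a hypothetical name, and `threading.active_count()` only sees Python-level threads, so any native threads created by the transport layer would not be counted.

```python
import threading


def peak_thread_count(iterable):
    """Consume `iterable` and return the highest Python thread count observed.

    Hypothetical diagnostic helper: it samples threading.active_count()
    once before iteration and once per item, so short-lived spikes between
    items can be missed.
    """
    peak = threading.active_count()
    for _ in iterable:
        peak = max(peak, threading.active_count())
    return peak
```

For example, passing the result of `to_arrow_iterable(...)` to `peak_thread_count` with and without `_preserve_order` set should show whether the extra threads really come from the additional read streams.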

For example:
for page in iteratable.to_arrow_iterable(bqstorage_client=client, max_queue_size=num_consumers * 2, no_of_stream=<>):
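
A possible workaround in the meantime, without relying on the private `_preserve_order` attribute, is to call the BigQuery Storage Read API directly, since `create_read_session` accepts a `max_stream_count`. The sketch below assumes that `google-cloud-bigquery-storage` and `pyarrow` are installed, that default credentials are available, and that the table's project is also the billing project. The helper names `table_path` and `iter_arrow_batches` are hypothetical, not part of any library.

```python
def table_path(project: str, dataset: str, table: str) -> str:
    """Build the table resource path expected by the Storage Read API."""
    return f"projects/{project}/datasets/{dataset}/tables/{table}"


def iter_arrow_batches(project: str, dataset: str, table: str, max_streams: int = 1):
    """Yield pyarrow.RecordBatch objects using at most `max_streams` read streams.

    Hypothetical helper; assumes the table's project is also the billing project.
    """
    # Imported lazily so table_path() stays usable without the library installed.
    from google.cloud import bigquery_storage_v1
    from google.cloud.bigquery_storage_v1 import types

    read_client = bigquery_storage_v1.BigQueryReadClient()
    session = read_client.create_read_session(
        parent=f"projects/{project}",
        read_session=types.ReadSession(
            table=table_path(project, dataset, table),
            data_format=types.DataFormat.ARROW,
        ),
        max_stream_count=max_streams,  # upper bound; the server may return fewer
    )
    # Streams are read one after another in the calling thread,
    # so this function does not start any worker threads of its own.
    for stream in session.streams:
        reader = read_client.read_rows(stream.name)
        for page in reader.rows(session).pages:
            yield page.to_arrow()
```

Usage would look like `for batch in iter_arrow_batches("my-project", "my_dataset", "my_table", max_streams=2): rows = batch.to_pylist()`, where the project, dataset, and table names are placeholders. Reading the streams sequentially trades throughput for a predictable thread count, which may be the better choice on small machine types.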

product-auto-label (bot) added the label api: bigquery on Oct 23, 2023
Linchin added the labels type: feature request and priority: p3 on Oct 30, 2023