Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: omit read_session with latest google-cloud-bigquery-storage #748

Merged
merged 13 commits into from
Jul 16, 2021

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Jul 9, 2021

read_session is unnecessary as of google-cloud-bigquery-storage>=2.6.0.
This will allow us to more loudly deprecate the use of rows(read_session). googleapis/python-bigquery-storage#229

Rather than require 2.6.0, version switches will allow us to keep our
requirements range wider. Will want to give this version some time to bake
before making it the minimum version. #747

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #742 🦕

`read_session` is unnecessary as of `google-cloud-bigquery-storage>=2.6.0`.
This will allow us to more loudly deprecate the use of `rows(read_session)`.

Rather than require 2.6.0, version switches will allow us to keep our
requirements range wider. Will want to give this version some time to bake
before making it required.
@tswast tswast requested a review from a team July 9, 2021 21:20
@tswast tswast requested a review from a team as a code owner July 9, 2021 21:20
@tswast tswast requested review from stephaniewang526 and removed request for a team July 9, 2021 21:20
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 9, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jul 9, 2021
Comment on lines 74 to 75
if installed_version >= _BQ_STORAGE_OPTIONAL_READ_SESSION_VERSION:
_IS_BQ_STORAGE_READ_SESSION_OPTIONAL = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could potentially be tricky, because the value of the _IS_BQ_STORAGE_READ_SESSION_OPTIONAL "constant" depends on whether _verify_bq_storage_version() has been called or not.

It would IMO be more straightforward if we performed version detection at import time and already make the final decision on the value of _IS_BQ_STORAGE_READ_SESSION_OPTIONAL then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't import google.cloud.bigquery_storage in _download_table_bqstorage_stream. Since that function gets a client and other objects passed in to it.

How about we make sure we call _verify_bq_storage_version() from _download_table_bqstorage_stream() and optimize to not call the version parsing too often?

Copy link
Contributor

@plamut plamut Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overhead of parsing a version is most likely negligible compared to the request/response times, I was more concerned that changing a module "constant" might represent a landmine in the "right" constellation of the planets.

Perhaps just adding a comment that this private "constant" is not a constant and can change when calling _verify_bq_storage_version() would be enough? You know, just to warn the readers about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think there's a way to do this without "constants" that aren't actually constant. I think maybe I'll create a little class to hold our version comparisons.

@tswast tswast requested a review from plamut July 14, 2021 20:14
@@ -26,7 +26,7 @@
from google.cloud._helpers import _RFC3339_MICROS
from google.cloud._helpers import _RFC3339_NO_FRACTION
from google.cloud._helpers import _to_bytes
import pkg_resources
import packaging.version
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add type annotations, but it turns out pkg_resources is just a thin wrapper around packaging.

https://github.com/pypa/setuptools/blob/a4dbe3457d89cf67ee3aa571fdb149e6eb544e88/pkg_resources/__init__.py/#L112

plamut
plamut previously approved these changes Jul 15, 2021
Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and clean!

Consider maybe adding a few unit tests directly testing the BQStorageVersions class, although it does seem that the class is tested indirectly through other tests (the coverage does not complain).

Edit: I mean, the coverage check does complain, but not about _helpers.BQStorageVersions.

Comment on lines 49 to 51
class BQStorageVersions:
def __init__(self):
self._installed_version = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version checker, sir, is classy (pun intended). Well done!

should thus be used in places where this assumption holds.

Because `pip` can install an outdated version of this extra despite the constraints
in setup.py, the the calling code can use this helper to verify the version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I now noticed you copied my typo, let's remove it.

Suggested change
in setup.py, the the calling code can use this helper to verify the version
in `setup.py`, the calling code can use this helper to verify the version

@plamut plamut self-requested a review July 15, 2021 13:41
@plamut plamut dismissed their stale review July 15, 2021 13:42

found a typo

@tswast
Copy link
Contributor Author

tswast commented Jul 15, 2021

Turns out the coverage error was due to my use of a lambda that never executed. I'll remove those and add a few tests for the underlying class in my next commit.

@plamut plamut merged commit 4ff8bed into googleapis:master Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

don't pass read_session to BQ Storage API rows from to_dataframe and to_arrow
2 participants