Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major refactoring of Timeout logic introduced incompatibility about RPC Timeout #654

Open
yoshizow opened this issue May 10, 2024 · 1 comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@yoshizow
Copy link

Hello,

I'm using the python-pubsub library and noticed that the timeout behavior is strange. After some investigation, I believe it is due to an incompatibility introduced by #462.

The default retry and timeout settings in the python-pubsub library are as follows:
https://github.com/googleapis/python-pubsub/blob/v2.21.1/google/pubsub_v1/services/publisher/transports/base.py#L166-L181

default_retry=retries.Retry(
    initial=0.1,
    maximum=60.0,
    multiplier=4,
    predicate=retries.if_exception_type(
        core_exceptions.Aborted,
        core_exceptions.Cancelled,
        core_exceptions.DeadlineExceeded,
        core_exceptions.InternalServerError,
        core_exceptions.ResourceExhausted,
        core_exceptions.ServiceUnavailable,
        core_exceptions.Unknown,
    ),
    deadline=600.0,
),
default_timeout=60.0,

Before #462 was introduced, default_timeout=60.0 was interpreted as default_timeout=ConstantTimeout(60).
This should be intended to set the RPC Timeout, i.e., the timeout for a single RPC call not including retries, to 60 seconds.
However, after the introduction of #462, this line is now interpreted as default_timeout=TimeToDeadlineTimeout(60).
Since TimeToDeadlineTimeout determines the total timeout time across retries rather than determining the RPC Timeout, I don't believe that it can be used as a direct replacement for ConstantTimeout.

The strange behavior I mentioned at the beginning came from this: when the server did not respond in the first 60 seconds, the remaining timeout for RPC call was 0 seconds, and the gRPC client returned a 504 Deadline Exceeded on every retry.

TimeToDeadlineTimeout by itself is useful, but given the original intent, I think it is necessary to specify the RPC Timeout as well as the total timeout.

Finally, in my project, I implemented the following ClampMaxTimeout so that the RPC Timeout can be set while respecting the behavior of TimeToDeadlineTimeout.

RPC_TIMEOUT_SECS   = 60.0
TOTAL_TIMEOUT_SECS = 600.0
...
publisher_options=pubsub_v1.types.PublisherOptions(
    timeout=compose(
        api_core.timeout.TimeToDeadlineTimeout(TOTAL_TIMEOUT_SECS),
        ClampMaxTimeout(RPC_TIMEOUT_SECS),
    ),
    ...
...
class ClampMaxTimeout(object):
    def __init__(self, max_timeout: float):
        self._max_timeout = max_timeout

    def __call__(self, func: Callable) -> Callable:
        @functools.wraps(func)
        def func_with_max_timeout(*args, **kwargs):
            if kwargs.get("timeout") is not None:
                kwargs["timeout"] = min(kwargs["timeout"], self._max_timeout)
            else:
                kwargs["timeout"] = self._max_timeout
            return func(*args, **kwargs)

        return func_with_max_timeout

    def __str__(self) -> str:
        return "<ClampMaxTimeout max_timeout={:.1f}>".format(self._max_timeout)

def compose(f: Callable, g: Callable) -> Callable:
    return lambda x: f(g(x))

To keep backward compatibility, I believe it is desirable to introduce a similar process in python-api-core.

@vchudnov-g
Copy link
Contributor

Thanks for reporting this issue! I had a quick look and this appears to be an instance where we should at least be clearer, and possibly implement the change we suggest. We'll look into this.


Notes as we investigate:

@vchudnov-g vchudnov-g added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

3 participants