Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error and warning improvements #122

Merged
merged 7 commits into from
Aug 14, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 16 additions & 9 deletions deepgram/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,18 +9,25 @@
from .scopes import Scopes
from .invitations import Invitations
from .extra import Extra
from .errors import DeepgramSetupError, DeepgramApiError


class Deepgram:
def __init__(self, options: Union[str, Options]) -> None:
if not isinstance(options, str) and not options.get('api_url'):
raise ValueError("DG: API URL must be valid or omitted")
t_options: Options = {
'api_key': options
} if isinstance(options, str) else options
if 'api_key' not in t_options:
raise ValueError("DG: API key is required")
self.options = t_options
if not isinstance(options, (str, dict)):
raise DeepgramSetupError("`options` must be a dictionary or an API key string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using the name DeepgramAuthError rather than DeepgramSetupError? Is the setup only for API key / authentication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line you highlighted is not an Auth error and instead implies that the user passed the incorrect type to the constructor. We could also throw a TypeError here, but DeepgramSetupError keeps the error codes consistent. Also DeepgramSetupError inherits from ValueError which is applicable when the user passes the wrong value/type to the constructor.


# Convert to dictionary if the api key was passed as a string
if isinstance(options, str):
options: Options = {"api_key": options}

if 'api_key' not in options:
raise DeepgramSetupError("API key is required")

if "api_url" in options and options.get("api_url", None) is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consistently use single or double quotes, not both

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree. The existing code has a mix of single quotes and double quotes. Python's standard is double quotes, so that's what I use. I did not want to add a ton of formatting changes to this PR, so I kept them minimal. I can change this one in particular because you raised it, but I'd like to go through and reformat the entire codebase in a later PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I hadn't noticed that before! Yeah agreed that it's better served by doing a cleanup sweep in a separate PR.

raise DeepgramSetupError("API URL must be valid or omitted")

self.options = options

@property
def keys(self) -> Keys:
Expand Down Expand Up @@ -59,4 +66,4 @@ def extra(self) -> Extra:
return Extra(self.options)


__all__ = ["Deepgram"]
__all__ = [Deepgram, DeepgramSetupError, DeepgramApiError]
2 changes: 2 additions & 0 deletions deepgram/_types.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ class UpdateResponse(TypedDict):
class Options(TypedDict, total=False):
api_key: str
api_url: str # this URL should /not/ include a trailing slash
suppress_warnings: bool
raise_warnings_as_errors: bool
Comment on lines +30 to +31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these have default values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypedDicts do not have default values, and none of the other classes in this file have default values. Technically these types should have Optional around them, but very few of the other optional fields in this file use Optional so I left it off here as well. This is something we can clean up in the future.

We could also convert the TypedDicts to dataclasses (I prefer dataclasses to TypedDicts). Alternatively, we could use TypedDict's Required and NotRequired types defined in this pep: https://peps.python.org/pep-0655/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference - do you think TypedDicts are marginally more performant, but dataclasses are more flexible? We do have an awful lot of TypedDicts already, so it makes sense to defer to that standard for now.

My interest in an explicit default value is mainly to make clear what the behavior will be when the user doesn't set anything. But I guess in this case, it's logical for warnings to be warnings by default... rather than being suppressed or errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, TypedDicts are more performant than dataclasses. I forgot the SDKs are written with high performance as a top priority.

I completely agree with you, it would be really nice to have default values.



class UrlSource(TypedDict):
Expand Down
66 changes: 55 additions & 11 deletions deepgram/_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,12 @@
import platform
import websockets
import websockets.client
import websockets.exceptions
import warnings
from ._constants import DEFAULT_ENDPOINT
from ._types import Options
from ._version import __version__
from .errors import DeepgramApiError

Payload = Optional[Union[dict, str, bytes, IO]]

Expand Down Expand Up @@ -94,32 +97,60 @@ async def _request(
timeout = aiohttp.ClientTimeout(total=timeout)

async def attempt():
# Define `content` outside the try block so we can differentiate between errors
# thrown by `aiohttp.request` and `resp.raise_for_status()`. This can be removed
# after the aiohttp issue mentioned below is fixed.
content = None
try:
async with aiohttp.request(
method, destination, data=_normalize_payload(payload),
headers=updated_headers, raise_for_status=True, timeout=timeout
headers=updated_headers, timeout=timeout
) as resp:
# The error handling for this async code is more difficult than normal
# because aiohttp does not include the response body in the exception
# object. We therefore need to read the response body before throwing
# an error, and attach the response body to the error. This should
# be fixed by aiohttp in v4.x. See the github issue here for info:
# https://github.com/aio-libs/aiohttp/issues/4600
# Once that work is complete, we can use `raise_for_status=True` as a
# parameter in the `aiohttp.request` call.
content = (await resp.text()).strip()
resp.raise_for_status()
if not content:
return None
body = json.loads(content)
if body.get('error'):
raise Exception(f'DG: {content}')
if body.get('error') or body.get('err_msg'):
raise DeepgramApiError(body, http_library_error=None)
if options.get("raise_warnings_as_errors") and "metadata" in body and "warnings" in body["metadata"]:
raise DeepgramApiError({"warnings": body["metadata"]["warnings"]}, http_library_error=None)
if (not options.get("suppress_warnings")) and "metadata" in body and "warnings" in body["metadata"]:
for warning in body["metadata"]["warnings"]:
warnings.warn(warning["message"])
return body
except aiohttp.ClientResponseError as exc:
raise (Exception(f'DG: {exc}') if exc.status < 500 else exc)
# If the request returned a response body and errored, return the response body.
# Otherwise return the request's error message.
if content is not None:
try:
body = json.loads(content)
raise DeepgramApiError(body, http_library_error=exc) from exc if exc.status < 500 else exc
except:
pass
else:
raise DeepgramApiError(exc.message, http_library_error=exc) from exc if exc.status < 500 else exc
except aiohttp.ClientError as exc:
raise exc
raise DeepgramApiError(exc, http_library_error=exc) from exc

tries = RETRY_COUNT
while tries > 0:
try:
return await attempt()
except aiohttp.ClientError as exc:
if isinstance(payload, io.IOBase):
raise exc # stream is now invalid as payload
# stream is now invalid as payload
# the way aiohttp handles streaming form data
# means that just seeking this back still runs into issues
raise DeepgramApiError(exc, http_library_error=exc) from exc
tries -= 1
continue
return await attempt()
Expand Down Expand Up @@ -148,21 +179,34 @@ def attempt():
if not content:
return None
body = json.loads(content)
if body.get('error'):
raise Exception(f'DG: {content}')
if body.get('error') or body.get('err_msg'):
raise DeepgramApiError(body, http_library_error=None)
if options.get("raise_warnings_as_errors") and "metadata" in body and "warnings" in body["metadata"]:
raise DeepgramApiError({"warnings": body["metadata"]["warnings"]}, http_library_error=None)
Comment on lines +184 to +185
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative for when nested dictionary keys might not be present is chaining gets:

if options.get("raise_warnings_as_errors") and body.get("metadata", {}).get("warnings", {}):

The streaming test suite uses that style and I've been a fan of it.

This applies in a few cases throughout the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I personally prefer the in operator rather than nested .gets. For complex "contains" checks, I have to really look a the .gets and the parentheses to understand how the code branches and which entries are default values vs. conditionals. For example, metadata and warnings are conditionals, but both {}s are default values. If I do not read a comma or parenthesis correctly, I may misunderstand which is which. This is particularly true when strings are used as default values.

On the other hand, I can typically read if statements with with in operator left-to-right (as if it were a sentence) and understand the logic. (Which I hope is true for the if statement I wrote.)

Python does not have a style recommendation for this, but if Deepgram wants to define it then that works for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside I see with the in syntax is that it gets verbose and repetitive for nested dictionaries. But you're only going 2 levels deep so it's not too bad. I'm fine with it as-is and I don't know of a specific style guide to be followed here. (Seems like the absence of a DG Python style guide is becoming a theme in this PR discussion!)

if (not options.get("suppress_warnings")) and "metadata" in body and "warnings" in body["metadata"]:
for warning in body["metadata"]["warnings"]:
warnings.warn(warning["message"])
return body
except urllib.error.HTTPError as exc:
raise (Exception(f'DG: {exc}') if exc.code < 500 else exc)
# Try to get the HTTP error's request body. Deepgram returns a JSON response
# body for error messages. We want to return that as the error message if
# it exists.
try:
error_body = json.loads(exc.read().decode("utf-8"))
except Exception:
error_body = str(exc)
raise DeepgramApiError(error_body, http_library_error=exc) from exc

tries = RETRY_COUNT
while tries > 0:
try:
return attempt()
except urllib.error.URLError as exc:
if isinstance(payload, io.IOBase):
raise exc # stream is now invalid as payload
# stream is now invalid as payload
# the way aiohttp handles streaming form data
# means that just seeking this back still runs into issues
raise DeepgramApiError(exc, http_library_error=exc) from exc
tries -= 1
continue
return attempt()
Expand All @@ -186,7 +230,7 @@ async def attempt():
destination, extra_headers=updated_headers, ping_interval=5
)
except websockets.exceptions.InvalidHandshake as exc:
raise Exception(f'DG: {exc}')
raise DeepgramApiError(exc, http_library_error=exc) from exc

tries = RETRY_COUNT
while tries > 0:
Expand Down
87 changes: 87 additions & 0 deletions deepgram/errors.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
from typing import Union, Optional
import websockets.exceptions
import urllib.error
import aiohttp
import uuid


class DeepgramError(Exception):
pass


class DeepgramSetupError(DeepgramError, ValueError):
pass


class DeepgramApiError(DeepgramError):
"""An error returned by the Deepgram API.

If the error was raised by an http client, the client's error message
is accessible via the `http_library_error` field. This may be useful
for handling different error codes, such as 429s or 503s.

The `error` field is set to the API's error message (dict), if avilable.
Otherwise the `error` field is set to the parent exception's message (str).

The `warning` field is set to the API's warning messages (list[str]), if available.

The `request_id` field is set to the API's request ID, if available.
"""
def __init__(
self,
*args: object,
http_library_error: Optional[
Union[
urllib.error.HTTPError,
urllib.error.URLError,
websockets.exceptions.InvalidHandshake,
aiohttp.ClientResponseError,
Comment on lines +35 to +38
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we are presenting all of these errors as a type of Deepgram error? Would we maybe want other non-DG errors to bubble up rather than wrapping them as our own?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous iteration of the code, the Python SDK was catching all non-DG errors and returning them as Exception, sometimes without using the from operator (!!!). That means the non-DG errors - and the information available in them - were inaccessible.

I agree that we could allow these errors to bubble up on their own rather than catch them and wrap them in a Deepgram error. It would allow users to be more explicit about how to handle errors, but it would also make error handling more difficult for the user because they would need to handle multiple exception types.

The two approaches are targeted at different audiences: The implementation where errors are wrapped up into a single DG exception makes it easy to do simply error handling and therefore makes it easy for newer programmers to handle failed API requests. On the other hand, the implementation where errors bubble up requires the user to be explicit about how they handle the different use cases, which is likely more useful for power users.

The code I wrote grabs the underlying non-DG error and adds it as a field to the DG error. I hope this satisfies both user groups because: the programmers with less experience can simply catch one error, and the power users can grab the underlying HTTP error and perform more advanced error handling.

Aside: I said, "the implementation where errors bubble up requires the user to be explicit about how they handle the different use cases". That isn't completely true because users could always write except Exception: to catch all errors, but I try to highly discourage the use of except Exception. It is almost always too broad. For example, if a user passes an incorrect type that results in a TypeError, they wouldn't want their retry logic to trigger and keep making API calls with the same TypeError.

After that long explanation, I hope this code serves both new users and power users by making it easy to handle failed API requests in both simple and complex ways.

aiohttp.ClientError,
]
] = None,
):
super().__init__(*args)
self.http_library_error = http_library_error
self.error: Union[str, dict] # If you change the type, change it in the docstring as well!
self.warnings: Optional[list[str]] = None # If you change the type, change it in the docstring as well!
self.request_id: Optional[uuid.UUID] = None
self.http_error_status: Optional[int] = None

# Set the `error`, `warning`, and `request_id` fields
if isinstance(args[0], dict) and "err_msg" in args[0]:
self.error = args[0]["err_msg"]
if "metadata" in args[0] and "warnings" in args[0]["metadata"]:
self.warnings = args[0]["metadata"]["warnings"]
elif "warnings" in args[0]: # Occurs when `raise_warnings_as_errors` is enabled
self.warnings = args[0]["warnings"]
if "metadata" in args[0] and "request_id" in args[0]["metadata"]: # Occurs when Deepgram returns a success response (for warnings)
self.request_id = uuid.UUID(args[0]["request_id"])
elif "request_id" in args[0]: # Occurs when Deepgram returns a failed response
self.request_id = uuid.UUID(args[0]["request_id"])
elif isinstance(args[0], str):
self.error = args[0]
else:
self.error = str(args[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're referring to args[0] a lot here, it would be helpful for readability to give that a named variable so it's clearer what that arg is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed. I created a new variable: error_or_warning_data = args[0]


# Set the error code from the underlying exception, if possible
if http_library_error is not None:
# Note: The following Exceptions do not have HTTP error codes:
# - urllib.error.URLError
# - websockets.exceptions.InvalidHandshake
# - aiohttp.ClientError
if isinstance(http_library_error, urllib.error.HTTPError):
self.http_error_status = http_library_error.code
elif isinstance(http_library_error, aiohttp.ClientResponseError):
self.http_error_status = http_library_error.status

def __str__(self) -> str:
if self.request_id:
if self.warnings:
warning_string = f"\n\n{self.warnings}"
else:
warning_string = ""
if self.http_error_status:
return f"Request `{self.request_id}` returned {self.http_error_status}: {self.error}" + warning_string
else:
return f"Request `{self.request_id}` returned {self.error}" + warning_string
return super().__str__()
15 changes: 8 additions & 7 deletions deepgram/transcription.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
LiveTranscriptionResponse, Metadata, EventHandler)
from ._enums import LiveTranscriptionEvent
from ._utils import _request, _sync_request, _make_query_string, _socket_connect
from .errors import DeepgramApiError


class PrerecordedTranscription:
Expand Down Expand Up @@ -50,8 +51,9 @@ async def __call__(
"""

if 'buffer' in source and 'mimetype' not in source:
raise Exception(
'DG: Mimetype must be provided if the source is bytes'
raise DeepgramApiError(
'Mimetype must be provided if the source is bytes',
http_library_error=None,
)
payload = cast(
Union[bytes, Dict],
Expand Down Expand Up @@ -106,8 +108,9 @@ def __call__(
"""

if 'buffer' in source and 'mimetype' not in source:
raise Exception(
'DG: Mimetype must be provided if the source is bytes'
raise DeepgramApiError(
'Mimetype must be provided if the source is bytes',
http_library_error=None,
)
payload = cast(
Union[bytes, Dict],
Expand Down Expand Up @@ -334,7 +337,7 @@ class Transcription:
return transcription responses for audio files and live audio streams, respectively.

"""

def __init__(self, options: Options) -> None:
self.options = options

Expand All @@ -354,7 +357,6 @@ async def prerecorded(
self.options, full_options, endpoint
)(source, timeout=timeout)


def sync_prerecorded(
self, source: TranscriptionSource,
options: PrerecordedOptions = None,
Expand All @@ -371,7 +373,6 @@ def sync_prerecorded(
self.options, full_options, endpoint
)(source, timeout=timeout)


async def live(
self, options: LiveOptions = None, endpoint = "/listen", **kwargs
) -> LiveTranscription:
Expand Down