-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize a2o utils #655
base: main
Are you sure you want to change the base?
Generalize a2o utils #655
Conversation
…stance as a parameter. Also rename references to "a2o"
…baw functionality to baw_utils
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I branched into a new change in order to handle notebook compatibility and some invisible bookkeeping stuff; hope that's ok. There's still a relevant question about the preferred domain, though.
# Extract the recording UID. Example: | ||
# 'site_0277/20210428T100000+1000_Five-Rivers-Dry-A_909057.flac' -> 909057 | ||
# 'site_0277/20210428T100000+1000_Five-Rivers-Dry-A_909057.wav' -> 909057 | ||
pattern = re.compile(r'.*_(\d+)\.[^\.]+$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's upgrade this to a file-level compiled pattern; it will then compile once on import, saving a few cycles every time we construct a url. Something like:
FILE_ID_TO_UID_PATTERN = re.compile(r'.*_(\d+).[^\.]+$')
yield ex | ||
finally: | ||
session.close() | ||
|
||
|
||
def get_a2o_embeddings_config() -> config_dict.ConfigDict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is thin enough that we might as well move this last function into baw_utils as well.
|
||
|
||
def make_baw_audio_url_from_file_id( | ||
file_id: str, offset_s: float, window_size_s: float, baw_domain: str = "data.acousticsobervatory.org" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a typo here, which threw me for a loop: data.acousticsobervatory.org
Is there a reason to prefer the data
subdomain vs the api
subdomain we were using before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah no, my mistake. I will fix
a2o_utils had some values hardcoded to only work with the A2O and not other instances of the Bioacoustic Workbench (BAW) instances, such as Ecosounds.
A2O utils that can apply to other BAW instances have been generalized to allow this, and been moved to a new baw_utils module. A2O-specific functionality has stayed in a2o_utils. No changes to the agile_monitoring notebook are needed, since, if no baw_domain is supplied to the BootstrapState constructor, it defaults to the a2o baw_domain.
Also modifies the utility function that parses the audio_recording_id out of a filename to work with filetypes besides flac.