Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catalog_to_dd.py: input and output of dt.cc #516

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

erhmestel
Copy link

What does this PR do?

Editing catalog_to_dd.py to allow input of existing correlation measurements and selection of output filename. Defaults keep everything the same as it currently is.

  1. existing_corr_file option added to write_correlations & compute_differential_times. Default None skips new sections of code.
  2. existing_corr_file read in with new function: _read_correlation_file returning a existing event pairs dictionary keyed to core event, with list of paired events.
  3. existing_pairs dictionary used to remove events paired with core event from sub_catalog so they are not recalculated
  4. existing_corr_file read in & output into output file, followed by newly calculated correlations.
  5. Added option of output_filename to write_correlations. Default "dt.cc"

Why was it initiated? Any relevant Issues?

To prevent having to recalculate correlations each time new events are added & to allow the output of different correlation files without overwriting.

Not been extensively tested. In small amount of testing code works as is, but could be more completely implemented. Things to do:

  • speed up saving existing correlations output (do I need to write out every line individually?)
  • sort output by event numbers (interleaving existing with new)
  • check existing correlations for extra correlations which wouldn't be otherwise calculated
  • implement checking whether there are new stations missing from existing correlations which are being missed

PR Checklist

  • develop base branch selected?
  • This PR is not directly related to an existing issue (which has no PR yet).
  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGES.md.
  • First time contributors have added your name to CONTRIBUTORS.md.
    Boxes left unchecked when I don't know

Correlations from existing_corr_file added to write_correlations & compute_differential_times. Default None skips new sections of code.
1. existing_corr_file read in with new function: _read_correlation_file returning a existing event pairs dictionary keyed to core event, with list of paired events. 
2. existing_pairs dictionary used to remove events paired with core event from sub_catalog so they are not recalculated
3. existing_corr_file read in & output into output file, followed by newly calculated correlations. Added option of output_filename to write_correlations. Default "dt.cc"

Works as is, could be more completely implemented. Things to do:
- implement checking whether there are stations missing from existing correlations
- speed up saving existing correlations output (do I need to write out every line individually?)
- sort output by event numbers (interleaving existing with new)
- check existing correlations for those which wouldn't be calculated otherwise?
Same changes as previous commit, but made to master version of code, rather than older one I was working with.
Comma error removed
with open(existing_corr_file, "r") as in_f:
for line in in_f:
f.write(line)
## TODO: speed up saving existing correlations output (do I need to write out every line individually?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to just dump everything you can do something like:

with open(blah) as f:
    f.write(["\n".join(l for l in in_f])

assuming that your lines don't already have a newline character at the end of them. If they do then you can just ''.join(in_f) and write that string.

@calum-chamberlain
Copy link
Member

calum-chamberlain commented Sep 26, 2022

Thanks for this @erhmestel! Can you update your branch to track develop (press the "Update branch" button underneath all the test ticks, then pull the updated branch back to your local computer) please?

This looks good, even if stickler doesn't think so. I like using PyCharm for this kind of big project and I set up a ruler at 80 characters long (I forget how to do this, but this SO answer has it. PyCharm is pretty good at removing trailing whitespace for you and you can set up code linters (flake8 is the one run by stickler I think) which will highlight non pep8 friendly code.

I would go for cleaning that up, then work on writing a short test that tests this new behaviour. Once we have a test then we can play around with optimising. If there are any edge cases that you can think of that might result in odd behaviour (e.g. when no new events have been added, or if events are in the correlation file, but not in the catalog provided for the update, ...) then having tests for them would be helpful as well so that we can avoid code crashing as much as possible!

I should also use this as an opportunity to better document:

  1. Default action of computing correlation squared (and provide an option to output correlation);
  2. Default actions of travel time computation as tt1 - tt2.

If you have any suggestions of where those docs should go (e.g. where you would look for them) then let me know, otherwise we can put them in in all the relevant places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants