feat: multiple important but minor updates (#325)
* multiple updates

Signed-off-by: Yiyu Ni <[email protected]>

* update notebooks

Signed-off-by: Yiyu Ni <[email protected]>

* pass pre-commit

* update project to the new io

Signed-off-by: Yiyu Ni <[email protected]>

* update jbook config

Signed-off-by: Yiyu Ni <[email protected]>

* add colab

Signed-off-by: Yiyu Ni <[email protected]>

* update to io 0.1.16

Signed-off-by: Yiyu Ni <[email protected]>

* moved s3_anon

Signed-off-by: Yiyu Ni <[email protected]>

* update tutorials

Signed-off-by: Yiyu Ni <[email protected]>

---------

Signed-off-by: Yiyu Ni <[email protected]>
niyiyu authored Oct 18, 2024
1 parent 6a219e5 commit f17e693
Showing 47 changed files with 270 additions and 1,399 deletions.
12 changes: 7 additions & 5 deletions .github/workflows/test.yaml
@@ -69,13 +69,14 @@ jobs:
- name: Test Cross-Correlation (S1)
run: |
noisepy cross_correlate \
--config configs/s3_anon.yaml \
--config tutorials/s3_anon.yml \
--raw_data_path s3://scedc-pds/continuous_waveforms/ \
--ccf_path $RUNNER_TEMP/CCF \
--net_list=CI \
--networks=CI \
--start=2023-01-01 \
--end=2023-01-03 \
--stations=ARV,BAK \
--channels=BHE,BHN,BHZ \
--xml_path=s3://scedc-pds/FDSNstationXML/CI/ \
--freq_norm ${{matrix.freq_norm}} \
--stop_on_error \
@@ -107,14 +108,15 @@ jobs:
- name: Test Cross-Correlation (S1)
run: |
mpiexec -n 2 noisepy cross_correlate \
--config configs/s3_anon.yaml \
--config tutorials/s3_anon.yml \
--mpi \
--raw_data_path s3://scedc-pds/continuous_waveforms/ \
--ccf_path $RUNNER_TEMP/CCF \
--net_list=CI \
--networks=CI \
--start=2023-01-01 \
--end=2023-01-03 \
--stations=ARV,BAK \
--channels=BHE,BHN,BHZ \
--xml_path=s3://scedc-pds/FDSNstationXML/CI/ \
--freq_norm ${{matrix.freq_norm}} \
--stop_on_error \
@@ -148,4 +150,4 @@ jobs:
--xml_path s3://scedc-pds/FDSNstationXML/CI/ \
--stations "SBC,RIO" --start_date 2022-02-02 --end_date 2022-02-04 \
--stop_on_error \
--config configs/s3_anon.yaml
--config tutorials/s3_anon.yml
2 changes: 2 additions & 0 deletions .gitignore
@@ -40,6 +40,8 @@ pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
coverage_html_report
covrep
htmlcov/
.tox/
.nox/
29 changes: 13 additions & 16 deletions README.md
@@ -11,12 +11,12 @@ NoisePy is a Python package designed for fast and easy computation of ambient no
## Major updates coming
NoisePy is going through a major refactoring to make this package easier to develop and deploy. Submit an issue, fork the repository and create pull requests to [contribute](CONTRIBUTING.md).

# Installation
## Installation
Because NoisePy is composed of Python scripts, installation is flexible: it essentially amounts to building the dependent libraries that the scripts and related functions rely on. We recommend using [conda](https://docs.conda.io/en/latest/) or [pip](https://pypi.org/project/pip/) to install.

**Note that the order of the commands below matters.**

## With Conda and pip
### With Conda and pip
```bash
conda create -n noisepy -y python=3.10 pip
conda activate noisepy
@@ -29,21 +29,21 @@ pip install ipykernel notebook
python -m ipykernel install --user --name noisepy
```

## With Conda and pip and MPI support
### With Conda and pip and MPI support
```bash
conda create -n noisepy -y python=3.10 pip mpi4py
conda activate noisepy
pip install noisepy-seis[mpi]
```

## With virtual environment
### With virtual environment
```bash
python -m venv noisepy
source noisepy/bin/activate
pip install noisepy-seis
```

## With virtual environment and MPI support
### With virtual environment and MPI support
An MPI installation is required, e.g. for macOS using [brew](https://brew.sh/):
```bash
brew install open-mpi
@@ -55,7 +55,7 @@ source noisepy/bin/activate
pip install noisepy-seis[mpi]
```

# Functionality
## Functionality
Here is a list of features of the package:
* download continuous noise data based:
+ on webservices using obspy's core functions of [get_stations](https://docs.obspy.org/packages/autogen/obspy.clients.fdsn.client.Client.get_stations.html) and [get_waveforms](https://docs.obspy.org/packages/autogen/obspy.clients.fdsn.client.Client.get_waveforms.html)
@@ -67,28 +67,25 @@ Here is a list of features of the package:
+ *Ambient noise monitoring*: measure dv/v using a wide variety of techniques in the time, Fourier, and wavelet domains (Yuan et al., 2021)
+ *Surface wave dispersion*: construct dispersion images using conventional techniques.

# Usage
## Usage

To run the code on a single core, open the terminal and activate the noisepy environment before running the following commands. To run on institutional clusters, see the installation notes for individual packages on the cluster's module list.

## Deploy using Docker
### Deploy using Docker
We use I/O on disk, so users need root access to the file system. To install rootless docker, see instructions [here](https://docs.docker.com/engine/security/rootless/#install).
```bash
docker pull ghcr.io/noisepy/noisepy:latest
docker run -v ~/tmp:/tmp ghcr.io/noisepy/noisepy:latest cross_correlate --path /tmp
```

# Tutorials
A short tutorial on how to use NoisePy can be is available as a [web page](https://noisepy.github.io/NoisePy/noisepy_scedc_tutorial.html) or [Jupyter notebook](https://github.com/noisepy/NoisePy/blob/main/tutorials/noisepy_scedc_tutorial.ipynb) and can be
[run directly in Colab](https://colab.research.google.com/github/noisepy/NoisePy/blob/main/tutorials/noisepy_scedc_tutorial.ipynb).

This tutorial presents one simple example of how NoisePy might work. We strongly encourage you to download the NoisePy package and play it on your own! If you have any comments and/or suggestions during running the codes, please do not hesitate to contact us through email or open an issue in this github page!
## Tutorials
Short tutorials on how to use NoisePy are available [here](https://noisepy.github.io/NoisePy/) and can be run directly in Colab. These tutorials present simple examples of how NoisePy works. We strongly encourage you to download the NoisePy package and try it on your own! If you have any comments and/or suggestions while running the code, please do not hesitate to contact us by email or open an issue on this GitHub page!

Chengxin Jiang ([email protected])
Marine Denolle ([email protected])
Yiyu Ni ([email protected])

## Taxonomy
### Taxonomy
Taxonomy of the NoisePy variables.

* ``station`` refers to the site that has the seismic instruments that record ground shaking.
@@ -101,12 +98,12 @@ Taxonomy of the NoisePy variables.
* ``substack, substack_len``: boolean flag and window length over which to substack the correlation (to save storage or for monitoring); it has to be a multiple of ``cc_len``.
* ``time_chunk, nchunk`` refers to the time unit that defines a single job. For instance, ``cc_len`` is the correlation length (e.g., 1 hour, 30 min) and the overall duration of the experiment is the total length (1 month, 1 year, ...). The time chunk could be 1 day: the code would loop through each ``cc_len`` window in a for loop, but each day would be dispatched as a separate thread.
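The relationship between ``cc_len``, ``step``, and the time chunk can be sketched numerically. This is a minimal illustration of the windowing arithmetic, not part of the NoisePy API; the parameter names simply follow the taxonomy above:

```python
import math

def count_windows(inc_hours: float, cc_len: float, step: float) -> int:
    """Number of sliding cc_len-second windows, advancing by step seconds,
    that fit inside one time chunk of inc_hours hours."""
    chunk_seconds = inc_hours * 3600
    return int(math.floor((chunk_seconds - cc_len) / step)) + 1

# e.g. a 24-hour chunk cut into 1-hour windows sliding by 30 minutes
print(count_windows(inc_hours=24, cc_len=3600, step=1800))  # 47
```

With 50% overlap (``step`` = ``cc_len``/2), each chunk yields roughly twice as many windows as non-overlapping windows would, which is why ``substack_len`` must be a multiple of ``cc_len`` to align substacks with whole windows.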

# Acknowledgements
## Acknowledgements
Thanks to our contributors so far!

[![Contributors](https://contrib.rocks/image?repo=noisepy/NoisePy)](https://github.com/noisepy/NoisePy/graphs/contributors)

## Use this reference when publishing on your work with noisepy
### Use this reference when publishing your work with NoisePy

Main code:

41 changes: 0 additions & 41 deletions configs/default.yml

This file was deleted.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -50,7 +50,7 @@ dependencies = [
"PyYAML==6.0",
"pydantic-yaml==1.0",
"psutil>=5.9.5,<6.0.0",
"noisepy-seis-io>=0.1.14",
"noisepy-seis-io>=0.1.16",
"scipy==1.12.0"
]

6 changes: 3 additions & 3 deletions src/noisepy/functions_2019/S0B_to_ASDF_2019.py
@@ -57,7 +57,7 @@

# useful parameters for cleaning the data
input_fmt = "sac" # input file format between 'sac' and 'mseed'
samp_freq = 10 # targeted sampling rate
sampling_rate = 10 # targeted sampling rate
stationxml = False # station.XML file exists or not
rm_resp = RmResp.NO # select 'no' to not remove response and
# use 'inv','spectrum','RESP', or 'polozeros' to remove response
@@ -101,7 +101,7 @@
"respdir": respdir,
"freqmin": freqmin,
"freqmax": freqmax,
"samp_freq": samp_freq,
"sampling_rate": sampling_rate,
"inc_hours": inc_hours,
"start_date": start_date,
"end_date": end_date,
@@ -155,7 +155,7 @@
# rough estimation of memory needed in S1 (assume float32 dtype)
nsec_chunk = inc_hours / 24 * 86400
nseg_chunk = int(np.floor((nsec_chunk - cc_len) / step)) + 1
npts_chunk = int(nseg_chunk * cc_len * samp_freq)
npts_chunk = int(nseg_chunk * cc_len * sampling_rate)
memory_size = nsta * npts_chunk * 4 / 1024**3
if memory_size > MAX_MEM:
raise ValueError(
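The memory check in this hunk multiplies windows × samples × 4 bytes (float32) per station. A standalone sketch of that arithmetic, with illustrative numbers (the function name and example values are mine, not NoisePy's):

```python
import math

def estimate_memory_gb(nsta: int, inc_hours: float, cc_len: float,
                       step: float, sampling_rate: float) -> float:
    """Rough float32 memory estimate for one chunk, mirroring the S0B/S1 formula."""
    nsec_chunk = inc_hours / 24 * 86400            # chunk length in seconds
    nseg_chunk = int(math.floor((nsec_chunk - cc_len) / step)) + 1
    npts_chunk = int(nseg_chunk * cc_len * sampling_rate)
    return nsta * npts_chunk * 4 / 1024**3         # bytes -> GiB

# e.g. 20 stations, 24 h chunks, 1 h windows, 30 min step, 10 Hz
print(round(estimate_memory_gb(20, 24, 3600, 1800, 10), 2))  # 0.13
```

Note the estimate scales linearly with ``sampling_rate``, which is one reason the ``samp_freq`` → ``sampling_rate`` rename in this commit touches every memory check.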
6 changes: 3 additions & 3 deletions src/noisepy/functions_2019/noise_module.py
@@ -283,7 +283,7 @@ def stacking_rma(cc_array, cc_time, cc_ngood, stack_para):
nstacks: number of overall segments for the final stacks
"""
# load useful parameters from dict
samp_freq = stack_para["samp_freq"]
sampling_rate = stack_para["sampling_rate"]
smethod = stack_para["stack_method"]
rma_substack = stack_para["rma_substack"]
rma_step = stack_para["rma_step"]
@@ -344,7 +344,7 @@ def stacking_rma(cc_array, cc_time, cc_ngood, stack_para):
if smethod == StackMethod.LINEAR:
allstacks1 = np.mean(cc_array, axis=0)
elif smethod == StackMethod.PWS:
allstacks1 = pws(cc_array, samp_freq)
allstacks1 = pws(cc_array, sampling_rate)
elif smethod == StackMethod.ROBUST:
(
allstacks1,
@@ -354,7 +354,7 @@ def stacking_rma(cc_array, cc_time, cc_ngood, stack_para):
allstacks1 = selective_stack(cc_array, 0.001)
elif smethod == StackMethod.ALL:
allstacks1 = np.mean(cc_array, axis=0)
allstacks2 = pws(cc_array, samp_freq)
allstacks2 = pws(cc_array, sampling_rate)
allstacks3 = robust_stack(cc_array, 0.001)
allstacks4 = selective_stack(cc_array, 0.001)
nstacks = np.sum(cc_ngood)
2 changes: 1 addition & 1 deletion src/noisepy/monitoring/monitoring_utils.py
@@ -197,7 +197,7 @@ def calc_segments(fft_params: ConfigParameters, num_chunk: int, MAX_MEM: int) ->
num_segmts = int(np.floor((fft_params.inc_hours * 3600 - fft_params.cc_len) / fft_params.step))
else:
num_segmts = int(fft_params.inc_hours / (fft_params.substack_len / 3600))
npts_segmt = int(2 * fft_params.maxlag * fft_params.samp_freq) + 1
npts_segmt = int(2 * fft_params.maxlag * fft_params.sampling_rate) + 1
memory_size = num_chunk * num_segmts * npts_segmt * 4 / 1024**3

if memory_size > MAX_MEM:
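The ``npts_segmt`` expression in this hunk is the length of one two-sided correlation segment: samples spanning −``maxlag`` to +``maxlag`` seconds, plus the zero-lag sample. A quick sanity check with illustrative values (this helper is mine, not NoisePy's):

```python
def correlation_npts(maxlag: float, sampling_rate: float) -> int:
    """Samples in a two-sided correlation from -maxlag to +maxlag, inclusive of lag 0."""
    return int(2 * maxlag * sampling_rate) + 1

# e.g. a 200 s maximum lag at 20 Hz
print(correlation_npts(maxlag=200, sampling_rate=20))  # 8001
```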
24 changes: 12 additions & 12 deletions src/noisepy/seis/correlate.py
@@ -124,7 +124,7 @@ def cc_timespan(
"""
LOADING NOISE DATA AND DO FFT
"""
nnfft = int(next_fast_len(int(fft_params.cc_len * fft_params.samp_freq))) # samp_freq should be sampling_rate
nnfft = int(next_fast_len(int(fft_params.cc_len * fft_params.sampling_rate)))

t_chunk = tlog.reset() # for tracking overall chunk processing time
all_channels = raw_store.get_channels(ts)
@@ -165,7 +165,7 @@ def cc_timespan(
return []

ch_data_tuples = _read_channels(
executor, ts, raw_store, missing_channels, fft_params.samp_freq, fft_params.single_freq
executor, ts, raw_store, missing_channels, fft_params.sampling_rate, fft_params.single_freq
)
# only the channels we are using

@@ -475,14 +475,14 @@ def _read_channels(
ts: DateTimeRange,
store: RawDataStore,
channels: List[Channel],
samp_freq: int,
sampling_rate: int,
single_freq: bool = True,
) -> List[Tuple[Channel, ChannelData]]:
ch_data_refs = [executor.submit(_safe_read_data, store, ts, ch) for ch in channels]
ch_data = get_results(ch_data_refs, "Read channel data")
tuples = list(filter(lambda tup: tup[1].data.size > 0, zip(channels, ch_data)))

return _filter_channel_data(tuples, samp_freq, single_freq)
return _filter_channel_data(tuples, sampling_rate, single_freq)


def _safe_read_data(store: RawDataStore, ts: DateTimeRange, ch: Channel) -> ChannelData:
@@ -494,24 +494,24 @@ def _safe_read_data(store: RawDataStore, ts: DateTimeRange, ch: Channel) -> Chan


def _filter_channel_data(
tuples: List[Tuple[Channel, ChannelData]], samp_freq: int, single_freq: bool = True
tuples: List[Tuple[Channel, ChannelData]], sampling_rate: int, single_freq: bool = True
) -> List[Tuple[Channel, ChannelData]]:
frequencies = set(t[1].sampling_rate for t in tuples)
frequencies = list(filter(lambda f: f >= samp_freq, frequencies))
frequencies = list(filter(lambda f: f >= sampling_rate, frequencies))
if len(frequencies) == 0:
logging.warning(f"No data available with sampling frequency >= {samp_freq}")
logging.warning(f"No data available with sampling rate >= {sampling_rate}")
return []
if single_freq:
closest_freq = min(
frequencies,
key=lambda f: max(f - samp_freq, 0),
key=lambda f: max(f - sampling_rate, 0),
)
logger.info(f"Picked {closest_freq} as the closest sampling frequence to {samp_freq}. ")
logger.info(f"Picked {closest_freq} as the closest sampling rate to {sampling_rate}. ")
filtered_tuples = list(filter(lambda tup: tup[1].sampling_rate == closest_freq, tuples))
logger.info(f"Filtered to {len(filtered_tuples)}/{len(tuples)} channels with sampling rate == {closest_freq}")
else:
filtered_tuples = list(filter(lambda tup: tup[1].sampling_rate >= samp_freq, tuples))
logger.info(f"Filtered to {len(filtered_tuples)}/{len(tuples)} channels with sampling rate >= {samp_freq}")
filtered_tuples = list(filter(lambda tup: tup[1].sampling_rate >= sampling_rate, tuples))
logger.info(f"Filtered to {len(filtered_tuples)}/{len(tuples)} channels with sampling rate >= {sampling_rate}")

return filtered_tuples
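The selection in `_filter_channel_data` keeps only rates at or above the target, then picks the closest one: `max(f - sampling_rate, 0)` grows with distance above the target, so `min` over it returns the smallest qualifying rate. A standalone sketch of that logic (the function name and values here are illustrative, not the NoisePy API):

```python
from typing import Optional, Sequence

def pick_closest_rate(available: Sequence[float], target: float) -> Optional[float]:
    """Pick the smallest available sampling rate that is >= target."""
    candidates = [f for f in available if f >= target]
    if not candidates:
        return None  # nothing can be decimated down to `target`
    return min(candidates, key=lambda f: max(f - target, 0))

# e.g. channels recorded at 10, 40, and 100 Hz, target 20 Hz
print(pick_closest_rate([10, 40, 100], 20))  # 40
```

Rates below the target are excluded up front because downsampling to ``sampling_rate`` needs data at least that fast; with ``single_freq=False`` the code instead keeps every qualifying rate.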

@@ -523,7 +523,7 @@ def check_memory(params: ConfigParameters, nsta: int) -> int:
# crude estimation on memory needs (assume float32)
nsec_chunk = params.inc_hours / 24 * 86400
nseg_chunk = int(np.floor((nsec_chunk - params.cc_len) / params.step))
npts_chunk = int(nseg_chunk * params.cc_len * params.samp_freq)
npts_chunk = int(nseg_chunk * params.cc_len * params.sampling_rate)
memory_size = nsta * npts_chunk * 4 / 1024**3
if memory_size > MAX_MEM:
raise ValueError(
17 changes: 8 additions & 9 deletions src/noisepy/seis/fdsn_download.py
@@ -80,8 +80,6 @@ def download(direc: str, prepro_para: ConfigParameters) -> None:

# client/data center. see https://docs.obspy.org/packages/obspy.clients.fdsn.html for a list
client = Client(prepro_para.client_url_key)
chan_list = prepro_para.channels
sta_list = prepro_para.stations
executor = ThreadPoolExecutor()

tlog = TimeLogger(logger, logging.INFO)
@@ -101,11 +99,12 @@
f"""Download
From: {starttime}
To: {endtime}
Stations: {sta_list}
Channels: {chan_list}
Networks: {prepro_para.networks}
Stations: {prepro_para.stations}
Channels: {prepro_para.channels}
"""
)
ncomp = len(chan_list)
ncomp = len(prepro_para.channels)

# prepare station info (existing station list vs. fetching from client)
if prepro_para.down_list:
@@ -119,9 +118,9 @@ def download(direc: str, prepro_para: ConfigParameters) -> None:
# calculate the total number of channels to download
# loop through specified network, station and channel lists
bulk_req = []
for inet in prepro_para.net_list:
for ista in sta_list:
for ichan in chan_list:
for inet in prepro_para.networks:
for ista in prepro_para.stations:
for ichan in prepro_para.channels:
bulk_req.append((inet, ista, "*", ichan, starttime, endtime))
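The triple loop above expands the network, station, and channel lists into one FDSN bulk-request tuple per combination, with `"*"` as the location wildcard. A minimal sketch of the same expansion, using hypothetical lists (obspy's `Client.get_stations_bulk` accepts tuples of this shape):

```python
from datetime import datetime

# hypothetical request lists, mirroring the CI workflow in this commit
networks = ["CI"]
stations = ["ARV", "BAK"]
channels = ["BHE", "BHN", "BHZ"]
starttime = datetime(2023, 1, 1)
endtime = datetime(2023, 1, 3)

# (network, station, location, channel, starttime, endtime) per combination
bulk_req = [
    (net, sta, "*", cha, starttime, endtime)
    for net in networks
    for sta in stations
    for cha in channels
]
print(len(bulk_req))  # 1 network x 2 stations x 3 channels = 6
```

One bulk call replaces len(bulk_req) individual metadata requests, which matters when station lists are long.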

# gather station info
@@ -163,7 +162,7 @@ def download(direc: str, prepro_para: ConfigParameters) -> None:
# rough estimation on memory needs (assume float32 dtype)
nsec_chunk = prepro_para.inc_hours / 24 * 86400
nseg_chunk = int(np.floor((nsec_chunk - prepro_para.cc_len) / prepro_para.step)) + 1
npts_chunk = int(nseg_chunk * prepro_para.cc_len * prepro_para.samp_freq)
npts_chunk = int(nseg_chunk * prepro_para.cc_len * prepro_para.sampling_rate)
memory_size = nsta * npts_chunk * 4 / 1024**3
if memory_size > MAX_MEM:
raise ValueError(
2 changes: 1 addition & 1 deletion src/noisepy/seis/main.py
@@ -151,7 +151,7 @@ def count(pat):
store = SCEDCS3DataStore(
raw_dir,
catalog,
channel_filter(params.net_list, params.stations, params.channels),
channel_filter(params.networks, params.stations, params.channels),
DateTimeRange(params.start_date, params.end_date),
params.storage_options,
)