feat: multiple important but minor updates (#325)
* multiple updates

Signed-off-by: Yiyu Ni <[email protected]>

* update notebooks

Signed-off-by: Yiyu Ni <[email protected]>

* pass pre-commit

* update project to the new io

Signed-off-by: Yiyu Ni <[email protected]>

* update jbook config

Signed-off-by: Yiyu Ni <[email protected]>

* add colab

Signed-off-by: Yiyu Ni <[email protected]>

* update to io 0.1.16

Signed-off-by: Yiyu Ni <[email protected]>

* moved s3_anon

Signed-off-by: Yiyu Ni <[email protected]>

* update tutorials

Signed-off-by: Yiyu Ni <[email protected]>

---------

Signed-off-by: Yiyu Ni <[email protected]>
niyiyu authored Oct 18, 2024
1 parent 6a219e5 commit f17e693
Showing 47 changed files with 270 additions and 1,399 deletions.
12 changes: 7 additions & 5 deletions .github/workflows/test.yaml
@@ -69,13 +69,14 @@ jobs:
- name: Test Cross-Correlation (S1)
run: |
noisepy cross_correlate \
--config configs/s3_anon.yaml \
--config tutorials/s3_anon.yml \
--raw_data_path s3://scedc-pds/continuous_waveforms/ \
--ccf_path $RUNNER_TEMP/CCF \
--net_list=CI \
--networks=CI \
--start=2023-01-01 \
--end=2023-01-03 \
--stations=ARV,BAK \
--channels=BHE,BHN,BHZ \
--xml_path=s3://scedc-pds/FDSNstationXML/CI/ \
--freq_norm ${{matrix.freq_norm}} \
--stop_on_error \
@@ -107,14 +108,15 @@ jobs:
- name: Test Cross-Correlation (S1)
run: |
mpiexec -n 2 noisepy cross_correlate \
--config configs/s3_anon.yaml \
--config tutorials/s3_anon.yml \
--mpi \
--raw_data_path s3://scedc-pds/continuous_waveforms/ \
--ccf_path $RUNNER_TEMP/CCF \
--net_list=CI \
--networks=CI \
--start=2023-01-01 \
--end=2023-01-03 \
--stations=ARV,BAK \
--channels=BHE,BHN,BHZ \
--xml_path=s3://scedc-pds/FDSNstationXML/CI/ \
--freq_norm ${{matrix.freq_norm}} \
--stop_on_error \
@@ -148,4 +150,4 @@ jobs:
--xml_path s3://scedc-pds/FDSNstationXML/CI/ \
--stations "SBC,RIO" --start_date 2022-02-02 --end_date 2022-02-04 \
--stop_on_error \
--config configs/s3_anon.yaml
--config tutorials/s3_anon.yml
2 changes: 2 additions & 0 deletions .gitignore
@@ -40,6 +40,8 @@ pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
coverage_html_report
covrep
htmlcov/
.tox/
.nox/
29 changes: 13 additions & 16 deletions README.md
@@ -11,12 +11,12 @@ NoisePy is a Python package designed for fast and easy computation of ambient no
## Major updates coming
NoisePy is going through a major refactoring to make this package easier to develop and deploy. Submit an issue, fork the repository and create pull requests to [contribute](CONTRIBUTING.md).

# Installation
## Installation
Because NoisePy is composed of Python scripts, installation is flexible: it essentially amounts to building the dependent libraries that the scripts and related functions rely on. We recommend using [conda](https://docs.conda.io/en/latest/) or [pip](https://pypi.org/project/pip/) to install.

**Note that the order of the commands below matters.**

## With Conda and pip
### With Conda and pip
```bash
conda create -n noisepy -y python=3.10 pip
conda activate noisepy
@@ -29,21 +29,21 @@ pip install ipykernel notebook
python -m ipykernel install --user --name noisepy
```

## With Conda and pip and MPI support
### With Conda and pip and MPI support
```bash
conda create -n noisepy -y python=3.10 pip mpi4py
conda activate noisepy
pip install noisepy-seis[mpi]
```

## With virtual environment
### With virtual environment
```bash
python -m venv noisepy
source noisepy/bin/activate
pip install noisepy-seis
```

## With virtual environment and MPI support
### With virtual environment and MPI support
An MPI installation is required, e.g. for macOS using [brew](https://brew.sh/):
```bash
brew install open-mpi
@@ -55,7 +55,7 @@ source noisepy/bin/activate
pip install noisepy-seis[mpi]
```

# Functionality
## Functionality
Here is a list of features of the package:
* download continuous noise data based:
+ on webservices using obspy's core functions of [get_stations](https://docs.obspy.org/packages/autogen/obspy.clients.fdsn.client.Client.get_stations.html) and [get_waveforms](https://docs.obspy.org/packages/autogen/obspy.clients.fdsn.client.Client.get_waveforms.html)
@@ -67,28 +67,25 @@ Here is a list of features of the package:
+ *Ambient noise monitoring*: measure dv/v using a wide variety of techniques in the time, Fourier, and wavelet domains (Yuan et al., 2021)
+ *Surface wave dispersion*: construct dispersion images using conventional techniques.

# Usage
## Usage

To run the code on a single core, open the terminal and activate the noisepy environment before running the following commands. To run on institutional clusters, see the installation notes for individual packages on the cluster's module list.

## Deploy using Docker
### Deploy using Docker
We use I/O on disk, so users need root access to the file system. To install rootless docker, see instructions [here](https://docs.docker.com/engine/security/rootless/#install).
```bash
docker pull ghcr.io/noisepy/noisepy:latest
docker run -v ~/tmp:/tmp ghcr.io/noisepy/noisepy:latest cross_correlate --path /tmp
```

# Tutorials
A short tutorial on how to use NoisePy can be is available as a [web page](https://noisepy.github.io/NoisePy/noisepy_scedc_tutorial.html) or [Jupyter notebook](https://github.com/noisepy/NoisePy/blob/main/tutorials/noisepy_scedc_tutorial.ipynb) and can be
[run directly in Colab](https://colab.research.google.com/github/noisepy/NoisePy/blob/main/tutorials/noisepy_scedc_tutorial.ipynb).

This tutorial presents one simple example of how NoisePy might work. We strongly encourage you to download the NoisePy package and play it on your own! If you have any comments and/or suggestions during running the codes, please do not hesitate to contact us through email or open an issue in this github page!
## Tutorials
Short tutorials on how to use NoisePy are available [here](https://noisepy.github.io/NoisePy/) and can be run directly in Colab. These tutorials present simple examples of how NoisePy works. We strongly encourage you to download the NoisePy package and try it on your own! If you have any comments and/or suggestions while running the code, please do not hesitate to contact us by email or open an issue on this GitHub page!

Chengxin Jiang ([email protected])
Marine Denolle ([email protected])
Yiyu Ni ([email protected])

## Taxonomy
### Taxonomy
Taxonomy of the NoisePy variables.

* ``station`` refers to the site that has the seismic instruments that record ground shaking.
@@ -101,12 +98,12 @@ Taxonomy of the NoisePy variables.
* ``substack, substack_len``: boolean flag and window length over which to substack the correlation (to save storage or for monitoring); it has to be a multiple of ``cc_len``.
* ``time_chunk, nchunk`` refers to the time unit that defines a single job. For instance, ``cc_len`` is the correlation length (e.g., 1 hour, 30 min) and the overall duration of the experiment is the total length (1 month, 1 year, ...). The time chunk could be 1 day: the code would loop through each ``cc_len`` window in a for loop, but each day would be dispatched as a separate thread.
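The relationship between ``cc_len``, ``step``, and the time chunk can be sketched numerically. This is a minimal illustration of the windowing arithmetic, not part of the NoisePy API; the parameter names simply follow the taxonomy above:

```python
import math

def count_windows(inc_hours: float, cc_len: float, step: float) -> int:
    """Number of sliding cc_len-second windows, advancing by step seconds,
    that fit inside one time chunk of inc_hours hours."""
    chunk_seconds = inc_hours * 3600
    return int(math.floor((chunk_seconds - cc_len) / step)) + 1

# e.g. a 24-hour chunk cut into 1-hour windows sliding by 30 minutes
print(count_windows(inc_hours=24, cc_len=3600, step=1800))  # 47
```

With 50% overlap (``step`` = ``cc_len``/2), each chunk yields roughly twice as many windows as non-overlapping windows would, which is why ``substack_len`` must be a multiple of ``cc_len`` to align substacks with whole windows.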

# Acknowledgements
## Acknowledgements
Thanks to our contributors so far!

[![Contributors](https://contrib.rocks/image?repo=noisepy/NoisePy)](https://github.com/noisepy/NoisePy/graphs/contributors)

## Use this reference when publishing on your work with noisepy
### Use this reference when publishing your work with NoisePy

Main code:

41 changes: 0 additions & 41 deletions configs/default.yml

This file was deleted.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -50,7 +50,7 @@ dependencies = [
"PyYAML==6.0",
"pydantic-yaml==1.0",
"psutil>=5.9.5,<6.0.0",
"noisepy-seis-io>=0.1.14",
"noisepy-seis-io>=0.1.16",
"scipy==1.12.0"
]

6 changes: 3 additions & 3 deletions src/noisepy/functions_2019/S0B_to_ASDF_2019.py
@@ -57,7 +57,7 @@

# useful parameters for cleaning the data
input_fmt = "sac" # input file format between 'sac' and 'mseed'
samp_freq = 10 # targeted sampling rate
sampling_rate = 10 # targeted sampling rate
stationxml = False # station.XML file exists or not
rm_resp = RmResp.NO # select 'no' to not remove response and
# use 'inv','spectrum','RESP', or 'polozeros' to remove response
@@ -101,7 +101,7 @@
"respdir": respdir,
"freqmin": freqmin,
"freqmax": freqmax,
"samp_freq": samp_freq,
"sampling_rate": sampling_rate,
"inc_hours": inc_hours,
"start_date": start_date,
"end_date": end_date,
@@ -155,7 +155,7 @@
# rough estimation of memory needed in S1 (assume float32 dtype)
nsec_chunk = inc_hours / 24 * 86400
nseg_chunk = int(np.floor((nsec_chunk - cc_len) / step)) + 1
npts_chunk = int(nseg_chunk * cc_len * samp_freq)
npts_chunk = int(nseg_chunk * cc_len * sampling_rate)
memory_size = nsta * npts_chunk * 4 / 1024**3
if memory_size > MAX_MEM:
raise ValueError(
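The memory check in this hunk multiplies windows × samples × 4 bytes (float32) per station. A standalone sketch of that arithmetic, with illustrative numbers (the function name and example values are mine, not NoisePy's):

```python
import math

def estimate_memory_gb(nsta: int, inc_hours: float, cc_len: float,
                       step: float, sampling_rate: float) -> float:
    """Rough float32 memory estimate for one chunk, mirroring the S0B/S1 formula."""
    nsec_chunk = inc_hours / 24 * 86400            # chunk length in seconds
    nseg_chunk = int(math.floor((nsec_chunk - cc_len) / step)) + 1
    npts_chunk = int(nseg_chunk * cc_len * sampling_rate)
    return nsta * npts_chunk * 4 / 1024**3         # bytes -> GiB

# e.g. 20 stations, 24 h chunks, 1 h windows, 30 min step, 10 Hz
print(round(estimate_memory_gb(20, 24, 3600, 1800, 10), 2))  # 0.13
```

Note the estimate scales linearly with ``sampling_rate``, which is one reason the ``samp_freq`` → ``sampling_rate`` rename in this commit touches every memory check.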
6 changes: 3 additions & 3 deletions src/noisepy/functions_2019/noise_module.py
@@ -283,7 +283,7 @@ def stacking_rma(cc_array, cc_time, cc_ngood, stack_para):
nstacks: number of overall segments for the final stacks
"""
# load useful parameters from dict
samp_freq = stack_para["samp_freq"]
sampling_rate = stack_para["sampling_rate"]
smethod = stack_para["stack_method"]
rma_substack = stack_para["rma_substack"]
rma_step = stack_para["rma_step"]
@@ -344,7 +344,7 @@ def stacking_rma(cc_array, cc_time, cc_ngood, stack_para):
if smethod == StackMethod.LINEAR:
allstacks1 = np.mean(cc_array, axis=0)
elif smethod == StackMethod.PWS:
allstacks1 = pws(cc_array, samp_freq)
allstacks1 = pws(cc_array, sampling_rate)
elif smethod == StackMethod.ROBUST:
(
allstacks1,
@@ -354,7 +354,7 @@ def stacking_rma(cc_array, cc_time, cc_ngood, stack_para):
allstacks1 = selective_stack(cc_array, 0.001)
elif smethod == StackMethod.ALL:
allstacks1 = np.mean(cc_array, axis=0)
allstacks2 = pws(cc_array, samp_freq)
allstacks2 = pws(cc_array, sampling_rate)
allstacks3 = robust_stack(cc_array, 0.001)
allstacks4 = selective_stack(cc_array, 0.001)
nstacks = np.sum(cc_ngood)
2 changes: 1 addition & 1 deletion src/noisepy/monitoring/monitoring_utils.py
@@ -197,7 +197,7 @@ def calc_segments(fft_params: ConfigParameters, num_chunk: int, MAX_MEM: int) ->
num_segmts = int(np.floor((fft_params.inc_hours * 3600 - fft_params.cc_len) / fft_params.step))
else:
num_segmts = int(fft_params.inc_hours / (fft_params.substack_len / 3600))
npts_segmt = int(2 * fft_params.maxlag * fft_params.samp_freq) + 1
npts_segmt = int(2 * fft_params.maxlag * fft_params.sampling_rate) + 1
memory_size = num_chunk * num_segmts * npts_segmt * 4 / 1024**3

if memory_size > MAX_MEM:
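The ``npts_segmt`` expression in this hunk is the length of one two-sided correlation segment: samples spanning −``maxlag`` to +``maxlag`` seconds, plus the zero-lag sample. A quick sanity check with illustrative values (this helper is mine, not NoisePy's):

```python
def correlation_npts(maxlag: float, sampling_rate: float) -> int:
    """Samples in a two-sided correlation from -maxlag to +maxlag, inclusive of lag 0."""
    return int(2 * maxlag * sampling_rate) + 1

# e.g. a 200 s maximum lag at 20 Hz
print(correlation_npts(maxlag=200, sampling_rate=20))  # 8001
```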
24 changes: 12 additions & 12 deletions src/noisepy/seis/correlate.py
@@ -124,7 +124,7 @@ def cc_timespan(
"""
LOADING NOISE DATA AND DO FFT
"""
nnfft = int(next_fast_len(int(fft_params.cc_len * fft_params.samp_freq))) # samp_freq should be sampling_rate
nnfft = int(next_fast_len(int(fft_params.cc_len * fft_params.sampling_rate)))

t_chunk = tlog.reset() # for tracking overall chunk processing time
all_channels = raw_store.get_channels(ts)
@@ -165,7 +165,7 @@ def cc_timespan(
return []

ch_data_tuples = _read_channels(
executor, ts, raw_store, missing_channels, fft_params.samp_freq, fft_params.single_freq
executor, ts, raw_store, missing_channels, fft_params.sampling_rate, fft_params.single_freq
)
# only the channels we are using

@@ -475,14 +475,14 @@ def _read_channels(
ts: DateTimeRange,
store: RawDataStore,
channels: List[Channel],
samp_freq: int,
sampling_rate: int,
single_freq: bool = True,
) -> List[Tuple[Channel, ChannelData]]:
ch_data_refs = [executor.submit(_safe_read_data, store, ts, ch) for ch in channels]
ch_data = get_results(ch_data_refs, "Read channel data")
tuples = list(filter(lambda tup: tup[1].data.size > 0, zip(channels, ch_data)))

return _filter_channel_data(tuples, samp_freq, single_freq)
return _filter_channel_data(tuples, sampling_rate, single_freq)


def _safe_read_data(store: RawDataStore, ts: DateTimeRange, ch: Channel) -> ChannelData:
@@ -494,24 +494,24 @@ def _safe_read_data(store: RawDataStore, ts: DateTimeRange, ch: Channel) -> Chan


def _filter_channel_data(
tuples: List[Tuple[Channel, ChannelData]], samp_freq: int, single_freq: bool = True
tuples: List[Tuple[Channel, ChannelData]], sampling_rate: int, single_freq: bool = True
) -> List[Tuple[Channel, ChannelData]]:
frequencies = set(t[1].sampling_rate for t in tuples)
frequencies = list(filter(lambda f: f >= samp_freq, frequencies))
frequencies = list(filter(lambda f: f >= sampling_rate, frequencies))
if len(frequencies) == 0:
logging.warning(f"No data available with sampling frequency >= {samp_freq}")
logging.warning(f"No data available with sampling rate >= {sampling_rate}")
return []
if single_freq:
closest_freq = min(
frequencies,
key=lambda f: max(f - samp_freq, 0),
key=lambda f: max(f - sampling_rate, 0),
)
logger.info(f"Picked {closest_freq} as the closest sampling frequence to {samp_freq}. ")
logger.info(f"Picked {closest_freq} as the closest sampling rate to {sampling_rate}. ")
filtered_tuples = list(filter(lambda tup: tup[1].sampling_rate == closest_freq, tuples))
logger.info(f"Filtered to {len(filtered_tuples)}/{len(tuples)} channels with sampling rate == {closest_freq}")
else:
filtered_tuples = list(filter(lambda tup: tup[1].sampling_rate >= samp_freq, tuples))
logger.info(f"Filtered to {len(filtered_tuples)}/{len(tuples)} channels with sampling rate >= {samp_freq}")
filtered_tuples = list(filter(lambda tup: tup[1].sampling_rate >= sampling_rate, tuples))
logger.info(f"Filtered to {len(filtered_tuples)}/{len(tuples)} channels with sampling rate >= {sampling_rate}")

return filtered_tuples
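The selection in `_filter_channel_data` keeps only rates at or above the target, then picks the closest one: `max(f - sampling_rate, 0)` grows with distance above the target, so `min` over it returns the smallest qualifying rate. A standalone sketch of that logic (the function name and values here are illustrative, not the NoisePy API):

```python
from typing import Optional, Sequence

def pick_closest_rate(available: Sequence[float], target: float) -> Optional[float]:
    """Pick the smallest available sampling rate that is >= target."""
    candidates = [f for f in available if f >= target]
    if not candidates:
        return None  # nothing can be decimated down to `target`
    return min(candidates, key=lambda f: max(f - target, 0))

# e.g. channels recorded at 10, 40, and 100 Hz, target 20 Hz
print(pick_closest_rate([10, 40, 100], 20))  # 40
```

Rates below the target are excluded up front because downsampling to ``sampling_rate`` needs data at least that fast; with ``single_freq=False`` the code instead keeps every qualifying rate.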

@@ -523,7 +523,7 @@ def check_memory(params: ConfigParameters, nsta: int) -> int:
# crude estimation on memory needs (assume float32)
nsec_chunk = params.inc_hours / 24 * 86400
nseg_chunk = int(np.floor((nsec_chunk - params.cc_len) / params.step))
npts_chunk = int(nseg_chunk * params.cc_len * params.samp_freq)
npts_chunk = int(nseg_chunk * params.cc_len * params.sampling_rate)
memory_size = nsta * npts_chunk * 4 / 1024**3
if memory_size > MAX_MEM:
raise ValueError(
17 changes: 8 additions & 9 deletions src/noisepy/seis/fdsn_download.py
@@ -80,8 +80,6 @@ def download(direc: str, prepro_para: ConfigParameters) -> None:

# client/data center. see https://docs.obspy.org/packages/obspy.clients.fdsn.html for a list
client = Client(prepro_para.client_url_key)
chan_list = prepro_para.channels
sta_list = prepro_para.stations
executor = ThreadPoolExecutor()

tlog = TimeLogger(logger, logging.INFO)
@@ -101,11 +99,12 @@
f"""Download
From: {starttime}
To: {endtime}
Stations: {sta_list}
Channels: {chan_list}
Networks: {prepro_para.networks}
Stations: {prepro_para.stations}
Channels: {prepro_para.channels}
"""
)
ncomp = len(chan_list)
ncomp = len(prepro_para.channels)

# prepare station info (existing station list vs. fetching from client)
if prepro_para.down_list:
@@ -119,9 +118,9 @@ def download(direc: str, prepro_para: ConfigParameters) -> None:
# calculate the total number of channels to download
# loop through specified network, station and channel lists
bulk_req = []
for inet in prepro_para.net_list:
for ista in sta_list:
for ichan in chan_list:
for inet in prepro_para.networks:
for ista in prepro_para.stations:
for ichan in prepro_para.channels:
bulk_req.append((inet, ista, "*", ichan, starttime, endtime))
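The triple loop above expands the network, station, and channel lists into one FDSN bulk-request tuple per combination, with `"*"` as the location wildcard. A minimal sketch of the same expansion, using hypothetical lists (obspy's `Client.get_stations_bulk` accepts tuples of this shape):

```python
from datetime import datetime

# hypothetical request lists, mirroring the CI workflow in this commit
networks = ["CI"]
stations = ["ARV", "BAK"]
channels = ["BHE", "BHN", "BHZ"]
starttime = datetime(2023, 1, 1)
endtime = datetime(2023, 1, 3)

# (network, station, location, channel, starttime, endtime) per combination
bulk_req = [
    (net, sta, "*", cha, starttime, endtime)
    for net in networks
    for sta in stations
    for cha in channels
]
print(len(bulk_req))  # 1 network x 2 stations x 3 channels = 6
```

One bulk call replaces len(bulk_req) individual metadata requests, which matters when station lists are long.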

# gather station info
@@ -163,7 +162,7 @@ def download(direc: str, prepro_para: ConfigParameters) -> None:
# rough estimation on memory needs (assume float32 dtype)
nsec_chunk = prepro_para.inc_hours / 24 * 86400
nseg_chunk = int(np.floor((nsec_chunk - prepro_para.cc_len) / prepro_para.step)) + 1
npts_chunk = int(nseg_chunk * prepro_para.cc_len * prepro_para.samp_freq)
npts_chunk = int(nseg_chunk * prepro_para.cc_len * prepro_para.sampling_rate)
memory_size = nsta * npts_chunk * 4 / 1024**3
if memory_size > MAX_MEM:
raise ValueError(
2 changes: 1 addition & 1 deletion src/noisepy/seis/main.py
@@ -151,7 +151,7 @@ def count(pat):
store = SCEDCS3DataStore(
raw_dir,
catalog,
channel_filter(params.net_list, params.stations, params.channels),
channel_filter(params.networks, params.stations, params.channels),
DateTimeRange(params.start_date, params.end_date),
params.storage_options,
)