Read netcdf files from web into xarray #3

Open
fostermh opened this issue Sep 27, 2023 · 2 comments

@fostermh

This appears to work:

import xarray as xr
from bs4 import BeautifulSoup
import requests

url = 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/'
ext = 'nc'

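def listFD(url, ext=''):
    # Fetch the directory index page and collect the links that end with `ext`.
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')
    # href=True skips anchors without an href; calling .endswith on None would raise AttributeError.
    return [url + node['href'] for node in soup.find_all('a', href=True) if node['href'].endswith(ext)]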



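files = listFD(url, ext)
# '#mode=bytes' tells the netCDF-C library to read each file over HTTP byte ranges
files = ["%s#mode=bytes" % x for x in files]

len(files)
files_subset = files[1:10]  # up to nine files, skipping the first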
ds = xr.open_mfdataset(files_subset, compat='override', coords='all')
ds

This results in:

<xarray.Dataset>
Dimensions:           (bottle: 24)
Coordinates:
  * bottle            (bottle) float64 1.0 2.0 3.0 4.0 ... 21.0 22.0 23.0 24.0
Data variables: (12/33)
    filename          (bottle) object '1601010.btl' '1601010.btl' ... nan nan
    file_header_text  (bottle) object '* Sea-Bird SBE 9 Data File:\n* FileNam...
    instrument_model  (bottle) object '9' '9' '9' '9' '9' ... nan nan nan nan
    instrument_type   (bottle) object 'CTD-bottle' 'CTD-bottle' ... nan nan
    start_latitude    (bottle) float64 68.5 68.5 68.5 68.5 ... nan nan nan nan
    start_longitude   (bottle) float64 -58.52 -58.52 -58.52 ... nan nan nan
    ...                ...
    svCM              (bottle) float64 dask.array<chunksize=(24,), meta=np.ndarray>
    svDM              (bottle) float64 dask.array<chunksize=(24,), meta=np.ndarray>
    svWM              (bottle) float64 dask.array<chunksize=(24,), meta=np.ndarray>
    wetCDOM           (bottle) float64 dask.array<chunksize=(24,), meta=np.ndarray>
    Upoly1            (bottle) float64 dask.array<chunksize=(24,), meta=np.ndarray>
    cpar              (bottle) float64 dask.array<chunksize=(24,), meta=np.ndarray>
Attributes: (12/16)
    history:               2021-11-30T14:50:13.901854 Read by seabird Python ...
    DATE_CREATION:         20211130145013
    LATITUDE:              58.55866666666667
    LONGITUDE:             -52.838166666666666
    date_created:          2016-06-07T18:13:14
    date_modified:         2021-11-30T22:50:13.901854
    ...                    ...
    md5:                   753b4bf26928c9c670f5a1b79dfd1ae5
    original_header:       * Sea-Bird SBE 9 Data File:\n* FileName = E:\CTD-R...
    original_header_json:  {\n  "instrument_header": {\n    "FileName": "E:\\...
    sbe_model:             9
    seasave:               V 7.23.2
    start_time:            Jun 07 2016 18:07:01 [NMEA time, header]
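
One caveat with `listFD`: it builds each URL as `url + href`, which only works when the index page links to bare relative file names. Below is a minimal sketch of a variant that resolves every link with `urllib.parse.urljoin`. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages. The names `LinkCollector` and `list_links` are made up for illustration, and the sample page is invented, not taken from the server.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper, not from the thread)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Anchors without an href (or with an empty one) are ignored.
                if name == "href" and value:
                    self.hrefs.append(value)


def list_links(base_url, html, ext=""):
    """Return absolute URLs for links in `html` whose target ends with `ext`."""
    parser = LinkCollector()
    parser.feed(html)
    # urljoin handles relative names, absolute paths, and full URLs alike.
    return [urljoin(base_url, h) for h in parser.hrefs if h.endswith(ext)]


# Small inline page standing in for a directory index (made-up content).
sample = (
    '<a href="../">Parent</a>'
    '<a href="a.btl.nc">a</a>'
    '<a name="x">no href</a>'
    '<a href="/dev/b.nc">b</a>'
)
links = list_links("https://example.org/dev/files/", sample, ext="nc")
print(links)
```

With `requests`, the call would look like `list_links(url, requests.get(url).text, ext)`.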
@mpiannucci
Contributor

mpiannucci commented Sep 27, 2023

You can also get the files using fsspec:

import fsspec

fs_read = fsspec.filesystem('http')
files = fs_read.glob('https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/*.nc')
files = [f'{x}#mode=bytes' for x in files]
files

Result:

['https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601001.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601002.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601003.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601004.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601005.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601006.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601007.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601008.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601009.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601010.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601011.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601012.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601013.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601014.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601016.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601017.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601018.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601019.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601021.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601024.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601025.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601026.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601027.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601028.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601029.btl.nc#mode=bytes',
...
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601199.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601200.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601201.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601202.btl.nc#mode=bytes',
 'https://pac-dev2.cioos.org/dev/Amundsen_Bottle_Files/1601203.btl.nc#mode=bytes']
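
Either listing can be passed to `xr.open_mfdataset`. The sketch below wraps the two steps (tagging the URLs, then opening them). `with_bytes_mode` and `open_remote` are hypothetical names, and the `compat`/`coords` settings are simply the ones used in the first comment. `xarray` is imported inside the function so the URL helper can be used on its own.

```python
def with_bytes_mode(url):
    """Append '#mode=bytes' unless the URL already carries a fragment."""
    return url if "#" in url else f"{url}#mode=bytes"


def open_remote(urls, **kwargs):
    """Open remote netCDF files lazily as one dataset (needs xarray, dask, netCDF4)."""
    import xarray as xr  # imported here so with_bytes_mode has no dependencies

    options = {"compat": "override", "coords": "all"}  # settings from the first comment
    options.update(kwargs)
    return xr.open_mfdataset([with_bytes_mode(u) for u in urls], **options)


# The helper is idempotent, so already-tagged URLs pass through unchanged.
tagged = [
    with_bytes_mode(u)
    for u in ["https://example.org/a.nc", "https://example.org/b.nc#mode=bytes"]
]
print(tagged)
```

Assuming the server is reachable, `open_remote(files[:10])` would then open the first ten files from either listing.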

@fostermh
Author

good to know, thanks!
