All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- `Multiplexer` object now supports pickling.
- Function `get_google_auth` was removed; now using utilities from `cloudly`.
- API changes to the methods `copy_file` and `copy_dir`: the first parameter used to be the "target", now it is the "source". Now, both methods write to `self`. This is consistent with the `write_*` methods of files.
- Clarified how `download_*` and `upload_*` implementations of a blob store relate to the generic `copy_dir` and `copy_file` methods. Now, calling `copy_dir` and `copy_file` will delegate to tailored `download_*` and `upload_*` methods when available.
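
The argument order and delegation described above can be sketched with toy in-memory classes. Everything here is an assumption made for illustration and not the library's actual code: the names `ToyPath`, `ToyLocalPath`, and `ToyBlobPath`, the returned route labels, and the signatures of `upload_file` and `download_file`.

```python
from __future__ import annotations


class ToyPath:
    """Hypothetical in-memory path; not the library's real class."""

    _store: dict[str, bytes] = {}  # fake storage shared by all toy paths

    def __init__(self, name: str):
        self.name = name

    def read_bytes(self) -> bytes:
        return self._store[self.name]

    def write_bytes(self, data: bytes) -> None:
        self._store[self.name] = data

    def copy_file(self, source: ToyPath) -> str:
        """Copy `source` into `self`; return a label naming the route taken."""
        # The first parameter is the source; the data lands in `self`.
        if isinstance(self, ToyBlobPath) and isinstance(source, ToyLocalPath):
            return self.upload_file(source)  # tailored upload
        if isinstance(self, ToyLocalPath) and isinstance(source, ToyBlobPath):
            return source.download_file(self)  # tailored download
        # Generic fallback: read everything, then write it to `self`.
        self.write_bytes(source.read_bytes())
        return "generic"


class ToyLocalPath(ToyPath):
    pass


class ToyBlobPath(ToyPath):
    def upload_file(self, source: ToyLocalPath) -> str:
        self.write_bytes(source.read_bytes())
        return "upload_file"

    def download_file(self, target: ToyLocalPath) -> str:
        target.write_bytes(self.read_bytes())
        return "download_file"
```

In this sketch, `blob.copy_file(local)` takes the upload route and `local.copy_file(blob)` takes the download route, while a local-to-local copy falls back to the generic read-then-write path.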
- Finetuning about GCS retry.
- `GcsBlobUpath.write_meta` adds retry on `ServiceUnavailable`.
- Finetune `Multiplexer`: removing a data element from internal data list before yielding it.
- `GcsBlobUpath` gets new methods `read_meta` and `write_meta`.
- Revise the design of `GcsBlobUpath.lock` so that the user does not need to lock a "helper file"; instead, the user locks the file they want to read/write, and this method internally locks a helper file.
- `Upath.lock` now yields `self` in the context manager.
- Removed `upathlib.gcs` in favor of `upathlib._gcs`.
- Some finetuning to GCS internals.
- Refinements to `upathlib.serializer`:
  - Removed `TextSerializer`.
  - The base is the new protocol `Serializer`.
  - Besides `serialize` and `deserialize`, it gets two new classmethods: `dump` and `load`.
  - Added `orjson` serializers.
- Removed methods `Upath.write_pickle_z`, `Upath.read_pickle_z`, `Upath.write_pickle_lz4`, `Upath.read_pickle_lz4`.
- Bumped Python version to 3.10, although there's no firm feature requirement that I know of at this time.
- Lower log level from 'info' to 'debug' for one message.
- Bug fix in GCS related to retry and locking.
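
For readers unfamiliar with protocol-based serializers, the `Serializer` refinement noted above (a protocol with `serialize`, `deserialize`, `dump`, and `load` classmethods) could be sketched as follows. The method signatures, the file-object arguments of `dump` and `load`, and the class `ToyJsonSerializer` are assumptions for illustration; the real definitions in `upathlib.serializer` may differ.

```python
import io
import json
from typing import Any, BinaryIO, Protocol


class Serializer(Protocol):
    """Assumed shape of the protocol; signatures are illustrative."""

    @classmethod
    def serialize(cls, x: Any) -> bytes: ...

    @classmethod
    def deserialize(cls, y: bytes) -> Any: ...

    @classmethod
    def dump(cls, x: Any, file: BinaryIO) -> None: ...

    @classmethod
    def load(cls, file: BinaryIO) -> Any: ...


class ToyJsonSerializer:
    """Satisfies the protocol structurally, without inheriting from it."""

    @classmethod
    def serialize(cls, x: Any) -> bytes:
        return json.dumps(x).encode("utf-8")

    @classmethod
    def deserialize(cls, y: bytes) -> Any:
        return json.loads(y.decode("utf-8"))

    @classmethod
    def dump(cls, x: Any, file: BinaryIO) -> None:
        # Write the serialized bytes to an open binary file object.
        file.write(cls.serialize(x))

    @classmethod
    def load(cls, file: BinaryIO) -> Any:
        # Read all bytes from the file object and deserialize them.
        return cls.deserialize(file.read())
```

Because `Serializer` is a `typing.Protocol`, any class with matching classmethods qualifies; no common base class is required.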
- Changed `upathlib.azure` to `upathlib._azure`.
- Changed `upathlib.gcs` to `upathlib._gcs`; added `upathlib._gcs` for back-compat transition.
- Made `get_google_auth` importable from `upathlib.__init__`.
- Changed `upathlib._multiplexer` to `upathlib.multiplexer`; removed `Multiplexer` from `upathlib.__init__`.
- Improvements to error messages.
- New class `Multiplexer`.
- Finetune in GCS ratelimit handling in `write_bytes`.
- Fixes and finetuning on GCS "retry" logic.
- Removed optional dependency `lz4`. User can install it if they need to use it.
- Fix: the parameter `timeout` for `GcsBlobUpath.lock` should be applied to the `release_retry` in addition to `acquire_retry`.
- Make `ZstdPickleSerializer` thread safe.
- Remove functions `z_compress`, `z_decompress`, `zstd_compress`, `zstd_decompress`, `lz4_compress`, `lz4_decompress`.
- `zstandard` became a required dependency.
- Enhancements to `ZstdPickleSerializer`.
- Finetune "delay time" in the retry logic of GCS locking.
- Add retries on GCS write rate-limiting error.
- Finetune retries in GCS locking.
- Add certain retries on Google authentication.
- Adjust retry delays in GCS locking.
- `google_api_core.Retry.timeout` workaround
- Remove dependency on `mpservice`; the new module `_util.py` is copied from `mpservice`.
- Bug in GCS locking related to retry condition.
- Upgrade `mpservice`.
- `LocalUpath.lock` no longer deletes the file after lock release.
- Fine-tuned "retry" logic in GCS.
- Fine-tuned `LocalUpath.lock`.
- Support parameter `concurrent: bool` in folder operations (and GCS file download) where possible.
- Methods that use thread pool get parameter `concurrent` with default `True`.
- `orjson` related methods.
- all uses of `overrides`
- optional dependency `lz4`
- `Lz4PickleSerializer`
- functions `z_compress`, `z_decompress`, `zstd_compress`, `zstd_decompress`, `lz4_compress`, `lz4_decompress`
- methods `write_pickle_lz4`, `read_pickle_lz4`
- `zstandard` becomes an "optional", rather than mandatory, dependency.
- Fine-tune multipart download of large blobs from Google Cloud Storage.
Upgraded Python to 3.10 for development and testing. Fixed an error in parameter type annotation related to `overrides` that was revealed in this migration. There were no changes to any functionality.
- Deprecated orjson serializers and read/write methods.
- `LocalUpath.write_bytes` now accepts file-like data as input.
- Suppress progress printouts in `rmrf`.
- Bug fix and clarification on `lock`: upon exiting the context manager, the lock file must be deleted.
- Fine-tune `GcsBlobUpath.lock`.
- Run doctest in `test_docs.py`.
- Fix dependency `filelock` version and hack it to use `time.perf_counter` instead of `time.monotonic`.
- Use `mpservice` for global thread pools to make them safe with forked processes.
- `GcsBlobUpath` finetune and bug fix related to exceptions, retry, lock.
- `LocalUpath.lock` finetune.
- Parameters `project_id` and `credentials` to `GcsBlobUpath.__init__`.
- Classmethods `register_read_write_byte_format`, `register_read_write_text_format` of `Upath`.
- Parameter `thread_pool_executors` to `Upath.__init__`.
- Methods `rename_dir` and `rename_file` in `Upath` (both remain in `LocalUpath`).
- Classes `ZJsonSerializer`, `ZstdJsonSerializer`.
- Back-compat module `upathlib.gcp`.
- Methods `export_dir`, `export_file`, `import_dir`, `import_file`. (Concentrate on the `copy_*` methods.)
- Method `with_path` renamed to `_with_path` and has become an intermediate implementation helper based on the new property `root`.
- Properties `GcsBlobUpath.{client, bucket}` have become private methods `_client`, `_bucket`.
- Method `GcsBlobUpath.blob` has become private `_blob`.
- `GcsBlobUpath.get_blob` is removed.
- `LocalUpath.localpath`. (Use `LocalUpath.path` instead.)
- `LocalUpath.path` overrides the super version to return `pathlib.Path`.
- The tests module `upathlib.tests` was renamed `_tests`.
- Simplified comparison and ordering special methods.
- Enhancements to documentation, including Sphinx documentation generation and hosting on readthedocs.
- New method `as_uri`. Comparison, ordering, and hash special methods are changed to use the output of `as_uri`.
- `LocalUpath` now implements the `os.PathLike` protocol.
- Methods `read_text`, `write_text`, `read_json`, `write_json` get parameters `encoding` and `errors`.
- New property `root`.
- Initial support for Windows by `LocalUpath`. `test_local.py` passed on Windows, but there could be corner cases that will fail on Windows.
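
Basing comparison, ordering, and hashing on the output of `as_uri`, as noted above, can be illustrated with the minimal sketch below. The class `ToyPath` and its `scheme`/`path` attributes are hypothetical; only the idea of deriving the special methods from one canonical URI string comes from the entry above.

```python
from functools import total_ordering


@total_ordering
class ToyPath:
    """Hypothetical path whose identity is its URI string."""

    def __init__(self, scheme: str, path: str):
        self.scheme = scheme
        self.path = path

    def as_uri(self) -> str:
        return f"{self.scheme}://{self.path}"

    def __eq__(self, other):
        if not isinstance(other, ToyPath):
            return NotImplemented
        return self.as_uri() == other.as_uri()

    def __lt__(self, other):
        if not isinstance(other, ToyPath):
            return NotImplemented
        return self.as_uri() < other.as_uri()

    def __hash__(self) -> int:
        # Must agree with __eq__, so it hashes the same URI string.
        return hash(self.as_uri())
```

With a single canonical string behind all three, two objects that compare equal always hash equal, and sorting a mixed collection is well defined.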
- New helper function `resolve_path`.
- Many directory operations dropped the `desc` parameter but gained the `quiet` parameter, defaulting to `False`. But, `rmdir` and `rmrf` default to `quiet=True`.
- `GcpBlobUpath` is also exposed in `__init__.py`.
- Parameters `project_id` and `credentials` of `GcpBlobUpath` are deprecated. This info is moved to a classmethod. Similar changes to `AzureBlobUpath`.
- Renamed `upathlib.gcp` to `upathlib.gcs`, and `GcpBlobUpath` to `GcsBlobUpath`.
- Handle "empty folders" in GCP.
- Use Google's standard way (via `google.auth.default`) to get GCP credentials if needed.
- Allow disabling progress bar in most cases by setting `desc` to `False`.
- Thread-pool management; `_run_in_executor` became an instance method (as opposed to classmethod).
- Improvement to robustness in large directory upload to GCP.
- Improved progress report when downloading/uploading a directory.
- Fine-tuned methods `import_file` and `export_file`.
- Bug fix in `GcpBlobUpath.with_path`. The bug caused scalability (when operating on upwards of 56000 files) and speed issues in `upload_dir`, because every blob would create its own `Client`.
- Removed `LockAcquisitionTimeoutError`. Added `LockAcquireError`, `LockReleaseError`.
- `GcpBlobUpath.lock` reduces default wait time to improve responsiveness.
- Make GCP info to `GcpBlobUpath` optional.
- Remove home-made retrying utility `Backoff`. Use `opnieuw`.
- Bug fix related to concurrent downloading involving large files.
- GCP download of large blobs uses threading concurrency.
- API change: `write_bytes` and `write_text` return `None`.
- Improvements to handling of concurrency.
- Increased default concurrency level from 4 to 16.
- Simplified retry logic. For example, GCP's `download_to_file` has its own handling of retry; now we rely on `download_to_file` to finish the task and do not retry on it.
- Simplified parameters and behavior around 'overwrite'.
- Improvements to serializers:
  - Serializers allow extra arguments in their `serialize` and `deserialize` methods.
  - Added dependency `zstandard` to provide compression.
  - New serializers `ZJsonSerializer`, `ZstdJsonSerializer`, `ZstdPickleSerializer`, `ZstdOrjsonSerializer`.
- Use `overrides` to enforce sanity in class inheritance.
- Add tests on Azure and GCP using mocks.
- Add more retry logic to GCP.
- Refactor and simplify test/build process.
- GcpUpath: refresh cache upon timeout error.
- Relax version requirements on dependencies.
- Minor fine-tuning.
- Bug fix.
- Bug fix in GcpBlobUpath
- Removed async methods, since the current implementations are simply wrappers around thread-pool executions.
- Refactor the `__init__` method.
- Efficiency improvements for GCP.
- Capture user-specified exceptions and retry in operations on multiple blobs.
- Bug fixes and improvements to GcpBlobUpath.
- Bug fix related to GCP.
- Implement blob locking for GCP.
- Improvements to 'lock' methods.
- Skip thread pool when `concurrency <= 1`.
- Support extra dependencies via `options.extras_require` in `setup.cfg`.
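
Skipping the thread pool when `concurrency <= 1` amounts to a simple branch, sketched below. The helper name `run_tasks` and its signature are hypothetical and not part of the library; the sketch only shows running tasks in the calling thread when no real concurrency is requested.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(func, items, concurrency: int = 16):
    """Apply `func` to each item, using threads only when it pays off."""
    items = list(items)
    if concurrency <= 1:
        # No pool: run sequentially in the calling thread.
        return [func(x) for x in items]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # `map` returns results in the same order as the inputs.
        return list(pool.map(func, items))
```

Avoiding the pool in the sequential case removes thread start-up overhead and keeps tracebacks simpler when debugging.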
- Reworked `AzureBlobUpath.lock` and `AzureBlobUpath.a_lock`.
- `remove_file` loses argument `missing_ok`. Return 0 if file is not found.
- `GcpBlobUpath` copy, download, upload, rename.
- `BlobUpath` gets more methods for download and upload, which are thin wrappers around export and import methods.
- Another round of API fine tuning. There are some improvements to naming, consistency, and simplicity.
- `AzureBlobUpath` has custom blob copying, uploading, downloading.
- Improvements to tests.
- Remove `AzureBlobUpath` and `GcpBlobUpath` from package `__init__.py`, making their dependencies optional in a future release.
- More "native" implementation of async methods.
- API refinements.
- Bug fix.
- Add implementations for Azure and GCP blob stores.
- Add JSON and pickle convenience methods to API.
- Add preliminary `lock` API.
- API iterations.
- `LocalUpath` implementation
First draft of API and LocalUpath.