feat: alluxio-py support alluxio and oss filesystem #76

Open
wants to merge 3 commits into
base: main
Conversation

liiuzq-xiaobai

The delegated filesystem implementation for Alluxio and OSS is complete. Specific notes:
1. Users must specify in the configuration, via the alluxio_enable flag, whether the delegated filesystem should be accelerated by Alluxio. If it is set to true, the configuration file must also include the initialization settings required for the Alluxio filesystem.
2. The configuration file can include multiple OSS filesystems as delegated filesystems, provided each bucket_name is unique. A delegated filesystem is identified by the combination of its filesystem name and its bucket_name.
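
The uniqueness rule in note 2 could be checked at load time roughly as follows. This is only a sketch under assumptions: the list-of-dicts layout, the `fs_name` key, and the helper name `index_delegated_filesystems` are hypothetical and are not taken from the PR; only `bucket_name` and `alluxio_enable` appear in the description.

```python
from typing import Dict, List, Tuple


def index_delegated_filesystems(entries: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Index config entries by (filesystem name, bucket_name).

    Hypothetical helper: the entry layout is assumed for illustration.
    """
    indexed: Dict[Tuple[str, str], dict] = {}
    for entry in entries:
        key = (entry["fs_name"], entry["bucket_name"])
        if key in indexed:
            # The same name/bucket pair configured twice is ambiguous.
            raise ValueError(f"duplicate delegated filesystem: {key}")
        indexed[key] = entry
    return indexed
```

With this shape, two OSS entries with different buckets coexist, while a repeated name and bucket pair is rejected before any filesystem is created.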

local_close = os.close
local_mkdir = os.mkdir
local_remove = os.remove
local_rmdir = os.rmdir

Wouldn't it be better to separate the interface from the implementation?

Constants.S3_FILESYSTEM_TYPE: self._validate_s3_config
}

def _load_config(self):

A hot update interface needs to be added, for example for the case where the user has no configuration file and wants to set the config through the API.

return self.config_data.keys()

@staticmethod
def _validate_oss_config(config):

Should the checks for each ufs be split into separate files? Many more ufs types will certainly be added in the future.
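
One way to make per-ufs checks easy to split across files is a small registry that each module adds its validator to. The sketch below is an assumption for illustration only: `VALIDATORS`, `register_validator`, the `"oss"` type string, and the required-key list are hypothetical names, not the PR's code.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a filesystem type name to its validator.
VALIDATORS: Dict[str, Callable[[dict], None]] = {}


def register_validator(fs_type: str):
    """Decorator that registers a validator for one filesystem type."""
    def decorator(func: Callable[[dict], None]) -> Callable[[dict], None]:
        VALIDATORS[fs_type] = func
        return func
    return decorator


# In a real layout this function could live in its own module, e.g. one file per ufs.
@register_validator("oss")
def validate_oss_config(config: dict) -> None:
    # The required keys are assumed here; the real list may differ.
    missing = [key for key in ("bucket_name",) if key not in config]
    if missing:
        raise ValueError(f"missing OSS config keys: {missing}")


def validate(fs_type: str, config: dict) -> None:
    """Dispatch to the registered validator, rejecting unknown types."""
    try:
        validator = VALIDATORS[fs_type]
    except KeyError:
        raise ValueError(f"unsupported filesystem type: {fs_type}") from None
    validator(config)
```

Adding a new ufs then means adding one module with one decorated function, without touching the dispatch code.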



@staticmethod
def _validate_s3_config(config):

The S3 interface also needs to be implemented.

@@ -0,0 +1,14 @@
import os
from alluxio.posix import fileimpl
config_manager = fileimpl.ConfigManager("../../config/ufs_config.yaml")

The config path must not be hard-coded. It needs to be dynamically configurable, for example through environment variables or API injection.
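
A minimal sketch of what that could look like: an explicit argument takes precedence, then an environment variable. The variable name `ALLUXIO_PY_UFS_CONFIG` and the function `resolve_config_path` are hypothetical, chosen only for this example.

```python
import os
from typing import Optional

# Hypothetical environment variable name, chosen for this sketch.
CONFIG_ENV_VAR = "ALLUXIO_PY_UFS_CONFIG"


def resolve_config_path(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the config path from the API argument, else from the environment.

    Returns None when neither is set, so the caller can decide whether a
    config file is required at all.
    """
    if explicit_path:
        return explicit_path
    return os.environ.get(CONFIG_ENV_VAR)
```

The test script could then pass `resolve_config_path()` to the config manager instead of a relative path literal.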



def delegatefs_open_write():
    write_file_path = f'oss://alhz-ossp-alluxio-test/alluxio-py/delegatefs-io-1.txt'

Please move this file to tests.

…mits.Date:2024-09-20

def get_config_fs_list(self) -> list:
    return self.config_data.keys()

def update_config(self, fs_type, key, value):

Pass the key/value pairs in through **kwargs.
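
For illustration, the suggested signature might look like the sketch below. The class here is a minimal stand-in, not the PR's `ConfigManager`; the merge behaviour and the empty-update check are assumptions.

```python
class ConfigManager:
    """Minimal stand-in; only update_config is sketched."""

    def __init__(self):
        self.config_data = {}

    def update_config(self, fs_type, **kwargs):
        """Merge any number of key/value pairs into the config for fs_type."""
        if not kwargs:
            raise ValueError("no configuration values provided")
        # Create the per-type section on first use, then merge the new values.
        self.config_data.setdefault(fs_type, {}).update(kwargs)
```

A caller can then set several options in one call, for example `manager.update_config("oss", bucket_name="demo", alluxio_enable=True)`, where the option values are placeholders.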


def _load_config(self):
    if not os.path.exists(self.config_file_path):
        raise FileNotFoundError(f"{self.config_file_path} does not exist.")

The configuration should not depend entirely on the configuration file; it can also be set directly through hot update, with no configuration file at all. If the required configuration is missing at the time of use, an exception can be thrown then.
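
A rough sketch of that behaviour, assuming the file path becomes optional and the missing-config error moves to lookup time. The `get_config` method and the optional constructor argument are hypothetical; file parsing is deliberately left out.

```python
import os


class ConfigManager:
    """Stand-in showing an optional config file and a lookup-time error."""

    def __init__(self, config_file_path=None):
        self.config_file_path = config_file_path
        self.config_data = {}
        # Only touch the filesystem when a path was actually supplied.
        if config_file_path is not None:
            self._load_config()

    def _load_config(self):
        if not os.path.exists(self.config_file_path):
            raise FileNotFoundError(f"{self.config_file_path} does not exist.")
        # Parsing the file into self.config_data is omitted in this sketch.

    def get_config(self, fs_type):
        """Return the config for fs_type, failing only when it is needed."""
        try:
            return self.config_data[fs_type]
        except KeyError:
            raise KeyError(f"no configuration for filesystem type: {fs_type}") from None
```

Constructing `ConfigManager()` with no arguments then succeeds, and the error surfaces only when a filesystem without configuration is requested.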



def open(file: str, mode: str = "r", **kw):
    logging.info("DelegateFileSystem opening file: %s", file)

Change info to debug.

fs = instance.get_file_system(file)
if fs:
    try:
        return fs.open(file, mode, **kw)

Through fsspec reflection, parameters can be passed directly to the ufs client. Is there no need to validate the config parameters?

fs = instance.get_file_system(path)
if fs:
    try:
        logging.info("DelegateFileSystem getStatus filemeta: %s", path)

Same issue with the log level here.

return local_rename(src, dest, **kw)


class DelegateFileSystem:

Move this class into its own file.

self.__init__file__system()
DelegateFileSystem.instance = self

def __create__file__system(self, fs_name: str):

Isn't the fs name alone insufficient to guarantee uniqueness? For example, there may be multiple buckets under the same ufs.
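
One possible way to address this is to derive the lookup key from both the URL scheme and the bucket. The sketch below uses the standard library's `urlsplit`; the function name `filesystem_key` is hypothetical, and treating the URL authority as the bucket name is an assumption about how paths such as the one in the test script are laid out.

```python
from typing import Tuple
from urllib.parse import urlsplit


def filesystem_key(path: str) -> Tuple[str, str]:
    """Return (scheme, bucket) for a URL-style path such as oss://bucket/key.

    Assumes the authority part of the URL is the bucket name.
    """
    parts = urlsplit(path)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not a bucket URL: {path}")
    return parts.scheme, parts.netloc
```

Filesystem instances cached under this key stay distinct when two buckets share the same ufs type.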

…mits.Date:2024-09-29

2 participants