[Performance & Accuracy Idea] Abstract parsing method #9

Open
CodyCBakerPhD opened this issue Jul 14, 2024 · 2 comments

Comments

@CodyCBakerPhD
Member

There are several parsing methods that I might test or adjust over time; it would be nice to make the method selectable, whether for accuracy or for speed/memory performance.

@CodyCBakerPhD CodyCBakerPhD self-assigned this Jul 14, 2024
@CodyCBakerPhD
Member Author

Overall, the idea leans towards a unified API for the S3 log parsing process itself, so that alternative implementations are easier to drop in or swap.
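To make the "drop in / swap" idea concrete, here is a minimal sketch of one way it could look. Everything in it is an assumption for illustration: `BaseLogParser`, the two toy parsers, the `PARSERS` registry, and `get_parser` are hypothetical names, not existing code in this project, and neither toy parser is a complete S3 log parser (for example, neither handles the bracketed timestamp field).

```python
from __future__ import annotations

import shlex
from abc import ABC, abstractmethod


class BaseLogParser(ABC):
    """Hypothetical common interface that every parsing method would share."""

    @abstractmethod
    def parse_line(self, line: str) -> list[str] | None:
        """Return the fields of one log line, or None if the line cannot be parsed."""


class WhitespaceSplitParser(BaseLogParser):
    """Speed-oriented toy: splits on whitespace, so quoted fields get broken apart."""

    def parse_line(self, line: str) -> list[str] | None:
        fields = line.split()
        return fields or None


class QuoteAwareParser(BaseLogParser):
    """Accuracy-oriented toy: keeps double-quoted fields together via shlex."""

    def parse_line(self, line: str) -> list[str] | None:
        try:
            fields = shlex.split(line)
        except ValueError:  # raised by shlex on unbalanced quotes
            return None
        return fields or None


# Hypothetical registry mapping a user-facing option to a parser class.
PARSERS: dict[str, type[BaseLogParser]] = {
    "fast": WhitespaceSplitParser,
    "accurate": QuoteAwareParser,
}


def get_parser(method: str = "fast") -> BaseLogParser:
    """Instantiate the parser registered under `method`."""
    try:
        return PARSERS[method]()
    except KeyError:
        raise ValueError(f"Unknown method {method!r}; choose from {sorted(PARSERS)}.") from None
```

Calling code would then only depend on `get_parser(...).parse_line(...)`, so adding a new method means adding one class and one registry entry.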

@CodyCBakerPhD
Member Author

The base class might follow this rough strategy:

from __future__ import annotations

from pydantic import DirectoryPath, FilePath  # assuming these are pydantic's path types

# `FullLog` and `ReducedLog` are placeholder types that are not defined yet


class S3LogParser:
    def __init__(
        self,
        parsed_folder_path: DirectoryPath,
        s3_log_file_path: FilePath | None = None,
        s3_log_folder_path: DirectoryPath | None = None,
    ):
        # assert XOR on path options
        # if file, then parse single file according to rules of this class
        # if folder, then iterate directory structure according to rules of this class
        pass

    def _parse_line(self, line: str) -> FullLog | None:
        # Parse a single line of a single log file
        pass

    def _parse_lines(self, lines) -> list[FullLog]:
        # Read in and parse all lines (in buffered style) from a single log file
        pass

    def _reduce_elements(
        self,
        # Though actually a constrained literal over all possible 20+ fields
        elements: tuple[str, ...] = ("timestamps", "asset_id", "remote_ip", "bytes_sent"),
    ) -> list[ReducedLog]:  # Though what constitutes a 'reduced log' type might change from class to class then...
        # Probably via __init__, control which subfields of an S3 log we wish to reduce our parsed output to contain
        pass

    def _iterate_directory(self, s3_log_folder_path: DirectoryPath):
        # The rules for iterating directories; might need some inference on whether it's a base/year/month level
        # Natsort did not work out of the box on the base
        pass
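
A few of the commented steps above can be sketched as small standalone helpers. This is only an illustration under stated assumptions: `resolve_input_path`, `reduce_fields`, and `numeric_aware_key` are hypothetical names, a parsed log is represented as a plain `dict` rather than a real `FullLog` type, and the sort key assumes directory names such as year or month are plain integers, which may not match the actual bucket layout.

```python
from __future__ import annotations

from pathlib import Path


def resolve_input_path(file_path: str | Path | None = None, folder_path: str | Path | None = None) -> Path:
    """XOR check: exactly one of the two path options must be given."""
    if (file_path is None) == (folder_path is None):
        raise ValueError("Specify exactly one of `s3_log_file_path` or `s3_log_folder_path`.")
    return Path(file_path if file_path is not None else folder_path)


def reduce_fields(full_log: dict[str, str], fields: tuple[str, ...]) -> dict[str, str]:
    """Keep only the requested subfields of one parsed log (dict stands in for `FullLog`)."""
    return {field: full_log[field] for field in fields}


def numeric_aware_key(path: Path) -> list[tuple[int, int, str]]:
    """Sort key that orders all-digit path parts by integer value, so '2' precedes '10'."""
    return [
        (0, int(part), "") if part.isdecimal() else (1, 0, part)
        for part in path.parts
    ]
```

For example, `sorted(paths, key=numeric_aware_key)` places `2020/2/...` before `2020/10/...`, whereas a plain string sort would not. Whether that is enough for the base level of the directory structure would still need to be checked against the real layout.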

@CodyCBakerPhD CodyCBakerPhD removed their assignment Aug 16, 2024