Add PySpark Imputation Methods #77

dombean · 2024-04-06T10:08:50Z

Add Imputation Methods to rdsa_utils/methods/imputation:

Rollback Method
Growth Rate (Forward) Method
Growth Rate (Backward) Method

To keep the rdsa-utils package well-organised and user-friendly, particularly the methods/imputation directory, we're adopting a clear file naming convention.

Each file name starts with a prefix identifying the library it targets (pyspark_ or pandas_), followed by the method name: rollback, growth_rate_forward, or growth_rate_backward.

This lets us add library-specific methods without requiring a counterpart in the other library, which keeps development flexible and makes it easy for users to find the functionality relevant to them. Logic shared across both libraries goes in a common_utils.py file.

rdsa-utils/
│
├── methods/
│   ├── imputation/
│   │   ├── pyspark_rollback.py
│   │   ├── pyspark_growth_rate_forward.py
│   │   ├── pyspark_growth_rate_backward.py
│   │   ├── pandas_rollback.py             # Optional, only if implemented
│   │   ├── pandas_growth_rate_forward.py  # Optional, only if implemented
│   │   ├── pandas_growth_rate_backward.py # Optional, only if implemented
│   │   └── common_utils.py                # For shared logic, if any
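
To make the convention concrete, here is a minimal sketch of how a file name could be built from, or split back into, its library prefix and method name. The helper names (module_filename, parse_module_filename, ModuleName) are hypothetical and only for illustration; they are not part of rdsa-utils.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Prefixes and shared module name taken from the convention described above.
LIBRARY_PREFIXES = ("pyspark", "pandas")
SHARED_MODULE = "common_utils"


class ModuleName(NamedTuple):
    """Hypothetical container for the two parts of a module file name."""

    library: Optional[str]  # None for the shared common_utils module
    method: str


def module_filename(library: str, method: str) -> str:
    """Build a file name such as 'pyspark_rollback.py'."""
    if library not in LIBRARY_PREFIXES:
        raise ValueError(f"Unknown library prefix: {library!r}")
    return f"{library}_{method}.py"


def parse_module_filename(filename: str) -> ModuleName:
    """Split a file name into its library prefix and method name."""
    stem = PurePosixPath(filename).stem
    if stem == SHARED_MODULE:
        return ModuleName(None, stem)
    for prefix in LIBRARY_PREFIXES:
        # Require a non-empty method name after the 'prefix_' part.
        if stem.startswith(prefix + "_") and len(stem) > len(prefix) + 1:
            return ModuleName(prefix, stem[len(prefix) + 1:])
    raise ValueError(f"File name does not follow the convention: {filename!r}")


if __name__ == "__main__":
    print(module_filename("pyspark", "rollback"))
    print(parse_module_filename("pandas_growth_rate_forward.py"))
    print(parse_module_filename("common_utils.py"))
```

A check like this could be used in a test to flag files in methods/imputation that do not follow the prefix rule, though whether that is worth enforcing is open for discussion.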

dombean added the enhancement (New feature or request) label on Apr 6, 2024