Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future for Importers with V3 #103

Open
Servinjesus1 opened this issue Jun 28, 2024 · 14 comments
Open

Future for Importers with V3 #103

Servinjesus1 opened this issue Jun 28, 2024 · 14 comments

Comments

@Servinjesus1
Copy link

Martin has released "v3" of beancount, which is just the refactored pythonic bits, as the default version. With this, things like bean-extract and bean-identify are finally sunset, replaced with beangulp. What are your thoughts on adapting your importers to this updated framework?

For starters, beancount.ingest can probably become beangulp, although I can't find the regression pytest. Then, shell scripts have to become python scripts, I guess. Probably other work to ensure your importer frameworks remain compatible with the modern beangulp importer.

As someone who was just starting to get the hang of your scripts and was writing my first importer when this change happened, it seems overwhelming to have to adapt to a new framework, so I'd greatly appreciate some expert advice on how to use the new framework :)

@redstreet
Copy link
Owner

redstreet commented Jun 28, 2024

Hello there,
Beangulp is at its core exactly the same as bean-identify/extract/etc., and some 90-98% of the code remains unchanged. It should be pretty easy to convert beancount_reds_importers to use beangulp. I started to do this in 2022, but the beangulp maintainer was not willing to release a PyPI then, so I couldn't. I'll certainly do it (or perhaps someone else will send a PR), but am unlikely have the time to get to it for at least a few weeks, perhaps longer.

Until then, I suggest using Beancount v2. There's no significant difference between that and v3 for most users, and upgrading to v3+beancount_reds_importers when it uses beangulp would be trivial.

Does that help?

@patbakdev
Copy link
Contributor

I have been using Red's importers with beangulp since 2022 without any problem. I moved over when he tried to move over. I have used the following approach.

This code was pulled from beangulp I think:

class Adapter(beangulp.Importer):
    @property
    def name(self):
        return self.importer.name()

    def identify(self, filepath):
        return self.importer.identify(beangulp.cache.get_file(filepath))

    def account(self, filepath):
        return self.importer.file_account(beangulp.cache.get_file(filepath))

    def file_account(self, file):
        return self.importer.file_account(beangulp.cache.get_file(file))

    def date(self, filepath):
        return self.importer.file_date(beangulp.cache.get_file(filepath))

    def filename(self, filepath):
        filename = self.importer.file_name(beangulp.cache.get_file(filepath))
        # The current implementation of the archive command does not
        # modify the filename returned by the importer. This preserves
        # backward compatibility with the old implementation of the
        # command.
        return misc_utils.idify(filename) if filename else None

    def extract(self, filepath, existing_entries):
        p = inspect.signature(self.importer.extract).parameters
        if len(p) > 1:
            return self.importer.extract(beangulp.cache.get_file(filepath), existing_entries)
        return self.importer.extract(beangulp.cache.get_file(filepath))

and then I create wrappers like this:

""" My BECU Importer Wrapper """
from beancount_reds_importers.importers import becu
import Adapter


class Importer(Adapter):
    def __init__(self, config):
        self.importer = _Importer(config)


class _Importer(becu.Importer):
    def custom_init(self):
        super().custom_init()

I call it from my own tool like this:

from beangulp import Ingest

...

@click.group("import")
@click.version_option()
@click.pass_context
def importer(ctx):
    """
    Import Data (Beangulp)
    """
    loader = importlib.machinery.SourceFileLoader("config", os.path.join(os.getcwd(), "Import.cfg"))
    spec = importlib.util.spec_from_loader("config", loader)
    config = importlib.util.module_from_spec(spec)
    loader.exec_module(config)

    hooks = [process_entries]

    ctx.obj = Ingest(config.importers, hooks)

It will be nice to pull all that out eventually. If you are in a hurry you could do the same ... or just wait.

@redstreet
Copy link
Owner

Oh very cool, thank you for sharing, @patbakdev!

I haven't looked through it all, but if you want to turn it into a PR at some point, that'd be most welcome.

@patbakdev
Copy link
Contributor

Not sure what you want to do here. This code was just about making v2 importers work with beangulp by adapting the interface until you were ready to move to beangulp for those that don't want to wait. But I suspect you will want to move the importers to use the beangulp interface instead. I guess the question is whether or not you want to have some backwards compatibility with V2. In which case we might be able to invert this code to work the other way around. But that seems like a design decision. Personally I would just freeze V2 in a branch and move to V3 instead of having bankv2 and bankv3 importers. Maybe there is a smart way to do this so nobody has to update anything (at one point I think I read that beangulp was backwards compatible).

@redstreet
Copy link
Owner

Sorry I wasn't being clear. You're right: the plan is to freeze v2 and move on to v3. However, it's going to be several weeks or longer before I have any time to do it. Until then, I was thinking of temoprarily providing an adapter such as yours for folks wanting to adopt Beancount v3 right away.

On further thought, providing an adapter "officially" may not be worth the effort it takes to test it, provide examples, and such. Your code is already in this thread, and that should be helpful for anyone wanting it. Thoughts welcome if I'm missing anything. Thank you for engaging in this conversation!

@patbakdev
Copy link
Contributor

Ok. Yes, we are on the same page then. I'm also subscribed to the thread in case anyone runs into trouble adapting the code to their needs in the mean time.

@jodoox
Copy link

jodoox commented Jul 7, 2024

Not sure how to implement the adapter workaround for a custom importer relying on the csv reader: from beancount_reds_importers.libreader import csvreader fails (as expected for v3) with ModuleNotFoundError: No module named 'beancount.ingest'. I don't see how this could work without patching the csvreader file?

EDIT: all good, got it- I thought beangulp was v3 only

@patbakdev
Copy link
Contributor

Glad you figured it out. All of my banking/investment importers are still using OFX so I don't have any experience with CSV. Although I am not sure why that would matter. I did attempt to switch from OFX to CSV for Fidelity once upon a time by creating an importer and if I look through that (dead) code it looks pretty much the same.

from beancount_reds_importers.libreader import csvreader
from beancount_reds_importers.libtransactionbuilder import investments
from MY_CODE import Adapter

class Importer(Adapter):
    def __init__(self, config):
        self.importer = _Importer(config)

class _Importer(csvreader.Importer, investments.Importer):
    IMPORTER_NAME = 'Fidelity Brokerage CSV'
   ...

So I would think one would just need to do the same for an existing importer.

class _Importer(fidelity_csv.Importer, investments.Importer):
    def custom_init(self):
        super().custom_init()

Anyways, hope that helps.

@blalor
Copy link

blalor commented Oct 20, 2024

I took a quick look at converting this project to beangulp and immediately tripped on the removal of the regression testing plugin at beancount/beangulp@d4d9212

@redstreet have you played with this at all? If not, @dnicolodi do you have any advice on converting importers that did use that framework?

@redstreet
Copy link
Owner

@blalor I haven't had a chance to look yet. Thanks for the pointer. That commit message says there's a simpler testing framework now. Is that the one you're looking for advice to convert to? If so, I haven't taken a look at it yet.

@blalor
Copy link

blalor commented Oct 20, 2024 via email

@dnicolodi
Copy link

dnicolodi commented Oct 20, 2024

I'm not sure I understand the question. There was a piece of code that tried to define some helpers to use pytest to test importers. The code was ugly, hard to use, and made obsolete by new pytest features. Thus it has been removed.

Those who want to keep using pytest for testing their importers, they can continue to do so: they just need to hook up their tests with pytest in the way they prefer. Which functionality of the code that has been removed do you miss?

We found that most importers are easier to test with something much simpler than pytest and that is the simple test subcommand that the importers framework implements. This simply compares the output of the importer with an expected output and provides some not too ugly report. This has the advantage that it does not require to write any code. Somewhere I have some code that allows unittest to discover tests provided by this simple mechanism. I think something similar can be written for pytest but I haven't looked into it.

@blalor
Copy link

blalor commented Oct 23, 2024

I'm not sure I understand the question. There was a piece of code that tried to define some helpers to use pytest to test importers. The code was ugly, hard to use, and made obsolete by new pytest features. Thus it has been removed.

Which new pytest features are you referring to?

We found that most importers are easier to test with something much simpler than pytest and that is the simple test subcommand that the importers framework implements. This simply compares the output of the importer with an expected output and provides some not too ugly report. This has the advantage that it does not require to write any code. Somewhere I have some code that allows unittest to discover tests provided by this simple mechanism. I think something similar can be written for pytest but I haven't looked into it.

This is an example of the old regression test fixture:

from beancount.ingest import regression_pytest as regtest
from beancount_reds_importers.importers import dcu
@regtest.with_importer(
dcu.Importer(
{
"main_account": "Assets:Banks:DCU:Checking",
"currency": "USD",
}
)
)
@regtest.with_testdir(path.dirname(__file__))
class TestDCUCSV(regtest.ImporterTestBase):
pass

Is there an example of using the new framework with pytest? If not, do you have any pointers?

@Servinjesus1
Copy link
Author

One change to note is that beangulp requests we use the beangulp.importer ABC (abstract base class) rather than the beangulp.importer.ImportProtocol currently referenced by these importers (well, technically beancount.ingest.importer.ImportProtocol). The big difference is the files are no longer cached so file is now just a string and doesn't need file.name done to it, e.g., to get the file name as a string.

Caching is then left as a manual choice for users to add to their importers as they see fit. I have yet to need this myself (e.g. convert a PDF statement) so I don't know the details but it looks like you have a decorator from beangulp.cache or I just replaced file -> cache.get_file(file) everywhere and that effectively reinstitutes caching.

Also, you need to add an account method to your importer that just returns the account:

def account(self):
    return self.config["main_account"]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants