DEPRECATED! implement validator for company existion #2601

Open
OlegPhenomenon wants to merge 25 commits into master from business-registry-check-for-company-existing

OlegPhenomenon (Contributor) commented Jul 10, 2023

bundle exec rake company_status:check_all -- --open_data_file_path=lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv --missing_companies_output_path=lib/tasks/data/missing_companies_in_business_registry.csv --deleted_companies_output_path=lib/tasks/data/deleted_companies_from_business_registry.csv --download_path=https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip --soft_delete_enable=false

This rake task performs the following actions:

  • downloads the archive from download_path
  • unzips it
  • checks every company in our registry against the downloaded data to see whether it exists in the business registry
  • queries the business registry directly for any company that is not present in the data
  • writes deleted companies to the file at deleted_companies_output_path, and companies for which the registry has no information to the file at missing_companies_output_path
  • sets the company status and validation date on the Contact model
  • optionally performs a soft deletion, controlled by a flag (needed for the first run)
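A minimal sketch of the check-and-classify steps above, in plain Ruby without Rails. It assumes the open data CSV is ';'-separated with an ariregistri_kood (registry code) column, and it replaces the real business registry query with a hypothetical lookup callable that returns a status code or nil. None of these helper names come from the PR.

```ruby
require 'csv'
require 'set'

# Assumption: ';' separator and an 'ariregistri_kood' column in the open data file.
def registry_codes_from_open_data(csv_text, column: 'ariregistri_kood', col_sep: ';')
  codes = Set.new
  CSV.parse(csv_text, headers: true, col_sep: col_sep) do |row|
    code = row[column]
    codes << code.strip if code
  end
  codes
end

# Companies absent from the open data are looked up one by one. `lookup` is a
# hypothetical stand-in for the business registry query: it returns a status
# code such as 'K' (deleted), or nil when the registry has no information.
def find_deleted_and_missing(our_idents, open_data_codes, lookup)
  deleted = []
  missing = []
  our_idents.reject { |ident| open_data_codes.include?(ident) }.each do |ident|
    status = lookup.call(ident)
    if status.nil?
      missing << ident
    elsif status == 'K'
      deleted << ident
    end
  end
  { deleted: deleted, missing: missing }
end

# Serialises idents as a one-column CSV, as could be written to the output paths.
def idents_to_csv(idents)
  CSV.generate do |csv|
    csv << ['ident']
    idents.each { |ident| csv << [ident] }
  end
end
```

Companies that the lookup reports with any other status (for example R) end up in neither list.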

The task accepts the following options:

  • open_data_file_path - path where the downloaded data is saved and read from. Default value lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv
  • missing_companies_output_path - path of the file listing companies not found in the business registry. Default value lib/tasks/data/missing_companies_in_business_registry.csv
  • deleted_companies_output_path - path of the file listing companies that have been removed from the business registry. Default value deleted_companies_from_business_registry.csv
  • download_path - URL the data is downloaded from. Default value https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip
  • soft_delete - whether to run soft deletion for companies that have been removed, gone bankrupt, or are missing from the business registry. Default value false
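One way such options could be parsed, sketched with Ruby's standard OptionParser. The defaults are copied from the list above and the flag names follow the command-line example (so soft_delete_enable rather than soft_delete). How the actual rake task reads its arguments is not shown in this PR description, so parse_company_status_options is a hypothetical helper.

```ruby
require 'optparse'

# Defaults copied from the option list; the real task's parsing may differ.
DEFAULT_OPTIONS = {
  open_data_file_path: 'lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv',
  missing_companies_output_path: 'lib/tasks/data/missing_companies_in_business_registry.csv',
  deleted_companies_output_path: 'deleted_companies_from_business_registry.csv',
  download_path: 'https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip',
  soft_delete_enable: false
}.freeze

PATH_OPTIONS = %i[open_data_file_path missing_companies_output_path
                  deleted_companies_output_path download_path].freeze

# Hypothetical helper: parses only the arguments after `--`, as in
# `rake company_status:check_all -- --soft_delete_enable=true`.
def parse_company_status_options(argv)
  options = DEFAULT_OPTIONS.dup
  parser = OptionParser.new do |opts|
    PATH_OPTIONS.each do |key|
      opts.on("--#{key}=VALUE", String) { |value| options[key] = value }
    end
    # Any value other than "true" leaves soft deletion disabled.
    opts.on('--soft_delete_enable=BOOL', String) do |value|
      options[:soft_delete_enable] = (value == 'true')
    end
  end
  args = argv.drop_while { |arg| arg != '--' }.drop(1)
  parser.parse!(args)
  options
end
```

Called with no arguments after --, the helper simply returns the defaults, which matches running the task without parameters.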

Because every option has a default value, no parameters are required; the options exist only for flexibility. You can simply run:
bundle exec rake company_status:check_all

The output files will then be available in the lib/tasks/data directory.

The job (parameters shown with their default values):
CompanyRegisterStatusJob.perform_later(days_interval = 14, spam_time_delay = 0.2, batch_size = 100, download_open_data_file_url = 'https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip')

This job accepts the following parameters:

  1. days_interval - selects companies that were last checked more than {days_interval} days ago.
  2. spam_time_delay - the delay between consecutive queries to the business registry.
  3. batch_size - the number of records processed per batch, for performance.
  4. download_open_data_file_url - the URL from which the business registry data is downloaded.

All of these parameters have default values (shown above), so pass arguments only when you need to override them.
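A sketch of how days_interval, spam_time_delay and batch_size could interact, using a hypothetical in-memory CheckedCompany struct instead of the Contact model and a query callable instead of the real registry client. The attribute names (country_code, company_status, checked_at) are assumptions. Batching is shown with each_slice over an array; a database-backed job would more likely batch the query itself, for example with ActiveRecord's find_in_batches.

```ruby
require 'date'

# Hypothetical stand-in for the Contact model; attribute names are assumptions.
CheckedCompany = Struct.new(:ident, :country_code, :company_status, :checked_at, keyword_init: true)

# Always re-checked: L (in liquidation), N (bankrupt), K (deleted).
RECHECK_STATUSES = %w[L N K].freeze

# Estonian companies that were never validated, were last checked more than
# days_interval days ago, or have a liquidation/bankruptcy/deleted status.
def companies_due_for_check(companies, days_interval:, today: Date.today)
  cutoff = today - days_interval
  companies.select do |c|
    c.country_code == 'EE' &&
      (c.checked_at.nil? || c.checked_at < cutoff || RECHECK_STATUSES.include?(c.company_status))
  end
end

# Queries the registry in batches of batch_size, pausing spam_time_delay
# seconds after each query. `sleeper` is injectable so tests need not wait.
def check_in_batches(companies, batch_size:, spam_time_delay:, query:, sleeper: ->(s) { sleep(s) })
  results = {}
  companies.each_slice(batch_size) do |batch|
    batch.each do |company|
      results[company.ident] = query.call(company.ident)
      sleeper.call(spam_time_delay)
    end
  end
  results
end
```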

What the job does:

  • Selects Estonian companies that were last checked more than N days ago (N is days_interval), companies that are in liquidation, bankrupt, or removed from the registry, and companies with no record of ever having been validated (NULL value).
  • Queries the business registry for each of them to determine its status.
  • If the status is K (deleted) or N (bankrupt), or the registry has no information, sets ForceDelete on the related domains if it is not already set, or SoftDelete if the entry type (kandeliik) is "Kustutamiskanne dokumentide hoidjata" (deletion entry without a keeper of documents).
  • If the previous status was R (registered) and the business registry still shows R, only updates the check date.
  • If a domain has ForceDelete because the company's status was K/N, but the business registry now shows R, cancels the ForceDelete.
  • When ForceDelete is set because the company is bankrupt, removed from the registry, or absent from it, adds "Company no: {ident_number}" to the domain's status_notes.
  • If the company status is L (in liquidation), sends the company an email.
  • Skips whitelisted organizations. The whitelist is defined in the application.yml file with this structure:
whitelist_companies:
  - '12345678'
  - '87654321'
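The decision rules above can be summarised as one function. This is an illustrative sketch, not the job's code: the action symbols are made up, the precedence of SoftDelete over ForceDelete is an assumption (the description does not say which check comes first), and load_whitelist only handles YAML shaped like the example above.

```ruby
require 'yaml'

# Status codes: R = registered, L = in liquidation, N = bankrupt, K = deleted,
# nil = no information. Action names are illustrative only.
SOFT_DELETE_ENTRY_TYPE = 'Kustutamiskanne dokumentide hoidjata'

def decide_action(ident:, registry_status:, entry_type:, force_delete_by_status:, whitelist:)
  return :skip if whitelist.include?(ident)

  case registry_status
  when 'K', 'N', nil
    # Assumed precedence: the SoftDelete entry type is checked first.
    return :soft_delete if entry_type == SOFT_DELETE_ENTRY_TYPE
    force_delete_by_status ? :keep_force_delete : :set_force_delete
  when 'R'
    force_delete_by_status ? :cancel_force_delete : :update_check_date
  when 'L'
    :send_liquidation_email
  else
    :no_action # other statuses are not covered by the description
  end
end

# Note added to a domain's status_notes when ForceDelete is set for this reason.
def force_delete_note(ident)
  "Company no: #{ident}"
end

# Reads whitelist_companies from YAML text; unquoted numbers become strings.
def load_whitelist(yaml_text)
  Array((YAML.safe_load(yaml_text) || {})['whitelist_companies']).map(&:to_s)
end
```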

POTENTIAL PROBLEM: if a large set of companies is checked on a single day, all of them become due again on the same later day (for example, exactly one year later if the next check is a year away), so the job will have to process that large list in one run. This should be kept in mind.
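The clustering can be shown with simple date arithmetic: every company checked on the same day becomes due again on the same day. The second helper is a hypothetical mitigation that is not part of this PR; it adds a deterministic per-company offset derived from Zlib.crc32 of the ident, so a large batch becomes due over several days instead of one.

```ruby
require 'date'
require 'zlib'

# Without spreading, companies checked on the same day are all due together.
def next_check_date(last_checked, days_interval)
  last_checked + days_interval
end

# Hypothetical mitigation (not in this PR): add 0...spread_days extra days,
# derived from the ident so the same company always gets the same offset.
def staggered_next_check_date(last_checked, days_interval, ident, spread_days)
  last_checked + days_interval + (Zlib.crc32(ident) % spread_days)
end
```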

This PR is related to internetee/company_register#6.

Related tickets: internetee/company_register#4, internetee/company_register#5


viezly bot commented Jul 10, 2023

This pull request is split into 5 parts for easier review.

Changed files are located in these folders:

  • /
  • app/interactions/actions
  • app/jobs
  • app/mailers
  • app/models
  • app/views/mailers
  • db
  • lib/gem_monkey_patches
  • test

OlegPhenomenon force-pushed the business-registry-check-for-company-existing branch 6 times, most recently from b393eda to 6dd5635, July 11, 2023 13:38
OlegPhenomenon requested a review from maricavor July 11, 2023 13:40
OlegPhenomenon force-pushed the business-registry-check-for-company-existing branch from 02671f1 to fb4c53c, July 12, 2023 12:18
OlegPhenomenon force-pushed the business-registry-check-for-company-existing branch from 6d73a34 to e4fa329, May 6, 2024 12:02
OlegPhenomenon force-pushed the business-registry-check-for-company-existing branch from 8e871d9 to 4aafe29, September 10, 2024 09:21
OlegPhenomenon force-pushed the business-registry-check-for-company-existing branch from 905017d to 814b561, September 13, 2024 07:07
vohmar (Contributor) commented Sep 18, 2024

The output list of invalid org contacts currently includes all Estonian org-type objects regardless of their role. But since we set ForceDelete only on domains where such an object is the registrant, we need to generate a sub-list or add a role indicator to the output, so that only the entries relevant to ForceDelete can be selected.
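A sketch of the suggested role indicator, assuming the output is a flat list of rows with ident, domain and role columns. The PR's actual output format is not shown here, so InvalidContactRow and its fields are hypothetical. Once the role column exists, the ForceDelete-relevant subset is a simple filter.

```ruby
require 'csv'

# Hypothetical output row; the real output columns may differ.
InvalidContactRow = Struct.new(:ident, :domain, :role, keyword_init: true)

# Only rows where the invalid org contact is the registrant matter for ForceDelete.
def registrant_rows(rows)
  rows.select { |row| row.role == 'registrant' }
end

# Writes every row with its role, so the output can be filtered later.
def to_csv_with_role(rows)
  CSV.generate do |csv|
    csv << %w[ident domain role]
    rows.each { |row| csv << [row.ident, row.domain, row.role] }
  end
end
```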

vohmar assigned OlegPhenomenon and unassigned vohmar Sep 18, 2024

vohmar (Contributor) commented Oct 4, 2024

The latest test again produced multiple instances of the same entity. More importantly, each entity was matched with only one domain, so if a company had 3 domains, ForceDelete was set on only one of them.
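One common way this symptom can arise, offered as a guess rather than a diagnosis of the PR's code: looking up a company's domain with find stops at the first match, while grouping domains by registrant returns all of them. Calling uniq on the idents first also ensures each entity is processed only once. DomainRecord and both helpers are hypothetical.

```ruby
# Hypothetical domain record; field names are illustrative.
DomainRecord = Struct.new(:name, :registrant_ident, keyword_init: true)

# Problematic pattern: `find` returns only the first matching domain.
def first_domain_for(ident, domains)
  domains.find { |d| d.registrant_ident == ident }
end

# De-duplicates the idents, then collects every domain per company.
def domains_to_force_delete(invalid_idents, domains)
  by_registrant = domains.group_by(&:registrant_ident)
  invalid_idents.uniq.to_h { |ident| [ident, by_registrant.fetch(ident, []).map(&:name)] }
end
```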

OlegPhenomenon changed the title from "implement validator for company existion" to "DEPRECATED! implement validator for company existion" Oct 7, 2024
OlegPhenomenon added the "on hold" and "Not for merging for testing and development only" labels Oct 7, 2024