It seems that scikit-learn v1.2.0, released in December 2022, breaks the formasaurus init command. See the following output:
Training form type detector on 1423 example(s)...
#9 4.760 Traceback (most recent call last):
#9 4.760 File "/usr/local/bin/formasaurus", line 33, in <module>
#9 4.761 sys.exit(load_entry_point('formasaurus==0.9.0', 'console_scripts', 'formasaurus')())
#9 4.761 File "/usr/local/lib/python3.9/site-packages/formasaurus-0.9.0-py3.9.egg/formasaurus/__main__.py", line 72, in main
#9 4.761 formasaurus.FormFieldClassifier.load()
#9 4.761 File "/usr/local/lib/python3.9/site-packages/formasaurus-0.9.0-py3.9.egg/formasaurus/classifiers.py", line 101, in load
#9 4.761 ex = cls.trained_on(DEFAULT_DATA_PATH)
#9 4.761 File "/usr/local/lib/python3.9/site-packages/formasaurus-0.9.0-py3.9.egg/formasaurus/classifiers.py", line 119, in trained_on
#9 4.761 ex.train(annotations)
#9 4.761 File "/usr/local/lib/python3.9/site-packages/formasaurus-0.9.0-py3.9.egg/formasaurus/classifiers.py", line 131, in train
#9 4.761 self.form_classifier.train(annotations)
#9 4.761 File "/usr/local/lib/python3.9/site-packages/formasaurus-0.9.0-py3.9.egg/formasaurus/classifiers.py", line 266, in train
#9 4.761 self.model = formtype_model.train(
#9 4.761 File "/usr/local/lib/python3.9/site-packages/formasaurus-0.9.0-py3.9.egg/formasaurus/formtype_model.py", line 128, in train
#9 4.762 return model.fit(X, y)
#9 4.762 File "/usr/local/lib/python3.9/site-packages/sklearn/pipeline.py", line 402, in fit
#9 4.762 Xt = self._fit(X, y, **fit_params_steps)
#9 4.762 File "/usr/local/lib/python3.9/site-packages/sklearn/pipeline.py", line 360, in _fit
#9 4.762 X, fitted_transformer = fit_transform_one_cached(
#9 4.762 File "/usr/local/lib/python3.9/site-packages/joblib/memory.py", line 349, in __call__
#9 4.762 return self.func(*args, **kwargs)
#9 4.762 File "/usr/local/lib/python3.9/site-packages/sklearn/pipeline.py", line 894, in _fit_transform_one
#9 4.762 res = transformer.fit_transform(X, y, **fit_params)
#9 4.762 File "/usr/local/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
#9 4.763 data_to_wrap = f(self, X, *args, **kwargs)
#9 4.763 File "/usr/local/lib/python3.9/site-packages/sklearn/pipeline.py", line 1193, in fit_transform
#9 4.763 results = self._parallel_func(X, y, fit_params, _fit_transform_one)
#9 4.763 File "/usr/local/lib/python3.9/site-packages/sklearn/pipeline.py", line 1215, in _parallel_func
#9 4.763 return Parallel(n_jobs=self.n_jobs)(
#9 4.763 File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 1088, in __call__
#9 4.764 while self.dispatch_one_batch(iterator):
#9 4.764 File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
#9 4.764 self._dispatch(tasks)
#9 4.764 File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 819, in _dispatch
#9 4.764 job = self._backend.apply_async(batch, callback=cb)
#9 4.764 File "/usr/local/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
#9 4.764 result = ImmediateResult(func)
#9 4.764 File "/usr/local/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 597, in __init__
#9 4.764 self.results = batch()
#9 4.764 File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
#9 4.765 return [func(*args, **kwargs)
#9 4.765 File "/usr/local/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
#9 4.765 return [func(*args, **kwargs)
#9 4.765 File "/usr/local/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
#9 4.765 return self.function(*args, **kwargs)
#9 4.765 File "/usr/local/lib/python3.9/site-packages/sklearn/pipeline.py", line 894, in _fit_transform_one
#9 4.765 res = transformer.fit_transform(X, y, **fit_params)
#9 4.765 File "/usr/local/lib/python3.9/site-packages/sklearn/pipeline.py", line 446, in fit_transform
#9 4.766 return last_step.fit_transform(Xt, y, **fit_params_last_step)
#9 4.766 File "/usr/local/lib/python3.9/site-packages/sklearn/feature_extraction/text.py", line 2121, in fit_transform
#9 4.766 X = super().fit_transform(raw_documents)
#9 4.766 File "/usr/local/lib/python3.9/site-packages/sklearn/feature_extraction/text.py", line 1358, in fit_transform
#9 4.768 self._validate_params()
#9 4.768 File "/usr/local/lib/python3.9/site-packages/sklearn/base.py", line 570, in _validate_params
#9 4.768 validate_parameter_constraints(
#9 4.768 File "/usr/local/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 97, in validate_parameter_constraints
#9 4.768 raise InvalidParameterError(
#9 4.768 sklearn.utils._param_validation.InvalidParameterError: The 'stop_words' parameter of TfidfVectorizer must be a str among {'english'}, an instance of 'list' or None. Got {'and', 'of', 'or'} instead.
This command works fine with the previous scikit-learn release, v1.1.3.
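Judging by the error message, scikit-learn 1.2 validates `stop_words` strictly and rejects the set `{'and', 'of', 'or'}`, accepting only `'english'`, a list, or `None`. A possible workaround is to convert the set to a list before it reaches `TfidfVectorizer`. The sketch below is only an illustration of that idea: `normalize_stop_words` is a hypothetical helper, not part of Formasaurus or scikit-learn, and the sample documents are made up.

```python
def normalize_stop_words(stop_words):
    """Return stop_words in a form that scikit-learn >= 1.2 accepts.

    Hypothetical helper for illustration; not part of Formasaurus.
    """
    # None and a string such as 'english' are already valid values.
    if stop_words is None or isinstance(stop_words, str):
        return stop_words
    # Sets, frozensets, and tuples become a sorted list (sorted for determinism).
    return sorted(stop_words)


stop_words = normalize_stop_words({"and", "of", "or"})
print(stop_words)  # ['and', 'of', 'or']

# scikit-learn is optional here so the helper can be tried on its own.
try:
    from sklearn.feature_extraction.text import TfidfVectorizer
except ImportError:
    TfidfVectorizer = None

if TfidfVectorizer is not None:
    # Made-up sample documents, only to show that fitting no longer raises.
    docs = ["login form", "search of site", "sign in or register"]
    vectorizer = TfidfVectorizer(stop_words=stop_words)
    X = vectorizer.fit_transform(docs)
    print(X.shape)
```

Pinning scikit-learn below 1.2 (for example to v1.1.3, which is reported to work above) would be another way to sidestep the validation until the stop-word set is changed to a list.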
This should be fixed in https://github.com/scrapinghub/Formasaurus (released as 0.9.0). Unfortunately, we lost access to this repo, so development has moved to another location.