Replies: 1 comment
Yes, your understanding is correct, @EasyLOB: the match phase of Zingg finds all records that are similar to each other. What you are trying to achieve can be done by running the link phase, which takes the first source as the master and compares records from the second source against it. Training is the same for both phases. Just define both datasets, then run findTrainingData and label until you have a representative sample of the matching you want to do.
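To make the difference between the two modes concrete, here is a toy sketch in plain Python. It is not Zingg's API or its matching algorithm: the function names (`match_all`, `link_to_master`), the `difflib` string similarity, and the 0.8 threshold are all assumptions chosen for illustration only. Zingg learns its similarity model from the labelled pairs instead of using a fixed threshold.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (stand-in for a learned model)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_all(records, threshold=0.8):
    """'Match'-style comparison: every record against every other record."""
    return [
        (a, b)
        for a, b in combinations(records, 2)
        if similarity(a, b) >= threshold
    ]


def link_to_master(master, incoming, threshold=0.8):
    """'Link'-style comparison: each incoming record against the master only.

    Returns a dict mapping each incoming record to its best master match,
    or None when nothing in the master is similar enough.
    """
    result = {}
    for item in incoming:
        if not master:
            result[item] = None
            continue
        best = max(master, key=lambda m: similarity(item, m))
        result[item] = best if similarity(item, best) >= threshold else None
    return result


# Hypothetical data: the master plays the role of the single source of truth.
master = ["Acme Corporation", "Globex Inc", "Initech LLC"]
incoming = ["ACME Corporation", "Umbrella Group"]
links = link_to_master(master, incoming)
unmatched = [name for name, hit in links.items() if hit is None]
```

In this sketch, anything that ends up in `unmatched` corresponds to the "no match found" case from the question, which would then be reviewed and added to the master table by hand.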
Hi,
I have already read some articles and the documentation, and watched a video about Zingg.
Refer to the text below, from the documentation:
Companies use entity resolution to connect disparate data sources to clean data, see non-obvious relationships across multiple data silos, and get a unified view of data. In the process, they may build a master data management system (MDM) by combining the data from the various sources and thus develop a single source of truth.
As far as I understand, Zingg always compares one or many data sources (all x all) looking for duplicates. Is this understanding correct?
What I need to do is compare some (string) inputs against a (big) table to find matches.
This (big) table is already a single source of truth.
Eventually, if something new comes in through the input and no match is found, I will add it manually to the (big) table.
Does Zingg work like this?
If so, how do I train Zingg when I already have a single source of truth?
Thanks