Need help setting up #870
-
The TL;DR of my problem: I'd like to run a fuzzy-match search on a public dataset of companies registered in the UK. I'm following this AWS Glue article, but some information is missing, so I have a few questions to get started (sorry if they are basic):

1- Since my data schema doesn't match any of the Zingg training data, I should train a new model, right?

Thanks so much for your attention :)
-
1- Yes, you need to train a new model. Even if your schema matched an existing model, that model has not learned the domain of your data.

Note: instead of building the training data yourself, it's recommended to run the findTrainingData and label phases and then train on the training data Zingg creates. That way Zingg learns the domain and distribution of your data more precisely.
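The recommended order (findTrainingData, then label, then train) can be sketched as a small helper that builds one command per phase. This is a sketch under assumptions: it assumes Zingg is launched through a `zingg.sh` script that accepts `--phase` and `--conf`, and the script path and `config.json` name are placeholders, not taken from the reply. It only builds and prints the commands; it does not run them.

```python
import shlex

# Phase order from the note above: let Zingg propose pairs, label them, then train.
PHASES = ["findTrainingData", "label", "train"]


def build_commands(conf_path, zingg_script="./scripts/zingg.sh"):
    """Return one argv list per phase (assumes the zingg.sh --phase/--conf interface)."""
    return [[zingg_script, "--phase", phase, "--conf", conf_path] for phase in PHASES]


if __name__ == "__main__":
    # Print shell-ready commands; nothing is executed here.
    for cmd in build_commands("config.json"):
        print(shlex.join(cmd))
```

You may need to run findTrainingData and label more than once before train, so that enough pairs are labeled for Zingg to learn from.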
2- After preparing the data, create a config file like the example. Your training data file only needs two additional columns, z_ismatch and z_cluster; also make sure it is laid out like this example. Basically, all matching records should share the same z_cluster and have z_ismatch == 1, while unmatched records should have z_ismatch == 0.
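A quick sanity check on a hand-built training file can catch layout mistakes before training. This is a minimal sketch, assuming a CSV where each z_cluster groups records labeled together, so one cluster should not mix z_ismatch values and a matching cluster needs at least two records. The company_name and postcode columns and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: original columns plus the z_cluster and z_ismatch columns.
SAMPLE = """z_cluster,z_ismatch,company_name,postcode
1,1,ACME LTD,EC1A 1BB
1,1,Acme Limited,EC1A 1BB
2,0,Foo Bar Ltd,SW1A 1AA
2,0,Baz Holdings PLC,M1 1AE
"""


def check_training_rows(rows):
    """Return a list of problems; an empty list means the assumed rules hold."""
    problems = []
    by_cluster = defaultdict(list)
    for row in rows:
        flag = row["z_ismatch"]
        if flag not in ("0", "1"):
            problems.append(f"bad z_ismatch {flag!r} in cluster {row['z_cluster']}")
        by_cluster[row["z_cluster"]].append(flag)
    for cluster, flags in by_cluster.items():
        if len(set(flags)) > 1:
            problems.append(f"cluster {cluster} mixes z_ismatch values")
        elif flags[0] == "1" and len(flags) < 2:
            problems.append(f"cluster {cluster} is a match with only one record")
    return problems


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(check_training_rows(rows))
```

With the sample above the list is empty; swap in `open("your_training_file.csv")` (a placeholder name) to check a real file.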
3- Yes, you can do that as well: set "zinggDir": "s3://your bucket name/models" in the config file and put your locally trained model inside s3://your bucket name/models. Make sure "modelId" in config file is sa…
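Pointing an existing config at S3 amounts to changing zinggDir (and checking modelId). The sketch below edits a config dict with the standard json module. The bucket name `my-zingg-bucket`, the model id `"100"`, and the other config key shown are hypothetical placeholders; how modelId must relate to your trained model is the requirement in point 3, which is cut off here.

```python
import json


def point_config_at_s3(config, bucket, model_id):
    """Return a copy of a config dict whose zinggDir points at s3://<bucket>/models."""
    updated = dict(config)  # shallow copy; the original dict is left untouched
    updated["zinggDir"] = f"s3://{bucket}/models"
    # Placeholder modelId: check it against the requirement in point 3.
    updated["modelId"] = model_id
    return updated


if __name__ == "__main__":
    # Hypothetical local config; only zinggDir and modelId come from the reply.
    local = json.loads('{"zinggDir": "models", "modelId": "100", "numPartitions": 4}')
    s3_config = point_config_at_s3(local, "my-zingg-bucket", "100")
    print(json.dumps(s3_config, indent=2))
```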