@Isun907 I had a similar issue and solved it by resetting the DataFrame index, either with df.index = np.arange(len(df)) or with data_df.reset_index(col_level=1, drop=True, inplace=True). Someone might have a better solution.
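One caveat with the workaround above: drop=True discards the original IDs, which makes it harder to tell afterwards which records matched. A minimal sketch of a variant that keeps the ID as a regular column is below. The toy data and the deliberately duplicated ID are hypothetical stand-ins for firmname.csv, not the real file.

```python
import pandas as pd

# Hypothetical toy data standing in for firmname.csv; ID 102 is duplicated on purpose.
firm_name = pd.DataFrame(
    {"ID_EMPLOYER": [101, 102, 102], "EMPLOYER_STATE": ["CA", "NY", "NY"]}
).set_index("ID_EMPLOYER")

# reset_index() without drop=True moves the ID into an ordinary column and
# replaces the index with 0..n-1, which is always unique.
firm_reset = firm_name.reset_index()

# Positions returned by a later matching step can be mapped back to the IDs.
original_ids = firm_reset.loc[[0, 2], "ID_EMPLOYER"].tolist()
```

With this layout the candidate pairs would refer to row positions, and the ID column is still available for looking up the original identifiers.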
Hi
I am linking two datasets. Both contain unique IDs as identifiers. After reading the two datasets into pandas DataFrames, I set those IDs as the indexes so that, after classification, I can tell which records from each dataset matched. But after setting the IDs as indexes, I get an error in the blocking step:
ValueError: index of DataFrame is not unique
I am sure neither ID column has duplicates. Here is some of the code. Can you please help me figure out what the problem is?
import pandas as pd
import recordlinkage
firm_name = pd.read_csv(r"C:\Users\XXX\Dropbox\YYY\firmname.csv", index_col='ID_EMPLOYER', encoding='latin-1')
ccm_name = pd.read_csv(r"C:\Users\XXX\Dropbox\YYY\comphist.csv", index_col='ID_HCONM', encoding='latin-1')
indexer = recordlinkage.Index()
indexer.block(left_on='EMPLOYER_STATE', right_on='HSTATE')
candidates = indexer.index(firm_name, ccm_name)
Then I got this error message:
ValueError: index of DataFrame is not unique
Can anyone help please?
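Since the error message says the index is not unique, it may be worth checking each DataFrame directly before the blocking step. The sketch below uses a hypothetical helper, report_duplicate_index, and toy data in place of comphist.csv. It relies only on pandas' Index.is_unique and Index.duplicated.

```python
import pandas as pd

def report_duplicate_index(df, name):
    """Print whether df's index is unique and return the repeated labels."""
    # keep=False marks every occurrence of a repeated label, not just the later ones.
    dupes = df.index[df.index.duplicated(keep=False)].unique().tolist()
    print(f"{name}: index unique = {df.index.is_unique}, repeated labels = {dupes}")
    return dupes

# Hypothetical toy frame; label 7 appears twice on purpose.
ccm_name = pd.DataFrame(
    {"HSTATE": ["CA", "NY", "TX"]},
    index=pd.Index([7, 8, 7], name="ID_HCONM"),
)

dupes = report_duplicate_index(ccm_name, "ccm_name")
```

Running the same check on both firm_name and ccm_name would show which file, if either, contains repeated IDs.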