I've been working with the Harvard Patent Dataverse 2010 datasets for quite a while now and have stumbled across an issue with unique inventor identification for records whose assignee numbers start with A or H (e.g. H000000000158 for Shell Oil Company instead of the regular 10266734). The algorithm seems to incorrectly assign different inventor IDs to records with such assignee numbers, even though the other characteristics of the record are very similar or identical to those of the records listing the 'regular' assignee number.
Here's an example for one of Shell's key inventors:
HAROLD J VINEGAR BELLAIRE US 7631690 SHELL OIL COMPANY 10266734 166 04359687-1 2009
HAROLD J VINEGAR BELLAIRE US 7635023 SHELL OIL COMPANY 10266734 166 04359687-1 2009
HAROLD J VINEGAR BELLAIRE US 7635025 SHELL OIL COMPANY 10266734 166 04359687-1 2009
As you can see, these are OK. The inventor is correctly assigned Invnum 04359687-1. However, the following records each receive a different Invnum, although the other data fields show that the inventor is clearly the same:
HAROLD J VINEGAR BELLAIRE US 7640980 SHELL OIL COMPANY H000000000158 166-268/166-302/166-369/405-52 07640980-0 2010
HAROLD J VINEGAR BELLAIRE US 7735935 SHELL OIL COMPANY H000000000158 299-5/166-2721/166-302/299-4 07735935-0 2010
HAROLD J VINEGAR BELLAIRE US 7681647 SHELL OIL COMPANY H000000000158 166-302/166-369 07681647-2 2010
For larger selections of data, this leads to a lot of missing connections, so the resulting networks are less connected and less dense than they actually are. So far, I've manually corrected the Invnums for these records, but of course this is not the way to go for selections containing thousands of records ;-)
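As a stopgap, affected records could be flagged programmatically rather than by eye. The following is only a minimal sketch under stated assumptions: the field names (`firstname`, `lastname`, `city`, `country`, `assignee`, `asgnum`, `invnum`) and the helper functions are hypothetical and not taken from the datasets' actual schema, and grouping on exact name, city and assignee is a crude heuristic, not the disambiguation algorithm used to build the datasets.

```python
from collections import defaultdict


def is_nonstandard_asgnum(asgnum):
    """Return True for assignee numbers starting with 'A' or 'H'."""
    return asgnum[:1] in ("A", "H")


def find_split_inventors(records):
    """Group records on identical name, location and assignee name, and
    report groups that carry more than one Invnum while containing at
    least one record with an A/H assignee number."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["firstname"], rec["lastname"], rec["city"],
               rec["country"], rec["assignee"])
        groups[key].append(rec)

    suspects = {}
    for key, recs in groups.items():
        invnums = {r["invnum"] for r in recs}
        has_nonstandard = any(is_nonstandard_asgnum(r["asgnum"]) for r in recs)
        if len(invnums) > 1 and has_nonstandard:
            suspects[key] = sorted(invnums)
    return suspects


def make_record(patent, asgnum, invnum):
    """Build one example record; the field names are assumptions."""
    return {
        "firstname": "HAROLD J", "lastname": "VINEGAR",
        "city": "BELLAIRE", "country": "US",
        "patent": patent, "assignee": "SHELL OIL COMPANY",
        "asgnum": asgnum, "invnum": invnum,
    }


# The six example records from this report.
records = [
    make_record("7631690", "10266734", "04359687-1"),
    make_record("7635023", "10266734", "04359687-1"),
    make_record("7635025", "10266734", "04359687-1"),
    make_record("7640980", "H000000000158", "07640980-0"),
    make_record("7735935", "H000000000158", "07735935-0"),
    make_record("7681647", "H000000000158", "07681647-2"),
]

suspects = find_split_inventors(records)
for key, invnums in suspects.items():
    print(key, invnums)
```

On the six records above, this reports one group with four distinct Invnums. It only produces candidates for review: two different inventors with the same name, city and assignee would be flagged as well, so the output should not be merged blindly.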
Would it be possible to address this issue in the next release of the datasets? Please let me know if there's any other info I can provide to further clarify this issue.
Thanks,
André