Fix Names in Metadata #351

Open
srisi opened this issue Apr 30, 2019 · 9 comments

@srisi
Member

srisi commented Apr 30, 2019

There are some duplicates left in our metadata sheet.
Here's a list of likely candidates to look into.
If you have either checked or fixed a last name, please add it here so I can update the list.

Last name:
        Adams [DONE ra 2019-05-02] 
First name variants:
         W. C. [transcription error] 
         C. W. [standardized to this]

Last name: 
        Blackburn [DONE ra 2019-05-02]
First name variants:
         Jacob F. [standardized to this - Jack is a nickname throughout]
         Jack F.
 
Last name:
        Brown [DONE ra 2019-05-02]
First name variants:
         Gordon S.
         E. Cary [genuinely a different person!]
         Dean [this is G.S. Brown's title]
         C. S. [transcription error - actually G.S. for Gordon S.]

Last name:
        Cusick [DONE ra 2019-05-02]
First name variants:
         Unknown [both cases of this seem to be Paul V.]
         Paul V.

Last name:
        Fano [DONE ra 2019-05-02]
First name variants:
         M. R. [transcription error]
         R. M.

Last name:
        Floe [maybe done? ra 2019-05-02]
First name variants:
         Carl F.corbató [there's a missing semicolon after 'F.' here, which has conflated a couple of names into one entry... I've fixed this in the metadata, but this problem may have been masking other problems...]
         Unknown 

Last name:
        Glaser [DONE ra 2019-05-02]
First name variants:
         Erza [transcription error]
         Ezra 

Last name:
        Green [Done IA 2019-05-09] [these are different people]
First name variants:
         J. W.
         W. D.

Last name:
        Herring
First name variants:
         Pendleton 
         Unknown [IA 2019-05-09 - I can't see how people identified the cc on documents written by Herring, Unknown] 

Last name:
        Hill [IA 2019-05-09 - no indication these are the same person; Al and Albert could be different people]
First name variants:
         Albert G.
         Al

Last name:
        Hunter [Done IA 2019-05-09]
First name variants:
         G. Truman
         Truman 

Last name:
        Hurd [DONE ra 2019-05-02]
First name variants:
         Cuthbert C.
         Unknown [seems to be Cuthbert -- 2_3_morse_correspondence_m_z_77]

Last name:
        Johnson
First name variants:
         Eldon L.
         E. C.

Last name:
        Jones
First name variants:
         Dorothy P.
         Robert E.
         Unknown [IA 2019-05-09 no indication]

Last name:
        Little
First name variants:
         John D. c
         J. A.

Last name:
        Maxwell [DONE ra 2019-05-02 - these are genuinely different people]
First name variants:
         Joseph R.
         I. R.

Last name:
        Mccarthy
First name variants:
         Unknown 
         John 

Last name:
        Mccormack
First name variants:
         James 
         E. L.

Last name:
        Morse [DONE ra 2019-05-02]
First name variants:
         Philip M.
         F. M. [foreign correspondent misspelling of Philip]

Last name:
        Mosteller [done ra 2019-05-09 - these are all C. Frederick Mosteller, Harvard statistician]
First name variants:
         Frederick 
         Unknown 
         C. F.

Last name:
        Murray [Done IA 2019-05-09 Different people]
First name variants:
         Jane F.
         J. M.

Last name:
        Mussard [DONE ra 2019-05-02]
First name variants:
         Jean M. [transcription error]
         Jean A. 

Last name:
        Panov [DONE ra 2019-05-02]
First name variants:
         Yu D. [transcription error]
         D. Yu

Last name:
        Pigford [DONE ra 2019-05-02]
First name variants:
         Thomas H.
         T. J. [index finger typo in the original! - clearly Thomas H., since it's the same office]

Last name:
        Reissner [Done IA 2019-05-08]
First name variants:
         R. [R looks like E; you have to zoom right in to see]
         E. 

Last name:
        Shader
First name variants:
         Melvin A.
         Unknown [IA 2019-05-08 this is Melvin A.]
         Mel [still unclear if this is Melvin; if we can find out whether Melvin A. is also a Dr., we'll have sufficient information]

Last name:
        Steinberg
First name variants:
         Unknown 
         J. R.

Last name:
        Tucker
First name variants:
         Unknown 
         John A.
         C. E.

Last name:
        Unknown
First name variants:
         Rosemary 
         Elaine 
         Jewell 
         Jane 

Last name:
        Verzuh
First name variants:
         Edna Tamm
         Frank M.
         H. M.

Last name:
        Webber [done RA 2019-05-09]
First name variants:
         Unknown [pretty clearly Roger from the other folks on the doc]
         Roger P.
         D. S. r [this is Roger - D.S.R. is the abbreviation for his title]

Last name:
        Wells [done (?) ra 2019-05-09 - these seem to be different people, but there isn't really enough info in the documents to tell; one of the docs addressed to "Wells" (1_25_proposed_conference_28) may actually be to W.H. rather than W.D., but I can't confirm that from what's here]
First name variants:
         W. D.
         W. H.

Last name:
        Weyl [DONE ra 2019-05-02]
First name variants:
         F. Joachim
         Joachim F. [transcription error]
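This is a rough sketch of how a candidate list like the one above could be generated, and how fused entries like "Carl F.corbató" could be caught. It assumes the metadata stores each document's people as one semicolon-separated string of "Last, First Middle" entries. That format is inferred from the missing-semicolon note on Floe and has not been confirmed. `split_names`, `flag_fused`, and `first_name_variants` are hypothetical helpers, not code from this repo.

```python
import re
from collections import defaultdict

# Heuristic (assumption): a period followed directly by three or more letters,
# as in "F.Corbató", suggests a missing "; " between two entries. Run-together
# initials such as "R.K." do not match, since only one letter follows each period.
FUSED = re.compile(r'\.[^\W\d_]{3,}')


def split_names(field):
    """Split a semicolon-separated names field into stripped, non-empty entries."""
    return [part.strip() for part in field.split(';') if part.strip()]


def flag_fused(entries):
    """Return entries that look like two names run together."""
    return [entry for entry in entries if FUSED.search(entry)]


def first_name_variants(fields):
    """Map each last name seen with 2+ distinct first-name forms to those forms."""
    variants = defaultdict(set)
    for field in fields:
        for entry in split_names(field):
            last, _, first = entry.partition(',')
            # Entries with no first name are shown as "Unknown", as in the lists here
            variants[last.strip()].add(first.strip() or 'Unknown')
    return {last: sorted(firsts)
            for last, firsts in sorted(variants.items())
            if len(firsts) > 1}


if __name__ == '__main__':
    # Made-up example fields, not real metadata rows
    fields = [
        'Adams, C. W.; Morse, Philip M.',
        'Adams, W. C.',
        'Floe, Carl F.Corbató, F. J.',
    ]
    for field in fields:
        print(flag_fused(split_names(field)))
    print(first_name_variants(fields))
```

Entries flagged by `flag_fused` still need a human look, and so do the groups from `first_name_variants`. As the notes above show, many variants turn out to be genuinely different people.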
@ryaanahmed
Member

I think we're capitalizing names incorrectly: the code calls Python's str.capitalize() on each field rather than nameparser's own capitalize method. Putting a pin in this with this comment so I can come back to it; see, e.g., Mccormack above, which is correctly transcribed as McCormack in the spreadsheet/csv.

I don't think this stuff in name_parser.py does quite what we want it to:

    import re
    from nameparser import HumanName

    name = HumanName(name_raw)
    # If first and middle initials have periods but not spaces -> separate, e.g. "R.K. Teague"
    if re.match(r'[a-zA-Z]\.[a-zA-Z]\.', name.first):
        name.middle = name.first[2]
        name.first = name.first[0]

    # str.capitalize() lowercases everything after the first character,
    # so e.g. "McCormack" becomes "Mccormack"
    name.last = name.last.capitalize()
    name.first = name.first.strip('.').capitalize()
    name.middle = name.middle.strip('.').capitalize()

Poked around a little, and I think the thing to do is to set all of the name fields and then run name.capitalize() on the whole thing to modify it in place, and then extract name.last, name.first, and name.middle. Will take a proper look tmrw.

@ryaanahmed
Member

@srisi

In [1]: from nameparser import HumanName

In [2]: name = HumanName('McCormack, E. L.')

In [3]: name.last
Out[3]: 'McCormack'

In [4]: name.last.capitalize()
Out[4]: 'Mccormack'

whereas...

In [15]: name = HumanName('Mccormack, E. L.')

In [16]: name.capitalize(force=True)

In [17]: name.last
Out[17]: 'McCormack'

... which does seem off, but there you go. I'll fix and make a PR.

@ryaanahmed
Member

ryaanahmed commented May 9, 2019

[2019-05-13 -- this comment replaced with updated list below]

@erica02139
Collaborator

Computation Center phone directory records from 1955-56 will help address some of these; we'll have images of them tomorrow.

@erica02139 erica02139 self-assigned this May 12, 2019
@ryaanahmed
Copy link
Member

ryaanahmed commented May 13, 2019

For some reason @erica02139's latest edited list, sent by email, didn't get attached to this issue. Here it is, replacing my comment above:

[deleted -- old, replaced by Erica's list below]

@erica02139
Collaborator

Thanks, @ryaanahmed! I've done more and will post tonight. Also, here are the Computation Center directory records from 1956-1963; I'll post them on Slack, too.
https://drive.google.com/open?id=1-34DNXHFO6kgb2lXIzL9W-qouWJzN8RQ

@mscuthbert
Member

Perfect -- I just went over and took lots of pictures of the area. Will make a story.

@erica02139
Collaborator

erica02139 commented May 16, 2019

Last name:
	Arden
First name variants:
	Bruce W. [real: mez]
	Unknown [2-14, p. 61 is likely “Arden, Dean N.”; 3-9, p. 3 and 3-10, p. 88 likely “Arden, Bruce W.” who was “Mr.” not “Dr.”]
	Dean M. [chg to “Dean N.”: mez]
	Dean A. [chg to “Dean N.”: text is “Dean”: mez]
	Dean N. [real: mez]

Last name:
	Brown
First name variants:
	Sanborn C. [real: mez]
	Gordon S. [real: mez]
	Unknown [chg to “Brown, Gordon S.”: mez]
	E. Cary [real: mez]

Last name:
	Caldwell
First name variants:
	Samuel H. [real: mez]
	David O. [real: mez]

Last name:
	Campbell
First name variants:
	Elizabeth J. [real: mez]
	Ashley S. [real on 2-14, p. 102; mis-entered for 2-14, p. 85 (chg to “unknown”): mez]
	Unknown [chg to “Ashley S.”: mez]
	Pamela [real: added details to 2-26, doc 2 “note” field: mez]

Last name:
	Case
First name variants:
	 Harold [real: mez]
	 Leon W. [real: mez]

Last name:
	Clark
First name variants:
	 George W. [real: mez]
	 M. [real: “M.” is for Melville: mez]

Last name:
	Coleman
First name variants:
	 Courtney [real: mez]
	 Albert F.

Last name:
	Davis
First name variants:
	 Philip J. [real: mez]
	 David M.
	 Sam H.

Last name:
	Floe
First name variants:
	 Unknown [likely “Carl F.”; referred to by last name in document also referencing “Corby”: mez]
	 Carl F. [real: mez]

Last name:
	Green
First name variants:
	 Alan I.
	 W. D. [real; name is “Green, William D.”: mez]
	 J. W.

Last name:
	Hansen
First name variants:
	 K. E. [real: mez]
	 R. J.

Last name:
	Harris
First name variants:
	 Rufus 
	 Louis 

Last name:
	Helwig
First name variants:
	 Frank C.
	 Diana B.

Last name:
	Herring
First name variants:
	 Pendleton 
	 Unknown 

Last name:
	Hill
First name variants:
	 Richard H.
	 Albert G.
	 Marjorie 
	 Laura 
	 Al 

Last name:
	Howard
First name variants:
	 R. 
	 J. 

Last name:
	Hunter
First name variants:
	 G. Truman
	 Truman 
	 P. L.

Last name:
	Johnson
First name variants:
	 Howard W.
	 Eldon L.
	 Anthony 
	 E. C.

Last name:
	Jones
First name variants:
	 Dorothy P.
	 Fletcher 
	 Robert E.
	 Unknown [IA 2019-05-09 no indication]


Last name:
	Killian
First name variants:
	 James R.
	 T. J.

Last name:
	Little
First name variants:
	 John D. C
	 J. A.

Last name:
	Mann
First name variants:
	 Leonard A.
	 Edward S.

Last name:
	Mason
First name variants:
	 E. A.
	 R. D.


Last name:
	McCormack
First name variants:
	 James 
	 E. L.

Last name:
	Miller
First name variants:
	 Unknown 
	 C. L.
	 S. 

Last name:
	Morris
First name variants:
	 J. C.
	 G. J.

Last name:
	Nelson
First name variants:
	 Clifford V.
	 Robert A.

Last name:
	Peterson
First name variants:
	 Carl M. F
	 L. 

Last name:
	Price
First name variants:
	 Daniel O.
	 B. G.

Last name:
	Robertson
First name variants:
	 Harold 
	 J. E.

Last name:
	Shader
First name variants:
	 Melvin A.
	 Mel [still unclear if this is Melvin; if we can find out whether Melvin A. is also a Dr., we'll have sufficient information]


Last name:
	Slater
First name variants:
	 John C.
	 John M.

Last name:
	Smith
First name variants:
	 Paul A.
	 E. H.

Last name:
	Stoddard
First name variants:
	 R. E.
	 P. A.

Last name:
	Stratton
First name variants:
	 Julian S. [changed to Julius A., president of MIT; the letter is addressed to the MIT president]
	 Julius A.

Last name:
	Thompson
First name variants:
	 Greg R.
	 T. J.
	 C. G.

Last name:
	Tucker
First name variants:
	 Unknown 
	 John A.
	 C. E.

Last name:
	Unger
First name variants:
	 Ing H.
	 H. 

Last name:
	Unknown
First name variants:
	 Rosemary 
	 Jewell 
	 Elaine 
	 Jane 
	 Ray 

Last name:
	Verzuh
First name variants:
	 Edna Tamm
	 Frank M.
	 H. M. [actually F. M.; fixed //sr]

Last name:
	Walker
First name variants:
	 Gordon L.
	 Eric 

Last name:
	Walsh
First name variants:
	 Fr Michael P
	 Joseph B.

Last name:
	Wells
First name variants:
	 W. D.
	 W. H. [different people. //sr]

Last name:
	Williams
First name variants:
	 Richard H.
	 Robert W.

@ryaanahmed
Member

I'm removing this from the deploy-ready milestone. At this point, it looks to me like we've cleaned most of the truly wrong metadata; there's still some sleuthing to do around many of the 'unknown' first-name folks, but this can become ongoing project maintenance work and work for the summer.

@ryaanahmed ryaanahmed removed this from the Deploy-ready milestone May 16, 2019
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants