2017-06-02 ~ Data Scraping: LinkedIn #10

Open
theo-armour opened this issue Jun 3, 2017 · 0 comments

Over the past couple of weeks we have made quite a bit of progress with accessing corporate data via LinkedIn. We have become familiar with its API. See the scripts in linkedin oauth. The LinkedIn API is not that difficult to navigate.

There is, however, a significant issue with obtaining data from LinkedIn. The issue is LinkedIn. They are happy to share data for a price, or if you are well established. In other words, they take good care of their friends. On the other hand, if you are experimenting or learning, LinkedIn treats you the way it treats everyone who is not a friend. It makes gathering data difficult and seriously takes the fun out of things:

LinkedIn sues anonymous data scrapers

The only data we can currently access freely is a limited amount of data relating to individual user profiles - plus a small amount of data about the organization each user currently works for. We are not able to access any stand-alone corporate data.

For example, this page for Google - https://www.linkedin.com/company-beta/1441/ - shows the number of Google employees and the number of LinkedIn followers. The issue is that this data is loaded and displayed well after the basic page source has loaded. We have not been able to scrape data from these corporate pages using FOSS client-side tools. We can imagine many interesting ways of capturing the data. For example, we could build an app that uses screen capture and optical character recognition (OCR) to harvest the numbers.
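To illustrate why data that arrives after the initial source is invisible to a plain fetch-and-parse approach, here is a minimal sketch. The markup is entirely hypothetical (it is not LinkedIn's actual page structure), and `INITIAL_HTML`, `TextById`, `static_text` and the `employee-count` id are names invented for this example. A static parser sees only the empty placeholder, because the script that would fill it in never runs.

```python
from html.parser import HTMLParser

# Hypothetical initial markup, NOT LinkedIn's real page structure.
# The number only appears once a browser executes the script.
INITIAL_HTML = """
<html><body>
  <h1>Example Corp</h1>
  <span id="employee-count"></span>
  <script>
    document.getElementById("employee-count").textContent = "12,345";
  </script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while we are inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def static_text(html, element_id):
    """Return the text a static parser sees for element_id (no scripts run)."""
    parser = TextById(element_id)
    parser.feed(html)
    return "".join(parser.text).strip()


# The placeholder is empty in the initial source.
print(repr(static_text(INITIAL_HTML, "employee-count")))
```

This is why a tool that reads only the downloaded source comes back empty-handed: the value exists only after a browser has executed the page's scripts.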

But given LinkedIn's antipathy toward such tricks, we think it more prudent to wait until we become more established. As and when that happens, we will contact LinkedIn and establish a relationship that will enable preIQtiv to access useful, timely data.
