Merge pull request #1 from garaekz/fix/getExperience

🎉 Lots of work for 1.2.0
garaekz · Jan 29, 2023 · bfe4a43
2 parents 3020989 + d7f2d77
commit bfe4a43
Showing 21 changed files with 565 additions and 397 deletions.
diff --git a/README.md b/README.md
@@ -1,8 +1,17 @@
 
+
 # InScraper
 ### A Playwright-based LinkedIn scraper
 
-This is (currently) a small library built in typescript in order to scrape LinkedIn profiles using the vanity URL (slug or custom URL) using Playwright and Cheerio.
+This is (currently) a small library written in TypeScript that scrapes LinkedIn profiles by their vanity URL (slug or custom URL) using Playwright ~~and Cheerio~~.
+
+I'm trying to stick to semver but I'm not sure if I'm doing it right, so please be aware that this library is still in development and the API may change.
+
+## Why?
+I was looking for a way to get some information from LinkedIn profiles. I found some libraries that do that, but they are no longer maintained and they use Puppeteer, which felt a bit slow and heavy to me. I wanted to try Playwright, a newer library that can drive Chromium, Firefox and WebKit, and which I expected to be faster and lighter than Puppeteer. I also wanted to try TypeScript, so I decided to build this library. I hope you find it useful.
+
+## Disclaimer
+This library is provided as is, without any warranty. I am not responsible for any misuse of this library. Please be aware that web scraping may be against LinkedIn's terms of service. Try to use a side account, because scraping may get your account banned (I haven't seen that happen yet, but it could).
 
 ## Installation
 
@@ -15,21 +24,18 @@ Using Yarn:
  To use the library, you will need to provide a valid LinkedIn cookie. You can obtain this by logging into LinkedIn and inspecting the cookies in your browser; look for the one called `li_at`. Once you have the cookie, you can pass it to the `createClient` function, which will return an instance of the `Client` class.
 
     import { createClient } from  'inscraper';
-    
+
     const cookieString = 'YOUR_COOKIE_HERE';
     const client = await  createClient(cookieString);
 
  The `Client` class has the following methods:
 
--   `getProfile(profileSlug: string)`: Returns the profile information of the user with the given profile slug, including their name, headline, about and experience sections.
-
--   `getExperience(profileSlug: string)`: Returns the experience of the user with the given profile slug.
-
--   `getBrowser(): Browser`: Returns the Playwright browser instance.
+-   `getProfile(slug: string)`: Returns the profile information of the user with the given profile slug, including their name, headline, about and experience sections.
 
--   `getContext(): BrowserContext`: Returns the Playwright context instance.
-
--   `close()`: Closes the Playwright browser.
+-   `getExperience(slug: string)`: Returns only the experience of the user with the given profile slug.
+
+-   `getScreenshot()`: You can use this method to get a screenshot of the current page. This is useful if you want to see what the page looks like after you have performed some actions. It only works if you have called the `getProfile` or `getExperience` methods before. It's built on top of Playwright's `screenshot` method, so you can pass the same options to it. See [Playwright's documentation](https://playwright.dev/docs/api/class-page#page-screenshot) for more information.
+
 
 Get some profile info:
 
@@ -41,34 +47,49 @@ const experience = await client.getExperience('profile-slug');
 console.log(experience);
 ```
 
-You can also use the `getBrowser()`  and  `getContext()` methods to perform other actions with Playwright and the `close()` method to close the browser when you are done scraping. Note that if the provided cookie is invalid, the library will throw an error, 'Cookies error'
+Note that if the provided cookie is invalid, the library will throw an error: 'Cookies error'.
 
 ## Full Example
 ```
-import { createClient } from 'linkedin-scraper';
+import fs from "fs";
+import { PageScreenshotOptions } from 'playwright';
+import { createClient } from "inscraper/client";
 
 const cookieString = 'YOUR_COOKIE_HERE';
 const client = await createClient(cookieString);
 
-const profile = await client.getProfile('profile-slug');
-console.log(profile);
+const slug = 'profile-slug';
 
-const experience = await client.getExperience('profile-slug');
+const profile = await client.getProfile(slug);
+console.log(profile);
+const options: PageScreenshotOptions = {
+  type: "png",
+  fullPage: true,
+}
+const buffer = await client.getScreenshot(options);
+fs.writeFileSync(`screenshots/${slug}.png`, buffer);
+
+const experience = await client.getExperience(slug);
 console.log(experience);
 await client.close()
 ```
 ## Compatibility 
 This library uses Playwright, which is compatible with Chromium, Firefox and Webkit. For this implementation, Chromium is being used.
 
 ## Dependencies
-This library depends on playwright and cheerio.
+This library depends on playwright ~~and cheerio~~.
 
 ## Contributions
 Your contributions are always welcome! Please feel free to submit a pull request or open an issue.
 
 ## Features
 - [x] Work with Cookies
 - [x] Get basic info from Profile
+- [x] Get Experience from Profile
+- [ ] Get Education from Profile
+- [ ] Get Skills from Profile
+- [ ] Get Recommendations from Profile
+- [x] Get Screenshots of a visited profile
 - [ ] Extend it to try and use voyager API (see if that's still a thing)
 - [ ] Add a test suite
 - [ ] Add more features to this list
@@ -79,7 +100,3 @@ This library is provided under the [MIT License](https://opensource.org/licenses
 
 ## Contact 
 Please feel free to contact me if you have any questions or issues. 
-## Additional notes
-Please be aware that web scraping may be against the terms of service of LinkedIn, try to use a side account because it may get you banned (haven't see that yet but could be).
-
-This is a work in progress and only with educational purposes, correct information handling is not a joke.
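
Review note on the README above: it says an invalid cookie makes the library throw 'Cookies error', and the Full Example ends with `client.close()`, but neither snippet shows how a caller might handle the failure and still clean up. The following is only a sketch of one way to do that. The `ProfileClient` interface, the `ClientFactory` type and `fetchProfileSafely` are hypothetical names introduced here for illustration, and the check on the error message assumes the thrown error contains the text 'Cookies error' as the README states. The real `Client` class may look different.

```typescript
// Hypothetical shapes inferred from the README in this diff.
// The real inscraper `Client` class may differ.
type Profile = Record<string, unknown>;

interface ProfileClient {
  getProfile(slug: string): Promise<Profile>;
  close(): Promise<void>;
}

// Stands in for the README's `createClient(cookieString)`.
type ClientFactory = (cookie: string) => Promise<ProfileClient>;

// Fetches one profile. Returns null when the cookie is rejected,
// re-throws any other error, and closes the client if one was created.
async function fetchProfileSafely(
  createClient: ClientFactory,
  cookie: string,
  slug: string,
): Promise<Profile | null> {
  let client: ProfileClient | undefined;
  try {
    client = await createClient(cookie);
    return await client.getProfile(slug);
  } catch (err) {
    // Assumption: the error message contains 'Cookies error'.
    if (err instanceof Error && err.message.includes("Cookies error")) {
      console.error("The li_at cookie looks invalid or expired.");
      return null;
    }
    throw err; // anything else is unexpected
  } finally {
    if (client) {
      await client.close();
    }
  }
}
```

Passing the factory in as a parameter keeps the sketch independent of the package's import path, which the README spells both as `'inscraper'` and as `"inscraper/client"`.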
diff --git a/dist/client.js b/dist/client.js
diff --git a/dist/client.js.map b/dist/client.js.map
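
Review note on the README's Full Example: it writes the screenshot to `screenshots/${slug}.png` with `fs.writeFileSync`, which throws if the `screenshots` directory does not exist yet. A minimal sketch of a helper that creates the directory first is shown below. `saveScreenshot` is a hypothetical name and not part of the library, and the slug sanitizing rule is a conservative assumption about what a slug may contain.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper, not part of inscraper: writes screenshot bytes to
// <dir>/<slug>.png and returns the path that was written.
function saveScreenshot(dir: string, slug: string, data: Uint8Array): string {
  // Replace anything outside a small safe set so the slug cannot escape `dir`.
  const safeSlug = slug.replace(/[^A-Za-z0-9_-]/g, "_");
  // `recursive: true` creates missing parents and does not fail if `dir` exists.
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${safeSlug}.png`);
  fs.writeFileSync(file, data);
  return file;
}
```

Playwright's `page.screenshot` resolves to a Node `Buffer`, which is a `Uint8Array`, so a buffer obtained that way can be passed as `data` directly.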