A collection of Livebook notebooks.

Livebook is also a great way to get your feet wet with Elixir concepts; it works like a powerful language scratchpad.
- Install a compatible Elixir and Erlang version. You may wish to use asdf.
- We have included a `.tool-versions` file to support local Elixir versions.
- Install Livebook via escript (preferred) or via the Desktop app.
- From this project folder, run `livebook server index.livemd`. This opens the main navigation page where you can access all the other Livebook examples.
I'm always on the lookout for Elixir job posts, so when I stumbled on SpiderMan as a crawler and its Livebook example, I was intrigued. The example crawls https://elixirjobs.net/ to create a CSV of jobs by `link`, `title`, `sub_title`, `date`, `workplace`, and `type`.
I wanted to take the example a few steps further:

- Crawl the newest 25 pages instead of all 63 (at the time of writing). We don't want to crawl the entire site, and ~25 pages give us about the last year's worth of posts.
- Reorder and change the columns to `date`, `title`, `company`, `location`, `workplace`, `type`, `link`, and `page_number`.
- Convert the date to `yyyy-mm-dd` format, my ugliest Elixir code so far.
- Sort the CSV by date descending to see the latest jobs first.
- Add sections to make navigation a little easier.
- In the section marked *Sorting the Results*, I left in the cell that evaluates to `** (SyntaxError) nofile:5:1: unexpected token: "" (column 1, code point U+200B)`, as U+200B is a zero-width space, cleverly hidden in a paste job.
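The date-normalization step above could be sketched like this. It's a minimal sketch assuming the scraped date arrives as `dd/mm/yyyy`; the site's actual date strings may differ, and the module name is invented:

```elixir
# Hedged sketch: assume the scraped date looks like "dd/mm/yyyy" and
# normalize it to ISO "yyyy-mm-dd" so rows sort correctly as strings.
defmodule DateNormalize do
  def to_iso(scraped) do
    [day, month, year] = String.split(scraped, "/")
    "#{year}-#{month}-#{day}"
  end
end

DateNormalize.to_iso("28/02/2023")
# => "2023-02-28"
```

Once dates are ISO strings, plain lexicographic sorting doubles as chronological sorting, which is why the conversion happens before the sort step.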
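If you'd rather fix the hidden character than admire it, stripping U+200B from pasted code is a one-liner:

```elixir
# U+200B (zero-width space) is invisible but breaks the tokenizer;
# removing it makes the pasted snippet compile again.
pasted = "\u200Bdefmodule Foo do\nend"
cleaned = String.replace(pasted, "\u200B", "")
# => "defmodule Foo do\nend"
```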
This largely builds on the Elixir Jobs base to crawl https://elixir-radar.com/jobs to create a CSV of jobs, with some notable exceptions:

- I hardcoded the page numbers as I'm not sure of the pagination style. Pages `1-6` seem to follow a pattern so far, but we can address this later.
- There's no date, so we sort by page number descending.
- There's a somewhat larger `description` field that we could've pushed to the end.
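Under the assumption that the pages follow a simple `?page=N` query string (the exact URL shape here is a guess), hardcoding the range and sorting by page number descending could look like:

```elixir
# Hardcode pages 1..6 since the pagination style is unconfirmed;
# the query-string format below is an assumption.
urls = Enum.map(1..6, &"https://elixir-radar.com/jobs?page=#{&1}")

# With no date column, sort scraped rows by page number descending
# so the newest batch comes first.
rows = [
  %{page_number: 1, title: "Oldest batch"},
  %{page_number: 6, title: "Newest batch"}
]

sorted = Enum.sort_by(rows, & &1.page_number, :desc)
```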
This builds on the Elixir Radar jobs base to crawl https://elixir-companies.com/en/companies to create a CSV of companies.

- Due to the way the DOM is structured, fields aren't in independent elements. There's text with `<br>` tags that translate to `\n` when parsing.
- This involved pulling the last 1 or 2 elements from the end of the list, as the first element was always one bit of information with the remaining portions covering one or more fields.
- The site feels so different to parse that it almost felt like starting from scratch.
- While Elixir Companies uses infinite-scroll techniques in the browser to fetch pages, it follows what I presume is a standard `page=number` query-string format that is identical between the three sites.

To me, these notebooks showcase how quickly I got up and running with `spider_man` over other web-crawling techniques. I'm a huge fan now.
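The "pull the last 1 or 2 elements" point can be sketched with invented sample data: once `<br>` becomes `\n`, split the blob and peel the trailing fields off the end.

```elixir
# Invented sample: one company card flattened into a single string,
# where the first line is one bit of information and the trailing
# lines cover the remaining fields.
blob = "Acme Corp\nBerlin, Germany\nFintech"

[first | rest] = String.split(blob, "\n", trim: true)
trailing = Enum.take(rest, -2)
# first    => "Acme Corp"
# trailing => ["Berlin, Germany", "Fintech"]
```

`Enum.take/2` with a negative count takes from the end of the list, which matches the "last 1 or 2 elements" approach described above.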
Using the excellent `req` library, we want to get the HTML of the job post URL and convert the contents to Markdown.
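A dependency-free sketch of the conversion idea. In the notebook the HTML would come from `req` (`Req.get!(url).body`); here a snippet is inlined and only a couple of tags are handled with `Regex`, just to show the shape, not the real converter:

```elixir
# Toy HTML-to-Markdown pass; a real notebook would fetch the page
# with req and use a fuller converter.
html = "<h1>Senior Elixir Engineer</h1><p>Remote, <strong>full-time</strong>.</p>"

markdown =
  html
  |> String.replace(~r{<h1>(.*?)</h1>}, "# \\1\n\n")
  |> String.replace(~r{<strong>(.*?)</strong>}, "**\\1**")
  |> String.replace(~r{</?p>}, "")
```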
Job Application Fields to Markdown
Using the excellent `req` library, we want to get the HTML of the job post application URL and convert all form fields to Markdown.
A scratch pad for various code doodles.
Odd behavior and awkward things I've run into in my experience with Livebook. I'm by no means an Elixir expert, though I'm getting more up to speed all the time.