[EN] Parse the data on public transport in Yerevan #29

Open
ansakoy opened this issue Jun 20, 2023 · 2 comments
Labels
parsing Tasks that require data parsing

Comments

Collaborator

ansakoy commented Jun 20, 2023

Goal

The primary goal is to write a reusable parser that collects data on public transport routes. It would also be nice to have an example of the resulting dataset for a particular date.

Tasks

The website http://marshrut.info/ presents data on the routes and schedules of buses and trolleybuses in Yerevan. Write a reusable parser in your preferred language to grab the data from the website and pack them into a machine-readable structure stored in a format such as JSON or XML. These two are preferable because the data are likely to require a hierarchical structure, for instance:

[
  ...
  {
    "vehicle": STRING,  // specify if it is a bus or a trolleybus
    "number": STRING,  // the route's "number"; in fact, it should be a string in case it's alphanumeric
    "interval": NUMBER, // how often the vehicle arrives
    "measure": STRING, // looks like it's all in minutes, but just in case specify for each entry
    "stops_forward": ARRAY, // make it an array of stop names (strings)
    "stops_backward": ARRAY // an array of stop names (strings): the forward stops in reverse order if the return route is the same, otherwise the stops specific to the return route
  }
  ...
]

This is just an example of a possible structure. If you can think of something more convenient, you're most welcome to implement it.
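The structure above could be modelled roughly as follows. This is only a sketch: the `Route` class, the `routes_to_json` helper, and the sample stop names are hypothetical and not part of the task description. It assumes that a missing return route simply mirrors the forward one.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Route:
    vehicle: str              # "bus" or "trolleybus"
    number: str               # kept as a string because it may be alphanumeric
    interval: float           # how often the vehicle arrives
    measure: str              # unit of the interval, e.g. "minutes"
    stops_forward: List[str]  # stop names in travel order
    stops_backward: Optional[List[str]] = None

    def __post_init__(self):
        # Assumption: if no separate return route is given,
        # it is the forward route in reverse order.
        if self.stops_backward is None:
            self.stops_backward = list(reversed(self.stops_forward))


def routes_to_json(routes):
    # ensure_ascii=False keeps Armenian stop names readable in the output file.
    return json.dumps([asdict(r) for r in routes], ensure_ascii=False, indent=2)


# Invented sample data, for illustration only.
example = Route("bus", "1A", 10, "minutes", ["Stop A", "Stop B", "Stop C"])
print(routes_to_json([example]))
```

Keeping `number` as a string and storing `measure` per entry follows the comments in the structure above.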

The key idea is to make the parser as reusable and maintainable as possible. Schedules change quite often, so it would be great to be able to run the script at least daily to collect up-to-date data.

It would also be nice, of course, to have an example output of these data as a dataset for a particular date.
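One possible way to support daily runs and dated example datasets is to write each run to a file named after its collection date. The sketch below assumes a hypothetical `routes_YYYY-MM-DD.json` naming scheme and a `collected_on` field. Neither is prescribed by the task.

```python
import json
from datetime import date
from pathlib import Path


def snapshot_path(out_dir, day):
    # Hypothetical naming scheme: one file per collection date.
    return Path(out_dir) / f"routes_{day.isoformat()}.json"


def save_snapshot(routes, out_dir, day=None):
    """Write a list of route dicts to a dated JSON file and return its path."""
    day = day or date.today()
    path = snapshot_path(out_dir, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"collected_on": day.isoformat(), "routes": routes}
    path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```

A scheduler such as cron could then call `save_snapshot` once a day, so that each date leaves its own dataset behind.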

The website is in Armenian only, but in fact its structure is rather clear and simple, so if you don't know the language, it shouldn't be a problem. If you still run into language troubles that you cannot solve even with the help of Google Translate, please don't hesitate to contact us.

Context

The data presented at http://marshrut.info/ have huge potential. They could be used in helpful web and mobile apps to build optimal routes and predict arrival times, especially if combined with spatial data. Unfortunately, they are not published through an API, so the first step toward using these data is to parse the HTML pages.
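As a rough illustration of the HTML-parsing step, the sketch below extracts stop names with Python's standard `html.parser` module. The markup it expects (a `<ul class="stops">` list) is purely hypothetical, as is the `StopListParser` name. The real structure of the marshrut.info pages has to be inspected first, and the selectors adapted accordingly.

```python
from html.parser import HTMLParser


class StopListParser(HTMLParser):
    """Collects the text of each <li> inside <ul class="stops"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.stops = []
        self._in_list = False
        self._in_item = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "ul" and ("class", "stops") in attrs:
            self._in_list = True
        elif tag == "li" and self._in_list:
            self._in_item = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            text = "".join(self._buffer).strip()
            if text:
                self.stops.append(text)
            self._in_item = False
        elif tag == "ul":
            self._in_list = False

    def handle_data(self, data):
        if self._in_item:
            self._buffer.append(data)


def parse_stops(html):
    parser = StopListParser()
    parser.feed(html)
    parser.close()
    return parser.stops
```

The sketch does not handle nested lists. A third-party HTML library may be more convenient once the actual page layout is known.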

Requirements

Create a public GitHub repository to store and publish the code, and possibly the data, under a free and open license such as Creative Commons or MIT. Please make the code as reusable and maintainable as possible, and include usage instructions and a list of requirements.

Wishes

It would be best if you also commented your code so that even beginners can understand what it does.

Resources

http://marshrut.info/

Prepared by

The Open Data Armenia team prepared this task.

ansakoy added the "parsing" label (Tasks that require data parsing) on Jun 20, 2023

vlivyur commented Nov 14, 2023

But the website's data come from the PDFs at https://www.yerevan.am/hy/route-network/

Contributor

ivbeg commented Nov 15, 2023

@vlivyur if the original source is more precise, then it would be great to have those data too :)
