Use Id<Stop> instead of Arc<Stop>: Id as string #126

antoine-de · 2022-03-23T16:54:20Z

Same as #123 but without generational_indexes.

I think the indexes are really great for performance since:

* all structures are then `Vec`s (great random access and cache-friendly iteration)
* the ids are short (integer vs. string)

I don't think we need that level of performance in this crate though. Having only a typed index (a thin wrapper over the real GTFS identifier) would be more convenient, I think.
It would:

* ease debugging (easy to print the GTFS identifier, instead of having a meaningless integer)
* ease serialisation (same, we can serialize the string right away)
* be closer to the GTFS representation

Both approaches would need benchmarks or real use cases to see if they are worth it.
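
To make the "thin wrapper" idea concrete, here is a minimal sketch of what such a typed id could look like. It is not the code of this PR: `Stop` and `Route` are simplified stand-ins, and `Id::new`, `as_str` and `describe_stop` are hypothetical names used only for illustration.

```rust
use std::fmt;
use std::marker::PhantomData;

// Hypothetical, simplified stand-ins for the crate's GTFS objects.
pub struct Stop {
    pub name: String,
}
pub struct Route;

/// Typed wrapper over the raw GTFS identifier.
/// `PhantomData<fn() -> T>` ties the id to its object type without owning a `T`.
pub struct Id<T> {
    raw: String,
    _marker: PhantomData<fn() -> T>,
}

impl<T> Id<T> {
    // Assumption: a public constructor, for illustration only.
    pub fn new(raw: &str) -> Self {
        Id { raw: raw.to_owned(), _marker: PhantomData }
    }

    pub fn as_str(&self) -> &str {
        &self.raw
    }
}

// Printing (and serialising) can use the GTFS identifier directly.
impl<T> fmt::Display for Id<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.raw)
    }
}

// Accepts only stop ids: passing an `Id<Route>` would not compile.
pub fn describe_stop(id: &Id<Stop>) -> String {
    format!("stop {}", id)
}
```

Compared with an integer index, the id stays readable in logs and keeps the type safety (an `Id<Route>` cannot be used where an `Id<Stop>` is expected), at the cost of storing and hashing a string.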

Note: I quite like this approach.

If we want to go there, there is still work to do:

* change all GTFS accessors to support getting by `Id` instead of by `&str`
* rework the readme
* improve the documentation

Note: One caveat of this approach is that it would enforce ALL ids to be valid. I think this could be quite a good thing, but it will break the reading of many datasets out there.

This is still *very* early WIP and I'm not convinced at all by the ergonomics. I'd like to keep the property of the Index in #123 that an `Id` has at least existed at one point (even if I don't plan to ensure this once an object is deleted).

This way an `Id` must be looked up before being created.
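
One way to get the "has existed at one point" property is to make the collection the only place that can build an `Id`. The sketch below assumes this design; the module `ids`, `Collection`, `insert` and `get_id` are hypothetical names, not the PR's actual API.

```rust
pub mod ids {
    use std::collections::HashMap;
    use std::marker::PhantomData;

    /// The fields are private to this module, so code outside of it
    /// cannot forge an `Id` from an arbitrary string.
    pub struct Id<T> {
        raw: String,
        _marker: PhantomData<fn() -> T>,
    }

    impl<T> Id<T> {
        pub fn as_str(&self) -> &str {
            &self.raw
        }
    }

    pub struct Collection<T> {
        items: HashMap<String, T>,
    }

    impl<T> Collection<T> {
        pub fn new() -> Self {
            Collection { items: HashMap::new() }
        }

        /// Inserting an object is one way to obtain a typed id.
        pub fn insert(&mut self, raw: &str, item: T) -> Id<T> {
            self.items.insert(raw.to_owned(), item);
            Id { raw: raw.to_owned(), _marker: PhantomData }
        }

        /// Looking up a raw identifier is the other: unknown ids yield `None`.
        pub fn get_id(&self, raw: &str) -> Option<Id<T>> {
            self.items
                .get_key_value(raw)
                .map(|(key, _)| Id { raw: key.clone(), _marker: PhantomData })
        }

        pub fn get(&self, id: &Id<T>) -> Option<&T> {
            self.items.get(id.as_str())
        }
    }
}
```

With this layout, every `Id<T>` in user code was handed out by a collection, so it referred to a real object at the time it was created.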

antoine-de · 2022-05-03T19:51:20Z

I found that I could drastically decrease the number of `String` allocations by implementing `Borrow<str>`. Now I find the ergonomics not that bad.
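
For readers not familiar with the trick, here is a rough sketch of why `Borrow<str>` helps, assuming the ids are used as keys of a `HashMap`. The `Id` and `Stop` definitions and the helpers `build_stops` and `stop_name` are simplified assumptions, not the PR's code.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

// Hypothetical, simplified stand-in for the crate's `Stop`.
pub struct Stop {
    pub name: String,
}

pub struct Id<T> {
    raw: String,
    _marker: PhantomData<fn() -> T>,
}

// `Borrow` requires `Eq` and `Hash` to behave like those of `str`,
// so both are implemented by delegating to the raw string.
impl<T> PartialEq for Id<T> {
    fn eq(&self, other: &Self) -> bool {
        self.raw == other.raw
    }
}
impl<T> Eq for Id<T> {}
impl<T> Hash for Id<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.raw.as_str().hash(state)
    }
}
impl<T> Borrow<str> for Id<T> {
    fn borrow(&self) -> &str {
        &self.raw
    }
}

pub fn build_stops() -> HashMap<Id<Stop>, Stop> {
    let mut stops = HashMap::new();
    stops.insert(
        Id { raw: "stop1".to_owned(), _marker: PhantomData },
        Stop { name: "Main Street".to_owned() },
    );
    stops
}

pub fn stop_name<'a>(stops: &'a HashMap<Id<Stop>, Stop>, raw_id: &str) -> Option<&'a str> {
    // The `&str` is used as the lookup key directly: no `String` or `Id` is built.
    stops.get(raw_id).map(|stop| stop.name.as_str())
}
```

Without the `Borrow<str>` impl, each lookup by raw identifier would first have to build an owned `Id` (and thus a `String`), which is where the `to_owned()` calls came from.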

antoine-de · 2022-05-15T14:41:46Z

I think the API is quite nice now. I updated an example to show how we can use the stop_id now:

    // you can access a stop by a &str
    let _ = gtfs.get_stop_by_raw_id("stop1")?;

    // or with a typed id if you have one, here taken from a trip's first stop time
    let trip = gtfs.trips.get("trip1")?;
    let stop_id: &gtfs_structures::Id<gtfs_structures::Stop> =
        &trip.stop_times.first()?.stop;

    // if no stops have been removed from the gtfs, you can safely access a stop by its id
    let s = &gtfs.stops[stop_id];
    println!("stop name: {}", &s.name);

    // if some removals have been done, you can also use this method to get an Option<Stop>
    let s = gtfs.get_stop(stop_id)?;
    println!("stop description: {}", &s.description);

    // or you can access it via `stops.get`
    let s = gtfs.stops.get(stop_id)?;
    println!("stop location type: {:?}", &s.location_type);

    // and `stops.get_mut` gives mutable access
    let mut s = gtfs.stops.get_mut(stop_id)?;
    s.code = Some("code".into());
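
The difference between the indexing form and the `get` form can be sketched as follows. This is only an assumption about how such a collection could be written (the `Collection` type and its methods are hypothetical and simplified): indexing panics if the object was removed, while `get` and `get_mut` return an `Option`.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;
use std::ops::Index;

pub struct Id<T> {
    raw: String,
    _marker: PhantomData<fn() -> T>,
}

pub struct Collection<T> {
    items: HashMap<String, T>,
}

impl<T> Collection<T> {
    pub fn new() -> Self {
        Collection { items: HashMap::new() }
    }

    pub fn insert(&mut self, raw: &str, item: T) -> Id<T> {
        self.items.insert(raw.to_owned(), item);
        Id { raw: raw.to_owned(), _marker: PhantomData }
    }

    pub fn remove(&mut self, id: &Id<T>) -> Option<T> {
        self.items.remove(&id.raw)
    }

    /// Safe access: `None` if the object has been removed since the id was created.
    pub fn get(&self, id: &Id<T>) -> Option<&T> {
        self.items.get(&id.raw)
    }

    pub fn get_mut(&mut self, id: &Id<T>) -> Option<&mut T> {
        self.items.get_mut(&id.raw)
    }
}

/// Convenient access for the common case where nothing was removed.
/// Panics if the id points to a removed object.
impl<'a, T> Index<&'a Id<T>> for Collection<T> {
    type Output = T;

    fn index(&self, id: &'a Id<T>) -> &T {
        self.items.get(&id.raw).expect("id points to a removed object")
    }
}
```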

antoine-de · 2022-10-06T11:35:35Z

Note: One caveat of this approach is that it would enforce ALL ids to be valid. I think this could be quite a good thing, but it will break the reading of many datasets out there.

RawGTFS reading would still work, but not GTFS anymore.

Does anybody have some thoughts about this?

fchabouis · 2022-10-06T13:42:17Z

Does it mean that if an ID is invalid, the GTFS validator would output a fatal error?
I would not like that, because all other errors and warnings would be hidden by this fatal error. I'd prefer an error on ids to be a simple error.

antoine-de · 2022-10-09T11:45:20Z

Hum, this is a library to read a GTFS. The change of id would however lead to the GTFS not loading (since an `Id<Stop>` cannot be invalid). It would still be possible to read a RawGTFS though, since it would still have raw String ids.

For transport.data.gouv.fr's validator use case, you'd keep all the warnings/errors on the RawGTFS, but lose the errors needing the GTFS (and that's a lot 😱 https://github.com/etalab/transport-validator/blob/9a04e6/src/validate.rs#L65-L78)

I agree it's a pity since it would make a lot of datasets unreadable, but at the same time foreign keys are quite important, and I don't feel good about having Option<Id<T>> (or baking an invalid state into the Id) everywhere to express the fact that they can be invalid.
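
To illustrate the trade-off, here is a rough sketch of both behaviours on the raw, string-based level. Everything in it is a simplified assumption (the `RawGtfs` and `RawStopTime` structs below are toy versions, and `dangling_stop_refs` and `check_strict` are hypothetical helpers), not the crate's real types or loading code.

```rust
use std::collections::HashSet;

// Hypothetical, heavily simplified raw structures with plain `String` ids.
pub struct RawStopTime {
    pub trip_id: String,
    pub stop_id: String,
}

pub struct RawGtfs {
    pub stop_ids: Vec<String>,
    pub stop_times: Vec<RawStopTime>,
}

/// Tolerant check: returns every `(trip_id, stop_id)` pair whose `stop_id`
/// does not reference a known stop, so all problems can be reported at once.
pub fn dangling_stop_refs(raw: &RawGtfs) -> Vec<(&str, &str)> {
    let known: HashSet<&str> = raw.stop_ids.iter().map(String::as_str).collect();
    raw.stop_times
        .iter()
        .filter(|st| !known.contains(st.stop_id.as_str()))
        .map(|st| (st.trip_id.as_str(), st.stop_id.as_str()))
        .collect()
}

/// Strict check, in the spirit of ids that cannot be invalid:
/// the first dangling reference aborts with an error.
pub fn check_strict(raw: &RawGtfs) -> Result<(), String> {
    let dangling = dangling_stop_refs(raw);
    if let Some((trip, stop)) = dangling.first() {
        return Err(format!("trip {}: unknown stop {}", trip, stop));
    }
    Ok(())
}
```

A validator could run the tolerant check on the raw data to list all broken foreign keys, while the typed `GTFS` would only be built when the strict check passes.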

Any ideas / thoughts on this?

fchabouis · 2023-02-02T16:01:25Z

In our case, the business requirements are to make the validator "tolerant" (see this comment) so that we get as much validation information as possible, and not just a single fatal error...

antoine-de marked this pull request as draft March 23, 2022 16:55

antoine-de added 3 commits March 23, 2022 23:04

try to encapsulate all behind a collection

3e175a8

This way an Id must be looked up before being created

Implement borrow and remove lots of to_owned() 🎉

974b9a7

implements AsRef<str> for easier composition

5bf0b27

implements various iterators

cec259d

antoine-de force-pushed the id_map branch from aebe2c1 to b1fb907 Compare May 15, 2022 14:35

antoine-de added 5 commits May 15, 2022 16:39

implements index on collection

7e146eb

rename methods

ae424e1

add a mutable getter on stops

d07f310

add a complete example on stops handling

227cc5e

remove useless comment

21da328

antoine-de force-pushed the id_map branch from c66d26d to 21da328 Compare May 15, 2022 14:42

antoine-de mentioned this pull request Oct 6, 2022

routes.agency_id: add check etalab/transport-validator#146

Open
