2012.06.07 Weekly Check in

<demory> hello everyone
<demory> looks like we're mostly here, let's get started
<demory> my check-in: getting ready to finally launch new OTP site; should be live tomorrow. You can still see it in launch-ready form at http://otp.dev.openplans.org/
<demory> Also, upgraded Deployer to 0.7.6 and making some workflow enhancements to help automate the graph-rebuilding process when upgrading existing deployments
<mele> looks great
- grant_h (d819d2ca@gateway/web/freenode/ip.216.25.210.202) has joined #opentripplanner
<demory> thanks!
<FrankP> Working on map tiles here using OSM. OTP-wise, I'm curious about the snapping issue for bike routing (talked briefly yesterday, but not sure if there's a path forward ... for my part, I can file a ticket if needed).
<novalis_> FrankP, a ticket wouldn't hurt
<FrankP> sounds good...will do.
<novalis_> I do intend to look into it, but I have been pulled off onto another project for a bit
<novalis_> I've also fixed a few other bugs over the past week
<FrankP> np. this is medium priority for us (both are Bibi's favorite demo trips for elevation, so she pointed it out)
<novalis_> demory, do we want a link to some of the blog posts we have written about OTP? There's James's post, and mine: http://openplans.org/2012/06/04/b-roll-david-solves-the-plaza-problem-with-help-from-de-berg-and-matt-conway/
<kpw> demory, novalis_dt: i think we should cross post those
<kpw> they're great posts
<kpw> and will be of interest to folks coming to the site
<demory> yeah, i like that idea
<FrankP> novalis_, is that the back of Nick B-S's head in the graphics for the b-roll blog?
<novalis_> Looks like it might be, but I don't recognize that desk
<FrankP> (sorry to focus on such trivialities...my OCD kicking in...but agree good to have a link to the blog roll--didn't see that writeup till just now)
<novalis_> BTW, Nick B-S is now running this thing called Hacker School -- a 3-month program for people to improve their programming. My wife quit her job as a lawyer, went to hacker school, and is now working as a programmer.
<demory> cool!
<demory> about the blog - once the new site's live let's talk about timing of some posts, both new and cross-posts -- I have some others I want to write too
<novalis_> OK, excellent.
<abyrd> I had a protracted fight with Jenkins yesterday and we should now have post-commit build trigger and emails when tests break.
<novalis_> Yay!
<abyrd> Also evaluating some optimizations for the length-constrained path search for the inference engine.
<abyrd> While discussing with him yesterday, on a hunch I asked about the isometric embedding approach I have experimented with in the past. It turns out that it may be equivalent to some of his previous work in finance (!).
<novalis_> Whoa.
<abyrd> Will be looking into that as well, need to reassure myself it's not a horribly complicated way to express the ALT algorithm.
<abyrd> Also building and plotting kernel density estimations of OTP response time distributions for the automated profiling framework.
<kpw> nice!
<abyrd> if you haven't seen a violin plot, check it out: http://1.bp.blogspot.com/_lq_qzMa71Ns/SrJSildeV9I/AAAAAAAAAlw/1z16wN46WZc/s400/violin.png
<novalis_> These are what?
<abyrd> I want to display a big row of these with commit ids under them
<novalis_> (other than cool-looking)
<abyrd> estimated probability density of trip planning response time (that's the violin envelope) with quartiles, median, and outliers overlaid.
<kpw> that's really cool -- do you have otp data for this yet?
<novalis_> Neat.
<abyrd> yeah I was using it to make sure that transposing the stop time tables did not affect response time
<kpw> that's awesome
<kpw> something we can move into jenkins eventually, right?
<kpw> i'd love to see this as part of the build process
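
A minimal sketch of the numbers behind one violin, assuming a list of response times in seconds for a single commit. The envelope is a Gaussian kernel density estimate and the overlay is quartiles, median, and outliers. The sample data, the bandwidth, and the 1.5 x IQR outlier rule are illustrative assumptions, and the code is pure standard-library Python rather than the scipy/matplotlib stack an actual result browser would likely use.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the probability density at x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def summarize(samples):
    """Quartiles plus outliers, using the common 1.5 * IQR fence (an assumption)."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [s for s in samples if s < low or s > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up response times (seconds) for one hypothetical commit.
response_times = [0.21, 0.25, 0.26, 0.30, 0.31, 0.33, 0.40, 0.42, 0.55, 2.80]

density = gaussian_kde(response_times, bandwidth=0.1)
summary = summarize(response_times)
```

Evaluating `density` over a grid of x values gives the violin outline for that commit; repeating this per commit id gives the row of violins described above.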
<demory> abyrd, how are you capturing the OTP response times?
<demory> that actually gets into something else I wanted to discuss, which is performance monitoring
<abyrd> it would be in the maven build, integration test section, so anytime anyone runs a build it could post a result to the database
<abyrd> or it could be standalone
<novalis_> I would prefer to have tests done on a consistent platform
<novalis_> But it might be nice to be able to run samples across my own machine
<abyrd> and to get a reliable distribution we need to run a ton of trips. we don't want to wait an hour for every maven build or tempt people to -DskipTests
<novalis_> Maybe an overnight build that runs a dozen or so known-good commits so that one could have a baseline for one's own dev work
<abyrd> we could run all the release tags
<novalis_> Yeah, that would be neat
<abyrd> I need to get a repeatable set of endpoints in a database though. Have just been using pseudorandom coordinates with gaussian distribution around the center of the city, and too many are in useless places
<abyrd> I think it makes more sense to just use vertex density as a proxy for population density
<abyrd> I should just make a graphbuilder to spit out a set of coordinates near random streets and save those
<novalis_> abyrd, I actually wrote a k-means clustering algorithm
<novalis_> Which is based on transit density
<novalis_> But we could break it out and generalize it
<novalis_> to handle clusters of street density
<novalis_> Which might be nice.
<abyrd> demory, just sending a request and counting how long until the response comes back. That way it's exercising the API as well.
<novalis_> That also automatically gives us a center/periphery metric
<demory> abyrd, ok
<abyrd> novalis_ saw that for the metadata center response. cool, especially the center-periphery possibility.
<abyrd> but I think to choose the endpoints we can just take the first N street vertices out of a shuffled list.
<novalis_> Yeah, that might work too.
<abyrd> that way we'll get some non-served areas, but much less than in dense transit-servable areas
<abyrd> we want some trips to fail for testing purposes
<novalis_> Yep
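
A rough sketch of the "first N street vertices out of a shuffled list" idea, assuming the street vertices are available as a list of (lat, lon) tuples. The vertex data, the fixed seed, and the helper names are hypothetical; in OTP this would more likely live in a Java graphbuilder module.

```python
import random


def sample_endpoints(street_vertices, n, seed=42):
    """Shuffle a copy of the vertices with a fixed seed and keep the first n.

    Denser street areas contribute more vertices, so they are sampled more
    often, while a fixed seed keeps the endpoint set repeatable across runs.
    """
    rng = random.Random(seed)
    shuffled = list(street_vertices)
    rng.shuffle(shuffled)
    return shuffled[:n]


def pair_trips(endpoints):
    """Pair consecutive endpoints into (origin, destination) trips."""
    return list(zip(endpoints[0::2], endpoints[1::2]))


# Hypothetical vertices: a dense grid plus two outlying points.
vertices = [
    (45.50 + 0.001 * i, -122.60 + 0.001 * j)
    for i in range(20)
    for j in range(20)
]
vertices += [(45.90, -122.10), (45.10, -123.00)]

endpoints = sample_endpoints(vertices, n=50)
trips = pair_trips(endpoints)
```

Saving `endpoints` once (to a file or database table) gives the stable set of trips that every profiled commit can be run against.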
<abyrd> demory, what were your other concerns about performance monitoring
<demory> yeah, this gets back to the conversation yesterday about calculating memory requirements -- it would be helpful to have a better idea of the load on the server at any given time
<demory> basically, this old ticket I just dug up (assigned to me actually) -- https://github.com/openplans/OpenTripPlanner/issues/164
<demory> though in addition to capturing the actual request info, I'd also like to have info like processing time & memory usage
<demory> is there a way to calculate how much heap space a single request uses? can we just compare the free memory before and after?
<abyrd> not really because part of that is SPT (which is non-compressible) and part of it is trash states in the eden space that will be tossed out
<demory> ok
<abyrd> I'm not saying no way to measure, just that we can't simply subtract the two values
<abyrd> before/after
<abyrd> this could be another job for a profiling run, varying heap space
<abyrd> in fact I suppose we could do that systematically to each commit we profile
<demory> ok. well with or w/o memory tracking, it would still be helpful to be able to track how many requests are coming in
<abyrd> that seems like something the servlet container might provide
<demory> true
<demory> though logging trip plan requests in general is something we've been talking about for a while
<demory> so i may try to make some progress on #164 over the next week or so, this seems like a good time for it
<demory> and i might be able to re-use some of what kpw did for cibi.me
<abyrd> something as simple as a separate log file with query params (especially origin, destination, mode) request time and response time would be very useful to have
<abyrd> in fact I want to store the same kind of stuff in the profiling database. maybe these are the same problem.
<demory> so you think a log file rather than writing to a database?
<abyrd> what if we have a 'profile' parameter in the trip planning API that will return additional information in the response, including computation time. that way network latency does not affect our stats if the profiler happens to be running on a different machine than OTP being profiled.
<abyrd> the component that captures the computation time figure (within the OTP api webapp presumably) could either glue that additional data into the server response, save it to local log file/db or both
<abyrd> so we can get identical info on "real life" requests and synthetic profiling requests
<novalis_> We just have to make sure the profiled server is not running anything else at the time
<demory> ok so the data would only be captured to a local file/db if the call to the api has profiling enabled?
<abyrd> novalis_, I was planning to run it on the CI server. if we actually triggered it from inside Jenkins with only one build executor, it would prevent any other builds from happening while it was running
<novalis_> OK
<abyrd> demory, I was thinking it would be an application-context switch
<abyrd> to log all incoming requests
<demory> ok
<demory> that makes sense
<novalis_> brb
<abyrd> the api query param would be for a remote user who was intentionally fabricating profiling trips
<abyrd> and wants the info whether or not the server has it enabled
<abyrd> I can't see any harm in reporting the computation time in the api response
<demory> got it
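
One way the two switches discussed here could fit together, as a sketch only: a per-request `profile` query parameter that adds computation time to the response, and a server-side flag (standing in for the application-context setting) that logs every incoming request. The function names, the `computationTimeMs` field, and the stand-in planner are all hypothetical and are not part of the OTP API.

```python
import time


def plan_trip(params):
    """Stand-in for the real trip planner (hypothetical)."""
    return {"itineraries": [], "from": params["fromPlace"], "to": params["toPlace"]}


def handle_plan_request(params, log_all_requests=False, sink=None, planner=plan_trip):
    """Time the planner call server-side so network latency is excluded."""
    start = time.perf_counter()
    response = planner(params)
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    # Per-request switch: a remote profiler asks for timing in the response.
    if params.get("profile") == "true":
        response["computationTimeMs"] = elapsed_ms

    # Server-side switch: log every request regardless of the query param.
    if log_all_requests and sink is not None:
        sink.append({"params": dict(params), "computationTimeMs": elapsed_ms})

    return response


request_log = []
profiled = handle_plan_request(
    {"fromPlace": "45.52,-122.68", "toPlace": "45.50,-122.65", "profile": "true"},
    log_all_requests=True,
    sink=request_log,
)
plain = handle_plan_request({"fromPlace": "45.52,-122.68", "toPlace": "45.50,-122.65"})
```

Because the timing is captured in one place, real-life requests and synthetic profiling requests produce the same kind of record.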
<abyrd> does this seem like a good way to do it to all of you? what other info would we want to log for later analysis?
<abyrd> for profiling and quality control it would be nice to have some kind of high-level summary of the response, to fuzzily observe how many of them change from one version to another
<demory> you mentioned the key ones earlier, though if memory usage is doable that could be helpful too
<demory> though I don't see why we shouldn't log all the trip query params, in addition to start/end/mode
<abyrd> i guess we already have duration, number of transfers, walk distance, etc. that gives a pretty good indication
<abyrd> of the kind of response
<demory> oh do you envision logging the response as well as the request?
<abyrd> definitely
<abyrd> I want them in a database so i can see when commits cause responses to change
<demory> right
<abyrd> Hmm, i'd really like these two things to merge into one... they seem so related.
<abyrd> I suppose if we assume the OTP being profiled is on the same machine as the profiling request generator that makes it easier.
<abyrd> I wonder if the java database API has a CSV / flat file connector
<demory> you mean to log the request/response info to a flat file?
<abyrd> yeah, in the profiling case we tell it to connect to an existing DBMS and store all the request/result info there
<abyrd> in the everyday use case we switch off result fields and connect it to a text file
<abyrd> how does that sound?
<demory> would we ever want to write directly to a db even in the "everyday use" case?
<abyrd> sure, why not
<demory> i'm thinking of a high-volume production deployment..
<abyrd> then you can just ask the db for some stats on, say, all trips for a particular graph over the last day
<demory> but yeah, having the option of just writing to a log file in that case makes sense
<abyrd> or to single out all trips that took over 2 sec to respond
<abyrd> but if you don't want to mess with a db you just connect the thing to a text file
<abyrd> though it's just about as easy to connect it to an sqlite file
<demory> right
<demory> so all of that would be specified in application-context?
<abyrd> then you can use a standard web database frontend to examine the data
<abyrd> I think you'd just add a logger module and wire it up
<abyrd> turn on/off profiling mode
<abyrd> that's about it I think
<abyrd> yes, in app-context
<demory> ok, sounds good
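
For the everyday "just connect it to a text file" case, a logger could be as small as the sketch below. The column names and the `CsvRequestLog` class are assumptions for illustration; a real OTP logger module would be written in Java and wired up in the application context.

```python
import csv
import io

# Hypothetical columns: the key fields mentioned in the discussion.
FIELDS = ["requested_at", "origin", "destination", "mode", "response_ms"]


class CsvRequestLog:
    """Append one CSV row per trip planning request to a text stream."""

    def __init__(self, stream):
        self._writer = csv.DictWriter(stream, fieldnames=FIELDS)
        self._writer.writeheader()

    def record(self, **row):
        self._writer.writerow(row)


# An in-memory stream stands in for the log file here.
buffer = io.StringIO()
log = CsvRequestLog(buffer)
log.record(
    requested_at="2012-06-07T15:00:00",
    origin="45.52,-122.68",
    destination="45.50,-122.65",
    mode="TRANSIT,WALK",
    response_ms=412.0,
)

# Reading the log back for later analysis.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
```

Swapping the stream for a database-backed sink with the same `record` method would cover the profiling case without changing the code that calls it.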
<demory> so how should we move forward on this? i can take a stab at the logging component, tho prob won't get to it until the weekend or early next week
<demory> i need to wrap up some deployer stuff first
<abyrd> Yes, if you're interested in working on it do take a stab. I see no rush to get it done this week.
<abyrd> Another component would be a result browser
<abyrd> web interface that will let you issue queries and draw you violin plots :)
<demory> right
<abyrd> i think that part can be in Python because there we have scipy to do the kernel estimations and all that
<abyrd> and matplotlib
<abyrd> third component is endpoint generator and categorizer
<abyrd> that would just be run once to get a stable set of trips, but still needs to exist
<demory> OK
<abyrd> and should be in java (graphbuilder module, why not) to have easy access to internals
<demory> well I'll start w/ the basic logging component. have we ever written a db schema for requests/results that could be re-used?
<demory> anything from the nelson-nygaard project?
<abyrd> NN was just CSV output
<demory> ok
<abyrd> I made a simple one a while back but it was just a working draft
<demory> well if you have that handy it could be a good starting point
<demory> not a big deal though.. again, i won't get to this for a couple days anyway
<abyrd> https://github.com/abyrd/OTPProfiler/blob/master/resultsdb.py
<demory> great, thanks
<abyrd> although looking at it now it's near useless,
<abyrd> was more of a brainstorm than anything
<demory> yeah
<demory> i guess i'd want to rewrite this in java anyway
<demory> if we'd be doing this from within OTP
<abyrd> There is the issue of where the database would be initialized
<abyrd> can just be an SQL script right?
<demory> yeah, would it be up to the user to init the db before starting OTP?
<abyrd> the alternative is setting up otp with rights to create/drop tables on your dbms which is probably not better
<abyrd> in the case that it's a flat file output we would only be writing so I assume there is no setup
<abyrd> only people doing serious deployments would have to mess with initializing a database
<demory> that makes sense
<abyrd> though it would be pretty cool if it would set up an sqlite db for you
<abyrd> meaning OTP would create the tables if the file did not exist
<demory> well could we have an sql script that could either be used by an admin to set up the db themselves, or by OTP for sqlite?
<abyrd> i think so. we could shoot for really generic SQL that doesn't use any vendor-specific extensions
<abyrd> just a table schema
<demory> right
<abyrd> though we would want to index some columns
<abyrd> the browser could do that
<abyrd> no, i guess the browser should only have read perms
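
A sketch of the shared-schema idea, assuming a single hypothetical `requests` table: the same SQL script could be run by an admin against their own DBMS, or applied automatically when the sqlite file does not exist yet. Table and column names are made up, and the index is created by the schema script so that the result browser only needs read access.

```python
import os
import sqlite3

# Hypothetical schema, kept to plain SQL with no vendor-specific extensions.
SCHEMA_SQL = """
CREATE TABLE requests (
    id INTEGER PRIMARY KEY,
    graph VARCHAR(64),
    origin VARCHAR(64),
    destination VARCHAR(64),
    mode VARCHAR(64),
    requested_at VARCHAR(32),
    response_ms REAL
);
CREATE INDEX idx_requests_response_ms ON requests (response_ms);
"""


def open_results_db(path):
    """Open the sqlite file, creating the tables only if the file is new."""
    is_new = not os.path.exists(path)
    conn = sqlite3.connect(path)
    if is_new:
        conn.executescript(SCHEMA_SQL)
    return conn


def slow_trips(conn, threshold_ms=2000.0):
    """Single out all trips that took longer than the threshold to respond."""
    cursor = conn.execute(
        "SELECT origin, destination, response_ms FROM requests "
        "WHERE response_ms > ? ORDER BY response_ms DESC",
        (threshold_ms,),
    )
    return cursor.fetchall()
```

Calling `open_results_db` a second time on the same path leaves the existing tables and data untouched.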
- bdiu has quit (Remote host closed the connection)
<abyrd> OK I need to run. Will save the chat for reference. I guess the result browser and pseudorandom endpoint generator are my parts of this.
<demory> yeah me too
<demory> i'll post this as an extended part of the check in
