Improve performance for getSamplingFeatureDatasets #144

Elijahwalkerwest · 2018-01-30T19:10:44Z

I have already used eager loading for the SQLAlchemy query in the endpoint, but because of the amount of related data that has to be gathered, the current structure becomes incredibly slow as more SamplingFeatures are queried. It needs to be usable for a query of all SamplingFeatures of a given type.

horsburgh · 2018-01-30T19:51:54Z

@emiliom and @lsetiawan - this is the issue I was referring to on the phone today. I'm not sure it should be titled "improve performance"; it may instead require rethinking the current approach.

emiliom · 2018-01-30T19:55:01Z

Thanks @Elijahwalkerwest and @horsburgh

@lsetiawan and I may or may not have time this week (after today) to give input. But if you can point us to the branch and code where @Elijahwalkerwest's current / latest implementation can be seen, that'll be helpful. Again, no promises!

Elijahwalkerwest · 2018-01-30T21:17:02Z

The most recent work should be in the development branch. I've tried a few iterations to resolve this, but none of them has worked so far, so they aren't on GitHub yet.

Added by Emilio, for convenience: https://github.com/ODM2/ODM2PythonAPI/blob/development/odm2api/ODM2/services/readService.py#L969

emiliom · 2018-02-01T17:08:27Z

Don and I have taken a quick look. Unfortunately, we probably won't have time to help with this until after next week (I'm co-organizing a hands-on workshop late next week).

In the meantime, two things come to mind:

1. If you have developed an equivalent SQL SELECT statement, run it directly on the ODM2 database, and found it reasonably fast (I assume it is not directly translatable to SQLAlchemy queries), please share it here.
2. In general (and based in part on some quick looks from Don), getSamplingFeatureDatasets seems to try to get a ton of information all at once, so the output is inevitably large. We wonder whether it needs to be broken up into smaller, complementary functions.
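A minimal sketch of what such a split might look like. It assumes a heavily simplified, hypothetical schema in an in-memory SQLite database (the featuredatasets table stands in for the real featureactions/results/datasetsresults chain), assumes the feature type lives in a column like ODM2's SamplingFeatureTypeCV, and uses hypothetical function names. Each function runs one set-based query for all requested SamplingFeatures and returns a dict keyed by SamplingFeatureID, so a caller can combine only the pieces it needs.

```python
import sqlite3
from collections import defaultdict


def make_toy_db():
    """In-memory stand-in for ODM2; the featuredatasets table is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE samplingfeatures (
            SamplingFeatureID INTEGER PRIMARY KEY,
            SamplingFeatureTypeCV TEXT);
        CREATE TABLE sites (
            SamplingFeatureID INTEGER PRIMARY KEY, Latitude REAL, Longitude REAL);
        -- collapses the featureactions/results/datasetsresults chain into one table
        CREATE TABLE featuredatasets (SamplingFeatureID INTEGER, DataSetID INTEGER);
        INSERT INTO samplingfeatures VALUES (1, 'Site'), (2, 'Site'), (3, 'Specimen');
        INSERT INTO sites VALUES (1, 41.7, -111.8), (2, 40.8, -111.9);
        INSERT INTO featuredatasets VALUES (1, 10), (1, 11), (3, 12);
    """)
    return conn


def feature_ids_of_type(conn, type_cv):
    """One query: IDs of every SamplingFeature with the given type."""
    rows = conn.execute(
        "SELECT SamplingFeatureID FROM samplingfeatures "
        "WHERE SamplingFeatureTypeCV = ? ORDER BY SamplingFeatureID",
        (type_cv,),
    )
    return [r[0] for r in rows]


def _placeholders(ids):
    # one "?" per ID, e.g. "?, ?, ?" for three IDs
    return ", ".join("?" for _ in ids)


def datasets_by_feature(conn, ids):
    """One query: DataSetIDs grouped by SamplingFeatureID."""
    out = defaultdict(list)
    if not ids:
        return out
    sql = ("SELECT SamplingFeatureID, DataSetID FROM featuredatasets "
           f"WHERE SamplingFeatureID IN ({_placeholders(ids)}) "
           "ORDER BY SamplingFeatureID, DataSetID")
    for sfid, dsid in conn.execute(sql, list(ids)):
        out[sfid].append(dsid)
    return out


def locations_by_feature(conn, ids):
    """One query: (Latitude, Longitude) keyed by SamplingFeatureID."""
    if not ids:
        return {}
    sql = ("SELECT SamplingFeatureID, Latitude, Longitude FROM sites "
           f"WHERE SamplingFeatureID IN ({_placeholders(ids)})")
    return {sfid: (lat, lon) for sfid, lat, lon in conn.execute(sql, list(ids))}


def feature_summaries(conn, type_cv):
    """Combine the small functions: three queries no matter how many features match."""
    ids = feature_ids_of_type(conn, type_cv)
    datasets = datasets_by_feature(conn, ids)
    locations = locations_by_feature(conn, ids)
    return {sfid: {"datasets": datasets.get(sfid, []),
                   "location": locations.get(sfid)}
            for sfid in ids}


if __name__ == "__main__":
    print(feature_summaries(make_toy_db(), "Site"))
    # {1: {'datasets': [10, 11], 'location': (41.7, -111.8)}, 2: {'datasets': [], 'location': (40.8, -111.9)}}
```

The point of the design is that the number of queries stays fixed as the number of SamplingFeatures grows. For very long ID lists, pushing the type filter into each query (for example as a subquery) may be preferable to a long IN list.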

Elijahwalkerwest · 2018-02-13T01:20:04Z

from sqlalchemy import text

# Triple quotes allow a multi-line SQL string; :sfid is a bound parameter,
# so the value of sfid is passed in instead of the literal name "sfid".
result = self._session_factory.engine.execute(text("""
SELECT * FROM odm2.samplingfeatures AS SF
LEFT JOIN odm2.results AS R ON R.FeatureActionID IN (
    SELECT FeatureActionID
    FROM odm2.featureactions AS FA
    WHERE FA.SamplingFeatureID = SF.SamplingFeatureID
)
LEFT JOIN odm2.datasets AS DS ON DS.DatasetID IN (
    SELECT DatasetID
    FROM odm2.datasetsresults AS DR
    WHERE R.ResultID = DR.ResultID
)
WHERE SF.SamplingFeatureID = :sfid
"""), sfid=sfid)

I got SQLAlchemy to run a raw query; this is the query I'm using. Does this look right to you guys?
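One possible rewrite, offered as a hedged sketch rather than a verified fix. The `LEFT JOIN ... ON x IN (correlated subquery)` form probably makes the database evaluate a subquery for every candidate results and datasets row, whereas chaining equality joins through featureactions and datasetsresults lets the planner use ordinary join strategies and indexes. The snippet checks, on hypothetical toy data in an in-memory SQLite database (an attached schema named odm2 keeps the table names identical; only key columns are modeled), that both forms return the same (SamplingFeatureID, ResultID, DatasetID) rows, including a feature with no actions, an action with no results, and a result with no dataset.

```python
import sqlite3

# Toy data (hypothetical): feature 1 has two actions, one without results;
# feature 2 has no actions; feature 3 has a result that belongs to no dataset.
SETUP = """
CREATE TABLE odm2.samplingfeatures (SamplingFeatureID INTEGER PRIMARY KEY);
CREATE TABLE odm2.featureactions (FeatureActionID INTEGER PRIMARY KEY,
                                  SamplingFeatureID INTEGER);
CREATE TABLE odm2.results (ResultID INTEGER PRIMARY KEY, FeatureActionID INTEGER);
CREATE TABLE odm2.datasets (DatasetID INTEGER PRIMARY KEY);
CREATE TABLE odm2.datasetsresults (BridgeID INTEGER PRIMARY KEY,
                                   DatasetID INTEGER, ResultID INTEGER);
INSERT INTO odm2.samplingfeatures VALUES (1), (2), (3);
INSERT INTO odm2.featureactions VALUES (100, 1), (101, 1), (300, 3);
INSERT INTO odm2.results VALUES (1000, 100), (1001, 100), (3000, 300);
INSERT INTO odm2.datasets VALUES (1), (2);
INSERT INTO odm2.datasetsresults VALUES (1, 1, 1000), (2, 1, 1001), (3, 2, 1001);
"""

# The query from this thread, reduced to its key columns.
ORIGINAL = """
SELECT SF.SamplingFeatureID, R.ResultID, DS.DatasetID
FROM odm2.samplingfeatures AS SF
LEFT JOIN odm2.results AS R ON R.FeatureActionID IN (
    SELECT FeatureActionID FROM odm2.featureactions AS FA
    WHERE FA.SamplingFeatureID = SF.SamplingFeatureID)
LEFT JOIN odm2.datasets AS DS ON DS.DatasetID IN (
    SELECT DatasetID FROM odm2.datasetsresults AS DR
    WHERE R.ResultID = DR.ResultID)
ORDER BY 1, 2, 3
"""

# Same rows via equality joins; the inner JOINs inside each derived table keep
# an action without results (or a bridge row without a dataset) from adding rows.
REWRITTEN = """
SELECT SF.SamplingFeatureID, RF.ResultID, DD.DatasetID
FROM odm2.samplingfeatures AS SF
LEFT JOIN (
    SELECT R.ResultID, FA.SamplingFeatureID
    FROM odm2.results AS R
    JOIN odm2.featureactions AS FA ON FA.FeatureActionID = R.FeatureActionID
) AS RF ON RF.SamplingFeatureID = SF.SamplingFeatureID
LEFT JOIN (
    SELECT DR.ResultID, DS.DatasetID
    FROM odm2.datasetsresults AS DR
    JOIN odm2.datasets AS DS ON DS.DatasetID = DR.DatasetID
) AS DD ON DD.ResultID = RF.ResultID
ORDER BY 1, 2, 3
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    # attach a schema named odm2 so the table names match the thread's query
    conn.execute("ATTACH DATABASE ':memory:' AS odm2")
    conn.executescript(SETUP)
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(run(ORIGINAL) == run(REWRITTEN))  # True
```

On the real database (assuming PostgreSQL), adding the same `WHERE SF.SamplingFeatureID ...` filter to the rewritten form and comparing `EXPLAIN ANALYZE` output for both versions would show whether the planner actually benefits.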

Elijahwalkerwest · 2018-03-09T19:50:30Z

Here is an updated version of the SQL query. It currently works, but it is VERY SLOW.

I'm also not sure whether it retrieves all the data needed. These are the fields the endpoint needs:

DataSetID,
DataSetTitle,
DataSetAbstract,
ResultTypeCV,
SampledMediumCV,
VariableCode,
VariableNameCV,
startDate: minDate,
endDate: maxDate,
siteType: SiteTypeCV,
latitude: samplingFeature.related_features.Latitude,
longitude: samplingFeature.related_features.Longitude,
SamplingFeatureCode,
SamplingFeatureName
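If a single query returns one flat row per result, the fields above could be assembled in one pass in Python. A minimal sketch, assuming each row is a dict whose keys mirror the field names listed above (with latitude and longitude from the related feature already joined in) plus two hypothetical keys, BeginDateTime and EndDateTime, holding each row's time span. Where minDate and maxDate actually come from in ODM2 is not settled in this thread, so that part is an assumption. Rows that agree on every descriptive field are merged, and startDate/endDate become the minimum and maximum over the merged rows.

```python
from datetime import datetime

# Descriptive fields from the list above (rows that agree on all of them are merged)
KEY_FIELDS = (
    "DataSetID", "DataSetTitle", "DataSetAbstract", "ResultTypeCV",
    "SampledMediumCV", "VariableCode", "VariableNameCV", "SiteTypeCV",
    "Latitude", "Longitude", "SamplingFeatureCode", "SamplingFeatureName",
)
# Output names used in the list above for the renamed fields
RENAME = {"SiteTypeCV": "siteType", "Latitude": "latitude", "Longitude": "longitude"}


def summarize(rows):
    """Merge flat rows into one record per distinct key, keeping min/max dates.

    BeginDateTime and EndDateTime are hypothetical per-row keys.
    """
    merged = {}
    for row in rows:
        key = tuple(row[f] for f in KEY_FIELDS)
        rec = merged.get(key)
        if rec is None:
            # first row for this key: copy the descriptive fields and its dates
            rec = {RENAME.get(f, f): row[f] for f in KEY_FIELDS}
            rec["startDate"] = row["BeginDateTime"]
            rec["endDate"] = row["EndDateTime"]
            merged[key] = rec
        else:
            # later rows only widen the date range
            rec["startDate"] = min(rec["startDate"], row["BeginDateTime"])
            rec["endDate"] = max(rec["endDate"], row["EndDateTime"])
    return list(merged.values())


if __name__ == "__main__":
    base = {f: "demo" for f in KEY_FIELDS}
    rows = [dict(base, BeginDateTime=datetime(2017, 1, 5), EndDateTime=datetime(2017, 2, 1)),
            dict(base, BeginDateTime=datetime(2016, 12, 1), EndDateTime=datetime(2017, 1, 9))]
    rec = summarize(rows)[0]
    print(rec["startDate"], rec["endDate"])  # 2016-12-01 00:00:00 2017-02-01 00:00:00
```

The same minimum and maximum could instead be computed in SQL with GROUP BY and MIN/MAX; doing it in Python simply keeps the query itself a plain join.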

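from sqlalchemy import bindparam, text

# expanding=True renders "IN :sf_ids" as one placeholder per list element
# (expanding bind parameters need SQLAlchemy 1.2 or later)
query = text("""
SELECT * FROM odm2.samplingfeatures AS SF
LEFT JOIN odm2.results AS R ON R.FeatureActionID IN (
    SELECT FeatureActionID
    FROM odm2.featureactions AS FA
    WHERE FA.SamplingFeatureID = SF.SamplingFeatureID
)
LEFT JOIN odm2.datasets AS DS ON DS.DatasetID IN (
    SELECT DatasetID
    FROM odm2.datasetsresults AS DR
    WHERE R.ResultID = DR.ResultID
)
WHERE SF.SamplingFeatureID IN :sf_ids
""").bindparams(bindparam("sf_ids", expanding=True))
result = self._session_factory.engine.execute(query, sf_ids=list(sf_list))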

Currently this query takes about 50 seconds PER sampling feature.
