Very high dimensional data and pyhsmm-SLDS? #3

hedgy123 · 2016-06-17T18:07:24Z

Hi again Matt and Scott,

I absolutely love your SLDS code, and your bee example is awesome! I was wondering, though: would this work for a very high-dimensional system?

To be specific, consider a dataset consisting of a large number of time series (on the order of 100). For example, I have 100 sensors measuring different quantities for the same process. The process likely passes through a number of distinct dynamical states. Can I feed this type of data into your pyslds.models.WeakLimitStickyHDPHMMSLDS model and let it pick out the distinct states? If so, could you suggest sensible starting parameters for such a high-dimensional system? I've played around with it a little, but it seems to choke, probably because my initial guesses weren't very good.

Assuming the above works, could you suggest the best way to extract which state the system is in at any particular time, or (even better) the time boundaries of the distinct states?
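Once you have a discrete state sequence (one integer label per time step, e.g. from a single Gibbs sample), the time boundaries are just the runs of identical labels. Here is a minimal NumPy sketch that assumes the sequence has already been pulled out of the model as a plain integer array; it does not touch the PySLDS API, and the function name state_segments is hypothetical.

```python
import numpy as np

def state_segments(z):
    """Split a discrete state sequence into (state, start, stop) runs.

    z: 1-D sequence of per-time-step integer state labels (assumed to be
    extracted from the model beforehand). 'stop' is exclusive.
    """
    z = np.asarray(z)
    if z.size == 0:
        return []
    # Time steps where the label differs from the previous time step
    change = np.flatnonzero(np.diff(z)) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [z.size]))
    return [(int(z[s]), int(s), int(e)) for s, e in zip(starts, stops)]

z = np.array([0, 0, 0, 2, 2, 1, 1, 1, 1, 0])
print(state_segments(z))  # [(0, 0, 3), (2, 3, 5), (1, 5, 9), (0, 9, 10)]
```

Because the states come from a sampler, the boundaries can shift from one sample to the next; comparing the segments across several samples gives a feel for how stable they are.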

Finally, I have a number of inputs that I know can potentially drive the sensor readings. Is there a way to feed them into the model (so that it becomes a switching LDS with inputs)?

BTW, I have also been playing with your WeakLimitHDPHSMM, and while it does a fantastic job finding clusters in 2D distributions, I’ve been getting rather strange results with my 100-sensor data. Is it supposed to work for such a large number of dimensions, or am I pushing it?

Thank you very much!!

slinderman · 2016-11-15T01:26:58Z

Hi @hedgy123, thanks for your feedback! I've been doing a bunch of work that addresses various aspects of your problem. The latest versions of PySLDS support input-driven dynamics and input-driven observations, at least with Gibbs sampling. Hopefully the examples make it clear how to add inputs to the model.
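To make "input-driven dynamics and observations" concrete, here is a generic NumPy sketch of the generative model, not PySLDS code. The function name, the noise model (isotropic Gaussian), and the indexing convention (A and B selected by the current state z[t]) are assumptions made for this illustration; the PySLDS examples show how inputs are actually passed to the library.

```python
import numpy as np

def simulate_slds_with_inputs(z, u, A, B, C, D, x0, q_std=0.1, r_std=0.1, seed=0):
    """Sample continuous states x and observations y for a given state sequence z.

    Assumed model (a convention chosen for this sketch):
        x[t+1] = A[z[t]] @ x[t] + B[z[t]] @ u[t] + N(0, q_std^2 I)
        y[t]   = C @ x[t] + D @ u[t] + N(0, r_std^2 I)
    Shapes: z (T,), u (T, D_in), A (K, D_lat, D_lat), B (K, D_lat, D_in),
            C (D_obs, D_lat), D (D_obs, D_in), x0 (D_lat,).
    """
    rng = np.random.default_rng(seed)
    T, D_lat, D_obs = len(z), A.shape[1], C.shape[0]
    x = np.zeros((T, D_lat))
    y = np.zeros((T, D_obs))
    x[0] = x0
    for t in range(T):
        # Input-driven observations: shared emission matrices C and D
        y[t] = C @ x[t] + D @ u[t] + r_std * rng.standard_normal(D_obs)
        if t + 1 < T:
            k = z[t]
            # Input-driven dynamics: A and B depend on the discrete state
            x[t + 1] = A[k] @ x[t] + B[k] @ u[t] + q_std * rng.standard_normal(D_lat)
    return x, y
```

For inference the direction flips: given y and u, the Gibbs sampler resamples z, x, and the model parameters.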

D_obs = 100 doesn't immediately strike me as a problem, as long as D_latent is reasonable (e.g., less than 10). You're right, however, that initialization could really help. What I typically do (and sorry that this isn't shown in the examples) is first run PCA or factor analysis to get initial estimates of the latent states and the emission matrix, and use those to warm-start the SLDS. Then I run a few iterations of resampling just the discrete states and the dynamics parameters to get everything into a good spot. Can you give that a shot? Let me know if you have any questions!
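The PCA step can be sketched with a thin SVD in NumPy. This assumes the observations are stacked as a (T, D_obs) array and that the emission model is roughly y ≈ C x + mean; the function name pca_init is hypothetical, and sklearn.decomposition.PCA or FactorAnalysis would serve equally well. How to load the resulting x and C into a PySLDS model depends on the version, so that step is not shown.

```python
import numpy as np

def pca_init(Y, D_latent):
    """PCA-based initial estimates of latent states and emission matrix.

    Y : (T, D_obs) observations.
    Returns x (T, D_latent), C (D_obs, D_latent), and the per-dimension mean,
    such that Y is approximately x @ C.T + mean.
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    Yc = Y - mean
    # Thin SVD of the centered data: Yc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    # Top principal directions as the emission matrix (orthonormal columns)
    C = Vt[:D_latent].T
    # Latent states = projections of the centered data onto those directions
    x = Yc @ C
    return x, C, mean
```

Orthonormal columns in C are just one choice of scale; since the SLDS dynamics absorb any invertible rescaling of x, this only affects the starting point, not what the model can represent.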

Scott

hedgy123 · 2016-11-18T14:02:09Z

That's so awesome! Thanks so much!

I tried it right away but couldn't compile the package and its dependencies: pylds has syntax errors under Python 2.7, and pyslds has syntax errors under Python 3.5. Specifically, resample_states((data, kwargs)) in pyslds.parallel uses tuple parameter unpacking in the function signature, which was removed in Python 3.
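The usual Python 2/3 compatible fix is to take the tuple as a single argument and unpack it in the body (PEP 3113 removed unpacking in the signature). The sketch below is not the actual pyslds code, and the fix that was checked in may differ; _resample is a hypothetical stand-in for the real per-sequence work.

```python
# Python 2 only (a SyntaxError in Python 3):
#     def resample_states((data, kwargs)):
#         ...

def _resample(data, **kwargs):
    # Hypothetical stand-in for the real resampling work
    return sum(data) * kwargs.get("scale", 1)

def resample_states(args):
    # Accept one tuple argument (as map-style parallel helpers pass it)
    # and unpack it inside the body instead of in the signature
    data, kwargs = args
    return _resample(data, **kwargs)

jobs = [([1, 2, 3], {"scale": 2}), ([4, 5], {})]
print(list(map(resample_states, jobs)))  # [12, 9]
```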

slinderman · 2016-11-18T16:45:09Z

Sorry about that, @hedgy123. I've been doing all my testing in Python 3.5. Not sure why yours crashed on joblib, though; mine would only give a runtime error if _joblib_resample_states was called. In any case, I've checked in some compatibility changes to pylds and pyslds. Can you check whether that solves your problem?

mattjj · 2016-11-19T16:46:08Z

Python 2.7 is best Python. We should set up a simple Travis CI test to catch compatibility issues.
