Very high dimensional data and pyhsmm-SLDS? #3

Open
hedgy123 opened this issue Jun 17, 2016 · 4 comments

@hedgy123

hedgy123 commented Jun 17, 2016

Hi again Matt and Scott,

I absolutely love your SLDS code and your bee example is awesome! I was wondering though, would this work for a very high-dimensional system?

To be specific, consider a dataset consisting of some large number of time series (of order 100) – for example, I have 100 sensors measuring different quantities for the same process. The process likely contains a number of distinct dynamical states. Can I feed this type of data into your pyslds.models.WeakLimitStickyHDPHMMSLDS model and let it pick out the distinct states? If so, could you suggest sensible parameters to start for such a high-dimensional system? I've played around with it for a little bit but it seems to choke, probably because my initial guesses weren't very good.

Assuming the above works, could you suggest the best way to extract which state the system is in at any particular time, or (even better) the time boundaries of the distinct states?

Finally, I have a number of inputs that I know can potentially drive the sensor readings. Is there a way to input them into the model (so that it becomes a switching LDS with inputs)?

BTW, I have also been playing with your WeakLimitHDPHSMM, and while it does a fantastic job finding clusters in 2D distributions, I’ve been getting rather strange results with my 100-sensor data. Is it supposed to work for such a large number of dimensions or am I pushing it?

Thank you very much!!
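
For the question above about time boundaries, here is a minimal sketch that assumes the sampled discrete state sequence is already available as a 1-D integer array (in pyhsmm-style models this is typically an attribute of each states object, but check that against your installed version). The helper name `state_segments` is hypothetical and not part of pyslds or pyhsmm; it only run-length encodes the sequence with NumPy.

```python
import numpy as np

def state_segments(stateseq):
    """Run-length encode a discrete state sequence.

    Returns a list of (state, start, end) triples, with `end` exclusive,
    so each triple covers time steps start, ..., end - 1.
    """
    stateseq = np.asarray(stateseq)
    if stateseq.size == 0:
        return []
    # Indices where the state differs from the previous time step.
    change_points = np.flatnonzero(np.diff(stateseq)) + 1
    starts = np.concatenate(([0], change_points))
    ends = np.concatenate((change_points, [stateseq.size]))
    return [(int(stateseq[s]), int(s), int(e)) for s, e in zip(starts, ends)]

# Toy sequence standing in for one sample of the model's discrete states.
segments = state_segments([0, 0, 0, 2, 2, 1, 1, 1, 1, 0])
for state, start, end in segments:
    print("state", state, "from t =", start, "to t =", end - 1)
```

Because this works on a single sample, boundaries from a Gibbs sampler will vary between iterations; comparing segments across several late samples gives a rough sense of how stable they are.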

@slinderman
Collaborator

Hi @hedgy123, thanks for your feedback! I've been doing a bunch of work that addresses various aspects of your problem. The latest versions of PySLDS support input driven dynamics and input driven observations, at least with Gibbs sampling. Hopefully the examples make it clear how to add inputs to the model.

D_obs = 100 doesn't immediately strike me as a problem, as long as D_latent is reasonable (e.g. less than 10). You're right, however, that initialization could really help. What I typically do (and sorry that this isn't shown in the examples) is first run PCA or factor analysis to get an initial estimate of the latent states and the emission matrix, and use those to warm start the SLDS. Then I run a few iterations of resampling just the discrete states and the dynamics parameters to get everything into a good spot. Can you give that a shot? Let me know if you have any questions!

Scott
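
The PCA warm start described in the comment above can be sketched with NumPy alone. This is an illustration under assumptions, not the code Scott uses: the helper `pca_warm_start` is hypothetical, the data is synthetic, and how the resulting estimates are handed to the SLDS (which attributes to set before resampling) is left out because it depends on the pyslds version.

```python
import numpy as np

def pca_warm_start(Y, D_latent):
    """Estimate initial latent states and an emission matrix by PCA.

    Y is a (T, D_obs) array of observations. Returns
    x_init (T, D_latent), C_init (D_obs, D_latent), and the data mean,
    so that Y is approximately x_init @ C_init.T + mean.
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C_init = Vt[:D_latent].T          # orthonormal columns
    x_init = (Y - mean) @ C_init      # project data onto those directions
    return x_init, C_init, mean

# Synthetic stand-in for 100 sensors driven by a 5-dimensional latent process.
rng = np.random.default_rng(0)
T, D_obs, D_latent = 500, 100, 5
x_true = rng.standard_normal((T, D_latent))
C_true = rng.standard_normal((D_obs, D_latent))
Y = x_true @ C_true.T + 0.1 * rng.standard_normal((T, D_obs))

x_init, C_init, mean = pca_warm_start(Y, D_latent)
print(x_init.shape, C_init.shape)
```

PCA fixes the latent space only up to rotation and scaling, so these values are a starting point for the sampler rather than an estimate of the true emission matrix.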

@hedgy123
Author

That's so awesome! Thanks so much!

I tried it right away but couldn't get the package and its dependencies to build: pylds has syntax errors under Python 2.7, and pyslds has them under Python 3.5. Specifically, resample_states((data, kwargs)) in pyslds.parallel fails because tuple unpacking in function parameters was removed in Python 3.
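
For reference, a signature like resample_states((data, kwargs)) can be made valid on both Python 2.7 and Python 3 by accepting a single argument and unpacking it in the body. The sketch below shows only that pattern: the function body is a placeholder, not the real pyslds.parallel implementation, and the `niter` key is made up for the example.

```python
def resample_states(args):
    # Python 2 accepted `def resample_states((data, kwargs)):`.
    # Python 3 rejects that as a syntax error, so unpack explicitly instead.
    data, kwargs = args
    # Placeholder body (hypothetical): report what was received.
    return len(data), sorted(kwargs)

# The caller still passes one tuple, as it would when mapping over a job list.
result = resample_states(([1.0, 2.0, 3.0], {"niter": 5}))
print(result)
```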

@slinderman
Collaborator

Sorry about that @hedgy123 -- I've been doing all my testing in python3.5. Not sure why yours crashed on joblib though. Mine would only give a runtime error if _joblib_resample_states was called. In any case, I checked in some compatibility changes to pylds and pyslds. Can you check if that solves your problem?

@mattjj
Owner

mattjj commented Nov 19, 2016

Python 2.7 is best Python. We should set up a simple Travis CI test to catch compatibility issues.
