Align video recordings with actions #1

Open
ClashLuke opened this issue Jun 26, 2022 · 5 comments
Labels
engineering Largely requires software engineering knowledge

Comments

@ClashLuke
Member

Unfortunately, we can't take one screenshot per action, as each screenshot takes 100 ms or more. However, recording the entire screen at 60 FPS (the refresh rate of most common monitors) is possible. If we align those frames with the actions in a post-processing step, we get roughly the same output without the massive latency overhead. This way, the model can still "see" what's on the screen.
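A minimal sketch of the post-processing step, assuming the recorder and the action logger timestamp events with the same clock. Each action is paired with the last frame captured at or before it, i.e. the frame the user was looking at when acting. `Action`, `frame_timestamps` and `align_actions` are hypothetical names for illustration. Real recorders can drop frames, so per-frame timestamps reported by the recorder are safer than assuming a perfectly constant 60 FPS; `align_actions` accepts any sorted list of frame times.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Action:
    timestamp: float  # seconds; assumed to share the screen recorder's clock
    name: str


def frame_timestamps(start: float, fps: float, n_frames: int) -> list[float]:
    """Capture times of an idealized constant-rate recording that began at `start`."""
    return [start + i / fps for i in range(n_frames)]


def align_actions(frame_times: list[float], actions: list[Action]) -> list[tuple[int, Action]]:
    """Pair every action with the index of the last frame captured at or before it.

    `frame_times` must be sorted. Actions logged before the first frame are
    dropped, since nothing was on screen for the model to react to yet.
    """
    aligned = []
    for action in actions:
        # bisect_right counts frames with time <= action.timestamp
        idx = bisect_right(frame_times, action.timestamp) - 1
        if idx >= 0:
            aligned.append((idx, action))
    return aligned
```

For example, with a recording starting at `t = 10.0` s, an action at `10.005` s maps to frame 0 and one at `10.51` s maps to frame 30, while an action at `9.99` s is discarded.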

@ClashLuke ClashLuke added this to the Proof of Concept milestone Jun 26, 2022
@ClashLuke ClashLuke added the engineering Largely requires software engineering knowledge label Jun 26, 2022
@rokosbasilisk

I think this could be solved by using ttyrec instead of videos.

@ClashLuke
Member Author

How would you apply ttyrec on a regular desktop? Can it handle video games such as Overwatch?

@rokosbasilisk

rokosbasilisk commented Jun 27, 2022

I didn't understand the question about a "regular desktop" (do you mean apps that are not terminal-based?).
For most apps, it should be possible to scale down the frames, convert them to terminal-level graphics, and extend ttyrec to save them (with some loss of quality, of course).

Also, at higher frame rates it would be really hard to collect and align the actions, so it would be more important to find a way to handle alignment on the model side (e.g., the way CTC loss aligns characters across timesteps).
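To make the CTC analogy concrete, here is a small illustration, assuming the model emits one label per frame, including a hypothetical blank token `-` meaning "no action this frame". CTC maps a frame-level path to an output sequence by merging consecutive repeats and then dropping blanks, and its likelihood sums over every path that collapses to the target, so no explicit frame-to-action alignment is needed. The brute-force enumeration below is exponential in the number of frames and only shows *what* CTC computes; practical implementations (e.g. PyTorch's `torch.nn.CTCLoss`) use dynamic programming. The labels `W` and `click` are made up for the example.

```python
import math
from itertools import groupby, product

BLANK = "-"  # hypothetical "no action this frame" token


def ctc_collapse(frame_labels):
    """CTC's many-to-one mapping: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]


def ctc_likelihood(frame_probs, target):
    """Sum of path probabilities over all frame-level paths that collapse to `target`.

    frame_probs: one dict per frame, mapping each label (incl. BLANK) to its probability.
    Brute force for illustration only; exponential in len(frame_probs).
    """
    labels = list(frame_probs[0].keys())
    total = 0.0
    for path in product(labels, repeat=len(frame_probs)):
        if ctc_collapse(path) == list(target):
            total += math.prod(probs[label] for probs, label in zip(frame_probs, path))
    return total
```

Note that a blank between two identical labels keeps them distinct: `["-", "W", "W", "-", "W", "click", "click"]` collapses to `["W", "W", "click"]`, i.e. two separate `W` presses followed by one click.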

@ClashLuke
Member Author

No, scaling down the frames is not possible. If you want to try it out, take a screenshot of this page and downscale it by a factor of 2.

@Vbansal21

Vbansal21 commented Jun 29, 2022

@ClashLuke I'd recommend reading 'Grandmaster level in StarCraft II using multi-agent reinforcement learning'.
Here's the link

It covers everything you'd need: real-time inference on visual input using an architecture built from transformers, etc.


3 participants