diff --git a/docs/assets/images/DNNArchitecture.png b/docs/assets/images/DNNArchitecture.png deleted file mode 100644 index 124cad6..0000000 Binary files a/docs/assets/images/DNNArchitecture.png and /dev/null differ diff --git a/docs/assets/images/FirstFlights.png b/docs/assets/images/FirstFlights.png deleted file mode 100644 index a3c88ae..0000000 Binary files a/docs/assets/images/FirstFlights.png and /dev/null differ diff --git a/docs/assets/images/LongTrajectory_v2.png b/docs/assets/images/LongTrajectory_v2.png deleted file mode 100644 index 95eb528..0000000 Binary files a/docs/assets/images/LongTrajectory_v2.png and /dev/null differ diff --git a/docs/assets/images/Novel.png b/docs/assets/images/Novel.png deleted file mode 100644 index 8754e70..0000000 Binary files a/docs/assets/images/Novel.png and /dev/null differ diff --git a/docs/assets/images/PipelineDiagram_v3.png b/docs/assets/images/PipelineDiagram_v3.png deleted file mode 100644 index 4c8a977..0000000 Binary files a/docs/assets/images/PipelineDiagram_v3.png and /dev/null differ diff --git a/docs/assets/images/Robustness_v2.png b/docs/assets/images/Robustness_v2.png deleted file mode 100644 index 20f1365..0000000 Binary files a/docs/assets/images/Robustness_v2.png and /dev/null differ diff --git a/docs/assets/images/Samples_v3.png b/docs/assets/images/Samples_v3.png deleted file mode 100644 index 6e41130..0000000 Binary files a/docs/assets/images/Samples_v3.png and /dev/null differ diff --git a/docs/index.html b/docs/index.html index 257b5c2..e930b78 100644 --- a/docs/index.html +++ b/docs/index.html @@ -1,218 +1,316 @@ - + - - - Research Project: [Project Title] - - + + + + + SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum + + + + + + + + + + + + + + + + + + + + -
-
-

SOUS VIDE

-

Scene Optimized Understanding via Synthesized Visual Inertial Data from Experts

+ + + +
+
+
+
+
+

SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum

+
+ + JunEn Low, + + Maximilian Adang, + + Javier Yu, + + + Keiko Nagami, + + + Mac Schwager, + +
+ +
+ Stanford University +
+ +
+
+
+
+
-
-

Abstract

-

We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only on-board perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights, producing photo-realistic images at over 100 fps. We use FiGS to collect 100k-300k observation-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image and IMU data streams into low-level body rate and thrust commands at 20 Hz onboard a drone. Crucially, SV-Net includes a Rapid Motor Adaptation (RMA) module that adapts at runtime to variations in the dynamics parameters of the drone. In extensive hardware experiments, we show SOUS VIDE policies to be robust to ±30% mass and thrust variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone’s visual field. The project page and code can be found here.

-
+
+
+
+ +

+ SOUS VIDE creates end-to-end, zero-shot visual drone navigation policies that are robust to scene changes. +

+
+
+
-
-

Key Contributions

-
    -
  • Flying in Gaussian Splats (FiGS): A simulator that couples a GSplat scene model with a lightweight drone dynamics model to yield photorealistic visual flight data.
  • Expert Demonstration Data: We use an MPC expert to generate behavior cloning data in FiGS with randomized dynamics parameters and positional disturbances (a minimal sketch of this loop follows this list).
  • SV-Net: We introduce a lightweight policy architecture that takes image and IMU data and infers thrust and body-rate control actions. The policy uses an RMA module to adapt online to varying flight conditions.
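A minimal sketch of that expert-demonstration loop follows, under stated assumptions: the simulator handle, the expert's solve interface, and all method names (reset, perturb_position, true_state, render_observation, step) are hypothetical placeholders, not the released SOUS VIDE API.

  import numpy as np

  def collect_demonstrations(sim, expert, n_rollouts=100, steps_per_rollout=1000):
      # Roll out a privileged MPC expert, logging (observation, action)
      # pairs for behavior cloning.
      dataset = []
      for _ in range(n_rollouts):
          # Randomize dynamics so the distilled policy sees varied physics
          # (the experiments report robustness to roughly +/-30% mass and thrust).
          params = {
              "mass_scale": np.random.uniform(0.7, 1.3),
              "thrust_scale": np.random.uniform(0.7, 1.3),
          }
          sim.reset(dynamics=params)
          for _ in range(steps_per_rollout):
              # Spatial disturbance: perturb the state before the expert acts.
              sim.perturb_position(sigma=0.05)
              # The expert plans with privileged state and dynamics information;
              # the distilled policy will only ever see the observation stream.
              action = expert.solve(sim.true_state(), params)
              obs = sim.render_observation()  # RGB image + IMU reading
              dataset.append((obs, action))
              sim.step(action)
      return dataset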
+
+
+ +
+
+

Abstract

+
+

+ We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only on-board perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights, producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k observation-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow, and IMU data streams into low-level body rate and thrust commands at 20 Hz onboard a drone. Crucially, SV-Net includes a Rapid Motor Adaptation (RMA) module that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone’s visual field. Code, data, and videos can be found in the links above.

+ SOUS VIDE Pipeline.
+
+
+ -
-

Getting Started

-

The codebase accompanying this research is available in this repository. To replicate our experiments, follow these steps:

-
    -
  1. Clone the repository:
     git clone https://github.com/username/repository-name.git
  2. Install dependencies:
     pip install -r requirements.txt
  3. Run the experiments:
     python run_experiments.py
+ +
+
+

Video

+
+
+
+
+ +
+
-
-

Results

-

This work introduces SOUS VIDE, a novel training paradigm leveraging Gaussian Splatting and lightweight visuomotor policy architectures for end-to-end drone navigation. By coupling high-fidelity visual data synthesis with online adaptation mechanisms, SOUS VIDE achieves zero-shot sim-to-real transfer, demonstrating remarkable robustness to variations in mass, thrust, lighting, and dynamic scene changes. Our experiments underscore the policy’s ability to generalize across diverse scenarios, including complex and extended trajectories, with graceful degradation under extreme conditions. Notably, the integration of a streamlined adaptation module enabled the policy to overcome limitations of prior visuomotor approaches, offering a computationally efficient yet effective solution for addressing model inaccuracies.

These findings highlight the potential of SOUS VIDE as a foundation for future advancements in autonomous drone navigation. While its robustness and versatility are evident, challenges such as inconsistent performance in multi-objective tasks suggest opportunities for improvement through more sophisticated objective encodings. Further exploration into scaling the approach to more complex environments and incorporating additional sensory modalities could enhance both adaptability and reliability. Ultimately, this work paves the way for deploying learned visuomotor policies in real-world applications, bridging the gap between simulation and practical autonomy in drone operations.

-
-
-

Publication

-

This research is detailed in our paper titled "[Paper Title]", published at [Conference/Journal Name].

-

Read the full paper here.

-
+
+
-
-

Acknowledgments

-

This work was supported in part by DARPA grant HR001120C0107, ONR grant N00014-23-1-2354, and Lincoln Labs grant 7000603941. The second author was supported by an NDSEG fellowship. Toyota Research Institute provided funds to support this work.

+
+ + +
+
+

FiGS

+

+ Flying in Gaussian Splats (FiGS) is our lightweight simulator that renders images from a Gaussian Splat along the trajectory solution of a simplified 9-dimensional drone dynamics model to produce visual and state data. To produce this data, users need only provide a short video recording of the scene with a single ArUco tag placed within it.
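As a rough illustration of this idea (not the actual FiGS interface), the sketch below integrates a simple 9-state drone model (position, velocity, Euler attitude) and queries a splat renderer at every pose; the GSplat renderer object, its render signature, and the small-angle thrust model are illustrative assumptions.

  import numpy as np

  G = np.array([0.0, 0.0, -9.81])  # gravity, world frame

  def step_dynamics(x, u, dt):
      # x = [pos(3), vel(3), euler(3)]; u = [mass-normalized thrust, body rates(3)].
      p, v, eul = x[:3], x[3:6], x[6:9]
      thrust, rates = u[0], u[1:4]
      roll, pitch = eul[0], eul[1]
      # Tilt the thrust vector by roll/pitch (simplified; yaw coupling ignored).
      acc = thrust * np.array([np.sin(pitch),
                               -np.sin(roll) * np.cos(pitch),
                               np.cos(roll) * np.cos(pitch)]) + G
      return np.concatenate([p + v * dt, v + acc * dt, eul + rates * dt])

  def fly_in_splat(renderer, x0, controls, dt=0.05):
      # Yield (state, image) pairs along the rollout: the states supply
      # IMU-style data, the rendered splat images supply the visual stream.
      x = x0
      for u in controls:
          x = step_dynamics(x, u, dt)
          image = renderer.render(position=x[:3], attitude=x[6:9])  # hypothetical API
          yield x, image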

+ Example data generation from FiGS. +
+
+ +
+
+

SV-Net

+

+ We train a visuomotor navigation policy using our SV-Net architecture, detailed below. + Notably, the architecture incorporates a history network to perform a variant of Rapid + Motor Adaptation (RMA), effectively addressing variations in drone dynamics between the + real world and the simulation environment used to generate the training data. +

+ The SV-Net architecture.
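For intuition, here is a minimal PyTorch sketch of an SV-Net-style policy with an RMA-style history network; layer sizes, the history window, and module names are illustrative assumptions (the optical-flow stream mentioned in the abstract is omitted for brevity), not the published architecture.

  import torch
  import torch.nn as nn

  class SVNetSketch(nn.Module):
      def __init__(self, img_feat_dim=128, imu_dim=6, act_dim=4,
                   latent_dim=8, hist_len=50):
          super().__init__()
          # Small CNN backbone producing a compact image feature.
          self.vision = nn.Sequential(
              nn.Conv2d(3, 16, kernel_size=5, stride=4), nn.ReLU(),
              nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
              nn.AdaptiveAvgPool2d(1), nn.Flatten(),
              nn.Linear(32, img_feat_dim),
          )
          # History network: infers a latent vector describing the drone's
          # current dynamics from a window of past IMU readings and actions.
          self.history = nn.Sequential(
              nn.Linear(hist_len * (imu_dim + act_dim), 128), nn.ReLU(),
              nn.Linear(128, latent_dim),
          )
          # Policy head: fuses image features, current IMU, and the
          # adaptation latent into a thrust + body-rate command.
          self.head = nn.Sequential(
              nn.Linear(img_feat_dim + imu_dim + latent_dim, 128), nn.ReLU(),
              nn.Linear(128, act_dim),
          )

      def forward(self, image, imu, history):
          # image: (B, 3, H, W); imu: (B, imu_dim); history: (B, hist_len, imu_dim + act_dim)
          z = self.history(history.flatten(1))  # adapts online to dynamics shifts
          feats = self.vision(image)
          return self.head(torch.cat([feats, imu, z], dim=-1))

Following the usual RMA recipe, the history network would be trained to match a latent produced by a privileged dynamics encoder in simulation, so that at deployment the policy adapts from recent IMU and action history alone.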
+
-
-

License

-

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.

+
+ +
+
+ +
+
+ +
+
+

Videos

+
+
+
+
+ +
+
-