:robot: Spot-Sim2Real

Spot-Sim2Real is a modular library for developing embodied AI tasks on Spot (e.g., Language-guided Skill Coordination (LSC) and Adaptive Skill Coordination (ASC)) -- configuring Spot robots, controlling sensorimotor skills, and coordinating Large Language Models (LLMs).

:memo: Setup instructions

Please refer to the setup instructions page for information on how to set up the repo. Note that this repo does not track the dirty status of submodules by default; if you are making any intentional changes within the third-party packages, be sure to track them separately.

:computer: Connecting to the robot

The computer can be connected to the robot in one of the following modes.

  1. Ethernet (gives the best network speed, but the cabling is cluttered :sad: )
    This mode creates a wired connection with the robot. Useful for teleoperating the robot via the computer.
  2. Access Point Mode
    This is a wireless mode where the robot creates its own wifi network. Connect the robot to this mode for teleoperating it with the controller over long distances. The robot is in Access Point mode if you see a wifi network with a name like spot-BD-*********** (where * is a number).
  3. Client Mode (gives the 2nd best network speed; we usually prefer this)
    This is a wireless mode where the robot is connected to an external wifi network (from a nearby router). The computer should be connected to the same network; a wired connection between the router and the computer will be faster than a wireless one.

Follow the steps from Spot's Network Setup page by Boston Dynamics to connect to the robot.

After setting up Spot in the correct network configuration, please add its IP address to your bashrc:

echo "export SPOT_IP=<spot's ip address>" >> ~/.bashrc
source ~/.bashrc

Test and ensure that you can ping Spot:

ping $SPOT_IP

If you get a response like this, then you are on the right network:

(spot_ros) user@linux-machine:~$ ping $SPOT_IP
PING 192.168.1.5 (192.168.1.5) 56(84) bytes of data.
64 bytes from 192.168.1.5: icmp_seq=1 ttl=64 time=8.87 ms
64 bytes from 192.168.1.5: icmp_seq=2 ttl=64 time=7.36 ms

Before starting to run the code, you need to ensure that all ROS env variables are set up properly inside bashrc. Please follow the steps from Setting ROS env variables for proper ROS env var setup.
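
Once the variables are in place, a quick sanity check from Python can confirm they are visible in your shell session. This is a minimal sketch; apart from SPOT_IP, the names below are the standard ROS networking variables, and your setup may only use a subset of them:

import os

# Print the networking-related variables; "<not set>" flags anything missing from the environment
for var in ("SPOT_IP", "ROS_MASTER_URI", "ROS_IP", "ROS_HOSTNAME"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")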

:desktop_computer: Getting to the repo

Go to the repository

cd /path/to/spot-sim2real/

The code for the demo lives on the main branch.

# Check your current git branch
git rev-parse --abbrev-ref HEAD

# If you are not in the `main` branch, then checkout to the `main` branch
git checkout main

:light_rail: Try teleoperating the robot using keyboard

:rotating_light: Running Emergency Stop

:musical_keyboard: Running keyboard teleop

:video_game: Instructions to record waypoints (use joystick to move robot around)

:rocket: Running instructions

Running the demo (ASC/LSC/Seq-Experts)

Step1. Run the local launch executable

Step2. Run ROS image visualization

Step3. Reset home in a new terminal

Step4. Emergency stop

Step5. Main demo code in a new terminal

Step6. [Optional] Pick with Pose estimation (uses NVIDIA's FoundationPose Model)

Step7. [Optional] Object detection with tracking (uses Meta's SAM2 Model)

# Run the tracking service in a new terminal
python spot_rl_experiments/spot_rl/utils/tracking_service.py

# Then, from your application, enable tracking via a ROS parameter
import rospy
rospy.set_param("enable_tracking", True)

Using Spot Data-logger

All logs will be stored inside the data/data_logs directory.

Logged keys

The logger captures Spot's data such that each timestamp's log packet is a dict with the following keys:

"timestamp" : double, # UTC epoch time from time.time()
"datatime": str # human readable corresponding local time as "YY-MM-DD HH:MM:SS"
"camera_data" : [
                    {
                        "src_info" : str, # this is name of camera source as defined in SpotCamIds
                        "raw_image": np.ndarray, # this is spot's camera data as cv2 (see output of Spot.image_response_to_cv2() for more info)
                        "camera_intrinsics": np.ndarray, # this is 3x3 matrix holding camera intrinsics
                        "base_T_camera": np.ndarray, # this is 4x4 transformation matrix of camera w.r.t base frame of robot
                    },
                    ...
                ],
"vision_T_base": np.ndarray, # this is 4x4 transformation matrix of base frame w.r.t vision frame
"base_pose_xyt": np.ndarray, # this is 3 element array representing x,y,yaw w.r.t home frame
"arm_pose": np.array, # this is a 6 element array representing arm joint states (ordering : sh0, sh1, el0, el1, wr0, wr1)
"is_gripper_holding_item": bool, # whether gripper is holding something or not
"gripper_open_percentage": double, # how much is the gripper open
"gripper_force_in_hand": np.ndarray, # force estimate on end-effector in hand frame

Logging data

The data logger is designed to log the data described above at whatever rate sensor data becomes available (which depends on the network setup).

To run the logger async, simply run the following command in a new terminal

python -m spot_wrapper.data_logger --log_data

This will record data in a while loop; press Ctrl+C to stop the logger. That will save the log file as data/data_logs/<YY,MM,DD-HH,MM,SS>.pkl

Warning : This logger will cause motion blur, as camera data is logged while the robot moves. Currently we do not support the Spot-Record-Go protocol for logging.

Log replay

It is also possible to replay the logged data (essentially the camera streams that have been logged) using the following command:

python -m spot_wrapper.data_logger --replay="<name_of_log_file>.pkl"

Caution : For replay, the log file SHOULD be a pkl file with the keys listed above.

Caution : Please ensure the log file is present inside data/data_logs dir.
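
Reusing the loading pattern from the sketch above, a quick pre-flight check for the required keys might look like this (again assuming the pickle holds a list of per-timestamp packets; the key set is the one from the Logged keys section):

import pickle

REQUIRED_KEYS = {
    "timestamp", "datatime", "camera_data", "vision_T_base", "base_pose_xyt",
    "arm_pose", "is_gripper_holding_item", "gripper_open_percentage", "gripper_force_in_hand",
}

with open("data/data_logs/<name_of_log_file>.pkl", "rb") as f:
    log_packets = pickle.load(f)

# Flag any packet that is missing a required key before attempting a replay
for i, packet in enumerate(log_packets):
    missing = [k for k in sorted(REQUIRED_KEYS) if k not in packet]
    if missing:
        print(f"packet {i} is missing keys: {missing}")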

:wrench: Call skills (non-blocking) without installing spot-sim2real in your home environment

We provide a function that can call skills from a separate conda environment, and the skill call itself is non-blocking.

Step1. Follow the Running instructions section to set up the image client in the spot_ros conda environment

Step2. Run skill_executor.py to listen for which skill to use. This will run in the background.

python spot_rl_experiments/spot_rl/utils/skill_executor.py

Step3. Use ROS to trigger skills from your application. Now you can call skills in a non-blocking way.

# In your application, import rospy to tell the executor which skill to run
import time # Get a timer
import rospy # This is the only package you need to install in your environment
rospy.set_param("skill_name_input", f"{str(time.time())},Navigate,desk") # Call the navigation skill to navigate to the desk. This is a non-blocking call.
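
If your application triggers skills from several places, a small wrapper keeps the "<timestamp>,<skill name>,<target>" format used above in one place. This is just a convenience sketch on top of the example, not an API provided by the repo:

import time
import rospy

def call_skill(skill_name: str, target: str) -> None:
    # Non-blocking: writes the "<timestamp>,<skill name>,<target>" string that skill_executor.py listens for
    rospy.set_param("skill_name_input", f"{time.time()},{skill_name},{target}")

call_skill("Navigate", "desk")  # same non-blocking navigation call as in the example above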

:eyeglasses: Run Spot-Aria project code

Follow the steps in the project documentation.

:star: Convert pytorch weights to torchscript

To convert pytorch weights to torchscript, please follow Torchscript Conversion Instructions.
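
For context, the conversion boils down to tracing (or scripting) a loaded model and saving the result. Below is a generic, minimal sketch with a placeholder model and dummy input; the actual models, checkpoints, and input shapes are covered in the linked instructions:

import torch

# Placeholder model standing in for the actual PyTorch policy/perception model
model = torch.nn.Sequential(torch.nn.Linear(512, 256), torch.nn.ReLU(), torch.nn.Linear(256, 4))
model.eval()

# Trace with a dummy input of the expected shape, then save the TorchScript module
example_input = torch.randn(1, 512)
traced = torch.jit.trace(model, example_input)
traced.save("model.torchscript.pt")

# The saved module can be loaded later without the original Python class definitions
reloaded = torch.jit.load("model.torchscript.pt")
print(reloaded(example_input).shape)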

:mega: Acknowledgement

We thank Naoki Yokoyama for setting up the foundation of the codebase, and Joanne Truong for polishing it. Spot-Sim2Real is built upon Naoki's codebases, bd_spot_wrapper and spot_rl_experiments, with new features (LLMs, pytest) and improved robustness.

:writing_hand: Citations

If you find this repository helpful, feel free to cite our papers: Adaptive Skill Coordination (ASC) and Language-guided Skill Coordination (LSC).

@article{yokoyama2023adaptive,
  title={Adaptive Skill Coordination for Robotic Mobile Manipulation},
  author={Yokoyama, Naoki and Clegg, Alexander William and Truong, Joanne and Undersander, Eric and Yang, Tsung-Yen and Arnaud, Sergio and Ha, Sehoon and Batra, Dhruv and Rai, Akshara},
  journal={arXiv preprint arXiv:2304.00410},
  year={2023}
}

@misc{yang2023adaptive,
  title={LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place},
  author={Yang, Tsung-Yen and Arnaud, Sergio and Shah, Kavit and Yokoyama, Naoki and Clegg, Alexander William and Truong, Joanne and Undersander, Eric and Maksymets, Oleksandr and Ha, Sehoon and Kalakrishnan, Mrinal and Mottaghi, Roozbeh and Batra, Dhruv and Rai, Akshara},
  howpublished={\url{https://languageguidedskillcoordination.github.io/}}
}

License

Spot-Sim2Real is MIT licensed. See the LICENSE file for details.