This week, I have been working on a pipeline that connects the gesture recognition model, stores the recognized gesture, and sends it to the visualization website to trigger an action.
To bring all parts of the project together into a final product, I needed to connect the gesture recognition model to the visualization website. To keep the gesture recognition logic separate from the visualization, I created a local server that acts as an intermediary between the gesture model and the user interface (UI) or client application. The local server plays a central role in managing the gesture recognition system: by filtering, processing, and passing gesture data efficiently, it ensures low-latency, smooth, and secure interactions between the model, the user interface, and the end user, while remaining scalable and easy to customize.
To create the server, I used Python's websockets library, which allows bidirectional communication between a client and a server over the WebSocket protocol.
The server maintains a set of currently connected clients and a variable that tracks the most recent gesture received. When a client connects, the handler function is triggered, which listens for incoming JSON-formatted messages (e.g., {"gesture": "swipe_left"}). Upon receiving a message, it decodes the JSON, updates latest_gesture, and broadcasts the gesture back to all connected clients. If a message cannot be parsed or a client disconnects, appropriate error handling ensures the server continues running smoothly. The main() function starts the server on ws://localhost:8765, and asyncio.run(main()) keeps it running indefinitely.
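A minimal sketch of what such a server can look like is shown below. The names handler, latest_gesture, and main() come from the description above; the broadcast logic and error handling are my assumptions about how the real script is structured, and the single-argument handler signature matches recent versions of the websockets library (older versions also pass a path argument).

```python
# Minimal sketch of the local gesture relay server (assumed structure).
import asyncio
import json

import websockets

connected_clients = set()   # currently connected clients
latest_gesture = None       # most recent gesture received

async def handler(websocket):
    """Register a client and relay any gesture messages it sends."""
    global latest_gesture
    connected_clients.add(websocket)
    try:
        async for message in websocket:
            try:
                data = json.loads(message)            # e.g. {"gesture": "swipe_left"}
                latest_gesture = data.get("gesture")
                # Broadcast the gesture back to all connected clients.
                payload = json.dumps({"gesture": latest_gesture})
                await asyncio.gather(
                    *(client.send(payload) for client in connected_clients)
                )
            except json.JSONDecodeError:
                print("Received a message that is not valid JSON, ignoring it.")
    except websockets.ConnectionClosed:
        pass
    finally:
        connected_clients.discard(websocket)

async def main():
    # Serve on ws://localhost:8765 and keep running indefinitely.
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())
```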

After creating the server, I connected it to the gesture recognition model and the visualization website.
As the webcam captures frames, MediaPipe extracts 3D landmarks from the detected hand. These landmarks are stored in a buffer (landmark_buffer) with a fixed sequence length of 5 frames. Once the buffer is full, the landmark data is fed into a pre-trained LSTM model to classify a static gesture (e.g., “fist”, “palm”, etc.). The two most recently classified static gestures are stored in gesture_history. The script then checks whether these two gestures match any predefined dynamic gesture pattern (e.g., ["palm", "fist"] for “swipe_left”). If a match is found and a cooldown period has passed, it logs the dynamic gesture, displays it on the frame, and calls the asynchronous send_gesture() function. This function creates a WebSocket connection to ws://localhost:8765, wraps the gesture name in a JSON object (e.g., {"gesture": "swipe_left"}), and sends it to the server, allowing other clients connected to that server to receive and act upon the detected gesture.
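The sketch below shows roughly how the buffering, pattern matching, and send_gesture() steps described above fit together. The MediaPipe capture loop and the LSTM are left out: classify_static_gesture and process_frame are hypothetical names standing in for the real inference code, and the cooldown value and pattern table are illustrative assumptions.

```python
# Simplified sketch of the per-frame gesture logic (MediaPipe/LSTM omitted).
import asyncio
import json
import time
from collections import deque

import websockets

SEQUENCE_LENGTH = 5                                   # frames per LSTM input
landmark_buffer = deque(maxlen=SEQUENCE_LENGTH)
gesture_history = deque(maxlen=2)                     # last two static gestures
DYNAMIC_GESTURES = {"swipe_left": ["palm", "fist"]}   # example pattern table
COOLDOWN_SECONDS = 1.0
last_dynamic_time = 0.0

async def send_gesture(gesture_name):
    """Open a WebSocket connection and send the gesture as a JSON object."""
    async with websockets.connect("ws://localhost:8765") as ws:
        await ws.send(json.dumps({"gesture": gesture_name}))

def process_frame(landmarks, classify_static_gesture):
    """Feed one frame of hand landmarks through the buffer and pattern check."""
    global last_dynamic_time
    landmark_buffer.append(landmarks)
    if len(landmark_buffer) < SEQUENCE_LENGTH:
        return None

    # Classify a static gesture from the buffered sequence (LSTM in the real code).
    static_gesture = classify_static_gesture(list(landmark_buffer))
    gesture_history.append(static_gesture)

    # Check whether the last two static gestures form a known dynamic gesture.
    for name, pattern in DYNAMIC_GESTURES.items():
        if list(gesture_history) == pattern:
            now = time.time()
            if now - last_dynamic_time > COOLDOWN_SECONDS:
                last_dynamic_time = now
                asyncio.run(send_gesture(name))       # forward to the local server
                return name
    return None
```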

Then, I created a WebSocket client that continuously listens for gesture messages from the server at ws://localhost:8765, updates a shared latest_gesture dictionary with the received data, and signals other parts of the program through a threading event (gesture_event) whenever a new gesture is detected. When a message is received, it is parsed from JSON, printed for debugging, and stored for later use. Optionally, the same gesture can be sent back to the server using the send_gesture() function, which establishes a new WebSocket connection to transmit the gesture as a JSON object. This setup allows gesture data to be shared across different threads or components in a larger interactive system.
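A minimal sketch of such a listening client is shown below. The names latest_gesture and gesture_event follow the description above; running the listener in a daemon thread is my assumption about how it is kept alive alongside the visualization, and the optional echo back to the server is omitted.

```python
# Minimal sketch of the listening WebSocket client (assumed thread wrapper).
import asyncio
import json
import threading

import websockets

latest_gesture = {}                 # shared state read by other components
gesture_event = threading.Event()   # set whenever a new gesture arrives

async def listen_for_gestures():
    """Receive gesture messages and expose them to the rest of the program."""
    async with websockets.connect("ws://localhost:8765") as ws:
        async for message in ws:
            data = json.loads(message)
            print("Received gesture:", data)   # debugging output
            latest_gesture.update(data)        # store for later use
            gesture_event.set()                # signal waiting threads

def start_listener_thread():
    """Run the listener in a background thread so the UI stays responsive."""
    thread = threading.Thread(
        target=lambda: asyncio.run(listen_for_gestures()),
        daemon=True,
    )
    thread.start()
    return thread
```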

Afterwards, I created the Dash callback function that updates the 3D brain visualization’s camera position in response to gesture input, specifically a “swipe_right” gesture used to trigger a zoom-in effect. It runs periodically via an interval component and retrieves the latest gesture from a shared global variable. To avoid excessive updates, it uses a time-based debounce mechanism (gesture_threshold) and only processes gestures that are spaced out in time. If the detected gesture is “swipe_right” and zooming is not already in progress, the function slightly adjusts the camera’s x and y position to zoom in, ensuring the zoom stays within a defined limit. Once the gesture stops, it resets the zooming state. The updated figure object is then returned to reflect the zoom effect on the visualization.
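A rough sketch of this callback is shown below. It assumes a Dash app with a dcc.Graph of id "brain-graph", a dcc.Interval of id "interval", a base Plotly figure brain_fig, and the shared latest_gesture dictionary from the client sketch above; the component ids, threshold, step size, and zoom limit are illustrative assumptions, not the values from the real script.

```python
# Sketch of the gesture-driven zoom callback (component ids and values assumed).
import time

import plotly.graph_objects as go
from dash import Input, Output, callback

# Shared state: in the real script these come from the WebSocket client
# and from the code that builds the 3D brain figure.
latest_gesture = {}                            # updated by the listening client
brain_fig = go.Figure()                        # placeholder for the brain figure

GESTURE_THRESHOLD = 0.5                        # debounce interval in seconds
ZOOM_STEP = 0.1                                # camera movement per gesture
ZOOM_LIMIT = 0.5                               # closest allowed camera position

camera_eye = {"x": 1.5, "y": 1.5, "z": 1.5}    # Plotly's default 3D camera eye
last_gesture_time = 0.0
zooming = False

@callback(Output("brain-graph", "figure"), Input("interval", "n_intervals"))
def update_camera(n_intervals):
    """Periodically check the latest gesture and zoom in on 'swipe_right'."""
    global last_gesture_time, zooming
    now = time.time()
    gesture = latest_gesture.get("gesture")

    if gesture == "swipe_right" and now - last_gesture_time > GESTURE_THRESHOLD:
        last_gesture_time = now
        if not zooming:
            zooming = True
            # Move the camera eye slightly toward the origin, within the limit.
            camera_eye["x"] = max(ZOOM_LIMIT, camera_eye["x"] - ZOOM_STEP)
            camera_eye["y"] = max(ZOOM_LIMIT, camera_eye["y"] - ZOOM_STEP)
    else:
        zooming = False                         # reset once the gesture stops

    fig = go.Figure(brain_fig)                  # copy so the base figure is untouched
    fig.update_layout(scene_camera=dict(eye=camera_eye))
    return fig
```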


This is how the pipeline works.