Real-time Action Detection for Sports

Tensorway developed advanced computer vision models that significantly improved real-time hit detection accuracy through both audio- and video-based tracking.

Counting ball hits in real time
Audio and video analysis
2x more accurate than the previous solution
Lightweight models perfect for mobile

Story behind

Our client’s product is a fitness game involving a small ball attached to a headband by an elastic string, designed to improve coordination, reflexes, and cardiovascular fitness. 

Players wear the headband and punch the ball, attempting to keep it in motion without letting it fall, making it a fun and engaging way to exercise.

The client’s mobile app gamifies the experience by offering competitions where users track the number of ball hits within a set time frame or aim to reach a target number of hits.

Goal

Tensorway’s task was to create an AI model that counts ball hits in real time accurately enough to eliminate manual checks. The client already had a solution for this, but it failed to deliver accurate results due to technological limitations.

Challenges

The existing audio-based solution delivered inaccurate results due to background noise, thus requiring manual score verification.

The new AI model also needed to be lightweight enough to run smoothly on mobile devices.

Tensorway’s solution

Audio classification model

We used a computer vision-based model to enhance audio classification. By computing the spectrogram of the audio, which turns sound into a visual representation, the model could classify hits more accurately. Optimizing the algorithms cut the Mean Absolute Error (MAE) in half compared to the previous solution.
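
For illustration, the sketch below shows the general idea of treating audio as an image: a short clip is converted to a log-mel spectrogram and classified by a small CNN. It assumes PyTorch and torchaudio; the sample rate, window sizes, layer widths, and two-class setup are placeholders, not the production configuration.

# Minimal sketch of the spectrogram-plus-CNN idea (not the production model).
# Assumes PyTorch and torchaudio; all sizes below are illustrative.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 16_000

# Turn a short audio clip into a log-mel spectrogram "image".
to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=512, hop_length=128, n_mels=64
)
to_db = torchaudio.transforms.AmplitudeToDB()

class HitClassifier(nn.Module):
    """Tiny 2D CNN that treats the spectrogram as a single-channel image."""
    def __init__(self, n_classes: int = 2):  # hit / no-hit
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, wave: torch.Tensor) -> torch.Tensor:
        # wave: (batch, samples) -> spectrogram: (batch, 1, n_mels, frames)
        spec = to_db(to_mel(wave)).unsqueeze(1)
        return self.head(self.features(spec).flatten(1))

# Usage: classify a 0.5-second window of audio.
model = HitClassifier()
clip = torch.randn(1, SAMPLE_RATE // 2)   # stand-in for a real recording
probs = model(clip).softmax(dim=-1)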

Transition to action detection using CV

To address background noise and the potential for cheating inherent to audio-only models, we switched from audio analysis to a computer vision model for action detection. This model analyzes consecutive video frames to detect hits, which required sophisticated labeling and dataset cleaning. The optimized lightweight model significantly improved accuracy, effectively overcoming the background noise limitations.
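
The sketch below illustrates one simple way to frame such frame-based hit detection: a fixed number of consecutive grayscale frames is stacked along the channel axis and classified by a compact 2D CNN. It is a minimal example assuming PyTorch; the frame count, resolution, and architecture are illustrative rather than the deployed model.

# Rough sketch of frame-based hit detection (illustrative, not the deployed model).
import torch
import torch.nn as nn

class ActionDetector(nn.Module):
    """Classifies a short clip by stacking N consecutive grayscale frames
    along the channel axis and running a small 2D CNN over them."""
    def __init__(self, n_frames: int = 8, n_classes: int = 2):  # hit / no-hit
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_frames, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, height, width)
        return self.net(frames)

# Usage: slide an 8-frame window over the video stream and count "hit" windows.
model = ActionDetector()
window = torch.rand(1, 8, 112, 112)   # stand-in for 8 resized frames
is_hit = model(window).softmax(dim=-1)[0, 1] > 0.5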

Multistep training pipelines

We developed multistep training pipelines for both sound and video-based approaches from scratch. This involved managing data gathering, labeling, and the development of complex audio/image/video labeling setups to ensure the models were trained on high-quality data.
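
As a rough illustration of what "multistep" means here, the snippet below chains named stages (data gathering, labeling and cleaning, training, export) into one reproducible run. The stage names and helper code are hypothetical and only hint at the structure of such a pipeline, not Tensorway's actual tooling.

# Hypothetical outline of a multistep training pipeline; stage names and
# functions are illustrative placeholders.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]   # each stage consumes and returns a context dict

def run_pipeline(stages: List[Stage], context: dict) -> dict:
    for stage in stages:
        print(f"[pipeline] running: {stage.name}")
        context = stage.run(context)
    return context

# The same skeleton can be reused for the audio and the video pipeline,
# with different implementations plugged into each stage.
pipeline = [
    Stage("gather_raw_recordings", lambda ctx: ctx),
    Stage("label_and_clean", lambda ctx: ctx),
    Stage("train_model", lambda ctx: ctx),
    Stage("evaluate_and_export", lambda ctx: ctx),
]
result = run_pipeline(pipeline, {"dataset_dir": "data/"})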

Model deployment and application development

We converted the models for deployment on Android and iOS platforms, building demo applications to showcase the improved hit-counting accuracy in real-time scenarios. This step ensured that the models were not only theoretically sound but also practically applicable.
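
One possible conversion path is sketched below using PyTorch Mobile: trace the trained model, optimize it for on-device inference, and export a lite-interpreter bundle that Android and iOS apps can load. The case confirms conversion for both platforms but not the specific toolchain, so treat this as an assumption; the stand-in model and file name are placeholders.

# A sketch of one mobile conversion path (PyTorch Mobile); the toolchain and
# model below are assumptions, not the confirmed production setup.
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile

# Stand-in for the trained hit detector (see the earlier sketch).
model = nn.Sequential(
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
).eval()

example = torch.rand(1, 8, 112, 112)                    # example input for tracing
traced = torch.jit.trace(model, example)                # freeze the graph
mobile = optimize_for_mobile(traced)                    # fuse ops for on-device inference
mobile._save_for_lite_interpreter("hit_detector.ptl")   # bundle loadable from mobile apps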

As a result...

Developed lightweight computer vision models, significantly improving real-time hit-counting accuracy.

Created multistep training pipelines for both sound and video-based approaches from scratch.

Managed data gathering, labeling, and the development of complex audio/image/video labeling setups.

Converted models for deployment on Android and iOS, building demo applications.

The audio model’s accuracy doubled, and the video-based model provided even better precision.

Project team, steps, and timeline

Team

1 full-time and 1 part-time deep learning engineer, 12 data labelers
01. Initial research with different POCs: 1 week
02. Audio POC: 1 week
03. Additional data cleaning and labeling: 1 month
04. Audio training pipeline: 2 months
05. Model conversion, mobile deployment, robustness improvements: 1 month
06. Switch to video approach (POCs): 2 weeks
07. Initial data labeling: 1 week
08. Video training pipeline and testing different approaches: 2 weeks
09. Additional data labeling: 2 weeks
10. Demo apps: 1 week

Total project duration

7 months

Other possible applications

The Tensorway team is exploring a unified model that combines the audio and video approaches for the best performance. Our goal is to overcome limitations such as background noise in audio, video quality issues, and camera positioning to create a solution that delivers robust performance in any condition.
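
As a purely illustrative example of such a unified model, a simple late-fusion scheme could weight the per-window hit probabilities coming from the audio and video models; the weighting below is an assumption, not a described design.

# Illustrative late fusion of audio and video hit probabilities (assumption only).
import torch

def fuse_hit_scores(audio_prob: torch.Tensor,
                    video_prob: torch.Tensor,
                    audio_weight: float = 0.4) -> torch.Tensor:
    """Weighted average of per-window hit probabilities from the two models.
    The audio weight could be lowered automatically in noisy environments."""
    return audio_weight * audio_prob + (1.0 - audio_weight) * video_prob

# Usage: fuse one window's scores and apply a decision threshold.
hit = fuse_hit_scores(torch.tensor(0.7), torch.tensor(0.9)) > 0.5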

We create state-of-the-art AI solutions for sports & entertainment. Tensorway’s custom models transform how users experience sports.

Interested in exploring the possibilities?
Reach out to us to discuss further.

Book a free consultation