Brands that sell their products in brick-and-mortar stores want to be sure that all their promotional efforts are efficient and lead to larger margins. They need to understand which competing products are currently on sale alongside their own in stores, along with their prices, weights, shelf placement, and so on.
An industry-standard method in retail is to capture images or videos of products on the shelves and then analyze the raw data. The key goal of such analysis is to identify product attributes, extract this information into a database, and let the brand make informed, data-driven decisions.
A computer vision-based solution for digital merchandising, such as the Eyrene platform, performs all these tasks efficiently. With well-trained and continuously updated neural network models, the platform can effortlessly identify the product attributes captured in an image, then extract and analyze them.
Which is better: a computer or a human brain?
In our industry, there are multiple approaches to image processing. Some vendors recruit and train hundreds of people to process the raw data manually. Their key task is to mark the products in the images and identify their attributes by sight. The result of their work is a database of labeled data suitable for image recognition.
That’s not the case with our solution. The neural networks integrated into the platform are statistical models learned entirely from data, and image recognition in Eyrene is a fully automated process that runs in real time.
Should we trust a computer more than a human, or vice versa? Today, computer vision is an advanced technology, and in some cases it handles such tasks better than a human. Still, both a computer and a human can make mistakes.
At Eyrene, we believe the only way to eliminate human error is to train staff properly. We also know how to continuously monitor and improve the performance of a computer vision-based solution. The Eyrene platform includes multiple quantitative metrics that help check whether the system works correctly at every stage, identify key issues, and resolve them.
Why accuracy?
In machine learning, there are a number of well-known quality metrics for object detection and classification. Researchers often use precision, recall, F1 score, the confusion matrix, and mean average precision (mAP) to evaluate the quality of a solution and to show the relative frequency of different types of errors. Although these metrics are undoubtedly useful for analyzing and improving computer vision systems, all of them require special knowledge and professional training to interpret correctly.
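To illustrate why these metrics need interpretation, here is a minimal sketch of how precision, recall, and F1 are computed from detection counts. It assumes predicted product boxes have already been matched to hand-labeled ground truth; the `DetectionCounts` class, the helper functions, and the counts in the example are hypothetical and are not part of the Eyrene platform.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Counts from matching predicted product boxes to ground-truth labels (hypothetical)."""
    true_positives: int   # products detected and identified correctly
    false_positives: int  # detections that do not correspond to a correctly identified product
    false_negatives: int  # real products on the shelf that the model missed


def precision(c: DetectionCounts) -> float:
    """Share of the model's detections that are correct."""
    predicted = c.true_positives + c.false_positives
    return c.true_positives / predicted if predicted else 0.0


def recall(c: DetectionCounts) -> float:
    """Share of the real products that the model found."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def f1_score(c: DetectionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Made-up shelf photo: 90 correct detections, 5 spurious ones, 10 missed products.
    counts = DetectionCounts(true_positives=90, false_positives=5, false_negatives=10)
    print(f"precision={precision(counts):.3f}")
    print(f"recall={recall(counts):.3f}")
    print(f"f1={f1_score(counts):.3f}")
```

Even this simple case yields several numbers (about 0.947, 0.900, and 0.923 here) that trade off against each other, which is the kind of interpretation burden that a single accuracy figure avoids.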
Through our work with many brands, we have learned that customers implementing computer vision-based solutions are comfortable looking at just one metric that shows whether the solution works well and can be trusted: image recognition accuracy.
Even though most vendors on the market use this metric, some suggest using other metrics instead. Moreover, there is no standardized approach to calculating image recognition accuracy.
At Eyrene, we’ve developed a task-specific approach to calculating image recognition accuracy in retail. We define image recognition accuracy as a metric of the performance of a solution designed to extract the attributes of products captured in images and videos, store the data in a database, and analyze it.
Accuracy is expressed as a percentage, where:
- 0% means the image recognition solution does not work at all: it correctly recognizes none of the products on the shelves.
- 100% means ideal performance: every product in the image is identified, and complete information about each product is available for analysis. In practice, 100% accuracy is unachievable.
How accurate is accurate enough?
Some customers request 99% (or even 99.9%) accuracy. We believe customers should clearly understand how the technology works so that they can use image recognition-based solutions properly and without unrealistic expectations.
We want to emphasize that expecting 100% accuracy is unrealistic and counterproductive. In our experience, the highest accuracy that retail image recognition achieves in practice is about 98%. If your vendor promises 100% image recognition accuracy, think twice.
In customer contracts, we usually specify a target accuracy level and penalties for failing to reach it. In most cases, we guarantee 95% accuracy, measured with our own formula. In practice, however, the average accuracy is higher and can reach 98%. The average is calculated as the mean of the accuracy values over a given period of time; in our solution, we monitor accuracy daily.
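The contract check described above can be sketched as averaging daily accuracy values over a reporting period and comparing the result with the guaranteed level. This is a minimal illustration under stated assumptions: the daily values are taken as given (Eyrene's own formula for computing them is not reproduced here), the mean is an unweighted average over days, and `average_accuracy`, `meets_guarantee`, and `GUARANTEED_ACCURACY` are hypothetical names.

```python
from datetime import date
from statistics import mean

# Hypothetical contractual threshold, in percent; the text cites 95% as the usual guarantee.
GUARANTEED_ACCURACY = 95.0


def average_accuracy(daily_accuracy: dict[date, float], start: date, end: date) -> float:
    """Unweighted mean of the daily accuracy values (percent) from start to end, inclusive."""
    values = [acc for day, acc in daily_accuracy.items() if start <= day <= end]
    if not values:
        raise ValueError("no accuracy measurements in the requested period")
    return mean(values)


def meets_guarantee(avg: float, threshold: float = GUARANTEED_ACCURACY) -> bool:
    """True if the period's average accuracy reaches the contractual threshold."""
    return avg >= threshold


if __name__ == "__main__":
    # Made-up daily values for one week; real values would come from the
    # platform's own accuracy formula.
    daily = {
        date(2024, 3, 4): 97.1,
        date(2024, 3, 5): 96.4,
        date(2024, 3, 6): 98.0,
        date(2024, 3, 7): 95.2,
        date(2024, 3, 8): 97.5,
        date(2024, 3, 9): 96.8,
        date(2024, 3, 10): 97.3,
    }
    avg = average_accuracy(daily, date(2024, 3, 4), date(2024, 3, 10))
    print(f"weekly average: {avg:.1f}%, guarantee met: {meets_guarantee(avg)}")
```

An unweighted mean treats every day equally; if daily image volumes vary a lot, weighting each day by the number of recognized products is a reasonable alternative, but which approach applies depends on the contract.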
A 95% accuracy level is the norm in our industry. It is sufficient for the tasks that brands need computer vision-based solutions for, such as calculating employee rewards and monitoring the market situation.
What's next?
In Part II, we will present the simple formula we use to quantify accuracy and explain the intuition behind how it works.