This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
With a rapidly evolving tobacco retail environment, it is increasingly necessary to understand the point-of-sale (POS) advertising environment as part of tobacco surveillance and control. Advances in machine learning and image processing suggest the ability for more efficient and nuanced data capture than previously available.
The study aims to use machine learning algorithms to discover the presence of tobacco advertising in photographs of tobacco POS advertising and their location in the photograph.
We first collected images of the interiors of tobacco retailers in West Virginia and the District of Columbia during 2016 and 2018. The clearest photographs were selected and used to create a training and test data set. We then used a pretrained image classification network model, Inception V3, to discover the presence of tobacco logos and a unified object detection system, You Only Look Once V3, to identify logo locations.
Our model was successful in identifying the presence of advertising within images, with a classification accuracy of over 75% for 8 of the 42 brands. Discovering the location of logos within a given photograph was more challenging because of the relatively small training data set, resulting in a mean average precision score of 0.72 and an intersection over union score of 0.62.
Our research provides preliminary evidence for a novel methodological approach that tobacco researchers and other public health practitioners can apply in the collection and processing of data for tobacco or other POS surveillance efforts. The resulting surveillance information can inform policy adoption, implementation, and enforcement. Limitations notwithstanding, our analysis shows the promise of using machine learning as part of a suite of tools to understand the tobacco retail environment, make policy recommendations, and design public health interventions at the municipal or other jurisdictional scale.
Tobacco point-of-sale (POS) advertising, consisting of signs, displays, and other promotional materials, is considered a very deliberate and effective marketing strategy [
The regulation of tobacco POS advertising varies substantially across communities and states within the United States [
Over the past decade, 2 technological innovations have advanced sufficiently to offer the possibility of a more efficient, less resource-intensive approach for measuring POS advertising at scale. Specifically, the combination of crowdsourcing and machine learning offers a promising strategy to automate the store auditing process and allow researchers to gain insights into actual advertisements. Machine learning is a general term to describe a collection of related computer tasks, including image classification and object detection—key elements in identifying tobacco brands in photographs of store environments. One advantage of machine learning is that it eliminates the burden of manual review and coding and carries the potential for major cost and time savings for research projects. Although both traditional surveillance activities and crowdsourcing efforts have been used to collect data from tobacco retailers, machine learning is yet to be applied to improve digital photograph extraction [
The primary aim of this study is to assess the feasibility of using machine learning approaches to identify and quantify information accurately in tobacco POS advertising. The proposed approach has 2 components. First, we make use of a large collection of interior photographs of tobacco retailers in West Virginia and Washington, DC. Second, we apply image classification and object detection to identify the brand, number, and size of tobacco advertisements within digital photographs. With respect to image classification, we aim to identify individual tobacco brands in the archive of photographs. We aim to accurately determine the location of brand-specific advertising images in individual photographs by using object detection. By accurately classifying branded images and automatically detecting the location of branded imagery in a retail environment, our approach can provide a practical alternative to in-person POS store audits. In addition, being able to capture enhanced contextual information about the POS environment may provide insights into efforts to combat ongoing efforts from the tobacco industry to use their advertisements to target vulnerable populations.
Camera glasses were used by field staff to collect interior photographs from 410 tobacco retailers across 37 counties in West Virginia from September 2013 to August 2014, the majority of which were in 6 counties with full coverage of all stores. A total of 86,683 digital photographs of tobacco product counters and advertisements were collected. These photographs were then used in the first machine learning task, which attempted to classify specific tobacco brands within the photos. We also collected an additional 13,264 photographs from 82 tobacco retailers in Washington, DC, during the fall of 2018 for the second machine learning task, which focused on identifying the location of advertisements within each photograph.
We manually sorted the photographs into training and validation sets for purposes of training a neural network to classify the retailer photographs by brand. In so doing, we removed the images that contained no advertising or had unclear information, leaving 0.8% (694/86,683) photographs deemed most useful for brand detection analysis between West Virginia and Washington, DC. Although 0.8% (694/86,683) seems low, our cameras took photographs every 1 second, meaning most did not contain any tobacco POS advertising. Once the 694 clearest photographs were selected, they were manually sorted by which brands were contained within them. Finally, we created a training set of 70% (486/694) of the photographs and a testing/validation set of 30% (208/694) of the photographs for classification.
To detect the location of advertisements within each photograph, only photographs with Marlboro advertisements were ultimately used, but from a larger pool of photographs, as not all brands needed to be clear. We decided only to attempt to detect the location of Marlboro advertisements because they were by far the most common brand, giving us the largest amount of data on which to train our model. A sample of the 843 clearest Marlboro photos across Washington, DC, and West Virginia were sorted into training (589/843, 69.9% photographs) and testing sets (254/843, 30.1% photographs).
Once the images in the training and test sets were manually classified, we used the Python TensorFlow and Keras libraries through the Jupyter Notebooks platform to recreate the manual brand classification with machine learning [
Several steps were taken to speed up learning and minimize the computation time owing to the computationally expensive nature of image classification models. First, all processes were executed on a Linux server hosted by Amazon Web Services. Second, we used a pretrained image classification network, Inception V3, which has already been trained to classify millions of labeled images in the
After importing the pretrained Inception V3 classification network, we built a deep learning neural network to classify whether the images contained specific tobacco brands. A deep learning network can recognize features in the images that are associated with a targeted brand, including the colors, shapes, or patterns in a given logo. Specifically, we used TensorFlow to configure our computer graphics processing units before training our neural network using Keras. To prevent overfitting and make the most of the 486 photographs in our training set, we configured several random transformations so that our model would never see the same exact picture twice. We randomly applied zooms, shearing, and horizontal transformations to our training set and processed 50 images per batch at a resolution of 299×299 pixels. Once the neural network was trained, we generated the predicted probabilities that each image contained a brand of interest. In our analysis, an image was classified as containing the logo of the brand if the probability was ≥0.5, but other cutoffs could have been chosen.
Beyond identifying the presence of specific brands, our second goal is to discover their locations in a given photograph to inform their positioning and density in an individual store. Doing so required a different technology than the Inception V3 classification network described earlier, which was designed to discover logos anywhere in an image. We trained YOLO (You Only Look Once) V3, a state-of-the-art, real-time unified object detection system to detect tobacco advertisements within images, as illustrated in
Example image from a tobacco point of sale with YOLO (You Only Look Once) bounding boxes.
As part of our process, we first used the open-source Visual Object Tagging Tool to draw bounding boxes around each Marlboro advertisement within images and generate related annotations for training [
We then evaluated the model accuracy for location detection using testing images based on two metrics: the mean average precision (mAP) and IOU. The mAP is a composite accuracy indicator that ranges from 0 to 1 and accounts for both precision and recall, which is computed as the area under the precision-recall curve. An mAP score of 1 indicates that 100% of the model’s predictions are correct and that 100% of the truth objects are detected by the model. The IOU measures the extent to which the prediction overlaps with the ground truth, which is given by the ratio of the area of intersection and area of union of the predicted bounding box and ground-truth bounding box.
We present our results following our 2 primary research questions, as outlined earlier. First, we describe the success of identifying individual tobacco brands in our set of photographs. Second, we detail how we determined the location of such images in individual stores. We then discuss our findings and their implications in the
Our Inception V3 model fundamentally generated the predicted probabilities of the presence of each brand for each image in the validation set. Although we ultimately decided to focus on the brand Marlboro, we attempted to detect a series of brand logos. Success varied by brand, but our model achieved a classification accuracy of more than 75% for 8 of the 42 brands it was trained to detect (
Marlboro
Newport
Camel
Pall Mall
Pyramid
Maverick
Santa Fe
Winston
Kool
American Spirit
Swisher
Black and Mild
White Owl
Dutch Masters
Winchester
Garcia y Vega
Phillies
Cheyenne
Backwoods
Levi Garrett Plug
Day’s Work
Red Man Plug
Grizzly
Garrett
Skoal
Red Man
Copenhagen
Red Seal
Timberwolf
Kayak
Beechnut
Kodiak
Longhorn
Skoal
General
Blu
FIN
Logic
MARKTEN
NJOY
V2
VUSE
Classification accuracy for validation data by tobacco brands.
Brand | Classification accuracy (%) |
Copenhagen | 90.4 |
Winston | 85.1 |
Pyramid | 82.7 |
Blu | 81.7 |
American Spirit | 80.3 |
Marlboro | 78.8 |
Camel | 75.9 |
Pall Mall | 75.9 |
Predicted probability of logos by Inception V3.
Photograph and brand | Probability | ||
|
|||
|
Marlboro | 0.635 | |
|
Newport | 0.982 | |
|
Camel | 0.661 | |
|
|||
|
Marlboro | 0.993 | |
|
Newport | 0.868 |
Panoramic image of a tobacco point of sale—view 1.
Panoramic image of a tobacco point of sale—view 2.
As described in the
Training loss and testing accuracy of the YOLO (You Only Look Once) V3 objection detector. IOU: intersection over union; mAP: mean average precision.
Prediction of Marlboro signs by YOLO (You Only Look Once) V3.
Bounding boxes of Marlboro signs detected by YOLO (You Only Look Once) V3.
Bounding box in |
Confidence score (%) | Bounding box measures (pixels) | |||
|
|
Left x | Top y | Width | Height |
1 | 100 | 0 | 16 | 737 | 376 |
2 | 100 | 724 | 30 | 1355 | 293 |
3 | 98 | 1149 | 505 | 708 | 163 |
4 | 100 | 1189 | 604 | 648 | 154 |
5 | 100 | 2084 | 0 | 745 | 358 |
Our study provides evidence of the feasibility of a novel methodological approach that tobacco researchers and other public health practitioners can apply in the collection and processing of data for tobacco and other POS surveillance efforts. Trained on a small set of labeled tobacco POS photographs, our classifier and object detector were able to identify tobacco brands and their location and dimension successfully within images. Although the initial labeling of the training data set was time-consuming, with an average of 3 minutes per photograph for staff to label advertisements, the costs of processing the photographic data decreased once the neural networks were trained. We were able to achieve a brand classification accuracy of over 75% for 8 of the 42 brands that our classifier (Inception V3) was trained to detect. Furthermore, we were able to accurately predict the location of branded advertising within the store environment (YOLO V3). Our location predictions achieved an mAP score of 0.72 and an IOU score of 0.62.
Our Inception V3 brand detection model had 2 major characteristics. First, the model was optimized to predict true positives, but with the unintended consequence of predicting more false positives. The model, therefore, tended to decide that a brand was in a photograph when none was present, in the process of finding true positives. Unfortunately, there was no alternative; tuning the model to favor predicting true negatives would have been too conservative (ie, unable to make any prediction on most photographs). Second, the model was heavily dependent on making predictions based on color. For example, in
This promising technology offers potential opportunities for tobacco POS surveillance to move forward. First, conducting timely, long-term surveillance of the POS environment can be resource intensive. Some existing state and local audit systems require a substantial amount of training and resources; however, they do provide critical information for enforcement activities as well as an understanding of the impact of marketing on current tobacco user behaviors and tobacco use initiation [
The use of improved surveillance technologies can also assist in the assessment of local policy impacts in an evolving POS retail environment. With the adoption of local and state restrictions on the sale of flavored tobacco products, jurisdictions have an ever-growing need to assess the impact of these policies on tobacco retail environments [
Finally, this concept of applying machine learning to examine tobacco POS could also be extended to other public health applications. For example, image classification and object detection can be applied to identify other products of interest such as sugary drinks, candy, and processed foods from retail store images. Such an approach would allow public health researchers and policy makers to gauge the prevalence and types of advertisements for each product and understand how a specific retail food environment may interact with population demographics. Given the current attention on obesity, related health outcomes, and efforts to tax sugary drinks, we might anticipate interest in a machine learning–based approach, as illustrated [
Despite the success of applying deep neural networks to tobacco advertisement object detection, our analysis revealed several challenges, especially with respect to object location detection. Although our brand classification algorithms were generally successful (Inception V3) with some false-positive classifications, object location detection (YOLO V3) was more challenging, requiring a large amount of training data to identify patterns. Owing to the relatively small sample size, we decided to limit training our object detection algorithm to Marlboro advertisements, the dominant tobacco brand in our data set. Even when considering one brand within a given POS, we observe substantial variability in advertising content and appearance, including differences in color, shading, perspective, size, and rotation (
One key factor that limited our training data set is that relatively few tobacco POS advertising training images are available to the public, as distinct from generic computer vision applications where large and clean benchmark data sets exist. Collecting primary data on tobacco advertising and creating sufficient labeled training samples are a challenge in performing deep neural network–based object detection owing to time and cost implications. As shown in the literature, primary data collection for POS surveillance poses many challenges related to technology, workforce training, and overall logistics [
Beyond technology, there are also specific practical considerations when implementing such assessments. First, there are privacy and safety concerns associated with the use of camera glasses to collect POS photographs in public, especially when one or more identifying people are incidentally captured in photographs. Many store owners may not be comfortable with individuals using camera phones or handheld cameras to capture POS information. To accommodate this, camera glasses are often used to capture hands-free images in a more inconspicuous way. However, wearing a hidden camera in public may raise legal issues, depending on the location. For example, although capturing public scenes requires no consent in the United States, the opposite exists in Spain [
Finally, data quality must be considered when collecting data for machine learning and related analysis. As we discovered, cameras in glasses may not always capture high-quality or usable images. Photographs may be blurry, overexposed, or missing the desired images. As such, input data quality may be subject to bias and error, which could affect the development of subsequent machine learning algorithms. In addition, because many glasses do not have a remote trigger or a viewing screen, the individual using the glasses may be unaware of the image quality until the photos are uploaded. Such effects could result in a significant reduction in the availability of the training images used for image classification and object detection. However, having multiple overlapping images of any given POS location helps to ensure that all available POS advertising is captured with one or more clear photographs.
Future work on image classification in the tobacco POS retail environment would benefit from exploring additional methods to reduce false positives and false negatives for respective machine learning algorithms. For example, we could take advantage of image repositories that contain tobacco branding to help train models and add additional nuances. Although such repositories would not provide sufficient data to help describe the US retail environment comprehensively, they could assist by providing different geographical contexts or supplemental material in other advertising channels (eg, magazines, newspapers, films, the web, and social media) may still be helpful [
To summarize, our study demonstrated the utility of machine learning for POS assessment and highlighted some existing technological limitations. Although the current machine learning algorithms are advanced, they still have room for improvement. Large-scale POS photographic data sets that are comprehensive enough to capture a wider range of tobacco brands are needed to train multiclass algorithms capable of detecting advertisements from less common tobacco brands. Future studies should also attempt to detect the prices associated with within-store advertisements by incorporating text recognition algorithms to gain more contextual information on tobacco marketing. Attempting to classify an absolute geographic location within stores instead of relative image location within a photograph may also be of interest to researchers and surveillance efforts.
Despite these limitations, our analysis shows the promise of using machine learning as part of a suite of tools to understand the tobacco retail environment and inform public health interventions at multiple scales. The accurate and automatic classification of product brands and detection of their location within a retail environment could assist in developing a practical alternative to in-person POS audits, especially in resource-limited environments. For example, the necessary classifiers—documentation—could be made available in the public domain to facilitate their use by public health departments. In addition, coded photographs could be shared as part of a centralized resource to reduce the level of effort required to conduct or continue similar evaluations. With the increasing sales restrictions at the POS, surveillance products with enhanced contextual information about the retail environment can provide states, counties, and municipalities the opportunity to better understand the impact of existing and proposed policies, including ongoing efforts by the tobacco industry to target potentially vulnerable populations.
intersection over union
mean average precision
point-of-sale
region-based convolutional neural networks
You Only Look Once
This work was supported in part by Truth Initiative and included activities such as data collection, analysis, interpretation of data, and writing the manuscript. In addition, any store photos located in Washington, DC, that were used in this research were collected from a study supported by the National Cancer Institute of the National Institutes of Health under award number R21CA208206. Although AAR is a United States Food and Drug Administration’s Center for Tobacco Products employee, this work was not done as part of his official duties. This publication reflects the views of the author and should not be construed as reflecting the United States Food and Drug Administration’s Center for Tobacco Products’ views or policies.
None declared.