This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Viewing their habitual smoking environments increases smokers’ craving and smoking behaviors in laboratory settings. A deep learning approach can differentiate between habitual smoking versus nonsmoking environments, suggesting that it may be possible to predict environment-associated smoking risk from continuously acquired images of smokers’ daily environments.
In this study, we aim to predict environment-associated risk from continuously acquired images of smokers’ daily environments. We also aim to understand how model performance varies by location type, as reported by participants.
Smokers from Durham, North Carolina, and surrounding areas completed ecological momentary assessments both immediately after smoking and at randomly selected times throughout the day for 2 weeks. At each assessment, participants took a picture of their current environment and completed a questionnaire on smoking, craving, and the environmental setting. A convolutional neural network–based model was trained to predict smoking, craving, whether smoking was permitted in the current environment, and whether the participant was outside, based on images of participants’ daily environments, the time since their last cigarette, and baseline data on daily smoking habits. Prediction performance, quantified using the area under the receiver operating characteristic curve (AUC) and average precision (AP), was assessed both for out-of-sample prediction and for personalized models trained on images from days 1 to 10. The models were optimized for mobile devices and implemented as a smartphone app.
A total of 48 participants completed the study, and 8008 images were acquired. The personalized models were highly effective in predicting smoking risk (AUC=0.827; AP=0.882), craving (AUC=0.837; AP=0.798), whether smoking was permitted in the current environment (AUC=0.932; AP=0.981), and whether the participant was outside (AUC=0.977; AP=0.956). The out-of-sample models were also effective in predicting smoking risk (AUC=0.723; AP=0.785), whether smoking was permitted in the current environment (AUC=0.815; AP=0.937), and whether the participant was outside (AUC=0.949; AP=0.922); however, they were not effective in predicting craving (AUC=0.522; AP=0.427). Omitting image features reduced AUC by over 0.1 when predicting all outcomes except craving. Prediction of smoking was more effective for participants whose self-reported location type was more variable (Spearman
Images of daily environments can be used to effectively predict smoking risk. Model personalization, achieved by incorporating information about daily smoking habits and training on participant-specific images, further improves prediction performance. Environment-associated smoking risk can be assessed in real time on a mobile device and can be incorporated into device-based smoking cessation interventions.
Cigarette smoking is the leading cause of preventable deaths in the United States [
New, mobile device-based cessation interventions have improved 6-month [
However, most smoking cessation interventions neglect an important contributor to smoking behaviors—the smoker’s external environment. A growing body of evidence, collected in both laboratory and real-world settings, suggests that smoking risk is affected not only by internal factors but also by the smoker’s current environmental context. For example, images of personal smoking environments increase smoking behaviors and self-reported craving [
Current technologies can now quantify the effects of environmental factors on real-world smoking behaviors [
In a previous study, we demonstrated that computer vision could distinguish between daily environments where smokers commonly smoke and those where they rarely smoke. Using the approach outlined above, we also uncovered specific objects and settings associated with smoking versus nonsmoking environments [
In this study, we collected a representative sample of images of smokers’ daily environments through photograph-augmented EMA (photoEMA). In each assessment, participants self-reported recent smoking and their current craving level and then took a picture of their environment. A mobile-optimized convolutional neural network was trained to predict smoking risk and other outcomes relevant to smoking (craving, whether smoking was permitted in the current environment, and whether the participant was outside) based on environmental images and other participant-specific features. We hypothesized that out-of-sample prediction would be effective, providing a basis for an environment-aware JITAI, and that prediction performance could be improved through model personalization, in which images from a given participant are used to refine model predictions for that participant. We also aimed to understand how model performance varied by location type, as reported by participants. Our final prediction model, QuitEye, was deployed on a mobile device and can assess environment-associated smoking risk and craving in real time to support environment-aware smoking cessation interventions.
Recruitment and all study procedures were approved by the Duke University Health System Institutional Review Board, and written consent was obtained from all participants. Smokers (≥10 cigarettes per day for ≥2 years) aged ≥18 years were recruited from the Durham, North Carolina area. Participants were recruited from the community for a study of smoking behavior via printed and web advertisements and word of mouth. Participants were excluded if they regularly used noncigarette tobacco products (eg, e-cigarettes); currently used smoking cessation medications; planned to quit smoking, otherwise alter their smoking pattern, or leave the study area during the study; anticipated a major life event during the study; had current or recent alcohol or drug abuse problems; or were pregnant, breastfeeding, or planning to become pregnant during the study. Eligible participants completed an initial visit to (1) biochemically verify their smoking status (ie, carbon monoxide breath test) and test for illicit drug use, (2) test for pregnancy, and (3) complete questionnaires on nicotine dependence and tobacco use history. Participants who met all eligibility requirements (n=52) then downloaded the photoEMA app (Metricwire) to their smartphone and were trained in its use. Following the 14-day photoEMA period, participants completed a follow-up visit during which an interview was conducted to assess drug and alcohol use, tobacco purchasing, and any other events that might have affected smoking (eg, illness) or daily living (eg, death in the family) patterns. Participants were compensated up to US $350 in total, including daily (US $5) and weekly (US $50) incentives for high photoEMA completion. All procedures were observational, and no randomization or intervention was performed.
Participants completed the photoEMA assessments for 14 days.
Participants specified their typical waking hours during screening. They were prompted six times daily at randomly spaced intervals, with an average interval of 120 minutes between prompts. At each assessment, participants rated their current levels of urge to smoke (1 item) and affect and stress (11 items; not reported here). In addition, they captured a time-stamped image of their current location. Finally, they were prompted to label the location using a prepopulated list of common locations (eg, bedroom, office, car, or park) and to report other location information (eg, indoors or outdoors and whether smoking was permitted), current activity (eg, working or running errands), social environment (eg, presence of others), and recent alcohol and caffeine use.
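The prompting schedule was handled by the photoEMA app and is not specified in more detail here. Purely as an illustration, the sketch below draws six prompt times uniformly at random within a participant's waking window. The function name, the uniform sampling, and the 14-hour window in the usage lines are assumptions for this example, not the study's actual scheduler.

```python
import random

def schedule_random_prompts(wake_minute, sleep_minute, n_prompts=6, rng=None):
    """Return n_prompts prompt times (minutes after midnight), sorted.

    Assumption: times are drawn independently and uniformly within the
    waking window; the real app's scheduling rule is not known.
    """
    rng = rng or random.Random()
    return sorted(rng.uniform(wake_minute, sleep_minute) for _ in range(n_prompts))

# Hypothetical participant awake from 07:00 to 21:00 (a 14-hour window).
prompt_times = schedule_random_prompts(7 * 60, 21 * 60, rng=random.Random(1))
```

Under these assumptions, six uniform draws split an 840-minute window into seven gaps (counting the window edges), so the expected gap is 840/7 = 120 minutes. That is in line with the reported average but does not imply the app worked this way.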
Participants were also instructed to complete assessments each time they smoked. They were asked how many cigarettes they smoked in this location on this occasion and all items from the random prompt assessments.
Participants were instructed to delay responding if they were in situations or locations where responding to or initiating prompts would be distracting (eg, in a meeting) or dangerous (eg, while driving). Across assessments, participants were asked to compose pictures so as to avoid including other people but otherwise to leave environments as they were.
Craving data were dichotomized based on the median self-reported craving for all participants. Self-reported craving of a
QuitEye is based on MobileNetV2, a convolutional neural network architecture optimized for mobile devices [
To determine the impact of each of these elements on prediction performance, we conducted ablation studies in which models
QuitEye is a multi-task architecture that jointly predicts four binary outcomes: smoking, craving, whether smoking is permitted, and whether the participant is outside. Prediction of whether the participant is outside was included both to contextualize other performance figures and because inside or outside status is associated with smoking behaviors. Nonimage features were concatenated with image features from the global pool layer of MobileNetV2, and a single hidden neural network layer (rectified linear unit activation) was applied. Nonimage features were again concatenated to the output of this layer, and a second fully connected layer (sigmoid activation) was then used to predict each of the four binary outcomes. The QuitEye architecture is shown in
Diagram of QuitEye, which extracts image features using the MobileNetV2 convolutional neural network, then predicts smoking status, craving, whether smoking is permitted, and whether the participant is outside based on a combination of image features and additional data collected from participants with a mobile device.
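The prediction head described above can be sketched in a few lines. This is a minimal, framework-free illustration of the forward pass only (concatenate, hidden ReLU layer, concatenate again, sigmoid outputs). The feature sizes, random weights, and helper names are assumptions for the example; the MobileNetV2 feature extractor and all training details are omitted.

```python
import math
import random

OUTCOMES = ("smoking", "craving", "smoking_permitted", "outside")

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense(inputs, weights, biases, activation):
    """Fully connected layer; weights is a list of rows, one per output unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random placeholder weights (a trained model would supply real ones)."""
    weights = [[rng.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def quiteye_head(image_features, nonimage_features, hidden, output):
    # 1) Concatenate pooled image features with nonimage features.
    x = list(image_features) + list(nonimage_features)
    # 2) Single hidden layer with ReLU activation.
    h = dense(x, hidden[0], hidden[1], relu)
    # 3) Concatenate the nonimage features again.
    h = h + list(nonimage_features)
    # 4) Sigmoid output layer: one probability per binary outcome.
    return dict(zip(OUTCOMES, dense(h, output[0], output[1], sigmoid)))

# Assumed sizes: 1280 pooled image features, 5 nonimage features, hidden width 16.
rng = random.Random(0)
n_img, n_other, width = 1280, 5, 16
hidden = init_layer(n_img + n_other, width, rng)
output = init_layer(width + n_other, len(OUTCOMES), rng)
probs = quiteye_head([rng.random() for _ in range(n_img)], [0.5] * n_other, hidden, output)
```

Because the four outputs share the hidden layer, the image representation is learned jointly across outcomes, which is what makes the architecture multi-task.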
QuitEye was trained using TensorFlow v1.15 in Python v3.7 on a single Titan XP GPU. MobileNetV2 parameters were initialized to values learned on ImageNet [
Out-of-sample performance was assessed by training and evaluating the model using nested cross-validation [
Personalized model performance was assessed by developing the model with data from all participants from days 1 to 10, then evaluating it on data from days 11 to 14. Images used in model development were divided at random into training (80%) and validation (20%) sets.
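The personalized evaluation scheme amounts to a split by study day followed by a random split of the development data. The sketch below shows one way to express it, assuming each assessment is a record with a `day` field (1 to 14). The record layout, function name, and fixed seed are illustrative assumptions.

```python
import random

def personalized_split(records, dev_last_day=10, val_fraction=0.2, seed=0):
    """Split assessments into training, validation, and test sets.

    Days 1..dev_last_day from all participants form the development set,
    which is divided at random into training and validation sets;
    later days are held out for evaluation.
    """
    dev = [r for r in records if r["day"] <= dev_last_day]
    test = [r for r in records if r["day"] > dev_last_day]
    rng = random.Random(seed)
    rng.shuffle(dev)  # shuffles the local copy, not the caller's list
    n_val = round(len(dev) * val_fraction)
    return dev[n_val:], dev[:n_val], test

# Toy data: 3 hypothetical participants with one assessment per day.
records = [{"participant": p, "day": d} for p in range(3) for d in range(1, 15)]
train, val, test = personalized_split(records)
```

Note that the test set contains only later days from the same participants, so the model has already seen each participant's earlier environments; this is what distinguishes the scheme from out-of-sample evaluation.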
Hyperparameters included the width of the hidden layer (
Additional models were trained using the procedures outlined above to quantify the impact of additional (nonimage) features on performance. Features were categorized as (1) baseline information, including participant demographics and smoking habits; (2) information that could be collected via mobile devices, including the time elapsed since the participant last smoked and the time of day; and (3) a unique participant identifier, which was incorporated as a categorical feature in the personalized models only. Including this identifier adds participant-specific parameters to the model, allowing predictions to be explicitly personalized. However, even when this identifier is omitted, the personalized model development scheme (ie, training on days 1-10 from all participants) allows the model to learn from each participant’s previously visited locations when predicting their current risk.
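To make concrete how a participant identifier adds participant-specific parameters, the sketch below one-hot encodes the identifier and appends it to the other nonimage features. Each one-hot position then receives its own weights in the layers that consume it. The feature names and scaling are assumptions for illustration, not the study's preprocessing.

```python
def one_hot_participant(participant_id, known_ids):
    """Encode a participant identifier as a one-hot vector over known_ids."""
    return [1.0 if participant_id == pid else 0.0 for pid in known_ids]

def build_nonimage_features(minutes_since_last_cigarette, hour_of_day,
                            participant_id, known_ids):
    """Assemble a nonimage feature vector (hypothetical scaling)."""
    return ([minutes_since_last_cigarette / 60.0, hour_of_day / 24.0]
            + one_hot_participant(participant_id, known_ids))

known_ids = ["P01", "P02", "P03"]  # hypothetical identifiers
features = build_nonimage_features(90, 14, "P02", known_ids)
```

An identifier that was not seen during training would encode to all zeros, which is one reason such a feature is only meaningful in the personalized setting.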
A nonpersonalized (out-of-sample) model incorporating image features only was implemented in TensorFlow Lite to allow prediction on mobile devices. Other features were omitted so that predictions could be made from images alone, without additional data collection. A prototype mobile app was built using Flutter (Dart) and tested on a Google Pixel 3 (Android). QuitEye is applied to individual frames from a live video feed at a rate of approximately eight frames per second and is configured to display smoking and craving predictions corresponding to each frame.
The data sets analyzed in this study are not publicly available because they contain images of participants’ personal daily environments that cannot be deidentified. However, the code supporting this work is available from the corresponding author upon reasonable request.
Of the 77 individuals screened for the study, 52 (68%) were eligible and consented to participate. Four participants were withdrawn or lost to follow-up, and the remaining 48 participants completed the study. One participant completed their study visits remotely because of in-person visit restrictions related to COVID-19. Among the participants who completed the study, a total of 8008 images were collected, 3648 (45.55%) of which were from completed random prompts and 4360 (54.45%) of which were from completed smoking prompts. Demographic characteristics, image details, and other descriptive statistics are presented in
Demographics and descriptive statistics (N=48).
Characteristics | Values

Female:male | 32:16
Female, n (%) | 32 (67)

Value, median (IQR) | 40.5 (31-49)
Value, range | 19-64

White | 31 (65)
Black or African American | 19 (40)
American Indian | 1 (2)
Native Hawaiian or Pacific Islander | 1 (2)

Not Hispanic or Latino | 46 (96)
Hispanic or Latino | 2 (4)

Value, median (IQR) | 15 (12-20)
Value, range | 7-30

Value, median (IQR) | 15 (14-20)
Value, range | 10-30

Value, median (IQR) | 6 (4-7)
Value, range | 2-9

Value, median (IQR) | 163 (117-200)
Value, range | 63-406

Value, median (IQR) | 87 (67-132)
Value, range | 25-322

Value, median (IQR) | 58 (19-99)
Value, range | 1-210

Value, median (IQR) | 122 (96-160)
Value, range | 25-388

Value, median (IQR) | 45 (20-78)
Value, range | 3-183
Without personalization (out-of-sample performance), QuitEye predicted smoking with AUC=0.723 and average precision (AP)=0.785, craving with AUC=0.522 and AP=0.427, whether smoking was permitted with AUC=0.815 and AP=0.937, and whether the participant was outside with AUC=0.949 and AP=0.922. With personalization, performance improved substantially: QuitEye predicted smoking with AUC=0.827 and AP=0.882, craving with AUC=0.837 and AP=0.789, whether smoking was permitted with AUC=0.932 and AP=0.981, and whether the participant was outside with AUC=0.977 and AP=0.956 (
Receiver operating characteristic curves for each of the four outcomes for both the nonpersonalized (out-of-sample) and personalized (longitudinal) models. AUC: area under the receiver operating characteristic curve.
Precision-recall curves for each of the four outcomes for both the nonpersonalized (out-of-sample) and personalized (longitudinal) models.
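For readers less familiar with the two summary metrics, the sketch below computes AUC and AP directly from their definitions. It is an illustrative reference implementation under simple assumptions (binary labels coded 0 or 1, ties in AP broken by sort order), not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AP as the mean of the precision values at the rank of each positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP is undefined when there are no positives")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos

# Toy example with made-up labels and scores.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```

Unlike AUC, AP depends on the proportion of positive outcomes, so AP values should be compared across outcomes with their base rates in mind.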
Image features were critical to these performance figures for all outcomes except craving. In the nonpersonalized (out-of-sample) models, removing the image features lowered AUC by 0.221 when predicting smoking, by 0.229 when predicting whether smoking was permitted, and by 0.253 when predicting whether the participant was outside but by only 0.027 when predicting craving. In the personalized (longitudinal) models, removing the image features lowered AUC by 0.192 when predicting smoking, by 0.168 when predicting whether smoking was permitted, and by 0.178 when predicting whether the participant was outside but increased AUC by 0.034 when predicting craving (
In the out-of-sample models, baseline information about household smoking locations improved the prediction of craving (ΔAUC=0.050) and whether smoking was permitted (ΔAUC=0.020), but other nonimage features had less impact. Surprisingly, knowing the time since the last cigarette did not improve the prediction of smoking (ΔAUC=−0.014) or craving (ΔAUC=−0.026;
In the personalized models, the participant identifier substantially improved the prediction of craving (ΔAUC=0.070), and baseline information about smoking locations outside of the household slightly improved the prediction of craving (ΔAUC=0.013); however, other nonimage features had little effect on performance (ΔAUC<0.007). As in the out-of-sample models, performance among individual participants when predicting smoking was highly correlated with performance when predicting whether smoking was permitted (r=0.71; Spearman
Analyses of model calibration showed that outcome probabilities predicted by QuitEye were consistent with true outcome rates, except when predicting craving via the out-of-sample model (
Model performance (area under the receiver operating characteristic curve) before and after removal of specific data elements.
Model performance, AUC (Δ^a) | Smoking | Craving | Smoking permitted | Outside
Nonpersonalized (out-of-sample) models | | | |
Base model (all features) | 0.723 (N/A^b) | 0.522 (N/A) | 0.815 (N/A) | 0.949 (N/A)
Images | 0.502 (−0.221) | 0.495 (−0.027) | 0.586 (−0.229) | 0.696 (−0.253)
Demographics | 0.729 (0.006) | 0.542 (0.021) | 0.810 (−0.005) |
Time since last cigarette | | 0.548 (0.026) | | 0.944 (−0.005)
Time of day, weekday or weekend | 0.726 (0.003) | | 0.806 (−0.008) | 0.945 (−0.004)
Household smoking locations | 0.735 (0.012) | 0.472 (−0.050) | 0.795 (−0.020) | 0.950 (0.001)
Other smoking locations | 0.717 (−0.006) | 0.513 (−0.009) | 0.812 (−0.003) | 0.948 (−0.001)
Personalized (longitudinal) models | | | |
Base model (all features) | 0.827 (N/A) | 0.837 (N/A) | 0.932 (N/A) | 0.977 (N/A)
Images | 0.635 (−0.192) | 0.871 (0.034) | 0.764 (−0.168) | 0.799 (−0.178)
Demographics | 0.824 (−0.002) | 0.836 (−0.002) | 0.929 (−0.003) | 0.976 (−0.001)
Time since last cigarette | 0.828 (0.002) | 0.840 (0.003) | | 0.975 (−0.002)
Time of day, weekday or weekend | | | 0.929 (−0.003) | 0.975 (−0.002)
Household smoking locations | 0.826 (0.000) | 0.836 (−0.002) | 0.925 (−0.007) | 0.976 (−0.001)
Other smoking locations | 0.824 (−0.003) | 0.824 (−0.013) | 0.929 (−0.003) | 0.976 (−0.001)
Personal identifier | 0.829 (0.002) | 0.767 (−0.070) | 0.926 (−0.006) | 0.975 (−0.002)
^a Change in the area under the receiver operating characteristic curve compared to the base model.
^b N/A: not applicable.
^c Italics indicate the best performing model for that outcome.
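The Δ values in the table follow a simple convention: the AUC of the model with a data element removed minus the AUC of the base model, so a negative Δ means the removed element was helping. As a quick check, the snippet below recomputes the Δ values for removing image features from the out-of-sample model, using the AUC values reported in the table; the variable names are illustrative.

```python
# AUC values copied from the out-of-sample rows of the ablation table.
BASE_AUC = {"smoking": 0.723, "craving": 0.522,
            "smoking_permitted": 0.815, "outside": 0.949}
WITHOUT_IMAGES = {"smoking": 0.502, "craving": 0.495,
                  "smoking_permitted": 0.586, "outside": 0.696}

# Delta = ablated AUC - base AUC, rounded to the table's precision.
delta_auc = {outcome: round(WITHOUT_IMAGES[outcome] - BASE_AUC[outcome], 3)
             for outcome in BASE_AUC}
```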
Calibration curves for the nonpersonalized (out-of-sample; left panel) and personalized (longitudinal; right panel) models. Model-predicted probabilities are aggregated by percentile (N=6 bins), then compared with the proportion of positive outcomes in each bin. Good calibration implies that model predictions are an accurate estimate of the true probability of a positive outcome.
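The binning procedure behind a calibration curve of this kind can be sketched as follows. Predictions are sorted and cut into equal-count bins (a simple stand-in for percentile binning), and each bin's mean predicted probability is paired with its observed proportion of positive outcomes. The function name and toy data are assumptions for illustration.

```python
def calibration_bins(labels, probs, n_bins=6):
    """Return (mean predicted probability, observed positive rate) per bin.

    Bins hold (approximately) equal numbers of predictions, ordered from
    the lowest to the highest predicted probability.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    bins = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        if not idx:
            continue  # fewer predictions than bins
        mean_pred = sum(probs[i] for i in idx) / len(idx)
        frac_pos = sum(labels[i] for i in idx) / len(idx)
        bins.append((mean_pred, frac_pos))
    return bins

# Toy data: 12 made-up predictions and outcomes, two per bin.
toy_probs = [0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9, 0.95]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
curve = calibration_bins(toy_labels, toy_probs)
```

For a well-calibrated model, the two numbers in each pair are close, so the points fall near the diagonal.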
Analyses of model performance by self-reported location type showed that QuitEye is more effective in some locations than others. For the nonpersonalized (out-of-sample) model (
Improvements in performance from personalized training also varied by location (
Smoking prediction was more effective for participants whose self-reported location type was more variable (r=0.48;
Outcomes and model performance by location type (out-of-sample). The bar plots indicate the proportion of positive outcomes (with SE) by self-reported location type, and the line plots indicate model performance (average precision) for images taken in each location. NA: prediction performance is not applicable, because there is no variability in the outcome in this location type.
Outcomes and model performance by location type (personalized). The bar plots indicate the proportion of positive outcomes (with SE) by self-reported location type, and the line plots indicate model performance (average precision) for images taken in each location. NA: prediction performance is not applicable, because there is no variability in the outcome in this location type.
Effect of location variability on performance. Higher performance of smoking risk prediction among individual participants is associated with higher variability in self-reported locations (left panel) and higher mutual information between self-reported location type and smoking status (right panel). AUC: area under the receiver operating characteristic curve.
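The two per-participant quantities in this figure can be illustrated with standard information-theoretic estimates. The sketch below uses Shannon entropy as one possible measure of location variability (an assumption; the exact variability measure is not restated here) and computes the mutual information between location type and smoking status from empirical frequencies.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """Empirical mutual information (bits): H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy participant whose smoking is fully determined by location.
locations = ["home", "home", "car", "car"]
smoked = [0, 0, 1, 1]
mi_dependent = mutual_information(locations, smoked)

# Toy participant whose smoking is unrelated to location.
mi_independent = mutual_information(locations, [0, 1, 0, 1])
```

A participant who is always in the same location type has zero entropy and zero mutual information, so location carries no signal for that participant, which is consistent with the association shown in the figure.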
Screenshots of real-time smoking and craving risk prediction using the QuitEye mobile app are presented in
Mobile implementation of QuitEye. Screenshots of real-time smoking and craving risk prediction via the QuitEye mobile app in a high smoking risk environment (left panel) and a low smoking risk environment (right panel).
A growing body of knowledge suggests that habitual smoking environments promote craving and smoking behaviors. Our previous study demonstrated that computer vision could distinguish between habitual smoking and nonsmoking environments, leading us to hypothesize that real-world smoking risk has important, quantifiable environmental correlates that can be leveraged to predict smoking behaviors more effectively in real time. This study confirms this hypothesis: QuitEye effectively predicted smoking status and associated outcomes across the full range of environments encountered by our sample of smokers in their daily lives. By learning from other smokers’ behaviors, our models can scan a smoker’s environment to predict their current smoking risk. QuitEye also predicted whether smoking was permitted in the current location and whether the smoker was inside or outside, providing important context relevant to smoking cessation interventions. The results show that knowledge of recent smoking and daily smoking habits (eg, time of day) improved these predictions, but it is the images themselves that contributed most to good prediction performance for all outcomes except craving.
Importantly, these results provide additional evidence that the environmental correlates of smoking vary among smokers. Our nonpersonalized models achieved good out-of-sample prediction performance, suggesting that environmental factors are shared among smokers more than they are distinct. However, model personalization led to statistically significant improvements in smoking risk prediction, implying that there are meaningful environmental correlates of smoking behaviors that are specific to individual smokers.
To achieve personalization in this study, smokers had to self-report their smoking behaviors for 10 days while collecting images of their daily environments. These data were then used to refine the prediction model. This process is burdensome but may be particularly important for smokers whose daily environments are atypical or whose smoking behaviors do not follow common patterns. If this is not possible, a lesser degree of personalization can be achieved by asking smokers to provide information about the locations where they commonly smoke. Alternatively, models can be iteratively improved during use, for example in a mobile app, by prompting the user to confirm or deny smoking predictions made by the model.
The ability to predict environment-associated smoking risk in real time unlocks a range of environment-focused smoking cessation interventions. Real-time risk prediction can be used to trigger a JITAI [
However, several other intervention types are possible. For example, QuitEye could be used to identify environmental correlates of smoking risk in a smoker’s daily environment before a quit attempt, allowing them to restructure their environments or daily activity patterns to increase the likelihood of quitting successfully. As QuitEye can predict the smoking risk associated with any image, including images of locations not yet visited, it can help smokers preemptively avoid visiting prosmoking environments. During a quit attempt, for instance, a smoker might choose to visit a restaurant that has a lower smoking risk, as determined based on images available on the internet.
Although images improved the prediction of smoking substantially, they did not improve the prediction of craving. To the contrary, our best-performing craving prediction models do not incorporate image features, and out-of-sample prediction performance for craving was poor. Our laboratory research suggests that habitual smoking environments do provoke craving, but this study’s results do not provide additional support for this finding. Consequently, the role of environmental factors in the emergence of craving, or in the progression from craving to smoking itself, remains unclear. These conflicting findings may be partly owing to our EMA procedure. At the time of smoking, participants were asked to report their craving
As shown in
Although EMA provides more accurate smoking tracking than other self-report methods [
In addition to smoking-initiated prompts, participants completed six system-initiated prompts per day at randomly selected times. More frequent prompts would have provided a more comprehensive sample of participants’ daily environments but might also have reduced EMA adherence. Camera design and image quality varied among participants, who used their own smartphones to take pictures. Variability in image quality could be reduced by acquiring images using wearable cameras or smart glasses. This approach would also allow images to be captured throughout the day, providing more complete information about participants’ daily environments.
This study did not include a quit attempt. The results showed that QuitEye predicts smoking risk effectively outside of a quit attempt, but its ability to predict lapses and relapse after quitting is unknown. Other (nonimage) data streams from mobile devices have been used to predict lapse risk [
Owing to privacy concerns, participants were asked to avoid taking pictures of other people. However, this restriction may have prevented us from identifying interpersonal triggers and other important social determinants of smoking. The ability to recognize these and other dynamic environmental features is an important advantage of our approach compared with other sources of environmental information, such as GPS. In future work, we hope to explore the use of computer vision to identify the social determinants of smoking.
Now that QuitEye has been implemented as a mobile app, we can prospectively evaluate the real-time prediction of environment-associated risk and develop
Images of daily environments can be used to predict smoking risk effectively. Our risk prediction system, QuitEye, also predicts craving, whether smoking is permitted, and whether the participant is outside, providing important contextual information that could inform JITAIs for smoking cessation. Performance can be further improved through personalization, achieved by (1) fine-tuning QuitEye with images of a given smoker’s daily environment or (2) asking participants to provide information about their habitual smoking environments. QuitEye has been optimized for mobile devices and implemented as a mobile app, allowing environment-associated smoking risk to be continuously assessed in a mobile device-based smoking cessation intervention.
AP: average precision
AUC: area under the receiver operating characteristic curve
EMA: ecological momentary assessment
JITAI: just-in-time adaptive intervention
photoEMA: photograph-augmented ecological momentary assessment
Funding support for this study was provided by National Institute on Drug Abuse R21DA047131 (FJM) and K23DA042898 (JAO). The authors would like to thank Christianne Carson, Anthony DeVito, and Patricia Sabo for their assistance with data collection and processing, Abhishek Jadhav for assistance with preliminary analyses, and Cynthia Conklin for conceptual support and feedback.
MME, JAO, and FJM developed the concept. MME, JAO, RK, and FJM designed the study. RK and FJM oversaw the study. MME, JAO, JD'A, and FJM designed the analyses. MME and JD'A conducted the analyses. JD'A developed the mobile app. MME drafted the manuscript. All authors revised the manuscript.
MME, JAO, and FJM declare that they are authors of a US patent application related to this work and have no other competing interests. JD'A and RK declare that they have no competing interests.