This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Although high-quality health information is available online, much recent work has highlighted significant issues with health content on social media platforms (eg, fake news and misinformation), the consequences of which are severe in health care. One solution is to investigate methods that encourage users to post high-quality content.
Incentives have been shown to work in many domains, but until recently, there was no easy way to provide financial incentives on social media for users to generate high-quality content. This study investigates the following question: What effect does the provision of incentives have on the creation of social media health care content?
We analyzed 8328 health-related posts from an incentive-based platform (Steemit) and 1682 health-related posts from a traditional platform (Reddit). Using topic modeling and sentiment analysis–based methods in machine learning, we analyzed these posts across the following 3 dimensions: (1) emotion and language style using the IBM Watson Tone Analyzer service, (2) topic similarity and difference from contrastive topic modeling, and (3) the extent to which posts resemble clickbait. We also conducted a survey using 276 Amazon Mechanical Turk (MTurk) users and asked them to score the quality of Steemit and Reddit posts.
Using the Watson Tone Analyzer in a sample of 2000 posts from Steemit and Reddit, we found that more than twice as many Steemit posts as Reddit posts had a confident language style (77 vs 30). Moreover, 44% more Steemit posts had analytical content and 34% fewer Steemit posts had a tentative language style compared with Reddit posts (619 vs 430 and 416 vs 627, respectively). Furthermore, more than twice as many Steemit posts as Reddit posts were considered joyful (435 vs 200), whereas negative posts (eg, sadness, fear, and anger) were 33% fewer on Steemit than on Reddit (384 vs 569). Contrastive topic discovery showed that only 20% (2/10) of topics were common, and Steemit had more unique topics than Reddit (5 vs 3). Qualitatively, Steemit topics were more informational, while Reddit topics involved discussions, which may explain some of the quantitative differences. Manual labeling marked more Steemit headlines as clickbait than Reddit headlines (66 vs 26), and machine learning model labeling consistently identified a higher percentage of Steemit headlines as clickbait than Reddit headlines. In the survey, MTurk users rated at least 57% of Steemit posts as good quality (vs 25% of Reddit posts), and they reported being extremely likely to like or comment on at least 52% of Steemit posts (vs 30% of Reddit posts).
It is becoming increasingly important to ensure high-quality health content on social media; therefore, incentive-based social media could be important in the design of next-generation social platforms for health information.
Seeking online health information, also called “interactive health communication” [
As of July 2022, the global social media user base reached 59% of the world’s total population [
While social media could provide high-quality health information [
One of the key questions behind this broader effort is how the implemented incentive mechanism affects the kind of content generated on these platforms. While an increasing body of work in the literature [
While there are no direct comparisons to the work performed in this study, there is growing interest in examining broader issues related to content quality in social media. Social media users have various backgrounds, motivations, opinions, and experience levels. As a result, the quality of user-generated content (eg, posts) on social media varies greatly [
A recent study suggested the use of content labeling in social media to address issues such as misinformation and misleading content, which may affect anything from voting to personal health; however, those who seek to spread misinformation continually look for new tactics, methods, and formats to pursue their goals [
In this study, we identified the following 3 dimensions specific to social media that can be used for such a comparison: (1) contrastive topics, (2) emotion and language style, and (3) whether the content is “clickbait.” Among these, the idea of contrastive topics [
The main objective of this study is to understand whether there is any difference in health-related content across social media platforms with and without monetary incentives. For the traditional (no incentive) platform, we used Reddit, and for the incentive-driven platform, we used the blockchain social media platform Steemit. Although Reddit and Steemit share a similar basic structure (Steemit was originally modeled on Reddit), we expect to see some differences in content on these 2 platforms, in part because of the incentive mechanism in place. Past research [
We introduce the data sets that we used in our work. SteemOps is a data set [
The main subdataset we used in this paper is the social-network operation data set, consisting of 3 operational keys: comment, vote, and custom-json. The comment operation consists of 5 fields (
The Steemit platform offers an interactive application programming interface (API) for researchers to parse the data. However, because of API restrictions, retrieving the full data set would have taken approximately 38 days, so we retrieved a random 10% sample of this data set (approximately 1.7 million new posts) for further analysis. From this sample, we retained only posts written in English (1,076,287 posts) for easier and more consistent comparison and analysis.
Reddit also provides an API for accessing its data. Moreover, the Reddit API facilitates finding health-related content by providing access to health subreddits. We retrieved content from specific health subreddits, subject to a limit on the amount of data returned in each retrieval loop. After repeated retrieval rounds for each subreddit, we had collected 10,096 Reddit posts in total. However, because of the API restriction, many of these posts were duplicates. After removing duplicate posts and filtering for English content, we had 1682 health-related posts from the Reddit platform for analysis. The following sections explain how we identified health-related posts on Steemit and Reddit.
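The duplicate-removal and language-filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: each retrieved post is assumed to be a dict with `id`, `title`, and `selftext` keys, and the `is_probably_english` ASCII-ratio heuristic is a hypothetical stand-in for a real language detector.

```python
from typing import Iterable


def is_probably_english(text: str, min_ascii_ratio: float = 0.9) -> bool:
    """Hypothetical stand-in for a language detector (not the study's filter):
    treat text as English if most of its letters are ASCII."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= min_ascii_ratio


def deduplicate_posts(batches: Iterable[list[dict]]) -> list[dict]:
    """Merge repeated retrieval rounds, keeping the first copy of each post id."""
    seen: set[str] = set()
    unique: list[dict] = []
    for batch in batches:
        for post in batch:
            if post["id"] in seen:
                continue  # repeated content returned by an overlapping request
            seen.add(post["id"])
            unique.append(post)
    return unique


def clean_reddit_sample(batches: Iterable[list[dict]]) -> list[dict]:
    """Deduplicate, then keep posts whose title and body look English."""
    return [
        post
        for post in deduplicate_posts(batches)
        if is_probably_english(post["title"] + " " + post.get("selftext", ""))
    ]


# Invented retrieval rounds: "a1" is returned twice; "b2" has a non-Latin title.
rounds = [
    [{"id": "a1", "title": "Knee pain after running", "selftext": ""}],
    [{"id": "a1", "title": "Knee pain after running", "selftext": ""},
     {"id": "b2", "title": "Болит колено", "selftext": ""}],
]
print([p["id"] for p in clean_reddit_sample(rounds)])  # ['a1']
```

Keying on the post id rather than on the text avoids discarding legitimately similar posts from different authors; if ids were unavailable, a hash of the normalized title and body would be a reasonable alternative.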
Schema of the comment operation [
Field name | Description |
block_no | The block recording this operation. |
parent_author | The author that the comment is being submitted to. |
parent_permlink | The specific post that the comment is being submitted to. |
author | The author of the post/comment being submitted (account name). |
permlink | The unique string identifier for the post, which is linked to the author of the post. |
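For readers working with the data programmatically, the comment operation in the table above can be represented as a typed record. This is a sketch: the field names come from the table, but the Python types, the illustrative values, and the `is_top_level_post` helper are our assumptions, based on the Steem convention that a new top-level post has an empty `parent_author` and carries its category (first tag) in `parent_permlink`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommentOperation:
    """One SteemOps comment operation (fields follow the schema table)."""

    block_no: int          # block recording this operation
    parent_author: str     # author the comment is submitted to ("" for a new post)
    parent_permlink: str   # post the comment is submitted to (category for a new post)
    author: str            # account name of the submitting author
    permlink: str          # unique identifier of this post, tied to its author

    def is_top_level_post(self) -> bool:
        # Assumed convention: new posts have no parent author.
        return self.parent_author == ""


# Illustrative values only.
op = CommentOperation(
    block_no=20_000_000,
    parent_author="",
    parent_permlink="health",
    author="alice",
    permlink="turmeric-mask-benefits",
)
print(op.is_top_level_post(), op.parent_permlink)  # True health
```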
Final data set process on the Steemit platform. API: application programming interface; SOD: social-network operation data set.
Identifying keywords that capture health-related content in social media posts is challenging. Many social media users who write posts in the health category are likely not physicians, and they may therefore use incorrect terminology (making formal keywords alone insufficient). Conversely, some users may use health-related words in posts that are not intended for the health category. To address this issue, we decided to use the “parent permlink” or “category” of posts, which corresponds to the first tag each author chooses for the post. However, if the first tag is among the Steemit popular tags (a list of popular tags has been provided by Steemit), it remains the same; otherwise, the Steemit platform puts different words as the “parent permlink” or “category” [
Although choosing appropriate tags matters to authors because they are rewarded for doing it correctly, many posts are placed in inappropriate categories. To address this problem and determine which categories are most relevant to health, we counted the number of English posts in each “category” and selected the categories that were potentially relevant to health and contained more than 100 posts. The second column in
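The category screening described above amounts to a frequency count followed by two filters. A minimal sketch, assuming the category of each English post is already available as a string; the post counts are invented, and the `health_related` set stands in for what was a manual relevance judgment in the study.

```python
from collections import Counter


def candidate_categories(
    categories: list[str],
    health_related: set[str],
    min_posts: int = 100,
) -> dict[str, int]:
    """Count posts per category; keep health-related categories with more than min_posts posts."""
    counts = Counter(categories)
    return {
        category: n
        for category, n in counts.most_common()  # most frequent first
        if category in health_related and n > min_posts
    }


# Hypothetical input: the category (first tag) of each English post.
categories = ["health"] * 150 + ["fitness"] * 120 + ["cancer"] * 80 + ["travel"] * 500
# Hypothetical result of the manual "may be relevant to health" judgment.
health_related = {"health", "fitness", "cancer"}

print(candidate_categories(categories, health_related))  # {'health': 150, 'fitness': 120}
```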
As we can see in
Unlike Steemit, Reddit does not have an incentive system that encourages writers to include a category when they post. However, there is another criterion that functions similar to the Steemit category. A subreddit [
Steemit potential categories with the number of English posts (column 2) and the findings for the Steemit sample set (columns 3-6).
Category | Number of English posts (N=10,239) | Number of sample posts in the category (N=700) | Number of irrelevant posts (N=243) | Number of relevant posts (N=457) | Match percentage |
Healtha | 7078 | 407 | 73 | 334 | 82.06 |
Fitness | 592 | 64 | 40 | 24 | 37.50 |
Fruit | 129 | 12 | 7 | 5 | 41.67 |
Health carea | 145 | 7 | 0 | 7 | 100.00 |
Yogaa | 175 | 19 | 6 | 13 | 68.42 |
Medicinea | 133 | 7 | 2 | 5 | 71.43 |
Meditationa | 135 | 12 | 1 | 11 | 91.67 |
Cancera | 114 | 6 | 0 | 6 | 100.00 |
Healthya | 169 | 11 | 0 | 11 | 100.00 |
Lifestyle | 410 | 49 | 45 | 4 | 8.16 |
Beauty | 264 | 33 | 26 | 7 | 21.21 |
Tips | 110 | 9 | 8 | 1 | 11.11 |
Drugsa | 122 | 14 | 7 | 7 | 50.00 |
Dieta | 115 | 10 | 0 | 10 | 100.00 |
Medicala | 142 | 8 | 3 | 5 | 62.50 |
Energy | 114 | 7 | 6 | 1 | 14.29 |
Vegan | 292 | 25 | 19 | 6 | 24.00 |
aRelevant category based on the match percentage. Overall, the relevant categories had 8328 English posts, 501 sample posts in the category, 92 irrelevant posts, 409 relevant posts, and a match percentage of 81.64%.
Steemit sample set summary.
Variable | Average number of relevant posts | Average number of posts | Average match percentage | Total population, n | Match post estimation |
All categories | 91.4 | 140.0 | 65.29 | 10,239 | 6685 |
Relevant categories | 81.8 | 100.2 | 81.64 | 8328 | 6798 |
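The match percentages and match post estimations in the two tables above follow from simple ratios: a category's match percentage is its relevant sample posts divided by its sample posts, and the estimated number of matching posts is the total population multiplied by the pooled match rate. The sketch below recomputes these figures from the table counts. The 50% cutoff for marking a category as relevant is our assumption; the text does not state a threshold, but this cutoff reproduces exactly the categories marked with footnote a.

```python
# category: (English posts, sample posts, relevant sample posts), from the table above
CATEGORY_COUNTS = {
    "Health": (7078, 407, 334),
    "Fitness": (592, 64, 24),
    "Fruit": (129, 12, 5),
    "Health care": (145, 7, 7),
    "Yoga": (175, 19, 13),
    "Medicine": (133, 7, 5),
    "Meditation": (135, 12, 11),
    "Cancer": (114, 6, 6),
    "Healthy": (169, 11, 11),
    "Lifestyle": (410, 49, 4),
    "Beauty": (264, 33, 7),
    "Tips": (110, 9, 1),
    "Drugs": (122, 14, 7),
    "Diet": (115, 10, 10),
    "Medical": (142, 8, 5),
    "Energy": (114, 7, 1),
    "Vegan": (292, 25, 6),
}


def match_percentage(relevant: int, sampled: int) -> float:
    return 100.0 * relevant / sampled


def summarize(categories: list[str]) -> tuple[int, float, float]:
    """Return (population, pooled match %, estimated matching posts) for a set of categories."""
    population = sum(CATEGORY_COUNTS[c][0] for c in categories)
    sampled = sum(CATEGORY_COUNTS[c][1] for c in categories)
    relevant = sum(CATEGORY_COUNTS[c][2] for c in categories)
    pct = match_percentage(relevant, sampled)
    return population, pct, population * pct / 100.0


# Assumed cutoff: >= 50% match reproduces the categories marked "a" in the table.
RELEVANT = [c for c, (_, s, r) in CATEGORY_COUNTS.items() if match_percentage(r, s) >= 50.0]

for label, cats in [("All categories", list(CATEGORY_COUNTS)), ("Relevant categories", RELEVANT)]:
    pop, pct, est = summarize(cats)
    print(f"{label}: n={pop}, match={pct:.2f}%, estimate={est:.1f}")
```

The estimates come out to 6684.6 and 6798.7 posts, which agree with the table's 6685 and 6798 to within one post; the table's rounding convention is not stated.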
Methodology. API: application programming interface.
Language is the means through which thoughts are expressed, and it lies at the heart of human cognition and our ability to comprehend the world around us or, at the very least, to change and exchange that comprehension. The computational study of these comprehensions, feelings, emotions, evaluations, and attitudes regarding entities such as products, services, organizations, persons, issues, events, themes, and their characteristics is known as sentiment analysis [
A tone analyzer service, such as the IBM Watson Tone Analyzer, uses text analysis to detect anger, sadness, fear, and joy as emotions and analytical, confident, and tentative tones as language styles in user inputs [
Social media text analysis employs a broad range of approaches or algorithms to process language, one of which is topic analysis, which is used to automatically discover a group of words (ie, a topic) from text. The literature investigates 2 types of topic analysis approaches. The first is topic modeling, which uses unsupervised models to find hidden topics in document collections, such as latent Dirichlet allocation [
The term clickbait refers to using alluring headlines that employ writing formulas and linguistic methods to “bait” readers into clicking items [
Crowdsourcing is the practice of collecting opinions or information from a large group of people (a “crowd”). Amazon Mechanical Turk (MTurk) is a well-known crowdsourcing platform that has emerged in the last decade [
All the study data (the secondary data set from Steemit as well as the data from the user survey) were anonymized. The study was conducted under protocols approved by the University of South Florida Institutional Review Board (STUDY003306: “Investing the drivers of currency in blockchain social platforms”) under HRP-502b(7) Social Behavioral Survey Consent. The approval covered the use of the publicly available anonymized secondary data set of Steemit posts as well as the survey of users to evaluate the quality of both Reddit and Steemit posts. No individual-specific data were gathered even in the survey; the only information gathered was about the subjects’ opinions of the content of social media posts shown to them in the survey. The consent form was provided in a downloadable format to participants at the beginning of the survey, and they were allowed to withdraw at any moment. The participants in this survey received a US $1 reward, and participation was fully anonymous.
In our analysis, we randomly selected 2000 posts, 1000 from each platform (Reddit and Steemit). Social media posts are not clean text; they contain misspellings, URLs, emojis, and so on. We first cleaned the text using the Python NLTK library to remove stop words, URLs, and any non-English words. We then applied stemming and lemmatization to standardize word forms. Each cleaned post was submitted to the Watson API, and the returned document-level tones were stored.
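The cleaning step can be approximated as below. This is a standard-library sketch rather than the NLTK pipeline used in the study: the short stop-word list is illustrative (NLTK's English stop-word corpus is far longer), “non-English words” are crudely approximated by dropping tokens that contain non-ASCII characters, and stemming and lemmatization are omitted (NLTK's `PorterStemmer` and `WordNetLemmatizer` would fill that role).

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Illustrative subset only; a real pipeline would use a full stop-word list.
STOP_WORDS = {
    "a", "an", "and", "are", "for", "i", "in", "is", "it", "my",
    "of", "the", "this", "to", "was", "with",
}


def clean_post(text: str) -> list[str]:
    """Lowercase, drop URLs, non-ASCII tokens (emojis, non-Latin words), punctuation, and stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    tokens = []
    for raw in text.split():
        if not raw.isascii():
            continue  # emojis and words in non-Latin scripts
        word = re.sub(r"[^a-z]", "", raw)  # strip punctuation and digits
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


print(clean_post("Meditation changed my life!! \U0001F64F Read more: https://example.com/post"))
# ['meditation', 'changed', 'life', 'read', 'more']
```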
From a language style perspective, as
Watson Tone Analyzer results for emotion aspects.
Emotion aspects | Steemit posts (N=1000), n | Reddit posts (N=1000), n |
Joy | 435 | 200 |
Sadness | 276 | 422 |
Fear | 105 | 125 |
Anger | 3 | 22 |
Watson Tone Analyzer results for language style aspects.
Language style aspects | Steemit posts (N=1000), n | Reddit posts (N=1000), n |
Confident | 77 | 30 |
Analytical | 619 | 430 |
Tentative | 416 | 627 |
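Because the service can assign several tones to a single post (or none), the counts in the two tables above need not sum to 1000 per platform; for example, the Steemit language style counts total 1112. The sketch below tallies document-level tones across stored results and computes the relative differences quoted in the text. It assumes each stored result follows the JSON shape of the Tone Analyzer v3 general-purpose endpoint (a `document_tone` object holding a `tones` list whose entries carry `tone_id` and `score`); the sample responses are invented.

```python
from collections import Counter

EMOTIONS = ("joy", "sadness", "fear", "anger")
LANGUAGE_STYLES = ("confident", "analytical", "tentative")


def tally_tones(responses: list[dict]) -> Counter:
    """Count how many posts received each document-level tone (a post may get several)."""
    counts: Counter = Counter()
    for response in responses:
        tone_ids = {tone["tone_id"] for tone in response["document_tone"]["tones"]}
        counts.update(tone_ids)
    return counts


# Invented responses in the assumed v3 shape, for illustration only.
sample_responses = [
    {"document_tone": {"tones": [
        {"score": 0.81, "tone_id": "joy", "tone_name": "Joy"},
        {"score": 0.72, "tone_id": "confident", "tone_name": "Confident"},
    ]}},
    {"document_tone": {"tones": [
        {"score": 0.63, "tone_id": "sadness", "tone_name": "Sadness"},
        {"score": 0.58, "tone_id": "analytical", "tone_name": "Analytical"},
        {"score": 0.91, "tone_id": "tentative", "tone_name": "Tentative"},
    ]}},
    {"document_tone": {"tones": []}},  # no tone returned for this post
]
print(tally_tones(sample_responses))

# Counts reported in the two tables above (1000 posts per platform).
steemit = {"joy": 435, "sadness": 276, "fear": 105, "anger": 3,
           "confident": 77, "analytical": 619, "tentative": 416}
reddit = {"joy": 200, "sadness": 422, "fear": 125, "anger": 22,
          "confident": 30, "analytical": 430, "tentative": 627}

for tone in EMOTIONS + LANGUAGE_STYLES:
    change = 100 * (steemit[tone] / reddit[tone] - 1)
    print(f"{tone:>10}: Steemit {steemit[tone]:>3} vs Reddit {reddit[tone]:>3} ({change:+.0f}%)")
```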
Emotion and language style samples.
Platform and post | Emotion type | Language style | |
|||
I want to share this message along with my greetings and wishes for everyone to you guys. I wish universe, god bless u with peace, love, happiness and wealth. Meditation has changed my life, rewired my brain, I’m happier, loved, fulfilled than ever. I hope every single being who receive this positive frequency, to have a beautiful and fulfilling life, full of love to his/ her existence and to all living beings that share that beautiful universe with us. Peace and love. Namaste. | Joy | Confident |
I tried meditation January of this year to lessen my anxiety. I have been constantly meditating since then. But my head is still noisy and I still get pretty anxious. Yesterday there was a lot going on with work and I fell into a deep hole. I was shaking, my chest was tight, my head was aching and rushing with thoughts. I was anxious the whole day. It made me ask myself, how come I am still like this? I was full of judgment. I felt like me meditating is just play pretend. Is meditation not working for me? | Sadness and fear | Analytical and tentative |
What can I take that is safeish that will turn my brain off for two days. I want to sleep and dream and not answer my demanding life. Yes, I ne ed a vacation but not at option at this moment. I need a break from thinking. I’m not suicidal in the slightest and I just need to shut down. Thank you. | Anger | Analytical and tentative |
|||
Thalassemia is a disease of anemia. About 8 to 10 thousand children are born in our country every year due to death of this disease. After a child comes to life after life, it is not seen in children with thalassemia. Dhaka is a life of depression. The dream of a mother with her child, the love of emotions disappears in the moment. It is possible to avoid such a tragic event if you are a little aware. Thalassemia treatment is extremely expensive. It has to continue the treatment throughout life. The permanent treatment...a | Sadness and fear | Analytical and tentative |
Turmeric has a strongly anti-inflammatory, anti-bacterial, anti-fungal action and contains antioxidants. It perfectly speeds up the healing and the exchange of the epidermis. It is also known as a remedy for discoloration and excessive pigmentation. How to take advantage of these amazing benefits of turmeric? In the form of a mask, of course :). Ladies in India have been doing this for ages! Making a turmeric mask is very easy - take two tablespoons of turmeric, mix with a bit of honey and buttermilk into...a | Joy | Confident and analytical |
A small disclaimer before I begin to rant: this post is from my perspective as I have seen and experimented in my country -Dominican Republic, also I have no intent to speak for every dominican ever, I'm not every dominican and also the flavor of health services I have mostly experimented - private - is different for what the majority uses -public- even though I know enough about public health in my homeland to rant enough about it too. With this covered up let me begin: My father is a very sick and fragile man so that means I've spent a lot of time in hospitals during...a | Sadness and anger | Tentative |
aThe text continues.
Applying ContraVis to the Steemit and Reddit document collections allowed us to discover hidden topics while also learning which topics were common to both collections and which were discriminative. We also generated labels, documents, topics, and word clouds (as in ContraVis), including the top 20 words for each topic.
This procedure began with compiling 1000 posts from each social media platform, followed by removing stop words, stemming, and tokenizing these 2000 documents. To create the word clouds, we maintained a vocabulary that mapped each unique term to an index, and we converted the assembled documents from words to integer indices as input to the ContraVis model. We set the number of topics in the ContraVis model to 10 since we gathered 10 health-related categories (health care, cancer, medication, etc) during data collection. The model generated coordinates for documents, topics, and labels, and it computed the probability of each term in each topic. We then sorted the word probabilities in descending order, used the indices to match terms in the vocabulary file, and visualized the word clouds. Furthermore, we have displayed the coordinates of documents, topics, and labels in
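The vocabulary indexing and word-ranking steps described above can be sketched in plain Python. This does not reproduce ContraVis itself; it assumes the model has already produced, for each topic, a list of word probabilities aligned with the vocabulary indices (the `topic_word` matrix here is an invented stand-in for that output). In the study, k was 20 and there were 10 topics.

```python
def build_vocabulary(documents: list[list[str]]) -> dict[str, int]:
    """Map each unique term to an integer index, in order of first appearance."""
    vocab: dict[str, int] = {}
    for doc in documents:
        for term in doc:
            vocab.setdefault(term, len(vocab))
    return vocab


def encode(documents: list[list[str]], vocab: dict[str, int]) -> list[list[int]]:
    """Convert each tokenized document from words to vocabulary indices."""
    return [[vocab[term] for term in doc] for doc in documents]


def top_words(topic_word: list[list[float]], vocab: dict[str, int], k: int = 20) -> list[list[str]]:
    """For each topic, rank word indices by probability (descending) and map them back to terms."""
    index_to_term = {index: term for term, index in vocab.items()}
    ranked_topics = []
    for probs in topic_word:
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        ranked_topics.append([index_to_term[i] for i in ranked[:k]])
    return ranked_topics


docs = [["diet", "protein", "sleep"], ["sleep", "anxiety", "diet"]]
vocab = build_vocabulary(docs)          # {'diet': 0, 'protein': 1, 'sleep': 2, 'anxiety': 3}
encoded = encode(docs, vocab)           # [[0, 1, 2], [2, 3, 0]]
# Invented topic-word probabilities (2 topics x 4 terms) standing in for model output.
topic_word = [[0.5, 0.3, 0.1, 0.1], [0.05, 0.05, 0.4, 0.5]]
print(top_words(topic_word, vocab, k=2))  # [['diet', 'protein'], ['anxiety', 'sleep']]
```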
As
According to
Thus, the content analysis of posts in this section also supported the conclusion from the previous section (Emotion and Language Style) that users post more informational content on Steemit, whereas Reddit posts are more personal in nature.
Contrastive visualization of Steemit and Reddit posts. The black clouds indicate the topics related to common topics across Steemit and Reddit, the turquoise clouds indicate topics in Steemit posts, and the orange clouds indicate topics in Reddit posts.
Number of posts associated with each topic label.
Topic label | Number of posts | Platform |
1 | 223 | Reddit |
2 | 154 | Steemit |
3 | 192 | Steemit |
4 | 37 | Steemit |
5 | 111 | Reddit |
6 | 262 | Steemit |
7 | 194 | Common |
8 | 515 | Reddit |
9 | 282 | Common |
10 | 30 | Steemit |
Steemit is a cryptocurrency-based social media platform where users earn Steem dollars for posting content that others value. In contrast, Reddit is primarily a traditional social media platform where users generally have no opportunity for personal economic gain from their posts. Thus, we investigated whether Steemit users write more clickbait posts, which can increase user engagement, than Reddit users do.
For detecting whether a post is clickbait, we used the following 2 approaches: (1) a manual approach, where clickbait content is identified by experts, and (2) a machine learning approach, where the manual approach is used for training a model and the model is then applied to a large number of posts.
For the machine learning model of clickbait detection, we referred to a previous report [
In a previous report [
In conclusion, according to both manual labeling and clickbait detection model outcomes, Steemit headlines appeared to be more clickbait-like than Reddit headlines. However, because large training data sets were unavailable, we could not determine the exact percentage of clickbait headlines on Reddit and Steemit. One plausible explanation is that the reward-based incentive mechanism in Steemit may have motivated Steemit users to create more clickbait post headlines than Reddit users. However, our analysis does not allow us to draw any causal relationship between incentive mechanisms in social media and the prevalence of clickbait post headlines.
Manual labeling results.
Platform | Manual clickbait label, n | Manual nonclickbait label, n | Total, n | Clickbait percentage |
Steemit | 66 | 234 | 300 | 22.00 |
Reddit | 26 | 274 | 300 | 8.67 |
Clickbait detection model.
Model and platform | Model clickbait detection, n | Total headlines, n | Clickbait percentage | Model accuracy on the test set |
Model 1 | | | | 96.45% |
Steemit | 2132 | 10,263 | 20.77 | |
Reddit | 101 | 576 | 17.53 | |
Model 2 | | | | ~100% |
Steemit | 1183 | 10,263 | 11.53 | |
Reddit | 58 | 576 | 10.07 | |
Model 3 | | | | 96.39% |
Steemit | 1100 | 10,263 | 10.72 | |
Reddit | 60 | 576 | 10.42 | |
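Each clickbait percentage in the tables above is the number of headlines labeled clickbait divided by the number of headlines examined. The short check below recomputes the percentages from the reported counts and confirms that every detection model, like the manual labels, gives Steemit the higher rate; it adds no new analysis.

```python
# Reported counts: (clickbait headlines, total headlines).
MANUAL = {"Steemit": (66, 300), "Reddit": (26, 300)}
# The three detection models, in the order they appear in the table.
MODELS = [
    {"Steemit": (2132, 10_263), "Reddit": (101, 576)},
    {"Steemit": (1183, 10_263), "Reddit": (58, 576)},
    {"Steemit": (1100, 10_263), "Reddit": (60, 576)},
]


def clickbait_percentage(clickbait: int, total: int) -> float:
    """Share of headlines labeled clickbait, as a percentage rounded to 2 decimals."""
    return round(100 * clickbait / total, 2)


print("manual:", {p: clickbait_percentage(*c) for p, c in MANUAL.items()})
for number, model in enumerate(MODELS, start=1):
    rates = {p: clickbait_percentage(*c) for p, c in model.items()}
    print(f"model {number}:", rates, "Steemit higher:", rates["Steemit"] > rates["Reddit"])
```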
Clickbait manual labeling samples.
Sample of clickbait headlines | Platform |
How do you behave when you enter a foreign body in the eye? | Steemit |
Early age hair fall cause | Steemit |
Artificial Intelligence Can Predict How Much Longer You Have Left To Live | Steemit |
Why do my eyes hurt during meditation | Reddit |
Do you actually need 3 meals a day? | Reddit |
Can a false positive urine drug test, in the end, reveal a false negative? | Reddit |
Nonclickbait manual labeling samples.
Sample of nonclickbait headlines | Platform |
Flax-food or medicine? | Steemit |
Exercise Best For Health | Steemit |
Activated Charcoal for Skin Care | Steemit |
Your diet/healthy eating peeps | Reddit |
Liver failure due to cancer | Reddit |
Nerve under my knee hurting? | Reddit |
We conducted an online survey using MTurk to assess information quality in Steemit and Reddit posts. Participants first read each post via a link that displayed it on a third-party website without Steemit or Reddit logos, to prevent possible bias in their answers, and then answered 5 questions (a mix of multiple-choice and text-entry types). This procedure was repeated 5 times for each participant, so by the end of the survey, each participant had read 5 different posts and answered the same 5 questions for each. Moreover, to score each post based on multiple responses, we assigned each post to 3 different participants. In this study, we recruited 276 MTurk workers, assigned each of them 5 posts drawn from 460 randomly selected Steemit and Reddit posts (230 posts from Steemit and 230 posts from Reddit), and then asked the following questions after each post:
Compared to typical posts you see on social media, how good is this post in terms of content quality?
If you see this post in your feeds, how likely would you be to like or comment on this post?
If reading this post requires a subscription, would you pay money to subscribe?
Please copy and paste the most important sentence in the post.
Why do you think this sentence is the most important one?
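The survey design has to satisfy two counts at once: 276 participants × 5 posts = 1380 assignments, which is exactly 460 posts × 3 ratings. One simple scheme that meets both, sketched below, shuffles the posts, repeats the shuffled list 3 times, and deals consecutive blocks of 5 to each participant. This scheme, the `assign_posts` function, and the post identifiers are hypothetical; the study does not describe how posts were allocated.

```python
import random
from collections import Counter


def assign_posts(post_ids: list[str], n_participants: int = 276,
                 posts_each: int = 5, ratings_per_post: int = 3,
                 seed: int = 0) -> list[list[str]]:
    """Deal posts to participants so that each post is rated ratings_per_post times."""
    if n_participants * posts_each != len(post_ids) * ratings_per_post:
        raise ValueError("participant slots must equal posts x ratings per post")
    order = list(post_ids)
    random.Random(seed).shuffle(order)
    # Slicing consecutive blocks from the repeated list keeps each block's posts
    # distinct whenever posts_each <= len(post_ids).
    sequence = order * ratings_per_post
    return [sequence[i * posts_each:(i + 1) * posts_each] for i in range(n_participants)]


posts = [f"steemit-{i}" for i in range(230)] + [f"reddit-{i}" for i in range(230)]
plan = assign_posts(posts)
ratings = Counter(post for block in plan for post in block)
print(len(plan), len(ratings), set(ratings.values()))  # 276 460 {3}
```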
The first 3 questions in the survey measured content quality and were multiple choice. The response options were “Good,” “Average,” and “Poor” for the first question; “Extremely likely,” “Neutral,” and “Not likely at all” for the second question; and “Yes” and “No” for the third question. The remaining questions involved text entry; the purpose of the last 2 questions was to make sure that participants read the assigned posts carefully. In analyzing the results, we assigned a weight to each option according to its importance (3 to “Good,” 2 to “Average,” and 1 to “Poor” for the first question; and 3 to “Extremely likely,” 2 to “Neutral,” and 1 to “Not likely at all” for the second question). For the final score, we took the option with the maximum score for each post, and in case of a tie, we chose the worst option (eg, when a post equally scored “Poor” and “Average,” we chose “Poor”).
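One reading of this scoring rule is sketched below. We interpret “maximum score” as the option selected by the most raters (each post had 3), with ties resolved toward the lowest-weighted option, which is consistent with the tie example above; this interpretation and the `final_label` function are our assumptions.

```python
from collections import Counter

QUALITY_WEIGHTS = {"Poor": 1, "Average": 2, "Good": 3}
LIKELIHOOD_WEIGHTS = {"Not likely at all": 1, "Neutral": 2, "Extremely likely": 3}


def final_label(responses: list[str], weights: dict[str, int]) -> str:
    """Pick the option chosen most often; break ties toward the lowest-weighted (worst) option."""
    counts = Counter(responses)
    top = max(counts.values())
    tied = [option for option, n in counts.items() if n == top]
    return min(tied, key=lambda option: weights[option])


print(final_label(["Good", "Good", "Average"], QUALITY_WEIGHTS))     # Good
print(final_label(["Poor", "Average", "Good"], QUALITY_WEIGHTS))     # Poor (3-way tie)
print(final_label(["Neutral", "Extremely likely", "Extremely likely"], LIKELIHOOD_WEIGHTS))  # Extremely likely
```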
We next assessed statistically whether the difference in the number of people who rated “Poor” (for example) in Steemit versus Reddit was significant. We did this for all classifications by MTurk users and tested whether the number of people who picked a certain value (eg, poor, extremely likely to comment, etc) was statistically different across a sample of Reddit and Steemit posts. For each classification, we tested the null hypothesis (
To summarize, regarding the content quality question, the number of people who picked the “Poor” or “Average” option was significantly higher for Reddit posts (meanpoor 0.996, SDpoor 1.055; meanaverage 1.400, SDaverage 0.987) than for Steemit posts (meanpoor 0.400, SDpoor 0.721; meanaverage 1.165, SDaverage 0.948;
Number of posts classified in each option based on the first question results.
Content quality question | Steemit posts (N=230), n | Reddit posts (N=230), n |
Poor | 17 | 67 |
Average | 81 | 106 |
Good | 132 | 57 |
Number of posts classified in each option based on the second question results.
Likelihood to comment on or like posts question | Steemit posts (N=230), n | Reddit posts (N=230), n |
Not likely at all | 55 | 95 |
Neutral | 56 | 65 |
Extremely likely | 119 | 70 |
Subscription probability for Steemit vs Reddit posts.
Subscription probability | Steemit (N=230), n (%) | Reddit (N=230), n (%) |
No | 119 (51.7) | 183 (79.6) |
Yes | 111 (48.3) | 47 (20.4) |
Independent samples t test results.
Question and options | t (df) | Alternative | P value | 95% CI | Cohen d | Power |
Content quality | | | | | | |
Poor | −7.067 (458) | 2-sided | <.001 | −0.76 to −0.43 | 0.659 | 1 |
Average | −2.602 (458) | 2-sided | .01 | −0.41 to −0.06 | 0.243 | 0.738 |
Good | 9.196 (458) | 2-sided | <.001 | 0.65 to 1.01 | 0.858 | 1 |
Likelihood to comment on or like posts | | | | | | |
Not likely at all | −5.178 (458) | 2-sided | <.001 | −0.72 to −0.32 | 0.483 | 0.999 |
Neutral | −0.150 (458) | 2-sided | .88 | −0.18 to 0.16 | 0.014 | 0.052 |
Extremely likely | 5.851 (458) | 2-sided | <.001 | 0.36 to 0.71 | 0.546 | 1 |
Subscription probability | | | | | | |
No | −6.558 (458) | 2-sided | <.001 | −0.84 to −0.45 | 0.612 | 1 |
Yes | 6.667 (458) | 2-sided | <.001 | 0.46 to 0.84 | 0.622 | 1 |
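The t statistics and effect sizes above can be checked against the means and SDs reported earlier for the content quality question (the number of raters per post choosing an option, with 230 posts per platform); means and SDs for the other options are not reported, so only “Poor” and “Average” can be checked this way. The sketch below uses a pooled-variance (Student) independent samples t test, consistent with the 458 df, and Cohen d with the pooled SD. As a simplification, the 95% CI and P value use the normal distribution, a close approximation to the t distribution at 458 df. For “Poor,” it gives t ≈ −7.07 and d ≈ 0.66, matching the reported −7.067 and 0.659 up to rounding of the inputs.

```python
import math
from statistics import NormalDist


def pooled_t_test(mean1: float, sd1: float, n1: int,
                  mean2: float, sd2: float, n2: int) -> dict:
    """Student (pooled-variance) independent samples t test from summary statistics.

    The difference is group 1 minus group 2. The CI and P value use the normal
    distribution, a close approximation to the t distribution at large df.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    t = diff / se
    z = NormalDist().inv_cdf(0.975)
    return {
        "t": t,
        "df": df,
        "ci": (diff - z * se, diff + z * se),
        "p_approx": 2 * (1 - NormalDist().cdf(abs(t))),
        "cohen_d": abs(diff) / pooled_sd,
    }


# Number of raters per post choosing each option: Steemit (group 1) vs Reddit (group 2).
poor = pooled_t_test(0.400, 0.721, 230, 0.996, 1.055, 230)
average = pooled_t_test(1.165, 0.948, 230, 1.400, 0.987, 230)
for name, res in [("Poor", poor), ("Average", average)]:
    low, high = res["ci"]
    print(f"{name}: t({res['df']}) = {res['t']:.3f}, 95% CI {low:.2f} to {high:.2f}, "
          f"d = {res['cohen_d']:.3f}, P ~ {res['p_approx']:.3f}")
```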
Integrating the findings across all the results presented above, we found that health-related content on incentive-based social media platforms seemed more informational and less discussion oriented or personal. Moreover, incentive-based platforms appear to encourage their content providers to post higher-quality content, but with more attention-grabbing headlines.
Summary of the findings.
Dimension | Main result | Conclusion |
Topic modeling | Only 20% of all topics were common. Steemit topics were more informational. | Steemit users post more informational content, whereas Reddit posts are more personal in nature. |
Emotion and language style | Emotion: Steemit - joyful content; Reddit - sad, fearful, and angry content. Language style: Steemit - confident and analytical content; Reddit - tentative content. | Because posts are more informative on Steemit, the language styles and emotions are more positive. |
Clickbait | Steemit headlines were more likely to be clickbait than Reddit headlines. | The reward-based incentive mechanism may have motivated users to create more clickbait headlines. |
Content quality | Steemit posts had better quality than Reddit posts. Users were more likely to like, comment on, or subscribe to Steemit posts than Reddit posts. | Posts from the incentive-driven platform were likely to be seen as having higher quality. |
The main objective of this study was to understand differences in health-related social media content across platforms with and without monetary incentives. Our methodology, as noted above, combined machine learning techniques (topic modeling and sentiment analyses) with human survey results and examined differences across emotion and language style, topic similarity and difference, whether the post was clickbait, and content quality as assessed subjectively by users.
The IBM Watson Tone Analyzer API highlighted important differences in both language style and emotion between the Steemit and Reddit social media platforms. In terms of language style, the Watson Tone Analyzer service identified posts as confident, analytical, or tentative (or a combination, if relevant). Using a sample of 2000 posts from Steemit and Reddit, we found that more than twice as many Steemit posts as Reddit posts had a confident language style (specifically, 77 posts from Steemit and 30 from Reddit were scored as “confident”). Steemit also scored higher for analytical content: 44% more Steemit posts than Reddit posts were identified as having analytical content (specifically, 619 posts from Steemit and 430 from Reddit were scored as “analytical”). On the other hand, 34% fewer Steemit posts had a tentative language style (specifically, 416 posts from Steemit and 627 from Reddit were scored as “tentative”). In terms of emotion, the Watson Tone Analyzer service labeled posts as joy, sadness, fear, or anger (or a combination, if relevant). In the same sample of 2000 posts, more than twice as many Steemit posts as Reddit posts were scored as having a joyful emotion (specifically, 435 posts from Steemit and 200 from Reddit were scored as “joy”). For the other 3 emotions, Reddit had more posts than Steemit. Specifically, 53% more Reddit posts than Steemit posts were scored as sadness (422 vs 276), and 19% more were scored as fear (125 vs 105). Furthermore, 22 Reddit posts were scored as anger, compared with only 3 Steemit posts.
Our analysis of similar and different topics using the contrastive topic modeling platform ContraVis showed important differences as well. The use of ContraVis on 1000 randomly selected posts each from the 2 different platforms showed that only 20% of all topics were common (2 common topics out of 10). In particular, topics like “food and nutrition” and “exercise and mental health” were common on both platforms. Steemit had more unique topics than Reddit (5 vs 3), and those were more informational in nature rather than discussion oriented, as was the case for Reddit posts.
All the findings together suggest that posts from the incentive-driven platform were more likely to be informational and optimistic in nature, while posts from the traditional social media platform were likely about individual experiences and the discussions such experiences generate on social media.
When we analyzed these data from a “clickbait” perspective, we found that overall more Steemit posts were likely to be categorized as clickbait compared with Reddit posts, suggesting that incentive-driven platforms may encourage authors to compose content that will seem attractive to users. According to the clickbait findings, manual labeling marked more Steemit headlines as clickbait than Reddit headlines (66 vs 26), and a machine learning model that was trained to detect clickbait also labeled a higher percentage of Steemit headlines as clickbait than Reddit headlines.
Finally, in the user survey, MTurk users rated at least 57% of Steemit posts as good quality (vs 25% of Reddit posts), and they reported being extremely likely to like or comment on at least 52% of Steemit posts (vs 30% of Reddit posts). These findings suggest that posts from the incentive-driven platform were likely to be seen as being of higher quality, which is also an important observation.
As incentive-based social media ideas gradually enter the mainstream, it becomes increasingly critical to study how incentive systems built into these platforms influence the type of material created on social media platforms. Could these systems aid in the generation of higher-quality data? As we have seen globally, social media plays a massive part in people’s lives, but it continues to pose numerous information quality issues, not the least of which is the growing worry about fake news in the context of health (eg, vaccination-related content [
While the incentive-based Steemit platform is new, there is growing interest in understanding it better. Early work, for instance, has studied the Steemit platform from the perspectives of decentralization, reward mechanisms, and user behavior. In previous work [
In particular, we found evidence that the incentive-based mechanism may be leading social media users to provide more informational content, which may also be more diverse and carry carefully constructed titles to help generate engagement. In some ways, this partly resembles how mainstream news media have evolved, as the shift to digital platforms forced many outlets to present content in a manner that engages users. Unlike some mainstream media, however, the articles themselves appeared to be more informational, perhaps guided by user expectations that such content is more likely to generate votes from the community, leading to the potential of greater cryptocurrency rewards. We did not assess causality explicitly in this study; therefore, we offer this as a possible explanation, not yet an established empirical finding. Interestingly, we found the tone of messages on Steemit to be quite positive, suggesting that users are not necessarily resorting to fear or other negative emotions to garner engagement.
In comparison, we found that conventional social media (Reddit) contains more personal stories and discussions, perhaps making it a better place for users who come for input or support from the community. Reddit has recently introduced its own cryptocurrency (Moon), and our results suggest some caution, since greater adoption of reward-based schemes may erode the valuable support communities that exist today on platforms such as Reddit. Unsurprisingly, we are starting to see that incentives do affect user behavior, and greater adoption of incentives by social media platforms may turn the average social media user into a “citizen journalist” battling for eyeballs and engagement.
While our comparison was exploratory in nature rather than guided by specific directional hypotheses, we believe that the systematic comparison performed here is one of the first such studies and therefore represents an important contribution. As noted above, the findings have significant implications for the design of next-generation social media. Platforms can take advantage of reward mechanisms to gain more engagement and more high-quality informational content on diverse topics. We see some of the value that incentive mechanisms can bring, but we also see evidence that a greater focus on them may negatively affect the community and the social support–related functions that these media provide.
This study has important limitations. As mentioned before, the platforms may differ in many dimensions, and in this study, we focused on only some important ones. Other important aspects, notably misinformation and fake news, need to be examined across incentive-based and nonincentive-based platforms in future work. Moreover, some information on these platforms was not available to us, specifically the network structure of the relationships among users; consequently, we could not study differences arising from it. Further, the platforms we compared differed in how long users had been participating on them. Although Steemit is a new social media platform compared with Reddit, it has been a very active and important platform: it has 1,643,143 registered accounts, and within the first 45 months after its launch (March 24, 2016), 17,805,355 new posts were published on it. Finally, this is an exploratory study and does not support specific causal interpretations. We hope future work can systematically address some of these limitations to build on this potentially important research direction.
There are many opportunities for future work, and we highlight a few here. First, extending our exploratory analyses to establish more formal causal links would be necessary before major policy decisions are made. Second, expanding both the categories of content and the types of social sites compared (eg, Facebook and Twitter) would make the findings more nuanced. Third, a longitudinal analysis of discussion threads on these platforms could provide a more thorough comparison, and recent deep learning models could support such an analysis. Fourth, examining the components of incentive-based social media other than the incentive mechanism would also be interesting. For example, would the permanency associated with blockchain-based systems affect how users participate in such media? Fifth, examining how misinformation and fake news differ across platforms could be an important contribution as well.
This study is the first to systematically compare an incentive mechanism–based platform against a traditional platform. We compared health-related posts on 2 social media platforms using machine learning and statistical analysis tools and found differences across the examined dimensions (ie, emotions and language styles, topic similarity and difference, clickbait and nonclickbait headlines, and content quality). Our findings demonstrate that the incentive mechanism was associated with more informational posts on diverse topics, whereas posts from the traditional social media platform were more likely to be about individual experiences in a discussion format. Our user survey results also showed that posts from the incentive-based platform were perceived as being of higher quality. The results further suggest that users on the incentive-based platform, perhaps because of the rewards, make their headlines somewhat more clickbait-like to encourage engagement.
Social media has radically altered how the world distributes and receives health care information. The COVID-19 pandemic is one example: it emphasized the value of social media as an influential source of information (including misinformation and disinformation) and demonstrated how social media affects care on a variety of levels [
Our theoretical contribution is to show that the incentive structure in social media can affect specific characteristics of the content of health care social media posts. The practical implication of our work is that the design of future social media platforms targeted toward health care should explicitly consider user incentives as a mechanism for improving content quality. A better internet environment for social networking, collaboration, participation, apomediation, and openness [
API: application programming interface
MTurk: Amazon Mechanical Turk
Funding for the Amazon Mechanical Turk survey was provided by the University of South Florida.
The data sets generated and/or analyzed during this study and the source code are available in the GitHub repository [
None declared.