This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
South Korea has the lowest fertility rate in the world despite considerable governmental efforts to boost it. Increasing the fertility rate and achieving the desired outcomes of any implemented policies require reliable data on ongoing trends in fertility and preparation for the future based on these trends.
The aims of this study were to (1) develop a determinants-of-fertility ontology with terminology for collecting and analyzing social media data; (2) determine the description logics, content coverage, and structural and representational layers of the ontology; and (3) use the ontology to detect future signals of fertility issues.
An ontology was developed using the Ontology Development 101 methodology. The domain and scope of the ontology were defined by compiling a list of competency questions. The terms were collected from Korean government reports, Korea’s Basic Plan for Low Fertility and Aging Society, a national survey about marriage and childbirth, and social media postings on fertility issues. The classes and their hierarchy were defined using a top-down approach based on an ecological model. The internal structure of classes was defined using the entity-attribute-value model. The description logics of the ontology were evaluated using Protégé (version 5.5.0), and the content coverage was evaluated by comparing concepts extracted from social media posts with the list of ontology classes. The structural and representational layers of the ontology were evaluated by experts. Social media data were collected from 183 online channels between January 1, 2011, and June 30, 2015. To detect future signals of fertility issues, 2 classes of the ontology, the socioeconomic and cultural environment, and public policy, were identified as keywords. A keyword issue map was constructed, and the defined keywords were mapped to identify future signals. R software (version 3.5.2) was used to mine for future signals.
The determinants-of-fertility ontology comprised 236 classes, and its terminology comprised 1464 synonyms for these classes. Concept classes in the ontology were found to be coherently and consistently defined. The ontology included more than 90% of the concepts that appeared in social media posts on fertility policies. Average scores for all of the criteria for the structural and representational layers exceeded 4 on a 5-point scale. Violence and abuse (socioeconomic and cultural factor) and flexible working arrangement (fertility policy) were weak signals, suggesting that they could increase rapidly in the future.
The determinants-of-fertility ontology developed in this study can be used as a framework for collecting and analyzing social media data on fertility issues and detecting future signals of fertility issues. The future signals identified in this study will be useful for policy makers who are developing policy responses to low fertility.
South Korea has the lowest fertility rate in the world. According to the Organization for Economic Cooperation and Development (OECD), the total fertility rate (TFR) in South Korea peaked in 1970 at 4.53 and subsequently declined to 1.30 in 2001 [
In an attempt to increase the TFR, in 2005 the Korean government enacted the Basic Law on Low Fertility and Aging Society, and the Ministry of Health and Welfare in collaboration with other government agencies established 5-year plans. The First Basic Plan for Low Fertility and Aging Society (2006-2010) was initiated to establish a foundation for the government from which to proactively respond to the low fertility and aging population. The second and third of these plans (2011-2020) were pursued with the aim of increasing the TFR and successfully responding to the increasingly aging society [
Governments around the world are increasingly seeking ways to detect future signals of policy implications so that they can respond in a timely and effective manner to the various challenges that countries face [
One approach to predicting future signals is to harness intuitive judgment by experts; however, this is both time-consuming and costly [
Social media data are written in various forms and are both unstructured and noisy [
An ontology expresses shared concepts and their relationships in a specific domain [
This study aimed to (1) develop an ontology with terminology for collecting and analyzing social media data on the determinants of fertility, (2) determine the description logics (DL), content coverage, and structural and representational layers of the ontology, and (3) use the ontology with terminology to detect future signals of fertility issues in social media data posted in Korean.
An ontology for describing the determinants of fertility, called the determinants-of-fertility ontology, was developed based on the Ontology Development 101 methodology [
The aim of the determinants-of-fertility ontology developed in this study was to analyze social media data posted by consumers, not by health care professionals. Thus, we limited the scope of the ontology to the individual, social, economic, cultural, and policy factors of fertility in the domain of the consumer. The physiological, clinical, and therapeutic factors of fertility in the domain of health care professionals were excluded. The specific domain and scope of this ontology were determined by creating competency questions (CQs) that the ontology must be able to answer [
We identified existing ontologies and conceptual frameworks representing fertility by searching PubMed, Google Scholar, and BioPortal [
We extracted terms from the literature that were consistent with the domain and scope of the ontology. The literature reviewed included reports on fertility, determinants of fertility, low fertility, and policy responses to low fertility published by the OECD [
The classes of the ontology and their hierarchy were defined using a top-down approach. The superclasses of the ontology and their relationships were constructed by integrating an ecological model [
The internal structure of the ontology classes was defined by adding the properties of the classes, the value of the properties, and the value type using the entity-attribute-value (EAV) model. Entities refer to the concepts covered in the determinants of fertility, attributes are characteristics of entities, and value sets comprise the set of values that an attribute can have. Attributes and values were extracted from the questionnaires of the Korean National Survey on Dynamics of Marriage and Fertility [
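The entity-attribute-value structure described above can be illustrated with a minimal Python sketch. The class, attribute, and value names below (for example, "Employment status") are hypothetical and chosen only for illustration; they are not taken from the ontology or from the survey questionnaires.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One attribute of an entity, with its value type and allowed value set."""
    name: str
    value_type: str  # "categorical" or "numeric" in this sketch
    value_set: list = field(default_factory=list)


@dataclass
class Entity:
    """An ontology class (entity) together with its attributes."""
    name: str
    attributes: dict = field(default_factory=dict)

    def add_attribute(self, attr: Attribute) -> None:
        self.attributes[attr.name] = attr

    def is_valid(self, attr_name: str, value) -> bool:
        """Return True if the value is permitted for the named attribute."""
        attr = self.attributes.get(attr_name)
        if attr is None:
            return False
        if attr.value_type == "categorical":
            return value in attr.value_set
        if attr.value_type == "numeric":
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        return False


# Hypothetical entity: the names and value set are illustrative assumptions.
employment = Entity("Employment status")
employment.add_attribute(
    Attribute("type", "categorical", ["regular", "temporary", "unemployed"])
)
employment.add_attribute(Attribute("weekly_working_hours", "numeric"))
```

Keeping the value set next to each attribute means that a term extracted from a post can be checked against the permitted values before it is attached to a class.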
The available methods for evaluating the quality of an ontology include those proposed by Brank et al [
We tested the DL of the ontology by applying the ontology debugger Protégé plug-in. We also tested the DL using the DL-reasoner Protégé plug-in to determine whether the ontology generates the correct answers to the previously developed CQs. For example, the CQ “What are the personal factors that influence a woman’s decision to have a child?” was converted to a DL query “IsIndividualOf some Determinants_of_fertility.” After entering this query into Protégé, we tested whether the answers to the CQ were correct. Since the determinants-of-fertility class (domain) was related to the subclasses of individual (range) through the
The content coverage of the ontology was examined by comparing terms extracted from the bulletin board of the Korean Ministry of Health and Welfare with a list of classes and synonyms of the ontology. Both the general public and public servants are allowed to post their opinions or concerns on fertility issues and policies regarding low fertility on this bulletin board. In total, 1387 documents posted on the website by the general public and 63 posted by public servants were collected. Relevant terms in the documents were extracted using the Korean Natural Language Processing package in R software (version 3.2.1, R Foundation for Statistical Computing). Unique concepts were extracted based on the meaning of the terms and then mapped onto the ontology classes. The mapping results were reviewed by 3 experts in health informatics who had experience in ontology development [
The structural and representational layers of the ontology were evaluated by 3 experts in health informatics who had previous experience in ontology design and 2 experts in maternity nursing who had previous experience in ontology evaluation. The evaluation tool developed by Jung et al [
The ontology with terminology was used to detect future signals of fertility issues from social media data. Future signals were analyzed based on the text-mining–based weak-signal detection method of Yoon [
We collected posts on fertility issues written in Korean from the following 183 online channels between January 1, 2011, and June 30, 2015: 159 channels of online news, 17 message boards, 1 social networking service (Twitter), 4 internet blogs, and 2 online community services. “Low fertility” was used as the major search keyword, together with the synonyms “fertility rate decline,” “sharp decline in fertility rate,” “avoiding childbirth,” “no kids,” and “childless family.” Social media data were collected using SK Telecom’s big-data analytics platform [
After extracting terms from each document, we identified the terms related to fertility issues such as socioeconomic and cultural factors and fertility policies. The future signals of fertility issues were detected using the keywords representing socioeconomic and cultural factors and fertility policies. The keywords that were semantically similar but expressed using different terms [
Socioeconomic and cultural factors:
Population aging
Economic problems
Nuclearization of the family
Changing perspectives about marriage
Conservative values
Violence and abuse
Employment problems
Gender inequality
Fertility policies:
Financial support for childbirth
Child-safety protection system
Infrastructure for childcare support
Maternity-leave system
Policy public relations
Financial support for employment security
Flexible working arrangement
Family-friendly work environment
Smart work center
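As a rough illustration of how differently worded terms can be folded into the keywords listed above before counting, the following Python sketch maps surface forms to a class label and counts, per year, the documents that mention each keyword at least once (the document frequency). The synonym lists and sample posts are hypothetical, and plain substring matching on English text is a simplification; Korean posts would need morphological analysis first.

```python
from collections import defaultdict

# Hypothetical synonym table: class label -> surface forms seen in posts.
SYNONYMS = {
    "Flexible working arrangement": ["flexible working", "flextime", "flexible hours"],
    "Violence and abuse": ["child abuse", "domestic violence"],
}


def build_lookup(synonyms):
    """Map every lowercased surface form (and the label itself) to its label."""
    lookup = {}
    for label, forms in synonyms.items():
        lookup[label.lower()] = label
        for form in forms:
            lookup[form.lower()] = label
    return lookup


def document_frequency(docs, synonyms):
    """Count, per year, the documents mentioning each keyword at least once.

    docs is an iterable of (year, text) pairs.
    """
    lookup = build_lookup(synonyms)
    counts = defaultdict(lambda: defaultdict(int))
    for year, text in docs:
        lowered = text.lower()
        # A set ensures that several synonyms in one post count only once.
        matched = {label for form, label in lookup.items() if form in lowered}
        for label in matched:
            counts[label][year] += 1
    return {label: dict(years) for label, years in counts.items()}


# Hypothetical posts, for illustration only.
docs = [
    (2014, "Flextime and flexible hours help working mothers."),
    (2014, "Reports of child abuse at daycare centers are rising."),
    (2015, "Companies expand flexible working for parents."),
]
df = document_frequency(docs, SYNONYMS)
```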
Future signals (also defined as weak signals) show abnormal patterns due to current oddities [
The DoD is the growth rate of the term occurrence expressed as a time-weighted coefficient and is also important for detecting future signals. The DoD represents how the diffusion of a term across different documents varies over time. Since the recent appearance of a term is more important than its past appearance, the DoD puts more weight on recent occurrences:
where DF
The KIM was generated by plotting the average DF on the x-axis and the average growth rate of the DoD on the y-axis. The quadrants of the plot were divided by the medians of the respective values, and so each quadrant of the KIM represented different information about present and future keywords.
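The two quantities plotted on the KIM can be sketched in Python as follows. Because the equation itself is not reproduced in this text, the sketch assumes the time-weighted form commonly used in the text-mining weak-signal literature, DoD = (DF / NN) × (1 − tw × (n − j)), where NN is the total number of documents in period j, n is the number of periods, and tw is a time weight. The default tw = 0.05, the use of an arithmetic mean for the growth rate, and all sample numbers are assumptions made for illustration.

```python
def degree_of_diffusion(df, nn, j, n, tw=0.05):
    """Time-weighted DoD for period j (1-based) out of n periods.

    df: number of documents containing the keyword in period j
    nn: total number of documents in period j
    tw: time weight; later periods are discounted less (assumed value)
    """
    return (df / nn) * (1 - tw * (n - j))


def average_growth_rate(values):
    """Arithmetic mean of period-over-period relative changes.

    Assumes every value except the last is nonzero.
    """
    rates = [(cur - prev) / prev for prev, cur in zip(values, values[1:])]
    return sum(rates) / len(rates)


# Hypothetical counts for one keyword over 3 periods.
df_by_period = [100, 120, 150]
nn_by_period = [1000, 1000, 1000]
n = len(df_by_period)

dod = [
    degree_of_diffusion(df, nn, j, n)
    for j, (df, nn) in enumerate(zip(df_by_period, nn_by_period), start=1)
]
growth = average_growth_rate(dod)
```

With these inputs the most recent period is not discounted at all, while the first period is scaled by 1 − 0.05 × 2 = 0.9, which is how recent occurrences receive more weight.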
Future signals were identified according to where keywords were located in the quadrants of the KIM. Keywords in the first quadrant, which represent strong signals, have a trend toward a high average DF and a high average DoD growth rate. Keywords in the second quadrant, which represent weak signals, have a low average DF but a high average DoD growth rate, and so they may increase rapidly in the future. Keywords in the third quadrant, which represent latent signals, have a low average DF and a low average DoD growth rate and are not yet significantly noticeable. Keywords in the fourth quadrant, which represent not-strong-but-well-known signals, have a high average DF but a low average DoD growth rate, and so currently exhibit a slow growth rate.
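The quadrant rule described above can be sketched in Python as a median split. The input values are the average DF and average DoD growth rate reported for the 17 keywords in this article's results table. Two details are assumptions, since the text does not state them: the medians are computed over all 17 keywords rather than per category, and a value equal to the median is treated as low. Keywords lying close to a median may therefore fall in a different quadrant than in the published map.

```python
from statistics import median

# keyword -> (average DF, average DoD growth rate), from the results table
KEYWORDS = {
    "Population aging": (6676, 0.088),
    "Economic problems": (1838, 0.214),
    "Nuclearization of the family": (998, 0.054),
    "Changing perspectives about marriage": (954, 0.034),
    "Conservative values": (925, 0.036),
    "Violence and abuse": (640, 0.158),
    "Employment problems": (395, -0.008),
    "Gender inequality": (245, 0.027),
    "Financial support for childbirth": (2315, -0.015),
    "Child-safety protection system": (1579, 0.156),
    "Infrastructure for childcare support": (1356, -0.061),
    "Maternity-leave system": (814, 0.044),
    "Policy public relations": (733, 0.015),
    "Financial support for employment security": (301, 0.045),
    "Flexible working arrangement": (283, 0.120),
    "Family-friendly work environment": (90, -0.146),
    "Smart work center": (82, -0.161),
}


def classify_signals(keywords):
    """Assign each keyword to a KIM quadrant using median cutoffs."""
    df_cut = median(v[0] for v in keywords.values())
    growth_cut = median(v[1] for v in keywords.values())
    labels = {}
    for name, (avg_df, growth) in keywords.items():
        high_df = avg_df > df_cut          # ties count as low (assumption)
        high_growth = growth > growth_cut  # ties count as low (assumption)
        if high_df and high_growth:
            labels[name] = "strong"
        elif high_growth:
            labels[name] = "weak"
        elif high_df:
            labels[name] = "not-strong-but-well-known"
        else:
            labels[name] = "latent"
    return labels


signals = classify_signals(KEYWORDS)
```

Under these assumptions, violence and abuse and flexible working arrangement both fall in the weak-signal quadrant (low average DF, high average DoD growth rate).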
A list of 10 CQs was compiled (
What are the personal factors that influence a woman’s decision to have a child?
What are the family factors that influence the decision to have a child?
What are the childcare factors that influence the decision to have a child?
What are the educational factors that influence the decision to have a child?
What are the workplace factors that influence the decision to have a child?
What are the sociocultural factors that influence the decision to have a child?
What are the economic factors that influence the decision to have a child?
What is the Korean government’s policy for overcoming low fertility?
What are the policy tasks for addressing low fertility in South Korea?
What are the policy targets for low fertility in South Korea?
In total, 1659 terms covering the domain and scope of the ontology were collected, and 236 unique class concepts were extracted from these terms. We defined hierarchical and attribute relationships of the classes based on the ecological model. The determinants of fertility were organized into the following levels: individual, family, workplace, childcare and educational environment, socioeconomic and cultural environment, and public policy. These 6 levels of the ontology were defined by adding not only the workplace, but also childcare and educational environment to institutional factors, which constitute the third level of the ecological model. Due to the increasing participation of women in the labor market, the workplace and childcare and educational environment are important factors influencing decisions about childbirth among women who are working [
The determinants-of-fertility ontology based on an ecological model.
We developed EAV models for the 139 lowest level class concepts. For example,
The Protégé ontology debugger program revealed that concept classes in the ontology were coherently and consistently defined. The DL reasoner showed that the ontology correctly answered all 10 CQs.
The content coverage of the ontology is presented in
Results for the content coverage of the ontology.
Category | General public, n (%) | Public servants, n (%) | Total, n (%) |
Existing concepts | 416 (92.0) | 93 (97.9) | 494 (92.9) |
New concepts | 36 (8.0) | 2 (2.1) | 38 (7.1) |
Total | 452 (100) | 95 (100) | 532 (100) |
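The percentages in the table follow directly from the counts. As a small worked check in Python (the function name is an illustrative choice), coverage is the share of extracted concepts that already existed in the ontology:

```python
def coverage_percent(existing, new):
    """Percentage of extracted concepts already present in the ontology."""
    return round(100 * existing / (existing + new), 1)


# Counts taken from the content coverage table above.
general_public = coverage_percent(416, 36)
public_servants = coverage_percent(93, 2)
total = coverage_percent(494, 38)
```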
Average scores for all of the criteria for the structural and representational layers exceeded 4 on a 5-point scale. The experts rated the hierarchy breadth, density, overall complexity, and connectivity criteria as strongly agree (score 5). The criterion with the lowest score was accuracy in the representational layer, with a score of 4.33 (
Results for the structural and representational layers of the ontology.
Layer and criteria | Average score (range) |
Structural layer | |
Size | 4.80 (4-5) |
Hierarchy depth | 4.60 (4-5) |
Hierarchy breadth | 5.00 (5-5) |
Density | 5.00 (5-5) |
Balance | 4.60 (4-5) |
Overall complexity | 5.00 (5-5) |
Connectivity | 5.00 (5-5) |
Representational layer | |
Match between formal and cognitive semantics | 4.73 (4-5) |
Consistency | 4.50 (4-5) |
Clarity | 4.87 (4-5) |
Explicitness | 4.60 (3-5) |
Interpretability | 4.67 (4-5) |
Accuracy | 4.33 (4-5) |
Comprehensiveness | 4.77 (4-5) |
Granularity | 4.47 (3-5) |
Relevance | 4.83 (4-5) |
Description | 4.83 (4-5) |
Degree of diffusion (DoD), average DoD growth rate, and average document frequency for fertility issues.
Category and keyword | DoD (a), 2011 | DoD, 2012 | DoD, 2013 | DoD, 2014 | DoD, 2015 | Average DoD growth rate | Average DF (b) |
Socioeconomic and cultural factors | | | | | | | |
Population aging | 7463 | 8912 | 8002 | 4499 | 4503 | 0.088 | 6676 |
Economic problems | 1637 | 2054 | 2523 | 1471 | 1503 | 0.214 | 1838 |
Nuclearization of the family | 1178 | 1288 | 1229 | 667 | 628 | 0.054 | 998 |
Changing perspectives about marriage | 1046 | 1528 | 1116 | 596 | 484 | 0.034 | 954 |
Conservative values | 1150 | 1195 | 1139 | 576 | 565 | 0.036 | 925 |
Violence and abuse | 685 | 800 | 726 | 461 | 527 | 0.158 | 640 |
Employment problems | 515 | 510 | 436 | 306 | 208 | –0.008 | 395 |
Gender inequality | 286 | 298 | 319 | 190 | 133 | 0.027 | 245 |
Fertility policies | | | | | | | |
Financial support for childbirth | 3061 | 3145 | 2548 | 1573 | 1250 | –0.015 | 2315 |
Child-safety protection system | 1757 | 1632 | 1974 | 1241 | 1292 | 0.156 | 1579 |
Infrastructure for childcare support | 1853 | 2209 | 1310 | 829 | 579 | –0.061 | 1356 |
Maternity-leave system | 1067 | 995 | 798 | 828 | 383 | 0.044 | 814 |
Policy public relations | 878 | 883 | 894 | 648 | 361 | 0.015 | 733 |
Financial support for employment security | 392 | 341 | 345 | 233 | 196 | 0.045 | 301 |
Flexible working arrangement | 330 | 264 | 354 | 287 | 180 | 0.120 | 283 |
Family-friendly work environment | 161 | 130 | 77 | 49 | 35 | –0.146 | 90 |
Smart work center | 131 | 114 | 90 | 50 | 27 | –0.161 | 82 |
(a) DoD: degree of diffusion.
(b) DF: document frequency.
Future signal classification using the keyword issue map of fertility issues. Red rectangle (area A) indicates weak signals and blue rectangle (area B) indicates strong signals.
Future signal classification of fertility-issues keywords.
Category and weak signals | Strong signals | Latent signals | Not-strong-but-well-known signals |
Socioeconomic and cultural factors | | | |
Violence and abuse | Economic problems | Gender inequality | Conservative values |
— (a) | Nuclearization of the family | Employment problems | — |
— | Population aging | Changing perspectives about marriage | — |
Fertility policies | | | |
Flexible working arrangement | Child-safety protection system | Policy public relations | Financial support for childbirth |
Financial support for employment security | Maternity-leave system | Family-friendly work environment | Infrastructure for childcare support |
— | — | Smart work center | — |
(a) Not applicable.
We have developed a determinants-of-fertility ontology as a framework for collecting and analyzing social media data. The ontology was evaluated in terms of the DL, content coverage, and structural and representational layers. We applied the ontology with terminology to detect future signals of fertility issues from social media data.
The developed determinants-of-fertility ontology has 6 main characteristics. First, it is the first ontology to describe the multilevel factors that affect fertility. Various factors and the complex interactions between them determine fertility [
Second, this ontology contains factors related to fertility issues that are unique to South Korea. In most cases, childbirth does not occur until after marriage in South Korea, and delaying marriage is an important factor affecting the decision to have a child [
Third, the developed ontology includes terminology with synonyms for classes such as consumer terms and abbreviations, which makes it suitable for analyzing social media data. For example, regarding
Fourth, each class of this ontology was modeled using the EAV model and included the attributes of each class and the values of those attributes. Like previous research [
Fifth, we ensured quality of the ontology by using a variety of evaluation methods, including the application-based, data-driven, and user-based approaches proposed by Brank et al [
Finally, the ontology with terminology developed in this study was used as a framework to detect future signals of fertility issues from social media data. The ontology allowed us to use social media data to identify the current trends and future changes in fertility issues related to the effective implementation of policies to increase the fertility rate. These trends were
A
A
The determinants-of-fertility ontology developed in this study comprehensively covers fertility issues relevant to the low fertility phenomenon in South Korea and will be useful for analyzing social media data. However, it is also subject to several limitations.
First, the direct and indirect effects of employment stability, job creation, housing supply, and public education on fertility [
Second, the synonyms of the ontology developed in this study may not include all the terms used by the general public on social media. Many of the terms used on social media are highly transient—rapidly appearing, spreading, and then disappearing [
Third, future signals of fertility issues were detected during the second phase of the policy on low fertility. Low fertility is a demographic issue that requires a long-term approach, and the policy responses of the government should be periodically reviewed and evaluated to ensure that the policies in place at a particular time point are consistent with any changes in the population and the socioeconomic and cultural environment [
Finally, only the KIM that uses the DF of keywords was used to detect future signals. Since future signals are generally subjective [
The determinants-of-fertility ontology developed in this study comprised 6 superclasses, 230 subclasses, and 41 relationships, with a terminology comprising 1464 synonyms for the 236 classes. Each class was modeled using the EAV model, and the terminology included synonyms of the ontology classes such as consumer terms and abbreviations. The ontology can be used to analyze social media data on fertility issues. The DL, content coverage, and structural and representational layers of the ontology were evaluated. The ontology and its terminology were used to detect future signals of fertility issues in South Korea. Our novel determinants-of-fertility ontology provides a framework for collecting and analyzing social media data toward understanding which socioeconomic and cultural factors and fertility policies should be focused on in the future. The analysis of future signals revealed that
CQ: competency question
DF: document frequency
DL: description logics
DoD: degree of diffusion
EAV: entity-attribute-value
KIM: keyword issue map
OECD: Organization for Economic Cooperation and Development
TFR: total fertility rate
This work was supported by grant NRF-2015R1A2A2A01008207 from the National Research Foundation of Korea funded by the Korean government.
None declared.