Published in Vol 21, No 3 (2019): March

Automated Analysis of Domestic Violence Police Reports to Explore Abuse Types and Victim Injuries: Text Mining Study


Original Paper

1 The Kirby Institute, Faculty of Medicine, The University of New South Wales, Sydney, Australia

2 Neuropsychiatry Service, Hunter New England Health, Newcastle, Australia

3 School of Psychiatry, The University of New South Wales, Sydney, Australia

4 Centre for Big Data Research in Health, The University of New South Wales, Sydney, Australia

5 School of Computer Science, The University of Manchester, Manchester, United Kingdom

Corresponding Author:

George Karystianis, BSc, MSc, PhD

The Kirby Institute

Faculty of Medicine

The University of New South Wales

Level 6, Wallace Wurth Building

High Street, Kensington NSW

Sydney, 2052


Phone: 61 (2) 9385 0900


Background: The police attend numerous domestic violence events each year, recording details of these events as both structured (coded) data and unstructured free-text narratives. Abuse types (including physical, psychological, emotional, and financial) conducted by persons of interest (POIs) along with any injuries sustained by victims are typically recorded in long descriptive narratives.

Objective: We aimed to determine if an automated text mining method could identify abuse types and any injuries sustained by domestic violence victims in narratives contained in a large police dataset from the New South Wales Police Force.

Methods: We used a training set of 200 recorded domestic violence events to design a knowledge-driven approach based on syntactical patterns in the text and then applied this approach to a large set of police reports.

Results: Testing our approach on an evaluation set of 100 domestic violence events provided precision values of 90.2% and 85.0% for abuse type and victim injuries, respectively. In a set of 492,393 domestic violence reports, we found 71.32% (351,178) of events with mentions of the abuse type(s) and more than one-third (177,117 events; 35.97%) contained victim injuries. “Emotional/verbal abuse” (33.46%; 117,488) was the most common abuse type, followed by “punching” (86,322 events; 24.58%) and “property damage” (22.27%; 78,203 events). “Bruising” was the most common form of injury sustained (51,455 events; 29.03%), with “cut/abrasion” (28.93%; 51,284 events) and “red marks/signs” (23.71%; 42,038 events) ranking second and third, respectively.

Conclusions: The results suggest that text mining can automatically extract information from police-recorded domestic violence events that can support further public health research into domestic violence, such as examining the relationship of abuse types with victim injuries and of gender and abuse types with risk escalation for victims of domestic violence. Potential also exists for this extracted information to be linked to information on the mental health status.

J Med Internet Res 2019;21(3):e13067




Domestic violence is a global social and public health phenomenon with important health consequences that affect thousands of lives each year [1-3]. It can be defined as “any incident of threatening behavior, violence (or psychological, physical, sexual, financial, emotional) abuse between adults who are or have been an intimate partner or family member, regardless of gender or sexuality” [4-6]. However, domestic violence can also occur in other relationship structures such as between a caregiver and a dependent person, including a child, or those living together in a household but not in an intimate relationship [4,5]. A multicountry violence study conducted by the World Health Organization estimates a prevalence of 15%-71% in physical and sexual partner violence toward women [1,3]. In Australia, in 2018, one of six women and one of 16 men experienced physical or sexual violence by a current or previous partner [7]. Domestic violence has various forms—from physical to emotional and verbal abuse. The type of abuse received and perpetrated may vary by gender, with each type bearing short- and long-term (physical and mental) health consequences for the victims [8-11]. Domestic violence bears a significant economic cost: Within Australia alone, the cost of violence against women was around Aus $22.2 billion in 2015-2016 [2,3,12].

The New South Wales Police Force (NSWPF) recorded 123,330 domestic violence–related events in 2017 in WebCOPS (Web Computerised Operational Policing System), a Web-based interface for the COPS, which enables the police to capture and analyze crime information on an organization-wide basis [13]. WebCOPS contains detailed information about domestic violence events as both structured fields (date of birth, Aboriginal status, whether weapons were used, etc) and free unstructured text called “event narratives.” An event can contain more than one text narrative describing, in detail, alleged incident(s) that occurred between the person of interest (POI) and the victim, information regarding the circumstances of the event, and any action(s) taken by the police. Narratives are frequently written without a specific structure, featuring various misspellings, typographical and grammatical errors, and (sometimes informal) acronyms and abbreviations that can have different meanings depending on the context [13].

Domestic violence event narratives contain a wealth of important information regarding injuries and abuse types, which is not found in medical records unless medical attention is sought; even when it is sought, the contact may not be flagged as related to domestic violence. However, the volume of the recorded data, along with the long unstructured narratives, makes it difficult to identify potentially meaningful information through traditional qualitative research methods that rely on manual review of the records. One research paper recently commented that “...there is no systematic way to extract information from these [police] narratives other than by manual review” [14].

Prior Work

There is a need for methods that can automatically extract information of interest from large volumes of data in a short time. Text mining has been used for more than 30 years to harvest information from unstructured text in many fields, particularly in biomedicine [15-20]. Recent efforts have sought to text mine crime-related information from online media publications [21-23], with limited attempts to process police reports [13,24-28]. Previous work extracted data on the names, narcotic drugs, and weapons with varying degrees of success (F1-score ranging from 46% to 81%) through named entity extraction [24,25] and police report classification of events as domestic violence or nondomestic violence related, using an unsupervised clustering technique that correctly classified 44% of the reports set aside for manual inspection [26]. Other efforts included recognition of crime-related information (such as drugs, weapons, and facial features) from witness narratives through dictionaries and rules, with F1-scores ranging from 82% to 93% [27,28]. Recently, Karystianis et al applied a rule-based approach combined with manually crafted dictionaries to extract mentions of mental illnesses for POIs and victims from police text narratives of recorded domestic violence events with an average F1-score of 84% [13].


In this paper, we investigate whether the application of a text mining method can automatically extract abuse types (conducted by POIs) and sustained victim injuries from a large-scale corpus of 492,393 domestic violence events.


We used a corpus of 492,393 domestic violence events provided to the researchers by the NSWPF, occurring from January 2005 to December 2016 [13]. The domestic violence events were flagged in WebCOPS as “domestic violence related,” the description of violence was coded as “domestic,” and the relationship between the victim and the POI included any of the following: “spouse/partner” (including ex-spouse/ex-partner), “boyfriend/girlfriend” (including ex-boyfriend/ex-girlfriend), “parent/guardian” (including step/foster), “child” (including step/foster), “sibling,” “other member of family” (including kin), or “carer.” These events covered the following categories: various types of assaults; breaches of Apprehended Violence Orders; homicides; malicious damage to property; and offenses against another person such as intimidation, kidnapping, abduction, and harassment. These data included only events with recorded physical assaults; cases involving stalking, sexual assault, or young POIs were not included.

Permission to access the narratives was granted by the NSWPF following ethics approval from the University of New South Wales Human Research Ethics Committee (Ref: HC16558). Due to the inclusion of sensitive and personal information (eg, name, surname, and address) in the narratives, all processing was undertaken at the NSWPF headquarters. Only de-identified, extracted outputs were allowed to be taken offsite for further analysis.

We used a total of 300 narratives for training, development (used to enhance the performance of the rules), and evaluation purposes (100 each). These sets are described in more detail in our previous work [13]. A hypothetical de-identified narrative is shown in Figure 1.

Categorizing Abuse Types

We categorized specific abuse types (ie, details of the abuse behavior) using several sources into nine categories [12,29,30] with 44 abuse types (Table 1). Although the provided data did not include domestic violence events involving sexual assault and stalking, there were still cases wherein these types of abuse were described in an event. Several nonspecific forms of violence (eg, “bashing,” “smack,” “assaulted,” and “clipping”) were categorized as “assault (unspecified).” A more detailed explanation of the abuse types is provided in Multimedia Appendix 1. A total of 17 common injury types were examined, including scratching, grazing, red mark/sign, tear off (nail), bruising, cut/abrasion, swelling, lump, other, fracture, black eye, broken tooth, burn mark, stab wound, bite mark, soreness, and bleeding.

Figure 1. A hypothetical example of a domestic violence event narrative as recorded by the New South Wales Police Force. Blue-highlighted terms indicate the annotated victim injuries, and yellow-highlighted terms indicate the abuse types.
Table 1. Categories of abuse along with abuse types.
Abuse category | Abuse type
Physical assault | Assault (unspecified), biting, blocking, choking, ordered dog attack, dragging, elbowing, attempting to set fire to premises, gagging, grabbing, hair pulling, headbutting, head locking, kicking, kneeing, physical restraining, pulling, punching, pushing, scratching, shaking, slapping, spitting, stabbing, victim being thrown around, limb twisting, attempt to harm a victim with an object or weapon, and hitting the victim with an object or weapon
Threat | Intimidation (via body language) or stating explicit threat(s) to physically harm, sexually assault, and self-harm if the victim does not comply
Sexual assault | Sexual assault (eg, rape)
Emotional/verbal abuse | Self-harming when the victim does not comply, yelling profanities, and other emotional/verbal abuse
Stalking | Stalking, harassment, and forced entry
Financial abuse | Financial control (eg, no access to credit card)
Social abuse | Social restriction and prevent/limit child access
Unclassified | Apprehended Domestic Violence Order breach, chasing, lunging, other, and possession of personal effects (eg, phone and car keys)
Property damage | Property damage (ranging from breaking an item to causing damage to a house or vehicle)

Rule-Based System Development


Our method involved the design and implementation of rule-based language expression patterns combined with dictionary terms for the recognition of abuse types and victim injuries at the narrative level. It consisted of the following steps (Figure 2): (1) creation of relevant dictionaries to recognize mentions of abuse types and victim injuries, (2) design and implementation of rules to capture abuse types and victim injuries mentions in context, and (3) aggregation of multiple mentions in each narrative to reach domestic violence event–level annotation.


We recognized mentions of task-specific semantic groups through the development of 22 custom-made dictionaries (Table 2). The first author (GK) manually crafted the dictionaries by inspecting the training and development sets for terms and expressions that describe abuse types (conducted by POIs) and victim injuries; two other authors (AA and PS) then checked them to ensure consistency. We added systematic variations (such as plural forms and past and present tenses) and also included common misspellings (eg, “stuck” instead of “struck,” “harassment,” and “assalting”) frequently present in the narratives. Although the majority of the terms are noun phrases, for the “threat” dictionary, we included verbal threats made by POIs and manually expanded variations by changing a noun (eg, “your kids are going to have no father” to “your kids are going to have no mother”) and the surface expressions (“your dead” to “you’re dead” or “you are dead”).
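As a rough illustration of this dictionary expansion step, the sketch below generates noun and surface-form variants of two threat phrases quoted above and then looks them up in a narrative. It is written in Python rather than in the GATE setup used in the study, and the function names (`expand_variants`, `find_terms`), the swap lists, and the sample sentence are assumptions made for this example only.

```python
import re

# Hypothetical swap lists modelled on the examples quoted in the text.
NOUN_SWAPS = [("father", "mother")]
SURFACE_SWAPS = [("your dead", "you're dead"), ("your dead", "you are dead")]

def expand_variants(phrase, swaps):
    """Return the phrase together with every variant produced by the swaps."""
    variants = {phrase}
    for old, new in swaps:
        # Add a copy of each variant seen so far with `old` replaced by `new`.
        variants |= {v.replace(old, new) for v in variants if old in v}
    return variants

def find_terms(narrative, dictionary):
    """Return dictionary terms that occur in the narrative as whole words or phrases."""
    lowered = narrative.lower()
    return sorted(
        term for term in dictionary
        if re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", lowered)
    )

# Build a tiny "threat" dictionary from two seed phrases.
threat_dictionary = set()
for seed in ["your kids are going to have no father", "your dead"]:
    threat_dictionary |= expand_variants(seed, NOUN_SWAPS + SURFACE_SWAPS)

# Illustrative sentence, not taken from any real narrative.
hits = find_terms("The POI yelled 'You are dead' at the victim.", threat_dictionary)
```

Plain string replacement is enough for a handful of seed phrases; a larger dictionary would likely need token-aware substitution so that a swap cannot fire inside a longer word.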

Figure 2. An overview of the text-mining methodology used for the identification of abuse types and victim injuries from domestic violence police event narratives. DV: domestic violence; GATE: General Architecture for Text Engineering; WebCOPS: Web Computerised Operational Policing System.
Table 2. The manually crafted dictionaries and their respective size (number of terms included) used to identify abuse types and victim injuries.
Dictionary name | Size | Description | Examples
Anatomy | 108 | Anatomical parts of the human body in which a victim has been injured by the POI^a | Chest, leg, head, neck
Assault | 18 | Verbs that indicate a nonspecific physical attack | Attacked, clipped, smacking, bashing
Attempt | 6 | Verbs that suggested a physical effort by the POI to harm the victim | Attempted, aimed, trying, tried
Be | 4 | Conjugations of the verb “be” in the present and past tense | Is, was, were, are
Confiscate | 8 | Verbs describing a confiscating act by an offender towards a victim | Confiscated, grabbed, snatched, grabbing
Damage | 22 | Verbs indicating an act of property damage by the POI | Cracked, burned, shuttering, ripping
Degree | 14 | Adjectives describing the victim’s wound | Superficial, extensive, minor, major
Description | 59 | Terms (mostly adjectives) describing various attributes of an object such as color or type of made material | Yellow, wooden, serving, frying
Family | 31 | Various nouns indicating the relationship between individuals | Boyfriend, mother, father, cousin
First person threats | 123 | Threats made by the POI towards a victim | “I will kill you,” “I am going to bury you,” “I will hunt you down and kill you,” “someone is going to kill you”
Force | 8 | Verbs describing an offender physically restrain a victim | Forcing, pinned, pinning, kept
Location | 15 | House locations that a DV^b event occurred at | Toilet, loungeroom, wall, hallway
Number | 10 | Numbers in words suggesting the number of criminal counts charged at an offender | One, two, four, six
Object | 174 | Various objects that were broken or used in a DV event | Table leg, cup, rear door, window
POI | 18 | Terms that describe an offender in a DV event | Defendant, person of interest (offender), offender accused
Premises | 6 | Terms describing a residence | Unit, terrace, flat, premises
Preposition | 44 | Various prepositions suggesting the presence of a victim’s injury in an anatomical part | Under left, lower, upper, front
Start | 7 | Verbs suggesting the initiation or continuation of an action by the offender | Begun, commenced, continuing, started
Trauma | 14 | Terms indicating a wound caused by a weapon/object used by the offender towards a victim | Wound, cut, trauma, fracture
Victim | 19 | Terms describing a victim in a DV event | Victim, vic, pinop (short for person in need for protection), pn (short for pinop)
Weapon | 155 | Objects used to cause harm or threaten to cause harm to a victim by an offender | Army knife, torch, book, shotgun

^a POI: person of interest.

^b DV: domestic violence.


We based our rules on syntactical patterns identified in the training and development sets, indicating the presence of an abuse type or victim injury. This work follows the same methodology that we previously developed [13]. The syntactical patterns included frozen syntactical expressions as anchors for certain elements built through specific verbs, noun phrases, and prepositions (eg, “commenced to choke”) and semantic placeholders identifiable through the application of the manually crafted dictionaries (all possible synonyms describing a victim, such as “victim,” “vic,” and “pinop”). We specifically utilized concept enumeration, since it frequently appeared in the training and development sets (eg, “Injuries: Swollen hand, soreness and scratch under left eye [mentions of victim’s injuries]”).

General Architecture for Text Engineering (GATE) [31], a text mining framework for annotating and categorizing text, enabling information recognition, was used to create and apply our rules. The observed syntactical patterns were converted into rules via Java Annotations Pattern Engine, GATE’s pattern-matching language. A total of 64 rules were created (Multimedia Appendix 2).
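To make the idea of a frozen anchor combined with semantic placeholders more concrete, the following sketch encodes one such pattern ("POI + start verb + to + abuse verb + victim") as a Python regular expression. This is a stand-in for illustration only: the study's 64 rules were written in GATE's pattern language, not Python, and the miniature dictionaries, the `ABUSE_VERBS` list, and the sample text here are hypothetical.

```python
import re

# Hypothetical miniature dictionaries; the full ones are summarized in Table 2.
START = ["begun", "commenced", "continuing", "started"]
POI = ["defendant", "offender", "accused", "poi"]
VICTIM = ["victim", "vic", "pinop"]
ABUSE_VERBS = ["choke", "punch", "kick", "push"]  # assumed for illustration

def alt(terms):
    """Build a non-capturing alternation, longest terms first."""
    ordered = sorted(terms, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(t) for t in ordered) + ")"

# Placeholders (POI, START, VICTIM) come from dictionaries; "to" is the frozen anchor.
RULE = re.compile(
    rf"\b{alt(POI)}\s+{alt(START)}\s+to\s+(?P<abuse>{alt(ABUSE_VERBS)})"
    rf"\s+(?:the\s+)?{alt(VICTIM)}\b",
    re.IGNORECASE,
)

def extract_abuse_mentions(narrative):
    """Return abuse verbs matched by the rule, in order of appearance."""
    return [m.group("abuse").lower() for m in RULE.finditer(narrative)]

# Illustrative text, not taken from any real narrative.
text = ("The POI commenced to choke the victim. "
        "Later the victim started to push the POI away.")
mentions = extract_abuse_mentions(text)
```

Because the rule requires a POI term before the verb and a victim term after it, the second sentence (where the victim acts against the POI) does not match. Attribution of this kind is the hard part of the task, as the error analysis later in the paper shows.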

Elimination of Multiple Mentions

More than one syntactical pattern may be matched in an event narrative, and the matches may refer to one or more mentions of abuse types or victim injuries (some of which can be duplicates). This led to the extraction of highly variable mentions of abuse types and victim injuries (eg, “punch,” “punched,” and “punching” are variations of the same abuse type [“punching”]; “bruised,” “bruises,” and “purple marks” are variations of the same injury [“bruising”]). Each mention is therefore mapped to its “canonical” representative, and only one mention for each abuse type or injury is kept and used to “tag” the domestic violence narrative. For example, if, in a domestic violence event report, we have extracted three mentions of the abuse type “punching” and two mentions of the abuse type “kicking,” we only annotate two abuse types (“punching” and “kicking”) at the domestic violence event level.
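The aggregation step can be sketched as a lookup followed by de-duplication. The mapping below contains only the variants quoted in the paragraph above plus the assumed forms "kick" and "kicked"; the name `event_level_tags` and the choice to drop unmapped mentions are illustrative assumptions, not details reported in the study.

```python
# Variant-to-canonical mapping; entries beyond those quoted in the text are assumed.
CANONICAL = {
    "punch": "punching", "punched": "punching", "punching": "punching",
    "kick": "kicking", "kicked": "kicking", "kicking": "kicking",
    "bruised": "bruising", "bruises": "bruising", "purple marks": "bruising",
}

def event_level_tags(mentions):
    """Map each mention to its canonical form and keep one tag per type."""
    tags = []
    for mention in mentions:
        canonical = CANONICAL.get(mention.lower())
        # Skip unknown mentions and canonical forms that were already recorded.
        if canonical is not None and canonical not in tags:
            tags.append(canonical)
    return tags

# Three "punching" mentions and two "kicking" mentions collapse to two tags.
tags = event_level_tags(["punched", "punch", "kicking", "punching", "kicked"])
```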


The text mining system was evaluated on a set of 100 previously unseen, randomly chosen domestic violence event reports. The set was manually inspected and annotated by the first and second authors (GK and AA) who identified the type(s) of abuse and victim injuries. The inter-annotator agreement calculated as the absolute agreement rate [32] was 91%, suggesting reliable annotations. Performance of the methodology was evaluated at the narrative level (after eliminating any multiple characteristic mentions). We calculated the precision (the number of true positives against the number of true positives and false positives), recall (the number of true positives against the number of true positives and false negatives), and F1-score (the harmonic mean between precision and recall) at the domestic violence event level using standard definitions [33]. We defined true positive as the detection of a correct mention in an event; false positive as the extraction of any unrelated mention that has not been annotated manually; false negative as the correct mention that was not detected by our method; and true negative as the case where our method did not identify any mentions when none were annotated.
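For readers who want to check the metric definitions, the short sketch below computes precision, recall, and F1-score from raw counts. The counts used in the example (259 true positives, 28 false positives, 30 false negatives) are those listed in the first abuse-type row of Table 3; the function name and the zero-division handling are assumptions of this sketch.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

precision, recall, f1 = precision_recall_f1(tp=259, fp=28, fn=30)
print(f"precision={precision:.1%} recall={recall:.1%}")  # precision=90.2% recall=89.6%
```

True negatives do not enter any of the three measures, which is why they are defined in the text but not needed for the calculation.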

The results are shown in Table 3. Injuries and abuse types returned F1-scores above 85%, suggesting reliable and consistent results with small but expected drops from the training (5.5% and 9.6%, respectively) and development sets (3.9% and 6.7%, respectively). In particular, the precision was 90.2% for abuse types and 85.0% for the victim injuries, with a small decrease from the development set (2.6% and 5.2%, respectively). Similarly, recall was 89.6% and 86.3% for the abuse types and victim injuries, respectively, a drop of 5.2% and 8.0%, respectively, compared with the values of the development set. However, the evaluation set had a notably smaller number of victim injury mentions (n=66) than the development set (n=88) and the training set (n=83); therefore, its recall value should be considered with caution.

Large-Scale Corpus Analysis

Given the relatively accurate results of the method in identifying abuse types and victims’ injuries, we applied the method to the corpus of 492,393 domestic violence events. A total of 71.32% of events (351,178) had an identified abuse type mentioned in the report, whereas more than one-third of all events (177,607; 36.07%) contained a victim injury (Table 4).

Of the 44 abuse types, “emotional/verbal abuse” (117,488; 33.46%) was the most common, followed by “punching” (86,322; 24.58%) and “property damage” (78,203; 22.27%). A total of 35.45% (124,498 events) of domestic violence events contained only one identified abuse type, whereas 33.83% (118,819 events) of domestic violence events included three to five different abuse types (Table 5).

The most frequent injury type was “bruising” (51,455; 29.03%), followed by “cut/abrasion” (51,284; 28.93%) and “red marks/signs” (42,038; 23.71%) (Table 6). A total of 105,493 domestic violence events (59.56%) had only one form of injury, and 24.49% (43,373) of domestic violence events had two forms of injury (Table 7).
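The percentages and bands reported in this section follow from simple counting over event-level tags. The sketch below shows one way to derive them, assuming each event has already been reduced to a set of canonical abuse types; the toy `events` list is hypothetical, while the two totals passed to `share` are the figures stated in the text.

```python
from collections import Counter

def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)

def bucket(n_types):
    """Map a count of distinct abuse types to the bands used in Table 5."""
    if n_types <= 2:
        return str(n_types)
    if n_types <= 5:
        return "3-5"
    if n_types <= 9:
        return "6-9"
    return "10+"

# Hypothetical event-level tags for four events (not real data).
events = [
    {"punching"},
    {"punching", "kicking"},
    {"pushing", "grabbing", "choking"},
    {"intimidation"},
]
distribution = Counter(bucket(len(tags)) for tags in events)

# Events with at least one identified abuse type, out of all events in the corpus.
coverage = share(351_178, 492_393)
print(coverage)  # 71.32
```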

Table 3. Performance of the system on the training, development, and evaluation sets for the identification of abuse types and victim injuries with true positive, false positive, and false negative results.
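Set and characteristic | Precision (%) | Recall (%) | F1-score (%) | True positive (n) | False positive (n) | False negative (n)
Evaluation set: abuse type | 90.2 | 89.6 | 89.8 | 259 | 28 | 30
Development set: abuse type | 92.8 | 94.8 | 93.7 | 310 | 24 | 17
Training set: abuse type | 93.9 | 96.3 | 95.3 | 293 | 19 | 11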
Table 4. Number of domestic violence events containing various abuse types (n=351,178).
Abuse type | Events, n (%)
Assault (unspecified) | 171,323 (48.79)
Emotional/verbal abuse | 117,488 (33.46)
Punching | 86,322 (24.58)
Property damage | 78,203 (22.27)
Intimidation | 75,662 (21.55)
Grabbing | 66,728 (19.00)
Pushing | 62,794 (17.88)
Scratching | 20,493 (5.84)
Physical restraining | 20,014 (5.70)
Kicking | 19,435 (5.53)
Slapping | 17,474 (4.98)
ADVO^a breach | 16,903 (4.81)
Attempting to hit with an object or weapon | 13,592 (3.87)
Hair pulling/dragging by hair | 13,048 (3.72)
Choking | 11,325 (3.22)
Spitting | 9341 (2.66)
Hitting with an object or weapon | 8387 (2.39)
Other | 7135 (2.03)
Pulling | 6373 (1.81)
Victim being thrown around | 5255 (1.50)
Lunging | 4685 (1.33)
Possession of personal effects | 3265 (0.93)
Blocking | 3163 (0.90)
Harassment | 3100 (0.88)
Stalking | 2940 (0.84)
Self-harming | 2597 (0.74)
Biting | 2285 (0.65)
Dragging | 2216 (0.63)
Shaking | 2098 (0.60)
Stabbing | 1903 (0.54)
Forced entry | 1779 (0.51)
Headlocking | 1482 (0.42)
Chasing | 1324 (0.38)
Kneeing | 1321 (0.38)
Gagging | 1161 (0.33)
Elbowing | 225 (0.06)
Limb twisting | 173 (0.05)
Headbutting | 148 (0.04)
Sexual assault | 125 (0.04)
Prevent child access | 91 (0.03)
Social restriction | 40 (0.01)
Financial control | 29 (0.01)
Attempting to set fire to premises | 28 (0.01)
Ordered dog attack | 1 (0.00)

^a ADVO: Apprehended Domestic Violence Order.

Table 5. Domestic violence events according to the number of abuse types (n=351,178).
Number of abuse type(s) | Events, n (%)
1 | 124,498 (35.45)
2 | 89,342 (25.44)
3-5 | 118,819 (33.83)
6-9 | 17,951 (5.11)
≥10 | 568 (0.16)
Total | 351,178 (100.0)
Table 6. Number of events containing various injury types (n=177,607).
Injury type | Events, n (%)
Bruising | 51,455 (29.03)
Cut/abrasion | 51,284 (28.93)
Red mark(s) | 42,038 (23.71)
Swelling | 32,581 (18.38)
Soreness | 26,729 (15.08)
Other | 19,778 (11.16)
Bleeding | 19,154 (10.81)
Fracture(s) | 17,531 (9.89)
Lump | 9482 (5.35)
Grazing | 7305 (4.12)
Black eye(s) | 2994 (1.69)
Scratching | 2399 (1.35)
Bite mark(s) | 2350 (1.33)
Stab wound(s) | 2346 (1.32)
Burn mark(s) | 1382 (0.78)
Broken tooth | 620 (0.35)
Tear off nail(s) | 7 (0.00)
Table 7. Domestic violence events according to the number of victim injury types (n=177,607).
Number of injury types | Events, n (%)
1 | 105,493 (59.56)
2 | 43,373 (24.49)
3-4 | 25,678 (14.49)
5-6 | 2484 (1.40)
≥7 | 89 (0.05)
Total | 177,117 (100.0)

Principal Results

To the best of our knowledge, this analysis represents the first attempt to capture domestic violence–related abuse and victim injuries using a large, population-level corpus of domestic violence events recorded by the police. The identification of abuse types conducted by POIs and various injuries sustained by victims in domestic violence disputes are not recorded in the structured information of the WebCOPS database fields. We therefore focused on the narrative part, where the application of our knowledge-driven approach has identified rich information and has the potential to be used for better understanding domestic violence and the development of related prevention interventions, surveillance, and reporting.

Our findings derived from text mining present a more detailed picture of the types of injuries and abuse occurring in domestic violence events. The most common abuse type in our dataset was nonphysical and involved “emotional/verbal abuse,” which is consistent with the recent findings showing that nonphysical abuse types are more prevalent than physical ones [34] and that victims of domestic violence abuse are more likely to sustain certain types of injuries such as cuts and fractures than others [34,35]. Domestic violence can also take myriad physical forms, ranging from victim intimidation to cases where serious and grievous bodily harm is caused by a specific type of abuse (eg, “punching,” “stabbing,” and “choking”), which have both short- and long-term physical and mental health consequences [9-11].

Through the recognition of various abuse types and related victim injuries, potential exists to develop prevention and intervention guidelines by linking this information to diagnostic data held by health services, so that surveillance and monitoring of the victims can be performed. There is also a possibility to track any potential timelines in which the victim was abused. Moreover, the text mining method can be updated on an ongoing basis to monitor trends and inform risk stratification algorithms, which can drive domestic violence–prevention strategies targeting specific groups.

With the inclusion of domestic violence in the WHO’s Sustainable Development Goals, accurate reporting in this area will be necessary [36]. Text mining the police’s domestic violence event narratives is a possible source of nuanced information on this topic, including the cause of the event, the potential role of mental illness and substance (ab)use in the event, the types of abuse perpetrated, injuries sustained, weapons used, and information on relationship status. This information can then be used by those providing prevention services to target strategies at particular groups and to identify warning signs for health care providers. A recent report indicated that in Australia, from 2012-2013 to 2013-2014, one woman was killed each week and one man was killed each month as a result of violence from a current or previous partner [7]. Subsequent analyses of this rich information will aim to examine these issues and identify early warning signs of abuse and domestic violence events, which may help prevent homicides in domestic violence settings.

Error Analysis

Although the level of accuracy was acceptable for large-scale analysis to identify trends in domestic violence events, there were still some errors in both abuse types and victim injuries at the level of individual narrative reports. By inspecting the evaluation set, we observed that the system erroneously extracted a few instances (5 cases) of POI injuries as victim injuries, since the rules were triggered for the POIs (eg, “minor grazing to the right shoulder [false positive for injury] of the POI”). In other instances (4 cases), victim injuries were incorrectly identified when they actually referred to property damage through ambiguous word and syntactical pattern combinations that indicated an injury (eg, “INJURIES/MEDICAL TREATMENT/DAMAGE TO PROPERTY: Broken table leg [false positive for victim injury]”). In 12 domestic violence events, when a victim fought back against a POI, any actions by the victim in self-defence were erroneously extracted as an abuse type (eg, “witness stepped in and grabbed [false positive for abuse type] the POI and pinned him to the ground [false positive for abuse type] until he calmed down” and “...has admitted she physically pushed him [false positive for abuse type] back after he pushed [true positive for abuse type] into her”). There were a few occasions where an abuse type was recognized but had no domestic violence context (eg, “The Accused was closed inside the caged area, where he began kicking [false positive for abuse type] at the door and yelling at the police officers...”), while other extracted abuse types had not occurred but were feared to happen in the future (eg, “The victim believes if she stayed at the residence she would definitely have been bashed [false positive for abuse type] by the accused and possibly stabbed [false positive for abuse type]”).

Although we engineered the rules based on generic syntactical patterns that stated victim injuries and abuse types, these rules ignored a limited number of injury mentions, since they were not explicitly stated to have been sustained by the victim (eg, “redness [false negative for injury] and grazes [false negative for injury] sighted on back, dried blood [false negative for injury] on lips”). Some examples (eight cases) were more implicit and required additional inference using some related terms (eg, “the POI placed his hand in the middle of the victim's sternum and applied force [false negative for injury] causing her pain and shortness of breath”). Cases like these were the majority of false negatives for abuse types, suggesting that abuse types such as “grabbing” and “punching” can have quite a few lexical variations in the narratives, which indicate richness of the contexts.

Additionally, injury or abuse type mentions (six cases) that were accompanied by the victim’s surname were excluded from our rule design, since there was no way to determine from the narrative who was the victim or POI without using the structured part of the record (eg, “xxx had a bleeding nose [false negative for injury]” and “xxx yelled verbal abuse [false negative for abuse type] at her”).


Our text mining system could have missed cases due to more specialized or explicit mentions of abuse types occurring in domestic violence events, since we based our extraction rules on the information contained in only 200 narratives. Despite incorporation of all types of abuse, there are still likely to be cases in which we did not identify explicit types. The relatively smaller number of injury mentions in the evaluation set (when compared to that of the abuse types) could explain the lower performance for the injuries. Nonetheless, we designed our rules based on common syntactical patterns that would attribute abuse types/injury mentions toward POIs and victims, respectively, in order to avoid the generation of false negatives; hence, our recall was higher than the precision in all three datasets. As a consequence, this approach in some instances misidentified the victim’s actions as types of POI’s abuse and POI’s injuries as those of the victim. This suggests that more specifically engineered rules could address this issue. Similarly, although we included the basic and most common forms of injuries, there would be instances containing other causes of injuries or particular abuse types leading to specific injuries that have probably been excluded from our approach. Additionally, the implementation of spell-checking algorithms could assist in the identification of any misspelled abuse types or injuries and potentially elevate performance.
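One simple way to approximate the spell-checking idea mentioned above is fuzzy matching of unknown tokens against the dictionary vocabulary. The sketch below uses `difflib.get_close_matches` from the Python standard library; this is a possible approach rather than anything evaluated in the study, and the vocabulary, the 0.8 cutoff, and the function name `normalize_token` are assumptions chosen for illustration.

```python
import difflib

# Hypothetical slice of a dictionary vocabulary.
ABUSE_TERMS = ["assaulting", "struck", "harassment", "punching", "choking"]

def normalize_token(token, vocabulary=ABUSE_TERMS, cutoff=0.8):
    """Return the closest vocabulary term for a possibly misspelled token, or None."""
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

corrected = normalize_token("assalting")  # misspelling quoted earlier in the paper
unrelated = normalize_token("table")      # no vocabulary term is similar enough
```

Similarity-based correction cannot tell when a correctly spelled word was typed by mistake (such as “stuck” for “struck”), so it would complement, not replace, the manually listed misspellings in the dictionaries.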

Our analysis of the results from the large corpus of domestic violence events is limited to abuse types and victim injuries. We plan to use this information in combination with administrative data collections on mental illness to further examine the nexus between mental illness and domestic violence and to explore the relationship of abuse types with gender and victim injuries. It is pertinent to ask whether domestic violence victims with mental illness are more vulnerable than those without mental illness, and this large-scale dataset spanning 10 years offers an opportunity to generate new intelligence on that question. Further analysis of the results combined with demographic variables could reveal informative aspects of domestic violence, from prevalence and incidence rates in specific cohorts to risk factors for the occurrence (or recurrence) of domestic violence events. Combining the extracted victim injuries with clinical data from health service contacts could assist in the early identification of victim abuse and the implementation of intervention strategies. Modelling will be used to investigate whether POI characteristics can predict the severity of abuse and, similarly, whether certain victim phenotypes are prone to particular types of abuse.


We demonstrated that a knowledge-driven approach can be used for the automated extraction of abuse types and victim injuries involved in domestic violence events. The performance was encouraging, with 90.2% and 85.0% precision for abuse types and injuries, respectively, further indicating that text mining can be used to extract meaningful information from these unstructured data on a large scale. The identified information has enabled us to confirm the magnitude of abuse that victims endure during domestic violence. The results can be used to support further public health research that aims to profile POIs involved in domestic violence events and to inform changes to existing intervention policies for victims of abuse.


The authors would like to thank the NSWPF for their assistance with this project, particularly Dr Chris Devery, Dr Christie Wallace, John Blanchette, Erin Sharland, and Nicole Grant. This research was supported by an Australian Institute of Criminology/Criminology Research Grant (34/15-16).

Conflicts of Interest

None declared.

Multimedia Appendix 1

Brief description of the extracted abuse types.

PDF File (Adobe PDF File), 215KB

Multimedia Appendix 2

Rule examples for recognition of abuse types and victim injuries.

PDF File (Adobe PDF File), 192KB

  1. Howard LM, Trevillion K, Khalifeh H, Woodall A, Agnew-Davies R, Feder G. Domestic violence and severe psychiatric disorders: prevalence and interventions. Psychol Med 2010 Jun;40(6):881-893. [CrossRef] [Medline]
  2. Robinson L, Spilsbury K. Systematic review of the perceptions and experiences of accessing health services by adult victims of domestic violence. Health Soc Care Community 2008 Jan;16(1):16-30. [CrossRef] [Medline]
  3. Trevillion K, Oram S, Feder G, Howard LM. Experiences of domestic violence and mental disorders: a systematic review and meta-analysis. PLoS One 2012;7(12):e51740 [FREE Full text] [CrossRef] [Medline]
  4. Home Office Statistical Bulletin. 2008. Crime in England and Wales 2007/08: Findings from the British Crime Survey and police recorded crime   URL: [accessed 2019-02-22] [WebCite Cache]
  5. Briodi A. Sydney City Council and NSWPF. 2010. Domestic Violence is a crime electronic resource   URL: [accessed 2018-07-11] [WebCite Cache]
  6. Australian Government - Department of Social Services. The National Plan to Reduce Violence against Women and their Children 2010-2022   URL: https:/​/www.​​women/​programs-services/​reducing-violence/​the-national-plan-to-reduce-violence-against-women-and-their-children-2010-2022 [accessed 2019-02-28] [WebCite Cache]
  7. Australian Institute of Health and Welfare. 2018. Family, domestic and sexual violence in Australia   URL: https:/​/www.​​reports/​domestic-violence/​family-domestic-sexual-violence-in-australia-2018/​contents/​summary [accessed 2018-12-07] [WebCite Cache]
  8. Foshee V. Gender differences in adolescent dating abuse prevalence, types and injuries. Health Educ Res 1996;11(3):275-286. [CrossRef]
  9. Kelly J, Johnson M. Differentiation among types of intimate partner violence: Research update and implications for interventions. Family Court Review 2008 Jul;46(3):476-499. [CrossRef]
  10. Capaldi D, Shortt J, Kim H, Wilson J, Crosby L, Tucci S. Official incidents of domestic violence: Types, injury, and associations with nonofficial couple aggression. Violence and Victims 2009;24(4):502. [Medline]
  11. Cleak H, Schofield M, Axelsen L, Bickerdike A. Screening for Partner Violence Among Family Mediation Clients: Differentiating Types of Abuse. J Interpers Violence 2018 Apr;33(7):1118-1146. [CrossRef] [Medline]
  12. KPMG. 2016. The cost of violence against women and their children in Australia   URL: https:/​/www.​​sites/​default/​files/​documents/​08_2016/​the_cost_of_violence_against_women_and_their_children_in_australia_-_summary_report_may_2016.​pdf [accessed 2019-03-01] [WebCite Cache]
  13. Karystianis G, Adily A, Schofield P, Knight L, Galdon C, Greenberg D, et al. Automatic Extraction of Mental Health Disorders From Domestic Violence Police Narratives: Text Mining Study. J Med Internet Res 2018 Sep 13;20(9):e11548. [CrossRef] [Medline]
  14. Macdonald W, Fitzgerald J. NSW Government: Justice – Bureau of Crime Statistics and Research. 2014. Understanding fraud: The nature of fraud offences recorded by NSW Police   URL: [accessed 2019-02-22] [WebCite Cache]
  15. Abbe A, Grouin C, Zweigenbaum P, Falissard B. Text mining applications in psychiatry: a systematic literature review. Int J Methods Psychiatr Res 2016 Dec;25(2):86-100. [CrossRef] [Medline]
  16. Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 2004;11(5):392-402 [FREE Full text] [CrossRef] [Medline]
  17. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc 2010;17(5):507-513 [FREE Full text] [CrossRef] [Medline]
  18. Spasić I, Livsey J, Keane JA, Nenadić G. Text mining of cancer-related information: review of current status and future directions. Int J Med Inform 2014 Sep;83(9):605-623 [FREE Full text] [CrossRef] [Medline]
  19. Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, et al. Clinical information extraction applications: A literature review. J Biomed Inform 2018 Jan;77:34-49 [FREE Full text] [CrossRef] [Medline]
  20. Karystianis G, Dehghan A, Kovacevic A, Keane JA, Nenadic G. Using local lexicalized rules to identify heart disease risk factors in clinical notes. J Biomed Inform 2015 Dec;58 Suppl:S183-S188 [FREE Full text] [CrossRef] [Medline]
  21. Matto G, Mwangoka J. Detecting crime patterns from Swahili newspapers using text mining. IJKEDM 2017;4(2):145-156. [CrossRef]
  22. Nokhbeh Zaeem R, Manoharan M, Yang Y, Barber K. Modeling and analysis of identity threat behaviors through text mining of identity theft stories. Computers & Security 2017 Mar;65:50-63. [CrossRef]
  23. Arulanandam R, Savarimuthu B, Purvis M. Extracting crime information from online newspaper articles. 2014 Jan 20 Presented at: Proceedings of the Second Australasian Web Conference-Volume 155; 2014; Auckland, New Zealand.
  24. Chau M, Xu J, Chen H. Extracting meaningful entities from police narrative reports. 2002 May 19 Presented at: Proceedings of the annual national conference on Digital government research; 2002; Los Angeles, USA.
  25. Ananyan S. AMCIS 2004 Proceedings. 2004. Crime pattern analysis through text mining   URL: [accessed 2019-03-01] [WebCite Cache]
  26. Poelmans J, Elzinga P, Viaene S, Dedene G. Formally analysing the concepts of domestic violence. Expert Systems with Applications 2011 Apr;38(4):3116-3130. [CrossRef]
  27. Ku C, Iriberri A, Leroy G. Crime information extraction from police and witness narrative reports. 2008 Presented at: Technologies for Homeland Security, IEEE Conference on; 2008; Boston, USA.
  28. Iriberri A, Leroy G. Natural language processing and e-government: Extracting reusable crime report information. 2007 Presented at: Information Reuse and Integration, IEEE International Conference; 2007; Las Vegas, USA.
  29. White Ribbon Australia. 2018. Physical Abuse   URL: [WebCite Cache]
  30. Mouzos J, Makkai T. Australian Institute of Criminology. 2004. Women's experiences of male violence: findings from the Australian component of the International Violence Against Women Survey (IVAWS)   URL: [accessed 2019-03-01] [WebCite Cache]
  31. Cunningham H, Tablan V, Roberts A, Bontcheva K. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics. PLoS Comput Biol 2013;9(2):e1002854 [FREE Full text] [CrossRef] [Medline]
  32. Ananiadou S, McNaught J, editors. Text Mining for Biology and Biomedicine. Boston, MA: Artec House Publishers; 2006.
  33. Ananiadou S, Kell DB, Tsujii J. Text mining and its potential applications in systems biology. Trends Biotechnol 2006 Dec;24(12):571-579. [CrossRef] [Medline]
  34. Outlaw M. No One Type of Intimate Partner Abuse: Exploring Physical and Non-Physical Abuse Among Intimate Partners. J Fam Viol 2009 Feb 27;24(4):263-272. [CrossRef]
  35. Muelleman R, Lenaghan P, Pakieser R. Battered women: injury locations and types. Annals of emergency medicine 1996;28(5):486-492. [Medline]
  36. World Health Organization. The Sustainable Development Goals (SDG) and violence prevention: how do they connect? 2018;   URL: https:/​/www.​​violence_injury_prevention/​violence/​7th_milestones_meeting/​Butchart_SDGs_and_violence_prevention.​pdf?ua=1 [accessed 2018-12-07] [WebCite Cache]

ADVO: Apprehended Domestic Violence Order
DV: domestic violence
GATE: General Architecture for Text Engineering
NSWPF: New South Wales Police Force
POI: person of interest
WebCOPS: Web Computerised Operational Policing System

Edited by G Eysenbach; submitted 11.12.18; peer-reviewed by I Spasic, A Davoudi; comments to author 05.01.19; revised version received 31.01.19; accepted 10.02.19; published 12.03.19


©George Karystianis, Armita Adily, Peter W Schofield, David Greenberg, Louisa Jorm, Goran Nenadic, Tony Butler. Originally published in the Journal of Medical Internet Research, 12.03.2019.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.