O-114 Natural language processing as a tool for developing and updating job exposure matrices for chemical exposures in the general population

Abstracts (2023)

Abstract

Workplaces are dynamic environments in which conditions and exposures frequently change over time. Such changes are rarely captured by existing Job Exposure Matrices (JEMs), which are typically developed using the information available at a certain point in time. As a result, JEMs cannot account for later changes, which may reduce their reliability when they are used outside their development period. Moreover, developing JEMs for new or emerging exposure factors is laborious and time-consuming. Within the Exposome Project for Health and Occupational Research (EPHOR; https://www.ephor-project.eu/), we have been exploring the use of Natural Language Processing (NLP) as a vehicle for streamlining the update of existing JEMs and the development of new JEMs. Specifically, we will develop named entity recognition (NER) tools to automatically detect mentions of exposure-related concepts in the literature, thus increasing the efficiency of locating relevant information for JEM update and development.

Accordingly, we have developed a novel annotated corpus of 50 literature articles concerning workplace exposure to diesel exhaust, in which exposure assessment experts followed guidelines to annotate all mentions of six named entity categories (substance, occupation, industry/workplace, job task/activity, measurement device and sample type) occurring in the abstract, methods and results sections. The corpus will be used to train machine learning NER algorithms. Each article was annotated independently by two experts, and Inter-Annotator Agreement (IAA) scores were calculated to assess annotation quality. Exact matching scores (requiring agreement on both semantic category and exact annotation span) ranged from 0.38 to 0.79 F1 for individual categories (average: 0.56). Relaxed matching scores (requiring agreement on category and at least partially overlapping spans) ranged from 0.63 to 0.87 F1 (average: 0.72).
These results suggest that annotation quality is sufficient for machine learning. We will present the annotation scheme, guidelines and preliminary analysis of the results.
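
The difference between exact and relaxed matching can be illustrated with a minimal Python sketch. The abstract does not describe how the F1 scores were computed, so everything here is an assumption made for illustration: the `Span` type, the character offsets, the choice to treat the first annotator as the reference, and the function names are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """One annotated mention (hypothetical representation)."""
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # named entity category, e.g. "substance"


def exact_match(a: Span, b: Span) -> bool:
    # Exact matching: same category and identical span boundaries.
    return a == b


def relaxed_match(a: Span, b: Span) -> bool:
    # Relaxed matching: same category and spans that overlap at least partially.
    return a.label == b.label and a.start < b.end and b.start < a.end


def pairwise_f1(ann_a: List[Span], ann_b: List[Span],
                match: Callable[[Span, Span], bool]) -> float:
    """F1 between two annotators, treating ann_a as the reference (an assumption)."""
    if not ann_a or not ann_b:
        return 0.0
    # Recall: fraction of annotator A's spans that B also marked.
    recall = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a)
    # Precision: fraction of annotator B's spans that A also marked.
    precision = sum(any(match(b, a) for a in ann_a) for b in ann_b) / len(ann_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented toy annotations for one document (not data from the corpus).
ann_a = [Span(0, 14, "substance"), Span(20, 26, "occupation"),
         Span(40, 52, "sample type")]
ann_b = [Span(0, 14, "substance"), Span(20, 32, "occupation"),
         Span(60, 70, "industry/workplace")]

exact_f1 = pairwise_f1(ann_a, ann_b, exact_match)
relaxed_f1 = pairwise_f1(ann_a, ann_b, relaxed_match)
print(f"exact F1 = {exact_f1:.2f}, relaxed F1 = {relaxed_f1:.2f}")
```

In this toy case the two annotators agree on the category of the "occupation" mention but disagree on where it ends, so it counts as a match only under the relaxed criterion. This is the kind of boundary disagreement that would make relaxed scores higher than exact scores.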
Key words
job exposure matrices, chemical exposures, natural language processing, natural language