Semantic N-gram feature analysis and machine learning–based classification of drivers' hazardous actions at signal-controlled intersections

KM Kwayu, V Kwigizile, J Zhang, JS Oh
Journal of Computing in Civil Engineering, 2020 - ascelibrary.org
Abstract
In the United States, it is common for crash reports to include a narrative that contains a police officer’s written summary of the crash. The crash narratives provide valuable information that can assist in understanding the circumstances surrounding a crash at a given roadway location. However, the crash report narratives contain unstructured textual information, which is hard to extract or use in analyses, considering that there are hundreds of thousands of reports. This study uses Michigan’s crash reports (UD-10) to demonstrate how natural language processing (NLP) techniques can be useful in extracting information from the UD-10 crash report narratives to better understand crash scenarios. Reports of crashes at signal-controlled intersections in Michigan involving responsible (i.e., at-fault) drivers who were issued a “fail to yield” or “disregard traffic control” hazardous action citation were used in the analysis. Semantic analysis was conducted to discern the most likely crash scenario at signal-controlled intersections for each hazardous action with respect to the responsible driver’s movement. Support vector machines and boosted classification trees were developed using unigram and bigram features with different n-gram feature deployment scenarios to predict hazardous action citations. Support vector machines using a mixture of unigram and bigram features performed better than the boosted classification tree, with an out-of-sample predictive accuracy of 86.1 percent and an area under the receiver operating characteristic (ROC) curve of 0.917. Overall, the results can help safety engineers and analysts to ascertain the causes of a crash by detailing the chain of precrash events leading to a crash.
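
As an illustration of the unigram and bigram features mentioned in the abstract, the following minimal Python sketch counts n-grams in a single narrative. It is not the authors' pipeline: the tokenizer, the function names (`tokenize`, `ngrams`, `ngram_features`), and the sample narrative are all assumptions invented for this example, and the three feature sets at the end are only one plausible reading of "n-gram feature deployment scenarios".

```python
from collections import Counter


def tokenize(narrative):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    A deliberately simple tokenizer (an assumption of this sketch); the
    abstract does not describe the preprocessing actually used.
    """
    stripped = (word.strip(".,;:!?\"'()") for word in narrative.lower().split())
    return [token for token in stripped if token]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(narrative, orders=(1, 2)):
    """Count the n-grams of each requested order in one narrative."""
    tokens = tokenize(narrative)
    counts = Counter()
    for n in orders:
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical narrative, invented for illustration (not taken from UD-10 data).
narrative = "Driver 1 turned left on a green light and failed to yield to Driver 2."

unigram_only = ngram_features(narrative, orders=(1,))
bigram_only = ngram_features(narrative, orders=(2,))
mixed = ngram_features(narrative, orders=(1, 2))  # unigrams and bigrams together
```

Counts like these would typically be assembled into a document-term matrix, one row per narrative, before being passed to a classifier such as a support vector machine. A bigram such as "failed to" keeps word-order context that the separate unigrams "failed" and "to" lose, which is one reason a mixture of both orders can be more informative than either alone.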