Classification of Legal Documents in Portuguese Language Based on Summarization

2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI) (2022)

Abstract
Legal document classification in Portuguese is a research area that benefits greatly from computational intelligence techniques, given the availability of better processing power and the ease of digitally recording the texts of judicial proceedings. Different techniques have been explored to achieve reliable results under real-world conditions; however, the most suitable configuration of methods remains an open problem. This study proposes a model consisting of four stages: preprocessing, extractive summarization using the PageRank algorithm, feature extraction with bag-of-words, and classification with a Support Vector Classifier (SVC). Three versions of the model were tested for comparison and evaluation: the first was a basic classifier with neither the preprocessing nor the summarization stage, the second included preprocessing but not summarization, and the third implemented the complete proposed model. All three were evaluated on a separate set of examples labeled with six categories, and their performance was measured by weighted-average precision, recall, F1-score, and accuracy. The proposed model performed best, with precision, recall, and F1-score of 96% each, a 2% improvement on all three metrics over the first version and a 1% improvement in precision and recall over the second version. Particularly on F1-score, which reflects the balance between precision and recall, the complete model outperformed the versions lacking some of its stages, suggesting that preprocessing and extractive summarization have a positive impact on the classification of legal documents written in Portuguese.
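The first three stages of the model can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the tiny stopword list, the naive sentence splitter, the TextRank-style overlap similarity, the damping factor of 0.85, and the helper names (`preprocess`, `split_sentences`, `similarity`, `pagerank`, `summarize`, `bag_of_words`) are all hypothetical choices made for illustration. Sentences become nodes of a graph weighted by word overlap, PageRank scores the nodes, and the top-ranked sentences, kept in document order, form the summary that is then turned into bag-of-words count vectors.

```python
import math
import re
import unicodedata

# Hypothetical, deliberately tiny Portuguese stopword list (assumption: the
# paper does not say which stopwords or normalization steps it uses).
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "das", "dos", "e", "em",
             "no", "na", "que", "um", "uma", "para", "por", "com", "se"}


def preprocess(text):
    """Lowercase, strip accents, keep alphabetic tokens, drop stopwords."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return [tok for tok in re.findall(r"[a-z]+", text) if tok not in STOPWORDS]


def split_sentences(text):
    """Naive splitter on '.', '!' or '?' followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """TextRank-style overlap: shared words over the sum of log lengths."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences have a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a similarity matrix.

    With a symmetric matrix, an isolated sentence (no edges at all) only
    receives the teleport term (1 - damping) / n.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Extractive summary: the k top-ranked sentences, in original order."""
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    tokens = [preprocess(s) for s in sentences]
    n = len(sentences)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def bag_of_words(docs):
    """Bag-of-words: shared sorted vocabulary plus one count vector per doc."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok in toks:
            vec[index[tok]] += 1
        vectors.append(vec)
    return vocab, vectors


if __name__ == "__main__":
    text = ("O autor ajuizou ação de cobrança contra o réu. "
            "O réu não pagou a dívida reconhecida em contrato. "
            "A audiência foi marcada para março. "
            "O autor requer o pagamento da dívida com correção.")
    summary = summarize(text, k=2)
    vocab, vectors = bag_of_words([" ".join(summary)])
    print(summary)
    print(dict(zip(vocab, vectors[0])))
```

The count vectors would then feed the SVC stage. A library classifier such as scikit-learn's `sklearn.svm.SVC` (with `sklearn.feature_extraction.text.CountVectorizer` replacing the hand-rolled `bag_of_words`) is one common option, but the abstract does not state which implementation the authors used.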
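The evaluation reports support-weighted averages over the six categories. As a hedged illustration (the function name `weighted_scores` is hypothetical, and scoring a class with no predictions as 0 is an assumption), each class's precision, recall, and F1 are computed one-vs-rest and then averaged with weights proportional to the number of true examples in that class.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1, plus overall accuracy.

    Each per-class score is weighted by that class's support (its number of
    true instances). Undefined ratios (0/0) are scored as 0 by assumption.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)     # true instances per class
    predicted = Counter(y_pred)   # predicted instances per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for c in labels:
        p = tp[c] / predicted[c] if predicted[c] else 0.0
        r = tp[c] / support[c] if support[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        w = support[c] / n        # weight = share of true examples in class c
        precision += w * p
        recall += w * r
        f1 += w * f
    accuracy = sum(tp.values()) / n
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    print(weighted_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

One consequence of this weighting is that weighted-average recall always equals overall accuracy, because the sum over classes of (n_c / N)(TP_c / n_c) reduces to the sum of TP_c divided by N. Weighted F1, by contrast, is generally not the harmonic mean of weighted precision and weighted recall, since F1 is averaged per class.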
Keywords
Text Classification, Summarization, Bag of Words