Multimodal Measurement of Depression Using Deep Learning Models.

MM '17: ACM Multimedia Conference, Mountain View, California, USA, October 2017

Cited by 153 | Views 75
Abstract
This paper addresses multi-modal depression analysis. We propose a multi-modal fusion framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. Our framework considers audio, video, and text streams. For each modality, handcrafted feature descriptors are fed into a DCNN to learn high-level global features with compact dynamic information, and the learned features are then passed to a DNN to predict the PHQ-8 score. For multi-modal fusion, the estimated PHQ-8 scores from the three modalities are integrated by a DNN to obtain the final PHQ-8 score. Moreover, we propose new feature descriptors for text and video. For the text descriptors, we select the participant's answers to questions associated with psychoanalytic aspects of depression, such as sleep disorder, and use the Paragraph Vector (PV) model to learn distributed representations of these sentences. For the video descriptors, we propose a new global descriptor, the Histogram of Displacement Range (HDR), computed directly from facial landmarks to measure their displacement and speed. Experiments were carried out on the AVEC 2017 depression sub-challenge dataset. The results show that the proposed depression recognition framework achieves very promising accuracy, with a root mean square error (RMSE) of 4.653 and a mean absolute error (MAE) of 3.980 on the development set, and an RMSE of 5.974 and an MAE of 5.163 on the test set.
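
As a rough illustration of the score-level fusion step described in the abstract, the sketch below maps the three per-modality PHQ-8 estimates (audio, video, text) to a final score with a small DNN. The PyTorch framing, layer sizes, and module names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FusionDNN(nn.Module):
    """Hypothetical score-level fusion: maps the three per-modality PHQ-8
    estimates to a single final PHQ-8 score. The hidden width is illustrative;
    the abstract does not specify the fusion network's architecture."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden),  # input: [audio, video, text] PHQ-8 estimates
            nn.ReLU(),
            nn.Linear(hidden, 1),  # output: fused PHQ-8 score
        )

    def forward(self, scores):  # scores: (batch, 3)
        return self.net(scores).squeeze(-1)

# Usage: fuse per-modality estimates into one PHQ-8 prediction.
fusion = FusionDNN()
phq8 = fusion(torch.tensor([[7.2, 9.1, 6.5]]))  # hypothetical modality scores
```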
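For the text branch, the abstract says selected answers are embedded with Paragraph Vector. Below is a minimal sketch using gensim's Doc2Vec, a common PV implementation; the example answers, tokenization, and hyperparameters are illustrative assumptions.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical answers to interview questions about depression-related
# topics such as sleep; the real corpus comes from the AVEC 2017 transcripts.
answers = [
    "i have trouble falling asleep most nights",
    "my sleep has been fine lately",
]
docs = [TaggedDocument(words=a.split(), tags=[i]) for i, a in enumerate(answers)]

# Train PV embeddings; vector_size and epochs are assumed values.
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

# Infer a distributed representation for a new answer.
vec = model.infer_vector("i barely sleep at all".split())
```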
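The HDR video descriptor is described only at a high level here. The sketch below shows one plausible reading, assuming landmark tracks arrive as a (T, K, 2) array and that HDR bins per-landmark displacement ranges into a normalized histogram; the bin count, range handling, and function name are all assumptions.

```python
import numpy as np

def hdr_descriptor(landmarks, num_bins=10, range_max=None):
    """Sketch of a Histogram of Displacement Range (HDR) descriptor.

    landmarks: array of shape (T, K, 2) -- T frames, K facial landmarks,
    (x, y) coordinates. Binning choices are assumptions, not the paper's.
    """
    # Per-landmark displacement range over the clip: max - min per coordinate.
    disp_range = landmarks.max(axis=0) - landmarks.min(axis=0)   # (K, 2)
    # Collapse x/y ranges into one magnitude per landmark.
    ranges = np.linalg.norm(disp_range, axis=1)                  # (K,)
    if range_max is None:
        range_max = ranges.max() + 1e-8
    hist, _ = np.histogram(ranges, bins=num_bins, range=(0.0, range_max))
    return hist / max(hist.sum(), 1)  # normalized histogram
```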