Sherlock in OSS: A Novel Approach of Content-Based Searching in Object Storage System

arXiv (2023)

Abstract
Object Storage Systems (OSS) in the cloud promise scalability, durability, availability, and concurrency. However, open-source OSS offers no dedicated way for users and administrators to search objects by the data they contain without involving the entire cloud infrastructure. In this paper, we therefore propose Sherlock, a novel Content-Based Searching (CoBS) architecture that extracts additional information from images and documents and stores it in an Elasticsearch-enabled database, so that desired data can be found by its contents. The approach works in two sequential stages. First, uploaded data is passed to a classifier that determines its data type and routes it to the model for that type: images are sent to our trained object detection model, and documents are sent for keyword extraction. Next, the extracted information is sent to Elasticsearch, which enables searching based on the contents. Because the precision of the models is fundamental to the correctness of the search, we train them on comprehensive datasets: the Microsoft COCO dataset for multimedia data and the SemEval2017 dataset for document data. Furthermore, we test the designed architecture with a real-world implementation on an open-source OSS, OpenStack Swift. In addition, we upload images from the dataset in various segments to evaluate the efficacy of our proposed model in real-life Swift object storage.
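The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `classify`, `extract_keywords`, and `ContentIndex` are hypothetical, the MIME-type routing rule is an assumption, the frequency-based keyword extractor merely stands in for a trained model, the object detector is a caller-supplied stub, and an in-memory dictionary stands in for Elasticsearch.

```python
import mimetypes
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from", "into"}


def classify(object_name):
    """Stage 1: choose a data type from the object name (assumed rule)."""
    mime, _ = mimetypes.guess_type(object_name)
    return "image" if mime and mime.startswith("image/") else "document"


def extract_keywords(text, top_k=3):
    """Naive frequency-based stand-in for a trained keyword extraction model."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(top_k)]


class ContentIndex:
    """In-memory stand-in for the Elasticsearch index: term -> object names."""

    def __init__(self, detect_objects):
        # detect_objects is a caller-supplied callable that stands in for
        # the trained object detection model (hypothetical interface).
        self.detect_objects = detect_objects
        self.terms = defaultdict(set)

    def upload(self, object_name, payload):
        # Stage 1: route the payload to the extractor for its data type.
        if classify(object_name) == "image":
            terms = self.detect_objects(payload)
        else:
            terms = extract_keywords(payload.decode("utf-8"))
        # Stage 2: record the extracted terms so the object is searchable.
        for term in terms:
            self.terms[term.lower()].add(object_name)
        return terms

    def search(self, term):
        return sorted(self.terms.get(term.lower(), set()))


# Usage with a stub detector that always reports the same labels.
index = ContentIndex(detect_objects=lambda payload: ["dog", "bicycle"])
index.upload("park.jpg", b"<image bytes>")
index.upload("notes.txt", b"Swift storage notes: storage rings and storage nodes")
print(index.search("dog"))      # ['park.jpg']
print(index.search("storage"))  # ['notes.txt']
```

In a real deployment the stub detector and the dictionary would presumably be replaced by the trained models and an Elasticsearch index; the sketch only shows how classification, extraction, and indexing connect.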
Keywords
searching, storage, OSS, content-based