Automated Document Indexing via Intelligent Hierarchical Clustering: A Novel Approach
April 01, 2015 ยท Declared Dead ยท ๐ 2014 International Conference on High Performance Computing and Applications (ICHPCA)
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Rajendra Kumar Roul, Shubham Rohan Asthana, Sanjay Kumar Sahay
arXiv ID
1504.00191
Category
cs.IR: Information Retrieval
Citations
2
Venue
2014 International Conference on High Performance Computing and Applications (ICHPCA)
Last Checked
3 months ago
Abstract
With the rising quantity of textual data available in electronic format, the need to organize it become a highly challenging task. In the present paper, we explore a document organization framework that exploits an intelligent hierarchical clustering algorithm to generate an index over a set of documents. The framework has been designed to be scalable and accurate even with large corpora. The advantage of the proposed algorithm lies in the need for minimal inputs, with much of the hierarchy attributes being decided in an automated manner using statistical methods. The use of topic modeling in a pre-processing stage ensures robustness to a range of variations in the input data. For experimental work 20-Newsgroups dataset has been used. The F- measure of the proposed approach has been compared with the traditional K-Means and K-Medoids clustering algorithms. Test results demonstrate the applicability, efficiency and effectiveness of our proposed approach. After extensive experimentation, we conclude that the framework shows promise for further research and specialized commercial applications.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Information Retrieval
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
R.I.P.
๐ป
Ghosted
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
๐
๐
Old Age
Neural Graph Collaborative Filtering
R.I.P.
๐ป
Ghosted
Self-Attentive Sequential Recommendation
R.I.P.
๐ป
Ghosted
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted