Unsupervised Semantic Similarity Estimation using Random Walk on Lexical Substitution Graph

Authors

1 Department of Computer Engineering, Yazd University, Yazd, Iran

2 Senior Researcher at Parsijoo Persian Search Engine

3 Head of Parsijoo Persian Search Engine

Abstract

This paper introduces the indirect substitutability relation for the first time to provide a practical solution for estimating semantic similarity. Proposed method is an unsupervised semantic similarity estimation method, which is benefitted from taking into account the indirect substitutability relation. This method recognizes the substitutability between two terms by considering a third term, which has similar lexical context with each of them separately. To model this relation, we generate a graph using substitutable pairs of terms. The strength of the relation between each pair of terms is approximated by propagating semantic score through the substitutability graph. This method is language independent and uses only textual corpora to generate the substitution graph. Furthermore, it supports semantic similarity estimation in languages suffering from lack of dense corpora. Results of our experiments using RG-65 Persian dataset show that the proposed method outperforms the baseline algorithms. The proposed method improves the estimation from 0.03 Spearman's correlation up to 0.13 in comparison with the baseline algorithms.

Keywords