Kent Academic Repository

Word and Document Embedding with vMF-Mixture Priors on Context Word Vectors

Jameel, Shoaib, Schockaert, Steven (2019) Word and Document Embedding with vMF-Mixture Priors on Context Word Vectors. In: The 57th Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference. ACL. ISBN 978-1-950737-48-2. (KAR id:74583)

Abstract

Word embedding models typically learn two types of vectors: target word vectors and context word vectors. These vectors are normally learned such that they are predictive of some word co-occurrence statistic, but they are otherwise unconstrained. However, the words from a given language can be organized in various natural groupings, such as syntactic word classes (e.g. nouns, adjectives, verbs) and semantic themes (e.g. sports, politics, sentiment). Our hypothesis in this paper is that embedding models can be improved by explicitly imposing a cluster structure on the set of context word vectors. To this end, our model relies on the assumption that context word vectors are drawn from a mixture of von Mises-Fisher (vMF) distributions, where the parameters of this mixture distribution are jointly optimized with the word vectors. We show that this results in word vectors which are qualitatively different from those obtained with existing word embedding models. We furthermore show that our embedding model can also be used to learn high-quality document representations.
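
For readers unfamiliar with the vMF-mixture assumption named in the abstract, the standard textbook form of such a mixture is sketched below. The notation is an assumption made for illustration and is not taken from the paper: \tilde{w} is a unit-length context word vector in d dimensions, K is the number of mixture components, \theta_k are mixture weights, \mu_k are unit-length mean directions, \kappa_k are concentration parameters, and I_v is the modified Bessel function of the first kind of order v. The paper's exact parameterization and training objective may differ.

```latex
% Assumed notation (not from the paper): \tilde{w}, K, \theta_k, \mu_k, \kappa_k.
% Mixture of K von Mises-Fisher components on the unit sphere S^{d-1}:
p(\tilde{w}) = \sum_{k=1}^{K} \theta_k \, \mathrm{vMF}(\tilde{w} \mid \mu_k, \kappa_k),
\qquad \theta_k \ge 0, \quad \sum_{k=1}^{K} \theta_k = 1

% Density of a single vMF component, with \|\tilde{w}\| = \|\mu_k\| = 1 and \kappa_k > 0:
\mathrm{vMF}(\tilde{w} \mid \mu, \kappa) = C_d(\kappa) \, \exp\!\left(\kappa \, \mu^{\top} \tilde{w}\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2} \, I_{d/2 - 1}(\kappa)}
```

Under this form, a larger \kappa_k concentrates a component more tightly around its mean direction \mu_k, which is what makes such a mixture a natural way to express a cluster structure over directions on the sphere.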

Item Type: Conference or workshop item (Proceeding)
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Shoaib Jameel
Date Deposited: 26 Jun 2019 08:38 UTC
Last Modified: 05 Nov 2024 12:37 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/74583 (The current URI for this page, for reference purposes)

University of Kent Author Information

Jameel, Shoaib.
