Center for Artificial Intelligence and Cybersecurity – AIRI

A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings

18.12.2020

Measuring the semantic similarity of texts plays a vital role in many natural language processing tasks. In this paper, we describe a set of experiments carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts. We compare four models based on word embeddings: two variants of Word2Vec (one trained on a specific dataset, the other extending it with embeddings of word senses), FastText, and TF-IDF. Since these models provide word vectors, we experiment with various methods that calculate the semantic similarity of short texts from word vectors. More precisely, for each of these models, we test five methods for aggregating word embeddings into a text embedding. We introduce three of these methods as variations of two commonly used similarity measures: one is an extension of the centroid-based cosine similarity, and the other two are variations of the Okapi BM25 function. We evaluate all approaches on two publicly available datasets, SICK and Lee, in terms of the Pearson and Spearman correlation. The results indicate that the extended methods perform better than the original ones in most cases.
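To make the centroid-based cosine similarity mentioned in the abstract concrete, here is a minimal sketch of the plain baseline: each text is represented by the average of its word vectors, and two texts are compared by the cosine of the angle between those averages. This is not the extended method the authors introduce, whose details are not given on this page. The toy three-dimensional vectors, the whitespace tokenization, and the function names are all assumptions made for illustration; in practice the vectors would come from a trained model such as Word2Vec or FastText.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# Real vectors would come from a trained embedding model.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sits": [0.1, 0.9, 0.1],
    "runs": [0.2, 0.8, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.95],
}


def centroid(tokens, vectors):
    """Average the vectors of all in-vocabulary tokens.

    Returns None when no token is in the vocabulary.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid_similarity(text_a, text_b, vectors=TOY_VECTORS):
    """Compare two short texts by the cosine of their word-vector centroids.

    Tokenization is a simple lowercase whitespace split (an assumption).
    Returns 0.0 if either text has no in-vocabulary words.
    """
    ca = centroid(text_a.lower().split(), vectors)
    cb = centroid(text_b.lower().split(), vectors)
    if ca is None or cb is None:
        return 0.0
    return cosine(ca, cb)


if __name__ == "__main__":
    print(centroid_similarity("cat sits", "dog runs"))
    print(centroid_similarity("cat sits", "stock market"))
```

With these toy vectors, "cat sits" scores much closer to "dog runs" than to "stock market". Because the centroid weights every word equally, frequent but uninformative words can dominate the average, which is one reason term-weighting schemes such as TF-IDF or BM25 are often combined with embeddings.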

Authors:
Karlo Babić, Francesco Guerra, Sanda Martinčić-Ipšić, Ana Meštrović
Journal:
Journal of Information and Organizational Sciences
Publishing date:
09.12.2020

