Multilingual Phrase Sampling for Text Entry Evaluations

Text entry evaluations are typically conducted with English-only phrase sets, which calls into question the validity of the results when participants are non-native English speakers. Automated phrase sampling methods alleviate this problem; however, they are difficult to use in practice and do not take into account language semantics, which is an important attribute to optimize. To address these limitations, we present Kaps, a phrase sampling method that uses the BabelNet multilingual semantic network as a common knowledge resource, with the aim of both standardizing and simplifying the sampling procedure. We analyze our method from several perspectives, namely the effect of sampled phrases on users' foreign language proficiency, phrase set memorability and representativeness, and semantic coverage. We also conduct a large-scale evaluation involving native speakers of 10 different languages. Overall, we show that our method is an important step toward multilingual text entry evaluations and provides unprecedented insight into them.

Software

Phrase sets

The following are the phrase sets we sampled from the OpenSubtitles dataset. Each set has 2000 phrases in lowercase form and without punctuation symbols.
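If you want to compare your own phrases against these sets, it may help to bring them into the same form first. The following is a minimal sketch of one way to do that, not the procedure used to build the sets: the function name `normalize_phrase` is hypothetical, and treating every Unicode punctuation character (categories starting with "P") as removable is an assumption.

```python
import unicodedata


def normalize_phrase(text: str) -> str:
    """Lowercase a phrase and drop punctuation characters.

    Assumption: "punctuation" means any character whose Unicode
    category starts with "P" (e.g. commas, quotes, inverted marks).
    """
    kept = []
    for ch in text.lower():
        # Skip punctuation; keep letters, digits, spaces, and the rest.
        if unicodedata.category(ch).startswith("P"):
            continue
        kept.append(ch)
    # Collapse whitespace runs left behind by removed characters.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    print(normalize_phrase("Hello, World!"))
    print(normalize_phrase("¿Dónde estás?"))
```

Note that this sketch removes apostrophes and hyphens without inserting a space, so "don't" becomes "dont"; adjust that choice if your comparison needs a different convention.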

Citation

If you use our application, API, or any derivation thereof, please cite the following paper:

BibTeX entry:

@Article{Salvador18_kaps,
  author    = {Marc Franco-Salvador and Luis A. Leiva},
  title     = {Multilingual Phrase Sampling for Text Entry Evaluations},
  journal   = {International Journal of Human-Computer Studies},
  volume    = {113},
  number    = {1},
  year      = {2018},
}


(cc) 2026 The authors