DictPress is language agnostic and has no concept of language semantics.
It stores all data in a Postgres database in just two tables.
To make a universal dictionary interface possible, it treats all dictionary entries as UTF-8 strings that can be accurately searched with Postgres's fulltext capabilities by storing tsvector tokens alongside them. The tokens that encode the entries and make them searchable can be anything: simple stemmed words or phonetic hashes like Metaphone.
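The idea of storing opaque strings with searchable tokens next to them can be sketched in memory. This is a minimal illustration, not DictPress code. `EntryStore`, `Entry` and `naive_stem_tokens` are hypothetical names. The subset match only loosely imitates an AND-ed tsquery against a tsvector. The point is that the store never interprets the entry text. It only compares tokens produced by a pluggable tokenizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

# A tokenizer maps a UTF-8 string to a set of search tokens. It could be a
# stemmer, a phonetic hash or anything else; the store does not care.
Tokenizer = Callable[[str], Set[str]]


def naive_stem_tokens(text: str) -> Set[str]:
    """Hypothetical toy tokenizer: lowercase words, strip a trailing 's'."""
    tokens = set()
    for word in text.lower().split():
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        tokens.add(word)
    return tokens


@dataclass
class Entry:
    content: str  # the entry itself, kept as an opaque string
    tokens: Set[str] = field(default_factory=set)  # stored alongside, like a tsvector


class EntryStore:
    """Hypothetical in-memory stand-in for an entries table with a token column."""

    def __init__(self, tokenizer: Tokenizer):
        self.tokenizer = tokenizer
        self.entries: List[Entry] = []

    def add(self, content: str) -> None:
        # Tokens are computed once at insert time and stored with the entry.
        self.entries.append(Entry(content, self.tokenizer(content)))

    def search(self, query: str) -> List[str]:
        # An entry matches if every query token is among its stored tokens.
        q = self.tokenizer(query)
        return [e.content for e in self.entries if q and q <= e.tokens]
```

For example, after `store = EntryStore(naive_stem_tokens)`, `store.add("apples")` and `store.add("orange")`, the call `store.search("apple")` returns `["apples"]`. Changing the tokenizer passed to the constructor changes how entries are matched, and nothing else in the store has to change.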
Postgres comes with built-in tokenizers for two dozen languages (run \dFd in psql to see the full list).
For languages that do not have Postgres fulltext tokenisation, search tokens can be generated externally and plugged in. For example, Olam uses MLPhone, a simple Metaphone-like phonetic hashing algorithm that lets Malayalam words in the dictionary be searched by how they sound.
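The sketch below shows how externally generated phonetic tokens can work. It is a toy illustration only. The rules in `_RULES` are hypothetical and heavily simplified English rules. They are not MLPhone, not Metaphone, and not anything DictPress ships. `phonetic_key` and `build_index` are illustrative names. The principle is that spellings which sound alike collapse to the same key, so the key can serve as a search token.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical, simplified phonetic rules for English, applied in order.
# NOT MLPhone or Metaphone; they only illustrate hashing words by sound.
_RULES = [
    (r"ph", "f"),
    (r"ck", "k"),
    (r"c(?=[eiy])", "s"),  # soft c
    (r"c", "k"),           # remaining (hard) c
    (r"q", "k"),
    (r"z", "s"),
    (r"x", "ks"),
]


def phonetic_key(word: str) -> str:
    """Reduce a word to a rough sound-based key (toy rules)."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return ""
    for pattern, repl in _RULES:
        w = re.sub(pattern, repl, w)
    # Keep the first letter; drop later vowels and h/w/y, which vary most
    # between spellings of the same sound.
    key = w[0] + re.sub(r"[aeiouhwy]", "", w[1:])
    # Collapse runs of the same letter ("ll" -> "l").
    return re.sub(r"(.)\1+", r"\1", key)


def build_index(words: List[str]) -> Dict[str, List[str]]:
    """Group words by phonetic key, i.e. use the key as a search token."""
    index: Dict[str, List[str]] = defaultdict(list)
    for w in words:
        index[phonetic_key(w)].append(w)
    return dict(index)
```

For example, "Philip", "Phillip" and "Filip" all hash to `"flp"`. A query for "Fillip" therefore finds all three through `build_index(...)[phonetic_key("Fillip")]`. In a real setup the keys would be stored as the tokens alongside each entry, as described above. That plumbing is not shown here.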
See this article for historical context on the project.