dictpress - Build and publish dictionaries for any language

Example dictionaries:

Alar — Kannada-English dictionary.
Olam — English-Malayalam, Malayalam-Malayalam dictionary.

Screenshot of Olam (English-Malayalam) dictionary

Features

Build dictionaries from any language to any language.
Supports multiple dictionaries and languages in the same database.
Custom themes and templates for publishing dictionary websites.
Paginated A-Z glossaries covering the full alphabet of any language.
HTTP/JSON API for search and all other operations.
Pluggable search tokenizers (Lua scripts) and algorithms for full-text search, phonetic search, etc.
Admin UI for managing and curating dictionary data.
Admin moderation UI for crowdsourcing dictionary entries.
Bulk import of dictionary data from CSV into the database.

How it works

DictPress is language-agnostic and has no concept of language semantics. It stores all data in an SQLite database file in just two tables, entries and relations. To make a universal dictionary interface possible, it treats every dictionary entry as a UTF-8 string and searches it with SQLite's full-text capabilities using tokens stored alongside it. These tokens, which are what make the entries searchable, can be anything: simple stemmed words or phonetic hashes such as Metaphone keys.

DictPress bundles a library of Snowball stemming algorithms that supports Arabic, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, and Turkish.
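The effect of stemming on search can be illustrated with a deliberately crude suffix stripper. This is a minimal Go sketch and not the Snowball algorithm, which applies ordered, language-specific rule sets; the `naiveStem` and `stemTokens` helpers are hypothetical. It only shows why indexing stems instead of surface forms lets a query for one inflection find the others.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveStem strips a few common English suffixes. It is NOT Snowball;
// it exists only to show how inflected forms can collapse to one token.
func naiveStem(word string) string {
	w := strings.ToLower(word)
	for _, suf := range []string{"ing", "ed", "es", "s"} {
		// Require at least three bytes of stem so short words survive.
		if strings.HasSuffix(w, suf) && len(w)-len(suf) >= 3 {
			return strings.TrimSuffix(w, suf)
		}
	}
	return w
}

// stemTokens splits text on whitespace and stems every word, producing the
// kind of token list that would be stored alongside an entry.
func stemTokens(text string) []string {
	var out []string
	for _, f := range strings.Fields(text) {
		out = append(out, naiveStem(f))
	}
	return out
}

func main() {
	fmt.Println(stemTokens("Walking walked walks")) // [walk walk walk]
}
```

A query for "walked" reduces to the same `walk` token as an entry containing "walking", so the two match. The toy gets many cases wrong: "horses" becomes `hors` while "horse" stays `horse`, so those two no longer match, which is exactly the kind of case a proper stemming library is built to handle.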

For languages without built-in full-text tokenisation, search tokens can be generated externally with any algorithm and plugged in. For example, Olam uses MLPhone, a simple Metaphone-like phonetic hashing algorithm that lets Malayalam words in the dictionary be searched by how they sound.
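Phonetic search works the same way, except that the token is a sound-based key. MLPhone itself is not reproduced here; as a stand-in, this minimal Go sketch implements classic American Soundex, a much older and simpler phonetic code designed for English names. The `soundex` function is illustrative and not part of DictPress. Words that sound alike receive the same key, so indexing the key lets a query match spellings it has never seen.

```go
package main

import "fmt"

// soundexCodes maps consonants to their American Soundex digit. Vowels (and y)
// are absent, which marks them as separators; h and w are handled separately.
var soundexCodes = map[rune]byte{
	'b': '1', 'f': '1', 'p': '1', 'v': '1',
	'c': '2', 'g': '2', 'j': '2', 'k': '2', 'q': '2', 's': '2', 'x': '2', 'z': '2',
	'd': '3', 't': '3',
	'l': '4',
	'm': '5', 'n': '5',
	'r': '6',
}

// soundex returns the 4-character American Soundex key of an ASCII word,
// or "" if the word contains no letters.
func soundex(word string) string {
	var out []byte
	var last byte // digit of the previous coded letter; 0 after a vowel
	for _, r := range word {
		if r >= 'A' && r <= 'Z' {
			r += 'a' - 'A'
		}
		if r < 'a' || r > 'z' {
			continue // ignore non-letters
		}
		code, isConsonant := soundexCodes[r]
		switch {
		case len(out) == 0:
			// Keep the first letter; its digit still suppresses an immediate repeat.
			out = append(out, byte(r-'a'+'A'))
			last = code
		case r == 'h' || r == 'w':
			// h and w do not separate letters that share a digit.
		case !isConsonant:
			last = 0 // a vowel separates letters, so a repeated digit is coded again
		default:
			if code != last {
				out = append(out, code)
				if len(out) == 4 {
					return string(out)
				}
			}
			last = code
		}
	}
	if len(out) == 0 {
		return ""
	}
	for len(out) < 4 {
		out = append(out, '0') // pad short keys with zeros
	}
	return string(out)
}

func main() {
	fmt.Println(soundex("Robert"), soundex("Rupert")) // R163 R163
}
```

In a dictionary, the key would be stored as the entry's search token and computed again for each query, so a search for "Rupert" finds "Robert". Soundex is tuned for English names; for Malayalam, Olam relies on MLPhone instead.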

See this article for historical context on the project.

Getting started

Download the latest version of DictPress.
Read the docs to install the app and to import dictionary data.

Admin UI
