Skip to content

dictpress comes with a built in CSV to database importer tool. Once dictionary data has been organised into the below described structure, import it by running ./dictpress import --file=yourfile.csv.

Entries with the same content in the same language are not inserted into the database multiple times, but are instead re-used. For instance, if there are multiple Apple (English) entries, it is inserted once but re-used in multiple relations.

Sample CSV format

-,A,Apple,english,Optional note,default:english,"",optional-tag1|tag2,"ˈæp.əl|aapl","","{""etym"": ""ml""}"
^,"","round, red or yellow, edible fruit of a small tree",english,"","","","","",noun,""
^,"","the tree, cultivated in most temperate regions.",english,"","","","","",noun,""
^,"","il pomo.",italian,"","","","","",sost,""
-,A,Application,english,Optional note,default:english,"","","aplɪˈkeɪʃ(ə)n","",""
^,"","the act of putting to a special use or purpose",english,"","","","","",noun,""
^,"","le applicazione",italian,"","","","","",sost,""

Every line in the CSV file contains an entry in a given language described in 10 columns. Each entry is either a main entry in the dictionary, or a definition of another entry. This is indicated by the first column in each line. - represents a main entry and all subsequent entries below it marked with ^ represents its definitions in one or more languages.

The above example shows two main English entries, "Apple" and "Application" with multiple English and Italian definitions below them.

CSV fields

Column Field
0 type - represents a main entry. ^ under it represents a definition entry.
1 initial The uppercase first character of the entry. Eg: A for Apple. If left empty, it is automatically picked up.
2 content The entry content (word or phrase).
3 language Language of the entry (as defined in the config). Eg: english
4 notes Optional notes describing the entry.
5 tokenizer Empty OR default:$language (eg: default:english) from the list of supported languages OR a custom Lua tokenizer script that's loaded into dictpress eg: lua:indicphone_ml.lua. To supply custom, externally computed tokens, leave this empty and specify the tokens in the next field.
6 tokens If not specifying a tokenizer above, space-separated fulltext search tokens.`.
7 tags Optional tags describing the entry. Separate multiple tags by the pipe character |
8 phones Optional phonetic notations representing the pronunciations of the entry. Separate multiple phones by the pipe character |
9 definition-types This should only be set for definition entries that ar marked with Type = ^. One or more parts-of-speech types separated by the pipe character | . Examplenoun:verb`.
10 meta Optional JSON metadata. Quotes inside JSON are escaped by doubling them. Eg: {"etym": "ml"} => {""etym"": ""ml""}

Importing with SQL

Generating SQL for dictionary data and loading that directly into the database can give fine grained control The following is the SQL equivalent of the above CSV. The SQLite database tables schemas are described here.

-- Insert head words apple, application (id=1, 2)
INSERT INTO entries (lang, content, initial, tokens, phones) VALUES
    ('english', 'Apple', 'A', 'appl', '["/ˈæp.əl/", "aapl"]'),
    ('english', 'Application', 'A', 'applicat', '["/aplɪˈkeɪʃ(ə)n/"]');

-- Insert English definitions for apple. (id=3, 4, 5)
INSERT INTO entries (lang, content) VALUES
    ('english', 'round, red or yellow, edible fruit of a small tree'),
    ('english', 'the tree, cultivated in most temperate regions.'),
    ('english', 'anything resembling an apple in size and shape, as a ball, especially a baseball.');
-- Insert English apple-definition relationships.
INSERT INTO relations (from_id, to_id, types, weight) VALUES
    (1, 3, '["noun"]', 0),
    (1, 4, '["noun"]', 1),
    (1, 5, '["noun"]', 2);

-- Insert Italian definitions for apple. (id=6, 7)
INSERT INTO entries (lang, content) VALUES
    ('italian', 'mela'),
    ('italian', 'il pomo.');
-- Insert Italian apple-definition relationships.
INSERT INTO relations (from_id, to_id, types, weight) VALUES
    (1, 6, '["noun"]', 0),
    (1, 7, '["noun"]', 1);


--
-- Insert English definitions for application. (id=8, 9)
INSERT INTO entries (lang, content) VALUES
    ('english', 'the act of putting to a special use or purpose'),
    ('english', 'the act of requesting.');
-- Insert English application-definition relationships.
INSERT INTO relations (from_id, to_id, types, weight) VALUES
    (2, 3, '["noun"]', 8),
    (2, 4, '["noun"]', 9);

-- Insert Italian definitions for application. (id=10, 11, 12)
INSERT INTO entries (lang, content) VALUES
    ('italian', 'le applicazione'),
    ('italian', 'la domanda'),
    ('italian', 'la richiesta');
-- Insert Italian application-definition relationships.
INSERT INTO relations (from_id, to_id, types, weight) VALUES
    (2, 10, '["noun"]', 0),
    (2, 11, '["noun"]', 1),
    (2, 12, '["noun"]', 1);