TSV files

Used by:
Keyman Developer and Lexical Model compiler.
Description:
A .tsv (tab-separated values) file contains a word list. The predictive text functionality uses this word list to predict and correct words.
Details:
A .tsv file is a plain-text file containing tabular data. Spreadsheet programs such as Microsoft Excel and Google Sheets can export to TSV format. TSV files can also be generated programmatically from other data sources. Advanced users can see File format for more details.
Distributed with lexical model:
No. This is a development file and should not be distributed.

File format

The lexical model compiler expects word lists to abide by the following tab-separated values (TSV) format:

  • the file is a UTF-8 encoded text file
  • newlines are either LF or CRLF
  • the file MAY start with the UTF-8 byte-order mark (BOM); that is, if the first three bytes of the file are EF BB BF, these will be interpreted as the BOM and will be ignored.
  • each line of the file is either a comment or an entry
  • comment lines MUST start with the # character in the very first column
  • entries have one to three columns, separated by the (horizontal) tab character (U+0009)
  • column 1 (REQUIRED): the word form: may contain any character except tab, CR, or LF. Surrounding whitespace characters are trimmed. Quote characters (' or ") are NOT required to surround the text and are NOT parsed in any special manner.
  • column 2 (optional): the count: a non-negative integer specifying how many times this entry has appeared in the corpus. A blank count means ‘indeterminate’: the word is treated as if it exists in the corpus, but is predicted at the lowest possible priority.
  • column 3 (optional): comment: an informative comment, ignored by this tool.
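
The rules above can be sketched as a small reader. This is a minimal illustration in Python, not the lexical model compiler's own code: the name parse_wordlist is hypothetical, and two behaviours are assumptions the format description does not settle, namely that blank lines are skipped and that a malformed count raises an error.

```python
from typing import List, Optional, Tuple

# (word form, count); a count of None stands for a blank, 'indeterminate' count.
Entry = Tuple[str, Optional[int]]

UTF8_BOM = b"\xef\xbb\xbf"


def parse_wordlist(data: bytes) -> List[Entry]:
    """Read word list entries from the raw bytes of a .tsv file."""
    # The file MAY start with the UTF-8 byte-order mark; ignore it if present.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    text = data.decode("utf-8")

    entries: List[Entry] = []
    for line in text.split("\n"):
        # Accept CRLF as well as LF newlines.
        if line.endswith("\r"):
            line = line[:-1]
        # Comments start with '#' in the very first column, so check before trimming.
        if line.startswith("#"):
            continue
        columns = line.split("\t")
        # Column 1: the word form, with surrounding whitespace trimmed.
        word = columns[0].strip()
        if not word:
            continue  # assumption: blank lines carry no entry and are skipped
        # Column 2: optional non-negative integer count.
        count_text = columns[1].strip() if len(columns) > 1 else ""
        if not count_text:
            count = None
        elif count_text.isascii() and count_text.isdigit():
            count = int(count_text)
        else:
            # assumption: this sketch rejects malformed counts outright
            raise ValueError(f"count is not a non-negative integer: {count_text!r}")
        # Column 3, if present, is an informative comment and is ignored.
        entries.append((word, count))
    return entries
```

Calling parse_wordlist(open("wordlist.tsv", "rb").read()) on a file of this shape returns one (word, count) pair per entry line; the file name here is only a placeholder.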

Additional notes

Exporting a spreadsheet from Google Sheets as a TSV will produce properly formatted output.
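
When a word list is generated programmatically rather than exported from a spreadsheet, the same format can be produced directly. The following is a minimal sketch in Python under stated assumptions: format_wordlist is a hypothetical helper, the input is assumed to be a mapping from word to corpus count, and the sort order and header comment are illustrative choices rather than requirements of the format.

```python
from typing import Dict


def format_wordlist(counts: Dict[str, int]) -> str:
    """Render a word-to-count mapping as TSV text with LF newlines."""
    # A comment line: '#' must sit in the very first column.
    lines = ["# word\tcount"]
    # Illustrative ordering: most frequent first, ties broken alphabetically.
    for word, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        word = word.strip()
        if not word or any(ch in word for ch in "\t\r\n"):
            raise ValueError(f"word form is empty or contains tab/CR/LF: {word!r}")
        if word.startswith("#"):
            # Written at the start of a line, this would be read as a comment.
            raise ValueError(f"word form would be read as a comment: {word!r}")
        if count < 0:
            raise ValueError(f"count must be non-negative: {count}")
        lines.append(f"{word}\t{count}")
    return "\n".join(lines) + "\n"
```

The returned string can then be saved as UTF-8, for example by writing format_wordlist(counts).encode("utf-8") to a file opened in binary mode, which keeps the LF newlines unchanged on every platform.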