ISO/DIS 1951
ISO/DIS 1951: Presentation of Lexicographic Entries in General Language Dictionaries – Fundamentals and Recommendations

ISO/DIS 1951:2025(en)

ISO/TC 37/SC 2

Secretariat: SCC

Date: 2025-03-28

Presentation of lexicographic entries in general language dictionaries – Fundamentals and recommendations

© ISO 2025

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.

ISO copyright office

CP 401 • Ch. de Blandonnet 8

CH-1214 Vernier, Geneva

Phone: +41 22 749 01 11

Email: copyright@iso.org

Website: www.iso.org

Published in Switzerland

Contents

Foreword

Introduction

1 Scope

2 Normative references

3 Terms and definitions

4 An overview of lexicographic components

5 Typographical conventions in printed and digital dictionaries

Annex A (informative) Structure of a lexicographic entry

Annex B (informative) Lexicographic symbols in printed and digital dictionaries

B.1 General information

B.2 Sources

B.3 Lexicographic symbols

Annex C (informative) Dictionary examples applying LMF modelling mechanisms

C.1 General information

C.2 Sources

Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

ISO draws attention to the possibility that the implementation of this document may involve the use of (a) patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a) patent(s) which may be required to implement this document. However, implementers are cautioned that this may not represent the latest information, which may be obtained from the patent database available at www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 2, Terminology workflow and language coding.

This fourth edition cancels and replaces the third edition (ISO 1951:2007), which has been technically revised.

The main changes are as follows:

— extending the scope;

— reviewing the entire content;

— changing the title, retaining the term ‘presentation’ because it is a fundamental aspect of this standard, while removing the term ‘representation’, since the representation of lexicographic data is now covered by the ISO 24613 series (available on the ISO website);

— describing the relationship between the generic structure and the presentation of lexicographic entries, using the TEI serialization of LMF (Lexical Markup Framework) and adopting the TEI tagset as the reference for implementing the proposed model;

— reviewing and updating core lexicographic terms to align with the current state of the field, as well as introducing new terms.

Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html.

Introduction

The lexicographic landscape has undergone a profound transformation over the last few decades, primarily due to the definitive shift to digital platforms. Technological advances have played a pivotal role in shaping new strategies and directions: a significant number of lexicographic resources are currently accessible online, largely due to retro-digitization; the limitations imposed by print editions are no longer a concern; the integration of corpora has evolved into a widely recognized best practice; various dictionary writing systems have been developed to accommodate the changing landscape; and annotation schemes have markedly improved. In this digital age, the ongoing revolution demands the application of adapted standards and tools to ensure the availability of structured data and promote interoperability between systems, especially given the inherent heterogeneity in the dictionary-making process due to variations in nature, form, and content.

This revised document arose from the work of working group ISO/TC 37/SC 2/WG 9 (Subcommittee SC 2: Terminology workflow and language coding). It is aligned with the International Standards ISO 24613-1:2024, ISO 24613-2:2020, ISO 24613-3:2021 and ISO 24613-4:2021, developed by working group ISO/TC 37/SC 4/WG 4, which model the representation of data in a variety of dictionary subtypes.

The intended audience for this document includes lexicographers as well as researchers and practitioners in the field of language resource management who work with lexicographic resources.

This document adopts a lexicographic lemma-oriented approach and focuses on general language dictionaries, whether monolingual, bilingual, or multilingual, which serve as valuable tools and references for broadening knowledge. With regard to the representation of lexicographic data, the relationship between the generic structure and the presentation of lexicographic entries is described using the TEI serialization of LMF, with the TEI tagset as the reference for implementing the proposed model.

In establishing a model for the presentation of lexicographic entries in general language dictionaries, this document aims to: 1) provide recommendations for dealing with the variety of heterogeneous features and practices found in human-readable dictionaries, whether in print or digital format; 2) standardize the core concepts related to the presentation of the various components of a lexicographic entry, since uniform terminology promotes consistency and data reusability; 3) reproduce the typographical conventions described in previous editions of ISO 1951.

This document includes examples from printed and retro-digitized dictionaries, i.e. dictionaries converted from an analogue (paper) or digital (e.g., PDF) medium into a computer-readable format. Born-digital dictionaries, created directly in machine-readable formats, are excluded.

In the running text of this document, the following notations are employed:

— terms designating concepts defined in this document are in italics;

— TEI P5 terms (element names, attribute names, attribute values, etc.) are presented in a fixed-width (monospace) font, as follows:

  — individual element names are enclosed in angle brackets, e.g., <entry>;

  — names of nested elements are represented in XPath notation, e.g., cit/quote/bibl;

  — attribute names are indicated by the @ sign preceding the name of the attribute, e.g., @type;

  — attribute values are enclosed in double quotation marks (" "), e.g., "domain".
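These notations can be illustrated with the xml.etree.ElementTree module of the Python standard library, which supports a limited subset of XPath. The following sketch parses a simplified fragment built from the porte-arquebuse example in Table 1, following the citation pattern <cit><quote/><bibl/></cit> of Table A.1, and retrieves components by element name, by nested path and by a test on @type. It is an illustration under stated assumptions, not a normative encoding: the TEI namespace (http://www.tei-c.org/ns/1.0) is omitted for brevity, and the nesting of the fragment is assumed.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment built from the porte-arquebuse example
# in Table 1 (an assumption for brevity; real TEI declares its namespace).
ENTRY_XML = """<entry>
  <form type="lemma"><orth>porte-arquebuse</orth></form>
  <sense>
    <def>Officier chargé de porter l'arquebuse (puis le fusil) du roi ou d'un
      grand seigneur, quand ils allaient à la chasse.</def>
    <cit>
      <quote>Vous n'avez pas ici votre bon porte-arquebuse La Hurière</quote>
      <bibl>DUMAS père, Reine Margot, 1847</bibl>
    </cit>
  </sense>
</entry>"""

entry = ET.fromstring(ENTRY_XML)

# <entry>: an individual element name.
print(entry.tag)

# form[@type="lemma"]/orth: nested elements with a test on @type.
lemma = entry.findtext("form[@type='lemma']/orth")

# sense/cit/quote and sense/cit/bibl: nested element names.
quote = entry.findtext("sense/cit/quote")
source = entry.findtext("sense/cit/bibl")

print(lemma)
print(f"{quote} ({source})")
```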

Presentation of lexicographic entries in general language dictionaries

1 Scope

This document specifies the presentation of lexicographic entries in general language dictionaries, whether monolingual, bilingual or multilingual, following a lexicographic lemma-oriented approach and intended for human end-users. For the modelling of the underlying data, this document follows the ISO 24613 series.

This document provides recommendations for dealing with the heterogeneous structures of data presentation in lexicographic entries, in both print and digital dictionaries. It also establishes core concepts related to the broader scope of lexicographic work.

2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 639 (all parts), Code for individual languages and language groups

ISO 1087, Terminology work and terminology science — Vocabulary

ISO 24613‑1, Language resource management — Lexical markup framework (LMF) — Part 1: Core model

ISO 24613‑2, Language resource management — Lexical markup framework (LMF) — Part 2: Machine-readable dictionary (MRD) model

ISO 24613‑3, Language resource management — Lexical markup framework (LMF) — Part 3: Etymological extension

ISO 24613‑4, Language resource management — Lexical markup framework (LMF) — Part 4: TEI serialization

ISO 21636‑1:2024, Language coding — A framework for language varieties — Part 1: Vocabulary

IETF BCP 47, Tags for Identifying Languages. (ed. A. Phillips; M. Davis). September 2009. Best Current Practice. URL: https://tools.ietf.org/html/bcp47

TEI P5, Guidelines for Electronic Text Encoding and Interchange. [Version number: 4.6.0]. [Last modified date: 2023-04-04]. TEI Consortium. http://www.tei-c.org/Guidelines/P5/

3 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminology databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp

— IEC Electropedia: available at https://www.electropedia.org/

NOTE Terms and corresponding definitions related to lexicographic components and sub-components are listed in Table 1.

3.1

delimiter

separator

element used to separate different components of a lexicographic entry (3.11) or distinct entries within a dictionary (3.2)

Note 1 to entry: Delimiters help to organize information, making it easier for end-users to locate and understand the various components of a lexicographic entry.

EXAMPLE: The lemma delimiter used after a lemma (3.8), and the sense delimiter positioned before a new sense.

3.2

dictionary

<language resource management> lexicographic resource (3.13) that contains a structured collection of lexicographic entries (3.11)

Note 1 to entry: The term ‘dictionary’ can have a much broader meaning; the definition given here is restricted to the scope of this document.

3.3

machine-readable dictionary

MRD

electronic dictionary

computer-aided dictionary

computer-assisted dictionary

dictionary (3.2) designed to be processed and interpreted by software

Note 1 to entry: Unlike traditional dictionaries (3.2), which are intended for human use, MRDs are formatted in such a way that their contents can be efficiently accessed, manipulated, and utilized by software.

3.4

dictionary structure

structure containing a macrostructure (3.15), a microstructure (3.19) and a mediostructure (3.17)

3.5

general language

natural language (3.20) characterized by the use of linguistic means of expression independent of any specific domain

[SOURCE: ISO 1087:2019]

3.6

grammatical feature

property associated with a lexical unit (3.9) to describe one of its grammatical attributes

Note 1 to entry: Possible grammatical features include gender, number, and transitivity.

[SOURCE: ISO 24613-1:2024, 3.2, modified – lexical unit replaces word form; Note 1 to entry added, EXAMPLE removed]

3.7

headword

entry word

lexicographic component (3.12) that serves as the main access point to a lexicographic entry (3.11)

Note 1 to entry: This term is included in Table 1.

3.8

lemma

lemmatized form

canonical form

base form

base word (deprecated term)

conventional representation of a lexical unit (3.9) chosen as the headword in a lexicographic resource (3.13) according to lexicographic conventions

Note 1 to entry: Conventions can vary between languages.

3.9

lexical unit

lexical item

meaningful lexical element within natural language (3.20)

Note 1 to entry: Although ‘lexeme’ is the term used in ISO 24613-1:2024, this document adopts the term ‘lexical unit’. This preference is based on its practical orientation, emphasizing a meaningful lexical item that is readily identifiable and applicable. This choice avoids confusion with the more abstract concept of ‘lexeme’, which is distinct from both lemma and lexical unit, as defined in ISO 24613-1:2024.

3.10

lexicographer

expert who compiles or edits a dictionary (3.2)

3.11

lexicographic entry

entry

main entry

lexicographic article

dictionary article

structured set of lexicographic components (3.12) that treat a headword (3.7) in a lexicographic resource (3.13)

3.12

lexicographic component

structural element of a lexicographic entry (3.11)

Note 1 to entry: Lexicographic components can include but are not limited to headwords, definitions, examples, etymology, and usage notes.

3.13

lexicographic resource

collection of lexicographic entries (3.11)

Note 1 to entry: A lexicographic resource can be a collection of structured datasets that is human-readable as a dictionary and also can be processed as a machine-readable dictionary.

EXAMPLE: Printed dictionaries, CDs, databases.

3.14

lexicon

resource containing a collection of lexical units (3.9)

3.15

macrostructure

dictionary structure (3.4) comprising a data set with a list of lemmas (3.8)

3.16

marker

type of notation used in lexicographic entries (3.11) to provide metadata (3.18) about a lexical unit (3.9)

Note 1 to entry: Markers can indicate various aspects such as grammatical information and usage labels, helping users understand the proper use of a lexical unit. For example, in the lexicographic entry for the lexical unit ‘run’, a marker can indicate that it is a verb (v.), and another marker can label it as informal when used to mean ‘to manage’ (e.g., ‘run a business’).

3.17

mediostructure

cross-reference structure

dictionary structure (3.4) of cross-references between lexicographic entries (3.11) or their lexicographic components (3.12)

3.18

metadata

data that provides information about other data related to any element of a lexicographic resource (3.13)

3.19

microstructure

dictionary structure (3.4) of lexicographic components (3.12) within a lexicographic entry (3.11)

3.20

natural language

language that is or was in active use in a community of people, and the rules of which are mainly deduced from usage

[SOURCE: ISO 1087:2019]

3.21

orthography

systematic way of spelling or writing lexical units (3.9) that conforms to a conventionalized use

[SOURCE: ISO 24613-1:2024, 3.10, modified – The term lexemes has been changed to lexical units.]

3.22

sense component

structural sense element of a lexicographic entry (3.11)

3.23

subentry

nested entry

grouping structure for related lexicographic entries (3.11) that share a common headword

3.24

typographical convention

set of practices governing the visual presentation of lexicographic content as displayed or output

Note 1 to entry: These conventions encompass choices related to typography, such as font usage, font size, line spacing, margins, paragraph styles, text alignment, punctuation, symbols and other text design characteristics.

3.25

usage label

marker (3.16) that indicates a restricted use of a lexical unit (3.9)

Note 1 to entry: Usage labels address different dimensions of linguistic variation, such as space, time, social group, and situation (cf. ISO 21636-1:2024).

Note 2 to entry: General and specialized dictionaries employ a range of symbols and abbreviations as usage labels.

EXAMPLE: Labels indicating currency or period (e.g., arch. for archaic), formality or register (e.g., inf. for informal), regionality or dialect (e.g., Am. for American, York. for Yorkshire), technicality or subject field (e.g., bot. for botanical), and textuality or genre (e.g., poet. for poetic).
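The dimensions of variation in the EXAMPLE above correspond to label types that Annex A encodes with <usg> and an @type value (temporal, sociocultural, geographic, domain and textType). As a minimal sketch, assuming a dictionary keeps its label abbreviations in a single lookup table, the following Python code maps each abbreviation from the EXAMPLE to its expansion and @type value and builds the corresponding element. The table USAGE_LABELS and the function usg_for are hypothetical; they only show how one consistent abbreviation list can drive both the printed list of expansions and the encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: abbreviation -> (expansion, usg/@type value).
# Pairings follow the EXAMPLE in 3.25; @type values follow Annex A.
USAGE_LABELS = {
    "arch.": ("archaic", "temporal"),        # currency or period
    "inf.": ("informal", "sociocultural"),   # formality or register
    "Am.": ("American", "geographic"),       # regionality or dialect
    "York.": ("Yorkshire", "geographic"),
    "bot.": ("botanical", "domain"),         # technicality or subject field
    "poet.": ("poetic", "textType"),         # textuality or genre
}


def usg_for(abbreviation: str) -> ET.Element:
    """Return a <usg> element for a label abbreviation (KeyError if unknown)."""
    _expansion, label_type = USAGE_LABELS[abbreviation]
    usg = ET.Element("usg", type=label_type)
    usg.text = abbreviation
    return usg


# Print the abbreviation list with expansions and encodings.
for abbreviation, (expansion, _type) in USAGE_LABELS.items():
    encoded = ET.tostring(usg_for(abbreviation), encoding="unicode")
    print(f"{abbreviation:<7}{expansion:<11}{encoded}")
```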

4 An overview of lexicographic components

Table 1 describes various lexicographic components and sub-components typically found in lexicographic resources or annotation schemes.

Table 1 — Lexicographic components and sub-components

Preferred designation

Other designations

Definition

Example

antonym

cross-reference indicating a lexical unit with a meaning contrary to that of the lemma

NOTE: This component is generally preceded by a delimiter, e.g. OPP [opposite], ANT. [antonym] or ≠.

low OPP high [Longman]

desfavorable ANT.: favorable [DLE]

attitude label

marker that indicates the mood, positive or negative, that a speaker wishes to convey through the use of a given lexical unit

nice Of a (finished) action, task, etc.: well-executed; commendably performed or accomplished. Now frequently in interjections, as nice going!, nice try!, nice work!. Also used ironically or sarcastically. [OED]

oik /ɔɪk/ noun [countable] British English informal not polite [Longman]

citation

quote

quotation

cited quotation

sense component that references a specific quote from written or spoken sources to illustrate the occurrence of a lexical unit

NOTE: A citation should be followed by a bibliographic reference (its source).

porte-arquebuse Officier chargé de porter l'arquebuse (puis le fusil) du roi ou d'un grand seigneur, quand ils allaient à la chasse. Vous n'avez pas ici votre bon porte-arquebuse La Hurière (DUMAS père, Reine Margot, 1847, II, tabl. 5, 4, p. 70). [TLFi]

cross-reference

lexicographic component which provides a link or reference to another component within the lexicographic resource

NOTE: The reference can be internal to a dictionary or point to an external source.

empire See also imperial (EMPIRE) [Cambridge]

arthritis [...] SEE ALSO osteoarthritis, rheumatoid arthritis [Oxford Advanced Learner’s Dictionary]

ozonosfera Tb. ozonósfera, Am. 1. f. Meteor. capa de ozono. [DLE]

dating

lexicographic component indicating the estimated date of the first recorded use of a lexical unit

pill

Verb (1) 12th century, in the meaning defined at intransitive sense

Noun 14th century, in the meaning defined at sense 1a

Verb (2) 1736, in the meaning defined at transitive sense 1 [Merriam-Webster]

domain label

field label

subject field label

topic label (deprecated term)

marker which identifies the specialized field of knowledge in which a lexical unit is mainly used

RHÉT (for “rhétorique”) in ALLITÉRATION [Petit Robert 2017]

astronomy (for “dark star”) [Oxford Advanced Learner’s Dictionary]

etymology

lexicographic component which contains information about the origin of a lexical unit and its historical development

NOTE: This information can include etymons, roots, cognates, etc.

tuer milieu XIIe s. ◊ du latin populaire tutare “éteindre” et “tuer”, de tutari “protéger”, par l’intermédiaire d’expressions comme tutari famen, sitim “calmer la faim, la soif”, donc “protéger, garantir de la faim, la soif” [Petit Robert 2017]

example

usage example

sense component that includes a text string to illustrate the occurrence of a lexical unit

corriger […] corriger la vue de qqn par des verres de contact. [Petit Robert 2017]

punish Smacking is not an acceptable way of punishing a child [Longman]

form

lexical form

word form

instantiation of a lexical unit in a textual context

color noun (US English) (British English colour) [Oxford Learner’s Dictionaries]

frequency label

marker which identifies the relative rate of occurrence of a lexical unit in a given context

(also less frequent flyer) (for “flying start”) [Oxford Advanced Learner’s Dictionary]

geographic label

region label

marker which identifies the place or region where a lexical unit is mainly used

NOTE: Some dictionaries do not identify a specific place but identify that the lexical unit is not used generally in every geographic area.

RÉGION. (Sud-Ouest; Canada) (for “chocolatine”) [Petit Robert 2017]

vacation US (UK holiday) [Cambridge Dictionary]

plonker British English informal not polite [Longman]

gender

grammatical feature assigned to a lexical unit which refers to a noun class system in certain languages

Elefant m.

Ladung f.

Eis n.

[Langenscheidt Taschenwörterbuch] (Deutsch als Fremdsprache)

gloss

any descriptive or explanatory note within a lexicographic entry

NOTE: Glosses can include short comments or remarks.

elucidation [i.lu:si'deiʃən] N (of text) Erklärung f; (of issue, situation) Erhellung f; (of point) nähere Ausführung; (of mystery) Aufklärung f, Aufhellung f [Collins]

headword

entry

entry word (deprecated/obsolete term)

lexicographic component that serves as the main access point to a lexicographic entry

astrónomo, ma [DLE]

inflected form

modified form of a base or root of the headword that conveys specific grammatical information, such as tense, number, gender, case, mood, etc.

make (meɪk) Word forms: makes, making, made [Collins]

lexicographic definition

definition

sense component that describes the meaning of a lexical unit by referencing a generic term (genus proximum) and at least one distinguishing characteristic (differentia specifica)

boxeo m Deporte en que dos luchadores se golpean con los puños utilizando guantes especiales. [DEA]

meaning type label

marker which identifies a semantic extension of the sense of a given lexical unit

PRINTEMPS Fig. Temps de la jeunesse. [DAF]

[grammatical] number

grammatical feature that indicates quantity distinctions, such as singular, dual or plural, in a morphological variant of a lexical unit

ŒIL [œj], plur. YEUX [jø] [Petit Robert 2017]

normativity label

marker which identifies the use of a given lexical unit which is in some aspect considered to be non-standard or incorrect

círculo […] [uso indevido mas generalizado] GEOMETRIA circunferência [Infopédia]

note

complementary information related to any lexicographic component

escolasticídio […] Palavra cunhada por Kelma Nabulsi, jurista palestina e académica de Oxford, em 2009. [DLP]

part of speech

lexical category

word class

lexicographic component assigned to a lemma based on its morpho-syntactic properties

NOTE: In some dictionaries, gender, number and transitivity, among others, are treated as part-of-speech elements.

[SOURCE: ISO 24613-1:2024]

PÉRÉGRINATION nom féminin [DAF]

pronunciation

lexicographic component which represents the phonetic form by which a lexical unit is articulated

ventura [vẽˈturɐ] [Infopédia]

sense

lexicographic component which groups all information relating to one or several meanings conveyed by a headword

NOTE: The information is grouped into distinct sense components, such as definition, citations, examples, synonyms.

ONIRIQUE [ɔnirik] adj. – 1895 ◊ du grec oneiros « rêve » ■ 1 didact. Relatif aux rêves. Images, scènes, visions de l’état onirique. ■ 2 littér. Qui évoque un rêve, semble sorti d’un rêve. Atmosphère, décor onirique de certaines œuvres surréalistes. [Petit Robert 2017]

sense number

delimiter which distinguishes different meanings or senses of a lexical unit associated with a headword

virus 1 a very small living thing that causes infectious illnesses […] 2 a set of instructions secretly put onto a computer or computer program, which can destroy information. […] 3 a program that sends a large number of annoying messages to many people’s mobile phones in an uncontrolled way [Longman]

sociocultural label

register

marker which identifies different dimensions of linguistic variation referring to the type of situation, particularly different degrees of formality, and also social group usage

NOTE 1: Register covers two of the dimensions of linguistic variation defined in ISO 21636-1:2024: the ‘situation dimension’ and the ‘social group dimension’.

NOTE 2: Register information can be diaphasic (relating to situation) or diastratic (relating to social group).

dodgy (British English, informal) [Oxford Learner’s Dictionaries]

synonym

cross-reference indicating a lexical unit with a meaning identical or similar to that of the lemma

NOTE: This component is generally preceded by a delimiter, e.g. SYN or =.

exquisito SIN.: delicioso, sabroso, suculento, gustoso, apetitoso [DLE]

temporal label

time label

marker which identifies the use of a given lexical unit on a scale from old to new

bothersome adjective (old-fashioned) [Oxford Learner’s Dictionaries]

agradamiento m. desus. agrado [DLE]

text type label

diatextual information

marker which identifies the typical use of a lexical unit in a particular discourse type or genre

ósculo nome masculino 1. POÉTICO beijo [DLP]

aflame adjective [not before noun] (literary) [Oxford Learner’s Dictionaries]

variant

lexicographic component that represents one of the alternative forms of a lemma

NOTE: Variants can result from spelling (orthographic) variation, regional variation, etc.

standardize (also standardise British English) [Longman]
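Because antonyms and synonyms are usually introduced by delimiters, the delimiters can help recover the structure of an entry during retro-digitization. The following Python sketch assumes flat text lines shaped like the DLE examples in this table (ANT.: and SIN.:) and maps them to the <xr type="antonymy"> and <xr type="synonymy"> encodings of Table A.1. The DELIMITERS table, the parse_xr_line function and the wrapping of each target in a <ref> element are assumptions for illustration; real retro-digitized text needs considerably more robust parsing.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical delimiter table: printed delimiter -> xr/@type value (Table A.1).
DELIMITERS = {"ANT.:": "antonymy", "SIN.:": "synonymy"}

# Headword, then a known delimiter, then a comma-separated list of targets.
XR_LINE = re.compile(
    r"^(?P<lemma>\S+)\s+(?P<delim>"
    + "|".join(re.escape(d) for d in DELIMITERS)
    + r")\s*(?P<targets>.+)$"
)


def parse_xr_line(line):
    """Return (lemma, <xr> element), or None if no known delimiter is found."""
    match = XR_LINE.match(line.strip())
    if match is None:
        return None
    xr = ET.Element("xr", type=DELIMITERS[match["delim"]])
    for target in match["targets"].split(","):
        # Assumption: each target lexical unit is wrapped in <ref>.
        ET.SubElement(xr, "ref").text = target.strip()
    return match["lemma"], xr


for line in (
    "desfavorable ANT.: favorable",
    "exquisito SIN.: delicioso, sabroso, suculento, gustoso, apetitoso",
):
    lemma, xr = parse_xr_line(line)
    print(lemma, ET.tostring(xr, encoding="unicode"))
```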

5 Typographical conventions in printed and digital dictionaries

Printed and digital dictionaries make extensive use of typographical conventions to delineate and clarify the relationships among lexicographic components. These conventions cover a range of typographic choices, including but not limited to font type, font size, line spacing, margin settings, alignment, stylistic formatting (such as normal text, boldface and italics), and the use of punctuation marks. Consistent application of these conventions is essential for the uniformity and legibility of dictionaries. Each lexicographic component should be presented in a way that allows end-users to identify its structural and semantic role clearly and to distinguish easily between different types of information.

Typographical conventions strongly influence how lexicographic content is conveyed and interpreted. Table 2 lists a selection of typographical practices commonly used in dictionaries of various languages, showing how typography supports the end-user in navigating and interpreting entries. Annex B gives an overview of the symbols and notations commonly used in both printed and digital dictionaries.

Table 2 — Typographical conventions generally adopted in dictionaries

Format

Description

Boldface

Boldface is usually used for lemmas and for other lexical units within a lexicographic entry, such as compounds and phrasal verbs, to help end-users locate them quickly.

Italics

Italics are usually used for usage examples, Latin expressions and loanwords, distinguishing them clearly from the main text.

Lightface

Lightface is usually used for pronunciation guides and notes, ensuring they are distinguishable yet not overly prominent.

Highlighting / Colour coding

Highlighting, through varied background colours or font styles, is usually used to emphasize new or significant lexical units, as well as cautionary notes, thereby enhancing readability and navigation.

Abbreviations

Abbreviations, a longstanding convention in both printed and digital dictionaries, require consistent use and a comprehensive list of expansions to ensure clarity and user-friendliness.

Numbering

Numbering supports the logical organization of the different senses within a lexicographic entry and makes them easy to refer to.

Superscript number

A superscript number following a lexical unit indicates that it is one of two or more homographic or homonymic lexical units.

Icons

Icons can provide a visually intuitive way to indicate additional features or content types within a lexicographic entry.

Illustrations

Non-verbal representations, such as images or diagrams, are used to visually depict concepts or provide additional context to the lexicographic content.

Hyperlinks

A hyperlink is a clickable element in a document or webpage that takes the user to another location, such as a different page or document. In digital dictionaries, hyperlinks are usually indicated by underlined text and distinct colours. They should adhere to accessibility standards to ensure usability for all users.

Bullet points

Bullet points are used to organize information clearly, especially for sub-definitions, examples or related terms. Combined with indentation, they create a visual enumeration of information elements.

Marginalia / sidebar boxes

Marginalia can provide supplementary information such as notes, icons or references; sidebars or info boxes can provide additional content without interrupting the reading of a full lexicographic entry.

Charts

Charts present supplementary information such as verb conjugations or phonetic tables; in digital dictionaries they can be made interactive, allowing users to explore detailed information dynamically.

Interactive elements

In digital dictionaries, certain text elements or symbols are interactive in that, when clicked or tapped, they trigger actions such as playing audio pronunciations, displaying translations, or revealing additional information.

Tooltip

In digital dictionaries, hovering over a word or symbol can display a tooltip—a small pop-up box with additional information, definitions, or usage tips.
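To make some of these conventions concrete, the following Python sketch renders the ONIRIQUE entry from Table 1 as HTML: a boldface lemma, an optional superscript homograph number following it, pronunciation in square brackets, numbered senses and italic usage examples. The data classes, the render_entry function and the choice of HTML tags (<b>, <sup>, <i>) are assumptions for illustration; rendering labels in italics and sense numbers in bold is likewise an assumed house style, not a requirement of this document.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Optional


def text(value: str) -> str:
    """Escape text content for HTML (quotes need no escaping here)."""
    return escape(value, quote=False)


@dataclass
class Sense:
    definition: str
    label: str = ""
    examples: List[str] = field(default_factory=list)


@dataclass
class Entry:
    lemma: str
    pronunciation: str = ""
    pos: str = ""
    homograph: Optional[int] = None
    senses: List[Sense] = field(default_factory=list)


def render_entry(entry: Entry) -> str:
    """Render an entry as HTML following some conventions of Table 2."""
    head = f"<b>{text(entry.lemma)}</b>"                # boldface lemma
    if entry.homograph is not None:
        head += f"<sup>{entry.homograph}</sup>"         # superscript homograph number
    parts = [head]
    if entry.pronunciation:
        parts.append(f"[{text(entry.pronunciation)}]")  # pronunciation in brackets
    if entry.pos:
        parts.append(f"<i>{text(entry.pos)}</i>")       # part of speech in italics
    for number, sense in enumerate(entry.senses, start=1):
        chunk = f"<b>{number}</b>"                      # sense numbering (bold assumed)
        if sense.label:
            chunk += f" <i>{text(sense.label)}</i>"     # assumed style for labels
        chunk += f" {text(sense.definition)}"           # definition in regular font
        for example in sense.examples:
            chunk += f" <i>{text(example)}</i>"         # usage examples in italics
        parts.append(chunk)
    return " ".join(parts)


onirique = Entry(
    lemma="onirique",
    pronunciation="ɔnirik",
    pos="adj.",
    senses=[
        Sense("Relatif aux rêves.", "didact.",
              ["Images, scènes, visions de l’état onirique."]),
        Sense("Qui évoque un rêve, semble sorti d’un rêve.", "littér.",
              ["Atmosphère, décor onirique de certaines œuvres surréalistes."]),
    ],
)
print(render_entry(onirique))
```

In a digital dictionary the same data could equally be styled through CSS classes rather than presentational tags; what matters is that each component maps to one consistent typographic treatment.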


Annex A

(informative)

Structure of a lexicographic entry

Table A.1 outlines the data model of lexicographic entries.

Table A.1 — Descriptors, encoding and respective output

Descriptor (field designation)

LMF component (XPath)

Typical realisation/XML representation

Recommended output (harmonized practice)

antonym

sense/xr

<xr type="antonymy"/>

Antonym with special formatting

attitude label

sense/label

<usg type="attitude"/>

Attitude label with special formatting

citation

sense/cit

<cit><quote>$TEXT</quote><bibl>$TEXT</bibl></cit>

Citation in quotes followed by reference in brackets

cross-reference

sense/xr

<xr type="reference"/>

In the running text

dating

<date>$TEXT</date>

Date with special formatting

lexicographic entry

/LexicalEntry/entry

<entry>$MIXEDCONTENT</entry>

New paragraph per entry

domain label

sense/label

<usg type="domain"/>

Domain label with special formatting

etymology

etym

<etym>$TEXT</etym>

example

sense/cit[@type="example"]

<cit type="example"><quote>$TEXT</quote></cit>

Example preceded by a blank line

frequency label

sense/label

<usg type="frequency"/>

Frequency label with special formatting

geographic label

sense/label

<usg type="geographic"/>

Geographic label with special formatting

gender

gramGrp/gram

<gram type="gen">$TEXT</gram>

m (masculine) f (feminine) n (neuter)

Gender abbreviation for printed editions

gloss

sense/gloss

<gloss>$TEXT</gloss>

Gloss in italics

headword

form[@type="lemma"]

<form type="lemma">$TEXT</form>

Headword in bold

inflected form

form[@type="inflected"]

<form type="inflected">$TEXT</form>

Inflected form in regular font

lemma

/OrthographicRepresentation/

form[@type="lemma"]/orth

<form type="lemma"><orth>$TEXT</orth></form>

Lemma in bold

lexicographic definition

sense/def

<def>$TEXT</def>

Definition in regular font

meaning type label

sense/label

<usg type="meaningType"/>

Meaning type label with special formatting

[grammatical] number

gramGrp/gram

<gram type="num">$TEXT</gram>

sg (singular) pl (plural)

Number abbreviation for printed editions

[entry] number

<entry n="$NUMBER">$TEXT</entry>

Following the lemma

normativity label

sense/label

<usg type="normativity"/>

Normativity label with special formatting

note

note

<note>$TEXT</note>

Note in regular font

orthographic form

form/orth

<orth>$TEXT</orth>

Orthographic form in regular font

part of speech

gramGrp/gram

<gram type="pos">$TEXT</gram>

Part of speech in italics

pronunciation

/OrthographicRepresentation/

form/pron

<pron notation="[notation]">$TEXT</pron>

Pronunciation in square brackets

sense number

sense

<sense n="[number]"/>

Sense number using numbers

sociocultural label

sense/label

<usg type="sociocultural"/>

Sociocultural label with special formatting

synonym

sense/xr

<xr type="synonymy"/>

Synonym with special formatting

temporal label

sense/label

<usg type="temporal"/>

Temporal label with special formatting

text type label

sense/label

<usg type="textType"/>

Text type label with special formatting

variant

form[@type="variant"]

<form type="variant"><orth>$TEXT</orth></form>

Variant in regular font
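To illustrate how the XML realisations in Table A.1 fit together, the sketch below builds a single invented entry with Python's standard `xml.etree.ElementTree` module and retrieves several descriptors with path expressions. It assumes element names exactly as shown in the "Typical realisation/XML representation" column, without a TEI namespace. Note that ElementTree supports only a limited subset of XPath, and that the `label` steps in the LMF column correspond to `usg` elements in the XML realisations.

```python
import xml.etree.ElementTree as ET

# Invented entry assembled from the XML realisations listed in Table A.1
# (no TEI namespace is declared, to keep the path expressions short).
ENTRY_XML = """
<entry n="1">
  <form type="lemma"><orth>notebook</orth><pron notation="IPA">ˈnəʊtbʊk</pron></form>
  <gramGrp><gram type="pos">noun</gram><gram type="num">sg</gram></gramGrp>
  <form type="inflected"><orth>notebooks</orth></form>
  <sense n="1">
    <usg type="domain">computing</usg>
    <def>a small portable computer</def>
    <cit type="example">She took her notebook to the meeting.</cit>
  </sense>
</entry>
"""

entry = ET.fromstring(ENTRY_XML.strip())

# ElementTree path expressions mirroring the descriptors of Table A.1.
lemma = entry.findtext("form[@type='lemma']/orth")
pos = entry.findtext("gramGrp/gram[@type='pos']")
inflected = [o.text for o in entry.findall("form[@type='inflected']/orth")]
domains = [u.text for u in entry.findall("sense/usg[@type='domain']")]
definitions = [d.text for d in entry.findall("sense/def")]
examples = [c.text for c in entry.findall("sense/cit[@type='example']")]

print(lemma, pos, inflected, domains, definitions, examples)
```

Because each descriptor is addressed by its own path, a rendering layer can apply the recommended output (bold headword, italic part of speech, etc.) independently of how the entry was authored.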


Annex B (informative)

Lexicographic symbols in printed and digital dictionaries

B.1 General information

Table B.1 outlines typographic conventions for lexicographic symbols, sourced from dictionaries in the ISO official languages (English, French, and Russian) and in other languages. Its aim is to present the conventions that lexicographic resources share, as a possible basis for harmonizing typographic conventions across languages and formats.

NOTE 1 The symbols listed in this annex were selected from representative monolingual and bilingual dictionaries of the respective countries. The list is not exhaustive; it serves as an illustrative guide to commonly used symbols.

NOTE 2 Language and country codes used in this annex conform to ISO 639 and ISO 3166, respectively. The abbreviations GB, FR, and RU are the ISO 3166-1:2020 country codes for the United Kingdom, France, and the Russian Federation, respectively.

B.2 Sources

Academia das Ciências de Lisboa. Dicionário da Língua Portuguesa. Retrieved July 17, 2024, from https://dicionario.acad-ciencias.pt

Cambridge Dictionary online. Retrieved July 17, 2024, from https://dictionary.cambridge.org/

Collins English Dictionary. Retrieved July 17, 2024, from https://www.collinsdictionary.com/

Collins English-German Dictionary. Retrieved July 17, 2024, from https://www.collinsdictionary.com/

Diccionario Básico de la Lengua Española, 2014

Diccionario del español actual, Manuel Seco, Olimpia Andrés and Gabino Ramos. Diccionario BBVA. Retrieved July 17, 2024, from https://www.fbbva.es/diccionario/

Dictionnaire de l'Académie française. Retrieved July 17, 2024, from https://www.dictionnaire-academie.fr/

German English Dictionary

Infopédia. Dicionários da Língua Portuguesa. Retrieved July 17, 2024, from https://www.infopedia.pt/dicionarios/lingua-portuguesa

Le Petit Robert de la langue française 2017

Longman Dictionary of Contemporary English online. Retrieved July 17, 2024, from https://www.ldoceonline.com/

Merriam-Webster.com dictionary. Retrieved July 17, 2024, from https://www.merriam-webster.com/

Oxford Advanced Learner's Dictionary (printed edition)

Oxford English Dictionary online. Retrieved July 17, 2024, from https://www.oed.com/

Oxford Learner's Dictionaries. Retrieved July 17, 2024, from https://www.oxfordlearnersdictionaries.com/

Real Academia Española. Diccionario de la lengua española (23rd ed.). Retrieved July 17, 2024, from https://dle.rae.es

TLFi online. Retrieved July 17, 2024, from http://atilf.atilf.fr/antonyme

B.3 Lexicographic symbols

Table B.1 — Lexicographic symbols

Symbol

Designation

Unicode

Function

Position

Specific usage

≈ ≃ ≒

Almost/approximately equal

U+2248

U+2243

U+2252

Indicates approximate equivalence or similarity in meaning.

preceding a lexical unit

luzidio

DLPC 2001

'

Apostrophe

U+0027

Indicates a gloss or equivalent of a form.

enclosing a lexical unit

senescente

DLPC 2001

Arrow

Points to cross-references or, in etymology, presents a related lexical unit.

preceding a lexical unit

MOB Robert 2017

*

Asterisk

U+002A

Marks reconstructed, unattested forms, or forms not found in corpus data.

preceding a lexical unit

adaga

DLPC 2001

■ ♦

Black square / black diamond

U+25A0

U+2666

Separate different senses within a lexicographic entry.

preceding different components

ONIRIQUE

Robert 2017

◊

Lozenge

U+25CA

Separates the date of appearance of the word in the source language from its origin.

†

Dagger

U+2020

Marks lexical units or usages that are obsolete, i.e. no longer in active use.

preceding a lexical unit

taciturnous

OED

°

Degree sign

U+00B0

Characterizes the lexical unit as an internationally harmonized scientific-technical term.

preceding a lexical unit

=

Equals sign

U+003D

Indicates that the lexical unit is an equivalent or synonym.

preceding a lexical unit

DNA = Deoxyribonucleic acid

!

Exclamation mark

U+0021

Indicates that the lexical unit has been coined by means of translation.

preceding a lexical unit

>

Greater-than sign

U+003E

In etymological components, indicates that the form or word following the symbol derives from the form or word preceding it.

<

Less-than sign

U+003C

Indicates the historical derivation of a lexical unit: the lexical unit or form preceding the symbol is derived from the lexical unit or form following it.

after a lexical unit

vacation < Old French (also modern French) vacation OED

×

Multiplication sign

U+00D7

Indicates an overlap.

≠

Not equal

U+2260

Indicates antonymy or significant difference in meaning.

preceding a lexical unit

+

Plus sign

U+002B

Can be used to show compound word formation. For example, note + book = notebook.

preceding a lexical unit

notebook OED

§

Section sign

U+00A7

Indicates that the designation is legally protected.

preceding

®

Registered trademark sign

U+00AE

Indicates that a lexical unit also represents a trademark.

after a lexical unit

Pladur®

DLP 2024

~

Tilde

U+007E

Replaces the lemma or a specific lexical unit throughout a lexicographic entry or part of an entry.

instead of a lexical unit

desacuerdo Presença/Langenscheidt

Parallel symbol or double vertical line

Separates different examples.

between two examples, after the first and before the next

Persian dictionary and Arabic dictionary

;

Semicolon

U+003B

Distinguishes different pronunciations, subsenses, synonyms, and other lexicographic components.

between pronunciations, after the first and before the second

Oxford dictionary

$

Dollar sign

U+0024

Marks the American pronunciation.

...

Ellipsis

Indicates, in citations, that part of the text has been omitted.

Persian dictionary and Arabic dictionary

:

Colon

U+003A

Indicates the beginning of the definitions.

Webster dictionary

Marks a division between parts of speech.

Oxford dictionary

Indicates a cross-reference.

Oxford dictionary; Persian dictionary

/

Slash

U+002F

Separates different pronunciations.

( )

Parentheses

U+0028

U+0029

Enclose additional information, clarifications, or contextual details that complement the main text.

[ ]

Square brackets

U+005B

U+005D

Enclose phonetic transcriptions, providing a standardized way of representing pronunciation.

Also enclose complementary information, such as etymological material.

Persian Dictionary

< >

Angle brackets

Enclose lexical units in discussions of etymology, particularly to indicate a lexical unit or form in an older language from which the current word is derived.

Sometimes used to narrow down the domain.

/ /

Slashes

U+002F

Enclose phonemic transcriptions, representing pronunciation at the level of phonemes.
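Glyphs such as ≈ or ◊ are easily lost or substituted when a dictionary is converted between formats, so it can help to check each symbol against the code point recorded in Table B.1. The sketch below does this with Python's standard `unicodedata` module and `ord`. `TABLE_B1` is a hypothetical excerpt of the table (symbol to code point), not a complete transcription, and the helper names are illustrative. The official Unicode name does not always match the designation used in the table; U+2666, for instance, is named BLACK DIAMOND SUIT.

```python
import unicodedata

# Hypothetical excerpt of Table B.1: symbol -> code point as recorded in the table.
TABLE_B1 = {
    "≈": "U+2248",
    "≃": "U+2243",
    "≒": "U+2252",
    "*": "U+002A",
    "■": "U+25A0",
    "♦": "U+2666",
    "◊": "U+25CA",
    "†": "U+2020",
    "≠": "U+2260",
    "<": "U+003C",
    "~": "U+007E",
}


def code_point(symbol: str) -> str:
    """Format the code point of a single character in U+XXXX notation."""
    return f"U+{ord(symbol):04X}"


def check_table(table: dict[str, str]) -> list[str]:
    """Return the symbols whose actual code point differs from the recorded one."""
    return [s for s, cp in table.items() if code_point(s) != cp]


if __name__ == "__main__":
    for symbol, cp in TABLE_B1.items():
        # unicodedata.name returns the official Unicode character name.
        print(symbol, cp, unicodedata.name(symbol))
    print("mismatches:", check_table(TABLE_B1))
```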


Annex C (informative)

Dictionary examples applying LMF modelling mechanisms

C.1 General information

The examples in this annex were selected for illustrative purposes and are specific to the language(s) concerned. They are taken from real dictionaries and demonstrate the application of LMF serialization together with the associated presentation. Their inclusion does not imply any responsibility on the part of the publishers.

NOTE 1 This document sets out guidelines that can inform the online display of dictionary content, for example by means of technologies such as XSLT or CSS.

NOTE 2 Examples of XSLT and CSS for rendering, in a browser, lexicographic entries encoded according to the TEI and TEI Lex-0 guidelines are maintained on GitHub. This approach has practical benefits, such as avoiding ISO intellectual property restrictions and making the resource easier to maintain. The implementation is available at:

https://github.com/anacastrosalgado/lexicalresources/tree/master/Schemas/ISO1951
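The resource cited above uses XSLT and CSS. As an illustration of the same idea in another language, the Python sketch below turns a TEI-style entry into an HTML fragment following several recommended outputs of Table A.1: headword in bold, pronunciation in square brackets, part of speech in italics, definition in regular font, gloss in italics, and one paragraph per entry. It is not derived from the GitHub implementation; the function `render_entry`, the sample entry, and the choice of bold for sense numbers are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented sample entry using the XML realisations of Table A.1 (no namespace).
ENTRY_XML = """<entry>
  <form type="lemma"><orth>vacation</orth><pron notation="IPA">vəˈkeɪʃn</pron></form>
  <gramGrp><gram type="pos">noun</gram></gramGrp>
  <sense n="1"><def>a period of rest from work or study</def></sense>
  <sense n="2"><gloss>holiday</gloss></sense>
</entry>"""


def render_entry(entry: ET.Element) -> str:
    """Render one entry as an HTML paragraph (hypothetical helper)."""
    parts = []
    lemma = entry.findtext("form[@type='lemma']/orth")
    if lemma:
        parts.append(f"<b>{escape(lemma)}</b>")  # headword in bold
    pron = entry.findtext("form[@type='lemma']/pron")
    if pron:
        parts.append(f"[{escape(pron)}]")  # pronunciation in square brackets
    pos = entry.findtext("gramGrp/gram[@type='pos']")
    if pos:
        parts.append(f"<i>{escape(pos)}</i>")  # part of speech in italics
    for sense in entry.findall("sense"):
        number = sense.get("n")
        if number:
            parts.append(f"<b>{escape(number)}</b>")  # sense number (bold is an assumption)
        definition = sense.findtext("def")
        if definition:
            parts.append(escape(definition))  # definition in regular font
        gloss = sense.findtext("gloss")
        if gloss:
            parts.append(f"<i>{escape(gloss)}</i>")  # gloss in italics
    # New paragraph per entry.
    return "<p>" + " ".join(parts) + "</p>"


if __name__ == "__main__":
    print(render_entry(ET.fromstring(ENTRY_XML)))
```

Keeping the mapping from descriptor to presentation in one function mirrors what a stylesheet does: the encoded data stays unchanged, and only the rendering layer decides on bold, italics, or brackets.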

C.2 Sources

Academia das Ciências de Lisboa. Dicionário da Língua Portuguesa. Retrieved July 17, 2024, from https://dicionario.acad-ciencias.pt

Cambridge Dictionary online. Retrieved July 17, 2024, from https://dictionary.cambridge.org/

Collins English Dictionary. Retrieved July 17, 2024, from https://www.collinsdictionary.com/

Collins English-German Dictionary. Retrieved July 17, 2024, from https://www.collinsdictionary.com/

Diccionario del español actual, Manuel Seco, Olimpia Andrés and Gabino Ramos. Diccionario BBVA. Retrieved July 17, 2024, from https://www.fbbva.es/diccionario/

Dicionário Espanhol-Português/Português-Espanhol, Presença/Langenscheidt, 2000 (printed edition)

Dictionnaire de l'Académie française. Retrieved July 17, 2024, from https://www.dictionnaire-academie.fr/

German English Dictionary

Infopédia. Dicionários da Língua Portuguesa. Retrieved July 17, 2024, from https://www.infopedia.pt/dicionarios/lingua-portuguesa

Le Petit Robert de la langue française 2017

Longman Dictionary of Contemporary English online. Retrieved July 17, 2024, from https://www.ldoceonline.com/

Merriam-Webster.com dictionary. Retrieved July 17, 2024, from https://www.merriam-webster.com/

Oxford Advanced Learner's Dictionary (printed edition)

Oxford English Dictionary online. Retrieved July 17, 2024, from https://www.oed.com/

Oxford Learner's Dictionaries. Retrieved July 17, 2024, from https://www.oxfordlearnersdictionaries.com/

Real Academia Española. Diccionario de la lengua española (23rd ed.). Retrieved July 17, 2024, from https://dle.rae.es

TLFi online. Retrieved July 17, 2024, from http://atilf.atilf.fr/

The Russian sources are a general-purpose explanatory dictionary of Russian (Ozhegov and Shvedova) and a dictionary of collocability (Denisov and Morkovkin):

S.I. Ozhegov, N. Yu. Shvedova, “Tol’kovyi slovar’ russkogo yazyka” [Explanatory dictionary of the Russian language], Moskva, 1998

P.N. Denisov, V.V. Morkovkin (eds.), “Slovar’ sochetaemosti slov russkogo yazyka” [Dictionary of word collocability of the Russian language], Moskva, Astrel’, AST, 2005

Bibliography

[1] TEI Consortium, ed. TEI P5: Guidelines for Electronic Text Encoding and Interchange. TEI Consortium. http://www.tei-c.org/Guidelines/P5/ ([Version number and dates to be completed when finalising the document]).

[2] Tasovac T., Romary L., Banski P., Bowers J., de Does J., Depuydt K. et al. 2018. TEI Lex-0: A baseline encoding for lexicographic data. DARIAH Working Group on Lexical Resources. https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html ([Version number and dates to be completed when finalizing the document]).

[3] BCP 47 Tags for Identifying Languages. A. Phillips; M. Davis. IETF. September 2009. IETF Best Current Practice. URL: https://tools.ietf.org/html/bcp47

[4] IETF BCP 47, Tags for Identifying Languages (ed. A. Phillips, M. Davis). September 2009. Best Current Practice. https://tools.ietf.org/html/bcp47

[5] Romary, L., & Wegstein, W. (2012), Consistent modelling of heterogeneous lexical structures. Journal of the Text Encoding Initiative, 3. doi:10.4000/jtei.540.

[6] Costa, C., Roche, C., and Salgado, A. (2022). Standards for Representing Lexicographic Data: An Overview. Version 1.0.0. DARIAH-Campus. [Training module]. https://elexis.humanistika.org/id/REhOykBU7pPs5zOAENdah

[7] Salgado, A., Costa, R., & Tasovac, T. (2019). Improving the consistency of usage labelling in dictionaries with TEI Lex-0. Lexicography: Journal of ASIALEX, 6(2), 133–156. doi:10.1007/s40607-019-00061-x.
