A global lexicon and grammar?

  • To write a document once for usage worldwide,
    to help represent, store, retrieve and compare knowledge, and to facilitate knowledge exchange as a new Information Model, as a mind-mapping and interoperable Knowledge Management platform
  • To help compose domain maps, promoting knowledge discovery, to reference elements of the Internet of Things, to help unify data worldwide, even across measurement systems
  • To improve machine translation, to compose a global open library of business objects and processes, to use as program variables and database objects, to use as atoms in predicates in rule-based systems, to bypass RDF, to help compose international ontologies, to facilitate ontology matching
  • To help build model-driven architectures, programs and systems
  • Linked to contexts, to offer a complete map of synonyms and categories, clusters of concepts in context within the same language and across languages, and a global dictionary, even covering acronyms

A child's dream!

At first glance, conceiving a representation of everything in any language seems to be just that! There are many ways to 'say the same thing', from using a single word to metonymy, metaphor, and the rest! You may for instance end a sentence with a dot, a period, a point, 'express the same proposition' or 'make the same statement', use 'parcel' for 'package', 'delete' for 'remove', 'at the end of the day' for 'finally', 'make out' for 'kiss'! Or, should we 'take it away' or 'walk it off'? On the surface, you may think: "What a mess!"
But it happens that humans crafted a finite number of ways to arrange a finite (but growing) set of words so they can communicate. This tacit convention among people using a given language allows them to convey meaning.
Or, to play crosswords!
A finite number of ways... to express the infinity of possible sentences? Yes! Of course, some languages may create a single word for a verb, while another uses two, e.g. a verb and a particle, as in 'monter' in French vs 'go up' in English, or 'up go' ('上去') in Mandarin...

Or, it uses an expression instead of a single noun, e.g. 'a piece of furniture' for 'un meuble'. Or declensions and inflections to help express number or case. Or, it has its own ways to express time and spatial relations. But again, we are dealing with a finite number of ways and elements to express a potentially infinite universe of sentences.

Are words cheating us?

In "Take your shoes off', 'Take..off' has a different meaning than in: "The plane takes off at 6:42". These are two different concepts.
Also, in "since you're here, help me carry the piano", 'since' has a different meaning than in 'I knew her since 2006', where the former is a preposition, whereas the latter is a conjunction.

Now, polysemy within the same lexical class (see above) adds to the complexity: a 'drive' in "Doncaster Drive" is an alley, but in 'she had the drive to continue her studies', it means 'an urge to attain a goal'.


I think our problem space is complete

From an IT viewpoint, we can distinguish a finite number of elements, and classify them:

  • A set of words and expressions to be turned into a set of concepts, hopefully in an n..1 mapping
  • Each concept can be grouped by the lexical class it impersonates (phrasing may be challenged)
  • The corresponding concept is stored as a lemma, carrying its lexical class as metadata and some first-order semantics, followed by stemming that preserves the lemma as the main concept, the whole forming a unique abstraction
  • This abstraction, acting as a key, refers to collections of viewpoints, renderings (utterances or characters), definitions or formulas, and to collections of revisions and collections of authors

It also refers to contexts, as a collection of first direct parent(s), themselves abstractions, addressing inheritance, and to collections of other types of relations, fit to specify any application. An application, defining a container, its objects and the relations that exist within it, eventually linked to the outer world, is an ontology. Note that every component, interlinked via the abovementioned collections, is fully semantic, read 'linked to a world of synonyms and superclasses', facilitating knowledge discovery.
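For illustration, here is a minimal sketch in Python of what such an abstraction record could look like. The class name, field names and key format are hypothetical, not Lexikl's actual layout.

# A minimal sketch (not the project's actual schema) of an abstraction record
# acting as a key to the collections described above. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Abstraction:
    key: str                                          # unique abstraction identifier
    lemma: str                                        # main concept, e.g. "employee"
    lexical_class: str                                # e.g. "noun", carried as metadata
    renderings: dict = field(default_factory=dict)    # language -> utterance / characters
    viewpoints: list = field(default_factory=list)    # definitions or formulas
    parents: list = field(default_factory=list)       # first direct parent abstractions (inheritance)
    relations: dict = field(default_factory=dict)     # other typed relations, per application
    revisions: list = field(default_factory=list)     # collection of revisions
    authors: list = field(default_factory=list)       # collection of authors

employee = Abstraction(
    key="N0001234",
    lemma="employee",
    lexical_class="noun",
    renderings={"en": "employee", "fr": "employé", "es": "empleado"},
    viewpoints=["a person who works for an employer in return for a salary"],
    parents=["N0000042"],                             # e.g. the abstraction for 'person'
)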

Abstraction as a key

Using abstractions in place of words implies that, for every new word created, an abstraction must be created too and added, via a gigantic web service, to the worldwide lexicon that results from this effort.
For instance, Honda must register their new '2016 LX Honda Civic Blue 1.7L' so that everyone worldwide refers to this model using the corresponding abstraction, updating the lexicon in the process. This is the price to pay for being globally semantic.
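A hedged sketch of what such a registration could look like from the manufacturer's side. The endpoint URL, payload fields and returned key are invented for illustration; no such public API is implied.

# Hypothetical registration of a new term with the worldwide lexicon web service.
import json
import urllib.request

def register_term(term, lexical_class, parents, renderings,
                  endpoint="https://lexicon.example.org/register"):
    payload = json.dumps({
        "term": term,
        "lexical_class": lexical_class,
        "parents": parents,          # existing abstraction keys, for inheritance
        "renderings": renderings,    # per-language surface forms
    }).encode("utf-8")
    req = urllib.request.Request(endpoint, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["abstraction_key"]     # e.g. "N0456789"

# register_term("2016 LX Honda Civic Blue 1.7L", "noun",
#               parents=["N0009876"],                 # hypothetical key for 'car model'
#               renderings={"en": "2016 LX Honda Civic Blue 1.7L"})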

Write Once, Use Everywhere...

An abstraction-based, language-neutral representation of digital assets eliminates multiple names for the same business object or process, a costly ailment common to IT departments in most corporations or federal agencies. Ultimately, applications may use FirstName, FIRST_NAME, or Nombre, but in the code their abstraction, the same interoperable business object, acts as their proxy. (CMS Wire)
A word such as 'Employee' is stored as a sequence of letters E, m, p, l, o, y, e and e, which lets a human English speaker understand that it is related to an employer, an office, a mailing address or a salary.

For a computer, it is nothing more than this specific sequence of letters. Abstractions let computers 'understand' this word and relate it to a nexus of concepts, going beyond RDF 1.1.
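A minimal sketch, assuming a hypothetical lookup table, of how several surface renderings (FirstName, FIRST_NAME, Nombre, 'Employee', ...) can all resolve to a single abstraction key, which in turn exposes related concepts.

# Resolving raw character sequences to abstractions and their related concepts.
RENDERING_TO_ABSTRACTION = {
    "FirstName": "N0002001",
    "FIRST_NAME": "N0002001",
    "Nombre": "N0002001",        # another rendering of the same business object
    "Employee": "N0001234",
}

RELATED_CONCEPTS = {
    # abstraction key -> nexus of related abstraction keys (employer, salary, ...)
    "N0001234": ["N0001235", "N0001301", "N0004410"],
}

def understand(token):
    """Map a raw character sequence to its abstraction and related concepts."""
    key = RENDERING_TO_ABSTRACTION.get(token)
    if key is None:
        return None
    return {"abstraction": key, "related": RELATED_CONCEPTS.get(key, [])}

print(understand("FIRST_NAME"))   # same abstraction as 'FirstName' or 'Nombre'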

A huge task!

An individual can start up a project by himself. But if s/he succeeds in gathering people around it, and if there is a need for such a project, even one not yet manifested, they may open-source it and multiply resources.
In such a unified world, there is no such thing as 'a foreign language'. One can always* find a term, an expression or a formula to define 'something' in a similar, or very close, way, making them 'true' synonyms - cognates. I hear critics of this assertion...
'Synonym' here refers to all words written in all possible alphabets or formulations. The character 山, meaning 'mountain', is pronounced 'yama' in Japanese and 'shān' in Mandarin Chinese, simplified or traditional.
*Some languages grant various hues of meaning to the same word or expression, but one can still associate 'something' with each such meaning, such as a description.
For instance, in the Inuit language (Inuktitut), there are dozens of 'words' to refer to snow and ice. Fifty-two, to be precise. The 'something' mentioned above associates an abstraction with any word or expression; this abstraction serves as an umbrella for all synonyms, single vocables or expressions, across other languages, giving access to one or more definitions (viewpoints).
Distance may be described as 'speed x time', a formula, or by a literal definition. These are two viewpoints.

A Global Representation Framework

Lexikl (pronounce "Lexical") is an extensible artificial language, helping form a mesh of correlated concepts, symbols, predicates involved in the genesis and evolution of thoughts. Its 'words' are conceptual or functional units linked to one or more words.

Its goal is to help create complex information structures, understand*, translate, represent, store and disseminate information, and offer a single representation of any data, proposition/predicate, rule, clause or sentence across natural languages and measurement systems, for usage worldwide, implementing the “Write once, use everywhere” (WOUE) paradigm to achieve Global Interoperability. It is open source, to serve as a Knowledge Management Platform.

*A first layer of semantics, some idiomaticity; no metaphors, metonymy, synecdoches or other tropes yet.

Lexikl is very compact; in contrast to QUELIC, it is designed to be understandable by other non-QUEL systems.

Word or concept-based instances?

Words help describe concepts. Words are labels mapping onto our mental-conceptual representations and structures. But words lie, words are ambiguous, deceitful. In most languages, the same utterance comes with different meanings (polysemy) or different lexical classes.
'Take off', for instance, means both the initial phase of a plane's flight and the action of removing a piece of clothing or a shoe. And its antonym: I landed a job!
'Since' as an adverb or a preposition has a different meaning than as a conjunction: the former marks time elapsed, involving a past tense, while the latter means 'because', typically involving the present tense. These are different concepts.
Concepts describe real-world objects, ideas, processes and events. Using concepts instead of words helps escape the ambiguity due to synonyms or quasi-synonyms, idiomaticity, metonymy, synecdoche, tropes, lemmas, and all the ways to term the same meaning (to say the same thing!).

Multiple-word concepts

The two expressions 'Date_Created' and 'Creation date' represent the same concept. Linguists will probably disagree, based on the usage of 'past participle-noun' vs 'noun-noun' phrases, but the important fact is that the concept-entity 'Date_Created' be addressed in the same coherent way, notwithstanding the language that defines it, or the words used to form the term. In other 'words', it is important to define an abstraction which serves as an entry point to the concept itself and all its avatars.

Lexikl has already broken down these homonyms into their respective concepts and abstractions, and created the link to true synonyms.

Parsers will use said abstractions to compose natural-language-agnostic text as a sequence of unique concepts separated by spaces and punctuation, as is text in most languages.
Note that although concepts live through abstractions, these abstractions act as the words of a universal language.
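A hedged sketch of this parsing step: turning a natural-language sentence into a language-agnostic sequence of abstraction tokens separated by spaces and punctuation. The token inventory and keys are invented for illustration.

# Compose language-agnostic text from a tiny, hypothetical lexicon of abstractions.
LEXICON = {
    "the": "D0000001", "plane": "N0003300", "takes off": "V0001100", "at": "P0000020",
}

def to_lexigo_text(sentence):
    tokens, i = [], 0
    words = sentence.lower().rstrip(".").split()
    while i < len(words):
        # prefer multi-word concepts ('takes off') over single words
        pair = " ".join(words[i:i + 2])
        if pair in LEXICON:
            tokens.append(LEXICON[pair]); i += 2
        elif words[i] in LEXICON:
            tokens.append(LEXICON[words[i]]); i += 1
        else:
            tokens.append(words[i]); i += 1           # unknown word kept as-is
    return " ".join(tokens) + "."

print(to_lexigo_text("The plane takes off at 6:42."))
# -> "D0000001 N0003300 V0001100 P0000020 6:42."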

Abstractions: Lexigos at Work

Lexikl uses pure plain-ASCII Lexigos as building blocks, abstractions of concepts. They now amount to over 2,492,000 tokens, and serve as edges and vertices in the hypergraph, and as words to form documents, predicates, rules, knowledge, database elements, or as schema or program variables, a useful feature in use case realization and model-driven architectures.
Each [multi-faceted] Lexigo contains a lexical class, encoded 'semantics', and stemming complementing the lemma. It acts as a key to collections of relations, renderings, definitions as 'viewpoints' or 'formulas', versions and authors, a.k.a. 'contexts', and translators, and as a component in a vectorial space of concepts / thoughts.
Proposed categorizations (also concepts), contexts and thesauri are stored in complex collections. Contexts may be customized, including categorization.
Lexigos allow usage in -ductions (abduction, induction and deduction), and can form n-ary predicates, even coupling them with ontologies using [multiple] inheritance properties.
It is important to realize that, while using Lexigos, the information appended to them makes Lexikl a homoiconic language, whose semantic content and values can be used as program variables as well as database objects [table and column names] and contents.
As database contents, the unit and unit ratio are the semantic part, augmented with the base64-encoded, measurement-system-independent value.
Therefore, Lexigos 'designating' action verbs (functional Lexigos) are "first class". Then, the first character of such a Lexigo is a code for a verb.
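A minimal sketch of two details mentioned above: reading a lexical-class code from the first character of a Lexigo, and storing a measurement-system-independent value as a base64-encoded payload next to its unit. The one-letter codes, key formats and JSON-then-base64 layout are assumptions for illustration only.

# Hypothetical lexical-class codes and unit-tagged value encoding.
import base64
import json

LEXICAL_CLASS_CODES = {"V": "verb", "N": "noun", "A": "adjective", "D": "determiner"}

def lexical_class(lexigo):
    """Decode the lexical class carried by the Lexigo's first character."""
    return LEXICAL_CLASS_CODES.get(lexigo[0], "unknown")

def encode_value(magnitude, unit_lexigo):
    """Store a value in a unit-tagged, measurement-system-independent form."""
    payload = json.dumps({"magnitude": magnitude, "unit": unit_lexigo})
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")

print(lexical_class("V0001100"))        # -> 'verb' (a functional, 'first class' Lexigo)
print(encode_value(1.7, "N0007001"))    # e.g. an engine displacement in litres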

An associative neural memory

In addition, Lexikl acts as an associative neural memory framework, as several million concepts paired with Lexigos participate in the hypergraph, using the "Identity Substitution" principle.
If some concept is not found, either as a single term or as an aggregation of terms, one can create concepts on the fly, a useful feature when creating software applications, whose conceptual model is an ontology.
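A hedged sketch of this lookup-or-create behaviour: if a concept is missing from the hypergraph, it is created on the fly and linked to the concepts it aggregates. Data structures and key format are illustrative only, not Lexikl's internals.

# Find a concept or create it on the fly, linking it into a toy hypergraph.
import itertools

_counter = itertools.count(5_000_000)
HYPERGRAPH = {}          # abstraction key -> set of associated abstraction keys
TERM_INDEX = {}          # surface term -> abstraction key

def find_or_create(term, parts=()):
    """Return the concept for `term`, creating it (and its links) if absent."""
    key = TERM_INDEX.get(term)
    if key is None:
        key = f"N{next(_counter):07d}"
        TERM_INDEX[term] = key
        HYPERGRAPH[key] = set()
        for part in parts:                            # aggregation of existing terms
            part_key = find_or_create(part)
            HYPERGRAPH[key].add(part_key)
            HYPERGRAPH.setdefault(part_key, set()).add(key)
    return key

creation_date = find_or_create("creation date", parts=("creation", "date"))
print(creation_date, HYPERGRAPH[creation_date])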

Represent Once, Use Everywhere

Lexikl can be used in:
  • machine translation, entity extraction and search engines
  • building ontologies, inference and rule-based systems, and predicates
  • libraries of business objects and business processes, and complex graphs
  • data representation, transfer and small-footprint data storage
  • harmonizing documents and databases using the same building blocks on a planet-wide scale
  • implementing a global representation to help describe international standards
  • wikis, requirements and specifications

Contexts

Contexts include (proposed - but customizable) first parents for each word, allowing for an infinite number of classifications and ontology / application building - implementing [multiple] object-orientation naturally.
Lexigos can be used as column names or data in databases, and in JSON manipulation.
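A small sketch of that usage: Lexigos as keys and values in a JSON document, so the same record can be presented under any language's column names. The keys and the French renderings are hypothetical.

# A language-neutral record keyed by Lexigos, rendered under one language's labels.
import json

record = {
    "N0002001": "Lisa",            # abstraction standing for FirstName / FIRST_NAME / Nombre
    "N0005100": "Q0AxLjdMAA==",    # abstraction for a measured value (placeholder base64 payload)
}

RENDERINGS_FR = {"N0002001": "Prénom", "N0005100": "Valeur"}

def render(record, renderings):
    """Present a language-neutral record under one language's column names."""
    return {renderings.get(k, k): v for k, v in record.items()}

print(json.dumps(render(record, RENDERINGS_FR), ensure_ascii=False))
# -> {"Prénom": "Lisa", "Valeur": "Q0AxLjdMAA=="}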
Functional elements are represented as predicates, e.g. 'Carry' equals "Move(Object, Object)", etc.
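A hedged sketch of representing functional elements as predicates, here as plain Python tuples so they could feed a rule-based system; the predicate names and keys are illustrative, not Lexikl's actual encoding.

# 'Carry' rendered as a Move(...) predicate over two abstraction keys.
def predicate(name, *args):
    return (name, args)

carry = predicate("Move", "N0001234",   # agent: e.g. the 'employee' abstraction
                          "N0006600")   # object: e.g. the 'piano' abstraction

def matches(fact, pattern):
    """Very small matcher: None in the pattern acts as a wildcard."""
    name, args = fact
    p_name, p_args = pattern
    return name == p_name and all(p is None or p == a for p, a in zip(p_args, args))

print(matches(carry, predicate("Move", None, "N0006600")))   # -> True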
Currently implemented: the hypergraph, part of a custom grammar, and a small rule-based expert system.

Lexikl is compatible with / usable by:
  • Java, C#, C++, Python, JSON
  • Scala, Factorie, Prolog, LISP
  • Oracle, SQL Server, DB2, MySQL
  • Redis, RDF, Neo4j, Sesame
  • PostgreSQL, MongoDB, CouchDB
  • Fortran, Cobol, ...

How do humans "understand"?

Each word we use comes in various types, and each type is associated with one or more functions.
There are eight major such roles in all languages. These roles, listed below, are known as 'lexical classes'.
There are also symbols, icons, prefixes and suffixes, etc.

Determiners: 'the', 'a', 'an', indicate the kind of reference for the noun, either direct or indirect: 'the' apple fell from the tree, or 'an' apple fell from the tree; if using 'the' (direct determiner, or article), we know which object is talked about.

- Verbs indicate action (dynamic verbs), and state (stative, non-progressive verbs). Or, 'the fact that' - 'like' 'dislike' 'love' 'hate' 'prefer' 'remember' 'forget' 'believe' 'mean' 'seem' 'understand' 'want' 'need' 'know' 'belong' 'own'...
Some verbs can be both state and action verbs depending on their meaning: in "I think you made a mistake", 'think' = 'believe'.

Nouns

Nouns represent entities that may act as a subject, or as a direct or indirect complement: a/ A direct object is the receiver of the action within a sentence, as in "He hit the ball" or "Cats eat mice". b/ An indirect object is the receiver of the direct object within a sentence, as in "I offered Lisa a gown", where Lisa is the indirect object (I gave a gown to her). There is also a class of nouns representing entities that can act as subject or indirect object complement: proper nouns may express names of people, of places, of events. Their first letter is uppercase in languages that apply this rule.
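A hedged illustration connecting these grammatical roles to the predicate style used elsewhere in this document: "I offered Lisa a gown" expressed as a predicate whose arguments are the subject, direct object and indirect object. The keys are invented.

# Grammatical roles mapped onto a predicate over abstraction keys.
offer = {
    "predicate": "Offer",          # functional Lexigo (verb)
    "subject": "N0009001",         # 'I'      (the speaker)
    "direct_object": "N0007700",   # 'a gown' (receiver of the action)
    "indirect_object": "N0008800", # 'Lisa'   (receiver of the direct object)
}

# The same structure, rendered positionally: Offer(subject, direct_object, indirect_object)
print(f'{offer["predicate"]}({offer["subject"]}, {offer["direct_object"]}, {offer["indirect_object"]})')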

Semantic clusters

In addition to offering a universal naming system, Lexikl provides clusters of related concepts [global semantics] and the contexts in which they may exist / evolve, tagged with Lexigos. Each Lexigo is a unique concept identifier (UCI) serving as a pointer to this information, like a distinguished name in LDAP: unambiguous (it identifies one concept only) and unique (no other concept in the lexicon has this name), also covering first names and acronyms.
Each Lexigo may, like words, be used as an operand in statistical methods.
Note that in dynamic-cluster-forming applications, a cluster may vary as an aggregation of concept identifiers, depending on world events. It may include a [constant] network of concepts, but some world event may add to the cluster. For instance: oil will 'cover' oil (reflexive), OPEC, barrel, dollar, etc. These themselves define the term (viewpoint) with given renderings (pronunciation, utterances in various languages), etc.
If oil is found somewhere, Lexigos for the place and related information will increase the cluster size.
The mean density of associations may be computed, and a threshold defined to set the lower boundary, hence the size of the resulting cluster.
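A minimal sketch, under simple assumptions, of this dynamic cluster: association weights around a seed concept ('oil'), a mean density, and a threshold used as the lower boundary that fixes the cluster's size. The weights are invented.

# Grow a cluster around 'oil' by thresholding on the mean association density.
associations = {                 # neighbour concept -> association strength with 'oil'
    "oil": 1.00, "OPEC": 0.82, "barrel": 0.78, "dollar": 0.55,
    "pipeline": 0.40, "penguin": 0.02,
}

mean_density = sum(associations.values()) / len(associations)
threshold = mean_density          # one possible choice of lower boundary

cluster = {c for c, w in associations.items() if w >= threshold}
print(round(mean_density, 3), cluster)
# a world event (e.g. oil found somewhere) would add place/news concepts and regrow the cluster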

Semiotics

The Lexigo is an extra element, an abstraction referencing Peirce's semiotic triad, where the concept is the sign, the rendering the signifier, and the viewpoints the signified.

But Are Lexigos "real" words?
As per the abovementioned triadic relation, the Lexigo is used to replace all occurrences of the same word, or of expressions with similar meaning in different languages. As it comprises lexical information, it definitely acts as a real word. It even conveys complex semantic meta-information, more than the initial concept of 'word'. If someone says he does not relate to A1875 as equivalent to a 'white [thisBrandOfStoves] stove', and 'A' stands for a noun, which a stove is, then we can send him to the brand's web site, where the online catalog will confirm the adequacy of the reference to the object.
An objection can be: "Yes, but in another online catalog, it will be 'BE31A47'." Fine. This may be compared to the same concept expressed in a foreign language - a cognate. And then the reader may imagine the concept, or associate it with other words, a process known as 'reading'.

A Periodic Table of Concepts?

Can we derive classes of concepts from some base symbol or concept? Take a line, for instance; 'hold' it in front of you. It defines a boundary, a threshold: 'within' and 'beyond'. The concepts of 'enough' and 'too much' become apparent - and their opposites (antonyms), therefore 'satiety' and 'plethora', and more. The common language says it all: "You draw the line", meaning "to set a limit at something, to decide when a limit has been reached" (The Free Dictionary).
Now, if someone else draws a line close to you, then you may feel 'coercion'.
In the same spirit, one can start building a periodic table of concepts to understand reality - at least, 'some' reality of the mind [see: "Fauconnier mental spaces"]. Note that logic is wired in the flesh and works fine. Other, external, factors may interfere to break its functionalities, but our set of logical units works admirably.
Now, still holding the line in front of you, turn 360 degrees, till you come back to the same position you started from. You define a circle, you draw a wall around you: the magic circle. Eons of walls, taboos, inclusions and exclusions appear 'magically' following this simple rotation.

Babel

In his book Babel: The Language of the 21st Century, Abraham Alain Abehsera observed that some homonyms in one language are paired with homonyms in another language, where the corresponding words are synonyms across the two languages. He calls such a group of four words - paired as homonyms within each language, and as synonyms across them - a Babel square.
In French, for example, 'mèche' and 'méchant' sound alike. This coincidence is reproduced in English, as 'wick' and 'wicked' respectively. Countless repetitions of this order reflect, he argues, the existence of physical fields that affect all of nature, where birds of a feather flock together. This is the case for 'wick' and 'wicked', which share the property of 'being twisted'.
This phenomenon happens with words having different etymologies, and across centuries. He then demonstrates this happens across linguistic groups, and provides many examples.
The result is that an underlying, hidden network of meanings suddenly comes to light, showing very bizarre associations between woman and mare, bride and bridle, born and burn, and wall and cooking.
It shows that the words used in each language are pieces of a huge puzzle which, once reconstituted, forms a universal language, a natural language with remarkable properties where words are not simply tags (phonetic or graphic 'renderings', in Lexikl), but explain the objects they point to and reproduce their substance. This brilliant discovery is paired with a theory of linguistic force fields, which helps find root meanings, very different from Wierzbicka's Semantic Primes.

Letters, Ideograms, Icons


Finding true semantic primes is an essential stage in defining a linguistic algebra that would allow programmatic composition and validation of sentences. Currently, a project (TBD) is under way to turn words into their consonant counterparts - with substitutions [e.g. 'd' for 'th', 'p' for 'ph', 'g' for 'w', etc.] - and to pair homonyms across languages at large to find counterparts and hidden meanings.
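A hedged sketch of that consonant-skeleton idea: strip vowels and apply a few substitutions ('th' to 'd', 'ph' to 'p', 'w' to 'g', ...) so that candidate cross-language pairs can be grouped. The substitution table and word list are illustrative only.

# Group words by a normalized consonant skeleton.
import re
from collections import defaultdict

SUBSTITUTIONS = [("th", "d"), ("ph", "p"), ("w", "g")]

def skeleton(word):
    w = word.lower()
    for old, new in SUBSTITUTIONS:
        w = w.replace(old, new)
    return re.sub(r"[aeiouy]", "", w)          # keep only the consonant skeleton

def group_by_skeleton(words):
    groups = defaultdict(list)
    for w in words:
        groups[skeleton(w)].append(w)
    return {k: v for k, v in groups.items() if len(v) > 1}

print(group_by_skeleton(["wick", "wicked", "wall", "will", "guard", "ward"]))
# e.g. groups 'guard' with 'ward' and 'wall' with 'will'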

Now, what to think of the phonetic relationships between 'water' and the question "what"? 'What, water', 'was, wasser', 'lama, ma', 'shema, shue'? Could it be that all 'what'-type questions bounce us back to ontology, as we are made of close to 75% water?