Linguistics may help us understand some of the “weirdness” of the genetic code
Linguists have developed a comparison of the genetic code with language in which nucleotides act as letters, and introduced the concept of “semiotic nucleotides”: the minimal elements that make it possible to distinguish between codons, the coding units of DNA. In this approach, the biochemical properties of DNA are treated as informational properties.
The flexibility of the informational approach enables researchers to highlight facts that biochemical features do not explain and that are usually considered deviations from the global regularities of the genetic code. The research is published in the journal Biosystems.
The genetic code has a dual nature: it has not only biochemical properties but also a semiotic, or semantic, dimension. Semiotics is the science that studies the general regularities of information processing through signs. Researchers find similarities between text and the genetic code: for example, genes carry a program for the development of an organism, and that program resembles a script written according to certain rules.
Semiotic theory allows us to consider nucleotides not as biological molecules but as carriers of information. Crucial genetic processes can be described in terms of operations on text: reading, transcription, translation, proofreading, and editing.
Researchers from the Immanuel Kant Baltic Federal University and the Institute of Scientific Information for Social Sciences of the Russian Academy of Sciences noted that the same nucleotide in DNA has a different value in the processing of genetic information depending on its position.
Thus, when proteins are synthesized in a cell according to a “recipe” written in genes, the cell’s special “machines” – the ribosomes – read nucleotides three at a time, and for each such triplet, called a codon, select a specific amino acid. In 32 of the 64 possible combinations of the nucleotides “A”, “T”, “G” and “C”, the third position can be occupied by any of them without affecting the result, that is, the amino acid selected. This happens because the same amino acid can be encoded by several different nucleotide triplets.
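The count of 32 can be checked directly against the standard genetic code. The following is a minimal sketch, not the authors' method: it assumes the standard codon table written with DNA letters (with “*” marking stop codons), and the names `CODON_TABLE` and `third_position_free` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Standard genetic code in DNA letters, bases ordered T, C, A, G.
# "*" marks a stop codon. (Assumption: the standard table, no variants.)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid (or stop signal).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def third_position_free(prefix: str) -> bool:
    """True if all four codons sharing this two-letter prefix
    encode the same amino acid, so the third letter is irrelevant."""
    return len({CODON_TABLE[prefix + base] for base in BASES}) == 1


# Two-letter prefixes for which the third nucleotide does not matter.
free_prefixes = [
    "".join(p) for p in product(BASES, repeat=2)
    if third_position_free("".join(p))
]
free_codon_count = len(free_prefixes) * len(BASES)

print(free_prefixes)     # ['TC', 'CT', 'CC', 'CG', 'AC', 'GT', 'GC', 'GG']
print(free_codon_count)  # 32
```

Eight of the sixteen possible two-letter prefixes fix the amino acid on their own, which accounts for 8 × 4 = 32 codons.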
As a result, to determine the required amino acid, the ribosome, while reading each letter, relies primarily on the “meaning” of its group within the triplet. This is called wobbling, after the “wobble” position of the last nucleotide in a codon. To describe such cases from the point of view of information transmission, the linguists introduced the term “semiotic nucleotides”: the minimal elements that make it possible to distinguish one trinucleotide from another.
In this regard, rather than comparing nucleotides to letters, as is usually done, the scientists proposed linking them to another linguistic entity: the phoneme, understood as a linguistic construct that includes only those features necessary to distinguish signs. The letter is not a unit of language; it only serves to record sound in writing.
The analogy with phonemes makes it possible to explain why the two distinctive features of a nucleotide vary in importance depending on the nucleotide's position within the codon.
This assumes that the minimal units of the genetic code are not nucleotides but their distinctive features. These features differ in significance according to their position within the triplet: the significance is greatest in the second position and smallest, down to zero, in the third. The nucleotide in the third position is present in the physical sense but may be absent in the semiotic sense (that is, in terms of the value of its distinctive features).
Each nucleotide has two distinctive properties: the number of hydrogen bonds it forms (two or three) and the number of rings in its base (one or two). These features govern how nucleotides bind to each other: a nucleotide with two rings pairs with one that has a single ring (and vice versa) but the same number of hydrogen bonds. However, this regularity can be relaxed where the third position of the codon is concerned.
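The pairing rule described above can be written out as a small lookup. This is an illustrative sketch only: the feature values are the standard ones for DNA bases (A and G have two rings, T and C one; A and T form two hydrogen bonds, G and C three), while the names `Features`, `NUCLEOTIDES` and `complement` are hypothetical and do not come from the paper.

```python
from typing import NamedTuple


class Features(NamedTuple):
    rings: int    # rings in the base: 1 or 2
    h_bonds: int  # hydrogen bonds formed in a base pair: 2 or 3


# Each DNA nucleotide described by its two distinctive features.
NUCLEOTIDES = {
    "A": Features(rings=2, h_bonds=2),
    "G": Features(rings=2, h_bonds=3),
    "T": Features(rings=1, h_bonds=2),
    "C": Features(rings=1, h_bonds=3),
}


def complement(base: str) -> str:
    """Return the partner base: opposite ring count, same hydrogen-bond count."""
    f = NUCLEOTIDES[base]
    wanted = Features(rings=3 - f.rings, h_bonds=f.h_bonds)  # 1 <-> 2 rings
    return next(b for b, g in NUCLEOTIDES.items() if g == wanted)


pairs = {base: complement(base) for base in NUCLEOTIDES}
print(pairs)  # {'A': 'T', 'G': 'C', 'T': 'A', 'C': 'G'}
```

Describing each base by two binary features is enough to recover the familiar A–T and G–C pairs, which is the sense in which the features, rather than the nucleotides themselves, can be treated as the minimal units.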
“The use of the semiotic method makes it possible to define the role played by each nucleotide in differentiating codons, and to regard wobbling as a special reading mode. As a product of evolution, the genetic code is semiotically heterogeneous: in half of the codons (32) the third position is irrelevant; in thirty cases it works at half strength (only one feature, the number of rings, is relevant); and only in the case of tryptophan are both features equally involved.
“An informational, semiotic approach enables us to complete the common description of the genetic code. Earlier, Francis Crick, speaking of deviations from the regularities of the genetic code associated with the third position, described them as ‘beyond the obvious meaning’. However, from a semiotic point of view, the particular positions of nucleotides may have a meaningful explanation, as their primary function is to delimit one codon from another, and only their secondary function is to distinguish between codons,” says Dr. Suren Zolyan, PhD in Linguistics, Professor and Senior Researcher at the Institute for Human Sciences of the Immanuel Kant Baltic Federal University.
More information:
Suren Zolyan, On the minimal elements of the genetic code and their semiotic functions (degeneracy, complementarity, wobbling), Biosystems (2023). DOI: 10.1016/j.biosystems.2023.104962