Nonlinear Dynamics, Psychology, and Life Sciences, Vol. 26, Iss. 1, Jan, 2022, pp. 1-19
© 2022 Society for Chaos Theory in Psychology & Life Sciences

Linguistic Behavior of Well-Defined Strings in the Non-Coding Human Genome

Abstract: In this article we perform a top-down analysis of the non-protein-coding human genome
using well-defined parameters, resulting in what we call y-strings.
We show that there are altogether 45,371,328 different y-strings in the
human non-protein-coding genome. We explore statistical properties of the
y-strings and demonstrate that they have many characteristics in common with human words.
We indicate how they are 'packed' in the chromosomes and that each chromosome
has its own specific y-dictionary. We also outline our future work exploring
the linguistic features of y-strings and y-text using methods developed to study human
natural language.

Keywords: sequence analysis, CCIC, long-range correlation, text mining, data mining, natural language processing, network analysis, bioinformatics