THE TECHNOLOGY OF SEMI-AUTOMATIC META-MARKING PROGRAM OF KAZAKH NATIONAL CORPUS

The authors:
Aiman Zhanabekova
Kunsulu Pirmanova
Pages: 470-487
Section: GENERAL AND SPECIFIC LINGUISTICS
URL: http://science-ifl.rudn.ru/470-487/
DOI: 10.22363/09321-2019-470-487

Abstract

The study of the Kazakh language calls for close attention to
corpus linguistics and to its theoretical and practical aspects as
developed at the international level.
Special issues of scientific journals have also been published,
with articles on general and specific questions relating to the design
and functioning of text corpora around the world.
However, it is known that many issues are connected with the corpus.
Corpus linguistics requires special study within Kazakh linguistics.
This includes the definition of corpus linguistics and its basic
concepts, its place in the structure of linguistics, its methods and
approaches, and so on. At the same time, the difficulty in
understanding the theoretical basis of this new trend is that the
corpus is still far from being used in specific research.
Professionals in this field regard corpus linguistics as one of the
linguistic disciplines that study the creation and use of language
corpora. Some scholars treat the subject more narrowly and explain
it only in the context of computational linguistics: “Corpus
linguistics is a branch of computational linguistics concerned with
developing general principles for building and using linguistic
corpora (text corpora) by means of computer technology.”
(Zakharov V.P., 2011: 7)
The concept of computational linguistics, in turn, is usually
interpreted as covering a broad range of computer tools. By
“computer tools” we mean computer programs, language data
processing, the proper organization of computer technology, and
so on (Baranov A.N., 2003: 13-38).
In corpus linguistics the computer is used only as a “tool”. That
is why corpus linguistics cannot do without the computer. However,
given that the computer plays an important role in all forms of
modern education, this alone does not make corpus linguistics a part
of computational linguistics.
The above-mentioned theoretical and practical aspects of corpus
linguistics should also be taken into account when creating a
database of Kazakh language texts in the form of a computer
corpus. As Kazakh corpus linguistics takes shape as a special
branch of Kazakh linguistics, it allows Kazakh language
specialists to use large-scale experimental material, to find the
necessary language data, and to make relevant edits. All this will
offer a new perspective on empirical approaches to Kazakh language
research. It will also help to bring the most important language
material into scientific use.
At present, the global development of corpus linguistics tends to
make full national texts a special object of research. Therefore, an
automated computer database of Kazakh language texts (built with
theoretical and practical considerations in mind) will be the most
important initiative of the “Kazakh National Corpus” in the near
future. The results of such research bear on one of the topical
issues: defining the styles and the structural, semantic, and
functional characteristics of Kazakh texts.
The corpus is not only a collection of electronic versions of texts
in different styles of the language, but a modern language tool based
on computer programs that automatically analyze the language at
different linguistic levels. Therefore, it is necessary to create
programs that perform such automated analyses on corpora. Given
that a computer is capable of dealing with formal models,
linguistic guidance needs to be built into the computer
programs. The linguistic description of each level of the
language, which enables the computer to analyze the language
automatically, is therefore an important task in preparing
linguistic products for applied use.
In Kazakh linguistics, the question of creating national corpora
has been on the agenda since the early years of the XXI century. The
Department of Applied Linguistics at the A. Baitursynov Institute of
Linguistics has been actively involved in creating national corpora
in accordance with the requirements of information technology.
On the initiative of Professor A.K. Zhubanov, methods and
techniques of corpus building were investigated and set out by
A. Zhubanov and A. Zhanabekova in the textbook “Corpus
linguistics” (Zhubanov A.K., Zhanabekova A.A., 2017: 5).
In practice, programs for introducing linguistic definitions have
been developed since 2009, and considerable experience has been
accumulated. Analysis at the morphological level was carried out
first. The automatic program for this is called the
morphological analyzer. D. Tokmyrzaev, a programmer at the
Institute of Linguistics, and K. Koibagarov, a former researcher at
the Institute of Informatics, developed the software. Linguistic
and extralinguistic markings were implemented with the support of
the programmers D. Tokmyrzaev and K. Koibagarov and of
Professor A. Zhubanov, a mathematician and specialist in applied
linguistics. In addition, A. Zhanabekova, doctor of philology and
specialist in applied linguistics, contributed to the
development of the corpus. It is well known that developing
linguistic definitions requires knowledge of the relevant field of
linguistics. Professor A. Zhunisbek works not only on
theoretical problems of phonetics but also on the
practical areas of applied linguistics: textbooks,
methodology, speech synthesis and analysis. Accordingly,
the phonetic markings of the corpus are based on the three-step
guide of A. Zhunisbek (Zhunisbek A., 2018: 80).
In short, computational linguists say that a computer language fund
gives a scholar the ability to look at the subject of study in a new
way. The richer the linguistic fund, the deeper the insight into the
structure of the language, the deeper the understanding of the object
being explored, and the better the “illumination” of that field of
human knowledge. Likewise, researchers’ capabilities will increase
dramatically, new sources of creative energy will emerge, and these
new opportunities will certainly be used to improve the systematic
description of the Kazakh language and a careful understanding of it.
Keywords: corpus, national corpus of the Kazakh language,
linguistic markings, meta-marking.

About the authors

Aiman Zhanabekova (1), Kunsulu Pirmanova (2)
(1) A. Baitursynov Institute of Linguistics
Almaty, Kazakhstan
e-mail: aiman_miras@mail.ru
(2) Al-Farabi Kazakh National University
Almaty, Kazakhstan
e-mail: Kunsulu.Pirmanova@mail.ru

References

Baranov, A.N. 2003. Computational linguistics. Introduction to
applied linguistics: Tutorial. Moscow, 13-38 pp.
Zhubanov, A.K., Zhanabekova, A.A. 2017. Corpus linguistics.
“The Kazakh Language” publishing house, Almaty, 5 pp.
Zhunisbek, A. 2018. Problems of Kazakh linguistics. Abzal-Ai,
Almaty, 80 pp.
Zakharov, V.P., Bogdanova, S.Y. 2011. Corpus Linguistics.
IGLU, Irkutsk, 7 pp.
Sirazitdinov, Z.A., Buskunbaeva, L.A., Ishmukhametova, A.Sh.,
Ibragimova, A.D. 2013. Information systems and databases of the
Bashkir language. Book Chamber of the RB, Ufa, 32 pp.