Development of Stemming Algorithm for Rejang Language Stemmer Based on Rejang Language Morphology

Sastya Hendri Wibowo, Busono Soerowirdjo, Ernastuti, Avinanta Tariga

Stemming is the process of obtaining the root, stem, or base word of each word in a sentence by separating the base word from its affixes, such as prefixes and suffixes. Stemming algorithms differ from one language to another. For example, English has a different morphology than Indonesian, so the stemming algorithms for the two languages also differ. For English text, stemming requires only the removal of suffixes. Indonesian text is more complex, because additional affix variations must be removed to obtain the base word. The Rejang algorithm is a stemming algorithm for Rejang language morphology. It is developed from an Indonesian stemming algorithm, the UG18 algorithm, by first studying how UG18 works and then analyzing its strengths and weaknesses. Stemming with the Rejang algorithm uses affix-removal rules covering prefixes, infixes, suffixes, and confixes. The stemming process starts by looking up the word in the base word dictionary, and then stemming is carried out using the REJANGS algorithm. Testing covers stemming of individual words and of whole documents. The dataset used for testing consists of Rejang text documents that have been converted to .txt format. The stemming results report the number of words, the number of words in the system, and the stemming processing time. The results of the REJANGS algorithm can be applied in a base word search application for the Rejang language.
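The flow described in the abstract (dictionary lookup first, then affix removal, with word counts and processing time reported) can be illustrated with a minimal sketch. This is not the REJANGS algorithm itself: the dictionary entries, the affix lists, the function names, and the order in which rules are tried are all hypothetical placeholders and do not represent actual Rejang morphology. The sketch also assumes that "the number of words in the system" means the number of stems found in the base word dictionary.

```python
import time

# Hypothetical placeholder data: NOT real Rejang vocabulary or affixes.
BASE_WORDS = {"makan", "tulis", "baca"}
PREFIXES = ("me", "di", "ber")
SUFFIXES = ("kan", "an", "i")
INFIXES = ("em", "el")


def candidates(word):
    """Yield candidate stems, each produced by removing one affix pattern."""
    # Prefix removal
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            yield word[len(p):]
    # Suffix removal
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            yield word[:-len(s)]
    # Confix removal: a prefix and a suffix stripped together
    for p in PREFIXES:
        for s in SUFFIXES:
            if (word.startswith(p) and word.endswith(s)
                    and len(word) > len(p) + len(s)):
                yield word[len(p):-len(s)]
    # Infix removal: the infix is assumed to sit right after the first letter
    for i in INFIXES:
        if word[1:1 + len(i)] == i and len(word) > len(i) + 1:
            yield word[0] + word[1 + len(i):]


def stem(word):
    """Return the base word if one is found, otherwise the word unchanged."""
    word = word.lower()
    # Step 1: look the word up in the base word dictionary.
    if word in BASE_WORDS:
        return word
    # Step 2: try affix removal and accept the first candidate in the dictionary.
    for candidate in candidates(word):
        if candidate in BASE_WORDS:
            return candidate
    return word


def stem_document(text):
    """Stem every whitespace-separated word and report simple statistics."""
    start = time.perf_counter()
    words = text.split()
    stems = [stem(w) for w in words]
    found = sum(1 for s in stems if s in BASE_WORDS)
    elapsed = time.perf_counter() - start
    return {"words": len(words), "found": found,
            "seconds": elapsed, "stems": stems}
```

Calling `stem_document` on the contents of a .txt file would return the word count, the number of stems found in the dictionary, and the elapsed time, mirroring the three quantities mentioned in the abstract. A real stemmer would replace the placeholder lists with the language's actual affix rules and a full base word dictionary.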

Volume 11 | 05-Special Issue

Pages: 1858-1870