Automatic Dewey Decimal Classification of Indonesian Book Metadata Using IndoBERT with Weighted Loss and Context Enhancement

Joko Purwanto; Fajar Mahardika; Adlan Nugroho

doi:10.58602/jaiti.v4i2.258

Joko Purwanto (Corresponding Author), Politeknik Negeri Cilacap
Fajar Mahardika, Politeknik Negeri Cilacap
Adlan Nugroho, Politeknik Negeri Cilacap

Keywords: Context Enhancement, Dewey Decimal Classification, IndoBERT, Automatic Classification, Weighted Loss

Abstract

This study proposes a framework for automatic Dewey Decimal Classification (DDC) of Indonesian book metadata that combines the IndoBERT model with weighted loss and context enhancement mechanisms. The rapid growth of digital book collections poses significant challenges for classification efficiency and information retrieval, while manual DDC classification still relies on librarian expertise and is relatively time-consuming. The dataset comprises 2,516 book metadata records obtained through the Google Books API and mapped into 14 DDC categories. Context enhancement is implemented by merging each book's title and description into a single text representation, while weighted cross-entropy loss, random oversampling, and simple data augmentation are applied to address class imbalance. Model performance is evaluated using accuracy, precision, recall, and F1-score. Experimental results show that the proposed approach achieves an accuracy of 90.14% and a weighted F1-score of 90.15%, outperforming the baseline IndoBERT model, which achieved only 47.82% accuracy and a 47.06% weighted F1-score. These findings indicate that combining weighted loss with a contextual text representation can improve the semantic understanding of book metadata while reducing bias toward the majority class in Transformer-based DDC classification.
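The abstract names the main ingredients of the pipeline (title and description merging, weighted cross-entropy, random oversampling, weighted F1) without specifying their exact form. The following is a minimal, dependency-free Python sketch of those ingredients under stated assumptions, not the authors' implementation: the join format in `build_context_text` (separator `". "`), the inverse-frequency weighting scheme w_c = N / (K · n_c), and all function names are hypothetical. The loss reduction mirrors the weighted mean used by PyTorch's `torch.nn.CrossEntropyLoss` when a `weight` tensor is supplied, which is how such a loss would typically be applied during IndoBERT fine-tuning.

```python
import math
import random
from collections import Counter


def build_context_text(title, description, sep=". "):
    """Context enhancement (hypothetical format): join title and description
    into one input string, skipping whichever field is empty."""
    parts = [p.strip() for p in (title or "", description or "") if p and p.strip()]
    return sep.join(parts)


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting scheme: w_c = N / (K * n_c); classes absent from
    `labels` get weight 0.0 (they never appear as targets anyway)."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted cross-entropy: sum_i w_{y_i} * -log p(y_i) divided by
    sum_i w_{y_i} (the weighted-mean reduction)."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        num += -w * log_softmax(logits)[y]
        den += w
    return num / den


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each minority class until every
    class matches the majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for t, y in zip(texts, labels):
        by_class.setdefault(y, []).append(t)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = list(texts), list(labels)
    for y, items in by_class.items():
        for _ in range(target - len(items)):
            out_texts.append(rng.choice(items))
            out_labels.append(y)
    return out_texts, out_labels


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n_c / total
    return score
```

One design point the abstract leaves open is how weighting and oversampling interact: if the class weights were computed after fully balancing the training set, every weight would equal 1.0 and the weighted loss would reduce to ordinary cross-entropy. This sketch therefore assumes the weights are derived from the original (pre-oversampling) label distribution.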
