Recommendation in Online Q&A Communities Based on BERT Pre-training Technique

Navid Khezriyan, Jafar Habibi, Issa Anamoradnejad

Abstract

Online Q&A and open-source communities use tags and keywords for indexing, classification, and thematic search. In this study, we propose TagBERT, a novel model that combines BERT with deep learning to recommend tags for new questions. First, the preprocessed sentences are converted into numerical vectors by the BERT tokenizer. Next, a convolutional neural network (CNN) extracts features from these vectors, and a deep neural network (DNN) is then trained on the extracted features to recommend tags. We evaluated the model on four datasets: Free-code, UNIX, WordPress, and Software Engineering. The proposed model achieved higher precision than the baseline deep-learning and conventional methods. In contrast to previous studies, in which precision dropped significantly as the number of recommended tags increased, the precision of our model did not vary appreciably as the number of tags grew.
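
To make the evaluation claim concrete, the following is a minimal sketch of how precision can be measured as the number of recommended tags k grows. It assumes the common definition of precision@k (hits among the top k recommendations, divided by k); the exact definition used in the study is not shown in this section. The function names and the toy questions and tags are hypothetical and serve only as illustration.

```python
from typing import List, Sequence, Set


def precision_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int) -> float:
    """Fraction of the top-k recommended tags that appear in the ground truth.

    Assumed definition: |top-k recommendations ∩ true tags| / k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_tags[:k]
    hits = sum(1 for tag in top_k if tag in true_tags)
    return hits / k


def mean_precision_at_k(
    recommendations: List[Sequence[str]],
    ground_truth: List[Set[str]],
    k: int,
) -> float:
    """Average precision@k over all questions in an evaluation set."""
    if len(recommendations) != len(ground_truth) or not recommendations:
        raise ValueError("inputs must be non-empty and of equal length")
    scores = [
        precision_at_k(ranked, truth, k)
        for ranked, truth in zip(recommendations, ground_truth)
    ]
    return sum(scores) / len(scores)


# Hypothetical ranked recommendations for two questions (best tag first).
recs = [
    ["python", "pandas", "numpy", "csv", "excel"],
    ["bash", "shell", "linux", "awk", "sed"],
]
# Hypothetical ground-truth tags assigned by the question authors.
truth = [
    {"python", "pandas", "csv"},
    {"bash", "awk"},
]

if __name__ == "__main__":
    # Report how precision changes as more tags are recommended.
    for k in (1, 3, 5):
        print(f"precision@{k} = {mean_precision_at_k(recs, truth, k):.3f}")
```

Because the denominator is k, each additional recommended tag must be correct for the score to hold, which is why precision usually falls as k increases. A model whose precision stays nearly flat across k is therefore ranking relevant tags well beyond the first few positions.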

Keywords

Tag recommendation, Online Q&A communities, Open-source communities, Classification, BERT

The CSI Journal on Computer Science and Engineering, Vol. 18, No. 2, 2021