A hybrid user-item-based collaborative filtering model for e-commerce recommendations

The COVID-19 pandemic deepened understanding of e-commerce as an extremely promising sphere. Nowadays, even small businesses are widely using e-shops and e-markets. Thus, small and medium-sized e-commerce companies need powerful, flexible recommender systems, which do not require significant


INTRODUCTION
Due to the rapid growth of e-commerce, entrepreneurs face an urgent need to stimulate consumers by giving them valuable advice. The rapid growth and variety of information available on the web and the speedy development of new e-commerce services (buying products, product comparison, different auctions, etc.) frequently overburden customers and lead them to poor decisions (Vynogradova et al., 2020; Yasynska et al., 2019). This negatively affects profitability.
The solution is seen in the implementation of recommender systems (RSs). Over recent years they have proven to be effective in decreasing the level of information pressure on customers. The key idea of an RS is to direct a consumer to an object (product, service, etc.) that is new to him/her and at the same time relevant to his/her current goals. This is a crucial factor in substantiating the relevance of developing the RS concept. Ricci, Rokach, and Shapira (2015) define RS as "software and techniques providing suggestions for items to be of use to a user." RSs were first developed in the early 1990s (Resnik et al., 1994; Schafer et al., 1999). In those days, a large number of algorithms and approaches to the creation of such systems appeared (Ricci et al., 2015; Aggarwal, 2016). Since that time, RSs have had great potential for application in e-commerce. In this case, products, manufacturers or suppliers, product sets or service packages become the objects of recommendation. Riedl (1999, 2001) outlined the features of RS usage in e-commerce and showed its capacity to overcome information overload and to achieve the effect of mass customization. In their opinion, RSs improve e-commerce sales in three ways: converting "browsers" into buyers, increasing cross-sell, and building loyalty. Ricci et al. (2015) listed some other reasons for companies to use RS technology on their sites: to increase the number of objects sold, to sell more diverse objects, to increase customer

The problems discussed point to the absence of a universal and accessible (given the limited resources of enterprises) tool for communication between the seller and the buyer. To improve widespread approaches and search for new ones, the individual characteristics of consumers should be used both to combine the existing algorithms effectively and to develop new ones. Consequently, this determines the relevance of research on RSs.
The purpose of this study is to develop a hybrid model that achieves more accurate and high-quality e-commerce recommendations through an efficient combination of collaborative filtering techniques that do not require significant computing power and are easy to integrate into the website's architecture. Such RSs will be useful for medium-sized commerce companies that are gaining ground in the electronic market.
The paper is organized as follows. The next section reviews the related literature. The description of the main collaborative filtering techniques is given in the third section. The proposed hybrid model of collaborative filtering is presented in the same section. In the fourth section, the main experimental results are shown. Finally, conclusions and opportunities for further research are considered in the last section.

LITERATURE REVIEW
The section of the literature review presents the related studies that demonstrate the broad acceptance of a hybrid approach in the development of RSs and confirm the variety of hybridization methods.

Popularity of the methods combination
A comparison of different recommender algorithms by accuracy, complexity of preparation, and optimization time, presented in (Hnot, 2016), suggests that, when methods are used separately, the effectiveness of an RS is directly connected to its complexity. Today, only hybridization makes simple classical methods usable for more accurate predictions.
Burke classified the main models of hybrid RSs as follows: weighted hybridization, switching hybridization, mixed hybridization, feature combination hybridization, cascade hybridization, feature augmentation hybridization, meta-level hybridization (Burke, 2007). These models differ in architecture and organization of the usage of the results for each involved algorithm in the hybridization process.
Besides, it is common to combine collaborative filtering techniques with another method or with traditional measurement metrics to avoid the cold-start problem. Feng et al. (2021) considered the extreme case of sparse data (the new user cold-start problem) and proposed a collaborative filtering ranking model, which combines rating-oriented and pairwise ranking-oriented approaches, namely Probabilistic Matrix Factorization and Bayesian Personalized Ranking. Ajaegbu (2021) proposes to balance three traditional measurement metrics (cosine-based similarity, Pearson correlation similarity, and adjusted cosine similarity) for cold-start situations. The algorithm complements the strengths of the traditional metrics, as evidenced by measurements of the Mean Absolute Error.
The widespread use of collaborative filtering-based hybridization is confirmed by an analysis of research over the past two years.
Walek and Fojtik (2020) proposed a monolithic hybrid RS, which combines a collaborative filtering system, a content-based system, and a fuzzy expert system for use in the area of movie recommendation. Wu, Sang, and Cui (2021) suggested a semi-supervised ensemble filtering method to improve the recommendation performance by combining three popular collaborative filtering techniques. First, three weak predictors with labelled data are independently initialized by three different algorithms. Predictors are generated by neighbourhood methods and by latent factors model too. The final prediction is achieved by mixing the results from the three predictors enhanced with unlabeled data. Yassine et al. (2021) presented a new intelligent RS that combines collaborative filtering with the popular clustering algorithm K-means. The authors involved user demographic attributes (gender and age) to create segmented user profiles. All of the proposed RSs improve the performance and time response speed of the traditional collaborative filtering and the content-based techniques.

Deep learning and other approaches
Most state-of-the-art studies on RS development refer to the use of deep learning methods. Martins, Papa, and Adeli (2020) conducted a systematic review focusing on deep learning techniques applied in collaborative filtering recommendations. The authors considered the diverse non-linear modelling strategies for dealing with rating data and the combination of deep learning techniques with traditional collaborative filtering-based linear methods. Farashah et al. (2021) propose an original method, which consists of four phases: 1) clustering and classification of new users using a Deep Neural Network algorithm; 2) using a collaborative filtering RS based on a hybrid similarity criterion, which is based on user features such as age, gender, and occupation; 3) running an improved Friendlink algorithm to compute the similarity between users; 4) combining the results of the collaborative filtering RS and the improved Friendlink algorithm. Molaei et al. (2021) developed a complete deep learning conceptual model by learning latent social features to embed in a collaborative filtering approach. First, representation learning is applied to the rating matrix to extract the latent social features. Then, an advanced deep learning approach based on a cascade tree forest is involved in the recommendation process.
However, RSs based on deep learning are not available to most small and medium-sized e-commerce companies. Such companies need powerful, flexible tools that do not require significant computing and financial resources. The following systems meet this request in some way. Leng et al. (2021) proposed a novel collaborative filtering approach based on multiple attribute decision making. The method determines the weight of each item and computes the preference similarities between the active and inactive users. According to the preference similarities, the candidate neighbourhood of the active user is determined. The authors used the most frequent item recommendation method to provide top-N recommendations to the active user. Li and Li (2020) developed a collaborative filtering recommendation algorithm based on user characteristics and interests. The similarity calculation based on user score similarity, user attribute feature similarity and user interest similarity was presented. The algorithm uses the number of user evaluations of attributes to determine the interest of users in different attributes. The similarity calculation formula is used to estimate the interest similarity score between users. The final similarity for recommendation is a combination of the user attribute feature similarity, user interest similarity, and the user rating similarity. However, these studies do not take into account the user review content.

Analyzing User Review Content
The RSs, which involve such elements as overall product rating, and users' comments, are much more appropriate resources to support consumer decision-making in e-commerce. In this context, research on hybrid algorithms using text comment analysis is noteworthy. Fidan (2020) demonstrated that the weighting methods such as Term Frequency and Term Frequency-Inverse Document Frequency as well as commonly used classification algorithms such as Naïve Bayes and SVM have some inadequacies in short text analysis. The author developed a grey relational classification model based on the Vector Space Model and Bag of Words. Yang et al. (2021) proposed a hybrid personalized recommendation model that extracts user preferences by analyzing user review content in different sentiment polarity at the sentence level, based on jointly applying user-item score matrices and dimension reduction methods. The authors demonstrate that fine-grained emotion recognition has good adaptability to a sparse rating matrix with a reasonable and good performance. Visa and Patel (2021) presented the approach, which initially generates the text score based on users' reviews with the help of opinion mining. Then the rating corresponding to the text estimates is submitted to the input of Convolutional Neural Network which results in better predictions in a product RS.
Bearing in mind the existing research gap, in this paper we propose a new approach to RS development, which includes a modification of the similarity evaluation in the context of the collaborative filtering model and the use of users' text comments as an additional source for a recommendation.

METHODOLOGY AND RESEARCH MODEL
The methodology section describes in detail the collaborative filtering approach to RS development. Then we introduce a new hybrid (bagging) model, which allows circumventing some limitations of the collaborative filtering methods.

Main collaborative filtering techniques
To achieve study purposes, firstly, we introduce the most common algorithmic approaches for generating personalized buying proposals. Jannach et al. (2011) proposed to divide these approaches into two main groups: collaborative filtering and content-based, information filtering. Naturally, it is possible to apply a hybrid approach, meaning to use different techniques within the RSs.
Content-based filtering provides a personalized approach to each user, taking into account the contextual features of both the user and the recommended objects.
Collaborative filtering (CF) techniques use rating data given by many users for many objects as the basis for forecasting missing ratings and generating a top-N recommendation list for a given user, who acts in the system as the active user. There are two main methods of CF: the nearest neighbour (NN) methods and the latent factor methods. The NN methods are based on the principle that consumers who have preferred similar objects in the past tend to prefer similar objects in the future. These methods can be user-based (UBCF) or item-based (IBCF).
The UBCF approach, also known as k-NN CF, belongs to the class of memory-based algorithms. Originally, the methodology was presented in (Resnik et al., 1994). In the UBCF, the objects recommended to a consumer are those that other consumers with similar preferences have chosen previously. A very important decision is the choice of similarity function. The literature (Ricci et al., 2015; Jannach et al., 2011; Herlocker et al., 2004) presents several different approaches for determining the similarity function: Pearson correlation coefficient, Spearman rank correlation coefficient, cosine similarity, etc. However, empirical analyses show that for UBCF the Pearson coefficient outperforms other measures for comparing users (Ricci et al., 2015; Herlocker et al., 2004).
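The UBCF idea described above can be illustrated with a minimal sketch. The toy rating data, the function names `pearson` and `predict_ubcf`, and the mean-centred weighted average used for prediction are assumptions made for illustration; they are not taken from the paper's implementation.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"tea": 5, "jam": 3, "oil": 4},
    "bob": {"tea": 4, "jam": 2, "oil": 3, "rice": 4},
    "cat": {"tea": 1, "jam": 5, "oil": 2, "rice": 1},
}

def pearson(a, b):
    """Pearson correlation between two users over their co-rated items."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0

def user_mean(row):
    return sum(row.values()) / len(row)

def predict_ubcf(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    neighbours = []
    for other, row in ratings.items():
        if other != user and item in row:
            s = pearson(ratings[user], row)
            if s > 0:  # keep positively correlated neighbours only (assumption)
                neighbours.append((s, other))
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    base = user_mean(ratings[user])
    if not top:
        return base  # fall back to the user's own mean rating
    num = sum(s * (ratings[o][item] - user_mean(ratings[o])) for s, o in top)
    return base + num / sum(s for s, _ in top)

prediction = predict_ubcf(ratings, "ann", "rice")
```

In this toy example only "bob" is positively correlated with "ann", so the prediction for "rice" is ann's mean rating shifted by bob's deviation from his own mean.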
Although UBCF is an effective method, there are some scalability issues when the set of customers grows. To expand CF capabilities for processing large databases and to facilitate the deployment of e-commerce sites, a more scalable algorithm, IBCF, was proposed. The main idea of the IBCF is to form proposals based on the relationship between objects inferred from the rating matrix. The assumption behind this approach is that users will prefer objects that are similar to other objects they like.
The IBCF finds the most similar objects (k-NN is used) to generate predictions and recommendations. In this case, the advantage of IBCF is the ability to calculate the similarity matrix in advance. As for the choice of the similarity function, the IBCF, like the UBCF, uses cosine similarity and the Pearson correlation coefficient. However, as emphasized in (Hahsler, 2013), the cosine similarity measure consistently outperforms the Pearson correlation metric for item-based recommendation techniques.
In RSs with a sufficiently high degree of consumer-object correspondence, one consumer adding or changing a rating is unlikely to significantly alter the similarity between two objects, especially if the object has many ratings. This justifies calculating object similarity in advance and storing it in the similarity matrix. Rows of this matrix can even be shortened to store only the most similar objects. When consumers change their ratings, these data will become a bit outdated, but consumers are likely to still get good recommendations, and the data can be fully updated by recomputing the similarity matrix during a period of low system load.
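The precomputation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy data, the function names, and the `keep` parameter (how many neighbours each shortened row stores) are hypothetical, and cosine similarity is computed on raw ratings.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"tea": 5, "jam": 3, "oil": 4},
    "bob": {"tea": 4, "jam": 2, "oil": 3, "rice": 4},
    "cat": {"tea": 1, "jam": 5, "oil": 2, "rice": 1},
}

def item_vectors(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors."""
    num = sum(u[x] * v[x] for x in set(u) & set(v))
    den = sqrt(sum(r * r for r in u.values())) * sqrt(sum(r * r for r in v.values()))
    return num / den if den else 0.0

def build_similarity(ratings, keep=2):
    """Precompute, for every item, only its `keep` most similar items."""
    items = item_vectors(ratings)
    table = {}
    for i in items:
        sims = [(cosine(items[i], items[j]), j) for j in items if j != i]
        sims.sort(reverse=True)
        table[i] = sims[:keep]  # shortened row of the similarity matrix
    return table

def predict_ibcf(ratings, table, user, item):
    """Similarity-weighted average of the user's ratings of similar items."""
    pairs = [(s, ratings[user][j]) for s, j in table[item]
             if j in ratings[user] and s > 0]
    if not pairs:
        return None
    return sum(s * r for s, r in pairs) / sum(s for s, _ in pairs)

table = build_similarity(ratings, keep=2)  # done offline, e.g. at low load
prediction = predict_ibcf(ratings, table, "ann", "rice")
```

Because `table` is built once and only read at prediction time, a changed rating affects recommendations only after the next rebuild, which matches the trade-off described in the text.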
The first step in preparing a model for a recommendation is to divide the complete sample into training and test samples. We can use several approaches (Hahsler, 2013): splitting, bootstrap sampling and k-fold cross-validation. The last approach splits the total sample into k sets (called folds) of approximately the same size and evaluates the model k times, each time using a different fold for testing. It produces more robust results and error estimates.
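A minimal sketch of k-fold splitting is given below. The function name `k_fold_splits` and the fixed random seed are assumptions for illustration; libraries such as recommenderlab provide their own evaluation schemes.

```python
import random

def k_fold_splits(records, k=5, seed=0):
    """Yield (train, test) pairs; each record is in the test set exactly once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Striding gives k folds whose sizes differ by at most one
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

records = list(range(23))  # stand-in for (user, item, rating) triples
splits = list(k_fold_splits(records, k=5))
```

Each of the five evaluations trains on four folds and tests on the remaining one, so the error estimate is averaged over the whole sample.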
Evaluation of different techniques for developing RSs is an important topic and its reviews were presented in (Herlocker et al., 2004;Gunawardana & Shani, 2009;Silveira et al., 2019). The evaluation metrics concerning CF scenarios were studied in (Martins et al., 2020).
A typical way to evaluate a prediction is to compute its deviation from the true value. This is the basis for the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE). Averaged RMSE and MAE are error metrics corrected for unbalanced test sets. RMSE penalizes larger errors more strongly than MAE and thus is suitable for situations where small prediction errors are not very important.
Not all RSs work on the principle of predicting such a parameter as the rating of an object. They recommend to consumers those objects that potentially can be of interest to them. To evaluate the quality of such a recommendation, a confusion matrix is constructed (Hahsler, 2013). It shows how many objects are classified correctly (true positives and true negatives) and how many are not. Based on this matrix, some basic metrics are calculated for the analysis of recommendation effectiveness, for instance, the accuracy of the prediction. Precision and recall are the two best-known classification metrics; they are also used for measuring the quality of information retrieval tasks in general.
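The two error measures can be computed as in the short sketch below; the sample ratings are made up for illustration.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error: squaring makes large deviations weigh more."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]
errors = (mae(actual, predicted), rmse(actual, predicted))
```

Here the absolute errors are 1, 0, 1 and 2, so MAE is 1.0, while RMSE is the square root of 1.5 (about 1.22): the single error of 2 pulls RMSE above MAE.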
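As an illustration of the confusion-matrix metrics, the sketch below derives precision, recall and accuracy for one top-N list. The catalogue, the recommended list and the set of relevant objects are hypothetical.

```python
def confusion_counts(recommended, relevant, catalogue):
    """Return (TP, FP, FN, TN) for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and relevant
    fp = len(recommended - relevant)   # recommended but not relevant
    fn = len(relevant - recommended)   # relevant but missed
    tn = len(set(catalogue) - recommended - relevant)
    return tp, fp, fn, tn

catalogue = ["tea", "jam", "oil", "rice", "salt", "nuts"]
recommended = ["tea", "jam", "oil"]
relevant = {"tea", "oil", "rice", "salt"}

tp, fp, fn, tn = confusion_counts(recommended, relevant, catalogue)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
```

Two of the three recommended objects are relevant, so precision is 2/3; two of the four relevant objects were found, so recall is 0.5.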
In addition to the above methods, the following user-centric quality metrics are used to evaluate the perceived quality of the RSs: perceived accuracy (also called relevance), novelty, overall users' satisfaction, etc. (Cremonesi et al., 2011).
Traditional neighbourhood-based CF algorithms are widely used in the RSs field. However, they may present such problems as cold-start (the system is unable to make reliable recommendations due to the lack of initial ratings) and the data sparseness (the number of ratings available is generally very small compared to the number of ratings that need to be provided).
To overcome these and other problems of CF, as well as to improve the quality of rating predictions, we can use hybridization, a combination of different approaches of collaborative and content-based filtering. The hybrid approach is the most popular one when RSs are developed for commercial sites.

Hybrid model of collaborative filtering
A new hybrid (bagging) model ST-UIBCF, which allows circumventing the limitations of the methods of IBCF and UBCF, is composed of such components as UBCF (classical method), T-UBCF (UBCF involving text comments to compute the rating matrix), and S-IBCF (IBCF using the properties of predicted objects to calculate their similarity). Thus, the model can use multiple information sources, including text.
To realize this model for the recommendation, the following problems are solved:
− development of the algorithm to the optimal model;
− preliminary processing and analysis of input data;
− transformation of text comments into ratings;
− experimenting to find the optimal RS in different configurations;
− confirmation or refutation of the hypotheses:
1. Hybrid RS demonstrates more accurate results than individual algorithms (without the use of text comments);
2. The use of text comments as the basis for the consumer rating matrix demonstrates higher-precision results in hybrid models;
3. Using the content component to determine the similarity of the objects improves the accuracy of the IBCF model.
The structure of the proposed hybrid model, shown in Figure 1, consists of four main parts. The first part generates the first rating (predicted rating No. 1) by the improved IBCF approach, which is based on combining two types of object similarity. The second and third parts create predicted ratings No. 2 and No. 3 by the UBCF approach: one is a regular forecast from the ratings, and the other is a forecast from text comments converted into a numerical indicator. The three rating types are combined to construct an objective function f in part 4, which is an integrated optimal model. It is configurable by the optimization parameters that are determined during the learning process. Parts 1-3 form the foundation for finalizing the hybrid model in part 4.
The efficiency of an RS based on the CF models depends on the calculation of the similarity of the objects. This similarity is determined by internal and external factors. The classical IBCF approach involves only external factors, defining similarity from the ratings. In fact, the similarity between objects also depends on internal factors, such as the properties of objects that reflect their semantic content (Xu & Raahemi, 2016). In other words, the similarity of the objects depends on both internal and external factors. In this work, internal factors denote the properties that characterize the object and depend on its specificity. For instance, if an object is a product, properties can be trade dress, colour, functionality, price, quality, etc.
where = i j K U U -the set of customers that rated both objects i and j. The symbols i r and j r correspond to the average rating of the objects. The internal similarity of the objects ,  i j I can be rated as: where  -the set of characteristics  pP , ( ( ), , )  sim p i j is a measure of the similarity of the object i to the object j according to the characteristic p of the set  , and ()  p -the weight of the characteristic p (Cheng et al., 2017).
A combination of similarity functions is planned to be carried out based on the model of a switching hybridization based on the indicator of the rating matrix for each object.

Journal of International Studies, Vol. 14, No. 4, 2021
According to the analysis, it is advisable to combine the internal and external similarities of objects. The data sparsity coefficient can be used to balance external and internal similarities. To do this, we determine a weight function between the internal and external similarity, which introduces an element of local similarity into the sigmoid function. Due to this approach, the value of the weight function stays within a range in which internal similarity is always used, since it has already proven its effectiveness (Resnik et al., 1994). Summarizing all the above concerning the integration of similarity functions, the final similarity function combines the external and internal similarities through this weight function.
Another important aspect of the model is the rating prediction based on text comments. First, we need to choose the distribution law that the data are assumed to follow. After that, according to the given examples, the parameters of this distribution are calculated for future classification.
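The balancing idea can be illustrated with a small sketch. Everything specific in it is an assumption, since the paper's weight function is not reproduced here: the number of co-ratings stands in for the local sparsity indicator, and `midpoint` and `steepness` are hypothetical tuning parameters.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def external_weight(n_corated, midpoint=5.0, steepness=1.0):
    """Weight of the rating-based similarity; grows with available co-ratings."""
    return sigmoid(steepness * (n_corated - midpoint))

def combined_similarity(external, internal, n_corated):
    """Blend both similarities; sparse pairs lean on the internal one."""
    w = external_weight(n_corated)
    return w * external + (1.0 - w) * internal

sparse_pair = combined_similarity(0.9, 0.3, n_corated=0)
balanced_pair = combined_similarity(0.9, 0.3, n_corated=5)
dense_pair = combined_similarity(0.9, 0.3, n_corated=20)
```

Because the sigmoid stays strictly below 1 for moderate counts, the internal similarity keeps a non-zero share of the blend, and it dominates when two objects have few common ratings.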
The target class, in this case, is the positive feedback class. As a result, the probability that a comment falls into this class is converted into a discrete five-point rating.
The objective function (1) of the hybrid model includes the vector that specifies the difference between the actual and predicted values and the Frobenius regularization function, which further penalizes the model to avoid overfitting problems. The impact of the regularization is controlled by a constant, which is usually determined by cross-validation. $\|\cdot\|_F$ denotes the Frobenius norm.
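A minimal sketch of learning the component weights is shown below. It assumes one common form of such an objective, the mean squared error of the weighted blend plus `lam` times the squared norm of the weight vector, minimized by plain gradient descent. The function names, the toy predictions and the values of `lam`, `lr` and `steps` are hypothetical and are not the paper's settings.

```python
def objective(w, component_preds, actual, lam):
    """Mean squared error of the blend plus lam * ||w||^2 (assumed form)."""
    n = len(actual)
    sq_err = sum(
        (sum(wk * pk for wk, pk in zip(w, preds)) - y) ** 2
        for preds, y in zip(component_preds, actual)
    )
    return sq_err / n + lam * sum(wk * wk for wk in w)

def fit_weights(component_preds, actual, lam=0.1, lr=0.01, steps=2000):
    """Gradient descent on the objective above, starting from equal weights."""
    n = len(actual)
    m = len(component_preds[0])
    w = [1.0 / m] * m
    for _ in range(steps):
        grad = [2 * lam * wk for wk in w]  # gradient of the penalty term
        for preds, y in zip(component_preds, actual):
            err = sum(wk * pk for wk, pk in zip(w, preds)) - y
            for k in range(m):
                grad[k] += 2 * err * preds[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Hypothetical predictions of (S-IBCF, UBCF, T-UBCF) and the true ratings
component_preds = [(4.5, 4.0, 5.0), (2.0, 3.0, 1.0), (3.5, 3.0, 4.0),
                   (1.5, 2.0, 2.0), (5.0, 4.5, 4.0)]
actual = [5, 2, 4, 1, 5]

start = [1.0 / 3] * 3
weights = fit_weights(component_preds, actual)
```

In a cross-validated setting, `lam` would be chosen on held-out folds, and one weight vector would be fitted per fold.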
Consequently, we propose a new approach to RS development, which includes a modification of the similarity evaluation in the context of the IBCF model and the use of text comments as an additional source for the formation of the rating matrix. In the final configuration, the model can be called Similarity and Text improved - User-Item Based Collaborative Filtering (ST-UIBCF).
For convenience in reading the experimental results that follow, the main acronyms of the models are presented in Table 1.

RESULTS
In this section, the main experimental results based on the Amazon review data are presented. Also, a comparison of the other research results obtained on the same dataset is provided in the Discussion part.
During the 4 series of experiments using 5-fold cross-validation, the following hypotheses are confirmed: 1. the hybrid system exhibits greater accuracy of predictions due to the mutual compensation of the errors of each ensemble component; 2. a CF model can be constructed exclusively on text comments at least without loss of accuracy; 3. usage of the content component in determining the similarity of the objects greatly improves the accuracy of the IBCF model.
To test the proposed hybrid model ST-UIBCF, we choose as our dataset the Amazon Fine Food Reviews dataset, which consists of 568,454 food reviews that Amazon users left up to October 2012 (McAuley & Leskovec, 2013). It includes food information from the Amazon Web Store and customers' ratings and text reviews.
The original database was cleared of unnecessary objects, customers without ratings, and duplicate comments, but it still required additional manipulations to implement the ST-UIBCF approach.
First, it is necessary to prepare the basis for improving the IBCF model in the context of using the semantic features of objects while calculating similarity coefficients. To do this, it is necessary to establish a correspondence between different objects as demonstrated in (Cosley et al., 2003).
The second step is to develop the UBCF algorithm based on the transformation of text comments into a numerical rating. For the experiment, the Linear SVM (Support Vector Machine) algorithm is chosen as the main transformation algorithm; it has proved highly efficient in classifying text data in unbalanced samples with a certain class dominance, which can be assumed from the initial rating distribution (Press et al., 2007). Assuming that there is a relationship between the numerical ratings and the text reviews, and given the fact that the real rating distribution is significantly shifted towards positive ratings (5), it is more appropriate to use binary classification than to determine 5 classes. The classification results turned out to be 23% negative and 77% positive ratings. Classification accuracy is 0.89.
During the experiment, four groups of tests with different sample sizes are performed: 40%, 60%, 80% and 100% of the basic dataset. Table 2 compares the accuracy of the following methods: Popular (the recommendation of only popular objects, used as the baseline); classical IBCF; classical UBCF; T-UBCF (UBCF based on text comments); S-IBCF (IBCF with an improved similarity mechanism); ST-UIBCF (the hybrid model S-IBCF + UBCF + T-UBCF). The results identify ST-UIBCF as the best of the proposed methods. Algorithms T-UBCF and S-IBCF are much more accurate than their classical versions, but they are slightly inferior to ST-UIBCF.
As can be seen in Table 2, the testing results improve greatly as the share of the dataset involved increases to 80% and 100%. This can be explained as follows. As the percentage of sample coverage increases, the dataset is replenished with new consumers, new objects and new ratings that qualitatively supplement the complete dataset. This allows making higher-precision predictions about the similarity of consumers and objects.
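The conversion of a binary classifier's output into a five-point rating can be sketched as below. The equal-width binning of the positive-class probability is an assumption made for illustration; the paper's own conversion rule is not reproduced in this text, and the function name is hypothetical.

```python
def probability_to_rating(p_positive):
    """Map the probability of the positive class to a rating from 1 to 5.

    Assumed rule: five equal-width bins over [0, 1].
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # int() truncates towards zero; min() keeps p = 1.0 in the top bin
    return min(5, int(p_positive * 5) + 1)

examples = [probability_to_rating(p) for p in (0.0, 0.1, 0.5, 0.95, 1.0)]
```

With a Linear SVM, the probability itself would have to come from a calibration step applied to the decision scores, which is a further assumption beyond what the text states.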
The last step is to analyse the structure of ST-UIBCF in each series of experiments. Every group of experiments is completed by the formation of 5 optimal vectors, which determine the weight of each component in the structure of the hybrid method. The average values of the optimal vectors $w = (w_1, w_2, w_3)$ are shown in Figure 2.
A similar situation is observed in all series of experiments. S-IBCF is the dominant algorithm in the composition and plays a decisive role in forecasting ratings, especially as the dataset grows. The impact of UBCF and T-UBCF decreased as the sample size increased. Most probably, this stems from the fact that more consumers, objects, and ratings provided more opportunities to uncover the potential of the S-IBCF, which increased its impact from 62.4% to 76% when the complete sample was used.
Consequently, the experiments carried out confirm all the introduced hypotheses and, in fact, open the potential to continue the research in the direction of optimization and development of classical CF methods that could compete in efficiency with state-of-the-art machine learning algorithms.

CONCLUSIONS AND DISCUSSION
In this section, we conclude our main findings and discuss the possible directions for future research.

Discussion: Other research results based on the Amazon reviews data
Since the publication of the Amazon Fine Food Reviews dataset in 2013 (McAuley & Leskovec, 2013), the collection has been used widely for evaluating recommendation techniques. According to Google Scholar, there are currently 397 citations (last checked: March 19, 2021) to the original paper (McAuley & Leskovec, 2013), where the authors use the collection to evaluate their method, which models consumers' personal evolution or experience for recommendations. To understand how different experience levels affect a consumer's taste, they introduced the latent factor model. Chu et al. (2015) use the dataset to focus on the inner connections and relationships among consumers and objects by letting the model learn consumers' habits and what they purchased. Their models are designed for predicting the rating of different objects based on the data given, such as the time when the review was written, the number of helpfulness votes, as well as the length of the review. The best model they have found is the latent factor model (RMSE=0.78). Zheng et al. (2016) implemented several kinds of regressions, some with features involving text content and some focusing only on non-text features. The most powerful regression with the text feature is the mixture of unigrams and bigrams representation (RMSE=1.06).
Most studies chose the latent factor model as the best one to deal with a recommender problem. In the researchers' opinion, this approach performs really well on data that is chaotic and unpredictable. Later examples using the dataset include Shao et al. (2017), who compared different methods to establish more reasonable and reliable RSs based on latent factor models. The researchers implemented Matrix Factorization, SVD, Deep Learning, Random Forest and Time Series. For the involved dataset, the Matrix Factorization gives a very good prediction: the train set RMSE is 0.184 and the test set RMSE is 0.188. We emphasize that powerful pre-processing of data was implemented and some Deep Learning methods were used to build an RS for Amazon users. The RMSE value of the ST-UIBCF model is higher, but the corresponding RS is less complicated and readily available for a medium-scale Internet portal. The neighbourhood-based CF algorithms are widely used in the RS field for their interpretability and operability.
The conclusions on performance comparisons between different models, stated in the literature, may change when tested on cleaned data. This effect is apparently stronger for more complex methods. It is essential to recall the stronger susceptibility of these methods to overfitting. Thus, caution is needed in using data for evaluation and competitive comparison of methods (Basaran et al., 2017).

Conclusions
The experimental results identified ST-UIBCF as the best of the proposed methods. It outperformed the other algorithms in terms of RMSE in all four series of experiments (Androniceanu et al., 2020).
The practical value of the research results lies in a combination of the typical methods of CF that achieves higher prediction accuracy than the individual methods. The proposed approach efficiently reduces the complexity of optimization, compared with the basic methods of machine learning for solving the problem of recommendation (Androniceanu et al., 2019). The proposed hybrid model is capable of using more information sources than individual models. In addition to the ratings, the semantic component and customer feedback are considered as factors for achieving accurate and high-quality recommendations and improving classical methods of CF. Moreover, the significant advantages of this hybrid method are that it requires no additional computing power and is easy to integrate into the website architecture when there is no need to update the recommendations in real time.
According to modern realities, Internet access is a prerequisite for the survival and attraction of new clients, especially for media-oriented enterprises (movie portals, news portals, etc.), regardless of the format of doing business (Mičík & Mičudová, 2018;Saksiriruthai, 2018). The proposed model can become a methodological basis to introduce RSs to the medium-scale Internet portal, such as an e-shop, a movie portal, etc. As a rule, these companies are not able to afford the RSs that require a real-time update, which means the proposed model can give them a chance to positively affect the loyalty and engagement of their customers, to increase the average cost of the consumer basket due to recommendations' accuracy growth without heavy capital commitment.

Future research
The opportunities for hybridization are inexhaustible, and the ST-UIBCF method can be expanded and improved. The next steps in this direction might be:
• development and inclusion into the ensemble of the S-UBCF model by analogy to the S-IBCF model. S-UBCF will determine the similarity of consumers based on their socio-demographic attributes;
• improvement of the S-IBCF model to the SH-IBCF (Similarity-History Improved Item Based Collaborative Filtering). This approach provides for examination whether the customer purchase history can be an additional effective source of information for accuracy increase of identifying goods similarity;
• implementation of a more complex approach to the text sentiment analysis and conversion of the results into a scale from 1 to 5;
• rearrangement of the model architecture. Testing various hybrid model configurations to obtain a higher synergy effect.