I pity those of you who want to get this project up and running on your local env.
If you still want to try, you'll probably want to use Ubuntu with MongoDB.
That being said, the system is adapted to a very specific set of data which I cannot distribute for legal reasons. The reason I am sharing the code is that, well, maybe someone will take some good ideas from it one of these days!
- In this particular case, the DB is MongoDB.
To accomplish this, we generate training data from the real ratings that users have given to products, as stored in the DB. We take as many userXproduct --> rating transactions as possible; for each one we extract the logged rating along with the characteristics of the user (gender, nationality and age) and of the product (average rating and category), and then feed the NN a training set of user_characteristicsXproduct_characteristics --> rating pairs.
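Below is a minimal sketch of that extraction step, assuming a Python stack with pymongo. The database, collection and field names (`recommender`, `ratings`, `users`, `products`, `avg_rating`, and so on) are illustrative, not the real schema:

```python
# Sketch of building the training set out of logged ratings.
# Assumes hypothetical collections: ratings, users, products.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["recommender"]  # hypothetical DB name

def build_training_set():
    X, y = [], []
    for r in db.ratings.find():  # each doc: {user_id, product_id, rating}
        user = db.users.find_one({"_id": r["user_id"]})
        product = db.products.find_one({"_id": r["product_id"]})
        if user is None or product is None:
            continue
        features = [
            1.0 if user["gender"] == "female" else 0.0,  # simple binary encoding
            float(user["nationality_code"]),             # pre-assigned numeric code
            float(user["age"]),                          # raw age, NOT normalized
            float(product["avg_rating"]),
            float(product["category_code"]),
        ]
        X.append(features)
        y.append(float(r["rating"]))
    return X, y
```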
This leaves our NN trained and ready to make predictions. We can now use it to fill the holes in our sparse usersXproducts rating matrix. For this we just repeat the same process, taking {user, product} combinations and calling the NN's predict() method.
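A sketch of that hole-filling loop, under the same assumptions as above; `nn.predict()` stands in for whatever NN library is actually used, and `make_features()` for the feature extraction shown earlier:

```python
# For every {user, product} pair with no logged rating, ask the trained
# NN for an estimate. Known ratings are kept as-is; only holes are filled.
def fill_rating_matrix(nn, users, products, known_ratings):
    # known_ratings: set of (user_id, product_id) pairs with a real rating
    predicted = {}
    for u in users:
        for p in products:
            if (u["_id"], p["_id"]) in known_ratings:
                continue  # keep the real rating, only fill the holes
            x = make_features(u, p)  # same 5-feature vector as in training
            predicted[(u["_id"], p["_id"])] = nn.predict([x])[0]
    return predicted
```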
In order to answer the server blazingly fast, we do the recommendation work beforehand, in a continuous batch process that runs what I just described and then stores, for each user, a list of categories with their recommendations. As mentioned before, each category holds 10 products of the category it represents. All categories are calculated, but only three are stored as final and sent back to the server: the ones whose top 10 individual estimated product ratings add up to the highest total.
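Here is how that category selection could look in the batch job. The data shapes and names are assumptions, but the logic is the one described: keep the 10 best-rated products per category, then keep the 3 categories whose top-10 sums are highest:

```python
# Per-user category ranking: every category keeps its 10 best estimated
# products, and only the 3 categories with the highest top-10 sums survive.
def top_categories(estimates_by_category, keep=3, per_category=10):
    # estimates_by_category: {category: [(product_id, estimated_rating), ...]}
    scored = []
    for category, items in estimates_by_category.items():
        top = sorted(items, key=lambda it: it[1], reverse=True)[:per_category]
        scored.append((sum(r for _, r in top), category, top))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(category, top) for _, category, top in scored[:keep]]
```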
Interestingly, I noticed that the ReLU learned much faster than other neuron models. My best guess is that this is because my inputs were not normalized to [0, 1]; instead, I was sending raw values such as age in a range of [14, 100] (and ReLU has no upper activation limit). I observed that the NN was good at determining things like ageing being a good or a bad factor depending on the product category (say, good for books, bad for videogames), also making distinctions by gender, and almost always giving a big weight to the average rating.
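A tiny illustration of that saturation point (not from the project, just to show the effect): with raw ages as input, a sigmoid neuron is pinned at ~1.0 and its gradient vanishes, while ReLU keeps passing the raw magnitude through:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

for age in (14, 40, 100):
    # sigmoid(14), sigmoid(40) and sigmoid(100) are all ~1.0, so the
    # inputs become indistinguishable and gradients vanish; relu does not
    # have this upper limit, which matches the faster learning observed.
    print(age, sigmoid(age), relu(age))
```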
It is quite impressive how a simple model like this one was capable of learning so much from a reduced set of data. It would definitely be worth taking a look at what's going on in Spotify's recommender, for instance.