Sales action recommendation using LSTM

Motivation:

Every company has a sales team to sell its products. In small companies, the selling process (e.g., when to send an email, or how/when/what to respond to a customer) is commonly passed down through internal training. In a large company, however, the large number of new employees and the variety of selling procedures, customer pools, etc., make this training procedure inefficient and costly. We therefore developed a recommendation system that imitates the best sales agents and suggests which action should be taken next, both to train new employees and to improve the overall efficiency of the sales team.

Problem:

We are provided with an (artificially generated) dataset containing information about the best sales agents, customer information, and the sales procedures of previously closed deals. Concretely, this is a sequence-to-sequence problem: the input is the customer information together with the action sequence observed from time step 0 to k; the output is the predicted sequence from time step k to n; and the target is the completed action sequence (from time step 0 to n, where n varies per deal) of the best sales agents.
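
To make the setup concrete, a single training example might look like the following sketch. The action IDs and customer features are invented for illustration; the real schema of the generated dataset is not shown in this write-up.

    # One hypothetical training example (all values invented for illustration).
    customer_info    = [0.3, 1.0, 0.0, 0.7]    # static customer features
    observed_actions = [2, 5, 5, 1]            # actions at time steps 0..k
    full_sequence    = [2, 5, 5, 1, 4, 7, 3]   # best agent's full sequence, steps 0..n

    model_input = (customer_info, observed_actions)
    # The model is trained to produce the remaining steps k..n:
    target = full_sequence[len(observed_actions):]  # -> [4, 7, 3]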

Method:

We chose to solve this problem with a Long Short-Term Memory (LSTM) network with Attention. The architecture of our model can be seen below:

[Figure: architecture of the LSTM-with-Attention model]

This is a typical LSTM-with-Attention model. To satisfy the requirement of a dynamic length for each action sequence while increasing training speed, we padded each output sequence to the same length as the longest one (empirical tests showed little difference in results between padding the input sequences or not, so we also padded the input sequences to speed up training and testing).
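
As a concrete illustration, padding with the Keras pad_sequences utility might look like the sketch below. This is an assumption about the preprocessing; the actual code is not shown in this write-up. The reserved padding ID 0 is hypothetical.

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Three action sequences of different lengths (hypothetical action IDs).
    sequences = [[2, 5, 1], [4, 7], [3, 6, 6, 2, 1]]

    # Pad every sequence to the length of the longest one, appending the
    # reserved padding ID (0) after the real actions.
    padded = pad_sequences(sequences, padding="post", value=0)
    # padded:
    # [[2 5 1 0 0]
    #  [4 7 0 0 0]
    #  [3 6 6 2 1]]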

The customer information is an important factor for predicting the output, but there is no conventional way to incorporate such static information into our model. We therefore chose to "encode" this information through a Dense layer and used the result as the initial cell state and hidden state of the Bi-LSTM layer in our model (blue in the figure). We also found that concatenating this information to the input of each output prediction step slightly improved the results (on our generated dataset).
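
The following is a minimal sketch of this architecture in the Keras functional API, under several assumptions: all layer sizes are placeholders (the original hyperparameters are not given here), the generic Keras Attention layer stands in for the attention mechanism in the figure, and the customer encoding is repeated across time steps with RepeatVector before the concatenation described above.

    from tensorflow.keras import layers, Model

    NUM_ACTIONS = 32   # action vocabulary size, incl. padding ID 0 (assumed)
    CUSTOMER_DIM = 16  # length of the static customer feature vector (assumed)
    MAX_LEN = 20       # padded sequence length (assumed)
    UNITS = 64         # hidden size (assumed)

    actions = layers.Input(shape=(MAX_LEN,), dtype="int32", name="actions")
    customer = layers.Input(shape=(CUSTOMER_DIM,), name="customer")

    # "Encode" the static customer information with Dense layers and use it
    # as the initial hidden/cell states of the Bi-LSTM (forward and backward).
    h0 = layers.Dense(UNITS, activation="tanh")(customer)
    c0 = layers.Dense(UNITS, activation="tanh")(customer)

    x = layers.Embedding(NUM_ACTIONS, UNITS)(actions)
    encoded = layers.Bidirectional(layers.LSTM(UNITS, return_sequences=True))(
        x, initial_state=[h0, c0, h0, c0])

    # Attention over the encoder outputs (a stand-in for the attention
    # mechanism in the figure).
    context = layers.Attention()([encoded, encoded])

    # Concatenate the customer encoding to every time step before predicting
    # each output action.
    cust_seq = layers.RepeatVector(MAX_LEN)(h0)
    merged = layers.Concatenate()([context, cust_seq])
    outputs = layers.TimeDistributed(
        layers.Dense(NUM_ACTIONS, activation="softmax"))(merged)

    model = Model([actions, customer], outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])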

Result and Discussion:

Overall, on the artificially generated dataset, which contains roughly 25% noise, our proposed model achieved 70.5% accuracy. Looking more closely at how the dataset was generated, we found that the output is heavily influenced by the customer information rather than by the input action sequence. We therefore trained a Multilayer Perceptron (MLP) consisting of 5 Dense layers with Batch Normalization, taking the customer information and the flattened, padded input action sequence as its input. This model achieved 72.3% accuracy on the same test set. Although the MLP outperformed the LSTM, this phenomenon may only occur with our simple artificial data, where the input sequence has little influence on the output.
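
A sketch of this MLP baseline is shown below, reusing the hypothetical sizes from the earlier sketch. The layer widths and the way the output sequence is predicted (jointly, as MAX_LEN softmax distributions) are assumptions; only the 5-Dense-layers-with-Batch-Normalization structure and the flattened, padded input come from the description above.

    from tensorflow.keras import layers, Model

    CUSTOMER_DIM, MAX_LEN, NUM_ACTIONS = 16, 20, 32  # same assumptions as above

    # Customer features concatenated with the flattened, padded action sequence.
    inp = layers.Input(shape=(CUSTOMER_DIM + MAX_LEN,))
    x = inp
    for width in (256, 256, 128, 128, 64):  # 5 Dense layers (widths assumed)
        x = layers.Dense(width, activation="relu")(x)
        x = layers.BatchNormalization()(x)

    # Predict the whole padded output sequence at once: one softmax
    # distribution over the action vocabulary per time step.
    x = layers.Dense(MAX_LEN * NUM_ACTIONS)(x)
    out = layers.Softmax()(layers.Reshape((MAX_LEN, NUM_ACTIONS))(x))

    mlp = Model(inp, out)
    mlp.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])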