Areas Of Research
Specific Areas of AIBMod Research
In the drive to improve the performance of the Neural Transformer function, and of the AIBMod model as a whole, almost every area was examined. The list below gives a high-level overview of the key developments:
- Amended the standard Dropout method
- Amended the standard Layer Norm method
- Created a new and flexible Data Augmentation method
- Used a completely different Attention approach which is more computationally and memory efficient than AutoInt – a popular neural transformer model for tabular data – whilst yielding better performance
- Used a different method from that of Devlin et al. (2019) when creating the ‘classification token’
- Developed an additional ‘step’ for the Neural Transformer algorithm
- Increased the parameterisation of the Neural Transformer so that it can be bigger or smaller in certain places
- Created a robust process for dealing with numerical features (continuous variables, frequencies, etc.) – critical for business data problems
- Developed a user-friendly way of controlling the learning rate in varying ways across different parts of the model
- Changed the parameterisation of the learning rate function set out in ‘Attention Is All You Need’ (Vaswani et al., 2017) to provide greater flexibility and functionality – see the sketch after this list
- Developed robust ways to deal with missing data
- Developed a robust way of dealing with low frequency categorical values
- Developed a very memory and computationally efficient variance reduction method
- Performed an investigation into the best embedding distribution initialisation
- Performed an investigation into the best kernel distribution initialisation
- Performed an investigation into the best non-linear activation function along with kernel variance settings
- Created a robust way of scaling unbalanced binary classification problems and then de-scaling the posterior predictions – a sketch of the de-scaling step follows this list
- Developed a framework to train the model for feature importance analysis – see here for more
- AIBMod wanted to see what was possible with reasonably modest computing capabilities, so all of its research was carried out on a single Nvidia GTX 1080 Ti GPU card – every model had to fit within the card’s 11GB of memory. Modern GPU cards easily surpass its capabilities, and the Credit Loan Model, whose final form comprises multiple neural transformers, would greatly benefit from running parts of the model in parallel across multiple GPU cards.
- By training thousands of deep learning models since 2018, AIBMod has learned which training approaches tend to yield well-regularised models with stable validation performance statistics over multiple epochs.
- Plus many, many other areas that didn’t appear to yield any obvious benefits.
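To make the learning-rate bullet concrete: the schedule in ‘Attention Is All You Need’ is `d_model**-0.5 * min(step**-0.5, step * warmup**-1.5)`. AIBMod’s actual reparameterisation is not published here, so the sketch below only illustrates how such a schedule can be made more flexible – the `peak_scale` and `decay_power` knobs are hypothetical additions, with `peak_scale = 1.0` and `decay_power = 0.5` recovering the original curve.

```python
import tensorflow as tf

class FlexibleTransformerSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    """A generalisation of the 'Attention Is All You Need' schedule.

    Original: lr(step) = d_model**-0.5 * min(step**-0.5, step * warmup**-1.5)
    `peak_scale` rescales the whole curve and `decay_power` controls the
    post-warmup decay rate; both are hypothetical extensions, not
    AIBMod's actual parameterisation.
    """

    def __init__(self, d_model, warmup_steps=4000, peak_scale=1.0, decay_power=0.5):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = tf.cast(warmup_steps, tf.float32)
        self.peak_scale = peak_scale
        self.decay_power = decay_power

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # Linear warmup, then polynomial decay; the extra warmup factor in
        # `decay` keeps the two branches meeting at step == warmup_steps
        # for any choice of decay_power.
        warmup = step * self.warmup_steps ** -1.5
        decay = step ** -self.decay_power * self.warmup_steps ** (self.decay_power - 0.5)
        return self.peak_scale * tf.math.rsqrt(self.d_model) * tf.minimum(warmup, decay)

optimizer = tf.keras.optimizers.Adam(FlexibleTransformerSchedule(d_model=512))
```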
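On the unbalanced-classification bullet: AIBMod’s exact scaling method is likewise not spelled out, but the underlying idea can be sketched with the standard prior-correction formula. If only a fraction `beta` of the majority (negative) class is kept during training, downsampling multiplies the odds of the positive class by `1/beta`, so predictions can be mapped back to the original balance in closed form. Both helper names below are hypothetical.

```python
import numpy as np

def downsample_negatives(X, y, beta, seed=None):
    """Keep every positive example and a random fraction `beta` of the
    negatives (X, y are NumPy arrays) - a common way of rebalancing a
    skewed binary problem; illustrative helper, not AIBMod's actual code."""
    rng = np.random.default_rng(seed)
    keep = (y == 1) | (rng.random(len(y)) < beta)
    return X[keep], y[keep]

def descale_posterior(p_s, beta):
    """Map probabilities learned on the downsampled data back to the
    original class balance. Downsampling negatives by `beta` multiplies
    the odds by 1/beta, so the original-scale probability is
        p = beta * p_s / (beta * p_s - p_s + 1)
    """
    return beta * p_s / (beta * p_s - p_s + 1.0)
```

For example, with `beta = 0.1` a raw prediction of 0.5 de-scales to about 0.091, reflecting how much rarer positives are in the true distribution.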
Specific Research For Unstructured Inputs
Alongside the research and development outlined above, it was necessary to establish an architecture for the AIBMod model that could train on the unstructured nature of the Credit Loan inputs. Additionally, even passing inputs with many-to-one mapping properties through the model in a time- and memory-efficient manner was a non-trivial exercise.
- Prior to RaggedTensors, if the inputs to a Neural Transformer function had unequal lengths, the input Tensor would be ‘padded’ with zeros, using more memory and more computational operations than strictly necessary (see the illustration after this list). In the early days of model development, AIBMod devised a novel way of processing the memory-intensive attention calculation so that it would fit within the confines of GPU RAM. Whilst this approach slowed the model down, it meant that the model could fit on a GPU.
- When the capabilities of RaggedTensors were extended in TensorFlow v2, it appeared, on the face of it, that a Neural Transformer function could now take unequal-length inputs and process them efficiently. Unfortunately, AIBMod found that several implementation limitations of RaggedTensors remain in TensorFlow v2, meaning it is still not possible to build a straightforward ‘ragged’ Neural Transformer on a GPU.
- In spite of the above, after considerable testing, AIBMod has produced a RaggedTensor implementation of a Neural Transformer in TensorFlow v2 which meets its requirements and runs on a GPU. Hence, the AIBMod model can efficiently process unstructured data inputs without needing zero-padded tensors and their associated problems.
- Furthermore, AIBMod has extensively researched how to efficiently process unstructured inputs, not just within the Neural Transformer function but throughout the entire model pipeline, leading to an optimised GPU RAM footprint and improved computational efficiency.
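The memory difference between padded and ragged inputs described in the first bullet can be seen in a few lines of TensorFlow (illustrative data only – AIBMod’s ragged Neural Transformer itself is not reproduced here):

```python
import tensorflow as tf

# Three records with unequal numbers of feature tokens.
sequences = [[3, 1, 4, 1, 5], [9, 2], [6, 5, 3]]

# Ragged representation: stores exactly the 10 real values.
ragged = tf.ragged.constant(sequences)        # shape (3, None)
print(ragged.flat_values.shape)               # (10,)

# Dense representation: pads every row to the longest length, so 15
# values are stored and downstream attention must mask the padding.
padded = ragged.to_tensor(default_value=0)    # shape (3, 5)
print(tf.size(padded).numpy())                # 15
```

The saving is modest on a toy example, but on inputs whose lengths vary widely the padded representation can come to dominate GPU memory and compute.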
AIBMod Deep Learning Development Framework
When the founder of AIBMod (the ‘Founder’) began developing the model in 2018, one of the first choices was which deep learning framework to use.
At that time, all of the models the Founder studied to understand how deep learning models were put together were built with TensorFlow. In fact, the Founder found that most research models back then were coded in TensorFlow.
Hence, it was a relatively straightforward choice for the Founder to select TensorFlow as the framework for the model.
However, TensorFlow v1, as it stood back in 2018, was not a particularly user-friendly framework for a novice to work with: neural networks were built as ‘graphs’ executed within ‘sessions’. Once a user became accustomed to it, though, it was a very flexible and powerful development environment, and the Founder quickly adapted to it.
Additionally, the Founder developed various techniques to debug TensorFlow models – something that was not facilitated ‘out of the box’. Together with TensorBoard, a neural network visualisation framework, the Founder became adept at building highly complex deep learning models, visualising the activations and intermediary steps, and extracting tensors for further analysis as and when required.
By building his own version of standard functions, he was able to gain insights into ‘how they worked’ and then extend these principles to his own needs.
Towards the end of 2019, TensorFlow v2 was released with a new front end for building deep learning models. ‘Under the hood’, however, TensorFlow v2 still used the old v1 ‘graph’ and ‘session’ machinery – the v2 front end was primarily an attempt to make TensorFlow more user-friendly.
As the Founder had already written a considerable amount of code using v1, he continued to use the last supported v1 framework until the second half of 2023, when he switched to v2 for its RaggedTensor functionality – considerably more developed than in v1 – alongside various memory and performance improvements.
It turned out that code still using the ‘graph’ and ‘session’ structure of the TensorFlow v1 framework can, with a few changes and declarations, run in TensorFlow v2, making the switch relatively simple.
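One plausible reading of those ‘few changes and declarations’ is TensorFlow’s `tf.compat.v1` compatibility module – a minimal sketch of running v1-style graph/session code under a v2 installation (an illustration, not the Founder’s actual code):

```python
import tensorflow as tf

# v2 executes eagerly by default, so graph mode must be re-enabled
# before any v1-style code runs.
tf.compat.v1.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    x = tf.compat.v1.placeholder(tf.float32, shape=(None, 3), name="x")
    w = tf.compat.v1.get_variable("w", shape=(3, 1))
    y = tf.matmul(x, w)
    init = tf.compat.v1.global_variables_initializer()

# The familiar v1 session/feed_dict workflow, running on TensorFlow v2.
with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))
```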
Thoughts On Deep Learning Frameworks
A few years ago the Founder heard a talk by Christian Szegedy.
Christian is a legend in the AI community – reading his bio and achievements is a very humbling experience.
During Christian’s talk, among the many interesting things he spoke about was a comment on the ability of modern-day data scientists to develop new deep learning models.
He observed that many data scientists have taken to using prefabricated deep learning libraries and applying them to new and different use cases. What they are not doing is forensically examining the theory and code underlying existing libraries to see if they can be improved or adapted.
The Founder believes this to be partly down to the push towards creating high-level deep learning frameworks, which encourages users to ‘dip their toe’ into machine learning without having to get into the details.
There are good and bad sides to this.
The good side is that it draws people into machine learning, reducing the barriers to entry, which should obviously be welcomed.
The bad side is that, perhaps, unless one understands the context in which functions were built, the mathematical reasoning behind them, and the importance of preventing information leaks between training and validation / test data sets, a user may be lulled into a false sense of security simply because a model ‘runs’, ‘trains’ and ‘gives good results’.
Another potential bad side is the inability to research and develop new deep learning methods if one is simply used to taking prefabricated models and reusing them. It would be very difficult for such a user to change the way a standard library works internally or adapt an existing standard function to work with a new model idea, design, or input requirement.
For instance, back in 2018, the Founder started with a Google implementation of a Neural Transformer in a library called tensor2tensor, written in TensorFlow. After passing the basic credit loan dataset through the Neural Transformer, the Founder discovered that changing hyperparameters alone could not match the performance of XGBoost.
It was only because the Founder understood the Neural Transformer, other standard deep learning approaches, and the mathematics behind them all that he was able to develop these methods further, come up with his own ideas, and drive forward the overall performance of the model.
