Comparison To Revisiting Deep Learning Models For Tabular Data
Discussion On Paper1
The principal aim of the paper was to set out a method for handling numerical inputs in a Transformer neural network. This is difficult to do well, and it formed one of the many important areas of AIBMod’s research.
To test their research, the authors took numerous tabular datasets and benchmarked their model against other deep learning models and boosted-tree models such as CatBoost and XGBoost.
These datasets comprised binary classification, multi-class classification and regression problems.
Unfortunately, they used plain accuracy as the performance measure for the classification problems. However, the model runs recorded in the GitHub repository also report ROC metrics for the binary classification problems.
In addition, the GitHub repository records the exact train, validation and test splits for each dataset, greatly facilitating comparisons.
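To illustrate why reporting only accuracy can obscure real differences between binary classifiers, here is a small self-contained sketch (hypothetical scores, not taken from the paper or its repository): two models with identical accuracy at a 0.5 threshold can differ markedly in ROC AUC.

```python
# Hypothetical example: identical accuracy, different ROC AUC.
# Scores and labels below are invented for illustration only.

def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def auc(scores, labels):
    """ROC AUC via its pairwise (Mann-Whitney) form: the fraction of
    positive/negative pairs where the positive is scored higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels  = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # misses one positive narrowly
model_b = [0.9, 0.8, 0.7, 0.1, 0.6, 0.3, 0.2, 0.4]  # ranks that positive last

print(accuracy(model_a, labels), accuracy(model_b, labels))  # 0.75 0.75
print(auc(model_a, labels), auc(model_b, labels))            # 0.9375 0.75
```

Both models make the same two threshold errors, so accuracy cannot separate them, while the ROC metric rewards model A for still ranking its missed positive above most negatives.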
AIBMod vs Paper Results Comparison
| Dataset | CA | AD | HE | JA | HI | YE | CO | MI |
|---|---|---|---|---|---|---|---|---|
| Problem Type | Regression | Binary Classification | 100 Class Classification | 4 Class Classification | Binary Classification | Regression | 7 Class Classification | Regression |
| Performance Metric | RMSE | Accuracy | Accuracy | Accuracy | Accuracy | RMSE | Accuracy | RMSE |
| Which Is Better | Lower | Higher | Higher | Higher | Higher | Lower | Higher | Lower |
| #Objects | 20,640 | 48,842 | 65,196 | 83,733 | 98,050 | 515,345 | 581,012 | 1,200,192 |
| #Num. features | 8 | 6 | 27 | 54 | 28 | 90 | 54 | 136 |
| # Cat. features | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| Model | | | | | | | | |
| AIBMod | **0.423** | **0.875** | **0.409** | **0.740** | **0.736** | **8.721** | **0.974** | **0.739** |
| FT-Transformer | 0.448 | 0.860 | 0.398 | 0.739 | 0.731 | 8.727 | 0.973 | 0.742 |
| ResNet | 0.478 | 0.857 | 0.398 | 0.734 | 0.731 | 8.770 | 0.967 | 0.745 |
| XGBoost | 0.431 | 0.874 | 0.377 | 0.729 | 0.728 | 8.819 | 0.969 | 0.742 |
| CatBoost | **0.423** | 0.874 | 0.388 | 0.727 | 0.729 | 8.837 | 0.968 | 0.741 |
Bold figures indicate the leading score for each dataset.
Results Discussion
Before discussing the results themselves a quick comment on the datasets. In the table above they are denoted by the same initials as those used in section 4.2 of the paper and the reader can refer to this section for more detail.
As seen in the table above, AIBMod’s model achieved a better test score than every other model, with one exception: it tied CatBoost’s score on the CA dataset.
Of the datasets above, the two smallest (CA and AD) posed the greatest challenge for AIBMod in beating the boosted-tree models. On those same two datasets, AIBMod comfortably beat the other neural network models, likely thanks to the variance-reduction and data-augmentation methods developed by AIBMod.
As the datasets grew in size, in the interest of time AIBMod simply reused hyperparameters from the smaller-dataset models, running only a couple of validation runs to check for stability before applying the model to the test set. The larger-dataset scores therefore represent almost ‘out of the box’ performance, with effectively no hyperparameter search, and are unlikely to be the best scores the AIBMod model can achieve.
Binary Classification ROC Comparison
ROC2 is a ranking metric, and ranking metrics have been a key part of AIBMod’s model development.
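The ranking interpretation can be sketched concretely (hypothetical scores, not the paper’s code): ROC AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, so only the ordering of the scores matters, not their values.

```python
# Sketch of ROC AUC as a ranking metric. The scores and labels are
# invented for illustration; they do not come from the paper's repository.

def auc_pairwise(scores, labels):
    """AUC by direct counting of correctly ranked positive/negative pairs
    (ties between a positive and a negative count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.8, 0.35, 0.5, 0.1]

# 7 of the 9 positive/negative pairs are ranked correctly.
print(auc_pairwise(scores, labels))                   # 7/9

# Any strictly increasing transform of the scores (e.g. doubling)
# preserves the ordering and therefore the AUC.
print(auc_pairwise([2 * s for s in scores], labels))  # same value
```

This is why uncalibrated scores are fine for AUC: a model is rewarded purely for putting positives ahead of negatives.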
For each of the binary classification datasets, the ROC results for each model have been extracted from the GitHub repository and are shown below:
| Dataset | ADULT INCOME (AD) | HIGGS (HI) |
|---|---|---|
| Problem Type | Binary Classification | Binary Classification |
| Performance Metric | ROC | ROC |
| Which Is Better | Higher | Higher |
| Model | | |
| AIBMod | **0.929** | **0.817** |
| FT-Transformer | 0.915 | 0.813 |
| ResNet | 0.911 | 0.813 |
| XGBoost | 0.928 | 0.807 |
| CatBoost | 0.928 | 0.808 |
Bold figures indicate the leading score for each dataset.
Results Discussion
Once again the AIBMod model produced ROC metrics better than those of all the other models, even edging out the CatBoost scores.
- Revisiting Deep Learning Models for Tabular Data, Gorishniy et al., originally published June 2021, updated October 2023 ↩︎
- ROC here means ROC AUC, i.e. the area under the ROC curve ↩︎
