Comparison To Revisiting Deep Learning Models For Tabular Data​

Discussion On Paper1

The principal aim of the paper was to set out a method for handling numerical inputs in a Transformer-based neural network. This is difficult to do well, and it formed one of the many important areas of AIBMod’s research.

To test their method, the authors took numerous tabular datasets and evaluated their model against other deep learning models and boosted tree models such as CatBoost and XGBoost.

These datasets comprised binary classification, multi-class classification and regression problems.

Unfortunately, they used plain accuracy as the performance measure for the classification problems; however, the results from the various model runs in the paper’s GitHub repository also record ROC metrics for the binary classification problems.

In addition, the GitHub repository provides the exact train, validation and test splits for each dataset, which greatly facilitates comparisons.

AIBMod vs Paper Results Comparison

| Dataset | CA | AD | HE | JA | HI | YE | CO | MI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Problem Type | Regression | Binary Classification | 100 Class Classification | 4 Class Classification | Binary Classification | Regression | 7 Class Classification | Regression |
| Performance Metric | RMSE | Accuracy | Accuracy | Accuracy | Accuracy | RMSE | Accuracy | RMSE |
| Which Is Better | Lower | Higher | Higher | Higher | Higher | Lower | Higher | Lower |
| # Objects | 20,640 | 48,842 | 65,196 | 83,733 | 98,050 | 515,345 | 581,012 | 1,200,192 |
| # Num. features | 8 | 6 | 27 | 54 | 28 | 90 | 54 | 136 |
| # Cat. features | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| **Model** | | | | | | | | |
| AIBMod | **0.423** | **0.875** | **0.409** | **0.740** | **0.736** | **8.721** | **0.974** | **0.739** |
| FT-Transformer | 0.448 | 0.860 | 0.398 | 0.739 | 0.731 | 8.727 | 0.973 | 0.742 |
| ResNet | 0.478 | 0.857 | 0.398 | 0.734 | 0.731 | 8.770 | 0.967 | 0.745 |
| XGBoost | 0.431 | 0.874 | 0.377 | 0.729 | 0.728 | 8.819 | 0.969 | 0.742 |
| CatBoost | **0.423** | 0.874 | 0.388 | 0.727 | 0.729 | 8.837 | 0.968 | 0.741 |
The AIBMod figures are from their own test results. All other figures are from Table 4 of the paper and represent the best result for each model for each dataset.
Bold figures indicate the leading score for each dataset.

Results Discussion

Before discussing the results themselves, a quick comment on the datasets: in the table above they are denoted by the same initials as those used in section 4.2 of the paper, which the reader can refer to for more detail.

As seen in the table above, AIBMod’s model achieved a better test score than all the other models on every dataset, with one exception: it tied CatBoost’s score on the CA dataset.

Of the datasets above, the two smallest provided the greatest challenge for AIBMod in beating the boosted tree models. On those same two datasets, AIBMod comfortably beat the other neural network models, most likely because of the variance reduction and data augmentation methods developed by AIBMod.

As the datasets grew in size, in the interest of time, AIBMod simply reused hyperparameters from the smaller-dataset models and ran a couple of validation runs to check for stability before applying the model to the test set. The larger-dataset scores therefore represent almost ‘out of the box’ performance, with effectively no hyperparameter search, and it is unlikely that they represent the best scores possible from the AIBMod model.

Binary Classification ROC Comparison

As mentioned here, ROC2 is a ranking metric – one which has been a key part of AIBMod’s model development.
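To illustrate the point that ROC is a ranking metric, here is a minimal sketch (not from the paper or AIBMod’s codebase; the labels and scores are made up): AUC ROC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half, so it depends only on the ordering of the scores, not their values.

```python
# Illustrative sketch: AUC ROC computed from its pairwise ranking
# definition. Labels and scores below are invented for the example.

def auc_roc(labels, scores):
    """AUC ROC by direct pairwise comparison, O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive-negative pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.6, 0.8, 0.2]

auc = auc_roc(labels, scores)  # 7 of 9 pairs ranked correctly
# Squaring the (non-negative) scores is strictly monotone: it changes
# every value but preserves the ordering, so the AUC is unchanged -
# unlike accuracy, which depends on a fixed decision threshold.
auc_rescaled = auc_roc(labels, [s * s for s in scores])
print(auc, auc_rescaled)
```

This ordering-only property is what makes ROC a natural complement to accuracy when comparing models whose output scores sit on different scales.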

For each of the binary classification datasets, the ROC results for each model have been extracted from the GitHub repository and are shown below:

| Dataset | ADULT INCOME (AD) | HIGGS (HI) |
| --- | --- | --- |
| Problem Type | Binary Classification | Binary Classification |
| Performance Metric | ROC | ROC |
| Which Is Better | Higher | Higher |
| **Model** | | |
| AIBMod | **0.929** | **0.817** |
| FT-Transformer | 0.915 | 0.813 |
| ResNet | 0.911 | 0.813 |
| XGBoost | 0.928 | 0.807 |
| CatBoost | 0.928 | 0.808 |
The AIBMod figures are from their test results. All other figures have been calculated from the GitHub repository associated with the paper, taken from the runs that produced the best accuracy scores in the table above – i.e. the two tables are consistent with each other.
Bold figures indicate the leading score for each dataset.

Results Discussion

Once again, the AIBMod model produced ROC metrics better than those of all the other models – this time beating even the CatBoost scores.

  1. Revisiting Deep Learning Models For Tabular Data, Gorishniy et al. – originally published June 2021, updated Oct 2023 ↩︎
  2. ROC here means AUC ROC i.e. the area under the ROC curve ↩︎