Comparison To Revisiting Deep Learning Models For Tabular Data
Discussion On Paper1
The principal aim of the paper was to set out a method for handling numerical inputs in a Transformer neural network. This is difficult to do well, and it formed one of the many important areas of AIBMod’s research.
To test their research, the authors took numerous tabular datasets and benchmarked their model against other deep learning models and boosted-tree models such as CatBoost and XGBoost.
These datasets comprised binary classification, multi-class classification and regression problems.
Unfortunately, they used plain accuracy as the performance measure for the classification problems. However, the model runs recorded in the GitHub repository also report ROC metrics for the binary classification problems.
In addition, the GitHub repository records the exact train, validation and test splits for each dataset, greatly facilitating comparisons.
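To illustrate why reporting only accuracy can obscure real differences between binary classifiers, here is a small self-contained sketch (hypothetical scores, not taken from the paper or its repository): two models with identical accuracy at a 0.5 threshold can differ markedly in ROC AUC.

```python
# Hypothetical example: identical accuracy, different ROC AUC.
# Scores and labels below are invented for illustration only.

def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def auc(scores, labels):
    """ROC AUC via its pairwise (Mann-Whitney) form: the fraction of
    positive/negative pairs where the positive is scored higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels  = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # misses one positive narrowly
model_b = [0.9, 0.8, 0.7, 0.1, 0.6, 0.3, 0.2, 0.4]  # ranks that positive last

print(accuracy(model_a, labels), accuracy(model_b, labels))  # 0.75 0.75
print(auc(model_a, labels), auc(model_b, labels))            # 0.9375 0.75
```

Both models make the same two threshold errors, so accuracy cannot separate them, while the ROC metric rewards model A for still ranking its missed positive above most negatives.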
AIBMod vs Paper Results Comparison
| Dataset | CA | AD | HE | JA | HI | YE | CO | MI |
|---|---|---|---|---|---|---|---|---|
| Problem Type | Regression | Binary Classification | 100 Class Classification | 4 Class Classification | Binary Classification | Regression | 7 Class Classification | Regression |
| Performance Metric | RMSE | Accuracy | Accuracy | Accuracy | Accuracy | RMSE | Accuracy | RMSE |
| Which Is Better | Lower | Higher | Higher | Higher | Higher | Lower | Higher | Lower |
| #Objects | 20,640 | 48,842 | 65,196 | 83,733 | 98,050 | 515,345 | 581,012 | 1,200,192 |
| #Num. features | 8 | 6 | 27 | 54 | 28 | 90 | 54 | 136 |
| # Cat. features | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| Model | | | | | | | | |
| AIBMod | **0.423** | **0.875** | **0.409** | **0.740** | **0.736** | **8.721** | **0.974** | **0.739** |
| FT-Transformer | 0.448 | 0.860 | 0.398 | 0.739 | 0.731 | 8.727 | 0.973 | 0.742 |
| ResNet | 0.478 | 0.857 | 0.398 | 0.734 | 0.731 | 8.770 | 0.967 | 0.745 |
| XGBoost | 0.431 | 0.874 | 0.377 | 0.729 | 0.728 | 8.819 | 0.969 | 0.742 |
| CatBoost | **0.423** | 0.874 | 0.388 | 0.727 | 0.729 | 8.837 | 0.968 | 0.741 |
Bold figures indicate the leading score for each dataset.
Results Discussion
Before discussing the results themselves a quick comment on the datasets. In the table above they are denoted by the same initials as those used in section 4.2 of the paper and the reader can refer to this section for more detail.
As seen in the table above, AIBMod’s model achieved a better test score than every other model, with one exception: it tied CatBoost’s score on the CA dataset.
Of the datasets above, the two smallest (CA and AD) posed the greatest challenge for AIBMod in beating the boosted-tree models. On those same two datasets, AIBMod comfortably beat the other neural network models, likely thanks to the variance-reduction and data-augmentation methods developed by AIBMod.
As the datasets grew in size, in the interest of time AIBMod simply reused hyperparameters from the smaller-dataset models, running only a couple of validation runs to check for stability before applying the model to the test set. The larger-dataset scores therefore represent almost ‘out of the box’ performance, with effectively no hyperparameter search, and are unlikely to be the best scores the AIBMod model can achieve.
Binary Classification ROC Comparison
ROC2 is a ranking metric, and ranking metrics have been a key part of AIBMod’s model development.
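The ranking interpretation can be sketched concretely (hypothetical scores, not the paper’s code): ROC AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, so only the ordering of the scores matters, not their values.

```python
# Sketch of ROC AUC as a ranking metric. The scores and labels are
# invented for illustration; they do not come from the paper's repository.

def auc_pairwise(scores, labels):
    """AUC by direct counting of correctly ranked positive/negative pairs
    (ties between a positive and a negative count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.8, 0.35, 0.5, 0.1]

# 7 of the 9 positive/negative pairs are ranked correctly.
print(auc_pairwise(scores, labels))                   # 7/9

# Any strictly increasing transform of the scores (e.g. doubling)
# preserves the ordering and therefore the AUC.
print(auc_pairwise([2 * s for s in scores], labels))  # same value
```

This is why uncalibrated scores are fine for AUC: a model is rewarded purely for putting positives ahead of negatives.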
For each of the binary classification datasets, the ROC results for each model have been extracted from the GitHub repository and are shown below:
| Dataset | ADULT INCOME (AD) | HIGGS (HI) |
|---|---|---|
| Problem Type | Binary Classification | Binary Classification |
| Performance Metric | ROC | ROC |
| Which Is Better | Higher | Higher |
| Model | | |
| AIBMod | **0.929** | **0.817** |
| FT-Transformer | 0.915 | 0.813 |
| ResNet | 0.911 | 0.813 |
| XGBoost | 0.928 | 0.807 |
| CatBoost | 0.928 | 0.808 |
Bold figures indicate the leading score for each dataset.
Results Discussion
Once again the AIBMod model produced ROC metrics better than those of all the other models, even edging out the CatBoost scores.
- Revisiting Deep Learning Models for Tabular Data, Gorishniy et al., originally published June 2021, updated October 2023 ↩︎
- ROC here means ROC AUC, i.e. the area under the ROC curve ↩︎
