Comparison To 'Well-tuned Simple Nets Excel on Tabular Datasets'
Discussion on Paper¹
The principal aim of the paper was to demonstrate that a simple multi-layer perceptron model with extensive regularisation and data augmentation could produce excellent results on tabular data.
To test this claim, the authors ran their model on numerous tabular datasets against other neural network and boosted-tree models, such as XGBoost.
The datasets consist entirely of classification problems. Unfortunately, the paper reports performance only via the balanced accuracy metric and does not give detailed results from the individual model runs, so the only figures available are those quoted in the paper.
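For reference, balanced accuracy is the unweighted mean of per-class recall, which makes it insensitive to class imbalance (unlike plain accuracy). A minimal sketch of the computation, with made-up labels for illustration:

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall.

    Plain accuracy rewards always predicting the majority class on
    imbalanced data; balanced accuracy weights each class equally.
    """
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of samples whose true label is class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        # Fraction of class-c samples predicted correctly (recall for c).
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Example: class 0 recall is 4/4 = 1.0, class 1 recall is 1/2 = 0.5,
# so balanced accuracy is (1.0 + 0.5) / 2 = 0.75.
print(balanced_accuracy([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0]))  # 0.75
```

This mirrors what `sklearn.metrics.balanced_accuracy_score` computes, which is likely what the paper's code uses.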
The code behind the results is available on GitHub, but the data has not been pre-split into train, validation, and test sets. The splitting strategy is embedded in the code, so it is possible to reproduce the splits the researchers used; this was confirmed by a member of the research team.
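Seed-based splitting is what makes such splits reproducible from code alone. The following is a hypothetical sketch (not the paper's actual strategy, which lives in their repository): shuffling row indices with a fixed seed yields the same train/validation/test partition on every run.

```python
import random


def deterministic_split(n_rows, seed=42, valid_frac=0.1, test_frac=0.2):
    """Partition row indices into train/validation/test sets.

    Hypothetical sketch: the seed, fractions, and ordering are
    illustrative assumptions, not the paper's actual configuration.
    Because the shuffle uses a fixed seed, anyone running this code
    recovers the identical partition.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # seeded, hence reproducible
    n_test = int(n_rows * test_frac)
    n_valid = int(n_rows * valid_frac)
    test = idx[:n_test]
    valid = idx[n_test:n_test + n_valid]
    train = idx[n_test + n_valid:]
    return train, valid, test


train, valid, test = deterministic_split(1000)
```

In practice the paper's splits may also be stratified by class label; the point here is only that embedding the split logic (and its seed) in code is enough to reproduce it exactly.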
In the interest of time, AIBMod tested just two of the datasets, chosen because they overlapped with the 'Revisiting Deep Learning Models For Tabular Data' comparison runs. This meant AIBMod could simply reuse the hyperparameters from those comparisons.
AIBMod vs Paper Results Comparison
| Dataset | ADULT INCOME (AD) | HIGGS (HI) |
|---|---|---|
| Problem Type | Binary Classification | Binary Classification |
| Performance Metric | Balanced Accuracy | Balanced Accuracy |
| Which Is Better | Higher | Higher |
| **Model** | | |
| AIBMod | **84.154%** (ROC – 92.936%) | **73.905%** (ROC – 82.296%) |
| MLP+C | 82.443% | 73.546% |
| AutoGL. S | 80.557% | 73.798% |
| XGBoost | 79.824% | 72.944% |
Bold figures indicate the best result for each dataset.
Results Discussion
The table above shows that AIBMod's model surpasses the results of the other tested models on both datasets.
- Well-tuned Simple Nets Excel on Tabular Datasets, Kadra et al.; originally published June 2021, updated November 2021.
