Research articles
ScienceAsia 49S (2023): ID 68-77 | doi: 10.2306/scienceasia1513-1874.2023.s003
A modification of logistic regression with imbalanced data:
F-measure-oriented Lasso-logistic regression
Bui T. T. My a,b,*, Bao Q. Ta c
ABSTRACT: Logistic regression (LR) is one of the most popular classifiers. However, LR does not perform effectively
on imbalanced data. Two approaches adapt LR to imbalanced data: resampling techniques and
modifications to the log-likelihood function. These approaches improve the performance measures of LR in some cases,
but their effectiveness is not robust in general. In this paper, we propose a classifier called F-measure-oriented Lasso-Logistic Regression (F-LLR) to deal with imbalanced data. The base learner of F-LLR is Lasso-Logistic regression (LLR),
which imposes a prior on the magnitude of the parameters through a hyper-parameter λ. The optimal λ is determined by an
adjusted cross-validation procedure that aims for the highest F-measure instead of the highest accuracy. F-LLR
addresses imbalanced data by selectively combining Under-sampling and the Synthetic Minority Oversampling Technique
(SMOTE), based on the scores of the training data. The empirical study shows that F-LLR increases the F-measure
and KS compared with LLR and the traditional balancing methods, such as the resampling techniques (Random Under-sampling, Random Over-sampling, and SMOTE) and the modifications to the log-likelihood function (Ridge and Weighted
likelihood estimation).
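To illustrate why the abstract targets the F-measure rather than accuracy, the following minimal Python sketch computes accuracy, the F-measure, and KS on a toy imbalanced sample. It is not the authors' implementation: the abstract does not define the two measures, so the standard definitions are assumed here (F-measure as the weighted harmonic mean of precision and recall of the minority class, KS as the largest gap between the empirical score distributions of the two classes). The function names, the 0.5 threshold, and the data are hypothetical.

```python
from bisect import bisect_right


def f_measure(y_true, y_pred, beta=1.0):
    """F-measure of the positive (minority) class; labels are 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: the F-measure is taken to be 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(y_true, y_pred):
    """Share of observations whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def ks_statistic(y_true, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = sorted(s for t, s in zip(y_true, scores) if t == 1)
    neg = sorted(s for t, s in zip(y_true, scores) if t == 0)
    best = 0.0
    for s in set(scores):
        cdf_pos = bisect_right(pos, s) / len(pos)
        cdf_neg = bisect_right(neg, s) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy imbalanced sample (made-up numbers): 2 positives among 10 observations.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60, 0.55, 0.90]

# A classifier that always predicts the majority class.
always_negative = [0] * len(y_true)
# A classifier that labels a case positive when its score is at least 0.5.
thresholded = [1 if s >= 0.5 else 0 for s in scores]

acc_trivial = accuracy(y_true, always_negative)
f_trivial = f_measure(y_true, always_negative)
acc_model = accuracy(y_true, thresholded)
f_model = f_measure(y_true, thresholded)
ks = ks_statistic(y_true, scores)

print(f"always negative: accuracy={acc_trivial:.3f}, F={f_trivial:.3f}")
print(f"thresholded:     accuracy={acc_model:.3f}, F={f_model:.3f}")
print(f"KS of the scores: {ks:.3f}")
```

On this toy sample the always-negative classifier still reaches 80% accuracy while its F-measure is zero, which is the kind of gap that motivates selecting a hyper-parameter by F-measure on imbalanced data.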
a Department of Mathematical Economics, Ho Chi Minh University of Banking, Ho Chi Minh City, 7000 Vietnam
b Faculty of Mathematics and Statistics, College of Technology and Design, UEH University, Ho Chi Minh City, 7000 Vietnam
c Department of Mathematics, International University, Vietnam National University, Ho Chi Minh City, 7000 Vietnam
* Corresponding author, E-mail: mybtt@hub.edu.vn
Received 15 Jan 2023, Accepted 31 Aug 2023