Journal ID : AMA-11-09-2021-10708
[This article belongs to Volume - 52, Issue - 01]

Title : Variable Selection Techniques for Classification of Indian Mustard Genotypes: A Simulation Study

Abstract :

In the analysis of high-dimensional data, the challenging problem is selecting a useful subset of variables from a large number of candidates. Feature selection reduces the dimensionality of the feature space and removes redundant, irrelevant, or noisy data. In this study, four variable selection methods were compared: Rao's F test, Wilks' lambda (Backward), Wilks' lambda (Forward), and Random Forests. A Monte Carlo simulation study was conducted to compare the performance of these methods for classification and discrimination. Random samples of varying sizes (50, 100, 200, 500) were generated by Monte Carlo simulation using the means and variance-covariance matrices of groups formed on the basis of seed yield and oil content in a data set of 310 Indian mustard genotypes. For equal-sized samples generated on the basis of seed yield, three methods, namely Rao's F test, Wilks' lambda (Backward), and Wilks' lambda (Forward), performed equally well at (N1=200, N2=200), with the lowest error rate of 18.50 percent. Across the equal-sized samples (N1=50, N2=50), (N1=100, N2=100), (N1=200, N2=200), and (N1=500, N2=500), the most suitable methods for selecting variables affecting oil content were Wilks' lambda (Backward) and Wilks' lambda (Forward) at (N1=100, N2=100), with the lowest leave-one-out cross-validation error rate of 31.50 percent.
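
The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the group means and covariance matrix below are hypothetical placeholders (the abstract does not report the estimated values), the function names are invented for illustration, and a plain two-group linear discriminant with equal priors stands in for the classifier, since the abstract does not specify one. The sketch only shows how two groups can be drawn from multivariate normal distributions and how a leave-one-out cross-validation error rate can be computed for each sample size.

```python
import numpy as np


def simulate_groups(rng, mean1, mean2, cov1, cov2, n1, n2):
    """Draw two multivariate normal groups and return stacked data with 0/1 labels."""
    x1 = rng.multivariate_normal(mean1, cov1, size=n1)
    x2 = rng.multivariate_normal(mean2, cov2, size=n2)
    X = np.vstack([x1, x2])
    y = np.concatenate([np.zeros(n1, dtype=int), np.ones(n2, dtype=int)])
    return X, y


def lda_predict(X_train, y_train, x_new):
    """Classify x_new with a two-group linear discriminant (equal priors assumed)."""
    X0 = X_train[y_train == 0]
    X1 = X_train[y_train == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-group covariance matrix.
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, m1 - m0)   # discriminant direction
    cutoff = w @ (m0 + m1) / 2.0      # midpoint between the projected group means
    return int(x_new @ w > cutoff)


def loo_error_rate(X, y):
    """Leave-one-out error: refit without observation i, then classify observation i."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        errors += int(lda_predict(X[keep], y[keep], X[i]) != y[i])
    return errors / n


if __name__ == "__main__":
    rng = np.random.default_rng(2021)
    # Hypothetical parameters for three variables; NOT values from the study.
    mean_low = np.array([10.0, 5.0, 40.0])
    mean_high = np.array([11.0, 5.5, 41.0])
    cov = np.array([[4.0, 0.5, 0.2],
                    [0.5, 1.0, 0.1],
                    [0.2, 0.1, 2.0]])
    for n in (50, 100, 200, 500):  # equal group sizes, as in the abstract
        X, y = simulate_groups(rng, mean_low, mean_high, cov, cov, n, n)
        print(f"N1=N2={n}: LOO error rate = {loo_error_rate(X, y):.4f}")
```

To compare variable selection methods in this setup, each method would choose a subset of columns of `X` before `loo_error_rate` is called on that subset; the selection step itself is left out of the sketch.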
