Extracting and validating an optimal subset of non-dominated features is essential for effective multi-label classification. However, selecting the best feature subset is an NP-hard problem, and it plays a key role in improving both the prediction accuracy and the processing time on video datasets. In this study, we propose autoencoders for dimensionality reduction of video datasets, and we combine the features extracted by the multi-objective evolutionary Non-dominated Sorting Genetic Algorithm with those extracted by the autoencoder in an ensemble. We explore the performance of well-known multi-label classification algorithms on video datasets in terms of prediction accuracy and the number of features used. More specifically, we evaluate Non-dominated Sorting Genetic Algorithm-II (NSGA-II), autoencoders, ensemble learning algorithms, Principal Component Analysis, Information Gain, and Correlation Based Feature Selection. Some of these algorithms use feature selection techniques to improve classification accuracy. Experiments are carried out with local feature descriptors extracted from two multi-label datasets: the MIR-Flickr dataset, which consists of images, and the Wireless Multimedia Sensor dataset, which we generated from our own video recordings. Significant improvements in the accuracy of the algorithms are observed while the number of features is reduced.
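
To make the notion of a non-dominated feature subset concrete, the following is a minimal sketch of the dominance test that underlies non-dominated sorting, assuming two objectives that are both minimized: classification error and number of features. The function names and the candidate values are hypothetical and chosen only for illustration; they are not results from this study, and the sketch omits the other parts of NSGA-II (ranking into successive fronts, crowding distance, selection, crossover, and mutation).

```python
from typing import List, Tuple

# A candidate feature subset summarized by two objectives to minimize:
# (classification_error, number_of_features). Values below are made up.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (the first front)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical candidates: (error, number of features).
candidates = [(0.20, 50), (0.18, 80), (0.25, 30), (0.22, 60), (0.18, 90)]
front = non_dominated(candidates)
print(front)
```

In this toy example, the subset with 60 features is dominated by the one with 50 features (lower error and fewer features), and the subset with 90 features is dominated by the one with 80 features (same error, fewer features), so three candidates remain as trade-offs between accuracy and feature count.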