As a non-contact and non-invasive technique, hyperspectral imaging (HSI) enables the collection of highly informative spectral (across many wavelengths) and spatial data on the observed sample. The collected data reflect the chemical composition and the spatial morphology of the sample. The technique has been successfully applied in the food industry, for industrial classification, and in medicine for detecting various diseases (e.g., cancer, diabetes, and chronic wounds). Compared with standard medical radiology equipment, HSI is considerably cheaper, simpler, and faster to use.
A fundamental problem in HSI signal analysis is processing the large amounts of data involved: to be usable in a specific application, the signals must be processed in (near) real time. To date, only relatively simple approaches to HSI analysis have been used, such as computing the spectral angle or the optical density. However, these procedures are inaccurate and complicated to apply, and they cannot directly detect the desired physiological and morphological parameters of the examined tissue.
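For reference, the two conventional measures mentioned above are commonly defined as the spectral angle θ = arccos(x·r / (‖x‖‖r‖)) between a pixel spectrum x and a reference spectrum r, and the per-band optical density OD = -log10(I / I0) for a measured intensity I and a reference intensity I0. The plain-Python sketch below implements these standard textbook definitions; the function names and sample values are hypothetical and are not taken from the project.

```python
import math
from typing import Sequence


def spectral_angle(x: Sequence[float], r: Sequence[float]) -> float:
    """Angle (radians) between pixel spectrum x and reference spectrum r.

    Both spectra are assumed to be sampled at the same wavelengths.
    """
    if len(x) != len(r):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(a * b for a, b in zip(x, r))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_r = math.sqrt(sum(b * b for b in r))
    if norm_x == 0 or norm_r == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_r)))
    return math.acos(cos_theta)


def optical_density(intensity: Sequence[float], reference: Sequence[float]) -> list[float]:
    """Per-band optical density OD = -log10(I / I0)."""
    return [-math.log10(i / i0) for i, i0 in zip(intensity, reference)]


if __name__ == "__main__":
    pixel = [0.2, 0.4, 0.6]
    ref = [0.1, 0.2, 0.3]  # same spectral shape, half the brightness
    print(spectral_angle(pixel, ref))  # close to 0: the angle ignores overall scale
    print(optical_density([0.1, 1.0], [1.0, 1.0]))
```

Because the spectral angle depends only on the direction of the spectrum vector, it is insensitive to uniform changes in illumination; on the other hand, a single angle per pixel carries no explicit physiological parameter, which is consistent with the limitation described above.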
Machine learning (ML), on the other hand, can overcome these problems: highly accurate predictive models that run in real time can be learned from the data. Furthermore, adding new features to the models does not require a thorough redesign of the algorithms used. The development of appropriate ML tools will therefore have a significant impact on the HSI field.
Several ML techniques have shown above-average performance and will therefore be used in this research. Maximum-margin models (e.g., support vector machines, SVMs) are excellent at finding the best decision boundary in the data with a globally optimal solution, but they are computationally demanding and potentially unsuitable for learning from big data. Neural networks, in contrast, are highly expressive models that are optimized iteratively. Unfortunately, they are extremely sensitive to the choice of hyperparameters, which can cause them to converge to poor local optima. Ensemble learning, in turn, can achieve low variance while maintaining high model expressiveness. One ML technique that has recently shown significant potential is gradient boosting, specifically XGBoost.
Every ML technique is, to some extent, sensitive to outliers and noise. It is therefore crucial to remove as much noise from the data as possible so that the learned models are as accurate as possible. A relatively low signal-to-noise ratio, a consequence of the narrow spectral bands, is one of the significant disadvantages of HSI. Implementing various noise removal algorithms would therefore significantly simplify HSI analysis and improve its accuracy.
Furthermore, learning an overly complex hypothesis is undesirable and is usually prevented by regularisation techniques. In addition to exploring the current best “shallow” learning options, deep learning models will also be explored: convolutional neural networks (CNNs), autoencoders (AEs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and generative adversarial networks (GANs).
The project is conducted at the Faculty of Engineering, University of Rijeka (www.riteh.uniri.hr).