Automatically extracting large, homogeneous datasets of medical images based on detailed criteria and/or semantic similarity can be challenging, because the acquisition and storage of medical images in clinical practice are not fully standardised and can be prone to errors, which are often introduced unintentionally by medical professionals during manual input. In this paper, we propose an algorithm for learning cluster-oriented representations of medical images by fusing images with partially observable DICOM tags. Pairwise relations are modelled by thresholding the Gower distance computed over eight DICOM tags. The models were trained on 30,000 images and tested on a disjoint test set of 8,000 images, gathered retrospectively from the PACS repository of the Clinical Hospital Centre Rijeka in 2017. We compare our method against standard and deep unsupervised clustering algorithms, as well as popular semi-supervised algorithms combined with the most commonly used feature descriptors. Our model achieves a normalised mutual information (NMI) score of 0.584 with respect to the anatomic region and 0.793 with respect to the modality. The results suggest that DICOM data can be used to generate pairwise constraints that help improve medical image clustering, even when only a small number of constraints is used.
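The idea of turning partially observed DICOM tags into pairwise relations via a thresholded Gower distance can be illustrated with a minimal Python sketch. It follows Gower's coefficient: categorical tags contribute 0 if equal and 1 otherwise, numeric tags contribute |x − y| divided by the tag's observed range, and a tag missing in either image receives zero weight. Several parts are assumptions rather than the paper's method. The three tags (`Modality`, `BodyPartExamined`, `SliceThickness`) are illustrative stand-ins, not the paper's eight. The rule that treats pairs at or below the threshold as must-link and all others as cannot-link is hypothetical, as are the threshold of 0.25 and the helper names.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# A record maps a DICOM attribute keyword to its value; None marks a missing tag.
Record = Dict[str, Optional[object]]


def numeric_ranges(records: Sequence[Record], numeric_tags: Sequence[str]) -> Dict[str, float]:
    """Observed range (max - min) of each numeric tag, ignoring missing values."""
    ranges: Dict[str, float] = {}
    for tag in numeric_tags:
        vals = [float(r[tag]) for r in records if r.get(tag) is not None]
        ranges[tag] = (max(vals) - min(vals)) if vals else 0.0
    return ranges


def gower_distance(a: Record, b: Record, ranges: Dict[str, float]) -> float:
    """Gower distance over tags observed in BOTH records; NaN if none are shared."""
    total, weight = 0.0, 0
    for tag in sorted(a.keys() | b.keys()):
        x, y = a.get(tag), b.get(tag)
        if x is None or y is None:
            continue  # partially observable tag: zero weight in Gower's coefficient
        if tag in ranges:  # numeric tag: range-normalised absolute difference
            r = ranges[tag]
            d = abs(float(x) - float(y)) / r if r > 0 else 0.0
        else:  # categorical tag: simple mismatch
            d = 0.0 if x == y else 1.0
        total += d
        weight += 1
    return total / weight if weight else math.nan


def pairwise_constraints(
    records: Sequence[Record], ranges: Dict[str, float], threshold: float
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """HYPOTHETICAL rule: distance <= threshold -> must-link, otherwise cannot-link."""
    must_link: List[Tuple[int, int]] = []
    cannot_link: List[Tuple[int, int]] = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            d = gower_distance(records[i], records[j], ranges)
            if math.isnan(d):
                continue  # no commonly observed tags: leave the pair unconstrained
            (must_link if d <= threshold else cannot_link).append((i, j))
    return must_link, cannot_link


# Illustrative records (not data from the paper); record 1 lacks BodyPartExamined.
records: List[Record] = [
    {"Modality": "CT", "BodyPartExamined": "HEAD", "SliceThickness": 5.0},
    {"Modality": "CT", "BodyPartExamined": None, "SliceThickness": 5.0},
    {"Modality": "CT", "BodyPartExamined": "CHEST", "SliceThickness": 3.0},
]
ranges = numeric_ranges(records, ["SliceThickness"])
must_link, cannot_link = pairwise_constraints(records, ranges, threshold=0.25)
print(must_link)    # [(0, 1)]
print(cannot_link)  # [(0, 2), (1, 2)]
```

Enumerating all n(n − 1)/2 pairs is only practical for small examples. Given that the abstract reports gains from a small number of constraints, a realistic pipeline would more likely sample a subset of pairs before thresholding.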