Concept-based Learning (CBL) in machine learning is an approach that derives high-level concepts from raw features and uses them for prediction, improving model interpretability and efficiency. CBL models, such as Concept Bottleneck Models (CBMs), compress input features into a low-dimensional concept space that retains the information essential to the task while discarding the rest, which makes their decisions easier to explain in tasks like image and speech recognition.
A CBM has a two-module structure: one module predicts concepts from the input, and the other predicts the target variable from those concepts alone. Innovations such as concept embedding models and probabilistic CBMs have further improved the interpretability and accuracy of this design.
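To make the two-module structure concrete, here is a minimal sketch of a concept bottleneck model in PyTorch (an assumed framework; the class and parameter names such as ConceptBottleneckModel, n_concepts, and n_classes are illustrative, not taken from the text):

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, in_features: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Module 1: maps raw input features to concept predictions (the bottleneck).
        self.concept_predictor = nn.Sequential(
            nn.Linear(in_features, 128),
            nn.ReLU(),
            nn.Linear(128, n_concepts),
        )
        # Module 2: predicts the target label from the concepts alone.
        self.label_predictor = nn.Linear(n_concepts, n_classes)

    def forward(self, x: torch.Tensor):
        concept_logits = self.concept_predictor(x)
        concepts = torch.sigmoid(concept_logits)       # concept probabilities
        label_logits = self.label_predictor(concepts)  # final prediction uses only concepts
        return concepts, label_logits


# Joint training: supervise both the concept predictions and the final label.
model = ConceptBottleneckModel(in_features=512, n_concepts=10, n_classes=3)
x = torch.randn(4, 512)                                 # batch of 4 feature vectors
concept_targets = torch.randint(0, 2, (4, 10)).float()  # binary concept annotations
labels = torch.randint(0, 3, (4,))                      # class labels

concepts, label_logits = model(x)
loss = (nn.BCELoss()(concepts, concept_targets)
        + nn.CrossEntropyLoss()(label_logits, labels))
loss.backward()
```

Because the label predictor sees only the concept layer, inspecting or intervening on the predicted concepts directly explains (or changes) the final prediction, which is the source of the interpretability described above.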
Multiple Instance Learning (MIL) supports CBL in settings where labels exist only for groups of instances, called bags, rather than for individual instances. In image recognition, for example, MIL can treat an image as a bag of patches: the model is supervised only by the image-level label and infers probabilities for the individual patches. This is useful when instance-level labels are unknown or expensive to obtain, since the model can still learn from the bag labels and make predictions based on the inferred concept probabilities.
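The sketch below illustrates one common way to realize this, assuming each patch receives a concept probability and only the whole-image (bag) label is observed. The max-pooling aggregation and the names MILPatchScorer and patch_dim are assumptions for illustration, not details given in the text:

```python
import torch
import torch.nn as nn

class MILPatchScorer(nn.Module):
    def __init__(self, patch_dim: int):
        super().__init__()
        # Scores each patch independently for the presence of a concept.
        self.instance_scorer = nn.Sequential(
            nn.Linear(patch_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, bag: torch.Tensor):
        # bag: (n_patches, patch_dim) -- all patches from one image.
        patch_probs = torch.sigmoid(self.instance_scorer(bag)).squeeze(-1)
        # Max pooling over patches: the image is positive if any patch is positive.
        bag_prob = patch_probs.max()
        return patch_probs, bag_prob


# Training uses only the image-level label; patch-level labels are never
# provided and are instead inferred via the learned patch_probs.
model = MILPatchScorer(patch_dim=256)
bag = torch.randn(16, 256)        # 16 patches from one image
image_label = torch.tensor(1.0)   # bag-level (whole-image) label

patch_probs, bag_prob = model(bag)
loss = nn.BCELoss()(bag_prob, image_label)
loss.backward()
```

After training, the per-patch probabilities serve as inferred instance labels, which can then feed a concept layer in the CBL pipeline described above.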