Hi @Ahmed, thank you for sharing the details. You're approaching this correctly, and the observations you've made are valid.

    1. Loss per Epoch vs. Iteration: Plotting loss per epoch is generally recommended, especially when using contrastive loss, which often results in noisy per-iteration values. An epoch-wise plot provides a clearer view of overall training trends, which aligns with what you're seeing.
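If you only log the loss per iteration, here is a minimal sketch of averaging it into one value per epoch before plotting. The function name `epoch_means` and the toy loss values are made up for illustration; your training loop may already expose an equivalent.

```python
from statistics import mean

def epoch_means(iter_losses, steps_per_epoch):
    """Average per-iteration losses into one value per epoch.

    A final, shorter chunk (incomplete epoch) is averaged over the
    steps it actually contains.
    """
    if steps_per_epoch <= 0:
        raise ValueError("steps_per_epoch must be positive")
    return [
        mean(iter_losses[i:i + steps_per_epoch])
        for i in range(0, len(iter_losses), steps_per_epoch)
    ]

# Hypothetical noisy per-iteration losses: 2 epochs of 3 steps each.
losses = [0.9, 0.5, 0.7, 0.4, 0.6, 0.2]
print(epoch_means(losses, steps_per_epoch=3))
```

Plotting the returned list against the epoch index gives the smoother curve described above.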
    2. Accuracy with Contrastive Loss: Contrastive loss does not directly produce class predictions, so conventional accuracy metrics are not applicable. However, you can evaluate performance using:
- k-NN accuracy in the learned embedding space.
- A distance threshold (e.g., on Euclidean or cosine distance) that turns each pair into a binary similar/dissimilar prediction, which can then be scored for accuracy.
- Alternative metrics such as precision/recall or ROC-AUC on pairwise comparisons.
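The three options above could be sketched roughly as follows with plain NumPy. This is only an illustration under assumptions: the function names and toy embeddings are made up, Euclidean distance is used throughout, and class labels are assumed to be non-negative integers. In practice you would pass in the embeddings produced by your trained network.

```python
import numpy as np

def knn_accuracy(train_emb, train_y, test_emb, test_y, k=1):
    """Majority-vote k-NN accuracy in the embedding space (Euclidean)."""
    # Pairwise distances, shape (n_test, n_train).
    d = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]          # indices of the k closest
    votes = train_y[nearest]                        # labels, shape (n_test, k)
    preds = np.array([np.bincount(row).argmax() for row in votes])
    return float(np.mean(preds == test_y))

def pair_distances(emb_a, emb_b):
    """Euclidean distance between the two embeddings of each pair."""
    return np.linalg.norm(emb_a - emb_b, axis=1)

def pair_accuracy(dist, same, threshold):
    """Accuracy of the rule 'distance < threshold means similar'."""
    same = np.asarray(same, dtype=bool)
    return float(np.mean((dist < threshold) == same))

def pair_roc_auc(dist, same):
    """ROC-AUC over pairs: the chance that a similar pair is closer
    than a dissimilar pair (ties count as half)."""
    same = np.asarray(same, dtype=bool)
    pos, neg = dist[same], dist[~same]
    wins = (pos[:, None] < neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Toy 2-D embeddings for two classes (hypothetical values).
train_emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
train_y = np.array([0, 0, 1, 1])
test_emb = np.array([[0.05, 0.1], [0.95, 0.9]])
test_y = np.array([0, 1])

# Three labeled pairs: two similar, one dissimilar.
dist = pair_distances(train_emb[[0, 2, 0]], train_emb[[1, 3, 2]])
same = np.array([1, 1, 0])

print("k-NN accuracy:", knn_accuracy(train_emb, train_y, test_emb, test_y, k=3))
print("pair accuracy:", pair_accuracy(dist, same, threshold=0.5))
print("pair ROC-AUC:", pair_roc_auc(dist, same))
```

The threshold itself is best picked on a validation set; ROC-AUC avoids that choice because it scores the ranking of distances rather than a single cutoff.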
    3. Overfitting and Model Performance: Training loss that keeps decreasing while validation loss stays high suggests overfitting. In addition to L2 regularization and dropout, you may consider:
- Data augmentation to increase variability.
- Batch normalization to stabilize training.
- Reducing network complexity, such as fewer layers or neurons.
- Early stopping based on validation loss trends.
- Balanced batch construction, ensuring diverse positive and negative pairs.
- Embedding normalization, constraining output vectors to unit norm.
- Temperature scaling, if using NT-Xent or similar, to adjust contrastive loss sensitivity.
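To make the last two points concrete, here is a minimal NumPy sketch of unit-norm embeddings feeding an NT-Xent-style loss with a temperature. It is an illustration only, not your loss function: the names `l2_normalize` and `nt_xent`, the default temperature of 0.5, and the toy vectors are assumptions, and a real setup would use your framework's differentiable ops instead.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against zero vectors)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]).

    Every other embedding in the batch acts as a negative.
    """
    n = z1.shape[0]
    z = l2_normalize(np.concatenate([z1, z2], axis=0))  # (2N, d), unit norm
    sim = z @ z.T / temperature                         # cosine similarity / tau
    np.fill_diagonal(sim, -np.inf)                      # exclude self-similarity
    # Row i's positive partner sits N rows away.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    return float(-log_prob[np.arange(2 * n), pos].mean())

# Toy batch: two pairs whose partners point in nearly the same direction.
z1 = np.array([[1.0, 0.0], [0.0, 1.0]])
z2 = np.array([[1.0, 0.1], [0.1, 1.0]])

print("aligned pairs:   ", nt_xent(z1, z2))
print("mismatched pairs:", nt_xent(z1, z2[::-1]))
print("lower temperature:", nt_xent(z1, z2, temperature=0.1))
```

A lower temperature sharpens the softmax, so the loss reacts more strongly to the hardest negatives; it is usually tuned as a hyperparameter.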
These strategies can help improve generalization and training stability in contrastive learning setups. Hope this helps!