The labelling may not have been done properly.
Some images may have no labels at all, so it is worth auditing the annotations of the input dataset before training.
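A quick audit along these lines can be sketched as follows. This is only a minimal sketch: the function name `find_label_problems` and the annotation layout (a dict mapping each image name to a list of `(x_min, y_min, x_max, y_max)` pixel boxes) are assumptions for illustration, so adapt them to whatever format your dataset actually uses.

```python
def find_label_problems(image_names, annotations):
    """Return (unlabelled, degenerate) lists of image names.

    `annotations` is a hypothetical dict: image name -> list of
    (x_min, y_min, x_max, y_max) boxes in pixels.
    """
    unlabelled = []
    degenerate = []
    for name in image_names:
        boxes = annotations.get(name) or []
        if not boxes:
            # No entry, or an empty list: the image has no labels.
            unlabelled.append(name)
            continue
        # A box with non-positive width or height is not a usable label.
        if any(x2 <= x1 or y2 <= y1 for x1, y1, x2, y2 in boxes):
            degenerate.append(name)
    return unlabelled, degenerate


# Toy data, made up for the example.
images = ["a.jpg", "b.jpg", "c.jpg"]
labels = {"a.jpg": [(10, 10, 50, 60)], "c.jpg": [(30, 30, 30, 80)]}

missing, bad = find_label_problems(images, labels)
print("unlabelled:", missing)
print("degenerate boxes:", bad)
```

If either list is non-empty, fix or drop those images before looking at the model.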
Another possibility is that the bounding boxes in the input images are already small, in which case resizing the images makes them even smaller. The first few layers of ResNet50 then downsample the feature maps further, so most of the useful information about these small objects is lost. To avoid this, crop the region of interest from the original image and adjust the bounding boxes accordingly, so that the boxes do not become too small after resizing.
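To see how much a box shrinks, and what cropping buys you, here is a rough sketch in plain Python. The helper names `box_size_after_resize` and `crop_boxes`, the `(x_min, y_min, x_max, y_max)` box format, and the image and crop sizes are all assumptions chosen for the example, not something taken from your pipeline.

```python
def box_size_after_resize(box, image_size, target_size):
    """Width and height of `box` after resizing the whole image.

    Sizes are (width, height); box is (x_min, y_min, x_max, y_max).
    """
    x1, y1, x2, y2 = box
    sx = target_size[0] / image_size[0]
    sy = target_size[1] / image_size[1]
    return (x2 - x1) * sx, (y2 - y1) * sy


def crop_boxes(boxes, crop):
    """Shift boxes into the crop's coordinate frame and clip them to it.

    Boxes that fall entirely outside the crop are dropped.
    """
    cx1, cy1, cx2, cy2 = crop
    shifted = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1 = max(x1, cx1) - cx1, max(y1, cy1) - cy1
        nx2, ny2 = min(x2, cx2) - cx1, min(y2, cy2) - cy1
        if nx2 > nx1 and ny2 > ny1:
            shifted.append((nx1, ny1, nx2, ny2))
    return shifted


# Hypothetical numbers: a 40x40 px object in a 2000x2000 px image,
# with a network input size of 500x500 px.
box = (900, 900, 940, 940)
full = box_size_after_resize(box, (2000, 2000), (500, 500))

# Crop a 500x500 px window around the object first, then "resize".
crop = (700, 700, 1200, 1200)
new_box = crop_boxes([box], crop)[0]
cropped = box_size_after_resize(new_box, (500, 500), (500, 500))

print("box size when resizing the full image:", full)
print("box size when cropping first:", cropped)
```

With these numbers the object is 10 px wide when the full image is resized, but stays 40 px wide when the crop is taken first. Since the ResNet50 stem (a stride-2 convolution followed by stride-2 max pooling) already reduces the resolution by a factor of 4, a 10 px object covers only two or three feature-map cells per side at that point.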