An image caption generator recognizes the content of an image and annotates it with a relevant caption using deep learning and computer vision.
Further, it can be improved by using an attention mechanism. I trained this model for only a short time, so there is some difference between the actual and predicted output. Below are predictions on images that were not used to train the model; you can find the pictures in the "test_images" folder.
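To illustrate what the attention mechanism does at each decoding step, here is a minimal NumPy sketch of additive (Bahdanau-style) attention over image regions, in the spirit of the "Show, Attend and Tell" paper linked below. All dimensions and weight matrices are hypothetical stand-ins for a trained model's parameters, not this project's actual ones.

```python
import numpy as np

# Hypothetical sizes: 64 image regions (a flattened CNN feature map),
# 256-d region features, a 512-d decoder hidden state.
num_regions, feat_dim, hidden_dim, attn_dim = 64, 256, 512, 128

rng = np.random.default_rng(0)
features = rng.standard_normal((num_regions, feat_dim))  # image features
h = rng.standard_normal(hidden_dim)                      # decoder state

# Learned projections (randomly initialized here for illustration only).
W_f = rng.standard_normal((feat_dim, attn_dim)) * 0.01
W_h = rng.standard_normal((hidden_dim, attn_dim)) * 0.01
v = rng.standard_normal(attn_dim) * 0.01

# Score every region against the current decoder state...
scores = np.tanh(features @ W_f + h @ W_h) @ v           # (num_regions,)
# ...softmax the scores into attention weights...
weights = np.exp(scores - scores.max())
weights /= weights.sum()
# ...and take the weighted sum of region features as the context vector
# that the decoder uses to predict the next word.
context = weights @ features                             # (feat_dim,)

print(weights.sum())   # weights form a probability distribution over regions
print(context.shape)   # one context vector per decoding step
```

At each time step the decoder recomputes these weights, so it can "look at" different parts of the image while generating different words of the caption.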
Example 1 CORRECT: Dog on a beach by the ocean
Example 1 OUTPUT: a dog is running through the grass .
Example 2 CORRECT: Child holding red frisbee outdoors
Example 2 OUTPUT: a young boy in a red shirt is playing with a toy .
Example 3 CORRECT: Bus driving by parked cars
Example 3 OUTPUT: a man in a black shirt and a black hat is standing on a sidewalk .
Example 4 CORRECT: A small boat in the ocean
Example 4 OUTPUT: a man is riding a bicycle on a dirt bike .
Example 5 CORRECT: A cowboy riding a horse in the desert
Example 5 OUTPUT: two men are standing on a beach .
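The predicted captions above are produced one word at a time, typically by greedy decoding: feed the previous word to the decoder and pick the most probable next word until an end token appears. The toy sketch below shows only that loop; the `step` function is a hypothetical stand-in for the trained decoder, which would actually condition on the image features and its hidden state.

```python
def step(prev_word):
    # Hypothetical transition table standing in for the trained model's
    # argmax over next-word probabilities.
    table = {
        "<start>": "a", "a": "dog", "dog": "is", "is": "running",
        "running": "through", "through": "the", "the": "grass",
        "grass": ".", ".": "<end>",
    }
    return table[prev_word]

def greedy_caption(max_len=20):
    # Generate words until <end> (or a length cap, to avoid infinite loops).
    word, caption = "<start>", []
    for _ in range(max_len):
        word = step(word)
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption)

print(greedy_caption())  # a dog is running through the grass .
```

Longer training (or beam search instead of greedy decoding) is the usual way to close the gap between the CORRECT and OUTPUT captions above.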
You can find the paper on attention-based image captioning ("Show, Attend and Tell") at https://arxiv.org/pdf/1502.03044v3.pdf