Recent research on image captioning has focused on generating a suitable description of a given image in English. We found no previous research on generating image descriptions in Bahasa Indonesia. Yet, according to Wikipedia, Bahasa Indonesia is spoken by 198.7 million people worldwide and ranks 10th among the most widely used languages. This paper focuses on developing a generative model that connects machine translation and computer vision to generate image descriptions in Bahasa Indonesia. The model uses the pre-trained Inception-v3 image embedding model stacked with a Gated Recurrent Unit (GRU) layer. The proposed model was trained and validated on a translated version of the Flickr30K dataset and obtained BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 36, 17, 6, and 2, respectively.
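For readers unfamiliar with the metric, the following is a minimal sketch of how a BLEU-n score can be computed for a single caption. It assumes the standard unsmoothed formulation (clipped n-gram precision, uniform weights, brevity penalty) on a 0 to 100 scale. The paper does not state which BLEU implementation, smoothing, or aggregation level (sentence or corpus) was used, so this should not be read as the exact evaluation procedure. The function names and the Indonesian example sentences are illustrative and are not taken from the dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, references, max_n):
    """Unsmoothed sentence-level BLEU-max_n with uniform weights (0-100 scale)."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, any zero precision makes the geometric mean zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return 100.0 * bp * math.exp(log_mean)


# Illustrative (made-up) generated caption and reference caption.
cand = "seorang anak bermain bola di taman".split()
ref = "seorang anak laki-laki bermain bola di taman".split()

for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(cand, [ref], n):.2f}")
```

Because each higher-order score multiplies in a stricter n-gram precision, BLEU-n typically falls as n grows, which matches the decreasing pattern of the reported scores.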