posted by Alex Mathews

Recent progress in image recognition and language modeling is making automatic description of image content a reality. However, current systems miss the stylized, non-factual aspects of written descriptions.

One such style is description with emotion, which is commonplace in everyday communication and influences decision-making and interpersonal relationships. We design a system to describe an image with emotion, presenting a model that automatically generates captions with positive or negative sentiment. We propose a novel switching recurrent neural network with word-level regularization, which can produce emotional image captions using only 2000+ training sentences containing sentiment. We evaluate the captions with automatic metrics and crowd-sourced judgments. Our model compares favourably on common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged at least as descriptive as the factual captions, and of these positive captions 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment.
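The switching idea can be illustrated with a small sketch: two RNN streams (one factual, one sentiment-bearing) each produce a word distribution, and a per-word gate mixes them. This is a minimal, untrained numpy illustration of that mixing step, not the paper's implementation; all weight matrices, the gate parameterization `Wg`, and the function names are assumptions for illustration, and the word-level regularization used in training is omitted.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SimpleRNN:
    """One stream of the switching model: a vanilla RNN cell with a
    softmax output over the vocabulary. Weights are random
    placeholders; training is omitted in this sketch."""
    def __init__(self, vocab_size, hidden_size, seed):
        rng = np.random.default_rng(seed)
        self.Wxh = rng.normal(0, 0.1, (hidden_size, vocab_size))
        self.Whh = rng.normal(0, 0.1, (hidden_size, hidden_size))
        self.Why = rng.normal(0, 0.1, (vocab_size, hidden_size))

    def step(self, x_onehot, h):
        h = np.tanh(self.Wxh @ x_onehot + self.Whh @ h)
        return softmax(self.Why @ h), h

def switched_word_probs(base, senti, Wg, x_onehot, h_base, h_senti):
    """One decoding step of a switching RNN: a scalar gate gamma in
    [0, 1] (here a logistic function of the current word) mixes the
    factual and sentiment word distributions word by word."""
    p_base, h_base = base.step(x_onehot, h_base)
    p_senti, h_senti = senti.step(x_onehot, h_senti)
    gamma = 1.0 / (1.0 + np.exp(-(Wg @ x_onehot)))
    p = gamma * p_senti + (1.0 - gamma) * p_base
    return p, gamma, h_base, h_senti
```

Because both streams emit valid distributions and gamma lies in [0, 1], the mixture is itself a valid word distribution; a high gamma at a given step means the sentiment stream dominates that word choice.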

Sample Results

Examples of captions generated by SentiCap. The captions in columns a and b express a positive sentiment, while the captions in columns c and d express a negative sentiment. The coloring behind each word indicates the weight given to the model trained on the sentiment dataset: the darker the coloring, the higher the weight. See the paper for full details.


The paper, SentiCap: Generating Image Descriptions with Sentiments by Alex Mathews, Lexing Xie, and Xuming He, is published at AAAI 2016.

  • A combined PDF of the paper and supplemental material is here.
  • Example results: sentences with positive sentiment and negative sentiment.
  • The SentiCap dataset collected from Amazon mTurk is here.
  • The list of Adjective Noun Pairs (ANPs) is here.

January 3, 2016
284 words

deeplearning vision language
