# Overview of Emotion Models

## How to define emotion or affect?

#### Categorical Model

| Tomkins | Izard | Plutchik | Ortony | Ekman |
|---|---|---|---|---|
| Joy | Enjoyment | Joy | Joy | Happiness |
| Fear | Fear | Fear | Fear | Fear |
| Anger | Anger | Anger | Anger | Anger |
| Disgust | Disgust | Disgust | Disgust | Disgust |
| Surprise | Surprise | Surprise | Surprise | Surprise |
| Interest | Interest | Acceptance | | |
| Shame | Shame | Anticipation | | |
| Shyness | | | | |
| Guilt | | | | |

#### Dimensional Model

| emotion | valence (positive/negative) | arousal (excited/calm) | dominance (feeling in control/not) |
|---|---|---|---|
| neutral | 5 | 5 | 5 |
| fear | 3.2 | 5.92 | 3.6 |
| joy | 7.4 | 5.73 | 6.2 |
| anger | 2.55 | 6.60 | 5.05 |
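A point in this valence-arousal-dominance (VAD) space can be mapped back to a categorical label by picking the closest prototype. A minimal sketch, using the table's values as prototypes (the nearest-neighbour mapping itself is an illustration, not part of the models above):

```python
import math

# VAD prototypes taken from the table above: (valence, arousal, dominance)
PROTOTYPES = {
    "neutral": (5.0, 5.0, 5.0),
    "fear":    (3.2, 5.92, 3.6),
    "joy":     (7.4, 5.73, 6.2),
    "anger":   (2.55, 6.60, 5.05),
}

def nearest_emotion(vad):
    """Map a (valence, arousal, dominance) point to the closest prototype."""
    return min(PROTOTYPES, key=lambda e: math.dist(vad, PROTOTYPES[e]))

print(nearest_emotion((7.0, 5.5, 6.0)))  # a pleasant, fairly dominant point -> joy
```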

# Datasets

### Text-fragment-based datasets

These datasets provide emotion ratings for pieces of text, for example newspaper headlines.

## SemEval-2007 "Affective Text"

1200 headlines, Ekman categories, scores from 0 to 100, variance

<instance id="1">Mortar assault leaves at least 18 dead</instance>

| id | anger | disgust | fear | joy | sadness | surprise |
|---|---|---|---|---|---|---|
| 1 | 22 | 2 | 60 | 0 | 64 | 0 |
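The instance/score pairing above is easy to read programmatically. A small sketch (the inline sample strings reproduce the example above; the parsing code is my own, not the official SemEval tooling):

```python
import re

# Sample data in the SemEval-2007 "Affective Text" format
instances = '<instance id="1">Mortar assault leaves at least 18 dead</instance>'
scores = "1 22 2 60 0 64 0"  # id anger disgust fear joy sadness surprise

# Map instance id -> headline text
headlines = dict(re.findall(r'<instance id="(\d+)">(.*?)</instance>', instances))

# Map each emotion column to its 0-100 score
cols = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]
inst_id, *vals = scores.split()
emotions = dict(zip(cols, map(int, vals)))

print(headlines[inst_id], emotions)
```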


### Word-based datasets

SentiWordNet: polarity scores (positive, negative, objective).

WordNet-Affect (not public, but...): hierarchical emotions that can be flattened to near-traditional categories.

ANEW (Affective Norms for English Words): normative dimensional emotion ratings for 14,000 English words. Analogous studies and datasets with the same dimensions exist for many languages.

# Vector Space Model (VSM) on categorical emotions

• As training data, we use documents labelled with emotion categories.

• Similarity is measured on bag-of-words vectors with Tf-Idf weighting.

• For an input query (document), we find the n most similar documents and use the mean of their emotion labels as the result.

The TexEmo project relies on this technique, but there are no real results or working solutions from that project, only a paper.

\begin{aligned} w_{i,q} = tf_{i,q} \cdot idf_i; \qquad w_{i,d} = tf_{i,d} \cdot idf_i \end{aligned}

$tf_{i,q}$, $tf_{i,d}$: frequency of term $i$ in the query $q$ or document $d$;

$idf_i$: inverse document frequency; measures the rarity of term $i$ across all documents.
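The scheme can be sketched in pure Python. The toy corpus and its labels below are invented for illustration; and since the labels here are categorical, a majority vote over the n nearest documents stands in for the "mean of emotion labels":

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical): (text, emotion label)
corpus = [
    ("the team won the championship game", "joy"),
    ("a deadly storm destroyed the town", "fear"),
    ("fans celebrate victory in the game", "joy"),
    ("residents flee as the storm approaches", "fear"),
]

docs = [Counter(text.split()) for text, _ in corpus]
N = len(docs)

def idf(term):
    """Inverse document frequency: rarity of a term across all documents."""
    df = sum(1 for d in docs if term in d)
    return math.log(N / df) if df else 0.0

def tfidf(counts):
    """Weight each term count by its idf."""
    return {t: tf * idf(t) for t, tf in counts.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(query, n=1):
    """Label a query by the n most Tf-Idf-similar training documents."""
    q = tfidf(Counter(query.split()))
    ranked = sorted(range(N), key=lambda i: cosine(q, tfidf(docs[i])), reverse=True)
    labels = [corpus[i][1] for i in ranked[:n]]
    return Counter(labels).most_common(1)[0][0]

print(classify("the storm destroyed the game"))  # -> fear
```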

# Three-dimensional estimation model

\begin{aligned} \overline{w} = (valence, arousal, dominance) = ANEW(w)\text{, w - word} \end{aligned}

\begin{aligned} \overline{sentence} = \frac{\sum_{i=1}^{n}\overline{w_i}}{n}\text{, n - number of words in sentence} \end{aligned}

To increase ANEW coverage we can use synonyms from WordNet. Take the synonyms of a keyword, look up the Valence-Arousal-Dominance values of each synonym in ANEW, and use the per-dimension mean over the synonyms as the keyword's value.

\begin{aligned} \overline{emotion} = \frac{\sum_{i=1}^{k}\overline{w_i}}{k}\text{, k - number of the keyword's synonyms} \end{aligned}
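Both means can be written in a few lines. A minimal sketch; the mini-lexicon below uses made-up VAD numbers, not the real ANEW ratings:

```python
# Hypothetical mini-ANEW: (valence, arousal, dominance) per word (illustrative values)
ANEW = {
    "happy": (8.21, 6.49, 6.63),
    "day":   (7.12, 4.77, 5.82),
    "storm": (4.95, 5.43, 4.29),
}

def vad(sentence):
    """Mean VAD vector over the sentence words that appear in the lexicon."""
    vecs = [ANEW[w] for w in sentence.lower().split() if w in ANEW]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def vad_from_synonyms(synonyms):
    """Estimate an out-of-vocabulary keyword as the mean VAD of its synonyms."""
    return vad(" ".join(synonyms))

print(vad("happy day"))
```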

# Latent Semantic Analysis / Indexing (LSA/LSI)

1. British police know the whereabouts of the founder of Wikileaks
2. In the US, a court begins a trial against the Russians who sent spam
3. The Nobel Peace Prize award ceremony was boycotted by 19 countries
4. In the UK, the founder of the Wikileaks website, Julian Assange, was arrested
5. Ukraine ignores the Nobel Prize award ceremony
6. A Swedish court has refused to hear an appeal of the founder of Wikileaks
7. NATO and the US have developed plans for the defense of the Baltic countries against Russia
8. UK police found the founder of Wikileaks, but have not arrested him
9. In Stockholm and Oslo today the Nobel Prize will be awarded

Words that occur more than once across the headlines form the term-document matrix:

| | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 |
|---|---|---|---|---|---|---|---|---|---|
| wikileaks | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| arrested | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| UK | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| award | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| Nobel | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| founder | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| police | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| prize | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| against | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| countries | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| court | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| US | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| ceremony | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |

## Singular value decomposition (SVD)

Factorization of an m × n matrix: $M = U\Sigma V^*$

$\Sigma$ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal. The diagonal entries $\sigma_i$ of $\Sigma$ are known as the singular values of M.
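The factorization is one NumPy call. A small sketch on an illustrative term-document matrix (smaller than the Wikileaks example, made up here for brevity):

```python
import numpy as np

# A tiny term-document count matrix (rows: terms, columns: documents)
M = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
], dtype=float)

# Thin SVD: M = U @ diag(s) @ Vt, singular values in s sorted in decreasing order
U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(np.allclose(M, U @ np.diag(s) @ Vt))  # the factorization reconstructs M

# Truncated SVD: keep only the k largest singular values (the low-rank LSA space)
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```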

In [110]:
Image('data/Emotional_Analysis/svd.png', width=1000, height=800)


## Truncated SVD

In [50]:
Image('data/Emotional_Analysis/svd_truncated.png', width=1000, height=600)

In [45]:
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
# (point, label, label offset) for each term group in the 2-D LSA space
words = [((0.57, -0.01), "wikileaks + founder", (-0.06, 0.01)),
         ((0.34, 0), "arrested + UK", (0, 0.02)),
         ((0, 0.52), "award + prize + Nobel", (0.01, 0.01)),
         ((0.31, 0), "police", (-0.01, -0.03)),
         ((0, 0.52), "award", (0.01, 0.01)),
         ((0.02, 0.03), "against", (0.01, 0.01)),
         ((0.01, 0.22), "countries", (0.01, 0.01)),
         ((0.12, 0.01), "court", (0.01, 0.01)),
         ((0.02, 0.01), "US", (0, -0.02)),
         ((0, 0.38), "ceremony", (0.01, 0.01))]
documents = [((0.43, 0), "T1", (0.01, -0.02)),
             ((0.05, 0.02), "T2", (0.01, -0.01)),
             ((0.01, 0.65), "T3", (0.01, 0.01)),
             ((0.54, -0.01), "T4", (0, -0.03)),
             ((0, 0.59), "T5", (0.01, 0.01)),
             ((0.37, 0), "T6", (0.01, -0.02)),
             ((0.01, 0.09), "T7", (0.01, 0.01)),
             ((0.63, -0.01), "T8", (0, -0.03)),
             ((0, 0.47), "T9", (0.01, 0.01))]
for (x, y), label, (dx, dy) in words:
    plt.scatter(x, y, color='blue')
    plt.annotate(label, xy=(x, y), xytext=(x + dx, y + dy), color='blue', size=16)
for (x, y), label, (dx, dy) in documents:
    plt.scatter(x, y, color='orange')
    plt.annotate(label, xy=(x, y), xytext=(x + dx, y + dy), color='orange', size=16)
plt.plot([0.3, 0.3], [-1, 1], color='red')  # lines separating the two topic clusters
plt.plot([-1, 1], [0.3, 0.3], color='red')
plt.axis([-0.1, 0.8, -0.1, 0.7])
plt.show()


## Non-Negative Matrix Factorization (NMF)

\begin{aligned} \text{Minimizing error: } F(W,H) = \min_{W, H}\Arrowvert V-WH\Arrowvert^2_F; \quad W, H \geq 0 \end{aligned}

\begin{aligned} \text{Frobenius norm: } \Arrowvert A\Arrowvert_F = \sqrt{\sum\limits_{i=1}^m\sum\limits_{j=1}^n \big\vert a_{ij}\big\vert^2} \end{aligned}
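One standard way to minimize this objective while keeping both factors non-negative is the Lee-Seung multiplicative update rule (not described in the text above, named here explicitly). A sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-negative document-term matrix V, factorized as V ~= W @ H with W, H >= 0
V = rng.random((6, 5))
k = 2
W = rng.random((6, k)) + 0.1
H = rng.random((k, 5)) + 0.1

def frobenius_error(V, W, H):
    return np.linalg.norm(V - W @ H)

err_before = frobenius_error(V, W, H)

# Lee-Seung multiplicative updates: elementwise multiplications preserve
# non-negativity while the Frobenius error is non-increasing
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)

err_after = frobenius_error(V, W, H)
print(err_after < err_before)
```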

# What's wrong with dimension-reduction models?

• They work with document labels.

• They select the most similar documents for an input query.

• So, for fine-grained emotion classification, the training set must contain a highly similar document for every possible query.

• We need a really large training dataset.

• These methods require extremely careful selection of the training dataset: they are unsupervised and cannot learn to separate emotions; they just factorize the document-term matrix.

• So... not practical!

• Pure dimension-reduction models seem suitable only for classifying short texts such as newspaper headlines or sentences from fairy tales.

• They require documents of about the same size.

• It is hard to interpret what is going on in SVD: a singular vector has no obvious meaning.

# Word-based model problems

None of these methods take the semantic structure of the text into account, whereas methods based on sentences/fragments can capture some language structures.

For example, “I laughed at him” and “He laughed at me” would suggest different emotions from the first person’s perspective.

The absence of semantic structure also leads to, and is connected with, a second problem: word ambiguity.