This repository contains our system for the task of metaphor detection.
- `data_preparation.py` is used for constructing datasets in the format of https://github.com/RuiMao1988/Sequential-Metaphor-Identification/tree/master/data, which are prepared by https://github.com/gao-g/metaphor-in-context.
- `model.py` contains all the model classes.
- `util.py` contains all the helper functions.
- `main_toefl.py` contains the code for loading and running experiments on the TOEFL dataset.
- `main_vua.py` contains the code for loading and running experiments on the VUA dataset.
- The environment used is Python 3.6 with PyTorch 1.4 and standard libraries: allennlp, sklearn, numpy, pandas, matplotlib, nltk, tqdm, etc.
- Run `python util.py` to make the required directories.
- Download GloVe embeddings from here, unzip them, and place the text file in the ./data/ folder (a loading sketch follows below).
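A GloVe text file of this kind can be read into a plain dictionary; the sketch below is a generic illustration, where the filename `glove.txt` is only a placeholder for whichever embedding file you downloaded:

```python
import numpy as np

def load_glove(path):
    """Read a GloVe text file into a {word: vector} dict.

    Assumes the standard format: each line is `word v1 v2 ... vd`,
    space-separated.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# e.g. glove = load_glove("./data/glove.txt")  # placeholder filename
```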
- Download the VUA data from here and prepare the following files: `vuamc_corpus_train.csv`, `vuamc_corpus_test.csv`, `all_pos_test_tokens.csv`, and `verb_test_tokens.csv`, and place all of these in the ./data/vua/ folder.
- For downloading the TOEFL dataset, you need to fill an agreement here. Next, rename the essays/ folder of the training partition as train_essays/ and place it in the ./data/toefl/ folder; similarly, rename the essays/ folder of the test partition as test_essays/ and place it in the ./data/toefl/ folder. Also, place `all_pos_test_tokens.csv` and `verb_test_tokens.csv` in the ./data/toefl/ folder.
- Run `python data_preparation.py [option]`, where option `vua` creates all files (including ELMo vectors) for the VUA dataset and `toefl` does the same for the TOEFL dataset. This script also splits the training data into train and validation subsets. Note that computing the ELMo vectors takes time; a sketch of the kind of computation involved follows below.
- Run `python main_xyz.py` to run the experiments on the respective dataset. It stores the produced graphs in the ./graphs/xyz/ folder and writes the test predictions as `xyz_all_pos_pred.csv` and `xyz_verb_pred.csv` in the ./predictions/ folder.
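For reference, computing per-token ELMo vectors with allennlp looks roughly like the sketch below. This is a generic illustration of allennlp's `ElmoEmbedder` API, not necessarily the exact invocation used in `data_preparation.py`:

```python
from allennlp.commands.elmo import ElmoEmbedder

# Downloads the default ELMo options and weights on first use.
elmo = ElmoEmbedder()

tokens = ["He", "kicked", "the", "habit"]
# Returns an array of shape (3 layers, num_tokens, 1024).
vectors = elmo.embed_sentence(tokens)
top_layer = vectors[-1]  # (num_tokens, 1024); layers can also be averaged
```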
- The outputs here are expected to match the results reported in the paper for the single-run case.
- Code for ensembling is not provided; one can run different models by varying the hyperparameters of the model (as mentioned in the paper) and aggregate the predictions by majority voting, as sketched below.
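A minimal sketch of such a majority-vote aggregation, assuming each run writes one binary label per token in the same row order; the `prediction` column name is a hypothetical, and the actual columns in the prediction CSVs may differ:

```python
import pandas as pd

def majority_vote(paths, column="prediction"):
    """Aggregate binary predictions from several runs by majority vote.

    Assumes every CSV has the same rows in the same order and a 0/1
    label column (`prediction` here is an assumed name).
    """
    preds = pd.concat([pd.read_csv(p)[column] for p in paths], axis=1)
    # A row is labelled 1 when more than half of the runs predict 1.
    return (preds.sum(axis=1) > len(paths) / 2).astype(int)

# e.g. majority_vote(["run1_pred.csv", "run2_pred.csv", "run3_pred.csv"])
```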
- Structure of files is adapted from https://github.com/gao-g/metaphor-in-context
- Transformer model is adapted from https://github.com/pbloem/former
- If you find this work useful, consider citing it:

```
@inproceedings{kumar-sharma-2020-character,
    title = "Character aware models with similarity learning for metaphor detection",
    author = "Kumar, Tarun  and
      Sharma, Yashvardhan",
    booktitle = "Proceedings of the Second Workshop on Figurative Language Processing",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.figlang-1.18",
    pages = "116--125",
}
```