Sunday, November 1, 2015

Convert binary word2vec model to text vectors

If you have a binary model generated by Google's awesome and super-fast word2vec tool, you can easily convert it to a plain-text representation of the word vectors using Python and gensim.

Input: binary word embedding model from Google's word2vec tool

Output: text vectors for word embeddings
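The text output is easy to inspect: the first line holds the vocabulary size and the vector dimension, and each following line holds one word followed by its vector components, separated by spaces. Here is a minimal standard-library sketch for reading such a file into a dictionary. `parse_text_vectors` is a hypothetical helper, not a gensim function, and it assumes exactly that layout.

```python
def parse_text_vectors(lines):
    """Parse text-format word2vec lines into a {word: [float, ...]} dict.

    Hypothetical helper, not part of gensim. Assumes the first line is
    "<vocab_size> <dim>" and every other line is "<word> <v1> ... <v_dim>".
    """
    it = iter(lines)
    vocab_size, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or truncated lines
        # The last `dim` fields are the vector components; the rest is the word
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    if len(vectors) != vocab_size:
        raise ValueError(f"header promised {vocab_size} words, found {len(vectors)}")
    return vectors
```

Since it accepts any iterable of lines, an open file works directly: `with open('path/to/mymodel.txt', encoding='utf-8') as f: vectors = parse_text_vectors(f)`.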

Python conversion code:
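from gensim.models import KeyedVectors
# Load the binary model produced by Google's word2vec tool
# (gensim 1.0+; older releases used word2vec.Word2Vec.load_word2vec_format)
model = KeyedVectors.load_word2vec_format('path/to/mymodel.bin', binary=True)

# Write the same vectors out in the plain-text word2vec format
model.save_word2vec_format('path/to/mymodel.txt', binary=False)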
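If gensim is not available, the same conversion can be sketched with the standard library alone. This is a minimal illustration, not gensim's implementation. It assumes the file follows the layout written by Google's word2vec.c (an ASCII header line, then for each word its bytes, a space, and `dim` raw 32-bit floats, optionally followed by a newline) and that the floats are little-endian, as on x86 machines. `convert_bin_to_text` is a hypothetical name.

```python
import struct


def convert_bin_to_text(bin_path, txt_path):
    """Sketch: rewrite a binary word2vec file as text, without gensim.

    Assumes the word2vec.c binary layout: header "<vocab_size> <dim>\n",
    then per word: UTF-8 bytes, a space, <dim> float32 values
    (little-endian is an assumption here), and an optional newline.
    """
    with open(bin_path, "rb") as fin, open(txt_path, "w", encoding="utf-8") as fout:
        vocab_size, dim = map(int, fin.readline().split())
        fout.write(f"{vocab_size} {dim}\n")
        record = struct.Struct(f"<{dim}f")
        for _ in range(vocab_size):
            # Read the word byte by byte up to the separating space,
            # skipping the newline left over from the previous record
            chars = bytearray()
            while True:
                ch = fin.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch != b"\n":
                    chars.extend(ch)
            word = chars.decode("utf-8", errors="replace")
            values = record.unpack(fin.read(record.size))
            # Six decimal places is an arbitrary choice for this sketch
            fout.write(word + " " + " ".join(f"{v:.6f}" for v in values) + "\n")
```

Usage: `convert_bin_to_text('path/to/mymodel.bin', 'path/to/mymodel.txt')`. Expect the text file to be noticeably larger than the binary one, since each 4-byte float becomes a decimal string.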

I recommend using Anaconda from Continuum Analytics as a bundled Python distribution. To install gensim in Anaconda, just type: conda install gensim :)


Original ref: https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/13828/how-to-convert-bin-file-of-word2vec-model-into-txt-r/91564
