Given a vocabulary V, we represent each word w ∈ V by a continuous vector ew ∈ Rd of length d. Stacking these vectors yields a word embedding matrix W ∈ R|V|×d, with one row per word in V. Earlier neural networks often used pre-trained word embeddings such as Word2Vec [Mikolov et al., 2013] or GloVe [Pennington et al., 2014], where the word embedding matrix W is learned in an unsupervised fashion from large amounts of raw text. Word2Vec adopts a predictive feed-forward model that maximises the probability of a target word given its surrounding context. GloVe instead factorises the word co-occurrence count matrix, effectively reducing its dimensionality. Importantly, embeddings learned by both approaches capture the distributional similarity among words. In a parallel trend to using pre-trained word embeddings, several text-production models have shown that word embeddings can also be initialised randomly and trained jointly with the other network parameters; such jointly trained embeddings are fine-tuned during training and are often better suited to the task.
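The embedding matrix and lookup described above can be sketched as follows; this is a minimal illustration with a hypothetical toy vocabulary and a randomly initialised W (the dimension d = 4 and all names are chosen for illustration only):

```python
import numpy as np

# Toy vocabulary V and a word -> row-index mapping (hypothetical example).
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

d = 4  # embedding dimension (illustrative choice)
rng = np.random.default_rng(0)

# Randomly initialised embedding matrix W of shape |V| x d,
# as used when embeddings are trained jointly with the network.
W = rng.normal(scale=0.1, size=(len(vocab), d))

def embed(word: str) -> np.ndarray:
    """Return the d-dimensional vector e_w for word w (row lookup in W)."""
    return W[word_to_id[word]]

e_cat = embed("cat")
print(e_cat.shape)  # (4,)
```

In a real model, W would either be loaded from pre-trained Word2Vec/GloVe vectors or updated by gradient descent along with the other parameters.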