Project author: SLotAbr

Project description:
Decoder model for language modelling
Language: Python
Repository: git://github.com/SLotAbr/Decoder_model.git
Created: 2021-01-11T17:24:10Z
Project page: https://github.com/SLotAbr/Decoder_model

License: BSD 3-Clause "New" or "Revised" License



Decoder_model

Decoder model for language modelling

Put your text in “input.txt” and run Main_loop.py with Python 3 and NumPy. Remember to create a “parameters” folder first.
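The setup step can be checked with a few lines of Python. This is only a sketch: the helper name prepare_workspace is hypothetical and not part of the repository; it just creates the “parameters” folder and confirms that “input.txt” is present.

```python
from pathlib import Path


def prepare_workspace(root="."):
    """Create the "parameters" folder and check that "input.txt" exists.

    Hypothetical helper, not part of the repository.
    """
    root = Path(root)
    params_dir = root / "parameters"
    # exist_ok=True makes the call safe to repeat between runs.
    params_dir.mkdir(exist_ok=True)
    input_file = root / "input.txt"
    if not input_file.is_file():
        raise FileNotFoundError(f"Expected training text at {input_file}")
    return input_file, params_dir
```

Calling prepare_workspace() from the repository root before starting Main_loop.py would cover both requirements.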

Be careful with a high learning rate: exp() may overflow.
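A common way to reduce the risk of overflow in exp() is to subtract the row maximum before exponentiating, which leaves the softmax result unchanged. The sketch below illustrates that general technique; it is not taken from Main_loop.py, and the function name softmax_stable is an assumption.

```python
import numpy as np


def softmax_stable(logits):
    """Softmax along the last axis, shifted by the maximum for stability."""
    # After the shift the largest exponent is 0, so exp() cannot overflow.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted, axis=-1, keepdims=True)


# np.exp(1000.0) alone would overflow to inf; the shifted version stays finite.
logits = np.array([1000.0, 1001.0, 1002.0])
probs = softmax_stable(logits)
```

The shift does not protect against weights that have already diverged, so a moderate learning rate is still the first line of defence.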

I took some code snippets for Main_loop.py from Andrej Karpathy’s RNN_Char_Level.py. If you haven’t seen it or the original article yet, I highly recommend reading both: the article and the code became very popular for good reason.

Some notes

This architecture doesn’t work efficiently at the character level: it is unclear how to distribute attention between letters. The model achieves much better results with word-level modelling (byte pair encoding can also improve performance).
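For readers unfamiliar with byte pair encoding, here is a minimal sketch of how merges can be learned from whitespace-separated text. It is an illustration under simplifying assumptions (no end-of-word marker, no byte-level fallback) and is not code from this repository; learn_bpe_merges is a hypothetical name.

```python
from collections import Counter


def learn_bpe_merges(text, num_merges):
    """Learn up to num_merges BPE merges from whitespace-separated text."""
    # Each word starts as a tuple of single characters.
    words = Counter(tuple(word) for word in text.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        merged_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words
    return merges, words


merges, words = learn_bpe_merges("low low lower lowest", num_merges=2)
```

After two merges the frequent word "low" becomes a single token, which gives attention larger units to work with than individual letters.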

Some useful links

Possible improvements

  • a more efficient MH_attention_mechanism and LayerNorm for the evaluation phase
    (or STOP recalculating existing values for previous tokens!)
  • a corresponding module for the eval phase in the Decoder_model class
  • multiprocessing for loop operations (e.g. computing each attention head)
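The first improvement, not recalculating values for previous tokens, is usually done by caching keys and values per head. The sketch below shows the idea for a single head and checks it against full causal attention. The class and function names are hypothetical and are not the repository's MH_attention_mechanism API.

```python
import numpy as np


def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)


def full_causal_attention(X, Wq, Wk, Wv):
    """Reference: recompute attention over the whole sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Mask out future positions (strictly above the diagonal).
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V


class CachedHead:
    """Single attention head that keeps keys and values of past tokens."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []

    def step(self, x):
        # Only the new token is projected; older keys/values are reused.
        q = x @ self.Wq
        self.keys.append(x @ self.Wk)
        self.values.append(x @ self.Wv)
        K = np.stack(self.keys)
        V = np.stack(self.values)
        weights = softmax(q @ K.T / np.sqrt(K.shape[-1]))
        return weights @ V


rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 8, 4, 5
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
X = rng.standard_normal((n_tokens, d_model))

head = CachedHead(Wq, Wk, Wv)
incremental = np.stack([head.step(x) for x in X])
reference = full_causal_attention(X, Wq, Wk, Wv)
```

Both paths should produce the same outputs, but the cached version projects one token per step instead of the whole sequence. LayerNorm normalizes each token independently, so during evaluation it likewise only needs to run on the newest token.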