Improving the YouTube Recommendation Model with a Transformer
The Transformer-based improvement of the YouTube recommendation model is about as well known as it gets; like FM, it is a must-know, fundamental topic.

1 - The essence of sequential recommendation

Sequential recommendation is, at its core, predicting the multiple items a user will click next from the multiple items they have already clicked — in other words, seq2seq. Isn't translation the same thing?

2 - A few functions

2.1 - input_signature in tf.function

    >>> @tf.function
    ... def f(x):
    ...     return x + 1
    >>> vector = tf.constant([1.0, 1.0])
    >>> matrix = tf.constant([[3.0]])
    >>> f.get_concrete_function(vector) is f.get_concrete_function(matrix)
    False
    >>> @tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
    ... def f(x):
    ...     return x + 1
    >>> vector = tf.constant([1.0, 1.0])
    >>> matrix = tf.constant([[3.0]])
    >>> f.get_concrete_function(vector) is f.get_concrete_function(matrix)
    True

From the docs: get_concrete_function(*args, **kwargs), a method of tensorflow.python.eager.def_function.Function, returns a ConcreteFunction specialized to the inputs and execution context. If the Function was created with an input_signature, args and kwargs may be omitted: with an input signature there is only one concrete function associated with the Function. An input signature can optionally be passed to tf.function to control which graphs are traced; it specifies the shape and type of each Tensor argument using a tf.TensorSpec object, and more general shapes can be used. This is useful to avoid creating multiple graphs when Tensors have dynamic shapes, but it also restricts the shapes and dtypes of the Tensors that can be passed in. In short: as long as the inputs match the declared shape and dtype, the same concrete function is reused.

2.2 - rsqrt: reciprocal of the square root

    >>> tf.math.rsqrt([3.0, 0, -4.0])
    <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.57735026, inf, nan], dtype=float32)>
    >>> tf.math.sqrt([3.0, 0, -4.0])
    <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.7320508, 0., nan], dtype=float32)>
    >>> 1 / tf.math.sqrt([3.0, 0, -4.0])
    <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.57735026, inf, nan], dtype=float32)>

The square root of a negative number is NaN, and the reciprocal of NaN is still NaN.

2.3 - Adding two dimensions with expand_dims

    >>> k
    <tf.Tensor: shape=(3, 4), dtype=float32, numpy=
    array([[-0.19508752, -0.24705486, -1.4569125 , -0.48979878],
           [ 0.3164492 , -0.01150408,  0.45663917, -0.8849148 ],
           [ 0.31029478, -1.5752182 ,  1.4130656 ,  0.41960722]], dtype=float32)>
    >>> tf.expand_dims(k, -1)  # apply this twice, or equivalently:
    >>> k2 = k[:, :, tf.newaxis, tf.newaxis]
    >>> k2.shape
    TensorShape([3, 4, 1, 1])
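The two facts above (rsqrt semantics on non-positive inputs, and double expand_dims versus fancy indexing with new axes) can be checked without TensorFlow. This is a small NumPy sketch of mine mirroring the TF behavior, not code from the model:

```python
import numpy as np

# rsqrt(x) = 1/sqrt(x): sqrt of a negative number is NaN, 1/0 is inf,
# matching the tf.math.rsqrt output shown above.
with np.errstate(divide="ignore", invalid="ignore"):
    x = np.array([3.0, 0.0, -4.0], dtype=np.float32)
    rsqrt = 1.0 / np.sqrt(x)
# rsqrt is approximately [0.57735026, inf, nan]

# Adding two trailing axes: expand_dims applied twice is the same as
# indexing with two new axes (None plays the role of tf.newaxis).
k = np.zeros((3, 4), dtype=np.float32)
k2 = np.expand_dims(np.expand_dims(k, -1), -1)
k3 = k[:, :, None, None]
print(k2.shape, k2.shape == k3.shape)
```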
3 - Multi-head attention

With a single head this is just ordinary attention; multiple heads simply add extra sets of q, k, v projections, nothing more. Used on its own, multi-head attention can grow the output dimension, but if you want to add tensors (e.g. a residual connection), the dimensions must match — otherwise the addition is simply impossible.

    >>> mat = MultiHeadAttention(d_model, num_heads)
    >>> x.shape
    TensorShape([12, 10, 16])
    >>> res = mat(x, x, x, mask)
    >>> res.shape
    TensorShape([12, 10, 32])

(Here the output dimension is d_model = 32, independent of the input's last dimension of 16.) A single Encoder layer, i.e. the left half of the first figure:

    >>> model = EncoderLayer(num_heads, x.shape[-1], dense_dim)
    >>> res = model(x, True, mask)
    >>> res.shape
    TensorShape([12, 10, 16])
    >>> x.shape
    TensorShape([12, 10, 16])

3.1 - The mask takes the following form, which may be the inverse of the usual convention:

    self.masks = 1 - tf.tile(mask[:, :, tf.newaxis, tf.newaxis], [1, 1, self.seq_len, self.num_heads])

A single encoder layer is then applied; at this point the result is still slightly worse:

    item_his_eb = self.encoder(item_his_eb, True, self.masks)
    item_his_eb = self.dense3(tf.transpose(item_his_eb, [0, 2, 1]))
    self.user_eb = tf.squeeze(item_his_eb)

Metrics: 0.05438987, 0.28721324, 0.05550319, 0.09987947, 0.01372705

3.2 - Using the conventional mask instead (changing only the mask above, nothing else), there seems to be little difference:

    self.masks = tf.tile(mask[:, :, tf.newaxis, tf.newaxis], [1, 1, self.seq_len, self.num_heads])

Metrics: 0.0543663, 0.2829085, 0.04441036, 0.09898286, 0.01082832

3.3 - On top of 3.1, removing the initial mask makes things worse:

    item_his_eb = item_list_add_pos  # tf.multiply(item_list_add_pos, tf.expand_dims(mask, -1))  # [B, maxlen, dim]

Metrics: 0.05211236, 0.27617246, 0.0436714, 0.09578644, 0.01040524

3.4 - On top of 3.1, removing the mask inside the encoder instead gives the best result so far — which suggests the mask isn't doing much in the attention here:

    item_his_eb = self.encoder(item_his_eb, True, None)  # self.masks

Metrics: 0.05593012, 0.2944682, 0.06165472, 0.10250784, 0.01543112

4 - Aggregation

My feeling from the experiments above is that what ultimately caps the achievable performance is the final aggregation step — or rather, I haven't yet thought of a particularly effective aggregation method. So let's just try it; in fact it turns out to be no different from mean. On top of 3.4, switching to GAP (global average pooling) actually gives the best result yet. Damn — but doesn't that average away the positional information too?

Metrics: 0.0581151, 0.31083053, 0.09423887, 0.10766441, 0.02585611

So I removed the positional encoding; the result is slightly worse, which suggests pooling is still workable:

Metrics: 0.05645998, 0.30347934, 0.08502831, 0.10482394, 0.02255137

If instead we keep the positional encoding and replace pooling with mean, the result improves again, haha:

Metrics: 0.05937547, 0.3134265, 0.09733415, 0.10924349, 0.02662534

This raises a question: what is the difference between 1-D pooling and mean? Does pooling have trainable parameters? Moreover pooling is noticeably slower (pooling ~3 s vs. mean <1 s). Testing shows the two produce identical results; the differing metrics are likely just training noise, since the gaps are small enough to ignore. The speed difference, though, I can't explain.

    >>> x0 = tf.random.normal([13, 10, 18])
    >>> x0.shape
    TensorShape([13, 10, 18])
    >>> x2 = tf.keras.layers.GlobalAveragePooling1D()(x0)
    >>> x3 = tf.math.reduce_mean(x0, 1)
    >>> np.array_equal(x2.numpy(), x3.numpy())
    True

Following the reference, I also tried its way of taking the mean, as below; on top of the best setup in this post it is actually worse. Woc. 【I suspect the user_id was mishandled, but if it were, the metrics probably shouldn't be this high — they are roughly on par with the corresponding ones.】
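One caveat the pooling-vs-mean comparison above glosses over: both GlobalAveragePooling1D (when no mask is passed) and reduce_mean average over padded positions as well. A padding-aware mean is easy to write; the sketch below is mine, not code from the post, and `masked_mean` is a hypothetical helper name:

```python
import numpy as np

def masked_mean(seq, mask):
    """Mean over the time axis that ignores padded positions.

    seq:  [B, T, D] item embeddings
    mask: [B, T] with 1.0 for real items, 0.0 for padding
    """
    mask = mask[:, :, None]                    # [B, T, 1], broadcasts over D
    total = (seq * mask).sum(axis=1)           # sum over real positions only
    count = np.maximum(mask.sum(axis=1), 1.0)  # avoid division by zero
    return total / count

# Example: batch of 1, T=3, D=2; the last step is padding.
seq = np.array([[[2.0, 4.0], [4.0, 8.0], [100.0, 100.0]]])
mask = np.array([[1.0, 1.0, 0.0]])
print(masked_mean(seq, mask))  # [[3. 6.]] — the padded step is ignored
```

A plain mean over the same sequence would give (2+4+100)/3 ≈ 35.3 in the first feature, dragged off by the padding; whether that matters here depends on how much padding the batches contain.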
    item_his_eb = self.encoder(item_his_eb, True, None)
    tmp = tf.concat([tf.nn.embedding_lookup(self.uid_embeddings_var, user_t1),
                     tf.math.reduce_mean(item_his_eb, 1)], axis=-1)
    self.user_eb = self.dense2(tmp)

Metrics: 0.03897376, 0.23201779, 0.05910321, 0.07654342, 0.01505933

Adding the two directly doesn't work well either:

    self.user_eb = tf.nn.embedding_lookup(self.uid_embeddings_var, user_t1) + tf.math.reduce_mean(item_his_eb, 1)

Metrics: 0.03621471, 0.24466549, 0.06938735, 0.07664543, 0.01780289

And multiplying them?

Metrics: 0.02392878, 0.16216457, 0.01828449, 0.05056945, 0.00450191

Given that, I changed it once more and tried again — no improvement, so let's stop here.

Metrics: 0.04446743, 0.24024637, 0.01803327, 0.08255393, 0.00387843

5 - Multi-layer Encoder

With multiple layers I got the following error. What on earth is this?

    ValueError: Weights for model sequential_1 have not yet been created. Weights are created when the Model is first called on inputs or `build()` is called with an `input_shape`

It later turned out to be caused by the added dropout — but why? Even with input_shape supplied it still failed, so let's skip dropout for now, haha. A two-layer Encoder does not improve the results:

Metrics: 0.0585502, 0.3106855, 0.09285575, 0.10797357, 0.02531075

Three layers gives much the same, not shown.

May we meet again someday, and may you still remember the topics we once discussed.
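The EncoderLayer used throughout is never shown in the post. As a rough NumPy sketch of the standard Transformer encoder block it presumably wraps (multi-head self-attention plus a feed-forward sublayer, each with a residual connection), with learned projections and layer normalization deliberately omitted — all names and shapes here are my assumptions, not the post's code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    # The 1/sqrt(d) factor is where tf.math.rsqrt shows up in real code.
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)  # [B, H, T, T]
    return softmax(scores) @ v                          # [B, H, T, d]

def multi_head_self_attention(x, num_heads):
    # Identity projections for brevity; a real layer learns Wq, Wk, Wv, Wo.
    b, t, d_model = x.shape
    d = d_model // num_heads
    h = x.reshape(b, t, num_heads, d).transpose(0, 2, 1, 3)  # [B, H, T, d]
    out = attention(h, h, h)
    return out.transpose(0, 2, 1, 3).reshape(b, t, d_model)

def encoder_layer(x, num_heads=4):
    # Residuals require matching dimensions, as noted in section 3.
    x = x + multi_head_self_attention(x, num_heads)
    x = x + np.tanh(x)  # stand-in for the learned feed-forward sublayer
    return x

x = np.random.randn(12, 10, 16)  # same [B, T, d_model] as the shapes above
y = encoder_layer(x)
print(y.shape)  # (12, 10, 16) — shape-preserving, so layers can be stacked
```

Because the block is shape-preserving, stacking two or three of them (as in section 5) is just repeated application; the stacking itself cannot cause the weights-not-created error, which is a Keras build-time issue.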