Can Deep Learning Be Used for Stock Trading?

A Survey of Deep Learning Applications in Stock Trading (金投网, Science & Exploration channel)
Source: Baidu Baijia – Technology
Abstract: Sirignano (2016) proposed a method for predicting changes in the limit order book. He developed a "spatial neural network" that exploits local spatial structure and is more interpretable and more computationally efficient than a standard neural network.
Abbreviations used in this article:
DBN = deep belief network
LSTM = long short-term memory network
MLP = multilayer perceptron
RBM = restricted Boltzmann machine
ReLU = rectified linear unit
CNN = convolutional neural network
Limit Order Book Modeling
Sirignano (2016) proposed a method for predicting changes in the limit order book. He developed a "spatial neural network" that exploits local spatial structure and is more interpretable and more computationally efficient than a standard neural network. He modeled the best bid and ask at the next state change.
Architecture: each neural network has 4 layers. The standard neural network has 250 neurons per hidden layer, while the spatial neural network has 50. He used the hyperbolic tangent (tanh) activation function on the hidden-layer neurons.
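The article does not spell out the layout of the spatial network itself, so the sketch below only shows the standard baseline it is compared against: a feed-forward network with four hidden layers of 250 tanh units. The 200-dimensional input matches the summarized features listed in the training paragraph below; the two-output head (next best bid and ask), and the placement of dropout and batch normalization, are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

def make_standard_net(n_features: int = 200, hidden: int = 250,
                      n_hidden_layers: int = 4, n_outputs: int = 2) -> nn.Sequential:
    """Plain 4-hidden-layer tanh MLP, roughly matching the described baseline."""
    layers, in_dim = [], n_features
    for _ in range(n_hidden_layers):
        layers += [
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),   # batch normalization between hidden layers (see training notes below)
            nn.Tanh(),                # tanh activation on the hidden neurons
            nn.Dropout(0.5),          # dropout against overfitting (see training notes below)
        ]
        in_dim = hidden
    layers.append(nn.Linear(in_dim, n_outputs))  # assumed output: next best bid and ask
    return nn.Sequential(*layers)

model = make_standard_net()
```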
Training: he trained and tested the networks on the order books of 489 stocks from 2014 to 2015 (one separate model per stock), using NASDAQ Level III limit order book data with nanosecond decimal precision. Training involved 50 TB of data and a cluster of 50 GPUs. He summarized 200 features: the prices and sizes of the limit order book at the first 50 non-zero bid and ask levels. He used dropout to prevent overfitting, and batch normalization between hidden layers to prevent internal covariate shift. Training was carried out with RMSProp, which is similar to stochastic gradient descent with momentum but normalizes the gradient by a running average of past gradients. He used an adaptive learning rate: whenever the training error increased over training time, the learning rate was reduced by a constant factor. He used early stopping, enforced by a validation set, to reduce overfitting, and also applied an L2 penalty during training.
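A hedged PyTorch-style sketch of that training recipe follows: RMSProp with an L2 penalty (weight decay), a learning rate cut by a constant factor when the monitored training error stops improving, and early stopping driven by a validation set. The toy data, batch size, decay factor and patience values are placeholders for illustration, not values from the paper.

```python
import copy
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy stand-in data: samples of the 200 summarized order-book features, 2 targets
X, Y = torch.randn(1024, 200), torch.randn(1024, 2)
train_loader = DataLoader(TensorDataset(X[:896], Y[:896]), batch_size=128, shuffle=True)
val_loader = DataLoader(TensorDataset(X[896:], Y[896:]), batch_size=128)

model = make_standard_net()                       # from the sketch above
criterion = torch.nn.MSELoss()                    # placeholder objective
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3, weight_decay=1e-5)  # RMSProp + L2 penalty
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=1)

best_val, best_state, patience_left = float("inf"), None, 5
for epoch in range(100):
    model.train()
    train_loss = 0.0
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    # cut the learning rate by a constant factor when the training loss stops improving
    scheduler.step(train_loss)

    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(xb), yb).item() for xb, yb in val_loader)

    # early stopping enforced by the validation set
    if val_loss < best_val:
        best_val, best_state, patience_left = val_loss, copy.deepcopy(model.state_dict()), 5
    else:
        patience_left -= 1
        if patience_left == 0:
            break

model.load_state_dict(best_state)
```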
My Opinion: Applying Big-Data-Based Deep Learning to the Stock Market (from the 股林高手 blog on the 东方财富网 Guba forum)
In the past year or so, the concepts of "big data" and "deep learning" have swept in with such overwhelming momentum that it can feel as if society is about to be taken over by machines. After calming down and studying the underlying theory, I think this is needless worry: today's artificial intelligence is still far from approaching human-level ability, let alone surpassing it.

First, many obstacles in the basic theory have yet to be overcome. They cannot be removed simply by faster computers; overcoming them requires breakthroughs in the theory of neural systems, an interdisciplinary field that can only advance once science and technology as a whole have made substantial progress. Progress in basic theory is itself a slow process, and whether these problems can be solved within the next 100 years is an open question. Second, even if the basic theory were worked out, the development of society is itself gradual and continuous (one might say continuously differentiable); there will be no sudden leap in which the whole of society is taken over by artificial intelligence. Third, social development does not only move forward: it fluctuates, sometimes fast, sometimes slow, and at times it even regresses, so overall it is an extremely slow process. In its early stages, artificial intelligence must develop within human society, and its progress will likewise be slow.

Therefore, for a long time to come we will live in a society grounded in human nature. We should not hold excessively high expectations or fantasize about some artificial intelligence that can trade stocks with precision. We should instead work diligently, keep our feet on the ground, and do what we ought to do. (迈思拓客 mystock.cc)
Must-Know Tricks for Deep Learning (Part 2)
Sec. 5: Activation Functions

One of the crucial factors in deep networks is the activation function, which brings non-linearity into the network. Here we introduce the details and characteristics of some popular activation functions and give advice later in this section.

1. Sigmoid

The sigmoid non-linearity has the mathematical form σ(x) = 1 / (1 + e^{-x}). It takes a real-valued number and "squashes" it into the range between 0 and 1. In particular, large negative numbers become 0 and large positive numbers become 1. The sigmoid function has seen frequent use historically since it has a nice interpretation as the firing rate of a neuron: from not firing at all (0) to fully saturated firing at an assumed maximum frequency (1).

In practice, the sigmoid non-linearity has recently fallen out of favor and is rarely used. It has two major drawbacks:

1. Sigmoids saturate and kill gradients. A very undesirable property of the sigmoid neuron is that when its activation saturates at either tail of 0 or 1, the gradient in these regions is almost zero. Recall that during back-propagation, this (local) gradient is multiplied by the gradient of this gate's output for the whole objective. Therefore, if the local gradient is very small, it will effectively "kill" the gradient and almost no signal will flow through the neuron to its weights and, recursively, to its data. Additionally, one must take extra care when initializing the weights of sigmoid neurons to prevent saturation: if the initial weights are too large, most neurons will become saturated and the network will barely learn.

2. Sigmoid outputs are not zero-centered. This is undesirable, since neurons in later layers of a neural network would receive data that is not zero-centered. This has implications for the dynamics during gradient descent, because if the data coming into a neuron is always positive (e.g., x > 0 element-wise in f = wᵀx + b), then the gradient on the weights w will, during back-propagation, become either all positive or all negative (depending on the gradient of the whole expression f). This could introduce undesirable zig-zagging dynamics in the gradient updates for the weights. However, notice that once these gradients are added up across a batch of data, the final update for the weights can have variable signs, somewhat mitigating this issue. Therefore, this is an inconvenience but has less severe consequences than the saturated-activation problem above.

2. tanh(x)

The tanh non-linearity squashes a real-valued number into the range [-1, 1]. Like the sigmoid neuron, its activations saturate, but unlike the sigmoid neuron its output is zero-centered. Therefore, in practice the tanh non-linearity is always preferred to the sigmoid non-linearity.

3. Rectified Linear Unit (ReLU)

The Rectified Linear Unit (ReLU) has become very popular in the last few years. It computes the function f(x) = max(0, x), i.e., the activation is simply thresholded at zero.

There are several pros and cons to using ReLUs:

1. (Pro) Compared to sigmoid/tanh neurons, which involve expensive operations (exponentials, etc.), the ReLU can be implemented by simply thresholding a matrix of activations at zero. Meanwhile, ReLUs do not suffer from saturation.

2. (Pro) It was found to greatly accelerate (e.g., by a factor of 6 in [1]) the convergence of stochastic gradient descent compared to the sigmoid/tanh functions. It is argued that this is due to its linear, non-saturating form.

3. (Con) Unfortunately, ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron never activates on any data point again. If this happens, the gradient flowing through the unit will be zero forever from that point on. That is, ReLU units can irreversibly die during training, since they can get knocked off the data manifold. For example, you may find that as much as 40% of your network is "dead" (i.e., neurons that never activate across the entire training data set) if the learning rate is set too high. With a proper setting of the learning rate, this is less frequently an issue.

4. Leaky ReLU

Leaky ReLUs are one attempt to fix the "dying ReLU" problem. Instead of the function being zero when x < 0, a leaky ReLU has a small negative slope (of 0.01 or so). That is, the function computes f(x) = x if x ≥ 0 and f(x) = a·x if x < 0, where a is a small constant. Some people report success with this form of activation function, but the results are not always consistent.
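To make the formulas above concrete, here is a minimal NumPy sketch of these four activations (the function names and the default slope a = 0.01 are illustrative choices, not something prescribed by the article):

```python
import numpy as np

def sigmoid(x):
    # squashes inputs into (0, 1); saturates for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # zero-centered, squashes inputs into (-1, 1); still saturates
    return np.tanh(x)

def relu(x):
    # thresholds at zero; cheap and non-saturating for x > 0
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # keeps a small slope a for x < 0 to avoid "dead" units
    return np.where(x >= 0, x, a * x)

if __name__ == "__main__":
    x = np.linspace(-3, 3, 7)
    print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), sep="\n")
```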
5. Parametric ReLU

Nowadays, a broader class of activation functions, namely the rectified-unit family, has been proposed. Below we discuss these variants of ReLU.

[Figure: ReLU, Leaky ReLU, PReLU and RReLU. For PReLU the slope a_i is learned, while for Leaky ReLU a_i is fixed. For RReLU, a_ji is a random variable sampled from a given range during training and kept fixed during testing.]

The first variant is called the parametric rectified linear unit (PReLU) [4]. In PReLU, the slopes of the negative part are learned from data rather than pre-defined. He et al. [4] claimed that PReLU is the key factor in surpassing human-level performance on the ImageNet classification task. The back-propagation and update rule of PReLU is very straightforward and similar to that of the traditional ReLU; it is shown on page 43 of the slides.

6. Randomized ReLU

The second variant is called the randomized rectified linear unit (RReLU). In RReLU, the slopes of the negative parts are randomized within a given range during training and then fixed during testing. As mentioned in [5], in a recent Kaggle National Data Science Bowl (NDSB) competition, it was reported that RReLU could reduce overfitting thanks to its randomized nature. Moreover, as suggested by the NDSB competition winner, the random slope used in training is sampled from a uniform distribution U(l, u), and at test time it is fixed to its expectation, (l + u)/2.

In [5], the authors evaluated the classification performance of two state-of-the-art CNN architectures with different activation functions on the CIFAR-10, CIFAR-100 and NDSB data sets (the comparison tables are given in the original post). Note that, for these two networks, an activation function follows each convolutional layer, and the slopes in those tables are reported as 1/a, where a is the aforementioned slope parameter.

From these results, the performance of ReLU is not the best on any of the three data sets. For Leaky ReLU, a larger slope achieves better accuracy. PReLU is easy to overfit on small data sets (its training error is the smallest while its testing error is not satisfactory), but it still outperforms ReLU. In addition, RReLU is significantly better than the other activation functions on NDSB, which shows that RReLU can overcome overfitting, because this data set has less training data than CIFAR-10/CIFAR-100. In conclusion, all three ReLU variants consistently outperform the original ReLU on these three data sets, and PReLU and RReLU seem to be the better choices. He et al. also reported similar conclusions in [4].
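As a rough illustration of how these negative-slope variants differ, here is a small NumPy sketch of PReLU-style and RReLU-style forward passes. The parameter names, the sampling range [low, high], and the train/test switch are illustrative assumptions; only the general behaviour follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def prelu(x, a):
    # PReLU: the negative-part slope `a` is a learnable parameter.
    # Here it is simply passed in; a real implementation would update it
    # by back-propagation together with the weights.
    return np.where(x >= 0, x, a * x)

def rrelu(x, low=0.1, high=0.3, training=True):
    # RReLU: during training the slope is sampled uniformly from [low, high];
    # at test time it is fixed to its expectation (low + high) / 2.
    if training:
        a = rng.uniform(low, high, size=x.shape)
    else:
        a = (low + high) / 2.0
    return np.where(x >= 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(x, a=0.25))
print(rrelu(x, training=True))
print(rrelu(x, training=False))
```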
Sec. 6: Regularizations

There are several ways of controlling the capacity of neural networks to prevent overfitting:

L2 regularization is perhaps the most common form of regularization. It can be implemented by penalizing the squared magnitude of all parameters directly in the objective. That is, for every weight w in the network, we add the term (1/2)·λ·w² to the objective, where λ is the regularization strength. It is common to see the factor of 1/2 in front, because then the gradient of this term with respect to w is simply λw instead of 2λw. L2 regularization has the intuitive interpretation of heavily penalizing peaky weight vectors and preferring diffuse weight vectors.

L1 regularization is another relatively common form of regularization, in which for each weight w we add the term λ·|w| to the objective. It is possible to combine L1 regularization with L2 regularization: λ₁·|w| + λ₂·w² (this is called Elastic net regularization). L1 regularization has the intriguing property that it leads the weight vectors to become sparse during optimization (i.e., very close to exactly zero). In other words, neurons with L1 regularization end up using only a sparse subset of their most important inputs and become nearly invariant to the "noisy" inputs. In comparison, final weight vectors from L2 regularization are usually diffuse, small numbers. In practice, if you are not concerned with explicit feature selection, L2 regularization can be expected to give superior performance over L1.

Max norm constraints. Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector of every neuron and use projected gradient descent to enforce the constraint. In practice, this corresponds to performing the parameter update as normal and then enforcing the constraint by clamping the weight vector w of every neuron to satisfy ‖w‖₂ < c. Typical values of c are on the order of 3 or 4. Some people report improvements when using this form of regularization. One of its appealing properties is that the network cannot "explode" even when the learning rates are set too high, because the updates are always bounded.
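The following NumPy sketch shows, under illustrative names, strengths and shapes, how these three penalties are typically applied: the L1/L2 terms are added to the loss (their gradients added to the weight gradient), and the max-norm constraint is enforced by rescaling each neuron's weight vector after the update.

```python
import numpy as np

lam1, lam2, c = 1e-5, 1e-4, 3.0        # illustrative regularization strengths and norm cap
W = np.random.randn(256, 128) * 0.01   # one layer's weights: 128 neurons, 256 inputs each

def regularization_loss(W):
    # L1 term lam1*|w| and L2 term 0.5*lam2*w^2, summed over all weights
    return lam1 * np.abs(W).sum() + 0.5 * lam2 * (W ** 2).sum()

def regularization_grad(W):
    # corresponding gradients: lam1*sign(w) and lam2*w
    return lam1 * np.sign(W) + lam2 * W

def max_norm_project(W, c):
    # clamp each neuron's incoming weight vector (a column here) to L2 norm <= c
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

# inside a training step, after back-propagating the data gradient for this layer:
data_grad = np.zeros_like(W)           # placeholder for the back-propagated gradient
lr = 0.01
W -= lr * (data_grad + regularization_grad(W))
W = max_norm_project(W, c)
```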
Dropout is an extremely effective, simple, and recently introduced regularization technique by Srivastava et al. [6] that complements the other methods (L1, L2, max norm). While training, dropout is implemented by keeping a neuron active only with some probability p (a hyper-parameter), and setting it to zero otherwise. Training with dropout can be interpreted as sampling a sub-network within the full neural network and updating only the parameters of the sampled network based on the input data. (However, the exponentially many possible sampled networks are not independent, because they share parameters.) During testing no dropout is applied; this can be interpreted as evaluating an averaged prediction across the exponentially-sized ensemble of all sub-networks (more about ensembles in the next section). In practice, a dropout ratio of p = 0.5 is a reasonable default, but this can be tuned on validation data. Incidentally, Google applied for a US patent for dropout in 2014.

[Figure: dropout, the most widely used regularization technique [6].]
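A minimal sketch of the usual "inverted dropout" formulation follows; the rescaling by 1/p at training time is a common implementation choice, not something specified in the text, and it keeps the test-time forward pass unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    # keep each activation with probability p during training and rescale by 1/p,
    # so that no scaling is needed at test time
    if not training:
        return h
    mask = (rng.random(h.shape) < p) / p
    return h * mask

h = np.random.randn(4, 8)          # activations of one hidden layer
print(dropout(h, p=0.5, training=True))
print(dropout(h, training=False))  # identity at test time
```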
Sec. 7: Insights from Figures

Finally, from the tips above you can find satisfactory settings (e.g., data processing, architecture choices and details, etc.) for your own deep networks. During training, you can draw some figures to monitor how effectively your networks are training.

1. As we know, the learning rate is very sensitive. As Fig. 1 shows, a very high learning rate causes a quite strange loss curve. A low learning rate makes your training loss decrease very slowly, even after a large number of epochs. In contrast, a high learning rate makes the training loss decrease quickly at the beginning, but it also tends to drop into a local minimum, so the network might not achieve satisfactory results. For a good learning rate, as the red line in Fig. 1 shows, the loss curve decreases smoothly and finally reaches the best performance.

2. Now let's zoom in on the loss curve. An epoch is one pass over the training data, so each epoch contains multiple mini-batches. If we plot the classification loss for every training batch, the curve looks like Fig. 2. Similar to Fig. 1, if the trend of the loss curve looks too linear, it indicates that your learning rate is low; if it does not decrease much, it tells you that the learning rate might be too high. Moreover, the "width" of the curve is related to the batch size: if the "width" looks too wide, that is to say the variance between batches is too large, you should increase the batch size.

3. Another tip comes from the accuracy curve. As shown in Fig. 3, the red line is the training accuracy and the green line is the validation accuracy. When the validation accuracy converges, the gap between the red and green lines shows the effectiveness of your deep network. If the gap is big, your network gets good accuracy on the training data but only low accuracy on the validation set; it is clearly overfitting the training set, so you should increase the regularization strength. However, no gap together with a low accuracy level is not a good thing either: it shows that your deep model has low learning capacity, and in that case it is better to increase the model capacity for better results.

Sec. 8: Ensemble

In machine learning, ensemble methods [8], which train multiple learners and then combine them, are a state-of-the-art kind of learning approach. It is well known that an ensemble is usually significantly more accurate than a single learner, and ensemble methods have already achieved great success in many real-world tasks. In practical applications, especially challenges or competitions, almost all of the first-place and second-place winners used ensemble methods.

Here we introduce several skills for ensembling in the deep learning scenario; a minimal prediction-averaging sketch is given at the end of this section.

Same model, different initialization. Use cross-validation to determine the best hyperparameters, then train multiple models with the best set of hyperparameters but with different random initializations. The danger with this approach is that the variety comes only from initialization.

Top models discovered during cross-validation. Use cross-validation to determine the best hyperparameters, then pick the top few (e.g., 10) models to form the ensemble. This improves the variety of the ensemble but carries the danger of including suboptimal models. In practice, this can be easier to perform, since it does not require additional retraining of models after cross-validation. You could even directly select several state-of-the-art deep models from the Caffe Model Zoo to include in the ensemble.

Different checkpoints of a single model. If training is very expensive, some people have had limited success in taking different checkpoints of a single network over time (for example, after every epoch) and using those to form an ensemble. Clearly this suffers from some lack of variety, but it can still work reasonably well in practice. The advantage of this approach is that it is very cheap.

Some practical examples. If your vision task involves high-level image semantics, e.g., event recognition from still images, a better ensemble method is to employ multiple deep models trained on different data sources to extract different and complementary deep representations. For example, in the Cultural Event Recognition challenge associated with ICCV'15, we utilized five different deep models trained on images from ImageNet, the Places Database and the cultural images supplied by the competition organizers. We then extracted five complementary deep features and treated them as multi-view data. Combining the "early fusion" and "late fusion" strategies described in [7], we achieved one of the best performances and ranked 2nd place in that challenge. Similarly to our work, [9] presented the Stacked NN framework to fuse more deep networks at the same time.
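As a simple illustration of the "combine them" step, here is a hedged sketch that averages the predicted class probabilities of several trained models; the `models` list and the `predict_proba` interface are assumptions for illustration, since the post itself does not prescribe a particular fusion rule.

```python
import numpy as np

def ensemble_predict(models, x):
    # average the class-probability outputs of several trained models
    # and pick the class with the highest mean probability
    probs = np.mean([m.predict_proba(x) for m in models], axis=0)
    return probs.argmax(axis=1)
```

Any set of models exposing probability outputs (or softmax scores from deep networks) can be combined this way; weighted averaging or stacking, as in [9], are natural extensions.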
References & Source Links

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, 2012.
[2] A Brief Overview of Deep Learning, a guest post by Ilya Sutskever.
[3] CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University, taught by Prof. Fei-Fei Li and Andrej Karpathy.
[4] K. He, X. Zhang, S. Ren, and J. Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV, 2015.
[5] B. Xu, N. Wang, T. Chen, and M. Li. Empirical Evaluation of Rectified Activations in Convolutional Network. In ICML Deep Learning Workshop, 2015.
[6] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR, 15(Jun), 2014.
[7] X.-S. Wei, B.-B. Gao, and J. Wu. Deep Spatial Pyramid Ensemble for Cultural Event Recognition. In ICCV ChaLearn Looking at People Workshop, 2015. (http://lamda./weixs/publication/iccvw15_CER.pdf)
[8] Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms. Boca Raton, FL: Chapman & Hall/CRC, 2012. (ISBN 978-1-439-830031)
[9] M. Mohammadi and S. Das. S-NN: Stacked Neural Networks. Project in Stanford CS231n, Winter Quarter 2015.
[10] P. Hensman and D. Masko. The Impact of Imbalanced Training Data for Convolutional Neural Networks. Degree Project in Computer Science, DD143X, 2015.

About the author: Wei Xiu-Shen (魏秀参) is a Ph.D. student in the Machine Learning and Data Mining group (LAMDA) of the Department of Computer Science, Nanjing University. His research interests are computer vision and machine learning, in particular deep learning and weakly supervised learning. He has published papers in top international journals and conferences. Weibo ID: Wilson_NJUer.

This article is original content of 深度学习大讲堂 (Deep Learning Lecture Hall); to repost it, please contact loveholicguoguo.