Deep Learning Overview

Nowadays, the performance of machine learning models depends heavily on the representation of the data and on feature selection, rather than solely on the choice of learning algorithm. Much effort is therefore devoted to preprocessing pipelines such as feature selection. Even though domain-specific knowledge can help design data representations, the ambitions of Artificial Intelligence call for more powerful feature representations. Deep learning, which is closely tied to unsupervised feature learning, is a relatively young research field of machine learning that can learn multiple levels of abstraction and feature representation directly from data. It aims at learning feature hierarchies in which higher-level features are formed by the composition of lower-level ones. This multi-level structure makes it possible to build complex functions that take raw data as input and output the result directly, without depending on features crafted by humans [Bengio09].

Deep learning has achieved notable successes on problems such as image classification [Ciresan12][Krizhevsky12], semantic parsing [Bordes12] and speech recognition [Dahl12]. Deep architectures can express complex distributions more efficiently and achieve better performance on challenging tasks [Bengio07][Bengio09]. The hypothesis that composing additional functional levels yields more powerful modeling capacity was proposed long ago [Hinton90][Rumelhart86]. However, training deep architectures proved very difficult until successful approaches for training stacked autoencoders and DBNs appeared [Bengio007][Hinton06][Hinton006]. One key idea behind them is to train the deep architecture layer by layer with unsupervised learning, which is also called unsupervised feature learning. Each layer performs a local optimization that produces a more abstract representation of the layer it observes. Unsupervised feature learning can thus learn useful features automatically and directly from the data set, without requiring features defined by humans. Such unsupervised learned features are more natural and lose less information [Bengio15]. Some deep learning models also show strong potential for time-series problems [La14], which is another reason deep learning is suitable for stock trend prediction. Deep learning therefore offers a new and potentially powerful approach to improving stock prediction.
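To make the layer-by-layer idea concrete, the following minimal NumPy sketch trains a single denoising-autoencoder layer by local optimization of a reconstruction error; the corruption level, learning rate and layer sizes are illustrative assumptions, not choices taken from the cited works. The returned encoder produces the more abstract representation that would feed the next layer in a deep architecture.

```python
# Minimal sketch: one unsupervised layer (a denoising autoencoder with
# tied weights) trained to reconstruct its clean input from a corrupted copy.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_dae_layer(X, n_hidden, corruption=0.3, lr=0.1, epochs=20):
    """Train one denoising-autoencoder layer on X (n_samples x n_visible)."""
    n_visible = X.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # tied weights
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        # Corrupt the input by randomly zeroing a fraction of the features.
        mask = rng.random(X.shape) > corruption
        X_tilde = X * mask
        # Encode the corrupted input, then decode back to the visible space.
        H = sigmoid(X_tilde @ W + b_h)
        X_hat = sigmoid(H @ W.T + b_v)
        # Squared reconstruction error measured against the *clean* input.
        err = X_hat - X
        # Backpropagate through decoder and encoder (tied weights).
        d_vis = err * X_hat * (1 - X_hat)
        d_hid = (d_vis @ W) * H * (1 - H)
        grad_W = X_tilde.T @ d_hid + d_vis.T @ H
        W -= lr * grad_W / len(X)
        b_h -= lr * d_hid.mean(axis=0)
        b_v -= lr * d_vis.mean(axis=0)
    # The learned encoder gives the new, more abstract representation.
    return lambda Z: sigmoid(Z @ W + b_h)
```

Applying `train_dae_layer` to the data, then again to the encoded output, and so on, is one way to grow the feature hierarchy one layer at a time.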

Building Deep Representations

Experimental results show that training a deep architecture is much harder than training a shallow one [Bengio07][Erhan09]. The rapid recent growth and success of deep learning owe much to a breakthrough initiated by Geoff Hinton in 2006 and quickly followed by further papers [Hinton006][Bengio94][Poultney06]. The central idea proposed is greedy layer-wise unsupervised pre-training: only one layer of the hierarchy is trained at a time, and each layer learns, by unsupervised feature learning, a new transformation of the output of the previous layer. Finally, the set of pre-trained layers is either combined with a standard supervised classifier such as a Support Vector Machine or Logistic Regression, or used as the initialization of a deep model such as a Stacked Denoising Autoencoder or Deep Belief Network. Experiments show that layer-wise stacking attains a better feature representation in most cases [Larochelle09]. Although it is not difficult to combine single layers pre-trained by unsupervised learning into a supervised model, it is less clear how the single layers should be combined to form a better unsupervised model [Bengio15]. One approach is to stack pre-trained RBMs (section 4.2.1) into a DBN (section 4.2.2).
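As a rough illustration of this pipeline (not the exact setup of the cited works), the sketch below greedily pre-trains two RBM layers with scikit-learn and then fits a logistic regression classifier on the top-level features; the layer sizes, hyperparameters and the toy digits data set are assumptions made only for the example.

```python
# Sketch: greedy layer-wise unsupervised pre-training (two stacked RBMs)
# followed by a standard supervised classifier on the top-level features.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel intensities into [0, 1] for the RBMs
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The Pipeline fits each RBM, unsupervised, on the output of the layer
# below it, then fits logistic regression on the top representation.
model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=128, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Swapping the final estimator for an SVM, or using the pre-trained layers to initialize a network that is then fine-tuned end to end, corresponds to the alternatives mentioned above.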

Stacked Denoising Autoencoders


[Bengio09] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009. Also published as a book. Now Publishers, 2009.

[Ciresan12] Dan Ciresan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3642–3649. IEEE, 2012.

[Krizhevsky12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.

[Bordes12] Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. Joint learning of words and meaning representations for open-text semantic parsing. In International Conference on Artificial Intelligence and Statistics, pages 127–135, 2012.

[Dahl12] George E Dahl, Dong Yu, Li Deng, and Alex Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. Audio, Speech, and Language Processing, IEEE Transactions on, 20(1):30–42, 2012.

[Bengio07] Yoshua Bengio, Yann LeCun, et al. Scaling learning algorithms towards AI. Large-scale kernel machines, 34(5), 2007.

[Hinton90] Geoffrey E Hinton. Connectionist learning procedures. Artificial Intelligence, 40(1-3):185–234, 1989. Reprinted in J. Carbonell, editor, Machine Learning: Paradigms and Methods, MIT Press, 1990.

[Rumelhart86] David E Rumelhart, James L McClelland, and the PDP Research Group. Parallel distributed processing: Explorations in the microstructure of cognition, volume 1: Foundations. MIT Press, 1986.

[Bengio007] Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, et al. Greedy layer-wise training of deep networks. Advances in neural information processing systems, 19:153, 2007.

[Hinton06] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[Hinton006] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527–1554, 2006.

[Erhan09] Dumitru Erhan, Pierre-Antoine Manzagol, Yoshua Bengio, Samy Bengio, and Pascal Vincent. The difficulty of training deep architectures and the effect of unsupervised pre-training. In International Conference on artificial intelligence and statistics, pages 153–160, 2009.

[Bengio15] Yoshua Bengio, Ian J. Goodfellow, and Aaron Courville. Deep learning. Book in preparation for MIT Press, 2015.

[La14] Martin Längkvist, Lars Karlsson, and Amy Loutfi. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters, 42:11–24, 2014.

[Bengio94] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. Neural Networks, IEEE Transactions on, 5(2):157–166, 1994.

[Poultney06] Christopher Poultney, Sumit Chopra, Yann L Cun, et al. Efficient learning of sparse representations with an energy-based model. In Advances in neural information processing systems, pages 1137–1144, 2006.

[Larochelle09] Hugo Larochelle, Yoshua Bengio, Jérôme Louradour, and Pascal Lamblin. Exploring strategies for training deep neural networks. The Journal of Machine Learning Research, 10:1–40, 2009.