Deep learning has played a central role in the advancement of computer vision, producing state-of-the-art performance on many challenging tasks such as object recognition , semantic segmentation , image captioning , and human pose estimation . Convolutional neural networks (CNNs) , which are able to learn complex and deep feature representations of images, are behind many of these achievements. As model complexity increases, the resource utilization of such models increases as well. Modern networks commonly contain tens to hundreds of millions of learned parameters, which provide the representational power such tasks require; but with increased representational power also comes an increased probability of overfitting, leading to poor generalization. To combat overfitting, various regularization techniques can be applied, such as data augmentation. In computer vision, data augmentation is an especially popular technique because of its ease of implementation and effectiveness. Simple image transforms such as mirroring or cropping can be applied to create new training data that improves accuracy . Large models can also be regularized by adding noise during the training process, whether to the input, the weights, or the gradients. One of the most common uses of noise for improving model accuracy is dropout , which stochastically drops neuron activations during training and thereby discourages the co-adaptation of feature detectors. In this work, we consider applying different data augmentation techniques combined with dropout; these techniques encourage plain convolutional networks to generalize better and to achieve better results in the validation and testing phases.
In the remainder of this paper, we introduce the collected optimization methods and demonstrate that using data augmentation and dropout can improve model robustness and lead to better model performance. We show that these simple methods work with a plain convolutional neural network model and can be combined with most other regularization techniques, including learning rate scheduling and early stopping, in a very simple manner.
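As a concrete illustration of the simple transforms mentioned above, the following sketch applies random horizontal mirroring and random cropping to a single image represented as a numpy array. The function name `augment` and the specific crop size are illustrative choices, not part of the method described in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, crop_size):
    """Randomly mirror and crop a single H x W x C image array."""
    # Horizontal mirroring with probability 0.5.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random crop of crop_size x crop_size from the (possibly mirrored) image.
    h, w, _ = image.shape
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size, :]

image = rng.random((32, 32, 3))   # stand-in for a 32x32 RGB training image
patch = augment(image, crop_size=28)
print(patch.shape)  # (28, 28, 3)
```

Because the mirroring decision and the crop offsets are sampled independently on every call, each epoch effectively sees a different variant of the same underlying image, which is what gives augmentation its regularizing effect.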
Data Augmentation for Images
Data augmentation has long been used in practice when training convolutional neural networks. When training LeNet5  for optical character recognition, LeCun et al. applied various affine transforms, including horizontal and vertical translation, scaling, squeezing, and horizontal shearing, to improve the model's accuracy and robustness.
In , Bengio et al. demonstrated that deep architectures benefit much more from data augmentation than shallow architectures. They applied a large variety of transformations to their handwritten character dataset, including local elastic deformation, motion blur, Gaussian smoothing, Gaussian noise, salt-and-pepper noise, pixel permutation, and the addition of fake scratches and other occlusions to the images, in addition to affine transformations .
To improve the performance of AlexNet  in the 2012 ImageNet Large Scale Visual Recognition Challenge, Krizhevsky et al. applied image mirroring and cropping, as well as random adjustments to color and intensity values based on ranges determined using principal component analysis on the dataset.
When training Deep Image  on the ImageNet dataset, Wu et al. applied a wide range of color casting, vignetting, rotation, and lens distortion (pincushion and barrel distortion), as well as horizontal and vertical stretching, in addition to flipping and cropping.
Lemley et al. tackled the issue of data augmentation with a learned, end-to-end approach called Smart Augmentation  instead of relying on hard-coded transformations. In this method, a neural network is trained to intelligently combine existing samples in order to generate additional data that is useful for the training process.
Dropout in Convolutional Neural Networks
Another common regularization technique used in our models is dropout , which was first introduced by Hinton et al. Dropout is implemented by setting hidden unit activations to zero with some fixed probability during training. All activations are kept when evaluating the network, but the resulting output is scaled according to the dropout probability. This technique has the effect of approximately averaging over an exponential number of smaller sub-networks, works well as a robust form of bagging, and discourages the co-adaptation of feature detectors within the network.
While dropout was originally found to be most effective at regularizing fully-connected layers, we find that it can be just as powerful in convolutional networks when it is applied in the right place and with a well-chosen rate.
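The mechanism described above can be sketched in a few lines of numpy. Note that this sketch uses the common "inverted" variant, which scales the surviving activations by 1/(1-p) during training so that no rescaling is needed at evaluation time; this is mathematically equivalent in expectation to the test-time scaling described above, but the code here is an illustrative assumption, not the exact implementation used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p, training=True):
    """Inverted dropout: zero each unit with probability p during training
    and scale the survivors by 1/(1-p); identity at evaluation time."""
    if not training or p == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p   # keep with probability 1-p
    return activations * mask / (1.0 - p)

a = np.ones((4, 8))                       # a batch of hidden activations
train_out = dropout(a, p=0.5, training=True)   # entries are 0.0 or 2.0
test_out = dropout(a, p=0.5, training=False)   # unchanged
```

Because the expected value of each scaled surviving unit equals the original activation, the network sees consistent activation magnitudes in both phases.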
Early Stopping
Early stopping is a form of regularization used to avoid overfitting when using iterative methods, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner’s performance on data outside of the training set. Past that point, however, improving the learner’s fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.
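One widely used stopping rule, often called "patience", halts training once the validation loss has failed to improve for a fixed number of consecutive epochs. The following sketch, with an illustrative patience of 3 epochs, is one possible instance of such a rule rather than the specific rule used in this paper:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index at which training should stop: the first
    epoch at which the validation loss has not improved on its best
    value for `patience` consecutive epochs."""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # stop; the best weights are from best_epoch
    return len(val_losses) - 1  # never triggered: train to the end

# Validation loss improves until epoch 2, then drifts upward.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]
stop = early_stopping(losses, patience=3)  # stops at epoch 5
```

In practice one would also checkpoint the model weights at `best_epoch` and restore them after stopping, so that the final model corresponds to the lowest observed validation loss.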