Correlation analysis was performed with yield as the output variable and temperature and precipitation as predictors, and a stepwise regression technique was used to select the best predictors [4, 25]. Crop production changes as the climate changes. The effect of changing climatic variables on crop production was studied under anticipated seasonal climate change conditions: daily weather data were obtained from a weather generator, and the authors fitted multiple linear regression models. The climatic parameters used in the analysis were minimum and maximum temperature, rainfall intensity, humidity, precipitation rate, wind speed and solar radiation. These factors were used to predict corn yield, and the results implied that climate variability significantly affects crop yields [4].
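The stepwise selection described above can be sketched in Python. This is a minimal forward-selection variant that uses adjusted R² as the selection criterion, assuming NumPy. The function names, the `min_gain` threshold and the synthetic temperature and precipitation data are hypothetical illustrations, not the procedure or data of the cited studies [4, 25].

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return coef, r2


def adjusted_r2(r2, n, p):
    """R^2 penalized for using p predictors on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_stepwise(X, y, names, min_gain=0.01):
    """Forward stepwise selection: repeatedly add the predictor that most
    increases adjusted R^2, stopping when the gain falls below min_gain."""
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))
    best = -np.inf
    while remaining:
        score, j = max(
            (adjusted_r2(fit_ols(X[:, selected + [k]], y)[1], n, len(selected) + 1), k)
            for k in remaining
        )
        if score - best < min_gain:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected], best


# Hypothetical data: yield depends on temperature and precipitation only.
rng = np.random.default_rng(0)
n = 60
temp = rng.normal(25, 3, n)         # seasonal mean temperature (synthetic)
precip = rng.normal(800, 100, n)    # seasonal precipitation (synthetic)
unrelated = rng.normal(0, 1, n)     # a candidate predictor with no effect
y = 3.0 * temp + 0.02 * precip + rng.normal(0, 1, n)

X = np.column_stack([temp, precip, unrelated])
chosen, score = forward_stepwise(X, y, ["temp", "precip", "unrelated"])
print(chosen, round(score, 3))
```

Adjusted R² is used rather than plain R² because plain R² never decreases when a variable is added, so it would always admit the irrelevant predictor; here the procedure keeps temperature and precipitation and stops before the unrelated column.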
Cluster analysis, or clustering, is the process of grouping objects so that objects in the same group are similar to one another and different from those in other groups. Clustering is widely used in data analysis and in many other fields such as machine learning, pattern recognition, image analysis, information retrieval and agriculture. Several clustering algorithms exist, such as k-means and k-medoids; k-means is the most common and widely used [26]. A modified k-means clustering algorithm was demonstrated for crop prediction in Ref. [4]. Its results were compared with those of the k-means and k-means++ clustering algorithms, and the modified k-means was found to produce the maximum number of highly differentiable, good-quality clusters, the most correct crop predictions and the highest accuracy.
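For reference, the standard k-means algorithm (Lloyd's iterations with k-means++ seeding) can be sketched as follows. This is not the modified k-means of Ref. [4], whose details are not given here; the NumPy implementation, the function names and the two-cluster toy data are illustrative assumptions.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_iter=100, seed=0):
    """Standard (Lloyd's) k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Two well-separated synthetic groups of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, (20, 2)),
    rng.normal([10.0, 10.0], 0.5, (20, 2)),
])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```

The k-means++ seeding spreads the initial centroids apart, which is the main difference between k-means and k-means++ mentioned above; the iterative assignment and update steps are identical in both.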
A weather forecasting model was developed for classifying meteorological data. The model was based on the frequency of variables, and patterns associated with severe convective activity were identified for the task. Features were collected over Brazilian regions during the summer of 2007, and the results showed fairly good classification performance [27].
1.1.1.5 Principal Component Analysis
Principal component analysis (PCA) is a data mining technique that transforms a set of correlated variables into a smaller set of uncorrelated components capturing most of the variation in the data, which helps reveal its structure before forecasting. Monsoon rainfall is an important variable for crop yield. The amount of rainfall during the monsoon varies periodically depending on the region selected for the experiments. Rainfall information is important in areas where rainwater is stored, particularly for flood monitoring. Long-range prediction of Indian monsoon rainfall is based on statistical methods, and the Indian economy is strongly affected by even limited variation in seasonal rainfall. High-dimensional spatial datasets such as sea surface temperature and rainfall are evaluated in weather and water resources analysis using component extraction methods, and principal component analysis has been used to predict monsoon rainfall in India [4, 28, 29].
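A minimal PCA sketch via singular value decomposition, assuming NumPy, is shown below. The five "station" columns driven by a shared monsoon signal are synthetic and hypothetical; they only illustrate how a single leading component can summarize correlated rainfall series. This is not the specific method of Refs. [28, 29].

```python
import numpy as np


def pca(X, n_components):
    """PCA of the rows of X via SVD of the column-centred data.
    Returns (scores, components, explained-variance ratio)."""
    Xc = X - X.mean(axis=0)                       # centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # principal axes (rows)
    var = S ** 2 / (len(X) - 1)                   # variance along each axis
    ratio = var / var.sum()
    scores = Xc @ components.T                    # data projected onto the axes
    return scores, components, ratio[:n_components]


# Hypothetical rainfall anomalies at five stations sharing one large-scale signal.
rng = np.random.default_rng(0)
monsoon = rng.normal(0, 1, 100)
X = monsoon[:, None] + 0.1 * rng.normal(0, 1, (100, 5))

scores, components, ratio = pca(X, n_components=2)
print(ratio)
```

Because the stations are strongly correlated, the first component carries nearly all of the variance, so a forecasting model could use its score series in place of the five original columns.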
1.1.1.6 Bayesian Networks
A Bayesian network (also known as a Bayes network, belief network or Bayesian model) is a probabilistic graphical model that represents a set of variables and their conditional dependencies as a directed acyclic graph. The effect of climate change on potato production was assessed using a belief network [4]. The uncertainty of climate change and the variability of current weather parameters were combined in the belief network. The parameters studied included temperature, radiation, rainfall and potato development information. The network was developed to support agricultural policy makers. It was tested on synthetic weather scenarios, its results were compared with those of a conventional mathematical model, and the belief network proved efficient for the experiment.
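The following sketch shows how a belief network factorizes a joint distribution along its graph and answers queries by exact enumeration. The three-node structure (rainfall and temperature both influencing yield) and every probability in it are hypothetical, chosen only for illustration; the potato model of Ref. [4] used different variables and is not reproduced here.

```python
from itertools import product

# Hypothetical network: rain -> yield <- temp. All probabilities are illustrative.
DOMAINS = {
    "rain": ["low", "adequate"],
    "temp": ["normal", "high"],
    "yield": ["good", "poor"],
}
P_RAIN = {"low": 0.3, "adequate": 0.7}
P_TEMP = {"normal": 0.6, "high": 0.4}
# Conditional probability table: P(yield = "poor" | rain, temp).
P_POOR = {
    ("low", "normal"): 0.5,
    ("low", "high"): 0.8,
    ("adequate", "normal"): 0.1,
    ("adequate", "high"): 0.3,
}


def joint(a):
    """Joint probability of a full assignment, factorized along the DAG."""
    p_poor = P_POOR[(a["rain"], a["temp"])]
    p_yield = p_poor if a["yield"] == "poor" else 1.0 - p_poor
    return P_RAIN[a["rain"]] * P_TEMP[a["temp"]] * p_yield


def posterior(var, evidence):
    """P(var | evidence) by summing the joint over every consistent assignment."""
    names = list(DOMAINS)
    totals = {v: 0.0 for v in DOMAINS[var]}
    for values in product(*(DOMAINS[n] for n in names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in evidence.items()):
            totals[a[var]] += joint(a)
    z = sum(totals.values())
    return {v: p / z for v, p in totals.items()}


print(posterior("yield", {"temp": "high"}))
print(posterior("rain", {"yield": "poor"}))
```

With these illustrative numbers, `posterior("yield", {"temp": "high"})` gives P(poor) = 0.3 × 0.8 + 0.7 × 0.3 = 0.45, and `posterior("rain", {"yield": "poor"})` reasons backwards from an observed poor harvest to the likely rainfall. Enumeration is exponential in the number of variables, so larger networks normally use more efficient inference algorithms.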
1.1.1.7 Time Series Analysis
Time series analysis extracts meaningful statistics from data recorded over time and predicts future values from previously observed ones. It can be an important tool for forecasting crop yield: yield is treated as the dependent variable and modelled as a function of time, which establishes the relationship between yield and time. Variants of time series analysis include frequency-domain and time-domain methods, parametric and non-parametric methods, linear and nonlinear approaches, and univariate and multivariate models. Frequency-domain methods include spectral analysis and wavelet analysis; time-domain methods include auto-correlation and cross-correlation analysis; parametric approaches assume an underlying model such as an autoregressive or moving-average model; and non-parametric approaches [30] estimate the covariance or spectrum of the process directly. A new concept of crop yield under average climate conditions was used in Ref. [31], where time series techniques were applied to past yield data to set up a forecasting model. The moving average method was applied first, a regression equation was then fitted, and finally the difference between the yield and the impact of climate on yield was determined. The moving average model was concluded to be the better model for yield forecasting; it used a small dataset and obtained useful results.
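Three of the simplest time-domain tools mentioned above can be sketched in a few lines, assuming NumPy: a trailing moving-average forecast, the sample autocorrelation function and a least-squares linear trend of yield on time. The function names, the window length and the yield series are hypothetical; this is not the model of Ref. [31].

```python
import numpy as np


def moving_average_forecast(y, window):
    """Forecast the next value as the mean of the last `window` observations."""
    y = np.asarray(y, dtype=float)
    if len(y) < window:
        raise ValueError("need at least `window` observations")
    return float(y[-window:].mean())


def autocorrelation(y, max_lag):
    """Sample autocorrelation r_k of the series for lags 0..max_lag."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = float(d @ d)
    return np.array([float(d[:len(y) - k] @ d[k:]) / denom for k in range(max_lag + 1)])


def linear_trend(y):
    """Least-squares fit of y_t = a + b * t with t = 0, 1, 2, ...; returns (a, b)."""
    t = np.arange(len(y))
    b, a = np.polyfit(t, y, 1)   # polyfit returns the highest-degree coefficient first
    return float(a), float(b)


# Hypothetical annual yields (e.g. t/ha) for six seasons.
yields = [2.1, 2.3, 2.2, 2.6, 2.5, 2.8]
print(moving_average_forecast(yields, window=3))
print(autocorrelation(yields, max_lag=2))
print(linear_trend(yields))
```

With the series above, the three-season moving average forecasts about 2.63 for the next season, and the fitted trend rises by roughly 0.13 units per season; the deviations of observed yield from such a trend are what a subsequent regression on climate variables would try to explain.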
1.1.1.8 Markov Chain Model
A Markov chain is a stochastic model describing a sequence of possible states in which the probability of each state depends only on the state reached in the previous step; in other words, the present state determines the distribution of the next state. Markov chains are named after the Russian mathematician Andrey Markov (1856–1922), who made pioneering contributions to the theory of stochastic processes. A Markov chain approach was used to predict cotton yield from pre-harvest crop data [32]. The study investigated the application of Markov chains to crop yield prediction and analyzed yield data for the two leading cotton-producing states, California and Texas, over the four-year period from 1981 to 1984. The probability distribution of yield was estimated using a Markov chain, and the key variables within each period of the baseline data were selected using multiple linear regression and multiple rank regression. The means of the predicted yield distributions were used as the yield forecast. A sugarcane yield forecast was obtained from a model that implemented a second-order Markov chain, and the results showed that the second-order Markov chain model can be preferred over regression models and the first-order Markov chain model for crop yield forecasting [33].
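A sketch of how first- and second-order Markov chains can be estimated from a sequence of discretized yield states and used to forecast the next state is given below, in plain Python. The state labels, the representative yield value assigned to each state and the observed sequence are hypothetical; the cited studies [32, 33] also conditioned on pre-harvest variables that are not modelled here.

```python
from collections import Counter, defaultdict


def transition_probs(states, order=1):
    """Estimate P(next state | previous `order` states) by counting transitions."""
    counts = defaultdict(Counter)
    for i in range(order, len(states)):
        counts[tuple(states[i - order:i])][states[i]] += 1
    return {
        history: {s: c / sum(ctr.values()) for s, c in ctr.items()}
        for history, ctr in counts.items()
    }


def forecast(states, probs, order=1):
    """Predicted distribution of the next state given the last `order` states
    (empty if that history was never observed)."""
    return probs.get(tuple(states[-order:]), {})


def expected_yield(dist, state_value):
    """Mean of the predicted distribution, using a representative value per state."""
    return sum(p * state_value[s] for s, p in dist.items())


# Hypothetical yield classes and representative yields (e.g. t/ha).
VALUE = {"low": 1.5, "medium": 2.0, "high": 2.5}
seq = ["low", "medium", "medium", "high", "medium", "high", "high", "medium"]

p1 = forecast(seq, transition_probs(seq, order=1), order=1)
p2 = forecast(seq, transition_probs(seq, order=2), order=2)
print(p1, expected_yield(p1, VALUE))
print(p2, expected_yield(p2, VALUE))
```

For this sequence, the first-order chain predicts "medium" with probability 1/3 and "high" with 2/3 (expected value about 2.33), whereas the second-order chain conditions on the last two seasons ("high", "medium") and has only seen "high" follow that pair. Higher orders capture longer memory but need more data, since the number of possible histories grows with the order.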
1.2 Conclusions
There are many applications in agriculture that use machine learning techniques for prediction and analysis, and this article has discussed some of the approaches most commonly used in research. Large amounts of data can be collected from various sources to analyze and forecast crop yield. Integrating machine learning into agricultural processes is a rapidly growing research area. Collaboration between computer science and agriculture can help in exploring various areas of agronomy and in forecasting agricultural crops. Combining the two can support pre-harvest crop forecasting, and traditional forecasting methods can be superseded by computational statistical approaches.
References
1. Khoshnevisan, B., Rafiee, S., Omid, M., Mousazadeh, H., Rajaeifar, M.A., Application of artificial neural networks for prediction of output energy and GHG emissions in potato production in Iran. Agric. Syst., 123, 120–127, 2014.
2. Bejo, S., Mustaffha, S., Wan