Contributor is a human worker providing annotations on the Appen data annotation platform289.
Convenience sampling – using a dataset not gathered scientifically in order to run quick experiments. Later on, it’s essential to switch to a scientifically gathered dataset290.
Convergence – informally, often refers to a state reached during training in which training loss and validation loss change very little or not at all with each iteration after a certain number of iterations. In other words, a model reaches convergence when additional training on the current data will not improve the model. In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending, temporarily producing a false sense of convergence. See also early stopping291,292.
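The informal criterion above can be sketched as a simple check on a recorded loss curve. This is a minimal illustration, not a standard API: the function name `has_converged`, the `window` size and the tolerance `tol` are all assumptions chosen for the example. The same check can be applied to both the training and the validation loss.

```python
def has_converged(losses, window=5, tol=1e-4):
    """Return True if each of the last `window` loss changes is smaller than `tol`.

    `losses` is a list of loss values, one per iteration (hypothetical input).
    """
    if len(losses) < window + 1:
        return False  # not enough history to judge
    recent = losses[-(window + 1):]
    # Compare each loss value with the one from the previous iteration.
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))


still_training = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05]
flat = [1.0, 0.5, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
print(has_converged(still_training), has_converged(flat))
```

A check like this cannot tell a true convergence from the temporary plateau described above, so in practice a larger window or a patience-based rule is often preferred.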
Convex function is a function in which the region above the graph of the function is a convex set. The prototypical convex function is shaped something like the letter U. For example, f(x) = x^2, f(x) = |x| and f(x) = e^x are all convex functions.
By contrast, a W-shaped function with two separate local minima, such as f(x) = x^4 - 2x^2, is not convex: the region above its graph is not a convex set.
A strictly convex function has at most one local minimum point, which, when it exists, is also the global minimum point. The classic U-shaped functions are strictly convex functions. However, some convex functions (for example, straight lines) are not U-shaped. Many common loss functions are convex, including L2 loss, Log Loss, L1 regularization and L2 regularization. Many variations of gradient descent are guaranteed to find a point close to the minimum of a strictly convex function. Similarly, many variations of stochastic gradient descent have a high probability (though not a guarantee) of finding a point close to the minimum of a strictly convex function. The sum of two convex functions (for example, L2 loss + L1 regularization) is a convex function. Deep models are never convex functions. Remarkably, algorithms designed for convex optimization tend to find reasonably good solutions on deep networks anyway, even though those solutions are not guaranteed to be a global minimum293,294.
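Saying that the region above the graph is a convex set is equivalent to the inequality f(t*x + (1 - t)*y) <= t*f(x) + (1 - t)*f(y) for all 0 <= t <= 1. The sketch below spot-checks that inequality on a grid of sample points. The helper name `looks_convex`, the grid and the tolerance `eps` are assumptions made for illustration, and passing the check on sampled points is evidence of convexity, not a proof.

```python
def looks_convex(f, points, steps=10, eps=1e-9):
    """Spot-check the convexity inequality for every pair of sample points."""
    for x in points:
        for y in points:
            for k in range(steps + 1):
                t = k / steps
                on_curve = f(t * x + (1 - t) * y)
                on_chord = t * f(x) + (1 - t) * f(y)
                if on_curve > on_chord + eps:
                    return False  # the curve rises above the chord
    return True


grid = [i / 2 for i in range(-10, 11)]  # sample points from -5 to 5


def l2_plus_l1(w):
    # Sum of two convex functions: a squared error term plus an absolute-value penalty.
    return (w - 1) ** 2 + abs(w)


def w_shaped(x):
    # Two separate local minima, so not convex.
    return x ** 4 - 2 * x ** 2


print(looks_convex(lambda x: x ** 2, grid))  # U-shaped
print(looks_convex(l2_plus_l1, grid))
print(looks_convex(w_shaped, grid))
```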
Convex optimization – the process of using mathematical techniques such as gradient descent to find the minimum of a convex function. A great deal of research in machine learning has focused on formulating various problems as convex optimization problems and on solving those problems more efficiently. For complete details, see Boyd and Vandenberghe, Convex Optimization295.
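As a minimal sketch of convex optimization, the code below runs plain gradient descent on the strictly convex function f(w) = (w - 3)^2, whose minimum is at w = 3. The function, the starting point, the learning rate and the number of steps are all illustrative assumptions.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


def grad_f(w):
    # Derivative of f(w) = (w - 3)^2.
    return 2 * (w - 3)


w_min = gradient_descent(grad_f, w0=0.0)
print(w_min)  # very close to 3
```

Because the function is strictly convex, the iterates approach the single global minimum; on a non-convex function the same loop could stop at a local minimum instead.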
Convex set is a subset of Euclidean space such that a line segment drawn between any two points in the subset remains completely within the subset296.
Convolution – the process of filtering. A filter (equivalently, a kernel or a template) is shifted over an input image. Each pixel of the output image is the sum of the products of the filter values and the corresponding values in the underlying image297.
Convolutional filter – one of the two actors in a convolutional operation. (The other actor is a slice of an input matrix). A convolutional filter is a matrix having the same rank as the input matrix, but a smaller shape298.
Convolutional layer is a layer of a deep neural network in which a convolutional filter passes along an input matrix299.
Convolutional neural network (CNN) is a type of neural network, containing at least one convolutional layer, that is most often used to identify and interpret images300,301.
Convolutional operation – the following two-step mathematical operation: (1) element-wise multiplication of the convolutional filter and a slice of an input matrix (the slice of the input matrix has the same rank and size as the convolutional filter); (2) summation of all the values in the resulting product matrix302.
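The two-step operation above, and the way a convolutional layer passes the filter along the input matrix, can be sketched in plain Python. The function names are hypothetical, and the sketch assumes a 2-D input, a stride of 1 and no padding. As is usual in machine learning, the filter is not flipped.

```python
def convolutional_operation(filter_, slice_):
    """Step 1: element-wise multiplication. Step 2: sum of the product matrix."""
    products = [
        [f * s for f, s in zip(filter_row, slice_row)]
        for filter_row, slice_row in zip(filter_, slice_)
    ]
    return sum(sum(row) for row in products)


def slide_filter(matrix, filter_):
    """Apply the convolutional operation at every position of the filter."""
    fh, fw = len(filter_), len(filter_[0])
    output = []
    for i in range(len(matrix) - fh + 1):
        output_row = []
        for j in range(len(matrix[0]) - fw + 1):
            # Slice of the input with the same shape as the filter.
            slice_ = [row[j:j + fw] for row in matrix[i:i + fh]]
            output_row.append(convolutional_operation(filter_, slice_))
        output.append(output_row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(slide_filter(image, kernel))  # [[6, 8], [12, 14]]
```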
Corelet programming environment (CPE) is a scalable environment that allows programmers to set the functional behavior of a neural network by adjusting its parameters and communication characteristics303.
Corpus of texts is a large dataset of written or spoken material that can be used to train a machine to perform linguistic tasks304.
Correlation analysis is a statistical data processing method that measures the strength of the relationship between two or more variables. It thus determines whether the phenomena are connected and how strong that connection is305.
Correlation is a statistical relationship between two or more random variables306.
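One widely used measure of the strength of a linear relationship between two variables is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it from scratch. The function name `pearson_r` and the sample data are assumptions for illustration, and the glossary entries above do not prescribe this particular coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


hours = [1, 2, 3, 4]
print(pearson_r(hours, [2, 4, 6, 8]))  # perfectly increasing relationship
print(pearson_r(hours, [8, 6, 4, 2]))  # perfectly decreasing relationship
```

A value near 0 indicates no linear relationship, although the variables may still be related in a non-linear way.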
Cost – synonym for loss. A measure of how far a model’s predictions are from their labels. Or, to put it more pessimistically, a measure of how bad a model is. To determine this value, a loss function must be defined for the model. For example, linear regression models typically use the mean squared error as the loss function, while logistic regression models use the log loss307,308.
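Two commonly used loss functions can be sketched as follows: mean squared error, a typical choice for linear regression, and log loss for binary classification. The function names and sample values are illustrative, and the log loss here assumes labels of 0 or 1 and predicted probabilities strictly between 0 and 1.

```python
import math


def mean_squared_error(labels, predictions):
    """Average squared difference between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


def log_loss(labels, probabilities):
    """Average negative log-likelihood for binary labels (0 or 1)."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probabilities)
    )
    return -total / len(labels)


print(mean_squared_error([1, 2, 3], [1, 2, 5]))  # one prediction is off by 2
print(log_loss([1, 0], [0.9, 0.2]))              # confident, mostly correct
print(log_loss([1, 0], [0.1, 0.8]))              # confident, mostly wrong: larger loss
```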
Co-training is a semi-supervised learning approach that is particularly useful when the dataset contains two independent, complementary sets of predictive features; it essentially amplifies independent signals into a stronger signal. For instance, consider a classification model that categorizes individual used cars as either Good or Bad. One set of predictive features might focus on aggregate characteristics such as the year, make, and model of the car; another set of predictive features might focus on the previous owner’s driving record and the car’s maintenance history. The seminal paper on co-training is «Combining Labeled and Unlabeled Data with Co-Training» by Blum and Mitchell309.
Counterfactual fairness is a fairness metric that checks whether a classifier produces the same result for one individual as it does for another individual who is identical to the first, except with respect to one or more sensitive attributes. Evaluating a classifier for counterfactual fairness is one method for surfacing potential sources of bias in a model. See «When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness» for a more detailed discussion of counterfactual fairness310.
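The check described above can be sketched by changing only the sensitive attribute of each individual and comparing the classifier's outputs. Everything in this example is hypothetical: the function name, the toy classifiers and the `income` and `group` fields. It is a simplified attribute-swap test that does not model how a sensitive attribute may influence other features.

```python
def counterfactual_mismatches(classifier, individuals, attribute, values):
    """Return the individuals whose prediction changes when only `attribute` changes."""
    flagged = []
    for person in individuals:
        # Predict for copies of the person that differ only in the sensitive attribute.
        outcomes = {classifier({**person, attribute: value}) for value in values}
        if len(outcomes) > 1:
            flagged.append(person)
    return flagged


def biased_classifier(person):
    # Toy model that (wrongly) uses the sensitive attribute.
    return "approve" if person["income"] > 50 and person["group"] == "A" else "deny"


def fair_classifier(person):
    # Toy model that ignores the sensitive attribute.
    return "approve" if person["income"] > 50 else "deny"


people = [{"income": 60, "group": "A"}, {"income": 40, "group": "B"}]
print(counterfactual_mismatches(biased_classifier, people, "group", ["A", "B"]))
print(counterfactual_mismatches(fair_classifier, people, "group", ["A", "B"]))
```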
Coverage bias – a form of selection bias in which the study sample is not representative because some members of the target population have zero chance of being included in the sample311.
Crash blossom is a sentence or phrase with an ambiguous meaning. Crash blossoms present a significant problem in natural language understanding. For example, the headline Red Tape Holds Up Skyscraper is a crash blossom because an NLU model could interpret the headline literally or figuratively