Probability with R. Jane M. Horgan.

Author: Jane M. Horgan
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119536987

       The decimal point is 1 digit(s) to the right of the |

         1 | 2344
         1 | 59
         2 | 11
         2 | 5556777889999
         3 | 0113
         3 | 6
         4 | 00000000
         4 | 6779
         5 | 12223344
         5 | 56679
         6 | 0011123444
         6 | 566777888999
         7 | 0112344
         7 | 5666666899
         8 | 001112222334
         8 | 5678899
         9 | 0122
         9 | 7778

      From Fig. 3.13, we are able to see the individual observations, as well as the shape of the data as a whole. Notice that there are many marks of exactly 40, whereas just one student obtains a mark between 35 and 40. One wonders if this has anything to do with the fact that 40 is a pass, and that the examiner has been generous to borderline students. This point would go unnoticed with a histogram.
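      As a minimal sketch of how such a display might be produced and checked, the code below uses R's stem function on a small vector of marks. The vector marks and its values are invented for illustration and are not the data behind Fig. 3.13.

```r
# Hypothetical marks, invented for illustration (not the book's data)
marks <- c(12, 25, 27, 31, 36, 40, 40, 40, 40, 46, 52, 58, 61, 67, 73, 78, 84, 91)

# Stem-and-leaf display; scale = 2 asks for a longer display than the default
stem(marks, scale = 2)

# Count marks of exactly 40, and marks from 35 up to but not including 40
n_exactly_40 <- sum(marks == 40)
n_just_below <- sum(marks >= 35 & marks < 40)
```

      Comparing the two counts is one way to quantify the kind of clustering at the pass mark that the display makes visible.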

      Plots of data are useful to investigate relationships between variables. To examine, for example, the relationship between the performance of students in Programming in Semesters 1 and 2, we could write

      plot(prog1, prog2, xlab = "Programming Semester 1", ylab = "Programming Semester 2")

      [Figure 3.14]

      To compare several variables at once, first create a data frame of all the variables that you want to compare.

      courses <- results[2:5]

      This creates a data frame courses containing the second to the fifth variables in results. Writing

      pairs(courses)

      or equivalently

      pairs(results[2:5])

      produces a matrix of scatter plots, one for each pair of variables (Fig. 3.15).
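      The column selection and the pairs call can be tried on a small stand-in data frame. In the sketch below, the column names (gender, arch1, prog1, arch2, prog2) and all the marks are assumptions made for illustration; only the calls results[2:5] and pairs(courses) come from the text.

```r
# Hypothetical results data frame; the column names and marks are assumptions
results <- data.frame(
  gender = c("m", "f", "f", "m", "f", "m"),
  arch1  = c(55, 72, 40, 63, 81, 47),
  prog1  = c(48, 75, 35, 60, 88, 52),
  arch2  = c(51, 70, 44, 58, 79, 50),
  prog2  = c(45, 71, 30, 57, 83, 49)
)

# Keep columns 2 to 5, here the four numeric variables
courses <- results[2:5]

# Draw one scatter plot for every pair of columns
pairs(courses)
```

      Indexing a data frame with a single vector, as in results[2:5], selects columns and returns a data frame, which is the form pairs expects.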

      In the case of the Programming subjects, we have a set of points (prog1, prog2), and having established, from the scatter plot, that a linear trend exists, we attempt to find the line that best fits the data. In R

      lm(prog2 ~ prog1)

      calculates what is referred to as the linear model (lm) of prog2 on prog1, or simply the line

\[ \text{prog2} = a + b \times \text{prog1} \]

      that best fits the data, where a is the intercept and b is the slope.

      The output is

      Call:
      lm(formula = prog2 ~ prog1)

      Coefficients:
      (Intercept)        prog1
           -5.455        0.960

      Therefore, the line that best fits these data is

\[ \text{prog2} = -5.455 + 0.960 \times \text{prog1} \]
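      As a worked illustration, the coefficients reported in the lm output can be used to estimate a Semester 2 mark from a Semester 1 mark. The helper name estimate_prog2 and the input marks 40, 60, and 80 are hypothetical, chosen only for this sketch.

```r
# Coefficients copied from the lm output above
intercept <- -5.455
slope <- 0.960

# Estimated Semester 2 mark for a given Semester 1 mark
estimate_prog2 <- function(prog1_mark) {
  intercept + slope * prog1_mark
}

# Estimates for three hypothetical Semester 1 marks
estimates <- estimate_prog2(c(40, 60, 80))
estimates
```

      Under this fitted line, a student with 60 in Semester 1 would be estimated at about 52.1 in Semester 2, since -5.455 + 0.960 × 60 = 52.145.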

      To draw this line on the scatter diagram, write

      plot(prog1, prog2)
      abline(lm(prog2 ~ prog1))

      [Figure 3.16]
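      The sketch below puts the fitting and plotting steps together on a small invented data set, and compares the lm coefficients with the usual closed-form least-squares estimates (slope = covariance / variance of the predictor). The marks are hypothetical and will not reproduce the coefficients quoted in the text.

```r
# Hypothetical marks for eight students (invented for illustration)
prog1 <- c(35, 42, 50, 58, 64, 71, 80, 92)
prog2 <- c(30, 33, 45, 49, 58, 60, 74, 82)

# Least-squares fit of prog2 on prog1
fit <- lm(prog2 ~ prog1)

# Closed-form least-squares estimates, for comparison with coef(fit)
slope <- cov(prog1, prog2) / var(prog1)
intercept <- mean(prog2) - slope * mean(prog1)

# Predictor on the horizontal axis, response on the vertical axis
plot(prog1, prog2,
     xlab = "Programming Semester 1",
     ylab = "Programming Semester 2")

# Add the fitted line to the existing scatter plot
abline(fit)
```

      Because the model is prog2 on prog1, prog1 goes on the horizontal axis; swapping the arguments to plot would draw the fitted line against the wrong axes.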

      A word of warning is appropriate here. The estimated values are based on the assumption that the past trend continues. This may not always be the case. For example, students who do badly in Semester 1 may get such a shock that they work harder in Semester 2 and change the pattern. Similarly, students getting high marks in Semester 1 may be lulled into a false sense of security and take it easy in Semester 2. Consequently, they may not do as well as expected. Hence, the Semester 1 trends may not continue, and the model may no longer be valid.

      Machine learning is the science of getting computer systems to use algorithms and statistical models to study patterns and learn from data. Supervised learning is the machine learning task of using past data to learn a function in order to predict a future output.
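      In this sense, fitting a line to past marks and using it on new students is a small instance of supervised learning. The following sketch assumes invented past data and two hypothetical new students; it uses lm to learn the function and predict to apply it.

```r
# Past data: hypothetical Semester 1 and Semester 2 marks (invented)
past <- data.frame(
  prog1 = c(35, 42, 50, 58, 64, 71, 80, 92),
  prog2 = c(30, 33, 45, 49, 58, 60, 74, 82)
)

# Learn a function from the past data
model <- lm(prog2 ~ prog1, data = past)

# New students for whom only the Semester 1 mark is known
new_students <- data.frame(prog1 = c(45, 75))

# Predict the Semester 2 marks that have not yet been observed
predicted <- predict(model, newdata = new_students)
predicted
```

      The column name in new_students must match the predictor name used in the formula, otherwise predict cannot find the variable it needs.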