This presentation walks through the SimpleData1.R example, which introduces a basic pattern of programming built from three kinds of tasks:
- Input
- Analysis (a.k.a. Processing)
- Output
4/21/2016
dta <- read.csv( "../data/sampledta1/Test1.csv" )
Potential hiccup: Windows conventionally uses \ to separate directory and file names, but this is a special character in R literal strings. Either double them up when putting the string into R code ("C:\\TEMP\\file.csv") or use the forward slash alternative ("C:/TEMP/file.csv").
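The two spellings can be compared directly at the console. This is only a sketch: the C:/TEMP/file.csv path is the illustrative one from the text above and does not need to exist, and the variable names p1, p2 and p3 are made up for the example.

```r
# Two ways to write the same Windows path in R source code
p1 <- "C:\\TEMP\\file.csv"  # doubled backslashes: each \\ is stored as one \
p2 <- "C:/TEMP/file.csv"    # forward slashes: no escaping needed

nchar( p1 )  # counts stored characters, so each \\ counts once
nchar( p2 )  # same length as p1

# Converting the backslash form to the forward-slash form
identical( gsub( "\\", "/", p1, fixed = TRUE ), p2 )

# file.path() joins the pieces with forward slashes
p3 <- file.path( "C:", "TEMP", "file.csv" )
identical( p2, p3 )
```

Using cat( p1 ) instead of print( p1 ) shows the string as it is actually stored, with single backslashes.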
dta.lm <- lm( Reading ~ Seconds, data = dta )
lm is the “linear model” (regression) function. The dta.lm object holds the regression results.
The first argument is a “formula” because of the ~ operator. The left side names the response (y), and the right side requests a coefficient for Seconds. Unless a -1 appears on the right side, an intercept is also computed.
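The effect of the -1 term can be seen by fitting both forms. Test1.csv itself is not shown here, so the sketch below assumes a four-row stand-in data frame whose values were inferred from the summary and residuals printed elsewhere in this presentation. The names fit1 and fit0 are made up for the example.

```r
# Stand-in for Test1.csv: values inferred from the printed output (assumption)
dta <- data.frame( Seconds = 0:3, Reading = c( 2.2, 2.5, 3.0, 2.6 ) )

# Default: an intercept is estimated along with the Seconds coefficient
fit1 <- lm( Reading ~ Seconds, data = dta )
coef( fit1 )

# Adding -1 drops the intercept, forcing the line through the origin
fit0 <- lm( Reading ~ Seconds - 1, data = dta )
coef( fit0 )
```

The first fit returns two coefficients and the second returns only one, with a very different slope because the line can no longer shift up to meet the data.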
A general feature of analysis functions is that they don’t interact with the world directly. Information goes in as arguments, and one “blob” of data comes back as the return value.
In fact, a good-quality analysis function has no side effects. R makes it difficult (but not impossible) to change global variables from inside functions. Don’t try to circumvent this: write your functions to direct all output through the return value, as the lm function does.
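As a rough illustration of the difference, here are two hypothetical helpers (fit_line and fit_line_global are invented for this sketch and are not part of SimpleData1.R). The first hands everything back through its return value; the second uses the <<- operator to write a variable outside the function, which is the kind of side effect to avoid.

```r
# Preferred: everything the function computes leaves through the return value
fit_line <- function( x, y ) {
  fit <- lm( y ~ x )
  list( intercept = unname( coef( fit )[ 1 ] )
      , slope     = unname( coef( fit )[ 2 ] )
      )
}

# Discouraged: <<- creates or overwrites last_fit outside the function
fit_line_global <- function( x, y ) {
  last_fit <<- lm( y ~ x )  # side effect the caller cannot see in the call
  invisible( NULL )
}

# Only the side-effect-free version is called here (input values are assumed)
res <- fit_line( c( 0, 1, 2, 3 ), c( 2.2, 2.5, 3.0, 2.6 ) )
res$slope
res$intercept
```

With fit_line, the caller decides where the result is stored and what it is called, so two analyses can never silently overwrite each other.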
summary( dta )
##     Seconds        Reading     
##  Min.   :0.00   Min.   :2.200  
##  1st Qu.:0.75   1st Qu.:2.425  
##  Median :1.50   Median :2.550  
##  Mean   :1.50   Mean   :2.575  
##  3rd Qu.:2.25   3rd Qu.:2.700  
##  Max.   :3.00   Max.   :3.000
Overview of the contents of a data frame.
dta.lm # minimal printout
## 
## Call:
## lm(formula = Reading ~ Seconds, data = dta)
## 
## Coefficients:
## (Intercept)      Seconds  
##        2.32         0.17
Default console view of linear regression analysis result.
summary( dta.lm )
## 
## Call:
## lm(formula = Reading ~ Seconds, data = dta)
## 
## Residuals:
##     1     2     3     4 
## -0.12  0.01  0.34 -0.23 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   2.3200     0.2531   9.167   0.0117 *
## Seconds       0.1700     0.1353   1.257   0.3358  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3025 on 2 degrees of freedom
## Multiple R-squared:  0.4412, Adjusted R-squared:  0.1618 
## F-statistic: 1.579 on 1 and 2 DF,  p-value: 0.3358
Counterintuitively, summary yields a more detailed printout than just printing the object does.
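The summary is itself a returned object, so individual numbers can be pulled out of it rather than read off the printout. A minimal sketch follows, assuming a stand-in data frame whose values were inferred from the printed output (Test1.csv is not shown); dta.sum is a name chosen for this example.

```r
# Stand-in for Test1.csv: values inferred from the printed output (assumption)
dta <- data.frame( Seconds = 0:3, Reading = c( 2.2, 2.5, 3.0, 2.6 ) )
dta.lm <- lm( Reading ~ Seconds, data = dta )

dta.sum <- summary( dta.lm )  # an object of class "summary.lm", not just text
class( dta.sum )

coef( dta.sum )       # matrix with Estimate, Std. Error, t value, Pr(>|t|)
dta.sum$r.squared     # the "Multiple R-squared" value
dta.sum$sigma         # the "Residual standard error" value
```

This is the same “one blob of data” idea as lm: the detailed printout is simply what appears when the summary object is printed.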
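R has three commonly used graphing systems:
- base graphics: “painting” on the screen
- lattice: grid objects that can be printed
- ggplot2: grid objects using different syntax
Make sure when you search for graphing functions that you don’t try to mix and match… use one system at a time.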
# R base graphics... like "painting" on the screen
plot( dta$Seconds, dta$Reading )
abline( dta.lm, col = "blue" )
library(lattice)
p <- xyplot( Reading ~ Seconds, dta
           , panel = function( x, y ) {
              panel.xyplot( x, y )
              panel.abline( dta.lm, col = "blue" ) } )
print( p )
library(ggplot2)
ggp <- ggplot( dta, aes( x=Seconds, y=Reading ) ) +
  geom_point() +
  geom_smooth( method="lm", se=FALSE, color = "blue" )
print( ggp )
## `geom_smooth()` using formula 'y ~ x'
ggp <- # ggplot produces a "printable" object
  ggplot( dta  # default source for data to plot
        , aes( # define "aesthetic" map from data to display
               x = Seconds # horizontal position
             , y = Reading # vertical position
             ) 
        ) + # ggplot objects are "added" together
  geom_point() + # first layer plots data as points
  geom_smooth( # second layer uses data to generate a "smooth" curve
               method = "lm" # using linear regression
             , se = FALSE # don't display confidence band
             , color = "blue" # specify color of curve
             )
# object is not displayed until it is printed
# print( ggp ) # can be explicit, like this, or by interactive default
These three groups of plotting functions can be used consecutively, but they don’t work together (you cannot paint an abline on a ggplot object).
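One way to see why the systems don’t combine is to look at what each plotting call returns. The sketch below compares base graphics with lattice only, and assumes a stand-in data frame with values inferred from the printed output. The pdf( NULL ) call opens a throwaway graphics device so the example can run without a display, and base_result is a name made up for the example.

```r
library( lattice )  # a recommended package, usually bundled with R

# Stand-in for Test1.csv: values inferred from the printed output (assumption)
dta <- data.frame( Seconds = 0:3, Reading = c( 2.2, 2.5, 3.0, 2.6 ) )

pdf( NULL )  # throwaway graphics device: no file is created

# Base graphics draw immediately; there is no plot object to keep
base_result <- plot( dta$Seconds, dta$Reading )
is.null( base_result )

# lattice builds an object; nothing is drawn until it is printed
p <- xyplot( Reading ~ Seconds, dta )
class( p )
print( p )

invisible( dev.off() )  # close the throwaway device
```

Because a base plot leaves nothing behind to modify, functions like abline can only paint onto the current device, while lattice and ggplot2 expect layers to be built into their own objects before printing.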
My examples will primarily use ggplot2 functions.