# Statistical functions

R was written for statistics, so it offers a vast number of functions for this purpose. This chapter can only summarize some of the most important ones; much will be missing in any case.

## basic information

This first section covers some basic methods for getting information about data.
`summary` returns a short statistical description of an object:
```
> c(23,154,22,64,33,41) -> d
> summary(d)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
22.00   25.50   37.00   56.17   58.25  154.00
```
The exact behavior of this function is determined by the class of the object: a numeric vector produces the output above, the result of a model fit produces something else, and so on.
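As a small sketch of this class-dependent dispatch (the factor and matrix here are made up for illustration):

```r
# summary() dispatches on the class of its argument
f <- factor(c("a", "b", "a", "c", "a"))
summary(f)   # a factor yields counts per level: a=3, b=1, c=1

m <- matrix(1:6, nrow = 2)
summary(m)   # a matrix yields one column of statistics per matrix column
```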
The help page for `summary` shows this for a logical vector, where statistics like mean and median don't make any sense:
```
> unclass(attenu$station) < 25 -> d
> summary(d)
Mode   FALSE    TRUE    NA's
logical     124      42      16
```
To calculate quantiles, R has one generic function which can compute them using several different algorithms (selected with the `type` argument). The following example also computes quantiles at custom probabilities.
```
> d <- c(10,11,12,15,19,25,29,44,81,99)

> quantile(d)
0%   25%   50%   75%  100%
10.00 12.75 22.00 40.25 99.00

> quantile(d,probs=c(0.1,0.2,0.5))
10%  20%  50%
10.9 11.8 22.0

> quantile(d,probs=c(0.1,0.2,0.5),type=8)
10%      20%      50%
10.36667 11.40000 22.00000
```
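As a sketch of what the default algorithm does, the 10% quantile from above can be reproduced by hand. The formula below is the linear interpolation of order statistics that type 7 (the default) uses:

```r
d <- c(10, 11, 12, 15, 19, 25, 29, 44, 81, 99)
x <- sort(d)
p <- 0.1

# type 7: h = (n - 1) * p + 1, then interpolate linearly between
# the floor(h)-th sorted value and the next one
h <- (length(x) - 1) * p + 1
q <- x[floor(h)] + (h - floor(h)) * (x[floor(h) + 1] - x[floor(h)])
q   # 10.9, matching quantile(d, probs = 0.1)
```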

## models

R has support for linear, generalized linear, and nonlinear models; the chapter on statistical models in R's online manual shows some of these capabilities. Model functions expect their data as data frames, which are essentially collections of named column vectors of equal length. Here is a short example:
```
> dy <- c(0.1, 0.2, 0.5, 0.6, 0.67, 0.9)
> dx <- c(  0, 0.1, 0.4, 0.6, 0.8,  1 )
> data.frame(x=dx, y=dy) -> df
> df
x    y
1 0.0 0.10
2 0.1 0.20
3 0.4 0.50
4 0.6 0.60
5 0.8 0.67
6 1.0 0.90

> fit <- lm(y ~ x, data=df)
> fit

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x
0.1298       0.7555

> summary(fit)

Call:
lm(formula = y ~ x, data = df)

Residuals:
1        2        3        4        5        6
-0.02983 -0.00538  0.06796  0.01685 -0.06425  0.01464

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.12983    0.03458   3.754 0.019880 *
x            0.75553    0.05751  13.138 0.000194 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05041 on 4 degrees of freedom
Multiple R-Squared: 0.9774,     Adjusted R-squared: 0.9717
F-statistic: 172.6 on 1 and 4 DF,  p-value: 0.0001938

> newdat <- data.frame(x = seq(0,1,0.1))
> predict(fit,newdat,interval="confidence") -> pred.dy
> pred.dy
fit        lwr       upr
1  0.1298265 0.03380449 0.2258484
2  0.2053796 0.12164929 0.2891099
3  0.2809328 0.20805487 0.3538106
4  0.3564859 0.29228714 0.4206847
5  0.4320390 0.37337348 0.4907046
6  0.5075922 0.45039349 0.5647909
7  0.5831453 0.52304869 0.6432420
8  0.6586985 0.59190479 0.7254922
9  0.7342516 0.65795575 0.8105475
10 0.8098048 0.72210872 0.8975008
11 0.8853579 0.78500848 0.9857074

> matplot(newdat$x, pred.dy, lty = c(1,2,2), type="l", ylab="predicted")
> points(df,col=3,pch=20)
```
The last two commands produce a plot of the fitted line together with the data points; the upper and lower dashed lines are the confidence bounds.
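A fitted model object can also be taken apart with accessor functions. The following sketch reuses the data from the example above:

```r
# rebuild the data frame and fit from the example above
df <- data.frame(x = c(0, 0.1, 0.4, 0.6, 0.8, 1),
                 y = c(0.1, 0.2, 0.5, 0.6, 0.67, 0.9))
fit <- lm(y ~ x, data = df)

coef(fit)               # named coefficients: (Intercept) and x
residuals(fit)          # observed y minus fitted values
fitted(fit)             # predicted y at the observed x
summary(fit)$r.squared  # 0.9774, as in the summary printout
```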