Statistical functions

R was designed for statistics, so it offers a huge number of methods for this job. I can only summarize some of the most important functions, and a lot will necessarily be missing.

basic information

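The first chapter covers some basic methods for getting information about the data.
summary returns a compact overview of its argument; for a numeric vector this is the minimum, the first and third quartiles, the median, the mean and the maximum: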
> c(23,154,22,64,33,41) -> d
> summary(d)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22.00   25.50   37.00   56.17   58.25  154.00
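For a numeric vector, the six numbers printed by summary can be rebuilt from min, quantile, median, mean and max. This is a minimal sketch, assuming the default quantile algorithm (type 7), which summary uses for the quartiles:

```r
# The example vector from the text
d <- c(23, 154, 22, 64, 33, 41)

# Rebuild summary(d) from its parts; quantile() uses type 7 by default
manual <- c(
  "Min."    = min(d),
  "1st Qu." = unname(quantile(d, 0.25)),
  "Median"  = median(d),
  "Mean"    = mean(d),
  "3rd Qu." = unname(quantile(d, 0.75)),
  "Max."    = max(d)
)
print(manual)

# Compare with the built-in result (the tolerance covers display rounding in old R versions)
all.equal(as.numeric(summary(d)), as.numeric(manual), tolerance = 1e-3)
```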
summary is a generic function: what it computes depends on the class of the object it is given. A numeric vector produces the output above, the result of a model fit produces a model summary, and so on.
The help page for summary shows this for a logical vector, where quantities such as the mean and the median make no sense:
> unclass(attenu$station) < 25 ->  d
> summary(d)
   Mode   FALSE    TRUE    NA's
logical     124      42      16
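Because summary is generic, R chooses the method from the object's class: summary.default for plain vectors, summary.data.frame for data frames, and so on. The sketch below shows this and adds a method for a hypothetical class "measurement"; the class name, its unit attribute and the method body are made up for illustration only.

```r
# A plain numeric vector and a data frame built from it
v  <- c(23, 154, 22, 64, 33, 41)
df <- data.frame(value = v, group = factor(c("a", "b", "a", "b", "a", "b")))

summary(v)        # numeric vector: quartiles, mean, min and max
summary(v > 30)   # logical vector: counts of FALSE and TRUE
summary(df)       # data frame: one summary per column

# Hypothetical class "measurement": a numeric vector with a unit attribute
m <- structure(v, unit = "cm", class = "measurement")

# S3 method that summary() dispatches to for objects of class "measurement"
summary.measurement <- function(object, ...) {
  x <- unclass(object)
  cat("n =", length(x), "values in", attr(object, "unit"), "\n")
  invisible(c(mean = mean(x), sd = sd(x)))
}

res <- summary(m)  # calls summary.measurement and prints the header line
print(res)
```

Defining a function named summary.<class> is all it takes for summary to pick it up; no registration is needed for code run at the top level.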
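To calculate quantiles, R has a single general function, quantile, which can compute them with several different algorithms, selected through its type argument (type 7 is the default). The following examples compute the default quartiles, then quantiles for custom probabilities, and finally the same probabilities with a different algorithm (type=8).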
> d <- c(10,11,12,15,19,25,29,44,81,99)

> quantile(d)
   0%   25%   50%   75%  100%
10.00 12.75 22.00 40.25 99.00

> quantile(d,probs=c(0.1,0.2,0.5))
 10%  20%  50%
10.9 11.8 22.0

> quantile(d,probs=c(0.1,0.2,0.5),type=8)
     10%      20%      50%
10.36667 11.40000 22.00000
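The default algorithm (type 7) interpolates linearly between order statistics: for a probability p and sorted data x(1) <= ... <= x(n), it takes the position h = (n - 1) * p + 1 and returns x(floor(h)) + (h - floor(h)) * (x(floor(h) + 1) - x(floor(h))). Here is a minimal sketch of that rule; the helper name quantile7 is hypothetical and not part of R:

```r
# Hypothetical helper: sample quantiles using the type 7 rule (R's default)
quantile7 <- function(x, p) {
  x  <- sort(x)
  n  <- length(x)
  h  <- (n - 1) * p + 1   # 1-based, possibly fractional position
  lo <- floor(h)
  hi <- pmin(lo + 1, n)   # clamp so that p = 1 does not index past the end
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

d <- c(10, 11, 12, 15, 19, 25, 29, 44, 81, 99)
p <- c(0.1, 0.2, 0.5)

quantile7(d, p)                  # 10.9 11.8 22.0
unname(quantile(d, probs = p))   # same values from the built-in function
```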

models

R supports linear, generalized linear and nonlinear models (for example with the functions lm, glm and nls). R's online manual shows more of these capabilities. Models are usually fitted to data stored in data frames. These are a bit special: a data frame is essentially a list of named vectors (the columns), all of the same length. Here is a short example:
> dy <- c(0.1, 0.2, 0.5, 0.6, 0.67, 0.9)
> dx <- c(  0, 0.1, 0.4, 0.6, 0.8,  1 )
> data.frame(x=dx, y=dy) -> df
> df
    x    y
1 0.0 0.10
2 0.1 0.20
3 0.4 0.50
4 0.6 0.60
5 0.8 0.67
6 1.0 0.90
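Internally a data frame is a list whose elements are the columns, each a vector of the same length, plus row names. This short sketch rebuilds df from above so that it runs on its own and shows the list view next to the table view:

```r
dy <- c(0.1, 0.2, 0.5, 0.6, 0.67, 0.9)
dx <- c(0,   0.1, 0.4, 0.6, 0.8,  1)
df <- data.frame(x = dx, y = dy)

is.list(df)          # TRUE: a data frame is a list ...
names(df)            # "x" "y": ... of named columns
sapply(df, length)   # every column has the same length (6)

df$y                 # a column is an ordinary numeric vector
df[["x"]][3]         # list-style access, then vector indexing: 0.4
df[3, ]              # matrix-style access: the third row
```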

> fit <- lm(y ~ x, data=df)
> fit

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x
     0.1298       0.7555

> summary(fit)

Call:
lm(formula = y ~ x, data = df)

Residuals:
       1        2        3        4        5        6
-0.02983 -0.00538  0.06796  0.01685 -0.06425  0.01464

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.12983    0.03458   3.754 0.019880 *
x            0.75553    0.05751  13.138 0.000194 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05041 on 4 degrees of freedom
Multiple R-Squared: 0.9774,     Adjusted R-squared: 0.9717
F-statistic: 172.6 on 1 and 4 DF,  p-value: 0.0001938
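For a single predictor, the coefficients reported by lm follow from the closed-form least-squares solution: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), and R-squared equals the squared correlation of x and y. This minimal sketch checks these formulas against coef and summary for the same data:

```r
dy  <- c(0.1, 0.2, 0.5, 0.6, 0.67, 0.9)
dx  <- c(0,   0.1, 0.4, 0.6, 0.8,  1)
fit <- lm(y ~ x, data = data.frame(x = dx, y = dy))

# Closed-form simple linear regression
sxx       <- sum((dx - mean(dx))^2)
sxy       <- sum((dx - mean(dx)) * (dy - mean(dy)))
slope     <- sxy / sxx
intercept <- mean(dy) - slope * mean(dx)

c(intercept, slope)   # about 0.1298 and 0.7555, matching the lm output
coef(fit)

# R-squared of a simple regression is the squared correlation
cor(dx, dy)^2
summary(fit)$r.squared
```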

> newdat <- data.frame(x = seq(0,1,0.1))
> predict(fit,newdat,interval="confidence") -> pred.dy
> pred.dy
         fit        lwr       upr
1  0.1298265 0.03380449 0.2258484
2  0.2053796 0.12164929 0.2891099
3  0.2809328 0.20805487 0.3538106
4  0.3564859 0.29228714 0.4206847
5  0.4320390 0.37337348 0.4907046
6  0.5075922 0.45039349 0.5647909
7  0.5831453 0.52304869 0.6432420
8  0.6586985 0.59190479 0.7254922
9  0.7342516 0.65795575 0.8105475
10 0.8098048 0.72210872 0.8975008
11 0.8853579 0.78500848 0.9857074

> matplot(newdat$x, pred.dy, lty = c(1,2,2), type="l", ylab="predicted")
> points(df,col=3,pch=20)
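The last two commands draw a plot of the predictions: the solid line is the fitted regression line, the dashed upper and lower lines are the borders of the 95% confidence interval, and the points are the original data (drawn with col=3, green in the default palette).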

(more about plotting later)
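The bands returned by predict(..., interval="confidence") are fit ± t * se, where se is the standard error of the fitted mean (available with se.fit = TRUE) and t is the 97.5% quantile of the t distribution with the residual degrees of freedom (4 here), because the default level is 0.95. This minimal sketch rebuilds the lwr and upr columns from those pieces:

```r
dy     <- c(0.1, 0.2, 0.5, 0.6, 0.67, 0.9)
dx     <- c(0,   0.1, 0.4, 0.6, 0.8,  1)
fit    <- lm(y ~ x, data = data.frame(x = dx, y = dy))
newdat <- data.frame(x = seq(0, 1, 0.1))

# Fitted values together with their standard errors
p <- predict(fit, newdat, se.fit = TRUE)

# Two-sided 95% interval: t quantile with the residual degrees of freedom
tq  <- qt(0.975, df = fit$df.residual)
lwr <- p$fit - tq * p$se.fit
upr <- p$fit + tq * p$se.fit

# Compare with the built-in confidence interval
builtin <- predict(fit, newdat, interval = "confidence")
all.equal(unname(cbind(p$fit, lwr, upr)), unname(builtin))
```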

Links