#### First, a handy trick Agenda: - Vector operations - Building a matrix - Matrix operations - Distributions - Density - Distributions - Cumulative - Distributions - Random Generation - Central Limit Theorem ###### ## Vector Operations Simple math operations are performed on every element in a vector ## If we start with 1 to 5 1:5 ## We can double it 1:5 * 2 ## Or add 3 1:5 - 3 c(3,9,2) + 5 ## Vectors can also be added element by element c(5,10,10) + c(1,2,3) 1:5 + 5:1 ########### Matrix construction ### The general form is ### matrix( values, nrow = , ncol = 2) ### values is the values of the matrix ### nrow and ncol are the number of rows and columns mat.1 = matrix( 1:4, nrow = 2, ncol = 2) ## It fills one column up before moving on to the next mat.1 matrix(3, nrow = 2, ncol = 2) ## If you don't put enough elements in the values part ## then it'll repeat whatever is there until the matrix is full ### You can also take the transpose of a matrix with t( name of matrix ) t(mat.1) ### Or get the eigen values and eigen vectors (if the matrix is square) eigen(mat.1) ### Or get the determinant det(mat.1) ########## Matrix operations ## By default matrix operations are done element by element A = matrix( 1:4, nrow = 2, ncol = 2) B = matrix( 5:8, nrow=2, ncol=2) ### This is correct for addition, A + B ### But it isn't how matrix multiplication is done A * B ### To do a proper matrix multiplication, you need to use %*% A %*% B ### We can test this with the determinant, beacuse det(AB) = det(A)det(B) det(A) det(B) det(A)*det(B) ## What we should be getting det(A * B) ## No det(A %*% B) ## Yes ### A word of caution: Sometimes very small values are supposed to be zeroes. ### Because computers have storage limitations, sometimes answers that should be ### zero show up as something with e-14 e-15 or e-16, which is the same as 10^-14, 10^-15 and so on ### The determinant of C is obviously zero (singular matrix), but it's showing up as something else. C = matrix( 5:6, nrow=2, ncol=2) det(C) ############## Distributions ### Density ## We can access most of the classic distributions in R in several ways ## We can get their density with functions like dnorm, dpois ## Density of a standard normal at 0 dnorm(0) ## Compare ## exp stands for "natural exponent", it's used instead of e^ 1 / sqrt(2*pi) * exp( - 0/2) ## Density of standard normal at 1 dnorm(1) 1 / sqrt(2*pi) * exp( - (1- 0)^2 /2) ## We can modify it to suit a particular mean and standard deviation dnorm(3, mean=5, sd=10) ## Compare to 1 / sqrt(2*pi*10^2) * exp( -(3-5)^2 / (2*10^2)) ### Some other distributions we can do ### Density of a binomial, 3 successes out of 5, with chance of success 0.5. dbinom(x=3, size=5, p=0.5) ### Density of a poisson, at x=4, with lambda=6 dpois(x=4,lambda=6) ### Look up others with the ? command ?dbeta ?dgamma ?dt ?dchisq ?dexp ?dunif ############ Distributions ## Cumulative distributions ## Sometimes getting the cumulative distribution is useful ## especailly for finding the p-value of something when that variable is assumed ## For example, it we were doing a t-test, and got a t-score of 2.2, with 17 degrees of freedom, ## we could get the p-value. pt(2.2, df=17) ## The function above shows Pr( t < 2.2). ## If we wanted Pr( t > 2.2), we could modify the function pt(2.2, df=17, lower.tail=FALSE) ## Or just use the compliment property 1 - pt(2.2, df=17) ## Double to do a two-tailed test 2*(1 - pt(2.2, df=17)) ## These should be very similar 1 - pt(2.2, df=99999) 1 - pnorm(2.2) ## Some other tests you'll be likely to do ## Chi-squared test and ANOVA f-test 1 - pchisq(9, df=5) 1 - pf(4, df1=2, df2=15) ####################### ## Quantiles (skipped in the lab) ## We can also get critical values using the quantiles ## If we wanted to know the point at which 97.5% of the normal distribution was below qnorm(.975) ## Or where 5% of the t distibution was below qt(.05, df=11) ## as with d and p, we can use other settings to modify the distributions in question. qnorm(.50, mean=5) qnorm(.50, mean=5, sd=100) ############################################## #### Random generation ## Sometimes, especially when doing simulations, we'll want to randomly generate ## values from a distribution. For that, it's r rexp(mean=4) rexp(mean=4) ## If generating 1 value is good, generating 10 values is, like, twice as good. rpois(n=10, lambda=4) ## Be careful, if you generate a huge number of values, it will print them ## ALL to thre screen unless you give them a name x = runif(n=1000000, min=2, max=4) x[10:20] hist(x) mean(x) sd(x) ## If you're publishing the results of your simulation, or collaborating with someone else ## it's useful to be able to generate the same (pseudo) random numbers on ## different computers at different times ## To set the random number generator to a specific value, use set.seed() set.seed(42) rexp(mean=4) rexp(mean=4) set.seed(42) rexp(mean=4) rexp(mean=4) ######### Central Limit Theorem #### We can use random generation to see what a distribution looks like x = rnorm(n=100000) hist(x) ### Although there are much more efficient ways curve( dnorm(x), from=-3, to=3) ### Now that we've seen what a normal distribution looks like, ### we can try out the central limit theorem by simulation. ## Central Limit Theorem: As n --> infinity, the distribution of the sum (and the mean) of any distribution ## tends towards a normal. (If the variance is finite) ## We can do this by 1. Generating many sets of data from a distribution. 2. Getting the mean of each set. 3. Plotting a histogram of the means. 4. Comparing it to the normal distribution. ## First, set up a vector for the means (rep means repeat, so this is 0 repeated 40000 times) means1 = rep(0,40000) means3 = rep(0,40000) means5 = rep(0,40000) ## Then generate and get the means for(k in 1:40000) { means1[k] = mean(runif(n=1)) ## From a uniform, get 1 value, then get the mean means3[k] = mean(runif(n=3)) ## " " 3 values means5[k] = mean(runif(n=5)) ## " " 5 values } ## Build a histogram of the values ## For the sake of visual effect I've included ## probability = TRUE, so it's scaled to density and not frequency hist(means1, probability=TRUE, ylim=c(0,1.5), n=30) ## Draw the curve of the normal distribution ## mean = 0.5, and sd=sqrt(1/12) because uniform(0,1) variance = 1/12 ## add=TRUE because we want it on top of the histogram, not as a new picture curve(dnorm(x,mean=0.5, sd=sqrt(1/12)),add=TRUE) ## The same thing for the mean of 3 values hist(means3, probability=TRUE, n=30) curve(dnorm(x,mean=0.5, sd=sqrt(1/(12*3))),add=TRUE) ## And of 5 values hist(means5, probability=TRUE, n=30) curve(dnorm(x,mean=0.5, sd=sqrt(1/(12*5))),add=TRUE)