This file contains the handouts for the course, January – April 2019.

Before term tests and exams some old tests and exams are posted on the course OWL website.

January 7, 2019 : Stat3858bOutline2019

January 9, 2019 :

BEESWAX.DAT and coal.txt These data files are in ASCII format. Data (when not too massive) often comes in ASCII (text) form, or in a spreadsheet form (e.g. Excel). Sometimes data comes in a database format, discussed in another course.

Setting for statistical models and parameterization. Most of this material is not in the text.

Ch8-StatModels.pdf

January 11, 2019

How can we compare estimators and later make statistical inferences? Two examples.

Aside : One can use Monte Carlo methods to simulate some r.v.s even if one cannot easily calculate their distributions. This is often done in financial mathematics, and can also be done in actuarial science.

Jan 14, 2019

Here we first consider a complicated but otherwise uninteresting example.

Simulation example with an unknown distribution (R code)

TwoEstimatorsPoisson.r

compare-estimators.r - contains Poisson and some other examples.
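The idea behind comparing estimators by simulation can be sketched as follows. This is a minimal Python sketch (the course scripts themselves are in R) and is not taken from TwoEstimatorsPoisson.r. The two estimators (sample mean and sample variance, both unbiased for a Poisson mean), the value lam = 3, the sample size and the number of replications are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def rpois(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def compare_estimators(lam=3.0, n=20, reps=5000, seed=1):
    """Monte Carlo estimates of the MSE of two estimators of lam."""
    rng = random.Random(seed)
    se_mean, se_var = 0.0, 0.0
    for _ in range(reps):
        x = [rpois(lam, rng) for _ in range(n)]
        se_mean += (statistics.mean(x) - lam) ** 2      # estimator 1: sample mean
        se_var += (statistics.variance(x) - lam) ** 2   # estimator 2: sample variance
    return se_mean / reps, se_var / reps

mse_mean, mse_var = compare_estimators()
print("MSE of sample mean    :", round(mse_mean, 4))
print("MSE of sample variance:", round(mse_var, 4))
```

For the sample mean the simulated MSE can be checked against the exact value lam/n; for the sample variance the simulation replaces a less convenient calculation.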

Over the next couple of lectures we study

ch8-3-estimator-prop.pdf

ch8-3-Consistency-ContinuousFunctions.pdf

How do we construct or find estimators? The two most common methods are the method of moments and maximum likelihood estimation (Sections 8.4 and 8.5). Some of this material is a review and some is new.

Ch8-4MethodofMoments and Ch8-5-MLE-I

January 23, 2019

Data and R code for fit

Gamma method of moments point estimate. This R code was corrected Jan 25, 2019. Ignore the warning produced by this R code; I will later modify the code to avoid the step that generates this warning.

data illinois rain 1960 in ASCII or text format one data point per line

illinois rain 1961 , illinois rain 1962 , illinois rain 1963 , illinois rain 1964
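The gamma method of moments point estimate can be sketched as below. This is a Python sketch, not the course R code. It assumes the shape and rate parametrisation, with the sample variance computed with divisor n, and the data values are hypothetical and are not the Illinois rainfall data.

```python
import statistics

def gamma_mom(x):
    """Method of moments estimates (shape alpha, rate lam) for a gamma sample."""
    xbar = statistics.fmean(x)
    s2 = statistics.pvariance(x, mu=xbar)   # second central moment, divisor n
    return xbar ** 2 / s2, xbar / s2

# Hypothetical rainfall amounts, for illustration only.
rain = [0.02, 0.11, 0.25, 0.07, 0.48, 1.10, 0.31, 0.05, 0.16, 0.62]
alpha_hat, lam_hat = gamma_mom(rain)
print("alpha_hat =", round(alpha_hat, 4), " lam_hat =", round(lam_hat, 4))
```

The estimates come from matching the first two moments: mean = alpha/lam and variance = alpha/lam^2.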

Next we need to obtain the distributions of the estimators, and more specifically of the r.v.s obtained from these estimators by centring at the true theta, or by centring and scaling (standardizing).

Sometimes these can be approximated via a limit theorem as the sample size N tends to infinity (recall that the CLT is an approximation to the distribution of the centred and scaled r.v. X.bar).

delta method useful for obtaining approximations to the distribution of method of moments estimators
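For reference, the one-parameter delta method can be stated as follows. The symbols mu, sigma^2 and g are notation assumed here and may differ from the handout.

```latex
\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2)
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(g(\bar X_n) - g(\mu)\bigr) \xrightarrow{d} N\bigl(0,\ [g'(\mu)]^2 \sigma^2\bigr)
```

This holds provided g is differentiable at mu with g'(mu) not equal to 0. It follows from the first order Taylor approximation g(X.bar) ≈ g(mu) + g'(mu)(X.bar - mu).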

Ch8-5-MLE-II.pdf used for the so-called regular models for MLE. Introduces Fisher’s information.

Gamma-Method-of-Moments-II.R continuation of the method of moments gamma fit example, but with the normal approximation using the delta method.

Jan 30 2019 : The February term test and formula sheet from 2018 are posted on the course OWL website. Students should be able to access these as of today.

February 1, 2019 : Continue with MLE.

Some additional comments on Fisher's information; see specifically pages 3 to 5.

Sometimes we need to use numerical methods, since an analytic or algebraic solution is not available. The Gamma distribution MLE is one such case.

GammaMLE R script
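A numerical solution of the gamma likelihood equation can be sketched as follows. This is a Python sketch and not the GammaMLE R script. It assumes the shape and rate parametrisation, uses bisection rather than whatever optimiser the R script uses, and approximates the digamma function by a central difference of math.lgamma. The data values are hypothetical.

```python
import math

def digamma(a, h=1e-6):
    """Central-difference approximation to d/da log Gamma(a)."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def gamma_mle(x, tol=1e-8):
    """Approximate MLE (shape alpha, rate lam) of a gamma sample by bisection."""
    n = len(x)
    xbar = sum(x) / n
    c = math.log(xbar) - sum(math.log(v) for v in x) / n
    if c <= 0:
        raise ValueError("need at least two distinct positive observations")
    # Profile score in alpha (rate replaced by alpha / xbar); decreasing in alpha.
    score = lambda a: math.log(a) - digamma(a) - c
    lo, hi = 1e-3, 1.0
    while score(hi) > 0:        # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    return alpha, alpha / xbar

# Hypothetical data, for illustration only.
rain = [0.02, 0.11, 0.25, 0.07, 0.48, 1.10, 0.31, 0.05, 0.16, 0.62]
alpha_hat, lam_hat = gamma_mle(rain)
print("alpha_hat =", round(alpha_hat, 4), " lam_hat =", round(lam_hat, 4))
```

The rate has the closed form alpha/X.bar once alpha is known, so only a one-dimensional equation has to be solved numerically.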

February 4, 2019 : Continue with MLE from Feb 1. Some typos on p. 7 of this handout were corrected Feb 15, 2019.

Ch8-7and8-CramerRao-SufficientStats Sufficient statistics are very useful in statistical applications. The topic at the end of this handout, Rao-Blackwell, is useful since it tells us that estimators which are functions of sufficient statistics are better than estimators that are not (there are some technical qualifications for this statement).

The test next week on February 13 will cover material up to the end of the section on sufficient statistics. The test will be held during the tutorial time in WSC 240 and WSC 248, as there are too many students to fit into WSC 240 for an exam. About 20 students will write in WSC 248.

February 8, 2019

Continue with Rao-Blackwell

The Rice text discusses another approximation for the distribution of a pivotal quantity (a normalized and standardized estimator), called the parametric bootstrap.

Introduction to the parametric bootstrap

Apply to gamma fit rainfall example

parametric bootstrap

another example using the coal mine time-between-disasters data with an exponential model

parametricboot.r R script and data (in ASCII or text format) coal.txt. This data is also available in the R package boot, which is distributed with R.

The data (there in data frame format) gives the dates of 191 explosions in coal mines which resulted in 10 or more fatalities. The time span of the data is from March 15, 1851 until March 22, 1962.
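The parametric bootstrap for an exponential model can be sketched as follows. This is a Python sketch and not parametricboot.r. The waiting times are hypothetical and are not the coal data, and the number of bootstrap replicates B is an arbitrary choice.

```python
import random
import statistics

def parametric_bootstrap_exp(x, B=2000, seed=1):
    """Parametric bootstrap for the MLE of the rate in an exponential model.

    Returns the MLE and the list of B bootstrap replicates of it.
    """
    rng = random.Random(seed)
    n = len(x)
    rate_hat = 1.0 / statistics.fmean(x)                # MLE of the rate
    reps = []
    for _ in range(B):
        # Simulate a sample of the same size from the fitted model ...
        xstar = [rng.expovariate(rate_hat) for _ in range(n)]
        # ... and re-estimate the rate from the simulated sample.
        reps.append(1.0 / statistics.fmean(xstar))
    return rate_hat, reps

# Hypothetical waiting times, for illustration only.
gaps = [12.0, 45.5, 3.2, 88.1, 20.4, 7.7, 150.3, 31.9, 64.0, 9.8, 27.5, 110.6]
rate_hat, reps = parametric_bootstrap_exp(gaps)
se_boot = statistics.stdev(reps)                        # bootstrap standard error
print("rate_hat =", round(rate_hat, 5), " bootstrap SE =", round(se_boot, 5))
```

The differences between the replicates and rate_hat approximate the sampling distribution of the estimator centred at the true rate, which is what is needed for approximate confidence intervals.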

Later we will introduce the nonparametric bootstrap method, more simply called the bootstrap method, and boot is one of several R packages that implement this.

Chapter 9 Introduction to hypothesis testing and decision rules

February 11, 15 Section 9.1 up to Section 9.4

Likelihood Ratio and decision rules

Additional examples for these sections and Section 9.5 are given in further handouts.

R script for Binomial (n = 10) and exponential (n = 1) likelihood ratio: LR-intro.r. The numerator and denominator for the LRs have been changed to correspond to the Rice text.
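A likelihood ratio for the Binomial (n = 10) case can be sketched as below. This is a Python sketch and not LR-intro.r. The two simple hypotheses p0 = 0.5 and p1 = 0.7, the convention of putting H0 in the numerator, and the cutoff used for the decision rule are assumptions for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial(n, p) probability mass function at x."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p0, p1 = 10, 0.5, 0.7     # assumed simple hypotheses H0: p = p0, H1: p = p1

# Likelihood ratio with H0 in the numerator; small values favour H1.
lr = [binom_pmf(x, n, p0) / binom_pmf(x, n, p1) for x in range(n + 1)]
for x, r in enumerate(lr):
    print(x, round(r, 4))

# Size and power of the decision rule "reject H0 when X >= 8".
size = sum(binom_pmf(x, n, p0) for x in range(8, n + 1))
power = sum(binom_pmf(x, n, p1) for x in range(8, n + 1))
print("size =", round(size, 4), " power =", round(power, 4))
```

Since the ratio is decreasing in x, rejecting for small values of the ratio is the same as rejecting for large values of X.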

Feb 15 and after reading week. Section 9.2 Neyman-Pearson Lemma and size and power

Feb 27, 2019 Test 1

summary(gr)
       Q1              Q2               Q3             Total
 Min.   : 4.00   Min.   : 0.000   Min.   : 2.00   Min.   : 9.00
 1st Qu.: 9.00   1st Qu.: 5.000   1st Qu.: 8.00   1st Qu.:21.00
 Median :11.00   Median : 5.000   Median :11.00   Median :28.00
 Mean   :10.66   Mean   : 6.446   Mean   :10.71   Mean   :27.82
 3rd Qu.:13.00   3rd Qu.: 7.000   3rd Qu.:14.00   3rd Qu.:34.00
 Max.   :16.00   Max.   :17.000   Max.   :17.00   Max.   :47.00

March 1 and 4, 2019 : Continue with LR and decision rules.

Exponential example Exponential Generalized Likelihood Ratio

Chapter 9.5 Multinomial Multinomial Likelihood Ratio This also introduces a limit theorem (Theorem A p 341) and another approximation to the GLR statistic, giving rise to the Pearson chi-square statistic. This approximation uses a second order truncated Taylor series; students should review this from first year calculus if needed, and recall it was also used in Stat 3657.

Hardy Weinberg R code GLRHardyWeinberg.r
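The GLR and Pearson chi-square statistics for a Hardy-Weinberg model can be sketched as follows. This is a Python sketch and not GLRHardyWeinberg.r. The parametrisation of the cell probabilities as (1 - theta)^2, 2 theta (1 - theta), theta^2 is an assumption, and the counts are hypothetical.

```python
import math

def hardy_weinberg_tests(counts):
    """MLE of theta, GLR statistic (-2 log Lambda) and Pearson statistic."""
    x1, x2, x3 = counts
    n = x1 + x2 + x3
    theta = (2 * x3 + x2) / (2 * n)                  # MLE under the null model
    probs = [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]
    expected = [n * p for p in probs]
    glr = 2 * sum(o * math.log(o / e) for o, e in zip(counts, expected) if o > 0)
    pearson = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return theta, glr, pearson

def chisq1_pvalue(stat):
    """Upper tail probability of a chi-square r.v. with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical genotype counts, for illustration only.
theta_hat, glr, pearson = hardy_weinberg_tests([40, 40, 20])
print("theta_hat =", theta_hat)
print("GLR =", round(glr, 4), " p-value =", round(chisq1_pvalue(glr), 4))
print("Pearson =", round(pearson, 4), " p-value =", round(chisq1_pvalue(pearson), 4))
```

There are 3 - 1 free cell probabilities and 1 estimated parameter, so both statistics are compared with a chi-square distribution with 1 degree of freedom. The two values are close, as the Taylor approximation suggests.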

Continue with Multinomial GLR material from Chapter 13.3 and 13.4. ContingencyTables

Note that contingency tables are just a special case of multinomial distributions. There are two natural hypothesis testing problems that are a direct application of Chapter 9.5.
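The Pearson statistic for a two-way table can be sketched as below. This is a Python sketch under the usual setup, in which the expected count in a cell is (row total)(column total)/n. The table shown is hypothetical.

```python
def chisq_table(table):
    """Pearson chi-square statistic and degrees of freedom for an I x J table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = row[i] * col[j] / n        # expected count under the null hypothesis
            stat += (obs - exp) ** 2 / exp
    return stat, (len(row) - 1) * (len(col) - 1)

# Hypothetical 2 x 2 table of counts, for illustration only.
stat, df = chisq_table([[20, 30], [30, 20]])
print("X^2 =", stat, " df =", df)
```

The statistic is then compared with a chi-square table value with (I - 1)(J - 1) degrees of freedom.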

Also the due date for Assignment 2 is March 13 and not March 12 (which is not a lecture date)

March 6, 2019 : The March 2018 term test is now posted on the course OWL website.

March 8, 2019 : continue with contingency tables

Sec 13.3 Jane Austen example from text

Sec13-3-JaustenData.txt Austen data from text 13.3 page 517

March 11, 2019

JaneAusten-eg.r R script for this data analysis. Part of the data analysis is to decide on which of the two contingency table models (mechanism 1 or 2) is appropriate.

Another contingency table example cold-eg.r

More GLR applications Section 11.2.1 two sample normal This handout also solves problem 11.6.11

March 13, 2019 : The test on March 20 will cover material up to the end of GLR. In Chapter 9, Sections 9.6 – 9.9 are not covered for this test. In Chapter 11 only Section 11.2.1, the equal variance normal case, is covered, and in Chapter 13 only Sections 13.3 and 13.4 are covered. Any algebraic calculations of course have to fit into the time constraints of the test. The main emphasis will be on material since the first test, but note that some of the ideas from that first test material are used even in the hypothesis testing calculations. The same formula sheet will be supplied, along with the required statistical tables from the Rice text. The only relevant tables will be the normal, chi-square and Student’s t tables, but not necessarily all of them.

We now consider some further questions of model diagnostics which are discussed in the text in miscellaneous sections in Chapters 8, 9, 10

Ch9-Goodness-of-fit.pdf

Construction of QQ plots, including QQ norm: qqplotsintro.r. This requires the topic of moment approximations from Stat 3657: Moment approximations (these are notes on this topic from Stat 3657).

QQ plots take some practice to read. Probability-probability (PP) plots are sometimes used, but not as often as QQ plots.
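The points of a normal QQ plot can be constructed as in the following Python sketch, which is not qqplotsintro.r. The plotting positions (i - 0.5)/n are an assumption; other choices are in common use. The sample is hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(x):
    """Pairs (theoretical N(0,1) quantile, ordered observation) for a QQ plot."""
    n = len(x)
    std_normal = NormalDist()
    theo = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, sorted(x)))

# Hypothetical sample, for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.4]
for q, obs in normal_qq_points(sample):
    print(round(q, 3), obs)
```

If the sample is from a normal distribution, the plotted points should fall close to a straight line whose intercept and slope estimate the mean and standard deviation.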

R code to use Jarque-Bera and Kolmogorov-Smirnov tests normtest.r
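The Jarque-Bera statistic can be computed directly, as in this Python sketch, which is not normtest.r. It assumes the usual form based on the sample skewness and kurtosis with divisor n, and the asymptotic chi-square distribution with 2 degrees of freedom for the p-value. The sample is hypothetical.

```python
import math

def jarque_bera(x):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(x)
    xbar = sum(x) / n
    m2 = sum((v - xbar) ** 2 for v in x) / n     # central moments, divisor n
    m3 = sum((v - xbar) ** 3 for v in x) / n
    m4 = sum((v - xbar) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    pvalue = math.exp(-jb / 2)                   # upper tail of chi-square with 2 df
    return jb, pvalue

# Hypothetical sample, for illustration only.
jb, pvalue = jarque_bera([1, 2, 3, 4, 5])
print("JB =", round(jb, 4), " p-value =", round(pvalue, 4))
```

The chi-square approximation is a large sample result, so the p-value for a sample this small is only a rough guide.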

March 22, 2019

Date and room for final exam

Sunday April 28 7:00 – 10 PM SSC 2028

Nonparametric bootstrap (also called bootstrap) Bootstrap

boot-1.r implementation of bootstrap by direct coding

boot-2.r a continuation of boot-1.r

boot-realdata examples.r apply bootstrap to analyze some data from earlier in the course
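Direct coding of the nonparametric bootstrap can be sketched as follows. This is a Python sketch and not boot-1.r. The statistic (the sample median), the number of replicates B and the data are assumptions for illustration.

```python
import random
import statistics

def bootstrap(x, stat, B=2000, seed=1):
    """B bootstrap replicates of stat, resampling x with replacement."""
    rng = random.Random(seed)
    n = len(x)
    return [stat(rng.choices(x, k=n)) for _ in range(B)]

# Hypothetical data, for illustration only.
data = [2.3, 4.1, 1.7, 5.9, 3.3, 8.2, 2.9, 4.6, 3.8, 6.4]
reps = bootstrap(data, statistics.median)
se_boot = statistics.stdev(reps)                 # bootstrap standard error
print("median =", statistics.median(data), " bootstrap SE =", round(se_boot, 4))
```

The only change from the parametric bootstrap is that the resamples are drawn from the empirical distribution of the data rather than from a fitted parametric model.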

March 29, 2019 Assignment 3 due date April 3, 2019

Some additional remarks about (nonparametric) bootstrap.

boot-1A.r an example of the ratio type test statistic

Return to section 8.6 Introduction to Bayesian methods.

BayesMethods.pdf

BayesEgnonconjugate.pdf

April 1, 2019

Test results for March term test

The exam was about half a question too long, so it is graded out of 42.

The score on the returned test is out of 50 = 16 + 17 + 17, but it is counted in the records as a score out of 42

(42 = 50 – floor(17/2) = 50 – 8).

Histogram (in percent) out of 42.

T2-2019-Percent-out-of-42.pdf

April 3, 2019 : continue with the Bayes estimation from last week.

A non-conjugate example with the simplest numerical integration

cointipping1.r

coin tipping data cointipping.csv
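A posterior with a non-conjugate prior can be computed by simple numerical integration, as in this Python sketch, which is not cointipping1.r. The binomial likelihood, the triangular prior, the midpoint Riemann sum and the data (14 successes in 20 trials) are all assumptions for illustration and are not taken from cointipping.csv.

```python
def posterior_grid(y, n, prior, m=1000):
    """Posterior density of p on a midpoint grid, normalised by a Riemann sum."""
    h = 1.0 / m
    grid = [(k + 0.5) * h for k in range(m)]
    # Prior times binomial likelihood; the binomial coefficient cancels on normalising.
    unnorm = [prior(p) * p ** y * (1 - p) ** (n - y) for p in grid]
    const = sum(unnorm) * h                      # Riemann sum for the normalising integral
    post = [u / const for u in unnorm]
    return grid, post, h

def triangular_prior(p):
    """Triangular density on (0, 1) with its peak at p = 0.5."""
    return 2 - 4 * abs(p - 0.5)

grid, post, h = posterior_grid(y=14, n=20, prior=triangular_prior)
post_mean = sum(p * d for p, d in zip(grid, post)) * h
print("posterior mean =", round(post_mean, 4))
```

With a conjugate (beta) prior the normalising integral is available in closed form; here it is replaced by a sum over the grid, and posterior summaries are computed the same way.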