StatPlus 2007 Professional Help

Purpose

This procedure performs simple and multiple regression using least squares.
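As an illustration of what "least squares" means in the simplest case (one predictor), the following minimal Python sketch computes the intercept and slope that minimize the sum of squared residuals. The function name `fit_simple_least_squares` and the sample data are hypothetical and chosen only for this example; this is not StatPlus code and does not describe how StatPlus performs the calculation internally.

```python
def fit_simple_least_squares(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x must not be constant")
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x
intercept, slope = fit_simple_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)
```

With several predictors the same criterion applies, but the coefficients are obtained by solving a system of linear equations rather than from these two closed-form expressions.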

Montgomery (1982) outlines the following four purposes for running a regression analysis:

Description - The analyst is seeking to find an equation that describes or summarizes the relationship between two variables. This purpose makes the fewest assumptions.

Coefficient Estimation - This is a popular reason for doing regression analysis. The analyst may have a theoretical relationship in mind, and the regression analysis will confirm this theory. Most likely, there is specific interest in the magnitudes and signs of the coefficients. Frequently, this purpose for regression overlaps with others.

Prediction - The prime concern here is to predict the response variable, such as sales, delivery time, efficiency, occupancy rate in a hospital, reaction yield in some chemical process, or strength of some metal. These predictions may be crucial in planning, monitoring, or evaluating some process or system. There are many assumptions and qualifications that must be made in this case. For instance, you must not extrapolate beyond the range of the data. Also, interval estimates require the normality assumptions to hold.

Control - Regression models may be used for monitoring and controlling a system. For example, you might want to calibrate a measurement system or keep a response variable within certain guidelines. When a regression model is used for control purposes, the independent variable must be related to the dependent variable in a causal way. Furthermore, this functional relationship must continue over time. If it does not, continual modification of the model must occur.  

Preparations

Run the Statistics→Regression→Linear Regression... command. Select the dependent variable and the predictors (independent variables).

Assumptions

First of all, as the name multiple linear regression implies, the relationship between the variables is assumed to be linear. In practice this assumption can virtually never be confirmed; fortunately, multiple regression procedures are not greatly affected by minor deviations from it. Multiple regression also assumes that the residuals (observed minus predicted values) follow the normal distribution.
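A rough way to look at the normality assumption is to compute the residuals and inspect their skewness and excess kurtosis, both of which are close to zero for normally distributed data. The sketch below is a descriptive check only, not a formal normality test, and it is not a StatPlus feature; the function name `residual_shape` and the data are hypothetical.

```python
def residual_shape(observed, predicted):
    """Return (skewness, excess kurtosis) of the residuals (observed minus predicted)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of the residuals (population form, dividing by n)
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    if m2 == 0:
        raise ValueError("residuals are all identical")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical observed and predicted values; residuals are -2, -1, 0, 1, 2
skewness, excess_kurtosis = residual_shape([8, 9, 10, 11, 12], [10, 10, 10, 10, 10])
print(skewness, excess_kurtosis)
```

Values far from zero suggest that the residuals are asymmetric or heavy-tailed, in which case a formal normality test is worth running before relying on interval estimates.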

Results

R² (R-Square) - Coefficient of determination; indicates how much of the variation in the response is explained by the model. The higher the R², the better the model fits your data.
Adjusted R-Square - Accounts for the number of predictors in your model and is useful for comparing models with different numbers of predictors. The formula is:
$$\text{Adjusted } R^2 = 1 - \frac{\text{MS Error}}{\text{SS Total} / \text{DF Total}}$$
Sum of squares (SS) - The sum of squared distances. SS Total is the total variation in the data. SS Regression is the portion of the variation explained by the model, while SS Error is the portion not explained by the model and is attributed to error.
Degrees of freedom (d.f.) - Indicates the number of independent pieces of information involving the response data needed to calculate the sum of squares. The degrees of freedom for each component of the model are:
DF Regression = p
DF Error = n - p - 1
DF Total = n - 1
where n = number of observations and p = number of predictors.
MS Regression - Mean square regression. The formula is:
$$\text{MS Regression} = \frac{\text{SS Regression}}{\text{DF Regression}}$$
MS Error - Mean square error, which is the variance around the fitted regression line (MS Error = s²). The formula is:
$$\text{MS Error} = \frac{\text{SS Error}}{\text{DF Error}}$$
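F - Test statistic for the overall significance of the regression. If the calculated F-value is greater than the critical value of the F-distribution with DF Regression and DF Error degrees of freedom at the chosen significance level, the hypothesis that all coefficients are zero is rejected; that is, at least one of the coefficients is concluded to be not equal to zero. The F-value is used to determine the p-value. The formula for the calculated F-value is:
$$F = \frac{\text{MS Regression}}{\text{MS Error}}$$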
Residuals - The difference between the observed values and predicted values.
Variance inflation factor (VIF) - Used to detect multicollinearity (correlated predictors). VIF measures how much the variance of an estimated regression coefficient increases if your predictors are correlated.
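
The quantities listed above can be reproduced by hand from the observed and fitted values. The following Python sketch is an illustration only, not StatPlus code. The function names `regression_summary` and `vif_two_predictors` and all data are hypothetical. It assumes a least-squares fit that includes an intercept, so that SS Regression = SS Total - SS Error. The VIF helper covers only the special case of exactly two predictors, where the VIF equals 1 / (1 - r²) with r the Pearson correlation between the predictors.

```python
def regression_summary(y, fitted, p):
    """Compute the ANOVA quantities for a least-squares fit with p predictors."""
    n = len(y)
    y_mean = sum(y) / n
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus predicted
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_error = sum(r ** 2 for r in residuals)
    ss_regression = ss_total - ss_error  # assumes least squares with an intercept
    df_regression, df_error, df_total = p, n - p - 1, n - 1
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return {
        "r_square": ss_regression / ss_total,
        "adjusted_r_square": 1 - ms_error / (ss_total / df_total),
        "df_regression": df_regression,
        "df_error": df_error,
        "df_total": df_total,
        "ms_regression": ms_regression,
        "ms_error": ms_error,
        "f_value": ms_regression / ms_error,
        "residuals": residuals,
    }


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two predictors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)  # squared Pearson correlation
    if r_squared >= 1.0:
        return float("inf")  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)


# Hypothetical data; y = 2.2 + 0.6x is the least-squares line for it (one predictor)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * xi for xi in x]
summary = regression_summary(y, fitted, p=1)
print(summary["r_square"], summary["adjusted_r_square"], summary["f_value"])

# Uncorrelated predictors give a VIF of 1; strongly correlated ones give a large VIF
vif_independent = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
vif_correlated = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5])
print(vif_independent, vif_correlated)
```

With more than two predictors, the VIF of each predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others; the two-predictor shortcut above no longer applies.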