Jan 22

CLASSIFICATION TASK

Perhaps the most common data mining task is that of classification. Examples of classification tasks may be found in nearly every field of endeavor:

Banking: determining whether a mortgage application is a good or bad credit risk, or whether a particular credit card transaction is fraudulent
Education: placing a new student into a particular track with regard [...]

Filled Under: General

Jan 21

BIAS–VARIANCE TRADE-OFF

Suppose that we have the scatter plot in Figure 5.3 and are interested in constructing the optimal curve (or straight line) that will separate the dark gray points from the light gray points. The straight line has the benefit of low complexity but suffers from some classification errors (points ending up on the wrong [...]

Filled Under: General

Jan 20

METHODOLOGY FOR SUPERVISED MODELING (2)

The adjusted data mining model is then applied to a validation data set, another holdout data set, where the values of the target variable are again hidden temporarily from the model. The adjusted model is itself then adjusted, to minimize the error rate on the validation set. Estimates of model performance for future, unseen data [...]
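The holdout idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular tool: the toy records, the single-threshold "model", the 50/25/25 split, and the helper names split_three_ways, error_rate, and best_threshold are all hypothetical.

```python
import random

def split_three_ways(records, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle a copy of the records and partition it into
    training, test, and validation lists (fractions are assumed)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_test = int(n * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

def error_rate(threshold, data):
    """Fraction of (x, label) pairs misclassified by the rule
    'predict class 1 when x >= threshold'."""
    wrong = sum(1 for x, label in data if (x >= threshold) != bool(label))
    return wrong / len(data)

def best_threshold(candidates, data):
    """Pick the candidate threshold with the lowest error rate on data."""
    return min(candidates, key=lambda t: error_rate(t, data))

# Hypothetical toy data: one predictor x, target is 1 when x >= 20.
records = [(x, int(x >= 20)) for x in range(40)]
train, test, validation = split_three_ways(records)

candidates = range(0, 41, 5)
provisional = best_threshold(candidates, train)    # fit on the training set
adjusted = best_threshold(candidates, validation)  # re-tune on a holdout set

# The target values of the holdout records are used only for scoring,
# never for fitting the provisional model.
print("provisional threshold:", provisional)
print("error on test set:", error_rate(provisional, test))
print("adjusted threshold:", adjusted)
```

The point of the sketch is the separation of roles: the model is fitted on one partition and its error rate is measured on records it has not seen.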

Filled Under: General

Jan 19

METHODOLOGY FOR SUPERVISED MODELING

Most supervised data mining methods apply the following methodology for building and evaluating a model. First, the algorithm is provided with a training set of data, which includes the preclassified values of the target variable in addition to the predictor variables. For example, if we are interested in classifying income bracket, based on age, gender, [...]

Filled Under: General

Jan 18

SUPERVISED VERSUS UNSUPERVISED METHODS

Data mining methods may be categorized as either supervised or unsupervised. In unsupervised methods, no target variable is identified as such. Instead, the data mining algorithm searches for patterns and structure among all the variables. The most common unsupervised data mining method is clustering, our topic in Chapters 8 and 9. For example, political consultants [...]

Filled Under: General

Jan 17

VERIFYING MODEL ASSUMPTIONS (3)

Of course, point estimates have drawbacks, so analogous to the simple linear regression case, we can find confidence intervals and prediction intervals in multiple regression as well. We can find a 95% confidence interval for the mean nutritional rating of all such cereals (with characteristics similar to those of Shredded Wheat: 80 calories, 2 grams [...]

Filled Under: General

Jan 16

VERIFYING MODEL ASSUMPTIONS (2)

Having checked that the assumptions are not violated, we may proceed with the multiple regression analysis. Minitab provides us with the multiple regression output shown in Figure 4.10.
Let us examine these very interesting results carefully. The estimated regression equation is as follows:

Filled Under: General

Jan 15

VERIFYING MODEL ASSUMPTIONS

Before a model can be implemented, the requisite model assumptions must be verified. Using a model whose assumptions are not verified is like building a house whose foundation may be cracked. Making predictions using a model where the assumptions are violated may lead to erroneous and overoptimistic results, with costly consequences when deployed.

Filled Under: General

Jan 14

MULTIPLE REGRESSION

Suppose that a linear relationship exists between a predictor variable and a response variable but that we ignored the relationship and used only the univariate measures associated with the response variable (e.g., mean, median) to predict new cases. This would be a waste of information, and such univariate measures would on average be far less [...]
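The claim that ignoring a linear relationship wastes information can be checked numerically. The following is a minimal sketch for the one-predictor case, using made-up data and hypothetical helper names (fit_line, sse); it compares the sum of squared errors when every case is predicted by the mean of the response with the sum of squared errors from a least-squares line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sse(predictions, ys):
    """Sum of squared prediction errors."""
    return sum((y - p) ** 2 for p, y in zip(predictions, ys))

# Hypothetical data with a roughly linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Univariate approach: predict the mean of y for every case.
y_bar = sum(ys) / len(ys)
sse_mean = sse([y_bar] * len(ys), ys)

# Regression approach: predict from the fitted line.
intercept, slope = fit_line(xs, ys)
sse_line = sse([intercept + slope * x for x in xs], ys)

print("SSE using the mean only:", sse_mean)
print("SSE using the regression line:", sse_line)
```

When the relationship really is close to linear, the second figure is far smaller than the first, which is the waste of information the passage refers to.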

Filled Under: General

Jan 13

PREDICTION INTERVALS FOR A RANDOMLY CHOSEN VALUE OF y GIVEN x (2)

Minitab supplies us with the regression output shown in Figure 4.5 for predicting nutrition rating based on sugar content. We also asked Minitab to calculate the confidence interval for the mean of all nutrition ratings when the sugar content equals 1 gram. Let's examine this output for a moment.
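For reference, the two kinds of interval in simple linear regression have the following standard textbook form. The notation is an assumption, since the excerpt does not define its symbols: $x_p$ is the chosen predictor value (here, 1 gram of sugar), $\hat{y}_p$ the fitted value at $x_p$, $n$ the number of observations, $s = \sqrt{\mathrm{SSE}/(n-2)}$ the standard error of the estimate, and $t_{n-2}$ the critical value of the $t$ distribution with $n-2$ degrees of freedom.

```latex
% Confidence interval for the mean of y at x = x_p
\hat{y}_p \;\pm\; t_{n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

% Prediction interval for a randomly chosen value of y at x = x_p
\hat{y}_p \;\pm\; t_{n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The only difference is the extra 1 under the square root, which is why a prediction interval for a single value is always wider than the confidence interval for the mean at the same $x_p$.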

Filled Under: General