How should one go about choosing the value of k? In fact, there may not be an obvious best solution. Consider choosing a small value for k. Then it is possible that the classification or estimation may be unduly affected by outliers or unusual observations (noise). With small k (e.g., k = 1), the algorithm [...]
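The sensitivity of small k to unusual observations can be seen in a minimal sketch. The toy data, the `knn_predict` helper, and the use of Euclidean distance are all assumptions made for illustration; they do not come from the original text.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: a single "B" record sits inside the "A" cluster.
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((0.9, 1.3), "A"),
    ((1.1, 1.1), "B"),  # the unusual observation (noise)
    ((4.0, 4.0), "B"),
    ((4.2, 3.9), "B"),
]
query = (1.1, 1.05)

print(knn_predict(train, query, k=1))  # B: decided entirely by the noisy record
print(knn_predict(train, query, k=3))  # A: the surrounding cluster outvotes it
```

With k = 1 the single nearest record decides the outcome, so one noisy point is enough to flip the classification; with k = 3 the vote is smoothed over the neighborhood.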
Jan 30
DATABASE CONSIDERATIONS
For instance-based learning methods such as the k-nearest neighbor algorithm, it is vitally important to have access to a rich database full of as many different combinations of attribute values as possible. It is especially important that rare classifications be represented sufficiently, so that the algorithm does not only predict common classifications. Therefore, the data [...]
Jan 29
QUANTIFYING ATTRIBUTE RELEVANCE: STRETCHING THE AXES
Consider that not all attributes may be relevant to the classification. In decision trees (Chapter 6), for example, only those attributes that are helpful to the classification are considered. In the k-nearest neighbor algorithm, the distances are by default calculated on all the attributes. It is possible, therefore, for relevant records that are proximate to [...]
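One way to read "stretching the axes" is as a per-attribute multiplier applied inside the distance calculation. The sketch below assumes Euclidean distance and a hypothetical `stretch` vector of multipliers; neither the function name nor the example values come from the original text.

```python
import math

def stretched_distance(x, y, stretch):
    """Euclidean distance with each attribute difference scaled by a multiplier.

    A multiplier above 1 makes that attribute count for more, a multiplier
    between 0 and 1 makes it count for less, and 0 ignores it entirely.
    """
    if not (len(x) == len(y) == len(stretch)):
        raise ValueError("x, y, and stretch must have the same length")
    return math.sqrt(sum((z * (a - b)) ** 2 for a, b, z in zip(x, y, stretch)))

# Hypothetical normalized records with two attributes.
record = (0.5, 0.2)
other = (0.1, 0.8)

print(round(stretched_distance(record, other, (1, 1)), 4))  # 0.7211, unstretched
print(round(stretched_distance(record, other, (3, 1)), 4))  # 1.3416, first axis stretched
```

Stretching the first axis by a factor of 3 makes differences in that attribute dominate the distance, so records that differ mainly on it are pushed farther apart.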
One may feel that neighbors that are closer or more similar to the new record should be weighted more heavily than more distant neighbors. For example, in Figure 5.5, does it seem fair that the light gray record farther away gets the same vote as the dark gray record that is closer to the new [...]
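A minimal sketch of distance-weighted voting follows. It assumes that each neighbor's vote is weighted by the inverse square of its distance to the new record, which is one common choice and not necessarily the exact scheme intended here. The distances are hypothetical and are not read from Figure 5.5.

```python
from collections import defaultdict

def weighted_vote(neighbors):
    """Pick the class with the largest total inverse-square-distance weight.

    `neighbors` is a list of (distance, label) pairs. Distances are assumed
    to be strictly positive; an exact match (distance 0) needs separate handling.
    """
    totals = defaultdict(float)
    for distance, label in neighbors:
        if distance <= 0:
            raise ValueError("distances must be positive")
        totals[label] += 1.0 / distance ** 2
    return max(totals, key=totals.get)

# Hypothetical distances: one close dark gray record, two distant light gray ones.
neighbors = [(0.5, "dark gray"), (2.0, "light gray"), (2.5, "light gray")]

print(weighted_vote(neighbors))  # dark gray (weight 4.0 versus 0.25 + 0.16)
```

Under simple unweighted voting the two light gray records would win 2 to 1; with inverse-square weights the single close record outweighs them.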
Now that we have a method of determining which records are most similar to the new, unclassified record, we need to establish how these similar records will combine to provide a classification decision for the new record. That is, we need a combination function. The most basic combination function is simple unweighted voting.
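Simple unweighted voting can be sketched in a few lines. The function name and the neighbor labels below are hypothetical, and the way ties are broken is an implementation detail of this sketch, not something the text specifies.

```python
from collections import Counter

def unweighted_vote(labels):
    """Return the winning class and its share of the votes.

    Each of the k nearest neighbors gets exactly one vote. Ties are broken
    in favor of the class encountered first (an arbitrary choice).
    """
    counts = Counter(labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)

# Hypothetical classes of the k = 3 nearest neighbors.
labels = ["dark gray", "medium gray", "medium gray"]

winner, share = unweighted_vote(labels)
print(winner)  # medium gray
```

The returned share (here 2 out of 3 votes) can serve as a rough measure of confidence in the classification.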
Jan 27
Solution of Merchant Termination
Have you ever faced a merchant termination problem? How did it make you feel? Most of you probably felt disappointed or even stressed. This can fairly be called the worst situation in a merchant's life, and it tends to cause many further problems. However, you can still solve [...]
For example, let's find an answer to our earlier question: Which patient is more similar to a 50-year-old male: a 20-year-old male or a 50-year-old female? Suppose that for the age variable, the range is 50, the minimum is 10, the mean is 45, and the standard deviation is 15. Let patient A be our [...]
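The age statistics given above are enough to normalize the two ages in the question. The sketch below assumes the two usual schemes, min-max normalization and Z-score standardization; the function and constant names are hypothetical, and only the four age statistics come from the text.

```python
def min_max(x, minimum, value_range):
    """Min-max normalization: rescale x to the interval [0, 1]."""
    return (x - minimum) / value_range

def z_score(x, mean, std):
    """Z-score standardization: distance from the mean in standard deviations."""
    return (x - mean) / std

# Statistics for the age variable, as stated in the text.
AGE_MIN, AGE_RANGE, AGE_MEAN, AGE_STD = 10, 50, 45, 15

for age in (50, 20):
    mm = min_max(age, AGE_MIN, AGE_RANGE)
    z = z_score(age, AGE_MEAN, AGE_STD)
    print(age, round(mm, 4), round(z, 4))
# 50 0.8 0.3333
# 20 0.2 -1.6667
```

The 30-year age gap becomes a difference of 0.6 under min-max normalization but 2.0 under Z-score standardization, so the choice of normalization changes how much age contributes to the distance relative to other attributes.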
We have seen above how, for a new record, the k-nearest neighbor algorithm assigns the classification of the most similar record or records. But just how do we define similar? For example, suppose that we have a new patient who is a 50-year-old male. Which patient is more similar, a 20-year-old male or a 50-year-old [...]
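Similarity is usually made concrete through a distance function. The sketch below assumes Euclidean distance, a common choice, over records that have already been converted to numeric attribute values; the function name and example points are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two records with numeric attributes."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```

Because every attribute contributes on its own scale, attributes with large ranges (such as age in years) can swamp the others unless the values are normalized first.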
Jan 24
k-NEAREST NEIGHBOR ALGORITHM (2)
However, suppose that we now let k = 2 for our k-nearest neighbor algorithm, so that new patient 2 would be classified according to the classification of the k = 2 points closest to it. One of these points is dark gray, and one is medium gray, so that our classifier would be faced with [...]
Jan 23
k-NEAREST NEIGHBOR ALGORITHM
The first algorithm we shall investigate is the k-nearest neighbor algorithm, which is most often used for classification, although it can also be used for estimation and prediction. k-Nearest neighbor is an example of instance-based learning, in which the training data set is stored, so that a classification for a new unclassified record may be [...]
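The instance-based nature of the algorithm can be sketched as a small class in which "training" only stores the records and all the work happens at classification time. The class name, the use of Euclidean distance, the majority vote, and the toy records are assumptions for illustration.

```python
import math
from collections import Counter

class KNearestNeighbors:
    """A minimal k-nearest neighbor classifier (illustrative sketch only)."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.records = []

    def fit(self, points, labels):
        # Instance-based learning: no model is built, the data is just stored.
        self.records = list(zip(points, labels))
        return self

    def predict(self, query):
        # Compare the new record with every stored record, keep the k closest,
        # and return the most common class among them.
        nearest = sorted(
            self.records, key=lambda rec: math.dist(rec[0], query)
        )[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training records with two numeric attributes.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["light", "light", "light", "dark", "dark", "dark"]

model = KNearestNeighbors(k=3).fit(points, labels)
print(model.predict((0.5, 0.5)))  # light
print(model.predict((5.5, 5.5)))  # dark
```

Sorting all stored records for each query is the simplest approach and costs more as the training set grows; it is used here only to keep the sketch short.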


