Methods of collection, classification and prediction. Decision trees.



Classification. Classification can be used to determine the type of a customer, product, or object by describing a number of attributes that identify a specific class. For example, cars are easily classified by type (sedan, SUV, convertible) by defining various attributes (number of seats, body style, drive wheels). When examining a new car, you can assign it to a class by comparing its attributes with the known class definitions. The same principles can be applied to customers, for example, classifying them by age and social group.
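Comparing a new item's attributes with known class definitions can be sketched as choosing the class whose definition agrees with the item on the most attributes. The class definitions below (seat counts, body styles, drive wheels) are hypothetical, and the "most matches wins" rule is an illustrative assumption, not a method stated in the text.

```python
# Hypothetical class definitions: attribute -> typical value for each car type.
CAR_CLASSES = {
    "sedan": {"seats": 5, "body_style": "closed", "drive_wheels": "front"},
    "SUV": {"seats": 7, "body_style": "closed", "drive_wheels": "all"},
    "convertible": {"seats": 2, "body_style": "open", "drive_wheels": "rear"},
}


def classify_by_definition(item, definitions):
    """Return the class whose definition agrees with `item` on the most attributes.

    Ties go to the class listed first in `definitions`."""
    def agreement(cls):
        # Count attributes whose value in `item` equals the class definition.
        return sum(item.get(attr) == value for attr, value in definitions[cls].items())
    return max(definitions, key=agreement)


new_car = {"seats": 7, "body_style": "closed", "drive_wheels": "all"}
print(classify_by_definition(new_car, CAR_CLASSES))  # SUV
```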

Prediction. Prediction is a broad topic that ranges from predicting hardware component failures to detecting fraud and even forecasting a company's profits. Combined with other data mining methods, prediction involves trend analysis, classification, pattern matching, and relationship modeling. By analyzing past events or items, you can predict future ones.
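One simple form of trend analysis is to fit a straight line to past values by least squares and extrapolate it forward. A minimal sketch, assuming equally spaced observations; the profit figures are invented for illustration.

```python
def linear_trend(values):
    """Fit y = intercept + slope * x by ordinary least squares, with x = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line `steps_ahead` periods past the last observation."""
    intercept, slope = linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


quarterly_profit = [10, 12, 14, 16]  # hypothetical past values
print(forecast(quarterly_profit))    # 18.0
```

A straight line is only the simplest trend model; real forecasting would also check how well the line fits before trusting the extrapolation.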

For example, using credit card authorization data, you can combine decision tree analysis of a person's past transactions with classification and comparison against historical patterns to identify fraudulent transactions. If a purchase of tickets to the US coincides with transactions made in the US, these transactions are likely genuine.
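The ticket example can be written as a single hand-made rule. This is a deliberately simplified sketch: it replaces the decision tree and pattern-matching analysis described above with one fixed check, and the `Transaction` fields and labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transaction:
    kind: str                          # "ticket" or "purchase" (hypothetical labels)
    country: str                       # country where the card was used
    destination: Optional[str] = None  # for tickets: the travel destination


def flag_unexplained_foreign(transactions: List[Transaction], home: str) -> List[Transaction]:
    """Flag foreign purchases that no earlier ticket purchase explains.

    Transactions are assumed to be in chronological order."""
    ticketed = set()
    flagged = []
    for t in transactions:
        if t.kind == "ticket" and t.destination:
            ticketed.add(t.destination)  # remember where the cardholder plans to travel
        elif t.country != home and t.country not in ticketed:
            flagged.append(t)            # foreign use with no matching trip
    return flagged


history = [
    Transaction("ticket", "FR", destination="US"),  # ticket to the US bought at home
    Transaction("purchase", "US"),                  # consistent with the ticket
    Transaction("purchase", "BR"),                  # no matching travel: suspicious
]
print([t.country for t in flag_unexplained_foreign(history, home="FR")])  # ['BR']
```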

Decision trees

DT algorithms are most useful in classification problems. With this technique, a tree is constructed to model the classification process. Once the tree is built, it is applied to each tuple in the database and yields a classification for that tuple.

Definition: A Decision Tree (DT), or Classification Tree, is a tree associated with a database $D = \{t_1, \ldots, t_n\}$ whose schema contains the attributes $\{A_1, \ldots, A_h\}$, given a set of classes $C = \{C_1, \ldots, C_m\}$. The tree has the following properties:

each internal node (vertex) is labeled with an attribute $A_i$;

each arc (edge) is labeled with a predicate that can be applied to the attribute associated with the parent node;

each leaf node is labeled with a class $C_j$.
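The three properties in the definition map directly onto a small data structure. The sketch below is one assumed way to represent a DT in Python: `Node` is an internal node labeled with an attribute, its arcs are (predicate, child) pairs, and `Leaf` carries a class label. The car attributes and thresholds in the example tree are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Leaf:
    label: str  # the class C_j assigned to tuples that reach this leaf


@dataclass
class Node:
    attribute: str  # the attribute A_i tested at this internal node
    # Each arc is a (predicate, child) pair; the predicate is applied
    # to the value of `attribute` in the tuple being classified.
    arcs: list[tuple[Callable[[Any], bool], Leaf | Node]] = field(default_factory=list)


def classify_with_tree(tree: Leaf | Node, record: dict[str, Any]) -> str:
    """Follow the first arc whose predicate holds until a leaf is reached."""
    while isinstance(tree, Node):
        value = record[tree.attribute]
        for predicate, child in tree.arcs:
            if predicate(value):
                tree = child
                break
        else:
            raise ValueError(f"no arc of {tree.attribute!r} accepts {value!r}")
    return tree.label


# Hypothetical tree: body style first, then number of seats.
car_tree = Node("body_style", [
    (lambda v: v == "open", Leaf("convertible")),
    (lambda v: v != "open", Node("seats", [
        (lambda s: s > 5, Leaf("SUV")),
        (lambda s: s <= 5, Leaf("sedan")),
    ])),
])

print(classify_with_tree(car_tree, {"body_style": "closed", "seats": 7}))  # SUV
```

For the tree to classify every tuple, the predicates on the arcs leaving a node should be mutually exclusive and cover all possible values of its attribute.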

Solving the classification problem with a DT is a two-step process:

1. Decision tree induction: constructing the DT from training data.

2. Applying the DT to each $t_i \in D$ to determine its class.
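The text does not say which induction algorithm is used in step 1. One common choice is ID3, which greedily picks the attribute with the highest information gain at each node; the sketch below uses it as an illustrative stand-in, for categorical attributes only. The nested-dict tree representation and the training tuples are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    n = len(rows)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def induce(rows, labels, attributes):
    """Step 1 (induction): build a tree from training tuples, ID3-style."""
    if len(set(labels)) == 1:
        return labels[0]  # all tuples share one class: make a leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # no tests left: majority leaf
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = induce([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return {"attribute": best, "branches": branches}


def apply_tree(tree, row):
    """Step 2 (application): follow arcs until a leaf label is reached.

    Raises KeyError if the row has an attribute value never seen in training."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Invented training data for illustration.
rows = [
    {"body": "open", "drive": "rear"},
    {"body": "open", "drive": "front"},
    {"body": "closed", "drive": "front"},
    {"body": "closed", "drive": "all"},
    {"body": "closed", "drive": "front"},
    {"body": "closed", "drive": "all"},
]
labels = ["convertible", "convertible", "sedan", "SUV", "sedan", "SUV"]

tree = induce(rows, labels, ["body", "drive"])
print(tree["attribute"])                                       # drive
print(apply_tree(tree, {"body": "closed", "drive": "front"}))  # sedan
```

Here "drive" is chosen at the root because splitting on it leaves less uncertainty (about 1.13 bits of gain) than splitting on "body" (about 0.92 bits).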

Let's consider a simple DT to illustrate this definition. There is a well-known rule for assessing the body weight of an adult (aged over 20-25):

First, let's define the parameter $DW = 100 \cdot \left( \frac{W}{H - 100} - 1 \right)$, %, where $W$ is the person's weight in kg and $H$ is the person's height in cm. The numerical values of these attributes are used as dimensionless numbers to calculate DW. The following approximate information has been obtained from statistical data of medical observations.
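The DW formula is easy to check numerically. A minimal sketch; the function name and the rejection of heights of 100 cm or less (where H - 100 would be zero or negative) are assumptions added for illustration.

```python
def weight_deviation(weight_kg: float, height_cm: float) -> float:
    """DW = 100 * (W / (H - 100) - 1), in percent.

    W and H enter as plain numbers (kg and cm); H - 100 acts as the
    reference weight, so DW is the percentage deviation from it."""
    if height_cm <= 100:
        raise ValueError("the formula needs a height above 100 cm")
    return 100 * (weight_kg / (height_cm - 100) - 1)


print(weight_deviation(80, 180))            # 0.0: weight equals H - 100
print(round(weight_deviation(88, 180), 1))  # 10.0: 10 % above H - 100
```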

Questions

1. What is data mining? What is the significance of DM applications in modern fields of science and business that deal with large databases (Big Data)?

2. Give the main differences between a Data Warehouse and a conventional Database System.

3. Define and explain the typical DM tasks.

4. Describe several commonly used DM techniques.

5. Describe the 5A process model.

6. Describe the CRISP-DM process model.

7. Describe some DM applications for customer services.

8. Describe DM applications in computer security analysis and management.

9. How are classifications used and applied?


Lecture №12. Data management.


Purpose: to introduce the general concepts of correlation and regression, and to become acquainted with descriptive statistics.


Plan:
1. Processing of large volumes of data. Methods and stages of Data Mining.

2. Data Mining tasks. Data visualization.

 

1. Processing of large volumes of data. Methods and stages of Data Mining
All Data Mining methods are divided into two large groups according to how they work with the original training data. The top level of this classification is determined by whether the data are retained after Data Mining or distilled for later use.

1. Direct use of the data, or data retention.
In this case, the original data are stored in explicit, detailed form and are used directly at the predictive modeling and/or exception analysis stages. The problem with this group of methods is that applying them to very large databases can be difficult.

Methods of this group: cluster analysis, the nearest neighbor method, the k-nearest neighbor method, and reasoning by analogy.
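The k-nearest neighbor method shows why this group keeps the data: there is no separate model, and every new tuple is classified by scanning the stored training tuples. A minimal sketch, assuming numeric feature vectors, Euclidean distance, and majority voting (common choices, not specified in the text); the data points are invented.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest stored tuples.

    `training` is a list of (feature_vector, label) pairs; the whole list is
    kept and scanned for every query, which is what makes very large
    databases costly for this group of methods."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(training, (2, 2)))  # A
print(knn_classify(training, (7, 8)))  # B
```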

2. Identification and use of formal patterns, or pattern distillation.

With pattern distillation, a sample (pattern) of information is extracted from the raw data and converted into a formal structure whose form depends on the Data Mining method used. This process is performed at the free-search stage; the first group of methods, in principle, has no such stage. The predictive modeling and exception analysis stages use the results of the free-search stage, which are much more compact than the databases themselves. Recall that the structures of these models may be interpretable by the analyst.


Date added: 2018-04-15.
