Students who want to enroll in this course should already have programming skills, preferably in object-oriented programming with a language such as C++ or Java.
Upon completion of the course, the student will be able to:
- Design and implement a data mining process for real-world applications.
- Use data warehouse tools and OLAP techniques to perform a given data mining analysis.
- Assess and visualize the results of data mining, and apply the extracted knowledge appropriately to the domain of interest.
- Communicate the application and use of data mining to experts and non-experts (a broad audience).
- Describe the main data mining algorithms, implement them and, where necessary, adapt the most suitable algorithms to the application at hand in order to improve their efficiency and performance.
In summary, the student will be able to work as a data mining analyst using current commercial data mining tools, and to develop data mining projects.
The contents for this course are the following:
1. Introduction to Data Mining and KDD
2. Data warehouse and OLAP technologies for data mining
3. Data preprocessing
4. Association analysis
5. Classification and regression
6. Clustering
7. Assessment, visualization and use of the knowledge extracted
8. Mining complex data (spatial, temporal, web mining)
9. Data Mining tools: Pentaho, Weka, DBMiner.
The course follows the Problem-Based Learning (PBL) methodology. This methodology promotes learning by defining a problem that each student solves together with a team of other students. Students do not attend traditional lecture classes. Lectures usually deliver knowledge before the student feels the need for it. With PBL, students build their own knowledge of the domain through the guided solution of a properly designed problem.
The benefits of this methodology are better learning and the development of the professional skills students need when they enter the workforce. While working on the problem, students have to use and improve several skills, such as teamwork, project management, and communication. In this methodology there is no clear separation between 'theory' and 'practice'; the two are interleaved continuously throughout the project. Students acquire the required knowledge incrementally, as they need it to solve the problem.
The specific implementation of this methodology in this course is detailed as follows:
- For each topic of the course, a problem will be defined. This applies to sections 2, 3, 4, and 5 of the contents listed above.
- Students will work in teams and will be required to provide a solution to that problem. The professor will act as a supervisor who helps and guides the project development. A list of resources (notes, books, and articles) will be available for each project. Some of these resources will be prepared by the professor, while others will be contributed by the students themselves, which encourages autonomous learning.
- There are no traditional lecture classes. Instead, seminars and meetings will be scheduled to guarantee that 1) each team develops its project properly and 2) students have no knowledge gaps.
- Team size will depend on the number of students in the class and the complexity of the projects.
- There might be different problems for a given topic of the course. Each team might work on a different problem and, at the end, share its findings and results with the remaining students of the course.
Grading is adapted to the PBL methodology as follows. Each student will be graded on the results of the project (team grade) and on their own contribution to the development of the project (individual grade). There is also an exam that evaluates the student's overall command of the course topics.
The project grade counts for 70% of the final grade and the exam grade counts for the remaining 30%. The resulting value must be greater than 5 to pass the course.
The projects must be delivered on the scheduled dates. Each delivery includes an oral presentation. This presentation is needed to share the knowledge with the remaining teams and to consolidate the knowledge and lessons learned during the project.
If a student is not able to deliver the project on time, there is an extraordinary call in June/July. However, students presenting their projects at that call will have their grades capped at a maximum of 6 out of 10. Students are encouraged to deliver their results on time so that everyone can get the most out of this experience.
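The grading rules above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the official grading procedure: the function names are hypothetical, grades are assumed to lie on a 0 to 10 scale (inferred from the "6 out of 10" cap), and the extraordinary-call cap is assumed to apply to the project grade before weighting, which the text does not state explicitly.

```python
PROJECT_WEIGHT = 0.7   # project grade counts for 70% of the final grade
EXAM_WEIGHT = 0.3      # exam grade counts for the remaining 30%
PASS_THRESHOLD = 5.0   # final grade must be greater than this value
LATE_CAP = 6.0         # maximum grade at the extraordinary call


def final_grade(project, exam, extraordinary_call=False):
    """Return the weighted final grade.

    Assumptions (not stated in the syllabus): grades are on a 0-10
    scale, and the extraordinary-call cap applies to the project grade.
    """
    for grade in (project, exam):
        if not 0 <= grade <= 10:
            raise ValueError("grades are assumed to be between 0 and 10")
    if extraordinary_call:
        # Projects presented at the June/July call are capped at 6.
        project = min(project, LATE_CAP)
    return PROJECT_WEIGHT * project + EXAM_WEIGHT * exam


def passes(grade):
    """The syllabus requires a final grade greater than 5."""
    return grade > PASS_THRESHOLD


if __name__ == "__main__":
    on_time = final_grade(project=8, exam=6)
    late = final_grade(project=8, exam=6, extraordinary_call=True)
    print(f"on time: {on_time:.2f}, passes: {passes(on_time)}")
    print(f"extraordinary call: {late:.2f}, passes: {passes(late)}")
```

If the cap is instead meant to apply to the final grade, the `min` call would move to the weighted result; the professor's interpretation takes precedence over this sketch.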
Please read the previous section.
[1] Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques, Second Edition", Morgan Kaufmann Publishers - Elsevier, 2006.
[2] J. Hernández Orallo, M.J. Ramírez, C. Ferri Ramírez, "Introducción a la Minería de Datos", Pearson - Prentice Hall, 2004.
[3] Ian H. Witten, Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques, Second Edition", Morgan Kaufmann Publishers - Elsevier, 2005.
[4] Dorian Pyle, "Data Preparation for Data Mining", Morgan Kaufmann Publishers, 1999.
Slides and articles will be available for each project in eStudy.