Discussion on: Identify and describe some approaches to Data Mining and Analytics


Data mining is the extraction of hidden predictive information from large databases. It is a powerful new technology with great potential to help banks and financial institutions focus on the most important information in their data warehouses (Pei, Han, & Lakshmanan, 2001). Data mining tools predict future trends and behaviors, allowing banks and financial institutions to make proactive, knowledge-driven decisions. They can also answer business questions that were traditionally too time-consuming to resolve, and they search the data for hidden patterns. Most corporate houses, business houses, commercial banks, and financial institutions already collect and refine large quantities of data, which gives them the foundation for implementing a data mining solution. Today these techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and they can be integrated with new products and systems as they are brought online (Hong & Mozetic, 2001).

Data mining is supported by three technologies:

Massive data collection
Fast, high-storage, powerful multiprocessor computers
Powerful data mining algorithms

The size of a bank or financial institution’s databases generally depends on the kinds of activities it carries out. However, a typical bank or financial institution engaged in retail activities may have databases in the petabyte range. The accompanying need for improved computational engines can now be met cost-effectively with parallel multiprocessor computer technology. The last component is data mining algorithms, which cull useful information from the large mass of raw data residing in the databases (Hecht-Nielsen, 2001).
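
To make the idea of parallel multiprocessor computing concrete, here is a minimal sketch of the common partition, map, and reduce pattern: the records are split into chunks, each chunk is counted on a separate worker process, and the partial results are merged. The transaction records and their "type" field are hypothetical, and a real bank would use a distributed engine rather than one machine's process pool.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def split_into_chunks(records, n_chunks):
    # Partition the records into contiguous, roughly equal chunks.
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def count_chunk(chunk):
    # Map step: count transaction types within one chunk (hypothetical "type" field).
    return Counter(record["type"] for record in chunk)

def merge_counts(partial_counts):
    # Reduce step: combine the per-chunk counts into one total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

def parallel_type_counts(records, workers=4):
    # Run the map step in separate processes (so it can use several CPU cores), then reduce.
    chunks = split_into_chunks(records, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return merge_counts(pool.map(count_chunk, chunks))
```

Call parallel_type_counts from inside an `if __name__ == "__main__":` guard, because on platforms where multiprocessing spawns its workers, each worker re-imports the main module.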

The most commonly used approaches and techniques in data mining (Hosking & Pednault, 2007) are:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure. The network learns patterns from the data automatically during training, as in machine learning and deep learning, and then analyzes new data in the same way.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for classifying a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID). Other algorithms, such as heuristic search algorithms like A* (A-star), can also be used in data mining.
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
Rule induction: The extraction of useful if-then rules from data based on statistical significance. It is a rule-based approach that analyzes the data step by step, one rule at a time.
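
A full neural network is too large to show here, but its basic unit is not. The sketch below trains a single artificial neuron (a perceptron) with the classic perceptron learning rule on a hypothetical toy task, the logical AND of two inputs. Note that one neuron on its own is a linear model; the non-linear networks described above stack many such units with non-linear activation functions.

```python
def step(z):
    # Threshold activation: the neuron "fires" (1) when its weighted input reaches zero.
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    # Weighted sum of the inputs plus a bias, passed through the activation.
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    # Perceptron learning rule: after each mistake, nudge the weights toward the correct answer.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias
```

Training on the four input pairs (0, 0), (0, 1), (1, 0), (1, 1) with labels 0, 0, 0, 1 yields weights that reproduce all four labels.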
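
The core step of a CART-style decision tree is choosing the split that best separates the classes. The minimal sketch below scores every candidate threshold with Gini impurity, the criterion CART uses for classification, and keeps the lowest weighted impurity. The loan data (income in thousands, years as a customer) is hypothetical.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions (0.0 means pure).
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    # Try "feature <= threshold" for every feature and observed value;
    # return (feature_index, threshold, weighted_gini) of the best binary split.
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send at least one row each way
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best

# Hypothetical applicants: (income in thousands, years as a customer).
rows = [(20, 2), (25, 8), (40, 1), (60, 5), (80, 3), (90, 7)]
labels = ["default", "default", "default", "repay", "repay", "repay"]
```

On this toy data, best_split(rows, labels) picks feature 0 (income) at threshold 40 with impurity 0.0, which reads as the rule "if income <= 40 then default". A full CART learner applies the same search recursively to each side until the leaves are pure or too small; CHAID instead chooses splits with chi-square tests.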
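
The genetic algorithm description maps directly onto a few small functions. The sketch below evolves bit strings toward a hypothetical toy objective (maximize the number of 1 bits, the classic "OneMax" problem) using tournament selection, single-point crossover, and bit-flip mutation; all parameter values are illustrative assumptions, not tuned settings.

```python
import random

def fitness(bits):
    # Hypothetical toy objective ("OneMax"): the number of 1 bits.
    return sum(bits)

def crossover(a, b, rng):
    # Genetic combination: splice a prefix of one parent onto the suffix of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    # Mutation: flip each bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in bits]

def tournament(pop, rng, k=3):
    # Natural selection (tournament style): the fittest of k random individuals wins.
    return max(rng.sample(pop, k), key=fitness)

def evolve(n_bits=20, pop_size=30, generations=40, rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation, plus the final best
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = []
        for _ in range(pop_size - 1):
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rate, rng))
        pop = [elite] + children  # elitism: the best individual always survives
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Keeping the elite individual each generation guarantees that the best fitness never decreases, which makes the search easy to monitor through history.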
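
Rule induction can be sketched as generating candidate if-then rules and keeping those the data supports. The minimal example below scores single-item rules ("if a customer holds A, then they hold B") with support and confidence, the usual association-rule measures, as a simplified stand-in for formal significance testing. The product holdings and thresholds are hypothetical.

```python
from itertools import permutations

def induce_rules(transactions, min_support=0.3, min_confidence=0.7):
    # Return (if_item, then_item, support, confidence) for rules passing both thresholds.
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        has_a = sum(1 for t in transactions if a in t)
        has_both = sum(1 for t in transactions if a in t and b in t)
        if has_a == 0:
            continue
        support = has_both / n  # share of all customers matching the whole rule
        confidence = has_both / has_a  # share of "if" customers who also match "then"
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

# Hypothetical products held by five bank customers.
customers = [
    {"savings", "credit_card"},
    {"savings", "credit_card", "loan"},
    {"savings", "loan"},
    {"credit_card"},
    {"savings", "credit_card"},
]
```

With these thresholds, induce_rules(customers) keeps three rules: credit_card then savings, loan then savings, and savings then credit_card.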

Generally, these techniques have been used for more than a decade in specialized analysis tools that work with relatively large volumes of data. These capabilities are now evolving to integrate directly with industry-standard data warehouse and OLAP platforms. When data volumes are as large as those on the servers of Facebook, Google, Amazon, and Alibaba, organizations rely on OLAP (online analytical processing) servers for data warehousing and data mining.
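
OLAP servers answer analytical questions by aggregating a fact table along dimensions: rolling up to coarser totals, drilling down to finer ones, and slicing on a fixed value. The toy in-memory sketch below shows those operations on a hypothetical fact table; production OLAP platforms store and often precompute such aggregates at far larger scale.

```python
from collections import defaultdict

DIMENSIONS = ("region", "product", "quarter")

# Hypothetical fact table: (region, product, quarter, revenue).
FACTS = [
    ("North", "loans", "Q1", 120),
    ("North", "cards", "Q1", 80),
    ("North", "loans", "Q2", 100),
    ("South", "loans", "Q1", 90),
    ("South", "cards", "Q2", 60),
]

def aggregate(facts, group_by):
    # Sum the measure (last column) for every combination of the chosen dimensions.
    positions = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[p] for p in positions)] += row[-1]
    return dict(totals)

def slice_cube(facts, dimension, value):
    # Slice: keep only the facts where one dimension is fixed to a single value.
    p = DIMENSIONS.index(dimension)
    return [row for row in facts if row[p] == value]
```

For example, aggregate(FACTS, ["region"]) rolls up to per-region totals, aggregate(FACTS, ["region", "quarter"]) drills down by quarter, and aggregate(slice_cube(FACTS, "quarter", "Q1"), ["product"]) slices to the first quarter before grouping by product.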

References
Hecht-Nielsen, R. (2001). Neurocomputing and data mining. MA: Addison-Wesley.

Hong, J., & Mozetic, I. (2001). Incremental learning of attribute-based descriptions from examples, the method and user’s guide (Report ISG 85-5, UIUCDCS-F). Department of Computer Science, University of Illinois.

Hosking, J., & Pednault, E. (2007). A statistical perspective on data mining. Future Generation Computer Systems.

Pei, J., Han, J., & Lakshmanan, S. (2001). Mining frequent itemsets with convertible constraints. In Proc. Int. Conf. Data Engineering (ICDE’01).