An Introduction to Data Mining
Discovering hidden value in your data warehouse

Overview

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"

This white paper provides an introduction to the basic technologies of data mining. Examples of profitable applications illustrate its relevance to today's business environment, and a basic description shows how data warehouse architectures can evolve to deliver the value of data mining to end users.

The Foundations of Data Mining

Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

- Massive data collection
- Powerful multiprocessor computers
- Data mining algorithms
Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by second quarter of 1996.[1] In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.

In the evolution from business data to business information, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user's point of view, the four steps listed in Table 1 were revolutionary because they allowed new business questions to be answered accurately and quickly.

| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
| --- | --- | --- | --- | --- |
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (Emerging Today) | "What's likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery |

Table 1. Steps in the Evolution of Data Mining.

The core components of data mining technology have been under development for decades, in research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, makes these technologies practical for current data warehouse environments.

The Scope of Data Mining

Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

- Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered quickly and directly from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
- Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.

Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions. Databases can be larger in both depth and
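
The pattern-discovery example given earlier in this section (finding seemingly unrelated products that are often purchased together) can be illustrated with a small sketch. This is not the method of any particular data mining product; it is a minimal pair-counting approach, assuming each transaction is available as a list of product names. The basket contents, the function names `pair_support` and `frequent_pairs`, and the `min_support` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scanner data: each inner list is one shopping basket.
baskets = [
    ["bread", "milk", "diapers"],
    ["bread", "beer", "diapers"],
    ["milk", "beer", "diapers"],
    ["bread", "milk"],
    ["beer", "diapers"],
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered product pair."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so that (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def frequent_pairs(baskets, min_support=0.5):
    """List pairs whose support meets the threshold, most frequent first."""
    support = pair_support(baskets)
    kept = [(pair, s) for pair, s in support.items() if s >= min_support]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for pair, support in frequent_pairs(baskets):
        print(pair, support)
```

Counting every pair is practical only for small product catalogs. On gigabytes of scanner data, a real tool would likely prune candidate pairs and spread the work across processors, which is where the parallel hardware discussed above becomes relevant.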