Artificial Intelligence and Data Mining Lecture Slides: 2.datawarehou.ppt (74 pages)
Data Warehouse

Why a Data Warehouse?
- The most common issue companies face when looking at data mining is that the information is not in one place.
- The biggest challenge business analysts face in using data mining is how to extract, integrate, cleanse, and prepare data to solve their most pressing business problems.

What Is a Data Warehouse?
- The idea of a data warehouse is to put a wide range of operational data from internal and external sources into one place so it can be better utilized by executives, line-of-business managers, and other business analysts.
- Once the information is gathered, OLAP (online analytical processing) software comes into play by providing the desktop analysis tools for querying, manipulating, and reporting the data from the data warehouse.

Data Warehouse Environment
- the source systems from which data is extracted
- the tools used to extract data for loading the data warehouse
- the data warehouse database itself, where the data is stored
- the desktop query and reporting tools used for decision support

Data Warehousing Process Overview

Operational vs. Multidimensional View of Sales

Creating a Data Warehouse

The Data Warehouse
- The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
- Integrated: The Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization.
- Subject-Oriented: The Data Warehouse data is arranged and optimized to provide answers to questions coming from diverse functional areas within a company.
- Time-Variant: The warehouse data represent the flow of data through time. They can even contain projected data.
- Non-Volatile: Once data enter the Data Warehouse, they are never removed. The Data Warehouse is always growing.

Operational Database vs. Data Warehouse
| Operational DB | Data Warehouse |
|---|---|
| Similar data can have different representations or meanings | Unified view of all data elements |
| Functional or process orientation | Subject orientation for decision support |
| Current transactions | Historical information with a time dimension |
| Frequent updating | Data are added without change |

Data Mart
- A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people.
- Data marts can serve as a test vehicle for companies exploring the potential benefits of data warehouses.
- Data marts address local or departmental problems, while a data warehouse involves a company-wide effort to support decision making at all levels in the organization.

Enterprise Data Warehouse (EDW)
- A large-scale data warehouse that is used across the enterprise for decision support.
- EDWs are used to provide data for many types of decision support systems (DSS), including CRM, SCM, BPM, BAM, PLM, and KMS:
  - CRM: Customer relationship management
  - SCM: Supply chain management
  - BPM: Business performance management
  - BAM: Business activity monitoring
  - PLM: Product lifecycle management
  - KMS: Knowledge management systems

Metadata
- Metadata is data about data.
- In a data warehouse, metadata describe the contents of the data warehouse and the manner of its use.
- Good metadata is essential to the effective operation of a data warehouse; it is used in data acquisition/collection, data transformation, and data access.

The Need for Technical Metadata
- The use of data warehousing and decision processing often
  involves a wide range of different products, and creating and maintaining the metadata for these products is time-consuming and error-prone.
- Automating the metadata management process and enabling the sharing of this so-called technical metadata between products can reduce both costs and errors.

The Need for Business Metadata
- Business users need a good understanding of what information exists in a data warehouse. They need to understand what the information means from a business viewpoint, how it was derived, which source systems it comes from, when it was created, what pre-built reports and analyses exist for manipulating it, and so forth.

Metadata in a Data Warehouse
- Kimball lists the following types of metadata in a data warehouse:
  - Source system metadata
  - Data staging metadata
  - DBMS metadata
- Source: Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0-471-25547-5.

Source system metadata
- source specifications, such as repositories and source logical schemas
- source descriptive information, such as ownership descriptions, update frequencies, and access methods
- process information, such as job schedules and extraction code

Data staging metadata
- data acquisition information, such as data transmission scheduling and results, and file usage
- dimension table management, such as definitions of dimensions and surrogate key assignments
- transformation and aggregation, such as data enhancement and mapping, DBMS load scripts, and aggregate definitions
- audit, job logs, and documentation, such as data lineage records and data transform logs

Star Schema
- The star schema is a data modeling technique used to map multidimensional decision support data into a relational database.
- Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.

Star Schema
- Four components:
  - Facts
  - Dimensions
  - Attributes
  - Attribute hierarchies

[Figure 13.14: A Three-Dimensional View of Sales]
[Figure 13.17: Attribute Hierarchies in Multidimensional Analysis]

Facts
- Numeric measurements that represent a specific business aspect or activity
- Normally stored in a fact table at the center of the star schema
- The fact table contains facts linked through their dimensions
- Metrics are facts computed at run time

Dimensions
- Qualifying characteristics that provide additional perspectives on a given fact
- Decision support data are almost always viewed in relation to other data
- Facts are studied via dimensions
- Dimensions are stored in dimension tables

Attributes
- Dimensions provide descriptions of facts through their attributes
- There is no mathematical limit to the number of dimensions
- Used to search, filter, and classify facts
- Slice and dice: focus on slices of the data cube for more detailed analysis

Attribute Hierarchies
- Provide a top-down data organization
- Two purposes:
  - Aggregation
  - Drill-down/roll-up data analysis
- Determine how the data are extracted and represented
- Stored in the DBMS's data dictionary
- Used by OLAP tools to access the warehouse properly

Star Schema
- A star schema consists of fact tables and dimension tables.
- Fact tables contain the quantitative or factual data about a business: the information being queried. This information often consists of numerical, additive measurements, and a fact table can have many columns and millions or billions of rows.
- Dimension tables are usually smaller and hold descriptive data that reflect the dimensions, or attributes, of a business.

[Figure 13.17: Star Schema for Sales]

Star Schema Representation
- Facts and dimensions are normally represented by physical tables in the data warehouse database.
- The fact table is related to each dimension table in a many-to-one (M:1) relationship.
- Fact and dimension tables are related by foreign keys and are subject to primary/foreign key constraints.

[Figure 13.18: Orders Star Schema]

Star Schema
- Performance-improving techniques:
  - Normalization of dimensional tables
  - Multiple fact tables representing diff…
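The star schema described in these slides can be sketched with Python's built-in sqlite3 module. The sketch has a central fact table related M:1 to each dimension table through foreign keys and is queried along an attribute hierarchy. It is a minimal illustration under stated assumptions, not the design behind the textbook figures. The table and column names (sales_fact, time_dim, product_dim, location_dim) and every sample row are hypothetical. It shows three things:
- the foreign-key constraint rejecting a fact row with no matching dimension row,
- a roll-up along a year, quarter, month time hierarchy,
- a slice that fixes one product category.

```python
import sqlite3

# Hypothetical star schema for illustration only; table and column names
# are assumptions, not taken from the slides' figures.
TIME_LEVELS = ("year", "quarter", "month")  # attribute hierarchy, coarse to fine


def build_sales_star(conn: sqlite3.Connection) -> None:
    """Create one fact table surrounded by three dimension tables."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        -- Dimension tables: smaller, descriptive attributes.
        CREATE TABLE time_dim (
            time_id INTEGER PRIMARY KEY,
            year INTEGER NOT NULL, quarter INTEGER NOT NULL, month INTEGER NOT NULL
        );
        CREATE TABLE product_dim (
            product_id INTEGER PRIMARY KEY,
            category TEXT NOT NULL, name TEXT NOT NULL
        );
        CREATE TABLE location_dim (
            location_id INTEGER PRIMARY KEY,
            region TEXT NOT NULL, city TEXT NOT NULL
        );
        -- Fact table: additive measures plus one foreign key per dimension.
        -- Many fact rows may reference the same dimension row (M:1).
        CREATE TABLE sales_fact (
            time_id     INTEGER NOT NULL REFERENCES time_dim(time_id),
            product_id  INTEGER NOT NULL REFERENCES product_dim(product_id),
            location_id INTEGER NOT NULL REFERENCES location_dim(location_id),
            units  INTEGER NOT NULL,
            amount REAL NOT NULL,
            PRIMARY KEY (time_id, product_id, location_id)
        );
    """)


def load_sample_rows(conn: sqlite3.Connection) -> None:
    """Insert a handful of made-up rows (hypothetical data)."""
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)",
                     [(1, 2023, 1, 1), (2, 2023, 1, 2), (3, 2023, 2, 4)])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Beverages", "Coffee"), (2, "Beverages", "Tea"),
                      (3, "Snacks", "Chips")])
    conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                     [(1, "East", "Boston"), (2, "West", "Denver")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 10, 25.0), (1, 2, 2, 5, 10.0), (2, 1, 1, 8, 20.0),
                      (2, 3, 2, 12, 18.0), (3, 3, 1, 7, 10.5)])
    conn.commit()


def roll_up(conn: sqlite3.Connection, level: str) -> list:
    """Total sales amount grouped down to `level` of the time hierarchy."""
    if level not in TIME_LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    # Group by every level from the top of the hierarchy down to `level`.
    # Column names come from the fixed TIME_LEVELS tuple, so the f-string is safe.
    cols = ", ".join(f"t.{c}" for c in TIME_LEVELS[: TIME_LEVELS.index(level) + 1])
    sql = (f"SELECT {cols}, SUM(f.amount) FROM sales_fact AS f "
           f"JOIN time_dim AS t ON f.time_id = t.time_id "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()


def slice_by_category(conn: sqlite3.Connection, category: str) -> list:
    """Slice: fix one product category, then view sales by region."""
    return conn.execute(
        "SELECT l.region, SUM(f.amount) FROM sales_fact AS f "
        "JOIN product_dim AS p ON f.product_id = p.product_id "
        "JOIN location_dim AS l ON f.location_id = l.location_id "
        "WHERE p.category = ? GROUP BY l.region ORDER BY l.region",
        (category,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_sales_star(conn)
    load_sample_rows(conn)
    print(roll_up(conn, "quarter"))              # [(2023, 1, 73.0), (2023, 2, 10.5)]
    print(slice_by_category(conn, "Beverages"))  # [('East', 45.0), ('West', 10.0)]
```

Calling `roll_up` with "month" and then "year" moves up the time hierarchy (roll-up); going the other way is drill-down. Keeping descriptive text in the small dimension tables leaves the fact table with only keys and additive measures. That is what lets `SUM` aggregate cleanly at any level of the hierarchy.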