Artificial Intelligence and Data Mining (teaching slides): 2.datawarehou.ppt

Format: PPT, 74 pages (3.06 MB)

Data Warehouse

Why a Data Warehouse
  • The most common issue companies face when looking at data mining is that the information is not in one place.
  • The biggest challenge business analysts face in using data mining is how to extract, integrate, cleanse, and prepare data to solve their most pressing business problems.

What Is a Data Warehouse
  • The idea of a data warehouse is to put a wide range of operational data from internal and external sources into one place so it can be better utilized by executives, line-of-business managers, and other business analysts.
  • Once the information is gathered, OLAP (on-line analytical processing) software comes into play by providing the desktop analysis tools for querying, manipulating, and reporting the data from the data warehouse.

Data Warehouse Environment
  • The source systems from which data is extracted
  • The tools used to extract data for loading the data warehouse
  • The data warehouse database itself, where the data is stored
  • The desktop query and reporting tools used for decision support

[Figure: Data Warehousing Process Overview]

[Figure: Operational vs. Multidimensional View of Sales]

[Figure: Creating a Data Warehouse]

The Data Warehouse
  • The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
  • Integrated
    - The Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization.
  • Subject-Oriented
    - The Data Warehouse data is arranged and optimized to provide answers to questions coming from diverse functional areas within a company.
  • Time-Variant
    - The warehouse data represent the flow of data through time. They can even contain projected data.
  • Non-Volatile
    - Once data enter the Data Warehouse, they are never removed.
    - The Data Warehouse is always growing.

Operational Database vs. Data Warehouse
  Operational DB:
  • Similar data can have different representations or meanings
  • Functional or process orientation
  • Current transactions
  • Frequent updating
  Data Warehouse:
  • Unified view of all data elements
  • Subject orientation for decision support
  • Historical information with a time dimension
  • Data are added without change

Data Mart
  • A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people.
  • Data marts can serve as a test vehicle for companies exploring the potential benefits of data warehouses.
  • Data marts address local or departmental problems, while a data warehouse involves a company-wide effort to support decision making at all levels in the organization.

Enterprise Data Warehouse (EDW)
  • A large-scale data warehouse that is used across the enterprise for decision support
  • EDWs are used to provide data for many types of DSS, including CRM, SCM, BPM, BAM, PLM, and KMS.
    - BPM: business performance management
    - BAM: business activity monitoring
    - PLM: product lifecycle management
    - KMS: knowledge management systems

Metadata
  • Metadata is data about data.
  • In a data warehouse, metadata describe the contents of the data warehouse and the manner of its use.
  • Good metadata is essential to the effective operation of a data warehouse; it is used in data acquisition/collection, data transformation, and data access.

The Need for Technical Metadata
  • Data warehousing and decision processing often involve a wide range of different products, and creating and maintaining the metadata for these products is time-consuming and error-prone.
  • Automating the metadata management process and enabling the sharing of this so-called technical metadata between products can reduce both costs and errors.

The Need for Business Metadata
  • Business users need a good understanding of what information exists in a data warehouse. They need to understand what the information means from a business viewpoint, how it was derived, which source systems it comes from, when it was created, what pre-built reports and analyses exist for manipulating it, and so forth.

Metadata in a Data Warehouse
  • Kimball lists the following types of metadata in a data warehouse:
    - Source system metadata
    - Data staging metadata
    - DBMS metadata
  • Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0-471-25547-5

Source System Metadata
  • Source specifications, such as repositories and source logical schemas
  • Source descriptive information, such as ownership descriptions, update frequencies, and access methods
  • Process information, such as job schedules and extraction code

Data Staging Metadata
  • Data acquisition information, such as data transmission scheduling and results, and file usage
  • Dimension table management, such as definitions of dimensions and surrogate key assignments
  • Transformation and aggregation, such as data enhancement and mapping, DBMS load scripts, and aggregate definitions
  • Audit, job logs, and documentation, such as data lineage records and data transform logs

Star Schema
  • The star schema is a data modeling technique used to map multidimensional decision support data into a relational database.
  • Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.
  • Four components:
    - Facts
    - Dimensions
    - Attributes
    - Attribute hierarchies

[Figure 13.14: A Three-Dimensional View of Sales]

[Figure 13.17: Attribute Hierarchies in Multidimensional Analysis]

Facts
  • Numeric measurements that represent a specific business aspect or activity
  • Normally stored in a fact table at the center of the star schema
  • The fact table contains facts linked through their dimensions
  • Metrics are facts computed at run time

Dimensions
  • Qualifying characteristics that provide additional perspectives on a given fact
  • Decision support data are almost always viewed in relation to other data
  • Facts are studied via dimensions
  • Dimensions are stored in dimension tables

Attributes
  • Dimensions provide descriptions of facts through their attributes
  • There is no mathematical limit to the number of dimensions
  • Attributes are used to search, filter, and classify facts
  • Slice and dice: focus on slices of the data cube for more detailed analysis

Attribute Hierarchies
  • Provide a top-down data organization
  • Serve two purposes:
    - Aggregation
    - Drill-down/roll-up data analysis
  • Determine how the data are extracted and represented
  • Stored in the DBMS's data dictionary
  • Used by the OLAP tool to access the warehouse properly

Star Schema
  • A star schema consists of fact tables and dimension tables.
  • Fact tables contain the quantitative or factual data about a business (the information being queried). This information is often numerical, additive measurements and can consist of many columns and millions or billions of rows.
  • Dimension tables are usually smaller and hold descriptive data that reflects the dimensions, or attributes, of a business.

[Figure 13.17: Star Schema for Sales]

Star Schema Representation
  • Facts and dimensions are normally represented by physical tables in the data warehouse database.
  • The fact table is related to each dimension table in a many-to-one (M:1) relationship.
  • Fact and dimension tables are related by foreign keys and are subject to primary/foreign key constraints.

[Figure 13.18: Orders Star Schema]

Star Schema: Performance-Improving Techniques
  • Normalization of dimension tables
  • Multiple fact tables representing different aggregation levels
  • Denormalization of fact tables
  • Table partitioning and replication

[Figure 13.19: Normalized Dimension Tables]

[Figure: Multiple Fact Tables]

Practice
  • How would you design a star schema for an auto insurance company to do risk analysis?
    - What is the objective?
    - What are the facts?
    - What are the dimensions?
    - What are the attributes?
    - What are the attribute hierarchies?

[Figure: Auto Insurance DW Star Schema]

Data Warehouse Design
  • Grain: a definition of the highest level of detail that is supported in a data warehouse
  • Drill-down: the process of probing beyond a summarized value to investigate each of the detail transactions that comprise the summary

Data Warehouse Implementation
  • The data warehouse as an active decision support network
  • A company-wide effort that requires user involvement and commitment at all levels
  • Satisfy the trilogy: data, analysis, and users
  • Apply database design procedures
  • Implementing a data warehouse is generally a massive effort that must be planned and executed according to established methods.
  • There are many facets to the project lifecycle, and no single person can be an expert in each area.

[Figure: Data Warehouse Implementation Road Map]

Data Integration and the Extraction, Transformation, and Load (ETL) Process
  • Data integration comprises three major processes:
    - Data access: the ability to access and extract data from any data source
    - Data federation: the integration of business views across multiple data stores
    - Change capture: the identification, capture, and delivery of the changes made to enterprise data sources
  • Extraction, transformation, and load (ETL):
    - Extraction: reading data from a database
    - Transformation: converting the extracted data from its previous form into the form in which it can be placed into a data warehouse
    - Load: putting the data into the data warehouse

Data Cleansing
  • Data cleansing, or data scrubbing, is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
  • Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.

ETL Tools
  • A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization.
  • ETL tools have started to migrate into Enterprise Application Integration, or even Enterprise Service Bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now have data profiling, data quality, and metadata capabilities.

On-Line Analytical Processing
  • On-Line Analytical Processing (OLAP) is an advanced data analysis environment that supports decision making, business modeling, and operations research activities.
  • Four main characteristics of OLAP:
    - Uses multidimensional data analysis techniques
    - Provides advanced database support
    - Provides easy-to-use end-user interfaces
    - Supports client/server architecture
  • Additional functions of multidimensional data analysis techniques:
    - Advanced data presentation functions
    - Advanced data aggregation, consolidation, and classification functions
    - Advanced computational functions
    - Advanced data modeling functions

[Figure: Integration of OLAP with a Spreadsheet Program]

[Figure 13.7: OLAP Server Arrangement]

SAP's Business Information Warehouse: An Enterprise-Wide Information Hub
  • An end-to-end, enterprise-wide information hub to support planning and decision making
  • A central data repository of SAP, non-SAP, current, and historical business transactions and metadata
  • Timely information to all levels and roles, from analyst to executive
  • Years of SAP financial, logistics, and human resources information systems experience wedded with modern data warehouse methodologies

BW Architecture Details
  [Figure (SAP AG 1999): SAP BW architecture. Labeled components: R/3 and non-R/3 OLTP applications, OLTP reporting, production data extractors, staging BAPIs, Business Information Warehouse Server, Staging Engine, PSA, Operational Data Store, InfoCubes, Data Manager, Meta Data Manager, Meta Data Repository, OLAP Processor, Administrator Workbench (administration, scheduling, monitor), Business Explorer Analyzer (hosted by MS Excel), browser, OLE DB for OLAP (ODBO) provider, BAPI, Data Provider Server, RemoteCube, third-party OLAP clients.]

[Table 13.10: A Sample of Current Data Warehousing and Data Mining Vendors]

Success Stories at Pepsi
  • "Using the data warehouse, we've been able to identify important items, find national suppliers for them, and leverage those relationships to reduce costs."
  • Thanks to the warehouse, Pepsi can monitor purchasing compliance at the user level, an ability that has boosted price and product compliance to well over 90 percent.
  • The warehouse also helps ensure 100 percent sales tax compliance, says Bridgman.
  • Since going online in 1995, the warehouse has helped generate procurement savings in excess of $100 million.

[Figure: Levels of DW Support for Enterprise Decision Making]

The Need for Real-Time Data
  • A business often cannot afford to wait a whole day for its operational data to load into the data warehouse for analysis.
  • Provides incremental real-time data showing every state change and almost analogous patterns over time
  • Maintaining metadata in sync is possible
  • It is less costly to develop, maintain, and secure one huge data warehouse so that data are centralized for BI/BA tools
  • An EAI with real-time data collection can reduce or eliminate the nightly batch processes

Real-Time/Active Data Warehouse (RDW/ADW)
  • Loading and providing data via the data warehouse as they become available
  • Expanding traditional data warehouse functions into the realm of tactical decision making
  • Empowering decision making when interacting directly with customers and suppliers

[Figure: Real-Time Data Warehousing]

Data Warehouse Administration
  • Due to its huge size and intrinsic nature, a data warehouse requires especially strong monitoring in order to sustain satisfactory efficiency and productivity.
  • A new job title: Data Warehouse Administrator

Data Warehouse Administration Functions
  • Data warehouse administration involves the overall management of a data warehouse. Administration tasks include archiving, consistency checks, developing/maintaining indexing and retrieval functionality, tracking data changes, migration, monitoring, performance issues, replication issues, data quality, and sizing/space management.
  • All data warehouses should also have a backup and recovery plan in place so that data can be recovered after an emergency.

Security and Privacy Issues
  • Private intelligence-gathering gives some people the creeps.
  • Targeted marketing efforts are intrusive and annoying.
  • The collection, manipulation, and combination of lists of personal information amount to an ominous invasion of privacy.

Data Warehouse Security Issues
  • Effective security in a data warehouse should focus on four main areas:
    - Establishing effective corporate and security policies and procedures
    - Implementing logical security procedures and techniques to restrict access
    - Limiting physical access to the data center environment
    - Establishing an effective internal control review process with an emphasis on security and privacy
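
The Data Warehouse slides call the warehouse time-variant and non-volatile: data enter, are never removed, and represent the flow of data through time. A minimal Python sketch of that behavior, assuming a toy in-memory table; `Snapshot`, `AppendOnlyTable`, `load`, and `value_as_of` are hypothetical names, not part of any product. Loads only append date-stamped rows, and a query reconstructs the value as of any past date instead of reading an overwritten current value.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One warehouse row: a business key, a value, and the date it was loaded."""
    key: str
    value: float
    as_of: date


@dataclass
class AppendOnlyTable:
    """Hypothetical non-volatile, time-variant table: rows are only ever appended."""
    rows: List[Snapshot] = field(default_factory=list)

    def load(self, key: str, value: float, as_of: date) -> None:
        # New values are stored next to old ones; nothing is updated or deleted.
        self.rows.append(Snapshot(key, value, as_of))

    def value_as_of(self, key: str, when: date) -> Optional[float]:
        # Latest value recorded for `key` on or before `when`, so the state
        # at any past date can be reconstructed from the history.
        candidates = [r for r in self.rows if r.key == key and r.as_of <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.as_of).value


table = AppendOnlyTable()
table.load("acct-1", 100.0, date(2023, 1, 1))
table.load("acct-1", 120.0, date(2023, 2, 1))
print(table.value_as_of("acct-1", date(2023, 1, 15)))  # 100.0
print(table.value_as_of("acct-1", date(2023, 3, 1)))   # 120.0
print(len(table.rows))                                 # 2
```

An operational database would typically update the balance in place; here the January query still sees 100.0 after the February load, because both rows are kept.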

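The Data Mart slide describes a data mart as a small, single-subject subset of the warehouse serving a small group of people. The sketch below shows only that subsetting idea, assuming warehouse rows are plain Python dicts tagged with a hypothetical `subject` field; `WAREHOUSE` and `build_data_mart` are illustrative names, and a real data mart is usually a separate database loaded by its own ETL jobs.

```python
# Hypothetical enterprise-wide warehouse rows spanning several subject areas.
WAREHOUSE = [
    {"subject": "sales", "region": "East", "amount": 500},
    {"subject": "sales", "region": "West", "amount": 300},
    {"subject": "hr", "region": "East", "headcount": 12},
    {"subject": "finance", "region": "West", "budget": 9000},
]


def build_data_mart(rows, subject):
    """Copy out the single-subject subset that one department needs."""
    return [dict(r) for r in rows if r["subject"] == subject]


sales_mart = build_data_mart(WAREHOUSE, "sales")
print(len(sales_mart))                       # 2
print(sum(r["amount"] for r in sales_mart))  # 800
```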

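Kimball's source system and data staging metadata can be pictured as catalog entries that answer the questions on the Need for Business Metadata slide: where a value came from, how it was derived, and when it is refreshed. This Python sketch assumes a hypothetical in-memory catalog; `ColumnMetadata`, `CATALOG`, `describe`, and every table and column name are illustrative, whereas real warehouses keep such metadata in a repository or the DBMS data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical metadata entry describing one warehouse column."""
    warehouse_column: str
    source_system: str     # source system metadata: where the data comes from
    source_field: str
    transformation: str    # data staging metadata: how the value was derived
    refresh_schedule: str  # how often the column is reloaded


CATALOG = {
    "fact_sales.amount": ColumnMetadata(
        "fact_sales.amount", "POS", "txn.total_cents", "divide by 100", "nightly"),
    "dim_customer.region": ColumnMetadata(
        "dim_customer.region", "CRM", "account.state", "map state to region", "weekly"),
}


def describe(column: str) -> str:
    """Give a simple lineage answer: source field, derivation, and refresh timing."""
    m = CATALOG[column]
    return (f"{m.warehouse_column} <- {m.source_system}.{m.source_field} "
            f"({m.transformation}; refreshed {m.refresh_schedule})")


print(describe("fact_sales.amount"))
# fact_sales.amount <- POS.txn.total_cents (divide by 100; refreshed nightly)
```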

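The Star Schema and Star Schema Representation slides describe a central fact table related M:1 to each dimension table through foreign keys. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical sales schema (`fact_sales`, `dim_time`, `dim_product`, `dim_location`, and their columns are illustrative): the fact table holds numeric measures plus one foreign key per dimension, the query studies facts via dimension attributes, and `PRAGMA foreign_keys = ON` makes SQLite enforce the primary/foreign key constraints, which it leaves off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table at the center of the star: one foreign key per dimension
-- (each an M:1 relationship) plus the numeric measures.
CREATE TABLE fact_sales (
    time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
    product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
    location_id INTEGER NOT NULL REFERENCES dim_location(location_id),
    units       INTEGER,
    revenue     REAL,
    PRIMARY KEY (time_id, product_id, location_id)
);
""")

conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, 2023, 1, 1), (2, 2023, 1, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(10, "Cola", "Beverages"), (11, "Chips", "Snacks")])
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                 [(100, "Boston", "East"), (101, "Denver", "West")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 10, 100, 5, 10.0), (1, 11, 101, 3, 6.0), (2, 10, 101, 4, 8.0)])

# Study the facts via their dimensions: revenue by region and product category.
rows = conn.execute("""
SELECT l.region, p.category, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_location AS l ON f.location_id = l.location_id
JOIN dim_product  AS p ON f.product_id  = p.product_id
GROUP BY l.region, p.category
ORDER BY l.region, p.category
""").fetchall()
print(rows)  # [('East', 'Beverages', 10.0), ('West', 'Beverages', 8.0), ('West', 'Snacks', 6.0)]

# A fact row pointing at a nonexistent product violates the FK constraint.
rejected = False
try:
    conn.execute("INSERT INTO fact_sales VALUES (2, 99, 100, 1, 1.0)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```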

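The Attributes slide mentions slice and dice. Assuming a toy three-dimensional cube stored as a Python dict keyed by (product, region, quarter), this sketch shows the usual reading of the two terms: a slice fixes one dimension to a single value and drops it, and a dice keeps a sub-cube by restricting several dimensions to chosen value sets. The cube contents and the names `CUBE`, `slice_cube`, and `dice_cube` are hypothetical.

```python
# Hypothetical 3-D sales cube: (product, region, quarter) -> units sold.
CUBE = {
    ("Cola", "East", "Q1"): 10, ("Cola", "East", "Q2"): 12,
    ("Cola", "West", "Q1"): 7, ("Cola", "West", "Q2"): 9,
    ("Chips", "East", "Q1"): 5, ("Chips", "East", "Q2"): 6,
    ("Chips", "West", "Q1"): 4, ("Chips", "West", "Q2"): 8,
}
DIMS = ("product", "region", "quarter")


def slice_cube(cube, dim, value):
    """Fix one dimension to a single value and drop it, leaving a 2-D view."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Keep only cells whose coordinates fall inside the allowed value sets."""
    idx = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cube.items() if all(k[i] in s for i, s in idx.items())}


q1 = slice_cube(CUBE, "quarter", "Q1")
print(q1)  # {('Cola', 'East'): 10, ('Cola', 'West'): 7, ('Chips', 'East'): 5, ('Chips', 'West'): 4}
east_cola = dice_cube(CUBE, product=["Cola"], region=["East"])
print(east_cola)  # {('Cola', 'East', 'Q1'): 10, ('Cola', 'East', 'Q2'): 12}
```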

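The Attribute Hierarchies slide says hierarchies support aggregation and drill-down/roll-up analysis. Assuming a hypothetical time hierarchy of year, then quarter, then month, and toy revenue figures, this Python sketch rolls up by aggregating at a coarser level and drills down by aggregating at a finer level restricted to one parent member. `SALES`, `HIERARCHY`, and `aggregate` are illustrative names.

```python
from collections import defaultdict

# Hypothetical sales facts tagged with a time hierarchy: year -> quarter -> month.
SALES = [
    {"year": 2023, "quarter": 1, "month": 1, "revenue": 100},
    {"year": 2023, "quarter": 1, "month": 2, "revenue": 150},
    {"year": 2023, "quarter": 2, "month": 4, "revenue": 200},
    {"year": 2024, "quarter": 1, "month": 1, "revenue": 120},
]
HIERARCHY = ("year", "quarter", "month")  # top-down: coarsest level first


def aggregate(rows, depth, **filters):
    """Sum revenue at hierarchy depth 1 (year), 2 (quarter), or 3 (month).

    Rolling up means asking for a smaller depth; drilling down means asking
    for a larger depth, usually filtered to the parent member being explored.
    """
    levels = HIERARCHY[:depth]
    totals = defaultdict(int)
    for row in rows:
        if all(row[k] == v for k, v in filters.items()):
            totals[tuple(row[level] for level in levels)] += row["revenue"]
    return dict(totals)


print(aggregate(SALES, 1))                        # {(2023,): 450, (2024,): 120}
print(aggregate(SALES, 2, year=2023))             # {(2023, 1): 250, (2023, 2): 200}
print(aggregate(SALES, 3, year=2023, quarter=1))  # {(2023, 1, 1): 100, (2023, 1, 2): 150}
```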

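One of the performance-improving techniques listed for star schemas is keeping multiple fact tables at different aggregation levels. A minimal `sqlite3` sketch, assuming hypothetical `fact_sales_daily` and `fact_sales_monthly` tables: the monthly table is derived once from the daily facts with `INSERT ... SELECT ... GROUP BY`, so month-level queries can read the smaller pre-aggregated table. Dates are stored as ISO-8601 text, so `substr(sale_date, 1, 7)` yields the year-month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base fact table at daily grain (hypothetical columns).
CREATE TABLE fact_sales_daily (sale_date TEXT, product TEXT, revenue REAL);
-- Second fact table holding the same measure pre-aggregated to month level.
CREATE TABLE fact_sales_monthly (sale_month TEXT, product TEXT, revenue REAL);
""")
conn.executemany("INSERT INTO fact_sales_daily VALUES (?, ?, ?)", [
    ("2023-01-05", "Cola", 10.0),
    ("2023-01-20", "Cola", 15.0),
    ("2023-02-03", "Cola", 12.0),
    ("2023-01-07", "Chips", 4.0),
])

# Build the aggregate once (e.g. during the load window) so monthly queries
# scan fewer rows instead of re-summing the daily facts every time.
conn.execute("""
INSERT INTO fact_sales_monthly
SELECT substr(sale_date, 1, 7), product, SUM(revenue)
FROM fact_sales_daily
GROUP BY substr(sale_date, 1, 7), product
""")

monthly = conn.execute(
    "SELECT sale_month, product, revenue FROM fact_sales_monthly "
    "ORDER BY sale_month, product"
).fetchall()
print(monthly)  # [('2023-01', 'Chips', 4.0), ('2023-01', 'Cola', 25.0), ('2023-02', 'Cola', 12.0)]
```

The trade-off is storage and refresh work: the monthly table must be rebuilt or updated whenever new daily facts are loaded.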

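The ETL slide defines extraction as reading data from a source, transformation as converting it into the warehouse's form, and load as putting it into the warehouse. The three functions in this Python sketch mirror those steps, assuming a hypothetical CSV export as the source and an in-memory SQLite table `fact_orders` as the target; the column names and conversion rules are illustrative only.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational system (the "source").
SOURCE_CSV = """order_id,customer,amount_cents,order_date
1, alice ,1250,2023-01-05
2,BOB,800,2023-01-06
3,carol,n/a,2023-01-07
"""


def extract(text):
    """Extraction: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transformation: convert source rows into the warehouse's form."""
    out = []
    for r in records:
        if not r["amount_cents"].strip().isdigit():
            continue  # skip rows that cannot be converted (a real job would log them)
        out.append((
            int(r["order_id"]),
            r["customer"].strip().title(),  # unify name representations
            int(r["amount_cents"]) / 100,   # cents -> currency units
            r["order_date"],
        ))
    return out


def load(conn, rows):
    """Load: put the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                 "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)  # [(1, 'Alice', 12.5, '2023-01-05'), (2, 'Bob', 8.0, '2023-01-06')]
```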

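Change capture, one of the three data integration processes on the ETL slide, identifies the changes made to a source so that only those need to be delivered to the warehouse. One simple way to do this is to compare two keyed snapshots of a source table, sketched here in Python with hypothetical customer data and a hypothetical `capture_changes` function; production change-capture tools often read the database's transaction log instead, which avoids scanning full snapshots.

```python
def capture_changes(previous, current):
    """Compare two keyed snapshots of a source table and classify the changes.

    `previous` and `current` map a primary key to a row (any comparable value).
    Returns the inserted, updated, and deleted keys, each sorted.
    """
    prev_keys, curr_keys = set(previous), set(current)
    return {
        "inserted": sorted(curr_keys - prev_keys),
        "updated": sorted(k for k in prev_keys & curr_keys if previous[k] != current[k]),
        "deleted": sorted(prev_keys - curr_keys),
    }


yesterday = {101: ("Alice", "Boston"), 102: ("Bob", "Denver"), 103: ("Cy", "Austin")}
today = {101: ("Alice", "Chicago"), 102: ("Bob", "Denver"), 104: ("Dee", "Miami")}
changes = capture_changes(yesterday, today)
print(changes)  # {'inserted': [104], 'updated': [101], 'deleted': [103]}
```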

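The Data Cleansing slide describes detecting corrupt or inaccurate records and then replacing, modifying, or deleting them. This Python sketch applies a few typical rules to hypothetical customer records: a later record repeating an already-seen `id` is deleted, country codes are mapped to one representation, and an invalid email or out-of-range age is replaced with `None` and reported. The alias table, the deliberately loose email pattern, and the 0 to 120 age range are assumptions for the example.

```python
import re

# Hypothetical customer records with typical "dirty data" problems.
RAW = [
    {"id": 1, "email": "ann@example.com", "country": "US", "age": 34},
    {"id": 2, "email": "bob(at)example.com", "country": "usa", "age": 41},
    {"id": 3, "email": "cy@example.com", "country": "U.S.", "age": -5},
    {"id": 1, "email": "ann@example.com", "country": "US", "age": 34},  # duplicate
]

COUNTRY_ALIASES = {"US": "US", "USA": "US", "U.S.": "US"}  # assumed mapping
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")       # loose format check


def cleanse(records):
    """Return (clean_rows, issues): correct what can be fixed, drop duplicates."""
    clean, issues, seen = [], [], set()
    for r in records:
        if r["id"] in seen:
            issues.append((r["id"], "duplicate removed"))
            continue
        seen.add(r["id"])
        row = dict(r)
        # Unify different representations of the same value.
        row["country"] = COUNTRY_ALIASES.get(row["country"].upper(), row["country"])
        if not EMAIL_RE.match(row["email"]):
            issues.append((row["id"], "invalid email set to None"))
            row["email"] = None
        if not 0 <= row["age"] <= 120:
            issues.append((row["id"], "out-of-range age set to None"))
            row["age"] = None
        clean.append(row)
    return clean, issues


clean, issues = cleanse(RAW)
print(len(clean))  # 3
print(issues)
```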

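The OLAP slides list advanced data aggregation and consolidation among the functions of multidimensional analysis. A common form is computing totals for every combination of dimensions, including subtotals and a grand total, similar to the `GROUP BY CUBE` grouping offered by several SQL dialects. This pure-Python sketch assumes hypothetical fact rows with two dimensions and one additive measure, and uses the marker `"ALL"` for a dimension that has been aggregated away; `FACTS` and `cube` are illustrative names.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: dimension values plus one additive measure.
FACTS = [
    {"region": "East", "product": "Cola", "revenue": 10},
    {"region": "East", "product": "Chips", "revenue": 5},
    {"region": "West", "product": "Cola", "revenue": 8},
]
DIMENSIONS = ("region", "product")
ALL = "ALL"  # marker meaning "aggregated over this dimension"


def cube(rows, dims, measure):
    """Aggregate `measure` for every subset of `dims`, from grand total to full detail."""
    result = defaultdict(int)
    for n in range(len(dims) + 1):
        for grouped in combinations(dims, n):
            for r in rows:
                key = tuple(r[d] if d in grouped else ALL for d in dims)
                result[key] += r[measure]
    return dict(result)


totals = cube(FACTS, DIMENSIONS, "revenue")
print(totals[("ALL", "ALL")])   # 23 (grand total)
print(totals[("East", "ALL")])  # 15 (subtotal for East)
print(totals[("ALL", "Cola")])  # 18 (subtotal for Cola)
```

With d dimensions this computes 2^d groupings, which is why OLAP servers often precompute and store such aggregates rather than recomputing them per query.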
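The real-time data warehouse slides describe loading data as they become available, reducing or eliminating nightly batch processes. One common building block is incremental (trickle-feed) loading with a high-water mark. This Python sketch assumes the source tags each change with a monotonically increasing sequence number `seq`; `IncrementalLoader` and its fields are hypothetical, and a real system would also persist the watermark and handle late or duplicate changes.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalLoader:
    """Hypothetical trickle-feed loader: each run loads only rows newer than
    the last high-water mark instead of reloading everything in a nightly batch."""
    warehouse: list = field(default_factory=list)
    watermark: int = 0  # highest source sequence number already loaded

    def run(self, source_rows):
        new_rows = [r for r in source_rows if r["seq"] > self.watermark]
        self.warehouse.extend(new_rows)
        if new_rows:
            self.watermark = max(r["seq"] for r in new_rows)
        return len(new_rows)


source = [{"seq": 1, "event": "order placed"}, {"seq": 2, "event": "order paid"}]
loader = IncrementalLoader()
print(loader.run(source))  # 2
source.append({"seq": 3, "event": "order shipped"})
print(loader.run(source))  # 1 (only the new change is loaded)
print(loader.run(source))  # 0 (nothing new since the last run)
```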