Data Warehouses and Applications - Data Warehouse.ppt
Data Warehouse
Prof. 杭诚方

What Is a Data Warehouse?
A data warehouse is a very large database that stores integrated data about one or more business subject areas.
A data warehouse is built to support data analysis for decision making.
An integrated customer data warehouse is a necessary step toward the success of a business intelligence strategy.
Data warehouses are also used for many other purposes; product manufacturing data warehouses are one example.
Data warehouses have become the focal point of the enterprise-wide IT infrastructure.

A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context.

Data Warehouse Definitions
The key elements in the definitions:
Subject-Oriented: Information is presented as business subjects, not as computer files.
Integrated: A single source of information for and about the business.
Non-Volatile: Stable information that doesn't change each time an operational process is executed.
Time-Variant: Contains a history of the business as well as current business information.
Accessible: The primary purpose of a data warehouse is to provide readily accessible information to business people.

[Figure slides: Subject-Oriented; Integrated]
[Figure: Integrated, a telecom example. A central Telecom DW System draws on these sources:
- network management system
- financial management system
- market analysis and decision system
- analytical key-account CRM system
- credit management
- customer service system (Call Center 1000/180/112/170)
- other specialized telecom network business systems
- "Project 97" business management and production scheduling system (wiring / number assignment / activation)
- 112 fault repair system
- system resource management system
- 170/payment collection system
- business office system
- local company / IC card / magnetic card management system
- online business hall system
Layers shown:
- business management layer
- provincial billing and settlement, and the billing and accounting system center
- service management layer
- network and network element management layer
- telecom network elements]
[Figure slides: Non-Volatile; Time-Variant; Accessible]

Data Warehouse Characteristics
A data warehouse separates decision-support functions from operational systems.

Property           Operational               Data Warehouse
Response time      Sub-second to seconds     Seconds to hours
Data operation     DML                       Primarily read-only
Nature of data     30-60 days                Snapshots over time
Data organization  Application               Subject, time
Size               Small to large            Large to very large
Data sources       Operational, internal     Operational, internal, external
Activities         Processes                 Analysis

A data warehouse serves as a central repository that records everything about the business for information retrieval.
Data is loaded from internal operational systems and from external systems.

A data warehouse has a fundamental effect on how users see the data available about the organization, what they do with it, and how they use it for decision making.

A data warehouse is not a single software or hardware product you purchase to provide strategic information.
It is a computing environment where users can find strategic information to make better decisions.
It is a user-centric environment.

A data warehouse is a blend of many different technologies needed to support the various functions of a data warehouse environment.
These technologies all work together in a data warehouse environment.
[Figure: Technologies around the data warehouse: data acquisition, data modeling, data management, storage management, analysis, application, administration]

Enterprise Data Warehouse
Enterprise data warehouses are funded on a corporate basis.
An enterprise data warehouse covers the entire business (corporation) and incorporates data from all operational systems.
Information is extracted from the operational environment, cleansed, and transformed into a central, integrated, enterprise-wide data warehouse environment. As a result, every department and internal organization of the corporation benefits from a consistent, integrated source of decision-support information.

Data Mart
Data marts are often funded on a departmental basis.
A data mart is a collection of data tailored to the decision support system (DSS) processing needs of a particular department.
It is a subset of an enterprise data warehouse that has been customized to fit the needs of a department.
Data marts serve users at a specific level or in a specific department.

Data Warehouse versus Data Mart
Property             Data Warehouse     Data Mart
Scope                Enterprise         Department
Subjects             Multiple           Single subject
Data sources         Many               Few
Size (typical)       … TB               … TB
Implementation time  Months to years    Months

Data Mart
Control: A department can completely control the data and processing that occur inside a data mart.
Cost: Storage and processing cost less, because the data mart's machine is smaller than the data warehouse's.
Customization: The data mart's data is customized to suit the specific needs of the department.

Dependent Data Mart: The source is the data warehouse. The extraction, transformation, and loading (ETL) process is easy. The data mart is part of the enterprise plan.
Independent Data Mart: The sources are operational systems and external sources. The ETL process is difficult. The data mart is built to satisfy analytical needs.

Operational Data Store (ODS)
Integrates information from the production systems.
Relieves the production systems of reporting and analysis demands.
Provides access to current data.

ODS
An ODS looks very much like a data warehouse in that it is subject-oriented and integrated. Its remaining characteristics, however, are quite different from those of a data warehouse:
Volatile: An ODS can be updated as a normal part of processing.
Current-Valued: An ODS typically contains daily, weekly, or even monthly data, but the data ages very quickly.
Detailed Data: An ODS contains detailed data only.

Different Classes of the ODS
Class I: A synchronous interface in which very little time elapses between an application's transaction and the reflection of that transaction in the ODS.
Class II: An hour or two passes between the time a transaction is created and executed in the application environment and the time it is reflected in the ODS.
Class III: There may be a time lag of 12 hours to a day as transaction data is collected in the I&T (integration and transformation) interface.
Class IV: Data is fed into the ODS directly from the data warehouse.

Determining the Class
Speed of movement of data into the ODS
Volume of data that must be moved
Volume of data that must be stored in an intermediate location during I&T processing
Update of data and integrity of transaction processing
The time of day the movement needs to occur

[Figure: Data architecture. Labels: legacy systems; call center, web, e-mail, ATM, SFA; operational data store (ODS); data warehouse; "Support operational CRM"; "Support analytical CRM".]

Example: The Content of a Customer ODS
Identification: name, address, phone, e-mail
Preferences: opt in/out, medium, data sharing
Transactions: purchases, cancellations, returns
HH (household)/Company Affiliation: corporate hierarchy, household link
Events: complaints, pre-approvals, inquiries, sales calls

Data Warehouse System Architectures
Generic Two-Level Architecture
Independent Data Mart
Dependent Data Mart and Operational Data Store
[Figure: Two-level data warehouse architecture]
[Figure: Data warehouse architecture based on independent data marts]
[Figure: Data warehouse architecture based on dependent data marts and an operational data store (ODS)]
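The Integrated characteristic in the definitions means that data from different operational systems is reconciled into one convention before it reaches the warehouse. Below is a minimal Python sketch of that idea. It assumes two hypothetical source systems (billing and CRM) that encode a customer's gender differently. All field names and code mappings are illustrative assumptions, not taken from the slides.

```python
# Hypothetical source records: each operational system encodes gender differently.
BILLING_ROWS = [{"cust": "C1", "sex": "m"}, {"cust": "C2", "sex": "f"}]
CRM_ROWS = [{"customer_no": "C3", "gender": 1}, {"customer_no": "C4", "gender": 0}]

# Assumed mappings from each source's codes to the warehouse's single convention.
BILLING_GENDER = {"m": "M", "f": "F"}
CRM_GENDER = {1: "M", 0: "F"}


def integrate(billing: list[dict], crm: list[dict]) -> list[dict]:
    """Transform rows from both sources into one consistent warehouse layout."""
    unified = [
        {"customer_id": r["cust"], "gender": BILLING_GENDER[r["sex"]]} for r in billing
    ]
    unified += [
        {"customer_id": r["customer_no"], "gender": CRM_GENDER[r["gender"]]} for r in crm
    ]
    return unified


for row in integrate(BILLING_ROWS, CRM_ROWS):
    print(row)
```

Every row leaves `integrate` with the same key names and the same `M`/`F` codes, so warehouse users never need to know which system a record came from.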
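The contrast between a volatile, current-valued ODS and a non-volatile, time-variant data warehouse can be made concrete with a toy in-memory model. The sketch also builds a dependent data mart, whose only source is the warehouse. This is a minimal Python sketch under stated assumptions:
- The ODS is a dict keyed by customer id and updated in place.
- The warehouse is an append-only list of dated snapshots.
- The data mart is a filtered subset of warehouse rows.

The names `Snapshot`, `ods_upsert`, `dw_load`, and `build_mart` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One warehouse row: a customer's state as of a load date (hypothetical schema)."""
    as_of: date
    customer_id: str
    region: str
    balance: float


def ods_upsert(ods: dict[str, dict], customer_id: str, region: str, balance: float) -> None:
    # Volatile, current-valued: the ODS overwrites the old value in place.
    ods[customer_id] = {"region": region, "balance": balance}


def dw_load(dw: list[Snapshot], ods: dict[str, dict], as_of: date) -> None:
    # Non-volatile, time-variant: each load appends a dated snapshot;
    # earlier rows are never updated or deleted.
    for cid, row in ods.items():
        dw.append(Snapshot(as_of, cid, row["region"], row["balance"]))


def build_mart(dw: list[Snapshot], region: str) -> list[Snapshot]:
    # Dependent data mart: its only source is the warehouse, filtered to the
    # subset one department (here, one sales region) needs.
    return [s for s in dw if s.region == region]


ods: dict[str, dict] = {}
dw: list[Snapshot] = []

ods_upsert(ods, "C1", "East", 100.0)
ods_upsert(ods, "C2", "West", 50.0)
dw_load(dw, ods, date(2024, 1, 31))

ods_upsert(ods, "C1", "East", 80.0)   # the ODS forgets the 100.0 balance
dw_load(dw, ods, date(2024, 2, 29))

east_mart = build_mart(dw, "East")
print(ods["C1"]["balance"])            # 80.0: only the current value
print([s.balance for s in east_mart])  # [100.0, 80.0]: history kept
```

Because the mart is derived from warehouse rows rather than operational systems, its ETL step reduces to a filter. This reflects why the slides call the dependent data mart's ETL "easy".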
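The four ODS classes differ mainly in how long a transaction takes to appear in the ODS, and in whether the feed comes from the applications or from the warehouse. Below is a rough Python rule of thumb. It is a simplification: it looks only at lag and source, and ignores the other factors listed under Determining the Class (data volume, intermediate storage during I&T, transaction integrity, time of day). The numeric cutoffs are assumptions chosen to sit around the slides' approximate lags. The function name `classify_ods` is hypothetical.

```python
from datetime import timedelta
from enum import Enum


class OdsClass(Enum):
    I = "synchronous, near-immediate"
    II = "about an hour or two behind"
    III = "roughly 12 hours to a day behind"
    IV = "fed from the data warehouse"


def classify_ods(lag: timedelta, fed_from_warehouse: bool = False) -> OdsClass:
    """Map the transaction-to-ODS lag onto an ODS class.

    The cutoffs are illustrative assumptions, not figures from the slides,
    which only give approximate lags for each class.
    """
    if fed_from_warehouse:
        return OdsClass.IV  # source is the data warehouse, not the applications
    if lag <= timedelta(seconds=1):  # assumed cutoff for "very, very small"
        return OdsClass.I
    if lag <= timedelta(hours=4):    # assumed cutoff around "an hour or two"
        return OdsClass.II
    return OdsClass.III              # batch collection in the I&T interface


print(classify_ods(timedelta(milliseconds=200)).name)  # I
print(classify_ods(timedelta(hours=2)).name)           # II
print(classify_ods(timedelta(hours=18)).name)          # III
print(classify_ods(timedelta(days=7), fed_from_warehouse=True).name)  # IV
```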