DataStage Basic Training Tutorial (DataStage基础培训教程.ppt)
DataStage Basic Training
Jerry, 2006.03

Agenda
- Hello World
- DataStage Components
- Define Parameters & Tables
- Hash File, Transformer, Aggregator
- Director & Monitor
- Administrator & Manager
- Routine & Control

Demo: Hello World
Extract -> Transform -> Load

DataStage Architecture
Data flow: Data Sources (database or file) -> ODBC/Native -> DataStage Server (WinNT, Win2000 or UNIX) -> ODBC/Native -> Target (database or file)
- Interface labeled in the diagram: DataStage Connect API

DataStage Components
- Manager: metadata collection and management
- Designer: design process flow
- Director: run jobs, check logs and set schedules
- Administrator: create and edit projects

Global Variables and Job Variables
Global variables
- Lifetime: the whole project
- Defined in Administrator
Job variables
- Lifetime: a single job
- Defined in Designer or Manager

Demo: Define a Job Variable
- Define the parameter in Designer

Metadata Definition
- An important part of metadata management
- Defined in Manager or Designer
- Demo:
  - Import from a flat file in .txt format
  - Import from a DBMS table

Demo: Table Definition
- Define a table in Manager
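
To make the flat-file import idea concrete, here is a minimal Python sketch of deriving a table definition (column name, rough type, maximum length) from delimited text. It is not how DataStage itself performs the import; the function names, the sample data, and the type labels "Integer", "Decimal" and "VarChar" are all assumptions chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Guess a rough column type from string values (labels are illustrative)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VarChar"
    try:
        for v in non_empty:
            int(v)
        return "Integer"
    except ValueError:
        pass
    try:
        for v in non_empty:
            float(v)
        return "Decimal"
    except ValueError:
        return "VarChar"


def infer_table_definition(text, delimiter=","):
    """Treat the first row as column names and infer metadata from the rest."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    definition = []
    for i, name in enumerate(header):
        values = [row[i] for row in data if i < len(row)]
        definition.append({
            "name": name,
            "type": infer_type(values),
            "length": max((len(v) for v in values), default=0),
        })
    return definition


# Hypothetical contents of a small .txt flat file.
sample = "loan_id,branch,amount\n1001,East,2500.50\n1002,West,1200\n"
table_def = infer_table_definition(sample)
for column in table_def:
    print(column)
```

In a real project the imported definition would normally be reviewed by hand, since a small sample can suggest a type or length that later data violates.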

Demo: Generate a Fact Table
Detail table -> join -> aggregate -> fact table

Hash File
Uses:
- Serves as the reference (lookup) table in a left join
- Holds data sets that are accessed many times
- Stores other temporary data
Key points:
- A key must be specified
- The column positions on the output must match those on the input

Transformer
Uses:
- Provides a rich set of operators and functions
- Data cleansing and conversion
- Joins multiple data sources
Key points:
- Every key of the reference table must be matched to a column of the primary table
- Avoid connecting two Transformers directly to each other where possible

Aggregator
Uses:
- Aggregate functions such as Sum, Max, Min and Average
- Typically used to generate fact tables

Debug and Tuning
View Status and Logs
- Several views are available, including status, log and detail
- Use them together with Monitor to find errors and tune performance

Job Status
- Not Compiled
- Compiled
- Reset
- Running
- Finished
- Finished (with warning)
- Aborted

Schedule
Job -> Add to Schedule

Administrator
- Add a new project
- Modify project properties
  - Character set
  - Number of days to retain logs
  - Hash file and write cache
- Define environment variables

Manager
- Import and export projects or jobs
  - Two file formats: .dsx and .xml
  - Scope: the whole project, or by category
- Table definitions
- Manage routines

Demo: Back Up a Project
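
Returning to the fact-table demo (detail table, hash-file lookup, aggregation), the data flow can be sketched outside DataStage in Python. This is only an analogy under stated assumptions: the dictionary plays the role of the keyed Hash File, `transform` plays the Transformer doing a left join, and `aggregate` plays the Aggregator. All table names, column names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows (the primary input).
detail_rows = [
    {"loan_id": 1, "branch_id": "B01", "amount": 100.0},
    {"loan_id": 2, "branch_id": "B02", "amount": 250.0},
    {"loan_id": 3, "branch_id": "B01", "amount": 50.0},
    {"loan_id": 4, "branch_id": "B99", "amount": 75.0},  # no matching branch
]

# Hash File role: reference data addressed by its key (branch_id).
branch_lookup = {
    "B01": {"region": "East"},
    "B02": {"region": "West"},
}


def transform(rows, lookup):
    """Left join: keep every detail row, with region None when no key matches."""
    for row in rows:
        ref = lookup.get(row["branch_id"])
        yield {**row, "region": ref["region"] if ref else None}


def aggregate(rows, group_key, value_key):
    """Group rows by group_key and compute count, sum, max, min and average."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        key: {
            "count": len(vals),
            "sum": sum(vals),
            "max": max(vals),
            "min": min(vals),
            "avg": sum(vals) / len(vals),
        }
        for key, vals in groups.items()
    }


fact_table = aggregate(transform(detail_rows, branch_lookup), "region", "amount")
for region, measures in fact_table.items():
    print(region, measures)
```

The unmatched row ends up in a group keyed by `None`, which mirrors why a left join is used here: detail rows are never silently dropped just because the reference table lacks their key.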

Routine
A user-defined function, written in DataStage BASIC, whose syntax resembles VB
- Transformer Routine
- Before/After Subroutine
The system ships with a rich set of built-in routines
Demo: Define a Transformer Routine

Job Control
Schedule and run other jobs from within a job

Q&A
Thanks!