    Management of Inconsistent and Uncertain Data: English-language course slides (.ppt)


    Efficient Management of Inconsistent and Uncertain Data
    Renée J. Miller
    University of Toronto

    Contributors
      • Ariel Fuxman, PhD thesis
          • Microsoft Search Labs
          • Jim Gray SIGMOD 2008 Dissertation Award
      • Periklis Andritsos, PhD
      • Jiang Du, MS
      • Elham Fazli, MS
      • Diego Fuxman, Undergrad

    Dirty Databases
      • The presence of dirty data is a major problem in enterprises
      • Traditional solution: data cleaning
      [Cartoon caption: "No. I don't see any problem with the data."]

    Limitations of Data Cleaning
      • Semi-automatic process
          • Requires highly qualified domain experts
          • Time consuming
      • May not be possible to wait until the database is clean
          • Operational systems answer queries assuming clean data

    Our Work
      • Identify classes of queries for which we can obtain meaningful answers from potentially dirty databases
      • Show how to do it efficiently while reusing existing database technology

    Why is this Business Intelligence?
      • Business intelligence (BI) refers to technologies, applications and practices for the collection, integration, analysis, and presentation of information.
      • The goal of BI is to support better decision making, based on information.
      • A DBMS should provide meaningful query answers even over data that is dirty

    Outline
      • Introduction
      • Semantics for dirty databases
      • Contributions
      • Conclusions

    A Data Integration Example
      Integrating customer data
      [Figure: Sales, Shipping, Customer Support, Web Forms, and Demographic Data feed an Integrated Customer Database]

    Matching and Merging
      [Figure: Web and Sales records]
      Matching and merging are two fundamental tasks in data integration

    True Disagreement Between Sources
      [Figure: Web and Sales records]
      What's Peter's salary?

    Inconsistent Integrated Databases
      In the absence of complete resolution rules
      [Figure: the Web and Sales sources each SATISFY the custid KEY; the inconsistent integrated database VIOLATES the custid KEY]

    Querying Inconsistent Databases
      Example: offering a Platinum credit card
      Query: "Get customers who make more than 100K"
      [Figure: integrated tuples labelled by source (sales, web, sales/web, sales, web)]
      Answer: Peter, Paul, Mary
      Are we sure that we want to offer a card to Peter?

    Querying Inconsistent Databases
      • Aggressive: get customers who possibly make more than 100K
          • Peter, Paul, Mary
      • Conservative: get customers who certainly make more than 100K
          • Paul, Mary

    Formal Semantics
      • Related to semantics for querying incomplete data [Imielinski, Lipski 84; Abiteboul, Duschka 98]
          • Possible world: "complete" databases
      • Consistent answers
          • Proposed by Arenas, Bertossi, and Chomicki in 1999
          • Corresponds to conservative semantics
          • Possible world: "consistent" databases

    Consistent Answers
      [Figure: the inconsistent database (tuples labelled sales, web, sales/web, sales, web) and its repairs; key: custid]

    Consistent Answers
      Consistent answers: answers obtained no matter which repair we choose
      Query q = "Get customers who make more than 100K"
      [Figure: q evaluated on each repair]
      Consistent answer = Paul, Mary

    When We Started
      • Semantics well understood
      • Problem
          • Potentially HUGE number of repairs!
      • Negative results [Chomicki et al. 02; Arenas et al. 01; Cali et al. 04]
      • Few tractability results [Arenas et al. 99; Arenas et al. 01]
      • Logic programming approaches [Bravo and Bertossi 03; Eiter et al. 03]
          • Expressive queries and constraints
          • Computationally expensive
          • Applicable only to small databases with a small number of inconsistencies

    Our Proposal: ConQuer
      [Figure: an SQL query q and the keys are input to ConQuer's rewriting algorithm, which outputs a rewritten SQL query Q*; a commercial database engine runs Q* on the inconsistent database and returns the consistent answer to q]

    Class of Rewritable Queries
      • ConQuer handles a broad class of SPJ queries with
          • Set semantics
          • Bag semantics, grouping, and aggregation
      • No restrictions on
          • Number of relations
          • Number of joins
          • Conditions or built-in predicates
          • Key-to-key joins
      • The class is "maximal"

    Why not all SPJ queries?
      • Some SPJ queries cannot be rewritten into SQL
          • Consistent query answering is coNP-complete even for some SPJ queries and key constraints
      • Maximality of ConQuer's class
          • Minimal relaxations lead to intractability
      • Restrictions only on
          • Nonkey-to-nonkey joins
          • Self joins
          • Nonkey-to-key joins that form a cycle

    Example: A Rewritable Query (TPC-H Query 10)
        SELECT c_custkey, c_name,
               sum(l_extendedprice * (1 - l_discount)) AS revenue,
               c_acctbal, n_name, c_address, c_phone, c_comment
        FROM customer, orders, lineitem, nation
        WHERE c_custkey = o_custkey
          AND l_orderkey = o_orderkey
          AND o_orderdate >= '1993-10-01'
          AND o_orderdate < date('1993-10-01') + 3 MONTHS
          AND l_returnflag = 'R'
          AND c_nationkey = n_nationkey
        GROUP BY c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
        ORDER BY revenue DESC

    Rewritings Can Get Quite Complex
      [Figure: rewriting of TPC-H Query 10]
      Can this rewriting be executed efficiently?
      1.7x overhead on a 20 GB database with 5% inconsistency

    Experimental Evaluation
      • Goals
          • Quantify the overhead of the rewritings
          • Assess the scalability of the approach
          • Determine the sensitivity of the rewritten queries to the level of inconsistency of the instance
      • Queries and databases
          • Representative decision support queries (TPC-H benchmark)
          • TPC-H databases, altered to introduce inconsistencies
      • Database parameters
          • Database size
          • Percentage of the database that is inconsistent
          • Conflicts per key value (in the inconsistent portion)

    Scalability
      [Plot: x-axis Size (GB); 5% inconsistent tuples, 2 conflicts per inconsistent key value]
      • Worst case: 5.8x overhead, selectivity 98.56%
      • Best case: 1.2x overhead, selectivity 0.001%

    Contributions: Theory
      • Formal characterization of a broad class of queries
          • For which computing consistent answers is tractable under key constraints
          • That can be rewritten into first-order/SQL
      • Query rewriting algorithms for a class of Select-Project-Join queries
          • With set semantics
          • With bag semantics, grouping, and aggregation
      • Maximality of the class of queries

    Contributions: Practice
      • Implementation of ConQuer
          • Designed to compute consistent answers efficiently
          • Multiple rewriting strategies
      • Experimental validation of efficiency and scalability
          • Representative queries from TPC-H
          • Large databases

    Uncertain Data
      Provenance information (e.g., source reputation): Web 0.3, Sales 0.7

      Web                  Sales
      custid  income       custid  income
      Peter   40K          Peter   200K
      Paul    400K         Paul    400K
      Mary    110K         Mary    130K

      Integrated Database
      custid  income  prob
      Peter   40K     0.3
      Peter   200K    0.7
      Paul    400K    1
      Mary    110K    0.3
      Mary    130K    0.7

    Publications and Demo
      • These and other contributions appear in
          • ICDT05 / JCSS06
          • SIGMOD05
          • ICDE06
          • PODS06 / TODS06
          • VLDB06
      • Demo given at VLDB05
          • http://queens.db.toronto.edu/project/conquer/demo2/

    A Virtuous Cycle
      [Figure: cycle between Data Integration and Query Answering]
      • Recognize and characterize inconsistent data
      • Use knowledge about inconsistencies to:
          • give better answers
          • suggest ways to clean the database

    Beyond the Enterprise
      • Can we apply principled models of inconsistency or uncertainty to the Web?
      • Different assumptions
          • Uncertainty in queries
          • There's never a "true" answer
      • Challenge
          • Build models based on user preferences
          • Leverage massive repositories of user behavior data

    THANK YOU
      Plug: Discovering Data Quality Rules, Fei Chiang
      Thursday 11:15am, Research Session 33
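
The consistent-answer semantics in the slides (answers that hold no matter which repair is chosen) can be illustrated with a small sketch. It uses the customer tuples from the slides and the query "Get customers who make more than 100K". Everything else is an assumption made for illustration: the table name `customer`, the function names, and the `NOT EXISTS` rewriting, which is a hand-written first-order rewriting for this one simple query and not the output of ConQuer itself. The sketch computes the answers twice, once by enumerating every repair and once with a single rewritten SQL query in SQLite, and the two should agree.

```python
import itertools
import sqlite3

# Integrated customer tuples from the slides; custid is the intended key.
ROWS = [
    ("Peter", 40_000), ("Peter", 200_000),
    ("Paul", 400_000),
    ("Mary", 110_000), ("Mary", 130_000),
]
THRESHOLD = 100_000


def repairs(rows):
    """Enumerate repairs: keep exactly one tuple per key value."""
    groups = {}
    for custid, income in rows:
        groups.setdefault(custid, []).append((custid, income))
    return [list(choice) for choice in itertools.product(*groups.values())]


def query(rows):
    """Get customers who make more than THRESHOLD."""
    return {custid for custid, income in rows if income > THRESHOLD}


def consistent_answers(rows):
    """Conservative semantics: answers returned in every repair."""
    return set.intersection(*[query(r) for r in repairs(rows)])


def possible_answers(rows):
    """Aggressive semantics: answers returned in at least one repair."""
    return set.union(*[query(r) for r in repairs(rows)])


# Hypothetical rewriting for this query only: a customer qualifies if some
# tuple satisfies the condition and no tuple with the same key violates it.
REWRITTEN_SQL = """
SELECT DISTINCT c.custid
FROM customer c
WHERE c.income > ?
  AND NOT EXISTS (
      SELECT 1 FROM customer c2
      WHERE c2.custid = c.custid AND c2.income <= ?
  )
"""


def consistent_answers_sql(rows):
    """Run the rewritten query directly on the inconsistent table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (custid TEXT, income INTEGER)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    result = {r[0] for r in conn.execute(REWRITTEN_SQL, (THRESHOLD, THRESHOLD))}
    conn.close()
    return result


if __name__ == "__main__":
    print("repairs:", len(repairs(ROWS)))
    print("possible:", sorted(possible_answers(ROWS)))
    print("consistent (brute force):", sorted(consistent_answers(ROWS)))
    print("consistent (rewritten SQL):", sorted(consistent_answers_sql(ROWS)))
```

The brute-force version grows with the product of the conflict-group sizes, which is the "potentially HUGE number of repairs" problem named in the slides. The rewritten query never materializes a repair, which is the reason for pushing the work into a standard SQL engine.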
