第 设计立方体计算.pptx
《第 设计立方体计算.pptx》由会员分享,可在线阅读,更多相关《第 设计立方体计算.pptx(88页珍藏版)》请在淘文阁 - 分享文档赚钱的网站上搜索。
1、2023/2/171第四章第四章:数据立方体计算与数据泛化数据立方体计算与数据泛化数据立方体计算的有效方法数据立方体和OLAP技术的进一步发展面向属性的规约-另一种数据泛化和概念描述方法第2页/共88页第1页/共88页2023/2/172数据立方体计算的有效方法数据立方体计算的有效方法Preliminary cube computation tricks(Agarwal et al.96)Computing full/iceberg cubes:3 methodologies Top-Down:多路数组聚集(Zhao,Deshpande&Naughton,SIGMOD97)Bottom-Up:
2、Bottom-up computation:BUC(Beyer&Ramarkrishnan,SIGMOD99)H-cubing technique(Han,Pei,Dong&Wang:SIGMOD01)Integrating Top-Down and Bottom-Up:Star-cubing algorithm(Xin,Han,Li&Wah:VLDB03)High-dimensional OLAP:A Minimal Cubing Approach(Li,et al.VLDB04)Computing alternative kinds of cubes:Partial cube,closed
3、 cube,approximate cube,etc.第3页/共88页第2页/共88页2023/2/173Preliminary Tricks(Agarwal et al.VLDB96)排序、散列和分组 operations are applied to the dimension attributes in order to reorder and cluster related tuplesAggregates may be computed from previously computed aggregates,rather than from the base fact tableSm
4、allest-child:computing a cuboid from the smallest,previously computed cuboidCache-results:caching results of a cuboid from which other cuboids are computed to reduce disk I/OsAmortize-scans:computing as many as possible cuboids at the same time to amortize disk readsShare-sorts:sharing sorting costs
5、 cross multiple cuboids when sort-based method is usedShare-partitions:sharing the partitioning cost across multiple cuboids when hash-based algorithms are used第4页/共88页第3页/共88页2023/2/174Multi-Way Array AggregationMulti-Way Array AggregationArray-based“bottom-up”algorithmUsing multi-dimensional chunk
6、sNo direct tuple comparisonsSimultaneous aggregation on multiple dimensionsIntermediate aggregate values are re-used for computing ancestor cuboidsCannot do Apriori pruning:No iceberg optimization第5页/共88页第4页/共88页2023/2/175Multi-way Array Aggregation for Cube Computation(MOLAP)Partition arrays into c
7、hunks(a small subcube which fits in memory).Compressed sparse array addressing:(chunk_id,offset)Compute aggregates in“multiway”by visiting cube cells in the order which minimizes the#of times to visit each cell,and reduces memory access and storage cost.What is the best traversing order to do multi-
8、way aggregation?AB29303132123459131415166463626148474645a1a0c3c2c1c 0b3b2b1b0a2a3CB442856402452362060第6页/共88页第5页/共88页2023/2/176Multi-way Array Aggregation for Cube ComputationAB29303132123459131415166463626148474645a1a0c3c2c1c 0b3b2b1b0a2a3C442856402452362060B第7页/共88页第6页/共88页2023/2/177Multi-way Arra
9、y Aggregation for Cube ComputationAB29303132123459131415166463626148474645a1a0c3c2c1c 0b3b2b1b0a2a3C442856402452362060B第8页/共88页第7页/共88页2023/2/178Multi-Way Array Aggregation for Cube Computation(Cont.)Method:the planes should be sorted and computed according to their size in ascending orderIdea:keep
10、the smallest plane in the main memory,fetch and compute only one chunk at a time for the largest planeLimitation of the method:computing well only for a small number of dimensionsIf there are a large number of dimensions,“top-down”computation and iceberg cube computation methods can be explored第9页/共
11、88页第8页/共88页2023/2/179Bottom-Up Computation(BUC)BUC(Beyer&Ramakrishnan,SIGMOD99)Bottom-up cube computation(Note:top-down in our view!)Divides dimensions into partitions and facilitates iceberg pruningIf a partition does not satisfy min_sup,its descendants can be prunedIf minsup=1 compute full CUBE!No
12、 simultaneous aggregation第10页/共88页第9页/共88页2023/2/1710BUC:PartitioningUsually,entire data set cant fit in main memorySort distinct values,partition into blocks that fitContinue processingOptimizationsPartitioningExternal Sorting,Hashing,Counting SortOrdering dimensions to encourage pruningCardinality
13、,Skew,CorrelationCollapsing duplicatesCant do holistic aggregates anymore!第11页/共88页第10页/共88页2023/2/1711H-Cubing:Using H-Tree StructureH-Cubing:Using H-Tree StructureBottom-up computationExploring an H-tree structureIf the current computation of an H-tree cannot pass min_sup,do not proceed further(pr
14、uning)No simultaneous aggregation第12页/共88页第11页/共88页2023/2/1712H-tree:A Prefix Hyper-treeMonthCityCust_grpProdCostPriceJanTorEduPrinter500485JanTorHhdTV8001200JanTorEduCamera11601280FebMonBusLaptop15002500MarVanEduHD540520rooteduhhdbusJanMarJanFebTorVanTorMonQ.I.Q.I.Q.I.Quant-InfoSum:1765Cnt:2binsAtt
15、r.Val.Quant-InfoSide-linkEduSum:2285 HhdBusJanFebTorVanMonHeadertable第13页/共88页第12页/共88页2023/2/1713H-Cubing:Computing Cells Involving Dimension CityrootEdu.Hhd.Bus.Jan.Mar.Jan.Feb.Tor.Van.Tor.Mon.Q.I.Q.I.Q.I.Quant-InfoSum:1765Cnt:2binsAttr.Val.Quant-InfoSide-linkEduSum:2285 HhdBusJanFebTorTorVanMonAt
16、tr.Val.Q.I.Side-linkEduHhdBusJanFebHeaderTableHTorFrom(*,*,Tor)to(*,Jan,Tor)第14页/共88页第13页/共88页2023/2/1714Computing Cells Involving Month But No CityrootEdu.Hhd.Bus.Jan.Mar.Jan.Feb.Tor.Van.Tor.Mont.Q.I.Q.I.Q.I.Attr.Val.Quant-InfoSide-linkEdu.Sum:2285 Hhd.Bus.Jan.Feb.Mar.Tor.Van.Mont.1.Roll up quant-i
17、nfo2.Compute cells involving month but no cityQ.I.Top-k OK mark:if Q.I.in a child passes top-k avg threshold,so does its parents.No binning is needed!第15页/共88页第14页/共88页2023/2/1715Computing Cells Involving Only Cust_grprooteduhhdbusJanMarJanFebTorVanTorMonQ.I.Q.I.Q.I.Attr.Val.Quant-InfoSide-linkEduSu
18、m:2285 HhdBusJanFebMarTorVanMonCheck header table directlyQ.I.第16页/共88页第15页/共88页2023/2/1716Star-Cubing:An Integrating MethodStar-Cubing:An Integrating MethodIntegrate the top-down and bottom-up methodsExplore shared dimensionsE.g.,dimension A is the shared dimension of ACD and ADABD/AB means cuboid
19、ABD has shared dimensions ABAllows for shared computationse.g.,cuboid AB is computed simultaneously as ABDAggregate in a top-down manner but with the bottom-up sub-layer underneath which will allow Apriori pruningShared dimensions grow in bottom-up fashion第17页/共88页第16页/共88页2023/2/1717Iceberg Pruning
20、 in Shared DimensionsAnti-monotonic property of shared dimensionsIf the measure is anti-monotonic,and if the aggregate value on a shared dimension does not satisfy the iceberg condition,then all the cells extended from this shared dimension cannot satisfy the condition eitherIntuition:if we can comp
21、ute the shared dimensions before the actual cuboid,we can use them to do Apriori pruningProblem:how to prune while still aggregate simultaneously on multiple dimensions?第18页/共88页第17页/共88页2023/2/1718Cell TreesCell TreesUse a tree structure similar to H-tree to represent cuboidsCollapses common prefix
22、es to save memoryKeep count at nodeTraverse the tree to retrieve a particular tuple第19页/共88页第18页/共88页2023/2/1719Star Attributes and Star NodesStar Attributes and Star NodesIntuition:If a single-dimensional aggregate on an attribute value p does not satisfy the iceberg condition,it is useless to dist
23、inguish them during the iceberg computationE.g.,b2,b3,b4,c1,c2,c4,d1,d2,d3 Solution:Replace such attributes by a*.Such attributes are star attributes,and the corresponding nodes in the cell tree are star nodesABCDCounta1b1c1d11a1b1c4d31a1b2c2d2 1a2b3c3d41a2b4c3d41第20页/共88页第19页/共88页2023/2/1720Example
24、:Star ReductionExample:Star ReductionSuppose minsup=2Perform one-dimensional aggregation.Replace attribute values whose count 2 with*.And collapse all*s togetherResulting table has all such attributes replaced with the star-attributeWith regards to the iceberg computation,this new table is a loseles
25、s compression of the original tableABCDCounta1b1*2a1*1a2*c3d42ABCDCounta1b1*1a1b1*1a1*1a2*c3d41a2*c3d41第21页/共88页第20页/共88页2023/2/1721Star TreeStar TreeGiven the new compressed table,it is possible to construct the corresponding cell treecalled star treeKeep a star table at the side for easy lookup of
- 配套讲稿:
如PPT文件的首页显示word图标,表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
- 特殊限制:
部分文档作品中含有的国旗、国徽等图片,仅作为作品整体效果示例展示,禁止商用。设计者仅对作品中独创性部分享有著作权。
- 关 键 词:
- 设计立方体计算 设计 立方体 计算
限制150内