z.cn/wangzhan/weihu/"> 网站运营维护
  • 案例

  • 方案

  • 电商网站开发

  • 微信小程序

  • 我们

  • 联系

  • 资讯

    精准传达 • 有效沟通

    从品牌网站建设到网络营销策划,从策略到执行的一站式服务

    Oracle学习之数据仓库(二)Dimension的理解

       在数据仓库中,有事实表、维度表两个概念。

    员工经过长期磨合与沉淀,具备了协作精神,得以通过团队的力量开发出优质的产品。创新互联坚持“专注、创新、易用”的产品理念,因为“专注所以专业、创新互联网站所以易用所以简单”。公司专注于为企业提供成都网站建设、成都网站设计、微信公众号开发、电商网站开发,微信小程序定制开发,软件按需网站建设等一站式互联网企业服务。

       事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。

       维度表是维度属性的集合。是分析问题的一个窗口。是人们观察数据的特定角度,是考虑问题时的一类属性,属性的集合构成一个维。

       如图示

     

    Oracle 学习之 数据仓库(二) Dimension 的理解

    我们以sh用户下的sales表和times表来看,

    SALES为事实表

    SQL> desc sales
     Name					   Null?    Type
     ----------------------------------------- -------- ----------------------------
     PROD_ID				   NOT NULL NUMBER
     CUST_ID				   NOT NULL NUMBER
     TIME_ID				   NOT NULL DATE
     CHANNEL_ID				   NOT NULL NUMBER
     PROMO_ID				   NOT NULL NUMBER
     QUANTITY_SOLD				   NOT NULL NUMBER(10,2)
     AMOUNT_SOLD				   NOT NULL NUMBER(10,2)

    TIMES为维度表

    SQL> desc times
     Name					   Null?    Type
     ----------------------------------------- -------- ----------------------------
     TIME_ID				   NOT NULL DATE
     DAY_NAME				   NOT NULL VARCHAR2(9)
     DAY_NUMBER_IN_WEEK			   NOT NULL NUMBER(1)
     DAY_NUMBER_IN_MONTH			   NOT NULL NUMBER(2)
     CALENDAR_WEEK_NUMBER			   NOT NULL NUMBER(2)
     FISCAL_WEEK_NUMBER			   NOT NULL NUMBER(2)
     WEEK_ENDING_DAY			   NOT NULL DATE
     WEEK_ENDING_DAY_ID			   NOT NULL NUMBER
     CALENDAR_MONTH_NUMBER			   NOT NULL NUMBER(2)
     FISCAL_MONTH_NUMBER			   NOT NULL NUMBER(2)
     CALENDAR_MONTH_DESC			   NOT NULL VARCHAR2(8)
     CALENDAR_MONTH_ID			   NOT NULL NUMBER
     FISCAL_MONTH_DESC			   NOT NULL VARCHAR2(8)
     FISCAL_MONTH_ID			   NOT NULL NUMBER
     DAYS_IN_CAL_MONTH			   NOT NULL NUMBER
     DAYS_IN_FIS_MONTH			   NOT NULL NUMBER
     END_OF_CAL_MONTH			   NOT NULL DATE
     END_OF_FIS_MONTH			   NOT NULL DATE
     CALENDAR_MONTH_NAME			   NOT NULL VARCHAR2(9)
     FISCAL_MONTH_NAME			   NOT NULL VARCHAR2(9)
     CALENDAR_QUARTER_DESC			   NOT NULL CHAR(7)
     CALENDAR_QUARTER_ID			   NOT NULL NUMBER
     FISCAL_QUARTER_DESC			   NOT NULL CHAR(7)
     FISCAL_QUARTER_ID			   NOT NULL NUMBER
     DAYS_IN_CAL_QUARTER			   NOT NULL NUMBER
     DAYS_IN_FIS_QUARTER			   NOT NULL NUMBER
     END_OF_CAL_QUARTER			   NOT NULL DATE
     END_OF_FIS_QUARTER			   NOT NULL DATE
     CALENDAR_QUARTER_NUMBER		   NOT NULL NUMBER(1)
     FISCAL_QUARTER_NUMBER			   NOT NULL NUMBER(1)
     CALENDAR_YEAR				   NOT NULL NUMBER(4)
     CALENDAR_YEAR_ID			   NOT NULL NUMBER
     FISCAL_YEAR				   NOT NULL NUMBER(4)
     FISCAL_YEAR_ID 			   NOT NULL NUMBER
     DAYS_IN_CAL_YEAR			   NOT NULL NUMBER
     DAYS_IN_FIS_YEAR			   NOT NULL NUMBER
     END_OF_CAL_YEAR			   NOT NULL DATE
     END_OF_FIS_YEAR			   NOT NULL DATE

    如果我们创建一个物化视图

    create materialized view sales_month_sum 
    enable query rewrite as 
      SELECT t.calendar_month_id,
             prod_id,
             channel_id,
             promo_id,
             SUM (quantity_sold) quantity_sold,
             SUM (amount_sold) amount_sold
        FROM sales s, times t
       WHERE s.time_id = t.time_id
    GROUP BY prod_id,
             channel_id,
             promo_id,
             t.calendar_month_id;

    如果我们做如下按月的分组查询

    SQL> alter session set query_rewrite_enabled=true;
    SQL> alter session set query_rewrite_integrity=trusted;
    SQL> set autotrace traceonly
    SQL> set line 200
    SQL>   SELECT t.calendar_month_id,
             prod_id,
             channel_id,
             promo_id,
             SUM (quantity_sold) quantity_sold,
             SUM (amount_sold) amount_sold
        FROM sales s, times t
       WHERE s.time_id = t.time_id
    GROUP BY prod_id,
             channel_id,
             promo_id,
             t.calendar_month_id; 
    
    9068 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3287305789
    
    ------------------------------------------------------------------------------------------------
    | Id  | Operation		     | Name	       | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT	     |		       |  9068 |   690K|    13	 (0)| 00:00:01 |
    |   1 |  MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13	 (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------

    可见查询使用的是物化视图,但是如果我需要按年、季度对数据做分组查询呢?

      SELECT t.calendar_quarter_id,prod_id,
             channel_id,
             promo_id,
             SUM (quantity_sold) quantity_sold,
             SUM (amount_sold) amount_sold
        FROM sales s, times t
       WHERE s.time_id = t.time_id
    GROUP BY prod_id,
             channel_id,
             promo_id,
             t.calendar_quarter_id;

    这个查看肯定是不能使用物化视图的,执行计划如下

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3221963832
    
    ---------------------------------------------------------------------------------------------------------
    | Id  | Operation		      | Name	| Rows	| Bytes | Cost (%CPU)| Time	| Pstart| Pstop |
    ---------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT	      | 	|  2037 | 79443 |   569   (6)| 00:00:07 |	|	|
    |   1 |  HASH GROUP BY		      | 	|  2037 | 79443 |   569   (6)| 00:00:07 |	|	|
    |*  2 |   HASH JOIN		      | 	|   918K|    34M|   546   (2)| 00:00:07 |	|	|
    |   3 |    PART JOIN FILTER CREATE    | :BF0000 |  1826 | 21912 |    18   (0)| 00:00:01 |	|	|
    |   4 |     TABLE ACCESS FULL	      | TIMES	|  1826 | 21912 |    18   (0)| 00:00:01 |	|	|
    |   5 |    PARTITION RANGE JOIN-FILTER| 	|   918K|    23M|   525   (2)| 00:00:07 |:BF0000|:BF0000|
    |   6 |     TABLE ACCESS FULL	      | SALES	|   918K|    23M|   525   (2)| 00:00:07 |:BF0000|:BF0000|
    ---------------------------------------------------------------------------------------------------------

    Oracle为了是查询重写更加的智能,引入了Dimension的概念。Dimension我们称之为维,它是基于维度表的,用来描述维度表的维度之间的层级关系。

    CREATE DIMENSION SH.TIMES_DIM
      LEVEL DAY                            IS 
        (SH.TIMES.TIME_ID)
      LEVEL MONTH                          IS 
        (SH.TIMES.CALENDAR_MONTH_ID)
      LEVEL QUARTER                        IS 
        (SH.TIMES.CALENDAR_QUARTER_ID)
      LEVEL YEAR                           IS 
        (SH.TIMES.CALENDAR_YEAR_ID)
      HIERARCHY CAL_ROLLUP
        (DAY                               CHILD OF
         MONTH                             CHILD OF
         QUARTER                           CHILD OF
         YEAR);

    LEVEL定义等级,基于维度表,HIERARCHY关键字定义层级关系。由层级关系,我们知道quarter是由month组成的。

    我们再次查询

    SQL> SELECT t.calendar_quarter_id,prod_id,
             channel_id,
             promo_id,
             SUM (quantity_sold) quantity_sold,
             SUM (amount_sold) amount_sold
        FROM sales s, times t
       WHERE s.time_id = t.time_id
    GROUP BY prod_id,
             channel_id, 
             promo_id,
             t.calendar_quarter_id;
    
    3375 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3397140165
    
    --------------------------------------------------------------------------------------------------
    | Id  | Operation		       | Name		 | Rows  | Bytes | Cost (%CPU)| Time	 |
    --------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT	       |		 |    20 |  1720 |    36  (14)| 00:00:01 |
    |   1 |  HASH GROUP BY		       |		 |    20 |  1720 |    36  (14)| 00:00:01 |
    |*  2 |   HASH JOIN		       |		 |   128K|    10M|    33   (7)| 00:00:01 |
    |   3 |    VIEW 		       |		 |   849 |  6792 |    19   (6)| 00:00:01 |
    |   4 |     HASH UNIQUE 	       |		 |   849 |  6792 |    19   (6)| 00:00:01 |
    |   5 |      TABLE ACCESS FULL	       | TIMES		 |  1826 | 14608 |    18   (0)| 00:00:01 |
    |   6 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |
    --------------------------------------------------------------------------------------------------

    这次是使用物化视图与times表做关联,性能更高了。

    我们对比如下两个查询

    SQL>   SELECT t.calendar_quarter_id,
             prod_id,
             channel_id,
             promo_id,
             SUM (quantity_sold) quantity_sold,
             SUM (amount_sold) amount_sold
        FROM sales s, times t
       WHERE s.time_id = t.time_id AND t.calendar_quarter_id = 1769
    GROUP BY prod_id,
             channel_id,
             promo_id,
             t.calendar_quarter_id; 
    
    168 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3397140165
    
    --------------------------------------------------------------------------------------------------
    | Id  | Operation		       | Name		 | Rows  | Bytes | Cost (%CPU)| Time	 |
    --------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT	       |		 |     1 |    86 |    33   (7)| 00:00:01 |
    |   1 |  HASH GROUP BY		       |		 |     1 |    86 |    33   (7)| 00:00:01 |
    |*  2 |   HASH JOIN		       |		 |  6423 |   539K|    32   (4)| 00:00:01 |
    |   3 |    VIEW 		       |		 |    34 |   272 |    19   (6)| 00:00:01 |
    |   4 |     HASH UNIQUE 	       |		 |    34 |   272 |    19   (6)| 00:00:01 |
    |*  5 |      TABLE ACCESS FULL	       | TIMES		 |    90 |   720 |    18   (0)| 00:00:01 |
    |   6 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |
    --------------------------------------------------------------------------------------------------

    使用了物化视图

    SQL>SELECT t.calendar_quarter_id,
             prod_id,
             channel_id,
             promo_id,
             SUM (quantity_sold) quantity_sold,
             SUM (amount_sold) amount_sold
        FROM sales s, times t
       WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'
    GROUP BY prod_id,
             channel_id,
             promo_id,
             t.calendar_quarter_id;
    
    168 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3221963832
    
    -----------------------------------------------------------------------------------------------------------------
    | Id  | Operation		      | Name	| Rows	| Bytes |TempSpc| Cost (%CPU)| Time	| Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT	      | 	|  8146 |   373K|	|   848   (2)| 00:00:11 |	|	|
    |   1 |  HASH GROUP BY		      | 	|  8146 |   373K|  3632K|   848   (2)| 00:00:11 |	|	|
    |*  2 |   HASH JOIN		      | 	| 57459 |  2637K|	|   546   (2)| 00:00:07 |	|	|
    |   3 |    PART JOIN FILTER CREATE    | :BF0000 |    91 |  1820 |	|    18   (0)| 00:00:01 |	|	|
    |*  4 |     TABLE ACCESS FULL	      | TIMES	|    91 |  1820 |	|    18   (0)| 00:00:01 |	|	|
    |   5 |    PARTITION RANGE JOIN-FILTER| 	|   918K|    23M|	|   525   (2)| 00:00:07 |:BF0000|:BF0000|
    |   6 |     TABLE ACCESS FULL	      | SALES	|   918K|    23M|	|   525   (2)| 00:00:07 |:BF0000|:BF0000|
    -----------------------------------------------------------------------------------------------------------------

    没有使用物化视图。

    其实条件实质上是一样的,因为t.calendar_quarter_desc = '1998-01' 和t.calendar_quarter_id = 1769 在times表中表示相同的数据。

    但是Oracle不知道CALENDAR_QUARTER_DESC与CALENDAR_QUARTER_ID的关系。

    我们在创建Dimension时,可以为LEVEL指定属性值。

    如下

    CREATE DIMENSION SH.TIMES_DIM
      LEVEL DAY                            IS 
        (SH.TIMES.TIME_ID)
      LEVEL MONTH                          IS 
        (SH.TIMES.CALENDAR_MONTH_ID)
      LEVEL QUARTER                        IS 
        (SH.TIMES.CALENDAR_QUARTER_ID)
      LEVEL YEAR                           IS 
        (SH.TIMES.CALENDAR_YEAR_ID)
      HIERARCHY CAL_ROLLUP
        (DAY                               CHILD OF
         MONTH                             CHILD OF
         QUARTER                           CHILD OF
         YEAR)
      ATTRIBUTE QUARTER DETERMINES 
        (SH.TIMES.CALENDAR_QUARTER_DESC,
         SH.TIMES.DAYS_IN_CAL_QUARTER,
         SH.TIMES.END_OF_CAL_QUARTER,
         SH.TIMES.CALENDAR_QUARTER_NUMBER)
      ATTRIBUTE YEAR DETERMINES 
        (SH.TIMES.CALENDAR_YEAR,
         SH.TIMES.DAYS_IN_CAL_YEAR,
         SH.TIMES.END_OF_CAL_YEAR);

    我们再次查询

    SQL> SELECT t.calendar_quarter_id,
             prod_id,
             channel_id,
             promo_id,
             SUM (quantity_sold) quantity_sold,
             SUM (amount_sold) amount_sold
        FROM sales s, times t
       WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'
    GROUP BY prod_id,
             channel_id,
             promo_id,
             t.calendar_quarter_id;  
    
    168 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3290467316
    
    --------------------------------------------------------------------------------------------------
    | Id  | Operation		       | Name		 | Rows  | Bytes | Cost (%CPU)| Time	 |
    --------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT	       |		 |    20 |  2240 |    33   (7)| 00:00:01 |
    |   1 |  HASH GROUP BY		       |		 |    20 |  2240 |    33   (7)| 00:00:01 |
    |*  2 |   HASH JOIN		       |		 | 17191 |  1880K|    32   (4)| 00:00:01 |
    |   3 |    VIEW 		       | VW_GBF_5	 |    91 |  3094 |    19   (6)| 00:00:01 |
    |   4 |     HASH GROUP BY	       |		 |    91 |   728 |    19   (6)| 00:00:01 |
    |   5 |      VIEW		       |		 |    91 |   728 |    19   (6)| 00:00:01 |
    |   6 |       HASH UNIQUE	       |		 |    91 |  1456 |    19   (6)| 00:00:01 |
    |*  7 |        TABLE ACCESS FULL       | TIMES		 |    91 |  1456 |    18   (0)| 00:00:01 |
    |   8 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |
    --------------------------------------------------------------------------------------------------

    这次就使用了物化视图。


    新闻标题:Oracle学习之数据仓库(二)Dimension的理解
    文章路径:http://cdkjz.cn/article/ppigco.html
    多年建站经验

    多一份参考,总有益处

    联系快上网,免费获得专属《策划方案》及报价

    咨询相关问题或预约面谈,可以通过以下方式与我们联系

    大客户专线   成都:13518219792   座机:028-86922220