Title
Create new category
Edit page index title
Edit category
Edit link
Ingestion Optimization using Partitioning
Concept
- Impacts on query performance by organizing tables into partitions based on values of specific columns
- Impacts merge job performance, supports all data types
For date data types, user can extract date parts
(example: Year, Month, Day)

How it works
- Creates a folder for each primary partition value on HDFS
- Records if the table will be partitioned/moved to a specific folder according to their partition value
- Number of partitions created = number of distinct values of primary partition column
Best practices
- Select a column which is mostly used in business queries
- Ensure the incremental delta is distributed across fewer partitions. This helps in merge performance.
Skewed Partitions Example

Uniform Distribution of Data Example

- Creates partitions HR, Sales, Engineering
- Queries on Hive Table with Department as filtering condition will be faster
- Also in incremental delta merge, 3 parallel merge jobs will be executed, which helps in faster merging
Objective
In this session,
- Configure IW table to add PARTITION column and batch id
- Perform incremental ingestions with BAD PARTITION column and GOOD PARTITION column
Prerequisites
- A test Oracle schema called PARTITIONING_DEMO is present, which can be a source schema for the source to test partitioning
- An Oracle table called PARTITION_TEST is already created in the Oracle schema PARTITIONING_DEMO using the following table definition from partition_test.sql
xxxxxxxxxxCreate table Partition_test (Batch_id Number(5,0),Part_id Number(10,0) Primary Key,Part_name Varchar2(15) NOT NULL,Part_date Date,City Varchar2(15),Zipcode Number(5,0));- An oracle source called PARTITIONING_DEMO_username, with the above schema as a source should exist in the Infoworks
- Schema crawl is performed successfully for the IW Oracle source
- The source data zip file must be downloaded from the following link directly or using wget if needed
xxxxxxxxxxwget https://s3.amazonaws.com/iw-training-data/partitioning_demo_data/partitioning_demo_data.zipConfigure IW table to add PARTITION column
- Now load the oracle data with 300000 records using the following PL/SQL code : partition_test_initial_load.sql
xxxxxxxxxx// Click to edit codeThe above PL/SQL populates the PARTITION_TEST table with 300,000 random records uniformly distributed across 20 cities and Part_date values falling in between 2015-12-01 and 2015-12-02
- After the Oracle source data is loaded with 300000 records, we now configure the IW table PARTITION _TEST as below, with
Ingest type: Batch-ID-based Incremental Load
Natural keys: PART_ID
Batch-ID column: BATCH_ID (NUMBER)

- Now that source table is populated we will perform Initialize And Ingest Now on IW table PARTITION_TEST configured by creating table group called PARTITION TEST_DATA_GROUP
- Once the initial full load is done successfully, we will configure the IW table PARTITION_TEST by adding a PARTITION column
- In the Source Configuration, click on Configure button for the table PARTITION_TEST

- Check Natural Key as PART_ID and scroll down in the Table Configuration to the section Target Configuration
- Check the Partition hive table option

- Once Partition hive table option is checked, a drop down called Partition Hive Table on appears, from which we can select the PARTITION column
- And change the Number of Secondary Partitions to 1
- Click on Save Configuration, after selecting the required PARTITION column

- Click Yes, to confirm the configuration changes
- Click Yes, Reorganize Data Now to start reorganizing the data with the select PARTITION column
Perform Incremental Ingestions with BAD PARTITION column and GOOD PARTITION column
- In this section, first, we will perform the incremental ingestion with BAD PARTITION column which is CITY (VARCHAR2) column, as from the stored procedure we can see that the more values are randomized making the distribution non-uniform
- After the normal full load is done using the Initialize and Ingest Now button in table group, configure the table to add the partition key CITY, as advised in the previous section
- Now that table is configured with partition column, PART_DATE, we will add the incremental data of 100000 records in source Oracle table by executing the below-stored procedure: partition test_incremental_load.sql
insert into partition_test values(2,300001,’Charlie’,TO_DATE(‘2015-12-01′,’yyyy-mm-dd’),’Dallas’,20000);beginfor i in 1 .. 99999loopinsert into partition_test values ( 3,(select max(PART_ID)+1 from partition_test),’Henry’,TO_DATE( TRUNC( DBMS_RANDOM.VALUE(TO_CHAR(DATE ‘2015-12-01′,’J’),TO_CHAR(DATE ‘2015-12-02′,’J’))),’J’),(SELECTCASE round(dbms_random.value(1,2))WHEN 1 THEN ‘LOUISVILLE’WHEN 2 THEN ‘KENTUCKY’END FROM dual),(SELECT round(DBMS_RANDOM.VALUE(10000,10005)) from dual));end loop;UPDATE partition_test SET PART_NAME = ‘TEST’ where PART_ID >= 0 and PART_ID < 100001;commit;end;- From the stored procedure we can see that the incremental data added is uniformly distributed across 2 CITY values and the PART_DATE values are randomly generated between 2015-12-01 and 2015-12-012
- Once 100000 records are added to the Oracle source table PARTITION_TEST, click on Ingest Now button in the table group to start the incremental ingestion with CITY as PARTITION column
- For this incremental ingestion job with PARTITION key as CITY, it took 6 minutes and 12 seconds

- Repeat steps in the section Configure IW table to add PARTITION column and add CITY as the PARTITION column in the final step.
Truncate and load Oracle source data with the initial load of 30000 records from the PL/SQL code: partition_test_initial_load.sql
Perform Initialize and Ingest Now again which ingests the initial load of 300000 records
Configure the table with PART_DATE as partition column and Reorganize the data
- Now that, initial load is done and the table is configured with good partition key PART_DATE, load the incremental data again in Oracle source use the PL/SQL code stored procedure: partition_test_incremental_load.sql
- Perform incremental ingestion by clicking on the Ingest Now button in Table group window
- For this incremental ingestion job with PARTITION key as PART_DATE, it took 2 minutes and 17 seconds

Congratulations! You have incrementally ingested PARTITION_TEST tables with two different Partition columns PART_DATE & CITY and from the ingest job times, confirmed PART_DATE as the good column to partition on. You can now identify a good partition column and configure the table to add the partition.
Session Summary
- Configured the table to add the Partition column
- Performed incremental ingestions with two different partition columns
- Identified the good and bad partition columns
For more details, refer to our Knowledge Base and Best Practices!
For help, contact our support team!
(C) 2015-2022 Infoworks.io, Inc. and Confidential