Best Practices
General Guidelines on Data Pipelines
This article explains the general guidelines to follow in Data Transformation.
Denormalization
- Store target tables in denormalized form whenever possible for faster querying.
- Data storage generally costs less than query processing, so the extra space taken by denormalized tables is usually a worthwhile trade.
- Maintaining denormalized data avoids the multiple I/O operations needed to join tables on foreign keys, which improves query performance.
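The trade-off above can be illustrated with a minimal sketch. It assumes a hypothetical `customers`/`orders` schema and uses Python's built-in SQLite purely as a stand-in for the actual data lake engine; it shows the shape of the approach, not its performance characteristics.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the warehouse engine here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);

    -- Pay the join cost once, when the target table is built.
    CREATE TABLE orders_denorm AS
    SELECT o.order_id, o.customer_id, c.region, o.amount
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
""")

# Normalized layout: every query repeats the foreign-key join.
joined = conn.execute("""
    SELECT c.region, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region ORDER BY c.region
""").fetchall()

# Denormalized layout: the same answer from a single-table scan.
flat = conn.execute("""
    SELECT region, SUM(amount) FROM orders_denorm
    GROUP BY region ORDER BY region
""").fetchall()
```

Both queries return the same aggregates, but the second one reads a single table and performs no join at query time.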
Execution Engine Performance Tuning
- Execution engine settings significantly affect query performance. Configure the execution engine according to the cluster capacity and the type of queries to be executed.
- For example, the Hive configuration variable hive.mapjoin.smalltable.filesize defaults to 25 MB (25000000 bytes). This parameter sets the size threshold below which a table is considered small enough to fit in memory for a map join. Set it according to the resources available on the cluster.
- Similarly, use the other optimizations supported by the execution engine for best query performance. For example, Hive supports the sort-merge-bucket (SMB) map join optimization for joining bucketed and sorted tables. Set the relevant Hive configurations before executing queries that involve such joins.
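As a rough sketch of the settings mentioned above, the helper below builds the Hive `SET` statements to run before a join-heavy query. The function name `map_join_settings` and the 64 MB threshold are illustrative assumptions, not recommended values; the Hive property names are standard Hive configuration variables, but the right values depend on your cluster.

```python
from typing import List


def map_join_settings(small_table_bytes: int, bucketed_sorted: bool = False) -> List[str]:
    """Build Hive SET statements for map-join tuning (hypothetical helper)."""
    if small_table_bytes <= 0:
        raise ValueError("small_table_bytes must be positive")
    settings = [
        # Let Hive convert eligible joins to map joins automatically.
        "SET hive.auto.convert.join=true;",
        # Size threshold (in bytes) for treating a table as the small side.
        f"SET hive.mapjoin.smalltable.filesize={small_table_bytes};",
    ]
    if bucketed_sorted:
        # SMB map join settings for tables bucketed and sorted on the join key.
        settings += [
            "SET hive.auto.convert.sortmerge.join=true;",
            "SET hive.optimize.bucketmapjoin=true;",
            "SET hive.optimize.bucketmapjoin.sortedmerge=true;",
        ]
    return settings


# Example: assume 64 MB is a safe in-memory size on this (hypothetical) cluster.
statements = map_join_settings(64 * 1024 * 1024, bucketed_sorted=True)
```

The resulting statements can be issued at the start of the Hive session, before the queries they are meant to tune.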
Table Statistics
- Gathering statistics in advance for all tables used in performance-critical or high-volume join queries enables significant cost-based join optimizations.
- In most cases, statistics are computed for data transformation targets during pipeline build. However, if queries join with Infoworks sources or external data lake tables, precompute statistics for these tables.
- Column statistics are expensive to calculate and provide little benefit when querying denormalized tables.
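Precomputing statistics for source or external tables could look like the sketch below, which generates Hive `ANALYZE TABLE` statements. The helper name `analyze_statements` and the table names are hypothetical; column statistics are off by default here, in line with the guidance above. Partitioned tables may additionally need a `PARTITION` clause, which this sketch does not handle.

```python
from typing import Iterable, List


def analyze_statements(tables: Iterable[str], with_columns: bool = False) -> List[str]:
    """Build Hive ANALYZE TABLE statements (hypothetical helper)."""
    # FOR COLUMNS triggers the more expensive column-level statistics.
    suffix = " FOR COLUMNS" if with_columns else ""
    return [f"ANALYZE TABLE {table} COMPUTE STATISTICS{suffix};" for table in tables]


# Hypothetical source and external tables joined by the pipeline.
stats_sql = analyze_statements(["sales_db.src_customers", "lake.ext_products"])
```

Run the generated statements before the pipeline build so that the cost-based optimizer has table-level statistics available for its join planning.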
(C) 2015-2022 Infoworks.io, Inc. and Confidential