Introduction

Infoworks DataFoundry

Enterprise Data Operations + Orchestration

The Infoworks Enterprise Data Operations and Orchestration platform automates the development and operationalization of data pipelines from source to consumption in support of business intelligence (BI), machine learning (ML), and artificial intelligence (AI) analytics applications.

The Challenge

Leverage the Strategic Value of Your Company’s Data

Businesses want to redefine their operations and customer experience to be competitive in an increasingly digital world. Winning in this environment is highly dependent on the ability of an organization to harness data effectively.

However, businesses struggle to keep up with the demand for new analytics use cases in support of their changing business models. The reality is that over 80% of big data projects fail to deploy to production because project implementation is a complex, resource-intensive effort that takes months or even years. The technology is complicated, and the people who have the necessary skills are either extremely expensive or difficult to find. In addition, the market is evolving rapidly with constantly changing technologies, while organizations simultaneously move from on-premises to cloud, multi-cloud, or hybrid implementations, very often orchestrating their data and data pipelines across multiple environments.

The Solution

DataFoundry for Databricks

Enterprise Data Operations and Orchestration (EDO2) refers to the systems and processes that enable businesses to organize and manage data from disparate sources and process the data for delivery to analytic applications. EDO2 systems aim at shorter development cycles, increased deployment frequency, and more dependable releases of data pipelines in close alignment with business objectives.

Infoworks DataFoundry is the only EDO2 software system that automates the development and operationalization of data pipelines from source to consumption in support of business intelligence (BI), machine learning (ML) and artificial intelligence (AI) analytics applications. Historically, data integration platforms have provided point solutions for each step in the development and management of data pipelines and workflows. In contrast, DataFoundry integrates these modules into a fully unified system running on the Databricks managed Spark platform that provides a holistic and agile environment for delivering data, data pipelines and data workflows that scale elastically as your needs fluctuate.

DataFoundry for Databricks lays the foundation for the digital transformation of business, with a complete solution that provides:

• Agility: The fastest and most automated path to launch analytics use cases running on Databricks at scale.

• Flexibility: The only system that enables businesses to manage and orchestrate enterprise data operations in one venue and freely choose the best place to run specific applications without recoding.

• End-to-end services: The only integrated system to manage data operations and orchestration from data sources through to consumption by analytics applications.

• Extensibility: Architected to adapt to new business requirements and technologies.

Infoworks in Action

Infoworks DataFoundry has been deployed in production by large enterprises to run business-critical applications. Infoworks customers have implemented complex, large-scale analytics use cases in days instead of months with minimal resources. Some examples of these successes are:

Fortune 10 Retailer: Advanced Data Application

Implemented a near-real-time machine learning business process in 19 days:

• Synchronized business process data from Teradata every 10 minutes

• Achieved a data availability SLA of 15 minutes

• Implemented by 2 engineers in 19 days from requirements to production

Leading CPG Company: Self-Service BI and Analytics

Reduced development cycle from 6 months to 1 week:

• 7 data sources, 3 years of production data

• 8 pipelines with all transformation logic

• 8 optimized data models and 3 cubes

• 13 reports and dashboards

Infoworks Platform


DataFoundry Capabilities

Infoworks DataFoundry for Databricks provides a complete solution that automates end-to-end data workflows from source to consumption, as well as the ongoing operational management of those workflows. DataFoundry is a platform-independent EDO2 system that delivers the following automated capabilities:

Data Ingestion and Synchronization

Data Source Crawling and Ingestion

Automatically crawls data sources, ranging from flat files, XML, and JSON to relational databases such as Teradata, Oracle, and SQL Server.

Just as Google crawls the web to collect web data, Infoworks DataFoundry crawls data sources and ingests source data in a high-performance parallel process, while automatically preserving data precision.

Metadata Synchronization

Learns the metadata and infers data relationships for the data ingested from external data sources as well as data sets created using Infoworks. It also tracks end-to-end data lineage so that users can trace data elements back to the original source systems and perform downstream impact analysis.
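The two lineage operations described above (tracing an element back to its original sources, and finding everything downstream of it) can be pictured as walks over a directed graph. The following is a minimal sketch of that idea only, not the DataFoundry implementation; the `LineageGraph` class and the dataset and column names are hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage store: an edge source -> target means
    'target is derived from source'."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, source, target):
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, neighbors):
        # Breadth-first traversal; 'seen' also protects against cycles.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def origins(self, element):
        """Upstream elements that have no parents: the original sources."""
        ancestors = self._walk(element, self._upstream)
        return {a for a in ancestors if not self._upstream.get(a)}

    def impact(self, element):
        """Every element that directly or indirectly depends on 'element'."""
        return self._walk(element, self._downstream)


# Illustrative (made-up) lineage for a revenue report.
g = LineageGraph()
g.add_edge("teradata.orders.amount", "staging.orders.amount")
g.add_edge("staging.orders.amount", "mart.daily_revenue.total")
g.add_edge("oracle.fx.rate", "mart.daily_revenue.total")
g.add_edge("mart.daily_revenue.total", "report.revenue_dashboard")

source_columns = g.origins("mart.daily_revenue.total")
affected = g.impact("staging.orders.amount")
```

Here `source_columns` holds the two source-system columns that feed the revenue total, and `affected` holds the mart column and the dashboard that would be touched by a change to the staging column.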

Data Synchronization

Continuously synchronizes source data from enterprise databases, data warehouses, and file sources. Changing data is captured from the source systems using log-based and query-based methods. The changed data is merged with the base data in a high-performance continuous merge process.

• Automatically handles slowly changing data and schema changes, and creates current and historical tables.

• Supports export to other enterprise operational and data warehouse systems.

• Supports streaming, batch, and incremental modes of data synchronization and export.
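To make the merge step concrete, the sketch below applies a batch of captured changes to a base table and maintains both a current table and a historical table, in the style of a type 2 slowly changing dimension. It is an illustration under stated assumptions, not the product's actual merge process: the `Change` record, the `merge_changes` function, the `valid_from`/`valid_to` fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical captured change record."""
    key: str
    op: str                 # "upsert" or "delete"
    values: Optional[dict]  # new row values; None for deletes
    ts: int                 # logical change timestamp


def merge_changes(current, history, changes):
    """Apply changes in timestamp order.

    Returns new (current, history) structures; the inputs are not mutated.
    Each superseded row version is closed out and moved to history.
    """
    current = dict(current)
    history = list(history)
    for ch in sorted(changes, key=lambda c: c.ts):
        old = current.get(ch.key)
        if old is not None:
            # Close the previous version of this row.
            history.append({
                "key": ch.key,
                "values": old["values"],
                "valid_from": old["valid_from"],
                "valid_to": ch.ts,
            })
        if ch.op == "delete":
            current.pop(ch.key, None)
        else:
            current[ch.key] = {"values": ch.values, "valid_from": ch.ts}
    return current, history


# Illustrative base data and change batch (made up for this example).
base = {"c1": {"values": {"city": "Austin"}, "valid_from": 1}}
changes = [
    Change("c1", "upsert", {"city": "Boston"}, ts=5),
    Change("c2", "upsert", {"city": "Denver"}, ts=3),
    Change("c2", "delete", None, ts=7),
]
current, history = merge_changes(base, [], changes)
```

After the merge, `current` contains only the latest version of `c1`, while `history` records the superseded `c1` row and the full lifetime of the deleted `c2` row. Sorting by timestamp before applying keeps the result independent of the order in which changes arrive within a batch.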

Data Transformation and Pipeline Design

Provides self-service data preparation using an interactive, drag-and-drop data transformation capability with support for SQL-based and other transformations. Users work with data in a collaborative, suggestion-based interface that reduces or eliminates dependence on IT skills.

Advanced Analytics Integration

Integrates data pipelines with advanced analytics algorithms from libraries such as Spark ML and R, with no need for coding. Builds trained models or imports pre-trained models into data pipelines.

Orchestration and Production Operations Management

Designs end-to-end workflows and orchestrates them in production with fault-tolerant, distributed execution. Migrates from development environments to production across big data or cloud platforms with single-click operations.

Portability

Infoworks automation also makes it easy to move from an on-premises Hadoop platform to the cloud, or from one cloud environment to another. One Infoworks customer moved an entire set of production workflows from Microsoft Azure to Google Cloud Platform in less than a day.

Enterprise-Grade Security Integration

Infoworks DataFoundry provides security integration for user authentication and data security policies. It supports single sign-on (SSO) and LDAP integration, Kerberos-based authorization, and encryption for data in motion and at rest.

Demo

See the Demo for a brief overview of Infoworks DataFoundry functionality.
