Usage Reporter

The Infoworks Usage Reporter utility is part of the Infoworks DataFoundry platform. It measures, records, and reports data on resource and capability utilization within the platform. The Usage Reporter records and reports a single metric, the AWS EMR Cluster Core Count, to support the core-hour model of billing.

The Usage Reporter supports three main commands: produce, report, and cleanup. Each of these commands can be run a single time or repeatedly at configurable intervals. This enables you to either run the service as a daemon or use external scheduling utilities to collect and report metrics automatically.

  • The produce command calls AWS APIs using credentials available in the system, stores the core count value with its timestamp and related information in the database, and logs the values to a file. It can be run a single time (usually from an external scheduler such as Cron, or to test and validate the setup) or in daemon mode, where the application performs the collection at configurable intervals. You can use the log files to view the measured values.
  • The report command produces reports of measurements between specific dates. It uses the data stored in the database, not the log files. The report can be encrypted using Infoworks keys for proof of integrity.
  • The cleanup command clears historical values from the database. This operation is irreversible.

The AWS EMR core count metric is calculated as follows:

AWS EMR Core Count Metric = Number of cores of running worker nodes + Number of cores of running task nodes
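
As a worked example of the formula above, the following sketch computes the metric for a hypothetical cluster. The node counts and per-node core counts are made-up values, and the sketch assumes every node in a group has the same number of cores.

```shell
# Hypothetical cluster: 4 running worker nodes with 8 cores each,
# and 2 running task nodes with 4 cores each.
worker_nodes=4
cores_per_worker=8
task_nodes=2
cores_per_task=4

# Core count = cores of running worker nodes + cores of running task nodes
core_count=$(( worker_nodes * cores_per_worker + task_nodes * cores_per_task ))
echo "AWS EMR Core Count Metric = ${core_count}"
```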

The Usage Reporter is installed in the Infoworks DataFoundry installation folder and includes the following files:

IW_HOME/

  • bin/iw-metrics.sh: Entry point for all operations
  • conf/metrics-collector.json: Main configuration file
  • conf/metrics-collector-log4j2.xml: The log configurations
  • lib/metrics-collector: All binaries including dependencies

CLI Usage

Following are the CLI usage options:

iw-metrics.sh -p|-r|-c [ -d -a -o [report file] -n -s <YYYY-MM-DD> -e <YYYY-MM-DD> ]

  • -p: produce command produces (reads) metrics, then stores and logs them. Only one of the -p, -r, -c commands can be used.
  • -r: report command fetches data from storage to the output file (-o) for an optional time range (-s, -e). Only one of the -p, -r, -c commands can be used.
  • -c: clear command clears historical metrics data for an optional time range (-s, -e). It is irreversible. Only one of the -p, -r, -c commands can be used.
  • -d: if set, executes in daemon mode (periodic execution). The execution interval is configurable.
  • -a: if set, fetches configurations from the system automatically, and fails if any one of the required configurations cannot be obtained. The default value is false.
  • -o: output file path (use an absolute path if unsure) for the report command. If the file already exists, it is not overwritten and the program exits with an error. To overwrite, use the overwrite-report-file configuration.
  • -n: encrypts the report output file.
  • -s: start time period for report and clear command in the YYYY-MM-DD format. The default value is the 1st of current month.
  • -e: end time period for report and clear command in the YYYY-MM-DD format. The default value is the current date.
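
The defaults for -s and -e described above can be spelled out explicitly. The following is a minimal sketch that builds the equivalent report command with the dates written out. It assumes the local system date is what the tool uses, which the option descriptions do not state, and /path/to/output is a placeholder.

```shell
# First day of the current month (the documented default for -s)
default_start="$(date +%Y-%m-01)"
# Current date (the documented default for -e)
default_end="$(date +%Y-%m-%d)"

# Print the report command with the defaults made explicit
echo "./iw-metrics.sh -r -o /path/to/output -s ${default_start} -e ${default_end}"
```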

Configurations

For the suggested execution path (on an EMR cluster, with system roles in EC2 IAM), the program picks up the cluster ID and authentication information (if any) from the system configuration, so no configurations need to be set manually.

NOTE: Some configurations in the configuration files are available as CLI arguments. These CLI argument values override the value in configuration files.

Following are the configuration files:

IW_HOME/

  • conf/metrics-collector.json: Standard JSON file which includes the configuration keys.
  • conf/metrics-collector-log4j2.xml: Log4j configuration to control the program logs and CSV metric logs.

NOTE: To disable the CSV metrics log, change the log level to ERROR for the logger named io.infoworks.metrics-logger.

Following are the configurations in the metrics-collector.json file:

  • report-file: path of the output file for the report command (-o in CLI).
  • encrypt-output: indicates whether to encrypt the output file (-n in CLI).
  • csv-header: adds header to CSV report file. The default value is true.
  • daemon: indicates whether to execute in daemon mode (-d in CLI).
  • poll-intvl-mins: daemon mode execution interval. The default value is 60 mins and the suggested value is 10 mins or more.
  • auto-config: fetches configurations from the system automatically, and fails if any one of the required configurations cannot be obtained. The default value is false.
  • record-change-only: if set to true, the Usage Reporter saves a metric only when its value differs from the previous reading, and ignores the reading if the values are the same. The default value is false.
  • overwrite-report-file: overwrites the report file if it exists. The default value is false.
  • emr-cluster-id: cluster ID of the target EMR cluster. If the Usage Reporter is run on an EMR cluster, the value is fetched from the system.
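
To make the keys above concrete, the following sketch writes a sample configuration to a scratch file. The flat JSON layout, the value types, and the values for report-file, encrypt-output, daemon, and emr-cluster-id are assumptions for illustration. Only the defaults stated in the list above come from this document, so compare against the metrics-collector.json shipped with your installation before reusing it.

```shell
# Write an illustrative configuration to a scratch file (not to IW_HOME/conf)
sample_conf="$(mktemp)"
cat > "$sample_conf" <<'EOF'
{
  "report-file": "/path/to/output",
  "encrypt-output": false,
  "csv-header": true,
  "daemon": false,
  "poll-intvl-mins": 60,
  "auto-config": false,
  "record-change-only": false,
  "overwrite-report-file": false,
  "emr-cluster-id": "j-XXXXXXXXXXXXX"
}
EOF

# Show the result for review
cat "$sample_conf"
```

Values passed as CLI arguments override the corresponding keys in this file.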

Common Use Cases

Following are some common use cases with commands:

  • Continuously produce AWS core count data on an EMR Cluster at fixed intervals.
  • Read historical metrics between two dates into a specific file and encrypt the output.
  • Cleanup all data stored in the database between two dates.

Production

Use Case: Continuously produce AWS core count data on an EMR Cluster.

CLI Script: ./iw-metrics.sh -p -d -a

where,

  • -p: produce command measures and saves metrics.
  • -d: daemon mode produces periodically.
  • -a: auto-detects cluster ID using metadata service.

NOTE: To adjust the interval between measurements, set the poll-intvl-mins configuration.
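
As an alternative to daemon mode, the produce command can be run once per invocation from an external scheduler such as Cron. The following sketch prints a crontab entry that does this every 10 minutes. The fallback installation path /opt/infoworks is an assumption, and the entry would still need to be added to the crontab of the user that runs the Usage Reporter.

```shell
# Installation folder; /opt/infoworks is an assumed fallback
IW_HOME="${IW_HOME:-/opt/infoworks}"

# Run a single produce (-p) with auto-configuration (-a) every 10 minutes.
# No -d here: Cron provides the scheduling instead of daemon mode.
cron_entry="*/10 * * * * ${IW_HOME}/bin/iw-metrics.sh -p -a"

# Print the entry so it can be reviewed and added with "crontab -e"
echo "$cron_entry"
```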

Report

Use Case: Read historical metrics between two dates into a specific file and encrypt the output.

CLI Script: ./iw-metrics.sh -r -n -o /path/to/output -s 2017-10-01 -e 2017-10-31

where,

  • -r: report command reads metrics from storage.
  • -o: path of output file.
  • -n: encrypts the output file.
  • -s: fetches data with timestamp from 2017-10-01 00:00:00.000.
  • -e: fetches data with timestamp till 2017-10-31 23:59:59.999.

NOTE: Use the YYYY-MM-DD format. The time portion of each timestamp is filled in automatically.

Cleanup

Use Case: Cleanup all data stored in the database between two dates.

CLI Script: ./iw-metrics.sh -c -s 2017-10-01 -e 2017-10-31

where,

  • -c: clear command erases data from storage.
  • -s: clears data with timestamp from 2017-10-01 00:00:00.000.
  • -e: clears data with timestamp till 2017-10-31 23:59:59.999.

NOTE: Use the YYYY-MM-DD format. The time portion of each timestamp is filled in automatically.

AWS Authentication

To measure the core-hour values, the Usage Reporter uses the AWS SDK to interact with AWS services. This requires authentication information for AWS access.

When the Usage Reporter runs on an EMR cluster, an instance profile with authentication information and configurations is stored in the default location, ~/.aws/credentials. Otherwise, creating an instance profile is the suggested authentication mechanism. For more details on AWS authentication, see the AWS Credentials Javadoc.

If you cannot create an instance profile, use environment variables as defined in AWS Credentials Javadoc or use the credentials file. The legacy credentials file can be created under the user home directory of the metrics collector with the following details:
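
The following sketch creates a credentials file in the standard AWS shared credentials format under the current user's home directory. The key values are placeholders to replace with your own, and the sketch assumes it is run as the user that runs the metrics collector. It refuses to overwrite an existing file.

```shell
cred_dir="${HOME}/.aws"
cred_file="${cred_dir}/credentials"

mkdir -p "$cred_dir"

if [ -e "$cred_file" ]; then
  # Never clobber credentials that are already in place
  echo "Refusing to overwrite existing ${cred_file}" >&2
else
  # umask 077 in a subshell so the file is never group- or world-readable
  (
    umask 077
    cat > "$cred_file" <<'EOF'
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
EOF
  )
  # Read-only for the owner, no access for anyone else
  chmod 400 "$cred_file"
fi
```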

NOTE: Ensure that this file is only readable for the Infoworks user for security (400 file permission).

AWS EMR Cluster ID

The Usage Reporter uses the AWS SDK to fetch the user data. The Java operation is equivalent to the following shell command:
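
One plausible equivalent is sketched here, assuming the user data is read from the EC2 instance metadata service. The URL is the standard EC2 user-data endpoint, but the function name is hypothetical, and how the cluster ID is encoded in the response is not described in this document. Instances that enforce IMDSv2 also require a session token, which this sketch does not request.

```shell
# Standard EC2 instance metadata endpoint for user data (IMDSv1-style request)
USER_DATA_URL="http://169.254.169.254/latest/user-data"

# Hypothetical helper: prints the raw user data of the current instance.
fetch_user_data() {
  # --fail: non-zero exit status on HTTP errors
  # --max-time 2: give up quickly when not running on EC2
  curl --silent --fail --max-time 2 "$USER_DATA_URL"
}

# On an EMR node, call:
#   fetch_user_data
```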

Disabling CSV Metrics Log

By default, all metrics produced are also recorded in the CSV log for administration purposes. To disable CSV logging perform the following:

  1. Edit the log4j configuration file, IW_HOME/conf/metrics-collector-log4j2.xml.
  2. Search the following: <Logger name="io.infoworks.metrics-logger" level="info">
  3. Modify it as follows: <Logger name="io.infoworks.metrics-logger" level="error">
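
The same edit can be scripted with sed. The following sketch demonstrates the substitution on a scratch copy that contains only the logger line, so it can be tried safely. Applying it to the real IW_HOME/conf/metrics-collector-log4j2.xml is left to the reader, ideally after taking a backup.

```shell
# Scratch file standing in for metrics-collector-log4j2.xml
sample="$(mktemp)"
printf '%s\n' '<Logger name="io.infoworks.metrics-logger" level="info">' > "$sample"

# Change the level of the metrics logger from info to error.
# Output goes to a new file so the input is left untouched.
sed 's/\(name="io\.infoworks\.metrics-logger" level="\)info"/\1error"/' \
  "$sample" > "${sample}.new"

cat "${sample}.new"
```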

Storage Space

By default, when producing in daemon mode, the application writes the data to the database and log file. Log files are only kept for 30 days (configurable in metrics-collector-log4j2.xml). Cron jobs can be scheduled to clear out older data using the clear command.
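
A scheduled cleanup might look like the following sketch, which builds a clear command for data older than a retention window and prints it as a dry run. The 90-day retention period, the lower-bound date, and the fallback installation path are hypothetical values, and the date arithmetic relies on GNU date (-d).

```shell
IW_HOME="${IW_HOME:-/opt/infoworks}"   # assumed fallback installation path
RETENTION_DAYS=90                       # hypothetical retention window
OLDEST_DATE="2017-01-01"                # hypothetical lower bound for -s

# Last day to clear: RETENTION_DAYS before today (GNU date syntax)
cutoff="$(date -u -d "${RETENTION_DAYS} days ago" +%Y-%m-%d)"

# -s is given explicitly because its default is the 1st of the current month
clear_cmd="${IW_HOME}/bin/iw-metrics.sh -c -s ${OLDEST_DATE} -e ${cutoff}"

# Dry run: print the command. The clear command is irreversible,
# so check the dates before running it for real.
echo "$clear_cmd"
```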

Date Format

The script accepts inputs in the YYYY-MM-DD ("%Y-%m-%d") format and fills in the time depending on the context. Dates are stored in the ISO 8601 format in the database, and logs are adjusted to the UTC time zone.

The reports are generated with UTC timestamps. For example, if a report is generated with the start time 2018-01-01 and the end time 2018-01-31, the report will contain timestamps between 2018-01-01 00:00:00.000 and 2018-01-31 23:59:59.999.
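
The following sketch shows a format check that a wrapper script might apply before calling iw-metrics.sh, followed by the way a date range expands into timestamps. The helper name is_ymd is hypothetical, and the check covers only the zero-padded YYYY-MM-DD shape, not calendar validity. Whether the tool itself rejects unpadded dates is not stated here.

```shell
# Succeeds only for zero-padded YYYY-MM-DD strings
is_ymd() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
}

start="2018-01-01"
end="2018-01-31"

if is_ymd "$start" && is_ymd "$end"; then
  # The start date covers the day from its first millisecond,
  # the end date up to its last millisecond
  range_from="${start} 00:00:00.000"
  range_to="${end} 23:59:59.999"
  echo "${range_from} .. ${range_to}"
else
  echo "Dates must use the YYYY-MM-DD format" >&2
fi
```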

Encryption and JCE

The Usage Reporter uses 256-bit encryption, so the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy files must be installed.
