Registering Hive UDF

User Defined Functions (UDFs) let you create functions that process records or groups of records. Infoworks lets you register UDF logic, which can then be used as a function in transformation nodes.

Following are the steps to register a Hive UDF:

  • Using Beeline, log in to HiveServer with the credentials of a user who has access to the UDF. For example: beeline -u jdbc:hive2://mycluster.com:10000 -n hive -p
  • In the Hive prompt, select a database using the following command: USE <database name>;
  • Run the registration command that matches how the cluster is configured to locate the JAR. The two methods are Direct Jar Reference and Adding Jar and Registering UDF.
  • Verify that the UDF is registered, using the following command: SHOW FUNCTIONS;
  • To use the UDF in SQL queries, reference it as <database>.<udf_name>
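
The steps above can also be collected into a script file and run in a single Beeline session. The following is a minimal sketch, assuming the Direct Jar Reference method; write_udf_script is a hypothetical helper name, mydb is a placeholder database name, and register_udf.hql is an arbitrary file name. None of these come from Hive or Infoworks.

```shell
#!/bin/sh
# Sketch only: print the HiveQL for the registration steps listed above.
# write_udf_script is a hypothetical helper, not part of Hive or Infoworks.
# Args: database, function name, fully qualified class name, JAR location.
write_udf_script() {
    # Step 2: select the database.
    printf 'USE %s;\n' "$1"
    # Step 3: register the UDF, giving the JAR location in the command.
    printf "CREATE FUNCTION %s AS '%s' USING JAR '%s';\n" "$2" "$3" "$4"
    # Step 4: list functions so the registration can be checked.
    printf 'SHOW FUNCTIONS;\n'
}

# Example usage (mydb is a placeholder; requires a reachable HiveServer):
#   write_udf_script mydb udftypeof com.mycompany.hiveudf.TypeOf01 \
#       'S3:///warehouse/tablespace/managed/TypeOf01-1.0.jar' > register_udf.hql
#   beeline -u jdbc:hive2://mycluster.com:10000 -n hive -p -f register_udf.hql
```

Generating the statements separately from running them makes it possible to review the file before it touches the cluster. Beeline's -f option runs the statements from the file in one session.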

Direct Jar Reference

For this method, the JAR location must be included in the command. For example, CREATE FUNCTION udftypeof AS 'com.mycompany.hiveudf.TypeOf01' USING JAR 'S3:///warehouse/tablespace/managed/TypeOf01-1.0.jar';

NOTE A UDF registered using this approach is also available in other sessions.

Adding Jar and Registering UDF

For this method, the jar must be available in the Hive classpath so that the classloader can find the jar.

In Beeline, run the following commands to register the UDF.

  • Add the jar to the current Hive session using the following command: add jar S3:///warehouse/tablespace/managed/TypeOf01-1.0.jar;
  • Register the UDF using the following command: CREATE FUNCTION udftypeof AS 'com.mycompany.hiveudf.TypeOf01';

NOTE Jars added using this approach are available in the current session only. To make the jar available permanently, add it to hive.aux.jars.path in the hive-site.xml file and restart the Hive server.
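
Because an added jar lasts only for the current session, the add jar and CREATE FUNCTION statements need to run in the same Beeline session when scripted. The following is a minimal sketch of one way to do that; session_udf_sql is a hypothetical helper name, and the commented invocation assumes that your Beeline version accepts several semicolon-separated statements in a single -e argument.

```shell
#!/bin/sh
# Sketch only: build the two statements for the session-scoped method.
# session_udf_sql is a hypothetical helper, not part of Hive or Infoworks.
# Args: JAR location, function name, fully qualified class name.
session_udf_sql() {
    # Both statements are emitted as one string so that they can be passed
    # to a single Beeline invocation and therefore share one session.
    printf "ADD JAR %s; CREATE FUNCTION %s AS '%s';" "$1" "$2" "$3"
}

# Example usage (requires a reachable HiveServer):
#   sql=$(session_udf_sql 'S3:///warehouse/tablespace/managed/TypeOf01-1.0.jar' \
#       udftypeof com.mycompany.hiveudf.TypeOf01)
#   beeline -u jdbc:hive2://mycluster.com:10000 -n hive -e "$sql" -p
```

Running the two statements in separate Beeline invocations would start two sessions, so the jar added in the first would no longer be on the classpath in the second.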
