Registering Hive UDF
User Defined Functions (UDFs) let you create custom functions that process records or groups of records. Infoworks allows you to register UDF logic so that it can be used as a function in transformation nodes.
Following are the steps to register a Hive UDF:
- Using Beeline, log in to HiveServer with the credentials of a user who has access to the UDF. For example,
beeline -u jdbc:hive2://mycluster.com:10000 -n hive -p
- In the Hive prompt, select a database using the following command:
USE <database name>;
- Run the registration command that matches the method your cluster uses to locate the JAR. The two methods are Direct Jar Reference, and Adding Jar and Registering UDF.
- Verify that the UDF is registered, using the following command:
SHOW FUNCTIONS;
- To use the UDF in SQL queries, reference it by its database-qualified name:
<database>.<udf_name>
Direct Jar Reference
For this method, the JAR location must be included in the registration command. For example,
CREATE FUNCTION udftypeof AS 'com.mycompany.hiveudf.TypeOf01' USING JAR 'S3:///warehouse/tablespace/managed/TypeOf01-1.0.jar';
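The interactive steps can also be collected in a script file and passed to Beeline with its -f option. The following is a minimal sketch, not a definitive procedure: the database name mydb and the file name register_udf.hql are hypothetical, while the host, JAR path, and class name are copied from the examples on this page. The -p option is placed last so that Beeline prompts for the password.

```shell
#!/bin/sh
# Sketch: write the registration statements to a file, then run them with Beeline.
# Assumptions: "mydb" and "register_udf.hql" are hypothetical names for illustration.
DB_NAME="mydb"
SQL_FILE="register_udf.hql"

# Select the database, register the UDF with a direct JAR reference, then verify.
cat > "$SQL_FILE" <<EOF
USE ${DB_NAME};
CREATE FUNCTION udftypeof AS 'com.mycompany.hiveudf.TypeOf01' USING JAR 'S3:///warehouse/tablespace/managed/TypeOf01-1.0.jar';
SHOW FUNCTIONS;
EOF

# Execute the file only where Beeline is installed; otherwise just report the file.
if command -v beeline >/dev/null 2>&1; then
  beeline -u jdbc:hive2://mycluster.com:10000 -n hive -f "$SQL_FILE" -p
else
  echo "beeline not found; statements written to $SQL_FILE"
fi
```

Keeping the statements in a file makes the registration repeatable across environments; only DB_NAME and the connection URL would need to change.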
Adding Jar and Registering UDF
For this method, the JAR must be available on the Hive classpath so that the classloader can find it.
In Beeline, run the following commands to add the JAR and register the UDF.
- Add the JAR to the current Hive session using the following command:
add jar S3:///warehouse/tablespace/managed/TypeOf01-1.0.jar;
- Register the UDF using the following command:
CREATE FUNCTION udftypeof AS 'com.mycompany.hiveudf.TypeOf01';