Installation using CSD & Parcel
This tutorial requires a running CDP 7.1.9+ platform with admin access to Cloudera Manager.
Note that this was written for CDP-7.1.9.3 and DATAGEN-1.0.0. For later releases, change the repository URLs to point to the new release.
List of links for last release
To get the links for the Datagen version and CDP version you want, browse the S3 repository and locate the matching CSD and parcels:
https://datagen-repo.s3.eu-west-3.amazonaws.com/index.html
Note: It is advised to always use the latest Datagen version (currently 1.0.0).
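For a different release, the two URLs used in the rest of this tutorial can be derived from the Datagen and CDP versions. The sketch below assumes the path layout visible in the 1.0.0 / 7.1.9 links; other releases may be laid out differently, so confirm against the repository index. The function names are illustrative and not part of Datagen.

```shell
# Base URL of the Datagen S3 repository (taken from this tutorial).
DATAGEN_REPO="https://datagen-repo.s3.eu-west-3.amazonaws.com"

# Print the CSD jar URL.
# $1 = Datagen version (e.g. 1.0.0), $2 = CDP version (e.g. 7.1.9)
# Assumption: the layout matches the 1.0.0 / 7.1.9 release.
datagen_csd_url() {
  printf '%s/%s/CDP/%s/csd/DATAGEN-%s.%s.jar\n' "$DATAGEN_REPO" "$1" "$2" "$1" "$2"
}

# Print the parcel repository URL to add in Cloudera Manager.
datagen_parcel_repo_url() {
  printf '%s/%s/CDP/%s/parcels/\n' "$DATAGEN_REPO" "$1" "$2"
}
```

For example, `datagen_csd_url 1.0.0 7.1.9` prints the CSD link used in the next section.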
Setup CSD
On the Cloudera Manager server host, download the CSD jar with wget:
wget https://datagen-repo.s3.eu-west-3.amazonaws.com/1.0.0/CDP/7.1.9/csd/DATAGEN-1.0.0.7.1.9.jar
Copy the downloaded jar file into /opt/cloudera/csd/ and set its ownership and permissions:
cp DATAGEN-*.jar /opt/cloudera/csd/
chown cloudera-scm:cloudera-scm /opt/cloudera/csd/DATAGEN*
chmod 774 /opt/cloudera/csd/DATAGEN*
Restart the Cloudera Manager Server:
systemctl restart cloudera-scm-server
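As an optional sanity check after the copy and restart, a small helper can confirm that a Datagen CSD jar is actually present in the CSD directory. This is a minimal sketch; the function name is hypothetical and the default path is the one used above.

```shell
# List Datagen CSD jars found in a CSD directory; returns non-zero if none.
# $1 = CSD directory (defaults to the path used in this tutorial).
find_datagen_csd() {
  csd_dir="${1:-/opt/cloudera/csd}"
  csd_found=1
  for csd_jar in "$csd_dir"/DATAGEN-*.jar; do
    # An unmatched glob is left as a literal string, so test for a real file.
    if [ -f "$csd_jar" ]; then
      printf '%s\n' "$csd_jar"
      csd_found=0
    fi
  done
  return "$csd_found"
}
```

It could be combined with a service check, for example `find_datagen_csd && systemctl is-active cloudera-scm-server`.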
Setup Parcel
Go to Cloudera Manager, in Parcels > Parcel Repositories & Network:
Add this public repository to Cloudera Manager: https://datagen-repo.s3.eu-west-3.amazonaws.com/1.0.0/CDP/7.1.9/parcels/
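Before saving, the repository URL can also be checked from a shell. The sketch below assumes the repository follows the usual Cloudera parcel repository layout, with a manifest.json at its root; the function names are illustrative, and the network check requires curl and outbound access from the host.

```shell
# Print the manifest URL for a parcel repository URL.
# $1 = parcel repository URL, with or without a trailing slash.
parcel_manifest_url() {
  # Strip one trailing slash, if any, then append manifest.json.
  printf '%s/manifest.json\n' "${1%/}"
}

# Succeed only if the manifest can be fetched.
# Assumption: the repository exposes manifest.json at its root.
check_parcel_repo() {
  curl -fsS -o /dev/null "$(parcel_manifest_url "$1")"
}
```

A call such as `check_parcel_repo https://datagen-repo.s3.eu-west-3.amazonaws.com/1.0.0/CDP/7.1.9/parcels/` exits with status 0 when the manifest is reachable.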
Click Save & Verify to make sure the URL is correct.
The Datagen parcel can now be downloaded, then distributed, and finally activated.
At the end, the parcel should be shown as distributed and activated.
Add Service wizard
Go to the Home page in Cloudera Manager and pick the cluster where you want to install Datagen.
Click on Actions > Add a Service.
Now, it is possible to add Datagen as a Service to CDP:
Start the Add Wizard by clicking on Continue.
Select the Ranger dependency if you are running Ranger (and you should), so that Datagen can automatically create policies in Ranger.
Select where to place the Datagen servers (it is best to start with only one and scale up later if needed):
Review the changes. All fields should be filled in automatically, but it is recommended to set the Ranger properties correctly (they can be removed later):
You should end up with:
Restart the Cloudera Management Service (CMS) before going on: Clusters > Cloudera Management Service, then Actions > Restart.
Start Service
Before launching commands, install jq with the following command:
yum install jq
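To confirm that jq is available on the PATH before starting the service, a generic check like the one below can be used. This is a minimal sketch; `require_cmd` is a hypothetical helper, not something shipped with Datagen.

```shell
# Succeed if the named command is on the PATH; otherwise print an error.
# $1 = command name to look up.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing required command: $1" >&2
    return 1
  }
}
```

For example, `require_cmd jq && jq --version` prints the installed jq version, or reports that jq is missing.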
Then start the service from Actions > Start.
Once the command pop-up has launched, you can browse the Role Log and click on Full Log File
to verify that the service started correctly.