Data Generation

To generate data, click Data Generation > Generate in the left panel.

The following screen should appear:

To generate data, fill in the form:

  • Model: Select one of the available models.
  • Connector: Once a model has been chosen, a connector drop-down list appears. Select the connector to which data will be generated.
  • Connectors Configuration: Once a connector has been selected, its required and optional configuration properties appear. For details, see the section ‘Connectors Configuration’.
  • Number of Batches: How many generation batches to run. (By default, each batch keeps all of its generated data in memory and flushes it before moving on to the next one.)
  • Number of Rows per Batch: How many rows to generate in each batch. (Size this according to the server’s memory, as all these rows are held in memory.)
  • Number of Threads: How many threads generate data in parallel. The more threads you set, the faster generation should be. (Size this according to the server’s CPU.)
  • Credentials: Any credentials used to authenticate against the connector (for example, generating to AWS requires AWS access keys, while generating to HDFS probably requires a keytab). By default, Datagen only has the credentials set in its configuration file.

Note: The total number of rows generated is Number of Batches × Number of Rows per Batch.
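To make the batch, row, and thread settings more concrete, here is a minimal Python sketch of the generation loop they describe. It is an illustration under stated assumptions, not Datagen's implementation: `generate_row`, `generate`, and `estimate_batch_bytes` are hypothetical names, the "connector" is a plain Python list, and the average row size passed to the memory estimate is something you would have to measure yourself. The sketch builds each batch fully in memory using a pool of threads, flushes it, and discards it before starting the next batch. As a result, the total row count equals Number of Batches × Number of Rows per Batch, and peak memory grows with Number of Rows per Batch.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_row(row_id: int) -> dict:
    # Hypothetical stand-in for the model-driven row generator.
    return {"id": row_id, "value": row_id * 2}


def estimate_batch_bytes(rows_per_batch: int, avg_row_bytes: int) -> int:
    # Rough memory held by one batch; avg_row_bytes is an assumed, measured value.
    return rows_per_batch * avg_row_bytes


def generate(num_batches: int, rows_per_batch: int, num_threads: int, sink: list) -> int:
    """Illustrative batch loop: each batch is held in memory, then flushed to the sink."""
    total = 0
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for batch in range(num_batches):
            start = batch * rows_per_batch
            # All rows of the current batch are held in memory at this point.
            rows = list(pool.map(generate_row, range(start, start + rows_per_batch)))
            # "Flush" the batch to the connector (here, a list) before the next batch.
            sink.extend(rows)
            total += len(rows)
            del rows  # release the batch before generating the next one
    return total


if __name__ == "__main__":
    sink: list = []
    print(generate(num_batches=3, rows_per_batch=1000, num_threads=4, sink=sink))
    print(estimate_batch_bytes(rows_per_batch=1000, avg_row_bytes=200))
```

Because of Python's global interpreter lock, the thread pool here does not speed up CPU-bound generation. It only mirrors the structure implied by the Number of Threads setting, not Datagen's actual performance.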

Finally, clicking the Generate Data button opens a pop-up confirming that the command was sent, with a short recap and a progress bar that must be updated with the Refresh button:

You can close this pop-up and follow the generation's progress in Commands.