Options

Many options are available, depending on the connector that data is generated to and, for file outputs, on the file format used.

Most of them are optional because they have default values; the exceptions are explicitly marked as mandatory.

General

  • DELETE_PREVIOUS :
  • ONE_FILE_PER_ITERATION :

ADLS

  • ADLS_MAX_CONCURRENCY :
  • ADLS_MAX_UPLOAD_SIZE :
  • ADLS_BLOCK_SIZE :

CSV

  • CSV_HEADER :

HDFS

  • HDFS_REPLICATION_FACTOR :

HBase

These are MANDATORY:

  • HBASE_PRIMARY_KEY :
  • HBASE_COLUMN_FAMILIES_MAPPING :

HIVE

  • HIVE_TABLE_TYPE :
  • HIVE_TABLE_FORMAT :

  • HIVE_TABLE_PARTITIONS_COLS :
  • HIVE_TABLE_BUCKETS_COLS :
  • HIVE_TABLE_BUCKETS_NUMBER :

  • HIVE_THREAD_NUMBER :
  • HIVE_ON_HDFS :
  • HIVE_TEZ_QUEUE_NAME :

KAFKA

This setting is MANDATORY:

  • KAFKA_MSG_KEY :

  • KAFKA_MESSAGE_TYPE :
  • KAFKA_REPLICATION_FACTOR :
  • KAFKA_PARTITIONS_NUMBER :
  • KAFKA_JAAS_FILE_PATH :
  • KAFKA_ACKS_CONFIG :
  • KAFKA_RETRIES_CONFIG :

OZONE

  • OZONE_REPLICATION_FACTOR :

SOLR

  • SOLR_SHARDS :
  • SOLR_REPLICAS :
  • SOLR_JAAS_FILE_PATH :

KUDU

These are MANDATORY:

  • KUDU_PRIMARY_KEYS :
  • KUDU_HASH_KEYS :
  • KUDU_RANGE_KEYS :

  • KUDU_REPLICAS :
  • KUDU_BUCKETS :
  • KUDU_BUFFER :
  • KUDU_FLUSH :
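
The MANDATORY markers in the HBase, KAFKA and KUDU lists can be checked before launching a generation. Below is a minimal sketch in Python, not part of the tool itself: the function name `missing_mandatory_options` and the lowercase connector labels are hypothetical and chosen only for illustration, while the option names are copied from the lists above.

```python
# Mandatory option names per connector, copied from the lists above.
# The connector labels used as keys are illustrative assumptions.
MANDATORY_OPTIONS = {
    "hbase": ["HBASE_PRIMARY_KEY", "HBASE_COLUMN_FAMILIES_MAPPING"],
    "kafka": ["KAFKA_MSG_KEY"],
    "kudu": ["KUDU_PRIMARY_KEYS", "KUDU_HASH_KEYS", "KUDU_RANGE_KEYS"],
}


def missing_mandatory_options(connector, options):
    """Return the mandatory option names for `connector` that are absent from `options`."""
    # Connectors without mandatory options (CSV, HDFS, ...) yield an empty list.
    required = MANDATORY_OPTIONS.get(connector.lower(), [])
    return [name for name in required if name not in options]


if __name__ == "__main__":
    options = {"KUDU_PRIMARY_KEYS": "size,category,department", "KUDU_REPLICAS": 1}
    # Lists the Kudu options that still need to be provided.
    print(missing_mandatory_options("kudu", options))
```

The check only looks at whether a key is present; it does not validate the values themselves.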

PARQUET

  • PARQUET_PAGE_SIZE :
  • PARQUET_ROW_GROUP_SIZE :
  • PARQUET_DICTIONARY_PAGE_SIZE :
  • PARQUET_DICTIONARY_ENCODING :
  • LOCAL_FILE_NAME :

Example

Below is a full example that sets nearly all of these options:

  "Options": {
    
    "ONE_FILE_PER_ITERATION":  true,
    "DELETE_PREVIOUS":  true,

    "HBASE_PRIMARY_KEY":  "bool,progression",
    "HBASE_COLUMN_FAMILIES_MAPPING": "c:randomName,abbreviation,size,bool,progression,percentage,limitedName,userEmail,department;d:bytesLittleArray,bigSize,startDate,bytesArray,hash,birthdate,name,country,restrictedHash,category;e:longPercent,onePlusOne,onePlusTwo,formula_1,condition_2,recording_time",

    "SOLR_SHARDS":  1,
    "SOLR_REPLICAS":  1,
    "SOLR_JAAS_FILE_PATH":  "/tmp/solr.jaas",

    "KUDU_REPLICAS":  1,
    "KUDU_PRIMARY_KEYS":  "size,category,department",
    "KUDU_HASH_KEYS":  "size,department",
    "KUDU_RANGE_KEYS":  "category",
    "KUDU_BUCKETS":  32,
    "KUDU_BUFFER":  100001,
    "KUDU_FLUSH":  "MANUAL_FLUSH",

    "KAFKA_MSG_KEY": "bigSize",
    "KAFKA_MESSAGE_TYPE":  "json",
    "KAFKA_REPLICATION_FACTOR":  3,
    "KAFKA_PARTITIONS_NUMBER":  3,
    "KAFKA_JAAS_FILE_PATH":  "/tmp/kafka.jaas",
    "KAFKA_ACKS_CONFIG":  "all",
    "KAFKA_RETRIES_CONFIG":  3,

    "HIVE_THREAD_NUMBER":  1,
    "HIVE_TABLE_TYPE":  "EXTERNAL",
    "HIVE_TABLE_FORMAT":  "PARQUET",
    "HIVE_ON_HDFS":  true,
    "HIVE_TEZ_QUEUE_NAME":  "root.default",
    "HIVE_TABLE_PARTITIONS_COLS": "person_department",
    "HIVE_TABLE_BUCKETS_COLS": "city",
    "HIVE_TABLE_BUCKETS_NUMBER": 32,

    "CSV_HEADER":  true,
    
    "PARQUET_PAGE_SIZE":  1048576,
    "PARQUET_ROW_GROUP_SIZE":  134217728,
    "PARQUET_DICTIONARY_PAGE_SIZE":  1048576,
    "PARQUET_DICTIONARY_ENCODING":  true,

    "OZONE_REPLICATION_FACTOR":  3,

    "HDFS_REPLICATION_FACTOR":  3,

    "ADLS_MAX_CONCURRENCY": 4,
    "ADLS_MAX_UPLOAD_SIZE": 16777216,
    "ADLS_BLOCK_SIZE": 8388608
  }
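
The HBASE_COLUMN_FAMILIES_MAPPING value in the example appears to follow a `family:col1,col2;family2:col3` layout. The sketch below shows how such a string could be split into column families. It is a reading aid written in Python under that assumption, inferred only from the example value; the helper name `parse_column_families` is hypothetical and the tool's own parsing may differ.

```python
def parse_column_families(mapping):
    """Split 'cf:col1,col2;cf2:col3' into {'cf': ['col1', 'col2'], 'cf2': ['col3']}.

    The layout (';' between families, ':' after the family name, ',' between
    columns) is an assumption inferred from the example value.
    """
    families = {}
    for group in mapping.split(";"):
        group = group.strip()
        if not group:
            continue  # tolerate a trailing ';'
        family, sep, columns = group.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in column family group: {group!r}")
        families[family.strip()] = [c.strip() for c in columns.split(",") if c.strip()]
    return families


if __name__ == "__main__":
    # Shortened version of the mapping used in the example.
    print(parse_column_families("c:randomName,abbreviation;d:bigSize,startDate"))
```

Applied to the full value from the example, this would give three families named `c`, `d` and `e`.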