This is the parent class for all source classes. It is implemented using Python's abc module. All source modules must inherit from this class and override all of its methods.

...

Example of required source output:

Input CSV file:

Date,Team1,Team2,FT,HT
2012-08-18,Arsenal,Sunderland,0-0,0-0
2012-08-18,Fulham,,5-0,2-0

This is transformed to:

[{'Date': '2012-08-18',
  'FT': '0-0',
  'HT': '0-0',
  'Team1': 'Arsenal',
  'Team2': 'Sunderland'},
 {'Date': '2012-08-18',
  'FT': '5-0',
  'HT': '2-0',
  'Team1': 'Fulham',
  'Team2': ''}]
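
As a rough illustration of the transformation above, Python's standard csv.DictReader produces rows of the same shape: one dict per record, keyed by the header names, with empty fields kept as empty strings. This is only a sketch; the names read_rows and sample are hypothetical, and the actual source classes may read their input differently.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of plain dicts keyed by the header names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # DictReader yields one mapping per data row; copy each into a plain dict.
    return [dict(row) for row in reader]


# Same data as the input CSV file shown above.
sample = (
    "Date,Team1,Team2,FT,HT\n"
    "2012-08-18,Arsenal,Sunderland,0-0,0-0\n"
    "2012-08-18,Fulham,,5-0,2-0\n"
)

rows = read_rows(sample)
```

Note that the missing Team2 value in the second record is preserved as an empty string rather than dropped, so every row carries the full set of columns.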

  • getJobId(). Used for job locking. Returns a unique identifier for the load: a hash of the SELECT statement if the source is a database, the file name if the source is a CSV file, and so on. For incremental loads, the identifier must be the same for all related loads.
  • getSchema(). Returns the header of the data source as a list of the source column names. It is used to expand the wildcards in the facets configuration file and to check that the mapped columns exist in the source.
  • add_args(parser). Adds source-related parameters to the Tool's argparse parser.
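
The three methods listed here could be declared with the abc module roughly as follows. This is a minimal sketch under assumptions, not the actual implementation: the class names Source and CsvSource, the --csv-file option, the helper select_job_id, and the choice to make add_args a static method are all hypothetical.

```python
import abc
import argparse
import csv
import hashlib
import os


class Source(abc.ABC):
    """Hypothetical stand-in for the parent source class."""

    @abc.abstractmethod
    def getJobId(self):
        """Return a unique identifier for the load (used for job locking)."""

    @abc.abstractmethod
    def getSchema(self):
        """Return the list of source column names."""

    @staticmethod
    @abc.abstractmethod
    def add_args(parser):
        """Register source-related options on an argparse parser."""


def select_job_id(select_statement):
    """Assumed helper: hash a SELECT statement into a stable job id."""
    return hashlib.sha1(select_statement.encode("utf-8")).hexdigest()


class CsvSource(Source):
    """Hypothetical CSV source that overrides every abstract method."""

    def __init__(self, path):
        self.path = path

    def getJobId(self):
        # The file name identifies the load, so repeated loads of the
        # same file share one job id.
        return os.path.basename(self.path)

    def getSchema(self):
        # The header is the first row of the file.
        with open(self.path, newline="") as handle:
            return next(csv.reader(handle))

    @staticmethod
    def add_args(parser):
        # Assumed option name, for illustration only.
        parser.add_argument("--csv-file", help="path to the input CSV file")
```

Because the methods are marked abstract, instantiating Source directly, or a subclass that leaves any of them out, raises TypeError, which enforces the rule that every source module overrides all methods.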