This is the parent class for all source classes. It is implemented with Python's abc module. All source modules must inherit from this class and override all of its methods.
...
Example of required source output:
Input CSV file:
```
Date,Team1,Team2,FT,HT
2012-08-18,Arsenal,Sunderland,0-0,0-0
2012-08-18,Fulham,,5-0,2-0
```
Is transformed to:
```python
[{'Date': '2012-08-18', 'FT': '0-0', 'HT': '0-0', 'Team1': 'Arsenal', 'Team2': 'Sunderland'},
 {'Date': '2012-08-18', 'FT': '5-0', 'HT': '2-0', 'Team1': 'Fulham', 'Team2': ''}]
```
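The transformation above can be reproduced with the standard csv module. This is a minimal sketch, assuming the rows are read with `csv.DictReader`; the original does not say how the actual CSV source does it, and the name `rows_from_csv` is hypothetical.

```python
import csv
import io

# Sample input copied from the example above.
SAMPLE_CSV = """Date,Team1,Team2,FT,HT
2012-08-18,Arsenal,Sunderland,0-0,0-0
2012-08-18,Fulham,,5-0,2-0
"""


def rows_from_csv(text):
    """Return one dict per data row, keyed by the header names.

    Hypothetical helper: it only illustrates the required output shape.
    """
    reader = csv.DictReader(io.StringIO(text))
    # Empty fields (such as Team2 on the second row) come back as ''.
    return [dict(row) for row in reader]


rows = rows_from_csv(SAMPLE_CSV)
```

Every value stays a string, and a missing field becomes an empty string rather than `None`, which matches the second row of the expected output.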
- getJobId(). Used for job locking. Returns a unique identifier for the load: if the source is a database, it returns a hash of the select statement; if the source is a CSV file, it returns the file name, and so on. For incremental loads, the identifier must be the same for all related loads.
- getSchema(). Returns the header of the data source (a list containing the names of the source columns). It is used to expand the wildcards in the facets configuration file and to check that the mapped columns exist in the source.
- add_args(parser). Used to add source-related parameters to the Tool argparse parser.
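The three methods listed above could be declared and overridden roughly as follows. This is a sketch under stated assumptions, not the Tool's actual code: the class names `Source` and `CsvSource`, the constructor argument, the `--csv-file` flag, and the choice to make `add_args` a static method are all hypothetical, and any further abstract methods the real parent class defines are not shown.

```python
import argparse
import csv
import os
from abc import ABC, abstractmethod


class Source(ABC):
    """Hypothetical stand-in for the parent source class."""

    @abstractmethod
    def getJobId(self):
        """Return a unique identifier for the load (used for job locking)."""

    @abstractmethod
    def getSchema(self):
        """Return the list of source column names."""

    @staticmethod
    @abstractmethod
    def add_args(parser):
        """Add source-related parameters to an argparse parser."""


class CsvSource(Source):
    """Hypothetical CSV source overriding the three methods."""

    def __init__(self, path):
        self.path = path

    def getJobId(self):
        # For a CSV source the identifier is the file name.
        return os.path.basename(self.path)

    def getSchema(self):
        # The header is the first row of the file; empty file -> empty list.
        with open(self.path, newline="") as handle:
            return next(csv.reader(handle), [])

    @staticmethod
    def add_args(parser):
        # Assumed flag name, for illustration only.
        parser.add_argument("--csv-file", help="path to the input CSV file")


def build_parser():
    """Build a parser carrying the CSV source's parameters."""
    parser = argparse.ArgumentParser()
    CsvSource.add_args(parser)
    return parser
```

Because the methods are marked with `abstractmethod`, Python raises `TypeError` when a subclass that leaves any of them unimplemented is instantiated, which enforces the rule that every source overrides all methods.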