Leaf Code Documentation
This file implements a single leaf node that collects metadata from the Open-Meteo API and transfers it to DSN Central.
This node must work in collaboration with a server running the Distributed Scraping Network (DSN) API. The repository for accessing, creating, and using that tool is within the Organization (see the Distributed Scraping Network repository).
The node communicates with the DSN API to determine the date, time, and location of an observation, requests the environmental variables from the Open-Meteo API, then formats the collected data and sends it back to the DSN API for storage.
Please review the DSN API Docs for more information: https://dsn-central.travisdawson.com/docs
Attributes:
Name | Type | Description |
---|---|---|
`dsn_endpoint` | `str` | The endpoint of the DSN API. |
`weather_endpoint` | `str` | The endpoint to access the Open-Meteo historical API. |
`hourly_weather_var` | `list` | A list specifying all of the hourly weather variables to be collected per observation. |
`daily_weather_var` | `list` | A list specifying all of the daily weather variables to be collected per observation. |
`job_limit` | `int` | The number of observations to collect metadata for. The value is limited to 1000 per day. Please respect the Open-Meteo limits. |
`rate_limit` | `int` | The rate of requests to the Open-Meteo historical API. |
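As a rough illustration of how these attributes fit together, the sketch below groups them in a dataclass. It is not the node's actual configuration code: the class name `LeafConfig`, every default value, and the validation step are assumptions made for the example. In particular, the base URL for `dsn_endpoint` is inferred from the docs link above, the variable names are ordinary Open-Meteo variables chosen as placeholders, and the unit of `rate_limit` is not stated in this document.

```python
from dataclasses import dataclass, field


@dataclass
class LeafConfig:
    """Hypothetical container for the attributes listed above."""

    # Assumed base URL, inferred from the DSN API docs link.
    dsn_endpoint: str = "https://dsn-central.travisdawson.com"
    # Assumed address of the Open-Meteo historical (archive) API.
    weather_endpoint: str = "https://archive-api.open-meteo.com/v1/archive"
    # Placeholder variable names; the real lists live in app/main.py.
    hourly_weather_var: list = field(
        default_factory=lambda: ["temperature_2m", "precipitation"]
    )
    daily_weather_var: list = field(default_factory=lambda: ["sunrise", "sunset"])
    job_limit: int = 1000
    # Placeholder value; the unit is not given in the documentation.
    rate_limit: int = 1

    def __post_init__(self):
        # The documentation caps job_limit at 1000 observations per day.
        if not 0 < self.job_limit <= 1000:
            raise ValueError("job_limit must be between 1 and 1000")
```

Validating `job_limit` at construction time is one way to make the documented daily cap hard to exceed by accident.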
date_check(job)
The date checker ensures that the dates of the Open-Meteo request fall on the same day.
This method handles the edge case in which an observation occurs at midnight, where the start and end dates may differ. In that case the start date is set equal to the date the observation was made on.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`job` | `Json` | The observation information critical to making the Open-Meteo API request, in Json format as retrieved from the DSN API. | required |
Returns:
Type | Description |
---|---|
`Json` | The job with the start_date correctly modified in the Json object to be equal to the observed on date. |
Source code in app/main.py, lines 69 to 92
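The midnight correction described above could look roughly like the following. This is a sketch, not the code in app/main.py: it assumes the job is a plain `dict` carrying `observed_on` (in the `"%Y-%m-%d %H:%M:%S%z"` format documented for `determine_hour`) and a `start_date` string in `%Y-%m-%d` form. Those key names and formats are assumptions.

```python
from datetime import datetime

# Format documented for observed_on in determine_hour.
OBSERVED_ON_FORMAT = "%Y-%m-%d %H:%M:%S%z"


def date_check(job: dict) -> dict:
    """Return a copy of the job whose start_date matches the observed-on date."""
    observed = datetime.strptime(job["observed_on"], OBSERVED_ON_FORMAT)
    observed_date = observed.strftime("%Y-%m-%d")
    if job["start_date"] != observed_date:
        # Midnight edge case: pull start_date onto the day of the observation.
        return {**job, "start_date": observed_date}
    return job


example = {
    "observed_on": "2023-05-02 00:00:00+0100",
    "start_date": "2023-05-01",
    "end_date": "2023-05-02",
}
corrected = date_check(example)
```

Returning a new dictionary rather than mutating the input is a choice made for the sketch; the documented method may modify the job in place.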
determine_hour(weather_data, observed_on)
This method determines the hour in which the observation occurred, and returns the index at which the hourly weather variables can be retrieved.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`weather_data` | `Json` | The Json of the collected weather data. | required |
`observed_on` | `str` | The date and time at which the observation occurred, in the format "%Y-%m-%d %H:%M:%S%z". | required |
Source code in app/main.py
228 to 242
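One plausible way to locate the hour is to truncate `observed_on` to the hour and look that timestamp up in the response's hourly time list. The sketch below assumes the Open-Meteo response shape `{"hourly": {"time": [...]}}` with timestamps such as `2023-05-02T14:00`, and assumes the function returns both the hour string and its index. The actual return value of the method is not documented here.

```python
from datetime import datetime

OBSERVED_ON_FORMAT = "%Y-%m-%d %H:%M:%S%z"


def determine_hour(weather_data: dict, observed_on: str):
    """Return (weather_hour, hour_index) for the hour containing observed_on."""
    observed = datetime.strptime(observed_on, OBSERVED_ON_FORMAT)
    # Truncate to the start of the hour, in the timestamp style of the hourly list.
    weather_hour = observed.strftime("%Y-%m-%dT%H:00")
    # list.index raises ValueError if the hour is missing from the response.
    hour_index = weather_data["hourly"]["time"].index(weather_hour)
    return weather_hour, hour_index


sample = {"hourly": {"time": ["2023-05-02T13:00", "2023-05-02T14:00"]}}
hour, index = determine_hour(sample, "2023-05-02 14:35:10+0100")
```

The sketch takes the clock time in `observed_on` at face value, which matches a request made with the automatic timezone setting only if the observation time is already expressed in local time.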
execute_request(job)
Performs the weather request to Open-Meteo using the job information.
The auto timezone parameter means that coordinates are automatically resolved to the local timezone. The start_date and end_date are set a single hour apart, encompassing the hour in which the observation occurred.
Note: if too many requests are sent to Open-Meteo, a 403 (too many requests) response may be generated. If this is the case, the rate limit is increased by 20%.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`job` | `Json` | The Json information describing critical information for the Open-Meteo request. | required |
Returns:
Type | Description |
---|---|
`Json response` | The Open-Meteo response to the API request, containing the requested metadata values. If the request fails, None is returned. |
Source code in app/main.py
95 to 127
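To make the request parameters concrete, here is a minimal sketch using only the standard library. It is not the implementation in app/main.py: the helper `build_query`, the extra function arguments, the job keys `latitude` and `longitude`, and the endpoint constant are assumptions, and the sketch leaves out the rate-limit adjustment described in the note above.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed address of the Open-Meteo historical (archive) API.
WEATHER_ENDPOINT = "https://archive-api.open-meteo.com/v1/archive"


def build_query(job: dict, hourly_vars: list, daily_vars: list) -> str:
    """Encode the job and variable lists as an Open-Meteo query string."""
    return urllib.parse.urlencode(
        {
            "latitude": job["latitude"],
            "longitude": job["longitude"],
            "start_date": job["start_date"],
            "end_date": job["end_date"],
            "hourly": ",".join(hourly_vars),
            "daily": ",".join(daily_vars),
            # Resolve the coordinates to their local timezone.
            "timezone": "auto",
        }
    )


def execute_request(job: dict, hourly_vars: list, daily_vars: list, timeout: float = 30):
    """Return the parsed Json response, or None if the request fails."""
    url = f"{WEATHER_ENDPOINT}?{build_query(job, hourly_vars, daily_vars)}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, ValueError) as error:
        # HTTP errors, network failures, and invalid Json all end up here.
        print(f"Open-Meteo request failed: {error}")
        return None
```

Keeping the query construction in its own function makes it possible to check the parameters without sending a request.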
get_job_info()
Requests from the DSN the current job (observation) to collect data for.
Returns:
Type | Description |
---|---|
`Request Json` | The Json contents of the response if the GET request is successful. If it is not successful, None is returned and the stack trace is printed. |
Source code in app/main.py
54 to 66
job_complete_formatting(weather_data, job, weather_hour, hour_index)
Method formats the job and weather data for a POST request to the DSN API
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`weather_data` | `Json` | The Json of the collected weather data. | required |
`job` | `Json` | The Json of the current job (observation) to collect weather data for. | required |
`weather_hour` | `str` | A string specifying the hour of weather data to be collected, in the format %Y-%m-%dT%H:%M. | required |
`hour_index` | `int` | An integer specifying the index of the weather hour, used to collect the correct hour of weather data from the hourly weather lists. | required |
Returns:
Type | Description |
---|---|
`dict` | A dictionary containing the correctly formatted data to be POSTed to the DSN. |
Source code in app/main.py
155 to 225
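The role of `hour_index` can be shown with a small sketch that picks one value out of each hourly list. The payload layout is hypothetical: the keys `job_id` and `weather_hour` and the job key `id` are assumptions, and the real schema expected by the DSN API is defined in its own documentation. The response shape `{"hourly": {...}, "daily": {...}}` with a `time` list in each section is also assumed.

```python
def job_complete_formatting(weather_data: dict, job: dict, weather_hour: str, hour_index: int) -> dict:
    """Flatten the weather response into one record for the observed hour."""
    # Take the value at hour_index from every hourly variable list.
    hourly = {
        name: values[hour_index]
        for name, values in weather_data["hourly"].items()
        if name != "time"
    }
    # A single-day request yields one daily value per variable.
    daily = {
        name: values[0]
        for name, values in weather_data.get("daily", {}).items()
        if name != "time"
    }
    # "job_id" and "weather_hour" are hypothetical payload keys.
    return {"job_id": job["id"], "weather_hour": weather_hour, **hourly, **daily}


weather = {
    "hourly": {
        "time": ["2023-05-02T13:00", "2023-05-02T14:00"],
        "temperature_2m": [11.0, 12.5],
    },
    "daily": {"time": ["2023-05-02"], "sunrise": ["2023-05-02T05:30"]},
}
payload = job_complete_formatting(weather, {"id": 7}, "2023-05-02T14:00", 1)
```

A missing variable or a short list raises `KeyError` or `IndexError` here, which is the kind of formatting error the calling method is described as catching.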
job_complete_response(weather_data, job)
POSTs the collected metadata back to the DSN for storage.
This method formats both the job and weather information into an acceptable format for storage within the DSN. Two errors can occur within this method: an error in formatting the data for posting, caused by missing or incorrect data, and an error in the POST request. Both are simply printed to the terminal so that the scraping process can continue.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`weather_data` | `Json` | The Json weather data collected from the Open-Meteo request. | required |
`job` | `Json` | The Json information detailing the observation. | required |
Source code in app/main.py
130 to 152
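The two failure points described above can be handled separately, as in this sketch. To stay independent of the DSN routes, the formatting and posting steps are passed in as callables. These two extra parameters and the boolean return value are assumptions for the example and are not part of the documented signature.

```python
from typing import Callable


def job_complete_response(
    weather_data: dict,
    job: dict,
    format_payload: Callable[[dict, dict], dict],
    post: Callable[[dict], None],
) -> bool:
    """Format and send one result; print errors instead of raising them."""
    try:
        payload = format_payload(weather_data, job)
    except (KeyError, IndexError, ValueError) as error:
        # First failure point: missing or incorrect data while formatting.
        print(f"Formatting failed: {error!r}")
        return False
    try:
        post(payload)
    except OSError as error:
        # Second failure point: the POST request itself.
        print(f"POST to the DSN failed: {error!r}")
        return False
    return True


sent = []
succeeded = job_complete_response(
    {"hourly": {}}, {"id": 7}, lambda weather, job: {"job_id": job["id"]}, sent.append
)
```

Because neither error propagates, a caller can move on to the next job after a failure, which matches the documented intent of letting the scraping process continue.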
progress_bar(start_time, job_no)
A progress bar displayed in the terminal to indicate the progress of the scraping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`start_time` | `datetime` | The date and time at which the scraping was started. | required |
`job_no` | `int` | The number of jobs iterated through in this execution of the scraping node. | required |
Source code in app/main.py
257 to 271
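A terminal progress line of this kind can be built from the elapsed time and the job count. The sketch below is only an illustration: the bar layout, the helper `format_progress`, and the extra `job_limit`, `now`, and `width` parameters are assumptions, and the documented method takes only `start_time` and `job_no`.

```python
from datetime import datetime


def format_progress(start_time, job_no, job_limit, now=None, width=30):
    """Build a one-line progress string from the job count and elapsed time."""
    now = now or datetime.now()
    elapsed = (now - start_time).total_seconds()
    fraction = min(job_no / job_limit, 1.0)
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {job_no}/{job_limit} jobs, {elapsed:.0f}s elapsed"


def progress_bar(start_time, job_no, job_limit):
    """Redraw the progress line in place on the terminal."""
    # The carriage return moves the cursor back to the start of the line.
    print("\r" + format_progress(start_time, job_no, job_limit), end="", flush=True)


line = format_progress(
    datetime(2023, 1, 1, 0, 0, 0), 5, 10, now=datetime(2023, 1, 1, 0, 1, 0), width=10
)
```

Splitting the string construction from the printing keeps the formatting easy to check on its own.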
scraping_node_process()
The overall method detailing the scraping node process.
Source code in app/main.py
274 to 293
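Based on the overview at the top of this page, the overall loop might be organised as follows. This is a hedged outline rather than the actual method: the documented `scraping_node_process()` takes no arguments, so every parameter here is hypothetical, with the individual steps injected as callables. Stopping when no job is returned and sleeping a fixed delay between jobs are assumptions as well.

```python
import time


def scraping_node_process(get_job, fetch_weather, submit, report_error, job_limit, delay_seconds=0.0):
    """Run up to job_limit jobs and return how many were submitted."""
    completed = 0
    for _ in range(job_limit):
        job = get_job()
        if job is None:
            # Assumption: no job available means the node stops.
            break
        weather = fetch_weather(job)
        if weather is None:
            # The request failed, so tell the DSN the job could not be completed.
            report_error()
        else:
            submit(weather, job)
            completed += 1
        # Pause between jobs to respect the Open-Meteo request limits.
        time.sleep(delay_seconds)
    return completed


jobs = iter([{"id": 1}, {"id": 2}, None])
submitted, errors = [], []
done = scraping_node_process(
    get_job=lambda: next(jobs),
    fetch_weather=lambda job: None if job["id"] == 2 else {"ok": True},
    submit=lambda weather, job: submitted.append(job["id"]),
    report_error=lambda: errors.append("failed"),
    job_limit=5,
)
```

In the usage above the second job fails at the weather request, so one job is submitted and one error is reported before the empty job ends the loop.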
send_error_response()
This method sends a POST request to the DSN indicating that an error occurred at some point in the current job and that it could not be completed.
Source code in app/main.py
245 to 254