Awesome
Get daily weather station data (Global)
<!-- [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/scotthosking/get-station-data/master?filepath=ghcn_monthly_data.ipynb) -->A set of Python tools to make it easier to extract weather station data (e.g., temperature, precipitation) from the Global Historical Climatology Network - Daily (GHCND)
"The Global Historical Climatology Network daily (GHCNd) is an integrated database of daily climate summaries from land surface stations across the globe. GHCNd is made up of daily climate records from numerous sources that have been integrated and subjected to a common suite of quality assurance reviews. GHCNd contains records from more than 100,000 stations in 180 countries and territories. NCEI provides numerous daily variables, including maximum and minimum temperature, total daily precipitation, snowfall, and snow depth. About half the stations only report precipitation. Both record length and period of record vary by station and cover intervals ranging from less than a year to more than 175 years." source
More information on the data can be found here
<!-- See Examples: * [Get daily data](https://scotthosking.com/notebooks/ghcn_daily/) * [Get monthly data](https://scotthosking.com/notebooks/ghcn_monthly/) -->Installation
<!-- 1. **Install the package via [PyPi](https://pypi.org/project/get-station-data/)**: which tends to be the most user-friendly option: ```bash pip install get-station-data ``` -->- Install from the source code:
- Clone the repository source code:
git clone https://github.com/scotthosking/get-station-data.git
- Install along with its dependencies:
cd /path/to/my/get-station-data
pip install -v -e .
Worked through example
from get_station_data import ghcnd
from get_station_data.util import nearest_stn
%matplotlib inline
Read station metadata
stn_md = ghcnd.get_stn_metadata()
Choose a location (lon/lat) and number of nearest neighbours
london_lon_lat = -0.1278, 51.5074
my_stns = nearest_stn(stn_md,
london_lon_lat[0], london_lon_lat[1],
n_neighbours=5 )
my_stns
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>station</th>
<th>lat</th>
<th>lon</th>
<th>elev</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<th>52113</th>
<td>UKE00105915</td>
<td>51.5608</td>
<td>0.1789</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>52165</th>
<td>UKM00003772</td>
<td>51.4780</td>
<td>-0.4610</td>
<td>25.3</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>52098</th>
<td>UKE00105900</td>
<td>51.8067</td>
<td>0.3581</td>
<td>128.0</td>
<td>ROTHAMSTED</td>
</tr>
<tr>
<th>52191</th>
<td>UKW00035054</td>
<td>51.2833</td>
<td>0.4000</td>
<td>91.1</td>
<td>WEST MALLING</td>
</tr>
<tr>
<th>52131</th>
<td>UKE00107650</td>
<td>51.4789</td>
<td>0.4489</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
</tbody>
</table>
</div>
Download and extract data into a pandas DataFrame
df = ghcnd.get_data(my_stns)
df.head()
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>station</th>
<th>year</th>
<th>month</th>
<th>day</th>
<th>element</th>
<th>value</th>
<th>mflag</th>
<th>qflag</th>
<th>sflag</th>
<th>date</th>
<th>lon</th>
<th>lat</th>
<th>elev</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>UKE00105915</td>
<td>1959</td>
<td>12</td>
<td>1</td>
<td>TMAX</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>1959-12-01</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>1</th>
<td>UKE00105915</td>
<td>1959</td>
<td>12</td>
<td>2</td>
<td>TMAX</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>1959-12-02</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>2</th>
<td>UKE00105915</td>
<td>1959</td>
<td>12</td>
<td>3</td>
<td>TMAX</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>1959-12-03</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>3</th>
<td>UKE00105915</td>
<td>1959</td>
<td>12</td>
<td>4</td>
<td>TMAX</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>1959-12-04</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>4</th>
<td>UKE00105915</td>
<td>1959</td>
<td>12</td>
<td>5</td>
<td>TMAX</td>
<td>NaN</td>
<td></td>
<td></td>
<td></td>
<td>1959-12-05</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
</tbody>
</table>
</div>
Filter data for, e.g., a single variable
var = 'PRCP' # precipitation
df = df[ df['element'] == var ]
### Tidy up columns
df = df.rename(index=str, columns={"value": var})
df = df.drop(['element'], axis=1)
df.head()
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>station</th>
<th>year</th>
<th>month</th>
<th>day</th>
<th>PRCP</th>
<th>mflag</th>
<th>qflag</th>
<th>sflag</th>
<th>date</th>
<th>lon</th>
<th>lat</th>
<th>elev</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<th>93</th>
<td>UKE00105915</td>
<td>1960</td>
<td>1</td>
<td>1</td>
<td>2.5</td>
<td></td>
<td></td>
<td>E</td>
<td>1960-01-01</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>94</th>
<td>UKE00105915</td>
<td>1960</td>
<td>1</td>
<td>2</td>
<td>1.5</td>
<td></td>
<td></td>
<td>E</td>
<td>1960-01-02</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>95</th>
<td>UKE00105915</td>
<td>1960</td>
<td>1</td>
<td>3</td>
<td>1.0</td>
<td></td>
<td></td>
<td>E</td>
<td>1960-01-03</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>96</th>
<td>UKE00105915</td>
<td>1960</td>
<td>1</td>
<td>4</td>
<td>0.8</td>
<td></td>
<td></td>
<td>E</td>
<td>1960-01-04</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
<tr>
<th>97</th>
<td>UKE00105915</td>
<td>1960</td>
<td>1</td>
<td>5</td>
<td>0.0</td>
<td></td>
<td></td>
<td>E</td>
<td>1960-01-05</td>
<td>0.1789</td>
<td>51.5608</td>
<td>137.0</td>
<td>HAMPSTEAD</td>
</tr>
</tbody>
</table>
</div>
df.drop(columns=['mflag','qflag','sflag']).tail(n=10)
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>station</th>
<th>year</th>
<th>month</th>
<th>day</th>
<th>PRCP</th>
<th>date</th>
<th>lon</th>
<th>lat</th>
<th>elev</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<th>83938</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>22</td>
<td>0.0</td>
<td>2016-12-22</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83939</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>23</td>
<td>1.4</td>
<td>2016-12-23</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83940</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>24</td>
<td>0.0</td>
<td>2016-12-24</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83941</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>25</td>
<td>1.0</td>
<td>2016-12-25</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83942</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>26</td>
<td>0.0</td>
<td>2016-12-26</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83943</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>27</td>
<td>0.0</td>
<td>2016-12-27</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83944</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>28</td>
<td>0.2</td>
<td>2016-12-28</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83945</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>29</td>
<td>0.4</td>
<td>2016-12-29</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83946</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>30</td>
<td>0.0</td>
<td>2016-12-30</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
<tr>
<th>83947</th>
<td>UKE00107650</td>
<td>2016</td>
<td>12</td>
<td>31</td>
<td>0.4</td>
<td>2016-12-31</td>
<td>0.4489</td>
<td>51.4789</td>
<td>25.0</td>
<td>HEATHROW</td>
</tr>
</tbody>
</table>
</div>
Save to file
df.to_csv('London_5stns_GHCN-D.csv', index=False)
Plot histogram of all data
df['PRCP'].plot.hist(bins=40)
<matplotlib.axes._subplots.AxesSubplot at 0x11ae36898>
Plot time series for one station
heathrow = df[ df['name'] == 'HEATHROW' ]
heathrow['PRCP'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x81f0d7240>