Read Data from a Vaisala WXT536


[Image: Vaisala WXT536]

Overview

The Vaisala WXT536 is a multi-parameter weather sensor that measures atmospheric temperature, pressure, humidity, wind speed and direction, and precipitation.

High-quality atmospheric observations describe the environmental conditions at the instrument's location. They also let us estimate how gases and particles are transported through the atmosphere.

Combined with the additional sensors attached to a CROCUS Level 3 Node, these observations let us evaluate the concentration of pollutants at the node, the transport of these pollutants into and out of the area around the node, and their removal from the atmosphere by precipitation.

Prerequisites

All WXT-536s are connected to a Waggle Edge Computing server as part of CROCUS's Level 1 Nodes.

The Waggle Edge Computing server allows real-time adjustment of sensor parameters and dynamic processing of the data by edge-computing code.

Concepts                       | Importance | Notes
Intro to Waggle Sensors        | Helpful    | Familiarity with Waggle Sensors
Waggle/Sage Data Client        | Necessary  | Accessing Data
Working with Pandas Dataframes | Necessary  | Understanding Data Object

  • Time to Learn: 60 minutes

Interact with WXT-536 Data

We will access data from CROCUS Level 3 Node W08D at Northeastern Illinois University. This node has the same WXT sensor as the one deployed at the Argonne Testbed for Multiscale Observational Science (ATMOS), shown below.

[Image: CROCUS Level 1 Node (W057)]

Imports

We need some core packages to work with the data.

import sage_data_client
import matplotlib.pyplot as plt
import pandas as pd

from matplotlib.dates import DateFormatter

Searching for the Data

Please check the Waggle Node W08D user interface to confirm that the vaisala-wxt536 plugin is running.

The vaisala-wxt536 plugin lets the Waggle node interpret the data sent from the instrument and transmit it to cloud storage for analysis.

To query data from a Waggle sensor, we need:

  1. Start of time period we are interested in (YYYY-MM-DDTHH:MM:SSZ format)

  2. End of time period we are interested in (YYYY-MM-DDTHH:MM:SSZ format)

  3. Waggle Node Number (e.g. W08D)

  4. Sensor Name (e.g. ‘vaisala-wxt536’)
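The start and end times must be UTC strings in the YYYY-MM-DDTHH:MM:SSZ format. The sketch below builds them from Python datetime objects, assuming you prefer that to typing the strings by hand. The helper name make_query_window is hypothetical and is not part of sage_data_client.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical helper (not part of sage_data_client): build start/end strings
# in the YYYY-MM-DDTHH:MM:SSZ format used by the queries in this notebook.
def make_query_window(start: datetime, hours: float) -> tuple[str, str]:
    # Convert to UTC so the trailing "Z" (UTC) suffix is correct.
    # Note: a naive datetime is interpreted as local time by astimezone().
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(hours=hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start_utc.strftime(fmt), end_utc.strftime(fmt)


start, end = make_query_window(datetime(2023, 7, 2, 12, tzinfo=timezone.utc), hours=1)
print(start, end)
```

You can then pass the two strings as start= and end= to sage_data_client.query.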

# Query the SAGE data client to access the data
# Note: We need to know the time period we are interested in
df = sage_data_client.query(start="2023-07-02T12:00:00Z",
                            end="2023-07-02T13:00:00Z", 
                            filter={
                                "vsn": "W08D",
                                "sensor": "vaisala-wxt536"
                            }
)
df = df.set_index("timestamp")
# Inspect the timestamp index; its 'length' is the number of rows in the dataset
df.index
DatetimeIndex(['2023-07-02 12:00:00.029156535+00:00',
               '2023-07-02 12:00:00.107252119+00:00',
               '2023-07-02 12:00:00.206164123+00:00',
               '2023-07-02 12:00:00.283311181+00:00',
               '2023-07-02 12:00:00.364271449+00:00',
               '2023-07-02 12:00:00.442495483+00:00',
               '2023-07-02 12:00:00.525232643+00:00',
               '2023-07-02 12:00:00.610763159+00:00',
               '2023-07-02 12:00:00.697874627+00:00',
               '2023-07-02 12:00:00.778449770+00:00',
               ...
               '2023-07-02 12:59:59.200688231+00:00',
               '2023-07-02 12:59:59.280641431+00:00',
               '2023-07-02 12:59:59.361137582+00:00',
               '2023-07-02 12:59:59.441958059+00:00',
               '2023-07-02 12:59:59.520813355+00:00',
               '2023-07-02 12:59:59.600882716+00:00',
               '2023-07-02 12:59:59.684669667+00:00',
               '2023-07-02 12:59:59.760763484+00:00',
               '2023-07-02 12:59:59.841247667+00:00',
               '2023-07-02 12:59:59.920791677+00:00'],
              dtype='datetime64[ns, UTC]', name='timestamp', length=350864, freq=None)
# List the columns within the dataset
df.columns
Index(['name', 'value', 'meta.host', 'meta.job', 'meta.missing', 'meta.node',
       'meta.plugin', 'meta.sensor', 'meta.task', 'meta.units', 'meta.vsn',
       'meta.zone'],
      dtype='object')
# List the unique measurement names in the 'name' column
print(set(df['name']))
{'wxt.env.humidity', 'wxt.env.temp', 'wxt.rain.accumulation', 'wxt.heater.volt', 'wxt.wind.direction', 'wxt.wind.speed', 'wxt.heater.temp', 'wxt.env.pressure'}
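The meta.units column stores the units for each measurement name. The sketch below pairs each name with its units and counts the samples per name, assuming df has the long layout shown above. So that it runs on its own, it uses a small stand-in frame. The values and unit strings in that frame are illustrative placeholders, not data from W08D.

```python
import pandas as pd

# Illustrative stand-in for the queried data (values and unit strings are made up)
df = pd.DataFrame(
    {
        "name": ["wxt.env.temp", "wxt.env.temp", "wxt.wind.speed"],
        "value": [20.6, 20.5, 1.2],
        "meta.units": ["degree Celsius", "degree Celsius", "m/s"],
    },
    index=pd.to_datetime(
        ["2023-07-02 12:00:00", "2023-07-02 12:00:01", "2023-07-02 12:00:00"], utc=True
    ),
)

# One row per measurement name: its units and the number of non-missing samples
summary = df.groupby("name").agg(
    units=("meta.units", "first"),
    samples=("value", "count"),
)
print(summary)
```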

Danger

Data is returned only for dates on which the sensor recorded observations. Otherwise, the returned DataFrame will be empty, as in the example below.

# Query the SAGE data client to access the data
# Note: We need to know the time period we are interested in
no_data_df = sage_data_client.query(start="2099-04-10T12:00:00Z",
                                    end="2099-04-10T15:00:00Z", 
                                    filter={
                                        "vsn": "W08D",
                                        "sensor": "vaisala-wxt536"
                                    }
)
no_data_df
timestamp name value
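An empty result does not raise an error. It only causes confusing failures later, for example when plotting. A minimal guard is to check DataFrame.empty right after querying. The helper require_data below is hypothetical; it assumes you would rather stop early than continue with no rows.

```python
import pandas as pd


# Hypothetical guard: stop early with a clear message if a query returned no rows
def require_data(frame: pd.DataFrame, label: str) -> pd.DataFrame:
    if frame.empty:
        raise ValueError(f"No data returned for {label}; check the dates and plugin status")
    return frame


# An empty frame like the 2099 query result above (columns: timestamp, name, value)
no_data_df = pd.DataFrame(columns=["timestamp", "name", "value"])
try:
    require_data(no_data_df, "W08D vaisala-wxt536, 2099-04-10")
except ValueError as err:
    print(err)
```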

Investigate the Dataset

To investigate the dataset, it helps to know how to access specific rows or columns within the data.

We will then plot the ambient temperature, the rainfall accumulation, and the wind speed and direction.

# isolate the environmental temperature data
df.loc[df['name'] == 'wxt.env.temp']
name value meta.host meta.job meta.missing meta.node meta.plugin meta.sensor meta.task meta.units meta.vsn meta.zone
timestamp
2023-07-02 12:00:00.029156535+00:00 wxt.env.temp 20.6 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2023-07-02 12:00:00.107252119+00:00 wxt.env.temp 20.6 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2023-07-02 12:00:00.206164123+00:00 wxt.env.temp 20.6 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2023-07-02 12:00:00.283311181+00:00 wxt.env.temp 20.6 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2023-07-02 12:00:00.364271449+00:00 wxt.env.temp 20.6 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
... ... ... ... ... ... ... ... ... ... ... ... ...
2023-07-02 12:59:59.600882716+00:00 wxt.env.temp 20.3 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2023-07-02 12:59:59.684669667+00:00 wxt.env.temp 20.3 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2023-07-02 12:59:59.760763484+00:00 wxt.env.temp 20.3 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2023-07-02 12:59:59.841247667+00:00 wxt.env.temp 20.3 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2023-07-02 12:59:59.920791677+00:00 wxt.env.temp 20.3 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core

43858 rows × 12 columns
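The selection above returns all 12 columns. When you only need the measured values, you can name the 'value' column in the same .loc call to get a plain time series. A groupby on 'name' splits every measurement into its own series in one pass. The sketch below runs on a small made-up frame with the same long layout; series_by_name is an illustrative name.

```python
import pandas as pd

# Illustrative long-format frame: one row per (timestamp, measurement) pair
df = pd.DataFrame(
    {
        "name": ["wxt.env.temp", "wxt.wind.speed", "wxt.env.temp", "wxt.wind.speed"],
        "value": [20.6, 1.2, 20.5, 1.4],
    },
    index=pd.to_datetime(
        ["2023-07-02 12:00:00", "2023-07-02 12:00:00",
         "2023-07-02 12:00:01", "2023-07-02 12:00:01"],
        utc=True,
    ),
)

# Same row selection as df.loc[df['name'] == 'wxt.env.temp'], keeping only 'value'
temp = df.loc[df["name"] == "wxt.env.temp", "value"]

# Or split every measurement into its own time series at once
series_by_name = {name: group["value"] for name, group in df.groupby("name")}
print(temp)
print(sorted(series_by_name))
```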

Plot the Temperature

Let’s start by plotting the temperature. This is stored as the wxt.env.temp variable, with units of degrees Celsius. We use Pandas directly here, calling .plot() with some extra parameters.

fig, axs = plt.subplots(figsize=(12, 4))

# Define Format of how the date is displayed
date_form = DateFormatter("%Y-%m-%d \n %H:%M:%S")

# Note: Use the Pandas DataFrame 'loc' functionality to search the object for the ambient temp
df.loc[df['name'] == 'wxt.env.temp'].value.plot(
                                   xlabel='UTC Time \n [HH:MM:SS]',
                                   ylabel='Ambient Temperature \n [Degree Celsius]',
                                   title='CROCUS Level 3 Node (W08D) - Vaisala WXT536',
                                   ax=axs
                                  )

axs.xaxis.set_major_formatter(date_form)
axs.grid(True)
[Figure: Ambient temperature, CROCUS Level 3 Node (W08D) - Vaisala WXT536]
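The same figure setup (select a name, plot, format dates, add a grid) is repeated for each variable in this notebook. One way to avoid the repetition is a small helper like plot_measurement below. It is hypothetical and assumes a frame indexed by timestamp with 'name' and 'value' columns, like df above. The Agg backend is set only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter


# Hypothetical helper wrapping the plotting steps repeated in this notebook
def plot_measurement(frame: pd.DataFrame, name: str, ylabel: str, ax=None,
                     title: str = "CROCUS Level 3 Node (W08D) - Vaisala WXT536"):
    if ax is None:
        _, ax = plt.subplots(figsize=(12, 4))
    # Select one measurement and plot its values against the timestamp index
    frame.loc[frame["name"] == name, "value"].plot(
        xlabel="UTC Time \n [HH:MM:SS]", ylabel=ylabel, title=title, ax=ax
    )
    ax.xaxis.set_major_formatter(DateFormatter("%Y-%m-%d \n %H:%M:%S"))
    ax.grid(True)
    return ax


# Usage with the queried data:
# ax = plot_measurement(df, "wxt.env.temp", "Ambient Temperature \n [Degree Celsius]")
```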

Plot the Rainfall Accumulation

One of the variables in the dataset is wxt.rain.accumulation, the total rainfall since the instrument was first configured. One additional calculation turns it into the rainfall total for an event: subtract the first value of the event period from each value in the period.

We start by subsetting our accumulation variable.

accumulation = df.loc[df['name'] == 'wxt.rain.accumulation']

After subsetting our field, we calculate the event total as described above.

single_event_total = accumulation.value - accumulation.value.values[0]
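To make the arithmetic concrete, the sketch below applies it to a few made-up accumulation values. It shows the running event total computed above next to a single event-total number (last value minus first). The .diff() step gives the rain that fell in each interval; it is an extra illustration not used elsewhere in this notebook. Both assume the counter does not reset during the period.

```python
import pandas as pd

# Illustrative cumulative rain counter in mm (made-up values, not W08D data)
idx = pd.to_datetime(["2023-07-02 12:00", "2023-07-02 12:20",
                      "2023-07-02 12:40", "2023-07-02 13:00"], utc=True)
accum_value = pd.Series([105.2, 105.2, 106.0, 107.5], index=idx)

# Running event total: subtract the first value of the period from every value
running_total = accum_value - accum_value.iloc[0]

# Single event total in mm: last value minus first value
event_total = accum_value.iloc[-1] - accum_value.iloc[0]

# Rain that fell in each interval (the first entry is NaN: no previous value)
increments = accum_value.diff()
print(running_total.round(2).tolist(), round(event_total, 2))
```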

Now we can plot the results, adding proper units and labels.

fig, axs = plt.subplots(figsize=(12, 4))

# Define Format of how the date is displayed
date_form = DateFormatter("%Y-%m-%d \n %H:%M:%S")

# Note: Since we already subset the data, we use our processed single_event_total variable
single_event_total.plot(xlabel='UTC Time \n [HH:MM:SS]',
                        ylabel='Precipitation Accumulation \n [mm]',
                        title='CROCUS Level 3 Node (W08D) - Vaisala WXT536',
                        ax=axs
                       )

axs.xaxis.set_major_formatter(date_form)
axs.grid(True)
[Figure: Precipitation accumulation, CROCUS Level 3 Node (W08D) - Vaisala WXT536]

Plot the Wind Speed/Wind Direction

Wind speed and direction are important meteorological variables. We can plot both on a single figure with separate y-axes to see how wind speed and direction vary over time.

fig, axs = plt.subplots(figsize=(12, 4))

# Define Format of how the date is displayed
date_form = DateFormatter("%Y-%m-%d \n %H:%M:%S")

# Note: Use the Pandas DataFrame 'loc' functionality to search the object for the wind speed
df.loc[df['name'] == 'wxt.wind.speed'].value.plot(
                                           xlabel='UTC Time \n [HH:MM:SS]',
                                           ylabel='Wind Speed \n [meters per second]',
                                           title='CROCUS Level 3 Node (W08D) - Vaisala WXT536',
                                           ax=axs,
                                           color='C0'
                                          )
# Set the left hand y-axis label and color to match variable displayed
axs.set_ylabel('Wind Speed \n [meters per second]', 
               color='tab:blue',
               )
# Change left hand y-axis tick colors to match variable displayed
axs.tick_params(axis='y', labelcolor='tab:blue')

ax2 = axs.twinx()  # instantiate a second axes that shares the same x-axis

# Note: Use the Pandas DataFrame 'loc' functionality to search the object for the wind direction
df.loc[df['name'] == 'wxt.wind.direction'].value.plot(
                                                xlabel='UTC Time \n [HH:MM:SS]',
                                                title='CROCUS Level 3 Node (W08D) - Vaisala WXT536',
                                                ax=ax2,
                                                color='C1'
                                               )
ax2.set_ylabel('Wind Direction \n [Degrees from North]', 
               color='tab:orange',
               )
ax2.tick_params(axis='y', labelcolor='tab:orange')

axs.xaxis.set_major_formatter(date_form)
axs.grid(True)
[Figure: Wind speed and wind direction, CROCUS Level 3 Node (W08D) - Vaisala WXT536]
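With many samples per second, the raw wind traces are dense. One common way to smooth them is to resample to one-minute means. Wind direction is circular, though: an arithmetic mean of 350° and 10° gives 180° instead of north. The sketch below averages direction through unit-vector (sine/cosine) components and averages speed as a plain scalar mean. It assumes speed and direction are Series with a DatetimeIndex, such as the 'value' column of each selection above. The function name resample_wind is hypothetical.

```python
import numpy as np
import pandas as pd


# Hypothetical helper: resample wind speed (scalar mean) and wind direction
# (unit-vector mean) onto a regular time step such as one minute.
def resample_wind(speed: pd.Series, direction: pd.Series, rule: str = "1min") -> pd.DataFrame:
    rad = np.deg2rad(direction)
    # Average the sine and cosine of the direction, then convert back to degrees
    sin_mean = np.sin(rad).resample(rule).mean()
    cos_mean = np.cos(rad).resample(rule).mean()
    dir_mean = np.rad2deg(np.arctan2(sin_mean, cos_mean)) % 360
    return pd.DataFrame({"speed": speed.resample(rule).mean(), "direction": dir_mean})
```

Usage with the queried data might look like resample_wind(df.loc[df['name'] == 'wxt.wind.speed', 'value'], df.loc[df['name'] == 'wxt.wind.direction', 'value']).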

Display Diurnal Temperature Cycle

Let's grab a full day of temperature data (2023-07-02, UTC) to visualize the daily cycle.

Info

If you are only interested in a specific measurement, you can also query specific variables using the ‘name’ filter.

# Query the SAGE data client to access the data; use 'name' filter

df_temp = sage_data_client.query(start="2023-07-02T00:00:00Z",
                                 end="2023-07-03T00:00:00Z",
                                 filter={
                                     "name": "wxt.env.temp",
                                     "vsn": "W08D",
                                     "sensor": "vaisala-wxt536"
                                 }
)
df_temp
timestamp name value meta.host meta.job meta.missing meta.node meta.plugin meta.sensor meta.task meta.units meta.vsn meta.zone
0 2023-07-02 00:00:00.045553673+00:00 wxt.env.temp 23.1 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
1 2023-07-02 00:00:00.139277863+00:00 wxt.env.temp 23.1 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
2 2023-07-02 00:00:00.247003531+00:00 wxt.env.temp 23.1 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
3 2023-07-02 00:00:00.333327344+00:00 wxt.env.temp 23.1 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
4 2023-07-02 00:00:00.413374638+00:00 wxt.env.temp 23.1 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
... ... ... ... ... ... ... ... ... ... ... ... ... ...
1052501 2023-07-02 23:59:59.631203637+00:00 wxt.env.temp 17.9 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
1052502 2023-07-02 23:59:59.713241159+00:00 wxt.env.temp 17.9 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
1052503 2023-07-02 23:59:59.790870013+00:00 wxt.env.temp 17.9 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
1052504 2023-07-02 23:59:59.872975312+00:00 wxt.env.temp 17.9 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core
1052505 2023-07-02 23:59:59.951280016+00:00 wxt.env.temp 17.9 000048b02d3ae277.ws-nxcore waggle-wxt536 -9999.9 000048b02d3ae277 registry.sagecontinuum.org/jrobrien/waggle-wxt... vaisala-wxt536 waggle-wxt536 degree Celsius W08D core

1052506 rows × 13 columns

fig, axs = plt.subplots(figsize=(12, 4))

# Define Format of how the date is displayed
date_form = DateFormatter("%Y-%m-%d \n %H:%M:%S")

# Note: The query already kept only 'wxt.env.temp', so we just index by timestamp and plot the values
df_temp.set_index('timestamp').value.plot(xlabel='UTC Time \n [HH:MM:SS]',
                                          ylabel='Ambient Temperature \n [Degree Celsius]',
                                          title='CROCUS Level 3 Node (W08D) - Vaisala WXT536',
                                          ax=axs
                                          )

axs.xaxis.set_major_formatter(date_form)
axs.grid(True)
[Figure: Ambient temperature over 2023-07-02, CROCUS Level 3 Node (W08D) - Vaisala WXT536]
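Plotting over a million points is slow, and most of the detail is sample-to-sample noise. Aggregating to hourly statistics keeps the daily cycle visible with 24 points per day. The sketch below assumes df_temp has the 'timestamp' and 'value' columns shown above; the helper name hourly_stats is hypothetical.

```python
import pandas as pd


# Hypothetical helper: reduce high-rate temperature samples to hourly statistics
def hourly_stats(frame: pd.DataFrame) -> pd.DataFrame:
    # 'timestamp' is a regular column in df_temp, so index by it before resampling
    return frame.set_index("timestamp")["value"].resample("1h").agg(["mean", "min", "max"])


# Usage with the queried data:
# hourly = hourly_stats(df_temp)
# hourly["mean"].plot(ax=axs)
```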

Save Your Selected Dataset

Warning

Be careful! Saving data in a comma separated values (CSV) file format takes up a lot of space!

# Instead of saving every value explored previously, I am just going to save the variable I am interested in (i.e. temperature)
# Comma Separated Values (CSV) files are common text files that can be read with any text editor (or Microsoft Excel / Mac Numbers)
df_temp.to_csv('wxt536_W08D_ambient_temp.csv')
# Let's check the size of that file using simple unix commands
!ls -lh 'wxt536_W08D_ambient_temp.csv'
-rw-r--r-- 1 runner docker 242M Dec 27 20:37 wxt536_W08D_ambient_temp.csv
  • 242 MB is a large file for just one environmental parameter.

  • It would be much larger if we saved every measured variable, or a longer period (a week, a month, etc.).
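The `!ls -lh` command works only inside IPython/Jupyter on a Unix-like system. A portable alternative is os.path.getsize. The sketch below reports sizes in mebibytes (1024 × 1024 bytes), the unit `ls -lh` uses for its "M" suffix. The helper name file_size_mib and the example file are illustrative.

```python
import os
import tempfile

import pandas as pd


# Portable file-size check (hypothetical helper), in the same units as "ls -lh"
def file_size_mib(path: str) -> float:
    return os.path.getsize(path) / (1024 * 1024)


# Example: write a tiny CSV to a temporary directory and report its size
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    pd.DataFrame({"value": [20.6, 20.5, 20.3]}).to_csv(path)
    print(f"{file_size_mib(path):.6f} MiB")
```

With the real output, file_size_mib('wxt536_W08D_ambient_temp.csv') should agree with the `ls -lh` result above, up to rounding.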

Info

Pandas DataFrames support multiple file formats for saving data, including the binary Parquet format. Parquet files are compressed, which saves space.

df_temp.to_parquet('wxt536_W08D_ambient_temp.parquet',
                   use_deprecated_int96_timestamps=True)
# Let's check the size of the parquet file
!ls -lh 'wxt536_W08D_ambient_temp.parquet'
-rw-r--r-- 1 runner docker 7.3M Dec 27 20:37 wxt536_W08D_ambient_temp.parquet
  • Only 7.3 MB to save the same data in this format!

  • Less than 4% of the space taken up by the same data in CSV format!