It is often convenient and efficient to write raw data to files. Converting numbers to strings is expensive and increases file size severalfold. In scientific computing, where the data is often just a few types of very large arrays, raw data is not hard to convert into a form suitable for processing, and short descriptors can be written once to aid the conversion. However, when the data is more complex, such as a time series with a variety of types, this shifts a lot of effort to the data analysis side. It also makes software updates dangerous, because the logging and analysis code must stay consistent.
This has come up several times for me when working on embedded projects, and I don’t have a great solution but I can share what I’ve settled on in C/C++. Please comment with suggestions if you have ‘killed’ this problem.
I have a microcontroller wired up to a number of sensors running at rates between 1 Hz and 500 Hz, and I’m logging the data to an SD card. Writes are batched with a double-buffering system so that they occur less often than the worst-case SD card latency (~250 ms). Each buffer holds a series of structs, each prefixed by a typecode.
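To make that layout concrete, here is a minimal sketch of how the analysis side might walk such a stream. It assumes a one-byte typecode followed directly by a fixed-size payload. The `PAYLOAD_SIZES` table, the `split_packets` helper, and the sizes in it are made up for illustration; real sizes would come from `sizeof` on the target.

```python
import struct

# Hypothetical payload sizes in bytes, keyed by typecode.
PAYLOAD_SIZES = {1: 8, 2: 4}


def split_packets(log, sizes):
    """Yield (typecode, payload) pairs from a typecode-prefixed byte stream."""
    pos = 0
    while pos < len(log):
        typecode = log[pos]  # assumed: one unsigned byte per typecode
        if typecode not in sizes:
            raise ValueError(f"unknown typecode {typecode} at byte {pos}")
        size = sizes[typecode]
        payload = log[pos + 1 : pos + 1 + size]
        if len(payload) != size:
            raise ValueError(f"truncated packet at byte {pos}")
        yield typecode, payload
        pos += 1 + size


# Two packets back to back: typecode 2 (4-byte payload), then typecode 1 (8 bytes).
example_log = (
    bytes([2]) + struct.pack("<I", 1000)
    + bytes([1]) + struct.pack("<II", 7, 9)
)
packets = list(split_packets(example_log, PAYLOAD_SIZES))
```

Note that a single wrong size desynchronizes everything after it, which is why the logging and analysis sides have to agree exactly.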
The first pain point here is the typecode. Given a list of packet definitions, how do you enforce that all the typecodes are unique, so that the most impaired contributor won’t mess anything up?
With Python, you can do something like the following:
import inspect

class packet_A:
    def __init__(self, a):
        self.a = a

class packet_B:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# After the definitions, register typecodes
def is_packet(name):
    return "packet" in name

# Collect the packet classes defined in this module, in definition order
class_list = [obj for obj in globals().values() if
              inspect.isclass(obj) and
              obj.__module__ == __name__ and
              is_packet(obj.__name__)]

# Number them 1, 2, 3, ... so that every class gets its own typecode
for counter, cls in enumerate(class_list, start=1):
    cls.typecode = counter
When anyone adds a packet definition the packet will get its own unique typecode. Modifying class properties at runtime can make debugging harder, but a search for ‘typecode’ will reveal this file and hopefully make it clear what is going wrong.
Until reflection comes to C++, this kind of thing is off the table. One can do something similar with macros and the compiler extension __COUNTER__, but the result is obscure enough that doing things by hand is better. I just make sure that the pattern is visible in the header file where the packets are defined:
struct packet_rtc {
    struct timer_time t;
    RTC_TimeTypeDef sTime;
    RTC_DateTypeDef sDate;
};

struct packet_vbatt {
    struct timer_time t;
    uint16_t vbatt_cnts;
};

struct packet_imu {
    struct timer_time t;
    uint16_t readings_cnts[4];
};

// One overload per packet type; keep the return values unique
__inline__ uint8_t typecode(struct packet_rtc) {
    return 1;
}

__inline__ uint8_t typecode(struct packet_vbatt) {
    return 2;
}

__inline__ uint8_t typecode(struct packet_imu) {
    return 3;
}
Now that that is ‘solved’, is there an automated way to find the structure of these packets (member offsets and sizes)? I’ve been fighting with this for a bit and playing with different C++ reflection libraries; ideally all the work would be done at compile time, since all the information is available then. In the end I settled on automatically generating the descriptor-generating source file with a Python C++ header-parsing library:
import cxxheaderparser.simple
import cxxheaderparser.types

# Remove __inline__ from the header text before handing it to the parser
with open("Core/Inc/packet.hpp", "r") as f:
    header_text = f.read().replace("__inline__", "")
data = cxxheaderparser.simple.parse_string(header_text)

work_string = ""
for class_type in data.namespace.classes:
    class_name = class_type.class_decl.typename.segments[0].name
    # Each line starts with the struct name and its size. The casts are there
    # because %d expects an int, while sizeof and offsetof yield a size_t.
    to_format_string = "%s %d"
    format_args = f'"{class_name}", (int)sizeof({class_name})'
    for field in class_type.fields:
        if isinstance(field.type, cxxheaderparser.types.Array):
            # Array member: report it as "type name[count]"
            element_type = field.type.array_of.typename.segments[0].name
            element_num = field.type.size.tokens[0].value
            element_name = field.name
            to_format_string += " %s %d"
            format_args += f', "{element_type} {element_name}[{element_num}]", (int)offsetof({class_name},{element_name})'
        elif isinstance(field.type, cxxheaderparser.types.Type):
            # Plain member: report it as "type name"
            element_type = field.type.typename.segments[0].name
            element_name = field.name
            to_format_string += " %s %d"
            format_args += f', "{element_type} {element_name}", (int)offsetof({class_name},{element_name})'
    to_format_string += "\\n\\r"
    # One snprintf and one USB transmit per packet type
    work_string += f'    cx = snprintf(buff, sizeof(buff), "{to_format_string}", {format_args});\n'
    work_string += '    CDC_Transmit_FS((uint8_t*)buff, cx);\n\n'

h_file_text = """
void data_description();
"""

c_file_text = ("""#include "datadescriptor.hpp"
#include <cstddef>
#include <cstdio>
#include "usbd_cdc_if.h"
#include "packet.hpp"

void data_description() {
    char buff[200];
    int cx;
""" + work_string +
"}\n")

with open("Core/Src/datadescriptor.cpp", "w") as f:
    f.write(c_file_text)
with open("Core/Inc/datadescriptor.hpp", "w") as f:
    f.write(h_file_text)
Then the function data_description can be called on the embedded system to generate a string description of each packet, with member types, names and offsets. This description can be written at the beginning of the log and parsed to aid in data analysis.
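On the analysis side, each line has the shape "name size type member offset type member offset ...", so parsing it can be fairly mechanical. The following is a rough sketch that assumes every type name is a single token (a type such as `unsigned int` would break the grouping in threes). The `parse_descriptor` helper and the numbers in the example line are invented for illustration.

```python
import re


def parse_descriptor(line):
    """Parse one descriptor line into (struct name, size, list of fields)."""
    tokens = line.split()  # split() also discards the trailing "\n\r"
    name, size = tokens[0], int(tokens[1])
    rest = tokens[2:]
    if len(rest) % 3 != 0:
        raise ValueError(f"malformed descriptor: {line!r}")
    fields = []
    for i in range(0, len(rest), 3):
        ctype, member, offset = rest[i], rest[i + 1], int(rest[i + 2])
        # An array member looks like "readings_cnts[4]"; scalars get count 1.
        match = re.fullmatch(r"(\w+)\[(\d+)\]", member)
        if match:
            member, count = match.group(1), int(match.group(2))
        else:
            count = 1
        fields.append(
            {"type": ctype, "name": member, "count": count, "offset": offset}
        )
    return name, size, fields


# Hypothetical line; the size and offsets are not taken from real hardware.
example = "packet_imu 16 timer_time t 0 uint16_t readings_cnts[4] 8\n\r"
name, size, fields = parse_descriptor(example)
```

The resulting field list could then be turned into `struct` format strings or a NumPy dtype, depending on how the rest of the analysis is written.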
I don’t find this code-generation approach very satisfying, but it is better than writing out each string by hand for every packet type.