C++ screw yous: Definition order matters
So I was finishing up an implementation and getting ready to close a ticket when I decided to use one more commit to update old comments and generally make the code look prettier. One of the things I noticed was that a class declared several string constants like this:
/* myclass.hxx */
#include <string>
class MyClass
{
public:
static const std::string string_before;
static const std::string string_after;
};
In the .cxx file they were defined in a pretty straightforward way, along with other stuff (a map, but for simplicity I’m using a list here) that used those strings, generating a number of constant static objects:
/* myclass.cxx */
#include "myclass.hxx"
#include <list>
#include <string>
//Definition of static strings declared in the class
const std::string MyClass::string_before = "Hello";
const std::string MyClass::string_after = "World!";
//Constant structure that uses
//previously declared strings
static const std::list<std::string> some_list ={
MyClass::string_before,
MyClass::string_after,
};
Then somewhere down the line, a piece of code would iterate over that list, not unlike this test code below:
/* myclass.cxx (continued) */
#include <iostream>
void print_list()
{
for(const std::string &s : some_list)
{
std::cout << "-> " << s << std::endl;
}
}
The output of calling print_list() is the expected:
-> Hello
-> World!
For some reason, it made sense to keep the string definitions close to other definitions. Since the declaration of these constants lives in the header file, moving a definition around shouldn’t change the program, right?
Wrong
After changing the definition from the original:
/* myclass.cxx */
#include "myclass.hxx"
#include <list>
#include <string>
//Definition of static strings declared in the class
const std::string MyClass::string_before = "Hello";
const std::string MyClass::string_after = "World!";
//Constant structure that uses
//previously declared strings
static const std::list<std::string> some_list ={
MyClass::string_before,
MyClass::string_after,
};
To this:
/* myclass.cxx */
#include "myclass.hxx"
#include <list>
#include <string>
//Constant structure that uses
//previously declared strings
static const std::list<std::string> some_list ={
MyClass::string_before,
MyClass::string_after,
};
//Definition of static strings declared in the class
const std::string MyClass::string_before = "Hello";
const std::string MyClass::string_after = "World!";
The program started crashing due to keys not found or duplicated keys, all in a commit that consisted only of cosmetic changes and documentation. Somehow, changing the order of definition (but keeping the same declaration order) managed to change the program.
If we ran the print_list() function again, we would notice a different output:
->
->
Yup, two empty strings. And to add insult to injury, trying to print the (supposedly) same strings directly from the function, by adding these two lines after the loop in print_list():
std::cout << "-> " << MyClass::string_before << std::endl;
std::cout << "-> " << MyClass::string_after << std::endl;
Yields the odd output:
->
->
-> Hello
-> World!
Which means MyClass::string_before and MyClass::string_after have two very different values depending on where they were defined and where they were used. Taking it one step further, to check whether the definition order was indeed what was causing it, I defined one string on each side of the list:
/* myclass.cxx */
#include "myclass.hxx"
#include <list>
#include <string>
const std::string MyClass::string_before = "Hello ";
static const std::list<std::string> some_list ={
MyClass::string_before,
MyClass::string_after,
};
const std::string MyClass::string_after = "World!";
Now yields the output:
-> Hello
->
OK, but why?
Static memory gets zero-initialized at startup, which sets every member of every static object to zero. For a class with a non-trivial constructor, such as std::string, the constructor has not run yet at this point, so the behaviour of these objects is not guaranteed.
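Then, within a single translation unit, each static object is dynamically initialized in the order in which it was defined, not declared (see section 3.6.2, [basic.start.init], of the C++11 standard; later editions renumber that section).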
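To see that rule in isolation, here is a minimal sketch. All the names in it (Tracer, Holder, init_log) are made up for the illustration and are not part of the original code. Each constructor records its own name in a log, and the static members are defined in the opposite order to their declarations:

```cpp
#include <string>
#include <vector>

// Hypothetical log. It is a function-local static so that it is
// guaranteed to exist before any Tracer constructor writes to it.
std::vector<std::string> &init_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type whose constructor has a visible side effect,
// which forces dynamic (run-time) initialization.
struct Tracer
{
    explicit Tracer(const char *name) { init_log().emplace_back(name); }
};

struct Holder
{
    static const Tracer first;  // declared first
    static const Tracer second; // declared second
};

// Defined in the opposite order to the declarations above.
const Tracer Holder::second("second");
const Tracer Holder::first("first");
```

Once start-up has finished, init_log() holds "second" followed by "first": the order of the definitions in the source file, not the order of the declarations in the class.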
When both strings are defined before the list, both are valid by the time the list needs initializing. When they are defined after the list, however, the list constructor copies a string object whose own constructor has not run yet. Formally that is undefined behaviour, and it is only by chance that the zeroed memory looks like a “valid” empty string here. Oddly enough, the compiler doesn’t find anything objectionable about this use-before-initialization.
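One common way around this is the construct-on-first-use idiom, sketched below. This is not what the original code did: the accessor functions and the get_list() helper are assumptions made for the illustration. The static data members become static member functions, each holding its string as a function-local static, which is constructed the first time the function is called:

```cpp
#include <list>
#include <string>

// Hypothetical rework of MyClass: accessors replace the static data members.
class MyClass
{
public:
    // A function-local static is initialized the first time control
    // passes through its declaration, so every caller gets a fully
    // constructed string, no matter when it asks.
    static const std::string &string_before()
    {
        static const std::string value = "Hello";
        return value;
    }

    static const std::string &string_after()
    {
        static const std::string value = "World!";
        return value;
    }
};

// This definition can now sit anywhere in the file: each accessor
// constructs its string on demand while the list is being built.
static const std::list<std::string> some_list = {
    MyClass::string_before(),
    MyClass::string_after(),
};

// Hypothetical helper so the result can be inspected without printing.
const std::list<std::string> &get_list()
{
    return some_list;
}
```

The price is that every use site changes from MyClass::string_before to MyClass::string_before(), so it is less of a drop-in fix than just putting the definitions back in the right order.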
Why not an issue in C?
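Strings in C are plain pointers to literals, and C only allows static objects to be initialized with constant expressions, so there is no start-up code whose order could matter. Literal constants are also generally done with macros for a number of reasons, so these are not affected either.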
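The same reasoning carries over to C++ when the constants are C-style pointers. The sketch below is a hypothetical variant of MyClass, not the original code, with the list deliberately defined before the strings. The pointers are initialized from constant expressions (the addresses of string literals), so they are set up during static initialization, before any dynamic initialization such as the list constructor runs:

```cpp
#include <list>
#include <string>

// Hypothetical variant of MyClass using C-style string constants.
class MyClass
{
public:
    static const char *const string_before;
    static const char *const string_after;
};

// Defined BEFORE the strings on purpose. The list itself is still
// dynamically initialized, but the pointers it reads already hold
// their final values by then.
static const std::list<std::string> some_list = {
    MyClass::string_before,
    MyClass::string_after,
};

// Constant-initialized: no constructor has to run for these.
const char *const MyClass::string_before = "Hello";
const char *const MyClass::string_after = "World!";

// Hypothetical helper so the result can be inspected without printing.
const std::list<std::string> &get_list()
{
    return some_list;
}
```

Here the list ends up holding "Hello" and "World!" regardless of the definition order, which is why the C habit of pointer constants never runs into this.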
In any case, I’m adding this to the collection of “C++ screw yous” which I guess I’m starting right this moment.