I'm looking for some thoughts on a structure I'm implementing. On the surface it's pretty simple, it's two pointers, one points to "implementation" which is analogous to a vtable, the other points to "instance" which is an object to apply the implementation to. This part isn't particularly complicated, the challenge is that I want reads and writes to it to be "atomic", such that an incomplete value will never be read. The basic issue is detailed below;
Code:
struct Interface
{
void* implementation,* instance;
};
Interface val;
// Thread 0
val = {A, A};
// Thread 1
val = {B, B};
// Thread 2
auto myCopy = val;
myCopy must always be either {A, A} or {B, B} but never a mix of A and B, as a mix of values could cause implementation to be used on the wrong type, and instance isn't guaranteed to be a type with a vtable or other identifying feature that could be used to ensure that implementation is appropriate for it, so a runtime check isn't possible. Also I'd prefer to avoid locking entirely as this object is meant to be a primitive.
My current safe implementation looks something like this;
Code:
struct Interface
{
void* implementation,* instance;
atomic<int> writeLock, readCount;
// Load a copy atomically
Interface LoadSafe();
// Store a copy atomically
void StoreSafe(Interface&);
};
// Load works by optimistically reading the values, and then checking whether they were being written before saving them
// I'm not expecting anyone to rely on these guarantees for their behaviour, so I expect contention to be low in practice
// Hence the optimistic loading
Interface Interface::LoadSafe()
{
const int triesTillYield = 3, triesTillWait = 6;
int tryCount = 0;
Interface copy; // the result
bool failed; // whether the current result is invalid
do
{
if (tryCount > triesTillYield) // Try to avoid heavy contention with some waiting
{
std::this_thread::yield(); // Not sure if this is a good idea, might be more efficient to just go straight to a spin
if (tryCount > triesTillWait)
while (this->writeLock.load()) {} // Wait for contention to clear in extreme circumstances (probably won't happen in the wild)
}
this->readCount.fetch_add(1); // Enter read lock (Might need a fence? I think the acquire on the writeLock should suffice)
copy.implementation = this->implementation; // optimistic read
copy.instance = this->instance;
failed= this->writeLock.load(memory_order::memory_order_acquire) != 0; // check if value was being written while it was being read
this->readCount.fetch_sub(1); // Exit read lock
tryCount++;
} while (failed); // If value was written while it was read, discard result and try again
return copy;
}
void Interface::StoreSafe(Interface& other) // Other is assumed to be a safe copy for simplicity
{
while (this->writeLock.exchange(1, memory_order::memory_order_acquire)) {} // Mutual exclusion of writers
this->implementation = other.implementation;
this->instance = other.instance;
atomic_thread_fence(memory_order::memory_order_release); // Ensure changes are posted
while (value.readCount.load()) {} // Wait for concurrent readers to receive change
this->writeLock.store(0, memory_order::memory_order_release); // release write lock
}
// Once a copy of the interface has been taken it can be used regularly without locking
I've tested this, and over the course of 40 million load/stores (80 mil total ops) concurrently between 8 threads (ryzen cpu) I got no invalid values from the reads. Of course this is no guarantee that I wouldn't get any invalid reads if I did 1 billion ops or used all 16 threads, but I don't have all century, and that still wouldn't be absolute proof. The testing results are interesting, on my cpu both the naïve broken version with default assignment and the safe version take the same amount of time to complete, so I'm not as worried about the performance impact as I was originally.
What I'm wondering is if there is a simpler way of doing this. The only constraints are that a load must never produce a incomplete value, and that when multiple stores are performed at least one store must succeed. It seems like it shouldn't be too hard to just assign two values, but here we are.
Edit; I did a little more profiling; I ran a test with 14 threads doing random stores and loads to a shared variable (50/50 chance of doing a store or load) 1 million times each, and 650 bad reads occurred in the naïve solution, while the safe solution produced 0 bad reads. The naïve solution took about 55 seconds to complete, while the safe solution clocked in at 50 seconds. Kinda strange that the more complex function operates slightly faster. Also of note, no thread had to spin on the main read loop more than 18 times while loading. I'm surprised I'm getting such good results.