So I was rereading the thread and I don't know why I didn't reply to this then, but what was this? I have no idea what the code block in the middle is supposed to be aside from out-of-context nonsense, but is it supposed to be suggesting that hardware threading isn't provided by Windows? That, at least, is provably false.
I have 25+ years of writing internals software. It is obvious you don't know what a scheduler is. (Hint: it doesn't matter what the language is.)
Kiss my ass and learn to use SoftICE (old school) or WinDbg.
Code:
_asm cli                                                        // disable interrupts while the service table is patched
RealNtCreateFile = (NTCREATEFILE)(SYSTEMSERVICE(ZwCreateFile)); // save the original entry so the hook can call through (RealNtCreateFile declared elsewhere)
(NTCREATEFILE)(SYSTEMSERVICE(ZwCreateFile)) = NewNtCreateFile;  // swap our hook into the service table slot for ZwCreateFile
_asm sti                                                        // re-enable interrupts
Code:
NTSTATUS NewNtCreateFile(
    PHANDLE FileHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PIO_STATUS_BLOCK IoStatusBlock,
    PLARGE_INTEGER AllocationSize OPTIONAL,
    ULONG FileAttributes,
    ULONG ShareAccess,
    ULONG CreateDisposition,
    ULONG CreateOptions,
    PVOID EaBuffer OPTIONAL,
    ULONG EaLength)
{
    NTSTATUS rc;
    // you can do whatever you want here, then pass the call through to the
    // saved original so the rest of the system keeps working
    rc = RealNtCreateFile(FileHandle, DesiredAccess, ObjectAttributes,
                          IoStatusBlock, AllocationSize, FileAttributes,
                          ShareAccess, CreateDisposition, CreateOptions,
                          EaBuffer, EaLength);
    return rc;
}
Get bent.
Yes, I'm being an asshole.
Simple C# test to demonstrate the benefits of multithreading:
C#:
using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        const int sumCount = 1_000_000; // Number of times to add to the sum
        const int tryCount = 10;        // Number of tries, just to even out performance
        const int threadCount = 16;     // Number of threads for the threaded test
        // If you aren't on a Ryzen, use a lower count to get more representative results
        decimal sum = 0;                // Our sum, decimal so that it can contain the result
        object syncRoot = new object(); // Used to synchronize the sum
        Stopwatch timer = new Stopwatch();
        // It's lazy, but it's easier to write the test apparatus with lambdas than with a
        // for-purpose class; doesn't matter though, they compile to the same thing
        Action testFunc = () =>
        {
            decimal localSum = 0;
            for (int i = 0; i < sumCount; i++) // Sum locally to avoid contention
            {
                localSum += i;
            }
            lock (syncRoot)
                sum += localSum; // Add the local sum under a lock. Happens once,
                                 // allowing the vast majority of work to happen freely
        };
        Action resetTest = () =>
        {
            sum = 0;
            timer.Reset();
        };
        SemaphoreSlim semaphore = new SemaphoreSlim(0, threadCount);
        ThreadStart threadedTestFunc = () =>
        {
            semaphore.Wait(); // Park until the main thread releases every worker at once
            testFunc();
        };
        long syncTime = 0, asyncTime = 0;
        for (int j = 0; j < tryCount; j++) // Run our tests
        {
            {
                timer.Start(); // Run our test synchronously
                for (int t = 0; t < threadCount; t++)
                    testFunc();
                timer.Stop();
                syncTime += timer.ElapsedMilliseconds;
            }
            resetTest();
            {
                Thread[] threads = new Thread[threadCount];
                for (int t = 0; t < threadCount; t++) // Set up our async test
                {
                    threads[t] = new Thread(threadedTestFunc);
                    threads[t].Start(); // Run it, it'll wait for the semaphore
                }
                timer.Start(); // Start recording before releasing
                semaphore.Release(threadCount);
                for (int t = 0; t < threadCount; t++)
                    threads[t].Join(); // Wait for results before finishing our recording
                timer.Stop();
                asyncTime += timer.ElapsedMilliseconds;
            }
            resetTest();
        }
        syncTime /= tryCount; // Normalize by try count
        asyncTime /= tryCount;
        Console.WriteLine($"Synchronous test took {syncTime}ms");
        Console.WriteLine($"Asynchronous test with {threadCount} threads took {asyncTime}ms");
        Console.WriteLine($"Performance ratio was {(float)syncTime / (float)asyncTime} with {threadCount} threads");
    }
}
The results were as follows:
Synchronous test took 412ms
Asynchronous test with 16 threads took 34ms
Performance ratio was 12.117647 with 16 threads
Keep in mind that 8 of those 16 threads aren't full cores per se; they're SMT siblings that share execution resources with their partner cores and get by on fast context switching, so finishing 12 times faster is actually better than expected.
I have no idea what this argument about the scheduler is, since it doesn't change the fact that the task finished faster with more threads, and roughly in proportion to the number of threads. Note that the program is deliberately designed to avoid contention, so it actually runs concurrently; if the argument is that you don't get a performance benefit when everything has to be accessed under a mutex, or through streams, then yeah, no shit. A contended version is sketched below.
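For the record, here's what that failure mode looks like. This is a hypothetical variant of testFunc from the test above (my illustration, not part of the measured results): every iteration takes the lock, so the threads serialize on the mutex and the threaded version's advantage largely evaporates.
C#:
// Hypothetical drop-in replacement for testFunc above. Every addition happens
// under the lock, so all the threads fight over syncRoot instead of working freely.
Action contendedTestFunc = () =>
{
    for (int i = 0; i < sumCount; i++)
    {
        lock (syncRoot) // taken once per iteration: pure contention
            sum += i;   // no local accumulation, every add hits shared state
    }
};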
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
In general, it seems like there's a lot of fear directed towards concurrent programming, but processors are getting more cores, not faster, so everyone's going to have to adapt at some point. It's not that bad once you get your feet on the ground. The main principle is to protect things by limiting their access, or failing that, by mutex. Where applicable you want to leverage atomics, but those should usually live in well-tested generic algorithms, not be developed for-purpose unless it's really justified or trivial.
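To put a face on that principle, a minimal sketch (names are mine, purely illustrative): the same counter protected first by a mutex, then by the framework's atomic Interlocked operations.
C#:
using System.Threading;

class CounterExamples
{
    long lockedCounter = 0;
    readonly object counterLock = new object();
    long atomicCounter = 0;

    // Protection by mutex: always correct, composes with arbitrary logic,
    // costs a lock acquisition per operation.
    public void IncrementLocked()
    {
        lock (counterLock)
            lockedCounter++;
    }

    // Protection by atomic: cheap and lock-free, but only for simple operations;
    // anything fancier belongs in a well-tested generic algorithm, as said above.
    public void IncrementAtomic()
    {
        Interlocked.Increment(ref atomicCounter);
    }
}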
On the topic of schedulers, my current project is a scheduler of sorts, meant to order and execute jobs asynchronously. Its main principles are waypoints and syncpoints. Waypoints order execution: first they are ordered against each other by contracts such as "before x", "during x", or "after x", and then jobs are ordered against waypoints by similar contracts. Syncpoints determine when it is safe to execute a scheduled job, with each job having its own set of syncpoints which are checked against the syncpoints of running jobs for conflicts. This has the benefit of eliminating the risk of deadlock, while allowing unbounded parallel performance and a scalable means of ordering tasks.
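Since that's fairly abstract, here's a rough sketch of the shape of it. None of these type or member names are the real ones from my project; they're stand-ins to show how the contracts fit together.
C#:
using System;
using System.Collections.Generic;

// Illustrative stand-ins for the real project's types.
enum Ordering { Before, During, After } // the ordering contracts

class Waypoint
{
    public string Name;
    public List<(Ordering Contract, Waypoint Other)> Contracts
        = new List<(Ordering, Waypoint)>(); // waypoints ordered against each other
}

enum Access { ReadOnly, Full }

struct Syncpoint // a job's claim on some resource
{
    public string Resource;
    public Access Access;
}

class Job
{
    public Ordering Contract;          // how the job orders against its waypoint
    public Waypoint Waypoint;
    public List<Syncpoint> Syncpoints; // empty set => the job runs synchronously
    public Action Work;
}

static class Scheduler
{
    // The conflict rule: two readonly claims on the same resource coexist;
    // any Full claim on a resource another running job touches is a conflict.
    public static bool Conflicts(Syncpoint a, Syncpoint b) =>
        a.Resource == b.Resource &&
        (a.Access == Access.Full || b.Access == Access.Full);
}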
The immediate intended application of this is a game engine with an entity-based scheme. As an example, a particular AI job might be scheduled to occur during the ai-main waypoint, and would specify as syncpoints that it needs readonly access to component data, navigation, and transforms, and full access to a subset of the AI system. AI jobs affecting unentangled types of AI would then be free to run concurrently, without having to state an explicit ordering. The systems themselves need to be concurrency-safe of course, but the jobs using them are free to eschew locks in favor of contracts. If that sounds too difficult, specifying no syncpoints causes the job to run synchronously, and a more specific set can be supplied later to ease the restrictions. In terms of the sketch above, it might look like the following.
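Continuing with the hypothetical names from the sketch (the resource strings and the flocking/pathing subsystems are invented for illustration), declaring that AI job might look something like this:
C#:
// Hypothetical declaration of the AI job described above, reusing the sketch's types.
var aiMain = new Waypoint { Name = "ai-main" };
var aiJob = new Job
{
    Contract = Ordering.During, // run during the ai-main waypoint
    Waypoint = aiMain,
    Syncpoints = new List<Syncpoint>
    {
        new Syncpoint { Resource = "ComponentData", Access = Access.ReadOnly },
        new Syncpoint { Resource = "Navigation",    Access = Access.ReadOnly },
        new Syncpoint { Resource = "Transforms",    Access = Access.ReadOnly },
        // a hypothetical subset of the AI system this job owns outright
        new Syncpoint { Resource = "AI.Flocking",   Access = Access.Full },
    },
    Work = () => { /* update the flocking agents */ }
};
// Another AI job claiming Full access to, say, "AI.Pathing" shares no Full-access
// resource with this one, so the two run concurrently with no explicit ordering
// and no locks in the job code itself.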