Programming thread

What would make you think C++ for loop unrolling is different from C loop unrolling?
They're not, other than stack definition. It all gets boiled down to the same operations. I was more so asking what techniques I can use for parallel program optimization in C on a large, very expensive, and noisy system.
 
Last edited:
  • Thunk-Provoking
Reactions: UERISIMILITUDO
They're not, other than stack definition. It all gets boiled down to the same operations. I was more so asking what techniques I can use for parallel program optimization in C on a large, very expensive, and noisy system.
1. Optimize your algorithm first, because you shouldn't be optimizing nitty-gritty shit until that's as good as you can get it.

2. It's usually about memory-access optimization, but that can differ between architectures. This means understanding DMA and caching on your platform. At this point, parallelizing your work means breaking it into subproblems that are correctly sized so that you minimize the cost of transferring memory. You will need to figure out how to appropriately chunk your problem, and you should understand loop skewing (see the blocking sketch after this list). It also means understanding your cache layout, if that's applicable. It may be faster to do redundant computation in some workloads, because memory transfer is slow as fuck (even for on-chip caches).

3. If you are compute-bound, you need to understand your architecture's parallel operations. Many operations can be performed simultaneously. Division is usually slow; avoid it. Multiplication, bit shifts, bit ops, and adds/subs are usually fast.

You need to understand your architecture, though. Basically everything I said can be useful or totally irrelevant depending on what it looks like.
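
To make point 2 concrete, here is a minimal sketch of cache blocking (tiling) for a matrix multiply in C. The matrix size and tile size are assumptions you would tune per platform; the point is just that each tile stays cache-resident so data gets reused before it is evicted.

#include <stddef.h>

#define N     1024
#define BLOCK 64   /* assumption: tune so a few BLOCKxBLOCK tiles fit in cache */

/* C += A * B for row-major N x N doubles, processed tile by tile. */
void matmul_blocked(const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < N; ii += BLOCK)
        for (size_t kk = 0; kk < N; kk += BLOCK)
            for (size_t jj = 0; jj < N; jj += BLOCK)
                /* all three tiles touched here are small enough to stay cached */
                for (size_t i = ii; i < ii + BLOCK; i++)
                    for (size_t k = kk; k < kk + BLOCK; k++) {
                        double a = A[i * N + k];
                        for (size_t j = jj; j < jj + BLOCK; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

The naive triple loop streams whole rows and columns through cache for every output element; the blocked version does the same arithmetic with far fewer trips to main memory.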
 
I don't think most open source projects receive any donations.

But some open-source developers offer paid consulting or support services related to the open-source software they develop, to companies that use the software (see e.g. SQLite).

There are also dual-licensed projects like the Qt software library, where on the one hand it's available for free under the GPL license, but on the other hand you can also buy a commercial license if you're a company that can't or doesn't want to abide by the restrictions of the GPL.

All of the above only really works for open-source projects that are important enough for companies to want to pay for them.
For the average Joe open-source developer, the only common way to "monetize" one's niche open-source work is to present it as a "portfolio" to prospective employers to help you get hired for closed-source work.
My idea was something simple. It was mostly gonna be a general toolkit that people can use and that I could add to over time. Maybe have a way to get feedback, see what people want me to add, and have a way for people to donate.

And then over time just make a few script files that people can plug into any project to do common things in shortened form, to keep your code simple. For example, I could make an object that has a bunch of commonly used regular expressions, or an object with commonly used functions, or an object that formats numbers.

Then I can write documentation on how to use these tools.

It would basically just be a quality of life thing.

I know stuff like that probably exists already, but if I can make something that people actually use and monetize it somehow that would be pretty cool.
 
  • Thunk-Provoking
Reactions: y a t s
1. Optimize your algorithm first, because you shouldn't be optimizing nitty-gritty shit until that's as good as you can get it.

2. It's usually about memory-access optimization, but that can differ between architectures. This means understanding DMA and caching on your platform. At this point, parallelizing your work means breaking it into subproblems that are correctly sized so that you minimize the cost of transferring memory. You will need to figure out how to appropriately chunk your problem, and you should understand loop skewing. It also means understanding your cache layout, if that's applicable. It may be faster to do redundant computation in some workloads, because memory transfer is slow as fuck (even for on-chip caches).

3. If you are compute-bound, you need to understand your architecture's parallel operations. Many operations can be performed simultaneously. Division is usually slow; avoid it. Multiplication, bit shifts, bit ops, and adds/subs are usually fast.

You need to understand your architecture, though. Basically everything I said can be useful or totally irrelevant depending on what it looks like.
The architecture should be x86 on enterprise Linux, if I remember correctly. I'm supposed to be doing HPC kernels for GPGPUs and FPGAs. I've never worked directly with hardware outside of memory.
 
What does one need to know to write optimized C for a supercomputer? Asking for a friend. I only ever learned about loop unrolling in higher-level C++.
While I am not too sure about the architecture in question, here is some general optimization advice. First, as has been said, optimize at the highest level first; a poorly chosen algorithm written as efficiently as possible will still underperform an appropriate algorithm written poorly. Second, memory access and especially syscalls are leaps and bounds more expensive than arithmetic, so you will want to fully utilize the cache by packing structs as tightly as you can, generally by sorting the members in the struct definition by size, descending (see the sketch below).
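
To illustrate that member-ordering point, here is a minimal sketch; the struct names are made up for the example, and the sizes assume a typical 64-bit ABI where each member is padded out to its alignment.

#include <stdint.h>
#include <stdio.h>

/* Members in a careless order: the compiler pads after the small ones. */
struct padded {
    uint8_t  flag;     /* 1 byte + 7 bytes padding */
    uint64_t counter;  /* 8 bytes */
    uint16_t id;       /* 2 bytes + 6 bytes padding */
    uint64_t total;    /* 8 bytes */
};                     /* typically 32 bytes */

/* Same members sorted by size, descending: most padding disappears. */
struct reordered {
    uint64_t counter;  /* 8 bytes */
    uint64_t total;    /* 8 bytes */
    uint16_t id;       /* 2 bytes */
    uint8_t  flag;     /* 1 byte + 5 bytes tail padding */
};                     /* typically 24 bytes */

int main(void)
{
    printf("%zu vs %zu\n", sizeof(struct padded), sizeof(struct reordered));
    return 0;
}

A quarter less memory per element means a quarter fewer cache lines to fetch when you stream an array of these, for free.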
 
What does one need to know to write optimized C for a supercomputer? Asking for a friend. I only ever learned about loop unrolling in higher-level C++.
What kind of supercomputer are we talking about? The little ones like an IBM z/OS mainframe, or something else? My advice is to see if they have a recently maintained proprietary math library for that platform, and use it instead of the open-source version. Pretty much do as much of that as you can. That's really the only thing that comes to mind, and it's only going to help you on nonstandard CPU architectures, though.
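
As a sketch of what that looks like in practice: assuming the platform ships a CBLAS-compatible library (Intel MKL and IBM ESSL both expose this interface, though the header name varies by vendor), you call its tuned matrix multiply instead of hand-rolling loops.

#include <cblas.h>   /* vendor header may differ, e.g. mkl.h */

/* C = A * B for row-major n x n doubles; the vendor's dgemm is
   hand-tuned for the platform's caches and vector units. */
void matmul_vendor(int n, const double *A, const double *B, double *C)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A, n,
                B, n,
                0.0, C, n);
}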
 
Last edited:
  • Like
Reactions: UERISIMILITUDO
What kind of supercomputer are we talking about? The little ones like an IBM z/OS mainframe, or something else? My advice is to see if they have a recently maintained proprietary math library for that platform, and use it instead of the open-source version. Pretty much do as much of that as you can. That's really the only thing that comes to mind, and it's only going to help you on nonstandard CPU architectures, though.
Like 12 rack rows and 2 tape storage rows. I'm unsure as to what it actually looks like working with it. That's a good idea, though; they have a bunch of documentation.
 
  • Like
Reactions: UERISIMILITUDO
One cool thing I didn't realize about C is that you can just load DLLs directly and start executing them. It's useful because I'm writing a bootstrapper, and it can just compile the C code into a DLL and immediately start executing its function pointers. I'm a bit of a C noob, so I didn't have any experience with DLLs, since I mostly learned from programming Java. I read Some Were Meant for C (PDF), and there's an example of doing instrumentation of C code inside C code (you just can't instrument yourself, or it's an infinite regression). It got me thinking about the meta-level programming that's doable in C, which is really neat.
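
For anyone curious, here's a minimal sketch of that load-and-call pattern on a POSIX system (Windows uses LoadLibrary/GetProcAddress instead); the library and symbol names are made up for the example.

#include <dlfcn.h>   /* POSIX dynamic loading; link with -ldl on glibc */
#include <stdio.h>

int main(void)
{
    /* Open a shared object, e.g. one built with: cc -shared -fPIC plugin.c -o plugin.so */
    void *handle = dlopen("./plugin.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Look up a symbol and cast it to the function type we expect. */
    int (*entry)(void) = (int (*)(void))dlsym(handle, "plugin_entry");
    if (!entry) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    printf("plugin returned %d\n", entry());
    dlclose(handle);
    return 0;
}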
 
My idea was something simple. It was mostly gonna be a general tool kit that people can use and I could add to it over time. Maybe have a way to get feedback and see what people want me to add to it and have a way for people to donate.

And then overtime just make a few script files that people can plug into any project and it just does common things but shortened so it can make your code simple. For example, I can make an object that has a bunch of commonly used regular expressions or an object with commonly used functions. Or an object that formats numbers.

Then I can write documentation on how to use these tools.

It would basically just be a quality of life thing.

I know stuff like that probably exists already, but if I can make something that people actually use and monetize it somehow that would be pretty cool.
Just start packaging Bosnian shitcoin miners with your software packages, and watch as the money rolls in.
 
But some open-source developers offer paid consulting or support services related to the open-source software they develop, to companies that use the software
And some fucking assholes first release features as open source, see which features people actually use, and then lock them behind an insanely expensive $2,000 enterprise license, like Gravitee did. This sort of bait-and-switch is annoying.
 
And some fucking assholes first release features as open source, see which features people actually use, and then lock them behind an insanely expensive $2,000 enterprise license, like Gravitee did. This sort of bait-and-switch is annoying.
It's odd, because it's not even necessary.

Things like Mattermost are doing entirely fine selling enterprise licenses even though they are fully open source. If you provide even small convenience features, most businesses will just pay you, because the culture and understanding is that as a business you normally pay for the enterprise version, not the free poor-people thing.
 
Last edited:
Leaving aside YandereDev jokes, would you say that this is accurate?

The hypothetical is an if/elif chain (in Python) checking something like x = 1000000, starting with if x == 0: and ending at x == 1000000 (so, worst-case scenario), compared to the same process with a match/case.

[Attached image: matchcase.png]


As always, if someone decides to answer, thanks in advance.
 
The time it takes Python to parse the match/case in its interpreter would be more than the time it would take a language compiled to machine code to check all million cases, even without compiler optimizations. In Python it's usually best to use a list comprehension when you want to iterate a large number of times, because the iteration can be done in a single batch call inside the interpreter.
 
Leaving aside YandereDev jokes, would you say that this is accurate?

The hypothetical is an if/elif chain (in Python) checking something like x = 1000000, starting with if x == 0: and ending at x == 1000000 (so, worst-case scenario), compared to the same process with a match/case.

[Attached image: matchcase.png]

As always, if someone decides to answer, thanks in advance.
Is this a ChatGPT response?
It feels intuitively wrong, as pattern matching is just fancier, stricter syntax for if/else anyway, so there should not be a meaningful difference in performance between it and if/elif chains, especially without an optimizing compiler.
From the answer, it seems like a response about a C-style switch, which indeed can be optimized to a jump table and be O(1).

This screenshot rubs me the wrong way.
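
For reference, this is the C-style case that answer seems to describe: a switch over dense integer constants, which a compiler at -O2 will typically lower to a jump table, making the dispatch O(1) no matter which case matches (whether it actually does is compiler- and target-dependent; check the assembly).

/* Dense, contiguous cases: the compiler can index into a jump table
   instead of comparing x against each value in turn. */
const char *label(int x)
{
    switch (x) {
    case 0:  return "zero";
    case 1:  return "one";
    case 2:  return "two";
    case 3:  return "three";
    case 4:  return "four";
    case 5:  return "five";
    case 6:  return "six";
    case 7:  return "seven";
    default: return "something bigger";
    }
}

Python's match, by contrast, is specified to try each pattern in order, so a long chain of literal cases is still linear, just like if/elif.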
 
  • Agree
Reactions: Concentrate Juice
Thanks for the replies.
Is this a ChatGPT response?
It feels intuitively wrong, as pattern matching is just fancier, stricter syntax for if/else anyway, so there should not be a meaningful difference in performance between it and if/elif chains, especially without an optimizing compiler.
From the answer, it seems like a response about a C-style switch, which indeed can be optimized to a jump table and be O(1).

This screenshot rubs me the wrong way.
Yes, it's from ChatGPT.

The question was about Python, specifically about using it with simple integers:
x = 1000000

if x == 0:
    print("whatever")
elif x == 1:
    print("whatever")
......
elif x == 1000000:
    print("Yay!")


# As opposed to:
match x:
    case 0:
        print("whatever")
    case 1:
        print("whatever")
    ......
    case 1000000:
        print("Yay!")
 
  • Like
Reactions: ADHD Mate
Match/case is more scalable, so YandereDev would benefit from switching over. I think it's technically faster too, but the difference is negligible unless your code is terrible. Using timeit, I checked x = 1000 for match/case and if/elif; match/case tended to do better, but within the margin of error.
 
Last edited:
  • Like
Reactions: UERISIMILITUDO
I don't know about Python, but YandereDev is writing in C#, where the compiler will most certainly do the MSIL equivalent of a jump-table optimization.
Most of his nonsense just wastes human time (as he chases bugs through his labyrinth of spaghetti code) as opposed to computer time.
 