Programming thread

I assumed he was using Python; I only knew the surface-level memes and thought it was one of those 2D anime dating sims, not some 3D Unity game, until looking it up just now.
Here's someone on Stack Overflow who apparently tested the same thing in 2022, and the result is that it's equivalent to if/elif.

I come late to the party, but I feel like I can add something useful to this thread.
A while back, I published an article on Medium about Python's match-case performance, analyzing the generated byte code and performing benchmarks and comparisons. If you want, you can read the article here. I think it's worth your time.
However, I'm going to summarize it here.


Many programming languages implement their switch-case statements as if they were an if-else sequence. Sometimes the switch-case can be optimized into an efficient lookup table, but that's not always the case.


In Python, this optimization is never performed, thus always resulting in a series of condition checks.


From the article, a speed comparison between if-else and match-case:
Code:
Average time for match_case:  0.00424 seconds
Average time for if_else:     0.00413 seconds



As you can see, they are almost equal.
Plus, if you disassembled the two statements, you would find that they generate nearly identical byte code.
One difference to point out: if-else calls the objects' __eq__() method, while match-case does the comparisons internally.
Lastly, here's a benchmark that also includes a hash table (dictionary):

Code:
Average time for match_case:  0.00438 seconds
Average time for if_else:     0.00433 seconds
Average time for dict_lookup: 0.00083 seconds

So, if you're keen on performance, you should use hash tables. Although match-case looks similar to a C-style switch-case, it isn't one: match-case is meant for structural pattern matching, not for replacing performant hash and lookup tables.
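To illustrate the hash-table approach, a minimal dispatch sketch (the handler names are invented for the example): the dict gives you an O(1) average-case lookup instead of checking each branch in turn.

```python
def handle_a():
    return "A"

def handle_b():
    return "B"

# Keys map straight to handler functions; lookup cost doesn't grow with
# the number of cases, unlike an if/elif or match-case chain.
handlers = {"a": handle_a, "b": handle_b}

def dispatch(key):
    # dict.get returns None for a missing key instead of raising KeyError
    handler = handlers.get(key)
    return handler() if handler else "unknown"

print(dispatch("a"))  # A
print(dispatch("z"))  # unknown
```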
 
Trying to understand the question as best I can here: C-style switches are O(1) because they're optimizable to a jump-table right? Because the predicate cases in a C-switch are constant numerical comparisons.

I don't know shit about Python internals, but it appears that Python switches are in fact constant comparisons, so it is probably optimizable to a jump-table. The wrench comes into play with guard clauses: optional validations that determine whether or not execution follows down that path. To me, naively, this would lead me to believe that the broad-phase is O(1) to determine which switch branches are valid, then worst-case O(N) to determine from the remaining branches which to follow.

As I was writing and researching for this, I see that someone else already posted that match case is not optimized in Python, which is rather unfortunate. It really should be optimizable to a jump-table followed by a series of branches which no-op and fall-through if the branch doesn't execute. I guess the pattern matching aspect must induce some overhead I'm not accounting for?



I was recently writing some code that really cemented the Lisp "data is code, code is data" mantra for me, and it consistently blows my mind how much better a programmer I feel today compared to even just last year, or three months ago. I haven't bothered to work on that language idea I was toying with - not sure if it's something I want to put the time into for a toy - but researching different languages, S-expressions, and writing compilers has really opened my eyes and pushed my programming knowledge over the edge. I can see why the meme arose that if you can't write your own compiler, you're not a real programmer.

The code in question was a home-made test framework for Lua. I wanted to write tests in a declarative style, but I didn't want to impose on the test-writer to have to pass around a context variable for state. I thought I was at an impasse until I realized I could bind a shared closure (Lua upvalue) to represent state, and at that moment it was like my third eye opened and no programming task was insurmountable. Being able to bind state to the function itself instead of lugging it around as a parameter feels insanely powerful for writing really slick APIs, while still not having to go to the lengths of becoming Object-Oriented. It might not sound like much, but this was the secret sauce that made me realize that any code as a series of function calls can be conceptualized the same way as data, since ultimately all functions are just a set of inputs and outputs, and with that step I feel like I've fully crossed over into understanding FP.

Code is data. Data is code.
 
Trying to understand the question as best I can here: C-style switches are O(1) because they're optimizable to a jump-table right? Because the predicate cases in a C-switch are constant numerical comparisons.
C-style switches can be optimized to a jump table, or they can just end up as plain old if/elseif/elseif... chains. Depends on what your compiler decides, and these days the compiler is quite smart.
A switch with cases for every number 0..255 would most probably end up as a table, one with just two cases for 13 and 432984 probably wouldn't.
 
Leaving aside YandereDev jokes, would you say that this is accurate?

The hypothetical is if/elif (in Python) checking something like x = 1000000 by starting with if x == 0: and ending at 1000000, so worst-case scenario, compared to the same process but with a match-case.

As always, if someone decides to answer, thanks in advance.

I'll never cease to laugh at the people who think YandereDev's performance issue was because he had too many if/else statements.

Yeah bro, I'm sure the compiler is too retarded to know how to optimize those many case statements.
Shit like this is why they require you to take a Compiler Design course at some universities, so there'd be fewer retarded people spewing nonsense.
 
It's usually good, you just have to gaslight it sometimes to get a good response. ChatGPT is the best tool imo for learning anything if you know how to prompt it.
How can you gaslight it if you are learning? It seems like it would require you to already know what you want to learn.
LLMs should only be used as semantic search to reference actual on-topic literature.
There is real danger in subtle errors that are presented confidently.
 
I'll never cease to laugh at the people who think YandereDev's performance issue was because he had too many if/else statements.

Yeah bro, I'm sure the compiler is too retarded to know how to optimize those many case statements.
The thing is, it wouldn't matter even if he had the world's dumbest compiler that assembled everything exactly as written, it wouldn't cause enough delay for a human player to notice. Particularly for people of a certain age, it's just not intuitive that computing power is effectively infinite for many purposes. YandereDev had to go above and beyond to get performance issues, like recalculating the pathfinding for the entire universe every frame, or letting the frame rate go up to a trillion FPS on static screens.

EDIT: and how could I forget, using massive store-bought 3D models designed for photorealistic closeups on tiny objects. Of course that tennis ball needs every bit of fuzz modelled!
 
I've been having a blast coding with Google's Gemma 27B for the last few days. I'm running it on kobold.cpp, and started out using SillyTavern. With an RTX 3090 and a 3080 I was able to run a single 16k-context chat that let me develop the whole CRUD layer and Liquibase scripts.

Then I moved on to idea.continue and started using that when the project couldn't fit into a 16k context. If you only have 24GB VRAM, maybe start out this route first; an 8k context is good enough and can run with less memory.

My favorite stuff is using it to make unit/integration tests, and SQL queries.

It doesn't get it right all the time, but it gets it right enough that a good dev can recognize the little stuff and fix it. Then after you tell it the mistake, it seems to make it less often.
 
I'll never cease to laugh at the people who think YandereDev's performance issue was because he had too many if/else statements.

Yeah bro, I'm sure the compiler is too retarded to know how to optimize those many case statements.
Shit like this is why they require you to take a Compiler Design course at some universities, so there'd be fewer retarded people spewing nonsense.
I know, this video usually gets recommended in response to that:
https://www.youtube.com/watch?v=LleJbZ3FOPU
 
Leaving aside YandereDev jokes, would you say that this is accurate?

The hypothetical is if/elif (in Python) checking something like x = 1000000 by starting with if x == 0: and ending at 1000000, so worst-case scenario, compared to the same process but with a match-case.

View attachment 6766732

As always, if someone decides to answer, thanks in advance.
Why not just return number % 2 == 0; ?
 
Why not just return number % 2 == 0; ?
Modulo has an implicit division, which is a very slow instruction.

If specifically detecting evenness, better to just mask for the 1 bit:
'number & 1' would return whether it's odd or not

So you could invert that if you needed an evenness function
!(number&1)
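One Python-specific footnote worth adding: in C, the raw values of number % 2 and number & 1 disagree on negative signed ints (-3 % 2 is -1), but in Python the two evenness checks agree even for negatives, since Python's % returns a non-negative remainder for a positive modulus and its ints act like infinite two's-complement values. A quick sketch to verify:

```python
def is_even_mod(n):
    return n % 2 == 0

def is_even_mask(n):
    # Python ints behave like infinite two's-complement, so n & 1 is the
    # low bit even for negative n.
    return (n & 1) == 0

# Spot-check agreement across negatives, zero, and positives
for n in (-3, -2, -1, 0, 1, 2, 10**6):
    assert is_even_mod(n) == is_even_mask(n)
print("agree on all samples")
```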
 
Modulo has an implicit division, which is a very slow instruction.

If specifically detecting evenness, better to just mask for the 1 bit:
'number & 1' would return whether it's odd or not

So you could invert that if you needed an evenness function
!(number&1)
I just feel like the obvious answer to the problem is usually the best.

It's basic math. Even numbers divided by two do not have a remainder. It's like elementary school math, dude. I'm dying on this hill, doing it any other way is fucking stupid.
 
Modulo has an implicit division, which is a very slow instruction.

If specifically detecting evenness, better to just mask for the 1 bit:
'number & 1' would return whether it's odd or not

So you could invert that if you needed an evenness function
!(number&1)
Simple, concise, and well reasoned. Kudos.

Just for an autistic level of clarity, C# doesn't do truthy/falsy ints, and == binds tighter than & there, so it would be return (number & 1) == 0;

I just feel like the obvious answer to the problem is usually the best.

It's basic math. Even numbers divided by two do not have a remainder. It's like elementary school math, dude. I'm dying on this hill, doing it any other way is fucking stupid.
You're already wrapping the boolean expression as a predicate function. Why not just leave a quick comment above your return statement that explains the bit mask?

Using divisibility rules to speed up calculations isn't stupid whatsoever. It's intuitive and efficient. That's why the hardness of prime factorization underpins a lot of modern cryptography such as RSA.
 
That's why the hardness of prime factorization underpins a lot of modern cryptography such as RSA.
I'm just gonna concede in this case because I am not familiar with some of the shit you guys are saying so you probably know more.

I'm still pretty new at this. Only been coding for a year so far.
 
I'm just gonna concede in this case because I am not familiar with some of the shit you guys are saying so you probably know more.

I'm still pretty new at this. Only been coding for a year so far.
Unironically play TIS-100, you'll get a crash course in how these operations work. Division is simple to you and me, but a computer treats it as repeated subtraction, counting the steps until it hits a threshold. That's going to have a cycle cost potentially massively in excess of a simple comparison after a bitwise AND.
 
I just feel like the obvious answer to the problem is usually the best.

It's basic math. Even numbers divided by two do not have a remainder. It's like elementary school math, dude. I'm dying on this hill, doing it any other way is fucking stupid.
I'm just gonna concede in this case because I am not familiar with some of the shit you guys are saying so you probably know more.

I'm still pretty new at this. Only been coding for a year so far.
I think they're talking about bitwise operators.

So odd numbers in binary will end in 1, and even ones will end in 0. Bitwise operators (from my understanding) perform the calculations (AND, OR, XOR, etc.) bit by bit on the binary representations, like in the next example for AND:

bitwise.png

So if you were to do the calculation with 1 instead (instead of 3 in the example), it would be with ...001, so AND will only give a 1 if the least significant bit of the other number is also 1, and 0 if it's 0.


So for:
print(6 & 1)
print(7 & 1)
print(8 & 1)
print(9 & 1)
print(10 & 1)


You'd get:
0
1
0
1
0



Which I believe is argued here to be faster (or to require less work) for determining if a number is odd/even, because you're not dividing anything.
 
Why not just return number % 2 == 0; ?

Modulo has an implicit division, which is a very slow instruction.

If specifically detecting evenness, better to just mask for the 1 bit:
'number & 1' would return whether it's odd or not

So you could invert that if you needed an evenness function
!(number&1)
All C compilers will optimize a foo % 2 == 0 check down to a bitwise AND (the raw foo % 2 and foo & 1 differ on negative signed values, but compared against zero they come out the same). I'd be surprised if Python doesn't do the same.
Same with foo /= 2 becoming foo >>= 1, at least for unsigned values.
 
Using divisibility rules to speed up calculations isn't stupid whatsoever. It's intuitive and efficient. That's why the prime factor theorem/FTA underpins a lot of modern cryptography such as RSA.
All C compilers will optimize a foo % 2 == 0 check down to a bitwise AND (the raw foo % 2 and foo & 1 differ on negative signed values, but compared against zero they come out the same). I'd be surprised if Python doesn't do the same.
Same with foo /= 2 becoming foo >>= 1, at least for unsigned values.
I figured it would be obvious for a compiler to optimize number % 2^n to number & (2^n - 1), but then I started wondering if there could be conditions where that doesn't happen.
C:
//// mod2.c

int is_even(int x) {
    return x % 2 == 0;
}
C:
//// and1.c

int is_even(int x) {
    return (x & 1) == 0;
}
I used the two C source files above with the Clang compiler (Ubuntu clang version 19.1.6), supplying various options for the latter:
  1. -c: compile to object file
  2. -c -O0: compile to object file with no optimization
  3. -c -g: compile to object file with debug info
  4. -c -g -Og: compile to object file with debug info and debug optimization
  5. -c -O1: compile to object file with optimization level 1
  6. -c -O2: compile to object file with optimization level 2
  7. -c -O3: compile to object file with optimization level 3
The commands with more than zero optimization specified gave the same output:
Code:
0: 89 f8 ---- movl %edi, %eax
2: f7 d0 ---- notl %eax
4: 83 e0 01 - andl $0x1, %eax
7: c3 ------- retq
It inverts the input value, ANDs it with 0x1, and returns the result - pretty straightforward stuff.

The commands with zero or unspecified optimization gave different output:
Code:
//// and1.c
00: 55 -------- pushq %rbp
01: 48 89 e5 --- movq %rsp, %rbp
04: 89 7d fc --- movl %edi, -0x4(%rbp)
07: 8b 45 fc --- movl -0x4(%rbp), %eax
//// above is the same, below is different
0a: 83 e0 01 --- andl $0x1, %eax
0d: 83 f8 00 --- cmpl $0x0, %eax
10: 0f 94 c0 --- sete %al
13: 24 01 ------ andb $0x1, %al
15: 0f b6 c0 - movzbl %al, %eax
18: 5d --------- popq %rbp
19: c3 --------- retq

//// mod2.c
00: 55 -------------- pushq %rbp
01: 48 89 e5 --------- movq %rsp, %rbp
04: 89 7d fc --------- movl %edi, -0x4(%rbp)
07: 8b 45 fc --------- movl -0x4(%rbp), %eax
//// above is the same, below is different
0a: b9 02 00 00 00 --- movl $0x2, %ecx
0f: 99 --------------- cltd
10: f7 f9 ----------- idivl %ecx
12: 83 fa 00 --------- cmpl $0x0, %edx
15: 0f 94 c0 --------- sete %al
18: 24 01 ------------ andb $0x1, %al
1a: 0f b6 c0 ------- movzbl %al, %eax
1d: 5d --------------- popq %rbp
1e: c3 --------------- retq
These basically do exactly what the C source code says it should do: and1.c ANDs the input with 0x1, while mod2.c divides the input by 2 and checks the remainder.

Honestly, this was pretty obvious in hindsight.

Moral of the story: if you're not specifying some optimization-level flags, even -Og when you're debugging, then you're a nigger. -Og enables some compiler passes that collect debug information, which are normally disabled under -O0 (the default), so you're just nigging yourself, really.
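Out of curiosity, the Python side can be checked too. A quick sketch with dis (function names invented) suggests CPython keeps the modulo exactly as written, i.e. no strength reduction happens at the bytecode level:

```python
import dis

def even_mod(n):
    return n % 2 == 0

def even_mask(n):
    return (n & 1) == 0

# Compare the compiled instructions: the operators survive as-is, so the
# two functions do NOT compile to the same bytecode.
mod_ops = [(i.opname, i.argrepr) for i in dis.get_instructions(even_mod)]
mask_ops = [(i.opname, i.argrepr) for i in dis.get_instructions(even_mask)]
print(mod_ops == mask_ops)  # False: '%' is not rewritten to '&'
```

Any heavy lifting would instead happen at runtime inside the int type's C implementation, not in the bytecode compiler.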
 
Leaving aside YandereDev jokes, would you say that this is accurate?

The hypothetical is if/elif (in Python) checking something like x = 1000000 by starting with if x == 0: and ending at 1000000, so worst-case scenario, compared to the same process but with a match-case.

View attachment 6766732

As always, if someone decides to answer, thanks in advance.
Even on a high-end consumer CPU, a switch statement with 4 million cases is likely going to exceed the processor's L1 instruction cache, and thus it will incur a noticeable performance hit simply from having to move shit in and out of the cache.

This is probably not an issue in python because there's way bigger things weighing down performance but I could see it maybe being a problem in C#.
 