Programming thread

I wouldn't say multicore processing in a single process is collectively "settled" as a good idea, or even as a necessary feature. Most people just go "hey, yeah, more cores is more better, right?"

I think people misunderstand what it would really bring to the table that multiple processes wouldn't do easier.

The only reason it's 100% necessary is if you absolutely have to read and write data between threads/processes as fast as possible. Like, I'd want to hear an explanation for why ordinary IPC won't work before I grudgingly admit that genuine multiprocessing is necessary.

Otherwise I'd just tell you to run $cores processes and call it a day.
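That approach is nearly a one-liner with the Python stdlib. A minimal sketch, assuming a POSIX-ish box (`handle` is a made-up stand-in for real per-request work):

```python
import os
from multiprocessing import Pool

def handle(job):
    # stand-in for whatever per-request work a real worker would do
    return job * job

if __name__ == "__main__":
    # one worker process per core; the OS scheduler does the load balancing
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(handle, range(8))
        print(results)
```

No shared memory, no locks; the pool serializes arguments and results over pipes behind the scenes.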

Promise-based systems are more about how you conceptualize multiple tasks at once, rather than being a performance thing.

Also I really hate threaded code. At least in situations where you're working directly with mutexes for anything more complicated than a single threadsafe queue.

I still have nightmares hunting down obscure, impossible-to-trigger deadlocks in pthreads code.

Yeah, promises are nicer than ordinary threaded code. That's all I interpret them as being; a nice way to conceptualise a worker thread.

Given they are designed around being decoupled from your main work unit, it seems like a perfect situation to utilise multiple cores. And yet, in Python we can't, because unless you're only using them for I/O wait it's still going to be constrained by the GIL.

If there's a problem with Python and web development, it'd be that the bytecode interpreter as a whole is slow. Not the GIL. (But I doubt it.)

I wouldn't say the GIL is "slow", but it is indeed limiting. The interpreter is just another reason why I don't consider it as a backbone for heavy lifting.

Well, there's php-fpm. But I hate php anyway.

php-fpm can keep PHP worker processes spun up in advance of incoming FastCGI requests. What php-fpm can't do is persist process memory and database handles between requests; you're still going to bootstrap the application every time.

There are experimental PHP event loops in the works that can handle a request-response lifecycle (e.g. ReactPHP), but they are a large departure from the typical PHP backend.
 
The performance is good
The Python is evil

Real talk though, why would anybody use a toy language like python for anything but one time scripts? :smug:
Using a GIL is like tying your legs together; it doesn't matter if you were fast, you still aren't going to be running anywhere
 
Given they are designed around being decoupled from your main work unit, it seems like a perfect situation to utilise multiple cores. And yet, in Python we can't, because unless you're only using them for I/O wait it's still going to be constrained by the GIL.
Yeah, I just see them as a way to organize code around a traditional event loop.

Build a graph of promises and hook them up to the event loop.

In fact, the single threaded aspect is a huge boon in many situations because it guarantees atomicity when you're in a promise handler. You can fuck around with variables and you know with substantial certainty how things will turn out.
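A toy version of that in Python's asyncio, just to make the shape concrete (`fetch` here is a made-up stand-in for real I/O):

```python
import asyncio

async def fetch(name):
    # stand-in for an I/O-bound task (network call, disk read, etc.)
    await asyncio.sleep(0.01)
    return name

async def main():
    # build the "graph" of awaitables and let the event loop drive them
    a, b = await asyncio.gather(fetch("a"), fetch("b"))
    # single-threaded: there is no await between here and the return,
    # so no other handler can run and this section is effectively atomic
    return a + b

result = asyncio.run(main())
print(result)
```

One thread, one loop; the concurrency lives entirely in the suspension points.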
Using a GIL is like tying your legs together; it doesn't matter if you were fast, you still aren't going to be running anywhere
Multiprocessing in a single process is of dubious value in most situations. The complexity it adds to code easily wipes away any benefits you'd get from better performance. You really need to justify why it's necessary, not vice versa.

Unix processes plus IPC make a much, much simpler model and probably get you 99% of the performance you need.
 
Multiprocessing in a single process is of dubious value in most situations. The complexity it adds to code easily wipes away any benefits you'd get from better performance. You really need to justify why it's necessary, not vice versa.

Suppose you need to apply a procedure across a large set of data --processing a mesh, or loading an image for example-- throw that shit on another thread and continue with your main procedure, asking for the results back once it's absolutely necessary. Boom. If you were smart about it, you just got a huge performance boost for no overhead. Python can't do that. It can pretend to do that, but no matter how many libraries you add, it is just fundamentally incapable.
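That claim is easy to check with a rough little benchmark: the worker below is pure-Python CPU work, so under the GIL a thread pool gains roughly nothing, while a process pool typically scales with cores (exact numbers depend on the machine, so no figures are baked in here):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    # pure-Python CPU work: only one thread at a time can execute this,
    # because the interpreter holds the GIL for the whole loop
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    jobs = [1_000_000] * 4
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        t0 = time.perf_counter()
        with pool_cls(max_workers=4) as ex:
            list(ex.map(burn, jobs))
        print(pool_cls.__name__, round(time.perf_counter() - t0, 2), "s")
```

On a multicore box the process pool usually finishes several times faster; the thread pool runs at roughly single-core speed.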

Alternatively you can have multiple threads doing the same thing but with different parameters --handling multiple web requests for example-- and have them share some resources. Having a single process is going to make resource sharing much simpler.
 
Suppose you need to apply a procedure across a large set of data --processing a mesh, or loading an image for example-- throw that shit on another thread and continue with your main procedure, asking for the results back once it's absolutely necessary. Boom. If you were smart about it, you just got a huge performance boost for no overhead. Python can't do that. It can pretend to do that, but no matter how many libraries you add, it is just fundamentally incapable.
Processing a mesh is computationally intensive and genuinely benefits from a second core. However, my argument is that a second process and IPC is almost as efficient as a single process, except in the most bizarre, strenuous circumstances, where you should probably be using C anyway.

And processes+IPC don't require you to worry about deadlocks to the same extent as a thread would. Deadlocks are such a serious issue (because threaded code is not nearly as deterministic as single threaded code) that you'd wipe out any convenience you'd get from using a single process the first time you have to debug a deadlock.

Loading an image is an IO operation which works with a single threaded event system.
Alternatively you can have multiple threads doing the same thing but with different parameters --handling multiple web requests for example-- and have them share some resources. Having a single process is going to make resource sharing much simpler.
Multiple web requests are mostly IO driven. If a given operation is CPU bound, then like I said, use a second process.
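The second-process-plus-IPC version of the mesh example, sketched with a stdlib Pipe (names and the squaring workload are illustrative):

```python
from multiprocessing import Process, Pipe

def crunch(conn):
    data = conn.recv()                    # receive the work over the pipe
    conn.send(sum(x * x for x in data))   # ship the result back
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    worker = Process(target=crunch, args=(child_end,))
    worker.start()
    parent_end.send(list(range(1_000)))
    # ... main process carries on with other work here ...
    result = parent_end.recv()            # block only once the answer is needed
    worker.join()
    print(result)
```

Same "fire it off, ask for the result later" shape as the thread version, but the only shared state is the pipe, so there's nothing to deadlock on.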
 
Maria has usurped MySQL as far as FOSS is concerned; MySQL is often excluded from distro bundles now that Oracle has sealed it as a corporate product. A quick glance at the GitHub repositories shows 105 PRs vs. 2, and 4 days since the last commit compared to 3 months. MySQL development is pretty much dead. Bug #11472 is 13 years old as of a month ago.

Keep in mind, people who say they are using MySQL may actually be using Maria without even realising it. I just installed mariadb on my machine to make sure this was true; it creates /usr/bin/mysql and a /usr/bin/mariadb, which is a symlink to the former. Go figure.
 
It still seems like serious overkill to me. Why would you use a second process when you could just use a thread?

Because the thread(s) is/are blocking on that processor waiting for the timeslice, while the other thread on the other processor is free to run independent of that action.

If we are talking multicore here.

If you are talking single core, then there is an advantage to using a second process as it will get a higher priority in the timeslice. Nothing that I would write home about though.
 
Because the thread(s) is/are blocking on that processor waiting for the timeslice, while the other thread on the other processor is free to run independent of that action.

If we are talking multicore here.

That's only if we're talking about a crippled programming language like python's. In a different language those threads could potentially be running on multiple cores, giving you the same effect as putting the tasks in different processes, but with a bunch of advantages, including reduced overhead and easier resource sharing.
 
It still seems like serious overkill to me. Why would you use a second process when you could just use a thread?
Because threaded code is hard to write and easy to fuck up. It looks superficially easy, but as your design grows, if you have to coordinate multiple locks, you will fuck things up some day. Heh, and a given piece of code is never as easy to write as you think it is when you first start the project. Shit will get more complicated.

A threading fuckup is a magnitude more annoying to debug and deal with.

This is because threads+locks form a network of dependencies where the clashes might be in the design (ie I have a threaded queue piping data into this one thread, and it's waiting for another threaded queue, and somewhere down the line, they both rely on some global lock, like over a database handle), but because thread scheduling isn't deterministic, the graph doesn't always get locked up.

You might not even know where all the locks in your code are. They could be in client libraries you're using.

So you'll have code running smoothly for a few months, and then in some weird scheduling conditions, the threads all try to grab the same locks in a different order, your process stops, everything shits itself, and no one knows why. You restart it, everything's fine... for 2 days, then it crashes twice within an hour, and then goes on for another few days. And you can't replicate it locally with any sort of consistency. Ghost bugs.

The unpredictability of threading is the problem. And more specifically, it jizzes that unpredictability all over your code's face, instead of keeping it safely elsewhere, like in the kernel, where big corporations sponsor dipshits to handle it.

Whereas with a single thread (and just scaling up using OS processes), you keep that unpredictability out of your process. You can know, for absolute certain, that your code is locking up only in one single place. And you can debug that.

When you've got a single, stable unit, you can scale up: at least one instance of your app for each core. For most web app responses, each request shouldn't take that long, even if you're reading an image from the hard drive or manipulating some json.

It's really something you have to experience firsthand to really get a taste for how obnoxious it is. Years ago, I tried to implement a game engine using threads. Conceptually it was neat, in that I could just fire off a thread to handle each event. However the handlers had to coordinate access to the object graph and various objects. When it worked, it worked fine (a bit jittery though) but when it didn't, it was like a three stooges slapfight over resources.

But this is just me complaining about directly writing low level threaded code dealing with locks and threads directly. There's plenty of much nicer multiprocessing models that could use threads under the hood. If you use a higher level library, that's definitely workable.

Hell, you could write libraries that appear like threading, but use processes in the background. (It'd be difficult, but possible.) Ultimately it's not about the actual implementation, but the design of the library. Threading libraries suck.
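For what it's worth, Python's stdlib already did exactly this: multiprocessing was deliberately designed to mirror the threading API, so the process-backed code reads like thread code:

```python
from multiprocessing import Process, Queue
# the threaded version is the same shape:
#   from threading import Thread
#   from queue import Queue

def work(q):
    q.put("done")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=work, args=(q,))  # s/Process/Thread/ and it's threaded
    p.start()
    print(q.get())
    p.join()
```

Same start/join/queue vocabulary either way; only the backing implementation differs.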
Because the thread(s) is/are blocking on that processor waiting for the timeslice, while the other thread on the other processor is free to run independent of that action.

If we are talking multicore here.

If you are talking single core, then there is an advantage to using a second process as it will get a higher priority in the timeslice. Nothing that I would write home about though.
I'm pretty sure time slices are implemented more or less the same in the kernel, whether for threads or for processes. At least in Linux.
That's only if we're talking about a crippled programming language like python's. In a different language those threads could potentially be running on multiple cores, giving you the same effect as putting the tasks in different processes, but with a bunch of advantages, including reduced overhead and easier resource sharing.
The resource sharing is specifically the problem.

Edit: What's even worse are when people enable timeouts on their locks. In which case, you won't get deadlocks that lock up the system, you'll just get reduced performance over time, as your threads fight for the same resource, lock up, the lock attempt expires and fails, and it tries again and eventually gets the resource.

Ultimately these quarrels over resources do need to get resolved somewhere, but I think it's best to keep it as far away from your application logic as possible.

Double edit: shit like this: https://www.logicbig.com/tutorials/core-java-tutorial/java-multi-threading/thread-deadlock.html

Just spread out in a rat's nest of complicated production code.
 
That's only if we're talking about a crippled programming language like python's. In a different language those threads could potentially be running on multiple cores, giving you the same effect as putting the tasks in different processes, but with a bunch of advantages, including reduced overhead and easier resource sharing.

I have 25+ years of writing internals software. It is obvious you don't know what a scheduler is. (Hint: it doesn't matter what the language is.)

Kiss my ass and learn to use SoftIce (old school) or windbg.

_asm cli
(NTCREATEFILE)(SYSTEMSERVICE(ZwCreateFile)) = NewNtCreateFile;
_asm sti

Code:
NTSTATUS NewNtCreateFile(
    PHANDLE FileHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PIO_STATUS_BLOCK IoStatusBlock,
    PLARGE_INTEGER AllocationSize OPTIONAL,
    ULONG FileAttributes,
    ULONG ShareAccess,
    ULONG CreateDisposition,
    ULONG CreateOptions,
    PVOID EaBuffer OPTIONAL,
    ULONG EaLength)
{
    NTSTATUS rc;

    // you can do whatever you want here

    return rc;
}
Get bent.


Yes i'm being an asshole.
 
Everyone's too smart here.

 
After a few days of working with Golang, it's not bad. I could see myself using it instead of Python for even small scripts. Coming from C++ as a compiled language of choice having standard libraries for stuff like JSON and HTTP is pretty good.

The compiler is weirdly specific about formatting though, like it forces you to use their specific if else style.
Code:
if cond {
    // ...
} else {
    // ...
}
 
Probably because, although they'll never admit it, it's Google's internal jerk off project for Google's internal jerk off uses and they don't give a shit about other users, so enforcing their company wide coding style seems like a good idea.
 
Probably because, although they'll never admit it, it's Google's internal jerk off project for Google's internal jerk off uses and they don't give a shit about other users, so enforcing their company wide coding style seems like a good idea.
At least Docker's written in Go. That's a pretty prominent piece of software.
 
Hugo, the static site generator I used on https://resetera.kiwifarms.net/ was also written in Golang. Hugo is pretty great, by the way.

The new ReeEra site is all standard library Golang minus httprouter. Null gave me a single core VPS with 512MB of RAM so I had to really consider performance in order to compete with the previous static site content. I think I achieved this by generating cache data on a background thread controlled by timers and serving from this.

I'm not really a webdev, how is this really different from using something like memcached? Wouldn't this be faster since it's already deserialized and in statically-typed memory?
 
Hugo, the static site generator I used on https://resetera.kiwifarms.net/ was also written in Golang. Hugo is pretty great, by the way.

The new ReeEra site is all standard library Golang minus httprouter. Null gave me a single core VPS with 512MB of RAM so I had to really consider performance in order to compete with the previous static site content. I think I achieved this by generating cache data on a background thread controlled by timers and serving from this.

I'm not really a webdev, how is this really different from using something like memcached? Wouldn't this be faster since it's already deserialized and in statically-typed memory?
Nice.

Also, no, it's not different from memcached (or redis). Memcached/redis just provide a standard interface that various pieces of software can share and optimize.
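The in-process variant is simple enough to sketch. Here's the idea in Python for illustration (the actual site is Go, and all names here are made up): a timer re-renders the expensive content on a schedule, and request handlers read straight from memory.

```python
import threading
import time

cache = {"page": None}

def refresh():
    # stand-in for regenerating the expensive page/content
    cache["page"] = f"rendered at {time.time()}"
    # reschedule ourselves; daemon=True so the timer dies with the process
    t = threading.Timer(60.0, refresh)
    t.daemon = True
    t.start()

refresh()  # prime the cache once at startup

def handler():
    # served straight from process memory: no serialization,
    # no network hop to memcached/redis
    return cache["page"]

print(handler())
```

The memcached/redis trade-off is exactly as described above: you pay a network hop and (de)serialization, and in exchange multiple processes or machines can share one cache.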
 