Programming thread

  • 🐕 I am attempting to get the site runnning as fast as possible. If you are experiencing slow page load times, please report it.
1712322963138.png
 

Is there a mathematical apparatus for this?​

I feel like there should be something, on the level of "vector algebra", that I just don't know exists.

I have a (python + celery) worker program that processes tasks from a (rabbitMQ) queue. Some tasks come from more important, better-paying customers and therefore have "higher priority" than others. A naive "always do the highest-priority task" prioritization would see no tasks from smaller customers getting completed until the end of the day.

What I currently do is have several queues and more workers assigned to just the high-priority customers. This is bad, because if the high-priority queues are empty, these workers then sit idle and eat virtual machine money.

I have several "high-level" (aka stupid) ideas how all of this should work but all of them involve "human" thinking, like looking at the big picture and saying "well duh". But a worker doesn't have access to big picture "well duh" statistics and I'd rather not make a centralized load balancer.

There are probably simple algorithms that can make a worker decide which task to take, like
"every queue has a task point cost, each task executed from a queue subtracts its cost from its point reserve and gives a point to each other queue".
But all I can find is scientific articles on load balancing that say "our method is 0.17% better than the method from that other article". I want something like this (this is the page that got me into programming, I had to win a demo of a text adventure and didn't feel like manually adding 5-digit edge costs in a 25x25 8-direction grid).
 
Last edited:

Is there a mathematical apparatus for this?​

I feel like there should be something, on the level of "vector algebra", that I just don't know exists.

I have a (python + celery) worker program that processes tasks from a (rabbitMQ) queue. Some tasks come from more important, better-paying customers and therefore have "higher priority" than others. A naive "always do the highest-priority task" prioritization would see no tasks from smaller customers getting completed until the end of the day.

What I currently do is have several queues and more workers assigned to just the high-priority customers. This is bad, because if the high-priority queues are empty, these workers then sit idle and eat virtual machine money.

I have several "high-level" (aka stupid) ideas involving how all of this should work but all of them involve "human" thinking, like looking at the big picture and saying "well duh". But a worker doesn't have access to big picture "well duh" statistics and I'd rather not make a centralized load balancer.

There are probably simple algorithms that can make a worker decide which task to take, like

But all I can find is scientific articles on load balancing that say "our method is 0.17% better than the method from that other article". I want something like this (this is the page that got me into programming, I had to win a demo of a text adventure and didn't feel like manually adding 5-digit edge costs in a 25x25 8-direction grid).
If you want something simple but effective, just use simple probability weighted to priority scores given to each job. If you pick a good ratio, you can balance things reasonably well by having the worker occasionally take lower value jobs based on some random number generated when it is looking for a new job.

This could be what you were getting at with
I feel like there should be something, on the level of "vector algebra", that I just don't know exists.
where you may be picturing some form of quantization into weighted queues.
 
Last edited:
I have a (python + celery) worker program that processes tasks from a (rabbitMQ) queue. Some tasks come from more important, better-paying customers and therefore have "higher priority" than others. A naive "always do the highest-priority task" prioritization would see no tasks from smaller customers getting completed until the end of the day.

What I currently do is have several queues and more workers assigned to just the high-priority customers. This is bad, because if the high-priority queues are empty, these workers then sit idle and eat virtual machine money.
I don't really understand the question here. You have come up with a method of assigning priorities to jobs. But then you find a reason why the priority system is producing undesirable results. By definition, this means that your low-priority jobs are higher priority than you were accounting for. I don't think you can ask someone else for an out-of-the-box solution to what sounds like it's going to depend very highly on parameters that only you can set weights to.

Also... if you have multiple workers running queued jobs, why are they pulling from separate queues? If you just create one job queue, and it's sorted in order of priority, then you won't have workers sitting idle just because "their" queue emptied out. They'll just grab the highest-priority job in the queue, even if it's lower priority. Just make sure that two workers can't try to grab the same job at the same time.

And finally, if the result (low-priority jobs never get done) is objectionable, why not just have one worker that will always pick the oldest job in the queue, instead of the job with the top priority. Or have all workers have some random chance that they'll pick the oldest job instead of the best paying job. Or, instead of having a fixed priority, have the priority of a job gradually increase the longer it's been waiting in the queue.
 
Also... if you have multiple workers running queued jobs, why are they pulling from separate queues? If you just create one job queue, and it's sorted in order of priority, then you won't have workers sitting idle just because "their" queue emptied out. They'll just grab the highest-priority job in the queue, even if it's lower priority. Just make sure that two workers can't try to grab the same job at the same time.
Because this seems like a video game (or maybe some interview problem) rather than actual code
 
  • Islamic Content
Reactions: Safir
I don't really understand the question here. You have come up with a method of assigning priorities to jobs.
I didn't, this is why I put quotation marks to indicate these are not real mathematical priorities but user-story, human-language priorities.

I don't think you can ask someone else for an out-of-the-box solution to what sounds like it's going to depend very highly on parameters that only you can set weights to.
Don't be autistic, this is a common natural-language constraint that humans have been dealing with with since before there were computers. You have an important client whose tasks need to be favored but not to the exclusion of everything else.

Also... if you have multiple workers running queued jobs, why are they pulling from separate queues? If you just create one job queue, and it's sorted in order of priority, then you won't have workers sitting idle just because "their" queue emptied out. They'll just grab the highest-priority job in the queue, even if it's lower priority. Just make sure that two workers can't try to grab the same job at the same time.
I'm new on this project. It uses celery with rabbitmq underneath (originally it was just rabbitmq and a custom class, I rewrote it to use celery because the boss told me to (he's a computer scientist). I don't have a table of tasks I can query, these are all celery tasks. I can't write a priority heap message broker. Rabbitmq doesn't have priority queues, what it calls priorities is extremely homosexual:
If priority queues are what you want, this information previously stated values between 1 and 5 are highly recommended. If you must go higher than 5, values between 1 and 10 are sufficient (keep it to a single digit number) because currently using more priorities consumes more CPU resources by using more Erlang processes.

And finally, if the result (low-priority jobs never get done) is objectionable, why not just have one worker that will always pick the oldest job in the queue, instead of the job with the top priority. Or have all workers have some random chance that they'll pick the oldest job instead of the best paying job. Or, instead of having a fixed priority, have the priority of a job gradually increase the longer it's been waiting in the queue.
It's fucking rabbit, I don't have a table of tasks I can query. It would've been easy if I had a view of all the tasks and had to write a priority function on it, I've done it a lot (orders for human dispatchers, moderation queues for media censors).
 
Last edited:
It's fucking rabbit, I don't have a table of tasks I can query. It would've been easy if I had a view of all the tasks and had to write a priority function on it, I've done it a lot (orders for human dispatchers, moderation queues for media censors).
Peek the non priority queue and check if the top has been waiting too long and process it if it has, else let it go back into the queue and pull one off the priority queue.
(looks like in rabbit this is done by saying "yes, I'll ack the message" and then not acking it so it's requeued)
also: https://www.rabbitmq.com/docs/nack
Obviously the queues need the appropriate settings so the messages are requeued and not dropped.
 
Don't be autistic, this is a common natural-language constraint that humans have been dealing with with since before there were computers. You have an important client whose tasks need to be favored but not to the exclusion of everything else.
What I was trying to get at is that it sounds like your code is working as designed, but when you're rating its performance you're using additional factors that the code itself was not designed to take into consideration. Your natural-language constraint is different from the constraint that's actually in your code, and that's your problem.

The naive "always do the highest-priority task" should work, if old tasks get increased in priority so that they won't get excluded entirely. Because that factor is important to you, but currently isn't important to your code.
 
  • Like
Reactions: y a t s

Is there a mathematical apparatus for this?​

I feel like there should be something, on the level of "vector algebra", that I just don't know exists.
If you are asking about something like Dijkstra Algorithm, I think you are basically asking about Linear Programming. These are solved as "vector algebra" as you called them but you don't have to think of it that way for most solvers.

Dijkstra Algorithm is useful for the travelling salesman problem but there are tons of other Linear Programming Problems.
If you know beforehand what the order is you can create a linear program to solve this problem.

What you could do is have decisions variables to decide what time to assign workers to a task, constraints ensuring that more workers are not assigned to tasks than you have available and that tasks take the required amount of time and an objective function to maximize with the higher priority customers given higher objective coefficient values in proportion to the priority.

If you didn't want lower priority customers to be done only at the end of a shift you could split the problem into chunks of an hour or two and solve multiple linear programming problems.

Linear Programming is NP hard but for a problem like this unless you have an absurd amount of orders to fill it should be able to find an optimal solution.

If this is what you are looking for and you need help figuring out as it can be hard to grasp especially with the mathematical notation you can ask and I'll explain. I can also point you to paid and open source LP solver libraries in Python. The paid ones are usually more performant but you have to pay for them.
 
  • Like
Reactions: Belisarius Cawl
Some tasks come from more important, better-paying customers and therefore have "higher priority" than others. A naive "always do the highest-priority task" prioritization would see no tasks from smaller customers getting completed until the end of the day.
This doesn't need math, it just needs a tool that can have multiple workers taking care of jobs. Good thing you're using Celery!

The only reason to use celery with a message broker instead of just a queue directly in memory is so you can scale and have multiple celery workers doing multiple jobs while being connected to the same broker.

So if you're using celery in the first place, get a pool of workers and assign more workers to your high priority clients, done.

EDIT: I hit the wrong key before typing it all out.

Have your high priority workers handle lower priority jobs if there's nothing for them to handle for a specific amount of time. Once a high priority task comes in again, handle those until you hit the timeout again.

Or deploy workers as needed and take them down when unused.

And boom, all your problems solved.
 
Look into scheduling, this is similar to how operating systems decide which processes to run.
Frankly, it's almost exactly the same. It's been a long time since I've read it but I remember the discussion of scheduling in Operating Systems: Three Easy Pieces being very lucid (and the book overall) and there's a free version here (plus ... you know what to do):


This may be a stretch but I'm also reminded of some of the literature from evolutionary computing. The following comes from An Introduction to Genetic Algorithms (Mitchell):

"A common selection method in GAs is fitness−proportionate selection, in which the number of times an individual is expected to reproduce is equal to its fitness divided by the average of fitnesses in the population. (This is equivalent to what biologists call 'viability selection.')

A simple method of implementing fitness−proportionate selection is 'roulette−wheel sampling' (Goldberg 1989a), which is conceptually equivalent to giving each individual a slice of a circular roulette wheel equal in area to the individual's fitness. The roulette wheel is spun, the ball comes to rest on one wedge−shaped slice, and the corresponding individual is selected. In the n = 4 example above, the roulette wheel would be spun four times; the first two spins might choose chromosomes B and D to be parents, and the second two spins might choose chromosomes B and C to be parents. (The fact that A might not be selected is just the luck of the draw. If the roulette wheel were spun many times, the average results would be closer to the expected values.)"

So basically what I'm trying to say here is that if OP can figure out how to make the really important jobs occupy x% of the slots on a virtual roulette wheel, and the less important jobs the remainder (or even split into three or more categories), OP can just "spin the wheel" and the less important jobs will have their time sooner or later while the priorities are still respected on average.
If you want something simple but effective, just use simple probability weighted to priority scores given to each job. If you pick a good ratio, you can balance things reasonably well by having the worker occasionally take lower value jobs based on some random number generated when it is looking for a new job.
Oh shi

I swear I just read this now
I didn't, this is why I put quotation marks to indicate these are not real mathematical priorities but user-story, human-language priorities.
Without splitting hairs or getting bogged down in philosophy, either way you'll have to explain the priorities such that a computer can understand them
The naive "always do the highest-priority task" should work, if old tasks get increased in priority so that they won't get excluded entirely. Because that factor is important to you, but currently isn't important to your code.
That reminds me of an interesting mechanic in the game Star Traders: Frontiers. So the simplified version is that for some missions there will be a card game where five cards are drawn and one of them will be the "Mission Success" card, initially rated at 20%. (The other four will be various risks, opportunities, gains, disasters.) Of course most of the time you will not draw "Mission Success" first. But then on the next round that card will be weighted to 22%, then in subsequent rounds 24%, 26%, etc. until drawing it becomes a near certainty over all of those rounds. (I think once I had to go to over 50%.) That might also be a way to handle low-priority jobs for OP.
 
Last edited:
Frankly, it's almost exactly the same. It's been a long time since I've read it but I remember the discussion of scheduling in Operating Systems: Three Easy Pieces being very lucid (and the book overall) and there's a free version here (plus ... you know what to do):

https://pages.cs.wisc.edu/~remzi/OSTEP/
This may be a stretch but I'm also reminded of some of the literature from evolutionary computing. The following comes from An Introduction to Genetic Algorithms (Mitchell):

"A common selection method in GAs is fitness−proportionate selection, in which the number of times an individual is expected to reproduce is equal to its fitness divided by the average of fitnesses in the population. (This is equivalent to what biologists call 'viability selection.')
Discussion of operating systems and genetic algorithms in this thread? Now we're really speaking my language lol.
I always thought genetic programming seemed like a really cool counterpart to neural nets when it comes to iteratively self-improving over time. I used to frequently discuss such algorithms with a biology prof buddy I had over many lunch breaks.

So basically what I'm trying to say here is that if OP can figure out how to make the really important jobs occupy x% of the slots on a virtual roulette wheel, and the less important jobs the remainder (or even split into three or more categories), OP can just "spin the wheel" and the less important jobs will have their time sooner or later while the priorities are still respected on average.
My thought with grouping into queues is that true fairness and equality in queuing systems are annoying in practice to the end users and affect incentives towards the extrema. So a floating window/range for each queue priority level based on average volume of job types would help ensure customers/users on the lower extremum of the priority spectrum have a sort of luck based incentive to roll the dice instead of waiting and saving up to pay more some time later. You could even dither a bit while sorting from the scores for even more randomness.

Oh shi

I swear I just read this now
lol one big takeaway from my time working in math and CS is that averages and the law of large numbers are more much powerful than many people think. I run into a lot of similar problems when doing signal processing work.
 
Last edited:
Discussion of operating systems and genetic algorithms in this thread? Now we're really speaking my language lol.
I always thought genetic programming seemed like a really cool counterpart to neural nets when it comes to iteratively self-improving over time. I used to frequently discuss such algorithms with a biology prof buddy I had over many lunch breaks.
They're not mutually exclusive. My understanding is that early recurrent neural networks used an evolutionary computing approach because traditional backpropagation no worky and also that NEAT (Neuroevolution of Augmenting Topologies), where the entire topology of the network is permitted to evolve, is a popular approach to evolutionary computing in neural networks today.
 
Last edited:
Have your high priority workers handle lower priority jobs if there's nothing for them to handle for a specific amount of time. Once a high priority task comes in again, handle those until you hit the timeout again.
How do I force a worker to consume a task from a particular queue at runtime with celery? If I could do this, it'd solve 90% of the problem (I'd still need to watch and tune priorities). I don't need working code, please just point me to the celery feature that does this.

I personally can't spin up more workers, it's lightyears above my pay grade. Plus it sounds like overkill to have a separate program watching the queues, starting and creating processes, when the solution may be as easy as passing a string to a function (but what function????).

A simple method of implementing fitness−proportionate selection is 'roulette−wheel sampling'
Yeah, I was thinking about something like this, or buckets to emulate this without randomization.

Frankly, it's almost exactly the same. It's been a long time since I've read it but I remember the discussion of scheduling in Operating Systems: Three Easy Pieces being very lucid (and the book overall) and there's a free version here (plus ... you know what to do):
Thank you! I'll start reading the book now.

Without splitting hairs or getting bogged down in philosophy, either way you'll have to explain the priorities such that a computer can understand them
I know! I can make a http microservice that explicitly ranks all queued tasks and responds with the next task to do whenever a worker hits it (I've done it a lot for humans). But I'd rather emulate the behavior with scores or probabilities, it's not a generic case where Anything Can Happen and tasks can be reranked every which way, a separate service looks like overkill.

When I worked for a gig worker service, it could be that an assigned gig worker didn't check in for the job and the order had to be emergencied so human dispatchers would look into it and assign someone else. Here we have predictable flows of nearly identical tasks. We don't (yet) have an SLA to meet and SLA-related fines to minimize.

(
When I worked for Roscomnadzor, the nepo baby "scrum master" there wanted me to "optimize queues with AI". He asked two questions:
  • "Would it help if we have two queues, for long tasks and for short tasks, so that long tasks don't hold back short tasks?"
    • I asked him what share of the tasks could go overtime. "lmao none!" Then playing musical chairs won't help, all of them need to be done in order of expiration time, add more workers.
      • I modeled this to prove my point.
  • "Let's say, if we process 100'000 tasks per day and add 110'000 tasks per day, what will be the average length of the queue? Can you model that?"
    • ...I don't need to model that, I can just tell. There won't be an average length, it will grow day by day.
      • (I got fired soon after.)
)

There's actually more to the userstory, such as:
  1. the customer can declare a task "urgent" to prioritize among his own tasks
    1. me: this could be done with rabbit priorities, to push tasks to the front of the customer-specific queue
  2. "urgent" tasks should ideallybe executed before non-urgent tasks of other customers
    1. me: this is blatantly exploitable
  3. there should be a (non-monetary) mechanism to prevent customers from declaring every task as urgent
    1. me: even so, if some customers are sending urgent tasks and others are not, the latter will never see anything done until all the urgent tasks are complete, even a quota of urgent tasks per unit of time won't truly help -- if a customer doesn't use urgency then he'd better declare everything urgent, or as much as the quota allows, I don't think (2) is fixable.
(I'm not asking for help with these, I'm gonna read the book now.)

It's not an interview problem. I don't lie online except about my name/birthdate (1970.01.01) and sometimes sex. I'm blessed to not have to lie IRL either, except about prices of things I buy for mom. Yes mom, that phone was on sale. Yes, their flagship model was 50% off, I saw it in the 4pda thread, everyone in the thread went and bought one. Plus they said it was the display unit and discounted an extra 5%, but when I got to the store, they found an unopened box but kept the discount, pretty cool right? I should've bought one for myself, too bad there isn't a sale right now.
 
Back