Programming thread

You know how the "M" in LZMA stands for "Markov chain"? Well, Fabrice Bellard made a text compressor from LLMs and it's incredibly powerful. It's part of the reason why I think of LLMs as extremely overpowered Markov chains.
I'm familiar with the acronym "Lempel-Ziv / Markov chain". Now that I think about it, can you give me a quick rundown on how what seems like a probabilistic model does lossless compression?
 
  • Semper Fidelis
Reactions: Creative Username
I'm familiar with the acronym "Lempel-Ziv / Markov chain". Now that I think about it, can you give me a quick rundown on how what seems like a probabilistic model does lossless compression?
Semper fi for asking a question that I can use as an excuse to write an informerald!
This is how I think it works. I might be slightly wrong but I'm generally confident that this is the basic outline of how you can use a Markov chain to help compress data:
  1. Computers are deterministic, at least most of the time. Therefore, probabilistic models, when operating on the same inputs, always produce the same output.
  2. This means you can use a fancy statistics model and then encode a compressed binary representation of something a lot like the following:
  3. "<weird markov chain dictionary> 1st most likely token, 4th most likely token, literal 'someweirdraretoken', 1st most likely token, 1st most likely token, 3rd most likely token..."
  4. ??? (The rest is left as an exercise to the reader in whatever entropy coding and LZ77 you want to add on top.)
  5. Save tons of money on bandwidth and storage and PROFIT!
The main thing here that makes it lossless is that you can cherry-pick the nth most likely token instead of acting like a typical Markov chain text generator that just picks the most likely token or does a weighted pseudorandom selection. Instead of generating nonsense using your statistical model, you can encode a few faint signs that subtly nudge it into generating a specific text. Since the statistical model makes some decent guesses on its own and only needs a bit of nudging, the faint signs are less redundant than the original text itself. And when you have an alternate method of representing a series of symbols that has less redundancy, you have compression! Of course, there's no free lunch here and you can get data that severely trips up your Markov chain. That's inherent to all compression, though.
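A minimal sketch of the "nth most likely token" idea above, assuming a word-level order-1 Markov model built from a shared training text. The names `build_model`, `encode`, and `decode` are made up for illustration, and a real compressor would entropy-code the ranks instead of keeping them in a Python list:

```python
from collections import Counter, defaultdict

def build_model(training_text):
    """For each token, list its observed successors, most frequent first."""
    tokens = training_text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    # Sort by (-count, token) so ties are broken the same way on both ends.
    return {
        prev: [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))]
        for prev, c in counts.items()
    }

def encode(model, text):
    """Turn text into ("rank", n) hints, falling back to ("literal", token)."""
    symbols, prev = [], None
    for tok in text.split():
        ranked = model.get(prev, [])
        if tok in ranked:
            symbols.append(("rank", ranked.index(tok)))
        else:
            symbols.append(("literal", tok))
        prev = tok
    return symbols

def decode(model, symbols):
    """Replay the hints against the same model to rebuild the text."""
    out, prev = [], None
    for kind, value in symbols:
        tok = model[prev][value] if kind == "rank" else value
        out.append(tok)
        prev = tok
    return " ".join(out)

model = build_model("the cat sat on the mat and the cat ate the fish")
message = "the cat sat on the fish quietly"
symbols = encode(model, message)
print(symbols)
print(decode(model, symbols) == message)
```

Both sides need the exact same model, and this toy version only round-trips text whose words are separated by single spaces. Most of the symbols come out as small ranks, which is what an entropy coder would then squeeze down.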

I could be wrong about how LZMA does it specifically. I don't know how LZMA works, it could be somehow merging the LZ and MA in an incredibly clever way I can't think of. If I was smart I would RTFS.
 
I mainly ask LLMs about matters of opinion then go and do further searches based on the output provided. Any statement of fact I vet immediately before repeating it elsewhere, or to myself. I still haven't used Copilot or the like.
I also find it incredibly useful when you need to do something and you aren't exactly sure where to start. It will at least give you an idea of what your options are.

Like, I had no clue how to parse an HTML document and manipulate the DOM server-side. I asked an AI, and it told me about cheerio. I then took the time to go read about cheerio so I could understand it before adding it to my project.
 
Computers are deterministic, at least most of the time. Therefore, probabilistic models, when operating on the same inputs, always produce the same output.
Well, of course, that's why setting the "seed" for the random number generator to something fixed (like 42) is used to ensure reproducible outcomes in machine learning code. (Rimworld and other games use this approach too.) Is it along those lines?
 
Normally I use Bootstrap but I'll check out Tailwind.
You may enjoy semantic CSS frameworks such as PicoCSS. Writing your own (or modifying an existing one) is good if you have to do frontend work often.
AI is a literal demon and anyone retarded enough not to see that shall burn in the eternal flame of the impending technojihad (hopefully).
Perfectly accurate. Ask an AI user what they use it for and they will stare you in the eyes and show you a 1:1 copy of a stackoverflow response for "how do I reverse a string in jquery 2025".
Additionally, I find the "I simply use it for languages/libraries I am unacquainted with" to be idiotic. Are the programmers of today really so slow?
 
Well, of course, that's why setting the "seed" for the random number generator to something fixed (like 42) is used to ensure reproducible outcomes in machine learning code. (Rimworld and other games use this approach too.) Is it along those lines?
It's a combination of a fixed set of transition probabilities for the Markov chain and a fixed set of choices from the probability distributions the model produces. You would only need a seed if you were selecting random tokens from the distributions to generate randomized text. For compression, these selections are part of the encoded bitstream. The model provides a sequence of rough predictions, like a phone keyboard's autocomplete, and you tell it which of the suggested words to insert instead of spelling the words out entirely. If the predictions aren't hitting on much, you can just use literals or something.

You don't need randomness when you're using statistical things. Item 1 in my list was a bit of a confusing mistake on my part.
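To make the seed-versus-bitstream distinction concrete, here is a small sketch with a made-up transition table (`MODEL` and both function names are hypothetical). `generate` needs a seeded RNG to be reproducible, while `replay` takes its choices from the caller, the way a decompressor would take them from the encoded stream:

```python
import random

# Hypothetical toy model: each state maps to (next tokens, weights).
MODEL = {
    "start": (["the", "a"], [3, 1]),
    "the": (["cat", "dog"], [2, 1]),
    "a": (["cat", "dog"], [1, 1]),
    "cat": (["sleeps", "eats"], [1, 1]),
    "dog": (["sleeps", "eats"], [1, 2]),
}

def generate(seed, length=3):
    """Random text generation: reproducible only because the seed is fixed."""
    rng = random.Random(seed)
    state, out = "start", []
    for _ in range(length):
        tokens, weights = MODEL[state]
        state = rng.choices(tokens, weights=weights)[0]
        out.append(state)
    return out

def replay(choices):
    """Decompression-style: every choice is given explicitly, no RNG involved."""
    state, out = "start", []
    for index in choices:
        tokens, _ = MODEL[state]
        state = tokens[index]
        out.append(state)
    return out

print(generate(42) == generate(42))
print(replay([0, 1, 0]))
```

The table only covers three steps from "start", so longer sequences would need more states. The point is just that `replay` never touches `random`.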

I would like to add a semi-unrelated note that FLAC works by predicting each audio sample from the previous ones (linear prediction, a cheap approximation of the signal) and encoding the difference between that prediction and the original data. Encoding data in unorthodox ways and using delta encodings can reduce redundancy quite well.
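A rough sketch of the approximation-plus-difference trick, assuming a fixed order-2 predictor that extrapolates a straight line through the last two samples. The function names are made up, the handling of the first two samples is simplified, and real FLAC goes on to pack the residuals with Rice coding, which is left out here:

```python
def predict(history):
    """Guess the next sample by extending a line through the last two."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if len(history) == 1:
        return history[-1]
    return 0

def to_residuals(samples):
    """Store only how far each sample lands from the prediction."""
    return [s - predict(samples[:i]) for i, s in enumerate(samples)]

def from_residuals(residuals):
    """Rebuild the exact samples by re-running the same predictor."""
    samples = []
    for r in residuals:
        samples.append(predict(samples) + r)
    return samples

samples = [100, 104, 109, 113, 116, 118, 119]
residuals = to_residuals(samples)
print(residuals)  # [100, 4, 1, -1, -1, -1, -1]
print(from_residuals(residuals) == samples)  # True
```

The residuals are much smaller numbers than the samples, so they take fewer bits to store, and nothing is lost because the decoder runs the identical predictor.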
 
I would like to add a semi-unrelated note that FLAC works by predicting each audio sample from the previous ones (linear prediction, a cheap approximation of the signal) and encoding the difference between that prediction and the original data. Encoding data in unorthodox ways and using delta encodings can reduce redundancy quite well.
Fascinating but isn't it true though that FLAC is mostly excessive as far as human listening enjoyment is concerned? I can think of reasons for archival or analysis where truly lossless audio might matter but MP3, or better OGG, with a high enough kbps rate can't be told apart from FLAC audio and people are just cucking themselves into taking up all their storage with a lossless audio FLAC placebo.
 
AS LONG AS IT WORKS GOOD SAAR! You're why programmers are the lolcows of engineering. Kindly set up an ECMAsaar callback so you can asynchronously wait for the knock. Go directly to pajeet prison, do not pass GO, do not redeem 200 itunes gift card
No offense intended I just want to make a quick point.
Fascinating but isn't it true though that FLAC is mostly excessive as far as human listening enjoyment is concerned? I can think of reasons for archival or analysis where truly lossless audio might matter but MP3, or better OGG, with a high enough kbps rate can't be told apart from FLAC audio and people are just cucking themselves into taking up all their storage with a lossless audio FLAC placebo.
I just brought it up as an example of a smart technique based on an approximation and some extra data that makes it lossless. I don't have an opinion on whether FLAC is a gay waste of space or if it sounds way better than even the most highly cranked up artisanally encoded Opus audio. You could say the same things about how storing your photos in a lossless format is retarded, but people choose to do it anyway because they think the losslessness will help them. This is usually because they want to edit and reencode them; every lossy format will look/sound like shit if it goes through 20 cycles of the encoding process.
 
I just brought it up as an example of a smart technique based on an approximation and some extra data that makes it lossless. I don't have an opinion on whether FLAC is a gay waste of space or if it sounds way better than even the most highly cranked up artisanally encoded Opus audio. You could say the same things about how storing your photos in a lossless format is retarded, but people choose to do it anyway because they think the losslessness will help them. This is usually because they want to edit and reencode them; every lossy format will look/sound like shit if it goes through 20 cycles of the encoding process.
Yeah, you're right, and I'm thinking of "deep fried memes", where the whole point was to smear images in digital dirt on purpose, but maybe more practically or directly I'm thinking of the famous chant from Dawn of War:
That video is full of ugly squeaky audio artifacts but this one doesn't have them:
Obviously the latter still uses lossy compression so it seems like there is lossy audio good enough that it can be used again and again for almost any audience.
 
MP3, or better OGG, with a high enough kbps rate can't be told apart from FLAC
Speak for yourself. I've blind tested myself discerning 320kbps MP3 from FLAC. Only on hard samples, admittedly, but I had a knack for hearing the loss, and I noticed it in my library in a few places. Vorbis/AAC aren't as bad for me, and Opus is fine at like 128kbps now that my ears are old. With Vorbis, I started losing the ability to discern problem samples in the mid-200s.
 
Yeah, you're right, and I'm thinking of "deep fried memes", where the whole point was to smear images in digital dirt on purpose, but maybe more practically or directly I'm thinking of the famous chant from Dawn of War:
That video is full of ugly squeaky audio artifacts but this one doesn't have them:
Obviously the latter still uses lossy compression so it seems like there is lossy audio good enough that it can be used again and again for almost any audience.
As long as you don't re-encode the lossy audio, it'll remain just as high quality as the day it was encoded. The problem comes in when it gets smeared over 7 different video sites and reencoded 17 times; that's when it turns into absolute slop. Lossy compression is at its best when you have FLAC master sources that you turn into Opus or Ogg Vorbis when you need a smaller file size.
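A toy illustration of why re-encoding chains hurt, using plain quantization as a stand-in for a lossy codec. Real codecs are far more complicated, so treat `lossy_encode` and the step sizes as assumptions picked for the demo:

```python
def lossy_encode(samples, step):
    """Toy lossy codec: snap each sample to the nearest multiple of step."""
    return [(s + step // 2) // step * step for s in samples]

def total_error(original, decoded):
    """Sum of absolute differences from the original samples."""
    return sum(abs(a - b) for a, b in zip(original, decoded))

master = [3, 8, 14, 21, 27, 33]

# One encode straight from the lossless master.
direct = lossy_encode(master, 5)

# Same final setting, but after passing through two other lossy encodes.
chained = master
for step in (4, 6, 5):
    chained = lossy_encode(chained, step)

print(total_error(master, direct))   # 10
print(total_error(master, chained))  # 18
```

Each hop rounds away a little more, and the final encode cannot recover what the earlier ones threw out. That is the case for keeping a lossless master and encoding from it once.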
 
Speak for yourself. I've blind tested myself discerning 320kbps MP3 from FLAC. Only on hard samples, admittedly, but I had a knack for hearing the loss, and I noticed it in my library in a few places. Vorbis/AAC aren't as bad for me, and Opus is fine at like 128kbps now that my ears are old. With Vorbis, I started losing the ability to discern problem samples in the mid-200s.
You might not be wrong. I've looked at color samples of (mostly) women's cosmetics and read (mostly) guys expressing incredulity that two shades of lipstick are actually different. They are, and I can tell, and I apparently can actually see color differences better than most women despite being a colossal autistic / schizo faggot. (Hi Terry!)

As far as age is concerned, well, it happens to us all.
Saar, cows are sacred in my culture saar
SAAR DO NOT REDEEM SAAAAAAAR
 
I wish I could've used Apple's Xcode on Windows.
I hate needing to sign into Microsoft accounts to use a computer.
I dislike being prompted to tie my Microsoft account to Visual Studio.
I feel like a pigeon in a magician's sleeve.
 
It's depressing that such a basic thing has been hidden and made so difficult to the point where a guide is necessary.
The guide is really:
  1. Use the "Shift + F10" keyboard shortcut to open Command Prompt on the first page of the initial setup.
  2. Type the following command to disable the internet connection requirement to set up Windows 11 and press Enter: oobe\bypassnro
All the rest is just fluff to get people to see the ads that are probably on that page.

But yes, it should have an "I like my privacy" button. It's bad enough you have to run a debloat script to get it remotely usable.
 
I hate big tech so much it's unreal.

  1. Use the "Shift + F10" keyboard shortcut to open Command Prompt on the first page of the initial setup.
  2. Type the following command to disable the internet connection requirement to set up Windows 11 and press Enter: oobe\bypassnro
So it's not designed to allow for not having an account. It's just a fallback in case there is no internet connection.
Plain old hostile design. It bothers me so much that day by day technology is creeping everywhere, and is basically needed in order to function in society.
Yet many governments are perfectly happy to force usage of non-free tech, or at least greatly gimp any alternative means.

Just sign this EULA that you have never read and that they can update on a whim.
 