Programming thread

and its name is POSIX


niggerlicious

Whether you think this type of behavior should happen is still a matter of intense debate decades later.
niggerlicious
If you count only rendered glyphs as being affected by BBCode tags, this is totally acceptable. It just reveals a stateful interpreter instead of some tree structure, I guess.
 
niggerlicious
Every forum software's implementation is a little different, but XenForo's uses an interesting hybrid approach: a regex to find the tags, plus a traditional tag stack to track nesting. I get the impression the inline tags (e.g. formatting) are inherently more flexible than the block-style tags.

PHP:
protected $ast = [];
protected $astReference;
protected $tagStack = [];
protected $pendingText = '';
protected $plainTag = null;
protected $depth = 0;
protected $maxDepth = 20;
PHP:
while (preg_match(
    '#(?:\[(\w+)( |=|\])|\[/(\w+)])#i',
    $text, $match, PREG_OFFSET_CAPTURE, $position
))

I won't quote their entire parser setup, but you can find it pretty easily if you search for it.
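To make the regex-plus-tag-stack idea concrete, here is a minimal Python sketch of the general technique. It is not XenForo's code: the simplified pattern (no tag options like `[url=...]`), the `parse` function, and the dict-based tree are all made up for illustration. Only the depth limit of 20 is borrowed from the fields quoted above.

```python
import re

# Simplified tag pattern (assumption): plain [name] and [/name] only.
TAG_RE = re.compile(r"\[(/?)(\w+)\]")
MAX_DEPTH = 20  # same limit as the $maxDepth field quoted above


def parse(text):
    """Return a tree made of strings and {"tag": ..., "children": [...]} nodes."""
    root = {"tag": None, "children": []}
    stack = [root]  # tag stack; the top entry is the node being filled
    position = 0
    for match in TAG_RE.finditer(text):
        if match.start() > position:
            stack[-1]["children"].append(text[position:match.start()])
        position = match.end()
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if not closing:
            if len(stack) - 1 >= MAX_DEPTH:
                # Too deep: keep the tag as literal text instead of nesting.
                stack[-1]["children"].append(match.group(0))
                continue
            node = {"tag": name, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif any(node["tag"] == name for node in stack[1:]):
            # Pop everything up to and including the matching open tag.
            while stack.pop()["tag"] != name:
                pass
        else:
            # Stray closing tag: keep it as literal text.
            stack[-1]["children"].append(match.group(0))
    if position < len(text):
        stack[-1]["children"].append(text[position:])
    return root["children"]
```

With this sketch, a misnested `[b][i]hi[/b][/i]` closes both tags at `[/b]` and leaves the trailing `[/i]` as plain text, which is one of several defensible recovery choices.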
 
any time I have to use regular expressions
There is never a situation where regular expressions are mandatory.
What should happen when you do something like this?
Code:
[B][I]niggerlicious[/B][/I]
This should be a hard error that rejects the input as malformed.

Regex works best as a search (and replace) tool for text editors. It is horrible in a programming language or as a programming language library. Regex as a tool for hard-coded string pattern matching is one of the biggest curses in modern programming. It should be so cumbersome and annoying to use that you simply don't use it that way, and it is a big shame that it isn't, because many programmers miss out on the beauty of hand-parsing structured data.

Rather than offloading what is arguably a core programming skill to some horrible dense metaprogramming language with its own parser, compiler and runtime environment embedded into your program, just learn to parse structured data by hand and your software will be more reliable, more maintainable, more extensible and will most likely be more performant too.

Hate regex, always scrutinize its usage and remove it where possible.
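For anyone who hasn't hand-parsed before, a rough Python sketch of what that looks like for BBCode-style tags follows. The `tokenize` function and its token tuples are hypothetical names, and it assumes tags are plain `[name]` or `[/name]` with no options.

```python
def tokenize(text):
    """Split text into ("open", name), ("close", name) and ("text", s) tokens."""
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        if text[i] == "[":
            end = text.find("]", i + 1)
            if end != -1:
                body = text[i + 1:end]
                closing = body.startswith("/")
                name = body[1:] if closing else body
                if name.isalnum():
                    tokens.append(("close" if closing else "open", name.lower()))
                    i = end + 1
                    continue
            # Not a well-formed tag: the bracket is ordinary text.
            tokens.append(("text", "["))
            i += 1
        else:
            # Plain text runs until the next bracket or the end of input.
            nxt = text.find("[", i)
            if nxt == -1:
                nxt = n
            tokens.append(("text", text[i:nxt]))
            i = nxt
    return tokens
```

Every decision (what counts as a tag name, what happens to a lone bracket) is an explicit branch you can read and change, which is the maintainability argument in a nutshell.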
 
This should be a hard error that rejects the input as malformed.
The key word here is should.

For retard-friendly text formatting, hard errors and input rejection only create headaches for users who have never used anything resembling markup. Do you want slightly fucky formatting or a million "bug" reports about rejected user inputs? Any problem that has to be solved by a giant flaming disclaimer at the top of some FAQ page is the result of shitty design and planning.
 
People who can't into nesting when faced with an error explaining why their shit is wrong are a genuine safety risk to themselves and to society. They will be immediately plugged into the matrix once it has been built, where they will be used as biological transistors in order to mine bitcoin.
 
I 100% agree, but fighting the end-user and making their retardation apparent to them through bad design is a great way to write software that no one else ever uses. I was never a UX guy, but I learned pretty quickly that most users need to be spoonfed. Also, nesting is easy to fuck up if you copy and paste stuff a lot.

Practicality and pragmatism are all about trade-offs and compromises. This sort of thing is why you never see markup tags anymore. It's all variants of markdown now; markdown is easy.
 
I would rather be told about a dumb mistake I made when copy-pasting, than it be accepted and potentially lead to buggy states that are hard to reason about and that were not intended by the programmer. Outside of the learned helplessness goycattle, I think most people want this too even if they don't concretely know why.

The <a><b></a></b> problem is trivially identifiable with a parser, and you can output helpful contextual error messages for the specific scenario that occurred. This is always better than quietly accepting it and either rewriting it the way a web browser does or leaving it to cause problems elsewhere in the document.
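A small Python sketch of that kind of strict check, with an error that says what went wrong and where. `NestingError` and `check_nesting` are made-up names, and the sketch assumes every tag is paired (no void elements like `<br>`, no comments, no `>` inside attributes).

```python
class NestingError(ValueError):
    """Raised when tags are not properly nested (hypothetical error type)."""


def check_nesting(markup):
    stack = []  # (name, offset) for every tag that is currently open
    i = 0
    while True:
        start = markup.find("<", i)
        if start == -1:
            break
        end = markup.find(">", start)
        if end == -1:
            raise NestingError(f"offset {start}: '<' is never closed by '>'")
        body = markup[start + 1:end].strip()
        i = end + 1
        if not body or body == "/":
            raise NestingError(f"offset {start}: empty tag")
        if body.startswith("/"):
            name = body[1:].strip().lower()
            if not stack:
                raise NestingError(
                    f"offset {start}: </{name}> has no matching opening tag"
                )
            open_name, open_at = stack.pop()
            if open_name != name:
                raise NestingError(
                    f"offset {start}: found </{name}> but <{open_name}> "
                    f"(opened at offset {open_at}) must be closed first"
                )
        else:
            # Ignore attributes: the tag name is the first word.
            stack.append((body.split()[0].lower(), start))
    if stack:
        open_name, open_at = stack[-1]
        raise NestingError(f"offset {open_at}: <{open_name}> is never closed")
```

For `<a><b></a></b>` this raises at offset 6 and names the `<b>` opened at offset 3 as the tag that has to be closed first.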

The whole reason client-side web programming is annoying is because of the ambiguity and looseness the web was designed with. The web would be a better place and web programming would not be as shit as it is if web browsers stopped parsing and rendering a page the very moment the content violated the proper well-formed structure of a web document. Oh no, developers have to meet the bare minimum expectation of writing correct and well formed markup, oh the travesty! It's not like it is even that hard to do or check for yet most websites serve malformed HTML documents anyways. This is because the web browser environment inherently encourages retarded and broken development practices.
 
>Me don set up all de tings
>Me don tie pen in deh console to install deh packages
>Me add deh tings to json
>Spen all deh day be tinkin about how to be writin deh code
>Write deh code
>index.js don seh "Me tink white man code no rasta. Dey be so stupid me not gwaan don deh return console.error(err): even if dey don makin deh error mon. Dey tri "ctrl+c" den dey don do deh npm start ting me gwaan bomboclat me diyapuh"
>Me tri "ctrl+c" den dey don do deh npm start ting
>Deh ting bomblcat dey diyapuh
>Me don go Smoke someting
>Me still tink I fee bad
>Mfw

f2fd83b673c6e3cfd8963aea2ec6efc78d6a7df841d273f2252831025aa31611_1.jpg~2.jpg
 
re is mostly compatible with the PCRE format. What were you looking for in particular?
Several things have bitten me over the years with Python's module versus PCREs:
  • Syntax for named capture groups (and other groups too, I think) is slightly different (and I end up using the shit out of named capture groups)
  • Python's standard library regex is both fucking slow, and at least last time I was thinking hard about it (maybe 3.6-3.7 era?) it didn't release the GIL which is also extremely retarded
  • I remember multiple times running into Python's re not being able to do something like lookbehind, recursion or negative matching (yes I know my regular expression engine is no longer describing a regular grammar, but those end up being really important for a lot of parsing jobs). This always ends up being extremely frustrating, as it's straightforward in Perl or with libpcre, and I just end up needing to install the Python PCRE bindings
In general re is just gimped and slow, which means that if you must use it, you end up moving more shit into actual code once you've got to parse anything with any level of complexity (which in Python means extremely fucking slow... Python's string implementation and functions are so god damn slow it's absurd). I've run into problems with it on more occasions than I would like to admit, usually when I had to use Python where it didn't really belong (systems software) and needed to parse firehose(s) of textual data. That meant reaching for the PCRE bindings or resorting to writing the stuff that needed to be fast in C and/or Cython.
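A quick way to see some of these gaps in the standard library `re` module. The patterns and the `compiles` helper below are made up for the demo. As far as I know, `re` spells named groups `(?P<name>...)`, only accepts fixed-width lookbehind, and has no recursion syntax at all.

```python
import re

# Named groups use the (?P<name>...) spelling in Python's re.
m = re.match(r"(?P<key>\w+)=(?P<value>\w+)", "depth=20")
print(m.groupdict())

# Lookbehind works as long as the pattern inside it has a fixed width.
print(re.findall(r"(?<=\$)\d+", "cost $15, tax $3"))


def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"(?<name>\w+)"))          # Perl/PCRE-style named group
print(compiles(r"(?<=a+)b"))              # variable-width lookbehind
print(compiles(r"\((?:[^()]|(?R))*\)"))   # PCRE-style recursion
```

The last three patterns are all rejected by `re`, so anything that relies on them has to go through other bindings or hand-written code.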
 
Any language in the ML (Meta Language) family is extremely good for writing interpreters. I would genuinely suggest Haskell, it's ungodly fast if you put the time and effort into learning the language + the quirks of how it compiles (like when to use foldl vs foldr vs foldl' vs foldr'). Basically you can end up using a ton of memory by accident if you don't understand what the compiler is doing. Richard Bird's books are a great resource for this. But once you get it right it's fucking miraculous how fast it is. I love writing toy languages and at this point would only choose Haskell for that job. If you're the kind of person who would like to write a toy language you're also probably the type of person who will find Haskell fun to write.

Not Haskell, but I had to learn Scheme for my project recently, and the amount of time I saved writing algorithms in it was so tremendous that I don't think I'll bother doing it in other languages anymore. You also get the advantage of relying on recursion instead of loops.
 
Not as big-brained as other stuff mentioned in this thread but I've started learning GIS in Python using the Automating GIS Processes course from the University of Helsinki. (There's a predecessor to the same, Geo-Python, but it's really very basic.) You can't get grades from a human if you're not a student at the university but it seems enough to set up a local environment and run jupyter lab to work through the exercises. I'm liking this approach so far. I tried QGIS earlier and I walked away with the impression that it is a very solid and capable piece of software but using it reminded me of this old Calvin and Hobbes Sunday strip:
calvin-and-hobbes-spaceman-spiff-computer.png
There are a zillion toolbars when you open up QGIS and navigating the different dialogue boxes is a real pain in the ass. Yes QGIS has a Python interface but just doing things programmatically outside of QGIS with other extant Python libraries seems more straightforward. For me and probably others here it's just easier to write programs than to click through a bunch of complicated dialogues and of course that approach is a lot more reproducible and scalable. There are even more resources here:
I also want to learn how things are done in R so I have even more tools at my disposal. One other thing, though, don't get this book:
I wanted to believe a Packt book was good as I learn best from books but I "acquired" it and immediately noticed the code examples were riddled with obvious errors. Even with a highly experienced author, it seems like Packt can barely ever get their shit together.
 
What should happen when you do something like this?
Code:
[B][I]niggerlicious[/B][/I]
This is a trick question!

The obvious and sole correct answer is to defer it to the browser’s HTML parser.
Code:
[b][i]niggerlicious[/b][/i]
becomes
Code:
<b><i>niggerlicious</b></i>
, because this is just BBCode.

Many of you skipped ahead to solving this for the HTML parser even though that wasn’t the question, and for that the solution is to bubble groups of tags and issue hard errors to users where this strategy fails. So when you have multiple tags that start and end together with only whitespace between them, you group them together, dictate an arbitrary ordering, and render that ordering with closing tags in reverse order of the opening ones.

That’s really the best you can do here.
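One possible reading of the tag-bubbling strategy, sketched in Python. This is an interpretation, not a reference implementation. The `render` function and the three-tag whitelist are assumptions, the "arbitrary ordering" is simply the order the tags were opened in, and whitespace inside a run of closing tags is dropped.

```python
import html
import re

# Assumed whitelist: only [b], [i] and [u]; anything else stays as text.
TAG_RE = re.compile(r"\[(/?)(b|i|u)\]", re.IGNORECASE)


def render(text):
    out = []
    stack = []    # names of currently open tags, in opening order
    closers = []  # run of closing tags separated only by whitespace
    pos = 0

    def flush_closers():
        if not closers:
            return
        count = len(closers)
        top = stack[len(stack) - count:] if count <= len(stack) else None
        if top is None or sorted(top) != sorted(closers):
            raise ValueError(f"closing tags {closers} do not match open tags {stack}")
        # Close in reverse opening order, whatever order the user typed.
        out.extend(f"</{name}>" for name in reversed(top))
        del stack[-count:]
        closers.clear()

    for m in TAG_RE.finditer(text):
        between = text[pos:m.start()]
        pos = m.end()
        is_close = bool(m.group(1))
        name = m.group(2).lower()
        if closers and is_close and not between.strip():
            closers.append(name)  # still inside the same run of closers
            continue
        flush_closers()
        out.append(html.escape(between))
        if is_close:
            closers.append(name)
        else:
            stack.append(name)
            out.append(f"<{name}>")
    flush_closers()
    out.append(html.escape(text[pos:]))
    if stack:
        raise ValueError(f"unclosed tags: {stack}")
    return "".join(out)
```

Tags that close together get reordered into valid nesting, while something like `[b]x[i]y[/b]z[/i]` still ends in a hard error because there is text between the closers.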
 
The obvious and sole correct answer is to defer it to the browser’s HTML parser.
This depends on how you define BBCode. If it's "markup that compiles to HTML" (per bbcode.org), then mostly yes, and everything that displays prettily formatted BBCode documents must have an HTML parser (its own or a third-party one). But if it's taken to mean an independent language for text files (like Markdown now is, but initially wasn't), then you have to make a decision.

This should be a hard error that rejects the input as malformed.
In some applications, yes, but BBCode was intended as a writing format. There are two common scenarios that are not served well by throwing out a babby with the bbathwater:
  • you get a malformed BBCode document and need to edit it
  • you are writing a BBCode document by typing it manually
In these cases, it's good to have a preview panel with "best approximations" that doesn't throw a "kill yourself nigger" error whenever you open a tag.
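A rough Python sketch of what such a forgiving preview could look like: it always produces some HTML and collects warnings instead of raising. The `preview` function, the warning wording, and the three-tag whitelist are all assumptions made for the example.

```python
import html
import re

# Assumed whitelist: only [b], [i] and [u]; anything else stays as text.
TAG_RE = re.compile(r"\[(/?)(b|i|u)\]", re.IGNORECASE)


def preview(text):
    """Return (html, warnings); never raises on malformed nesting."""
    out, warnings, stack = [], [], []
    pos = 0
    for m in TAG_RE.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        pos = m.end()
        name = m.group(2).lower()
        if not m.group(1):
            stack.append(name)
            out.append(f"<{name}>")
        elif name in stack:
            # Close everything opened after the matching tag, then the tag.
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
                warnings.append(f"[{top}] was still open at [/{name}]; closed it early")
        else:
            out.append(html.escape(m.group(0)))
            warnings.append(f"[/{name}] has no opening tag; shown as text")
    out.append(html.escape(text[pos:]))
    while stack:
        top = stack.pop()
        out.append(f"</{top}>")
        warnings.append(f"[{top}] was never closed; closed at end of input")
    return "".join(out), warnings
```

A half-typed `[b]open` renders as bold text with a single warning, so the preview stays usable while the document is still being written.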
 