Programming thread

Did I say "static garbage collector"? Holy shit, I'm a fucking moron. What I meant to say was RAII. De-allocations are determined at compile time, outside of a few special cases like reference counters. All the goodness of managed memory, none of the GC pauses.
How does it differ from C++'s destructors? I ask because deterministic object creation and destruction is one of the crucial features of C++.

I shit on GC not only because of possible jitter, but also:
  • It's not the silver bullet some proponents pretend it is. Well duh, I can also switch each and every raw pointer to std::shared_ptr, obliterating the cache in the process and introducing subtle resource leaks anyway (*) - see the sketch below the footnote. I don't need GC for that, thank you very much.
  • Many people, especially the less bright or experienced (but also dishonest language peddlers, a.k.a. "evangelists"), perpetuate the misconception that garbage collection is all about memory. That is an insultingly primitive perspective on the concept of a computing resource.
(*) I know I'm oversimplifying a bit - GC algorithms are a tad smarter than simple refcounting and can deal with reference cycles.
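A minimal sketch of the kind of leak meant above - a refcounting cycle that shared_ptr alone never collects (Node is made up for illustration):
C++:
#include <memory>

// Two nodes pointing at each other through shared_ptr keep their counts at 1 forever.
struct Node {
    std::shared_ptr<Node> next;
    std::shared_ptr<Node> prev;   // a std::weak_ptr here would break the cycle
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->prev = a;
}   // neither refcount reaches zero: both Nodes leak - the "subtle resource leak" above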
Modern C++ added these features around the same time as Rust was in development, and I'm sure there was significant cross-pollination between the two projects.
Oh, I wouldn't be surprised at all if there was some ripping-off involved in one way or another (except for guarded accesses with .at(), which have existed since about forever). As a matter of fact, as much as I can't stand Python for anything besides some glue code, I am very pleased to admit that C++ since C++11 has intensely pythonized itself, leading to much faster and - funnily enough - safer (meaning: less vulnerable to stupid coding mistakes) programming.

If it was the coming of Christ (Rust) that caused the C++ language committee to pick up the slack, then I can give it at least that. Just as LLVM+Clang coming into existence finally kicked the GCC guys into actually innovating a bit.
It's a matter of whether you prefer the battle-tested C++ ecosystem or the modern functional aspects of Rust. Because I don't use many C++ libraries, and have no experience with the language's newer features, I prefer the latter.
I see the landscape as this: I already know one vomit-inducing language which covers all of my use cases, and quite probably all of my future use cases (*), so why should I invest time and effort into another vomit-inducing language which is years behind in libraries and tooling, and (from what I've heard) has sketchy compile times (not that C++ compiles fast, but I hear it's abysmal with Rust)? On top of that there's the package-manager-centric ecosystem (which I view as a weak point) and a community culture that puts me off.

And what do you mean by "modern functional aspects" in that context?

(*) Embedded programming sold separately.
I hope this can be the last reply in this chain, because, holy shit, this is the most tedious argument I've ever had on the internet.
No way, this is some good discussion.
 
Valid C++:
C++:
int* a = NULL;
{
    int b = 1;
    a = &b;
}   // b is destroyed here

// oh shit, a points to b, but b no longer exists!
Invalid Rust:
C-like:
let a: &u32 = {
    let b: u32 = 1;
    &b
};

// compiler error: reference lives longer than referent
These examples are not equivalent. A closer translation would be:
C-like:
#![allow(unused)]
fn main() {
    use std::ptr;

    let mut p: *const i32 = ptr::null();
    {
        let x: i32 = 32;
        p = &x;
    }   // x is dropped here, so p is now dangling
    unsafe {
        if !p.is_null() {
            assert!(*p == 32);   // note: reading through the dangling pointer is UB even inside unsafe
        }
    }
}
<https://play.rust-lang.org/?version...on=2018&gist=9d09ab66c197da2aacf5eb7f92693f54> try it for yourself.

Block expressions don't deref the last value given; the block just evaluates to it, and that value is what gets assigned.
Blocks are always value expressions and evaluate the last operand in value expression context.
It's just sugar for assignment. Blocks work the same way, except anything created inside a block usually can't come out of that block. In your example, variable b is created, referenced, and then destroyed at the end of the block at the same moment it's trying to be assigned to variable a. References in Rust are not allowed to outlive the scope they were created in. The only reason my corrected translation works is that the pointer was declared first and lives longer than the variable x. You are allowed to have handles to data whose original binding is no longer in scope, but you are not allowed to have references to values that no longer exist. If your example is restated, the block's values go out of scope and get drop()-ed before the variable a is ever assigned, which is why the compiler doesn't like it.

I understand that these examples are meant to be trivial, but i32 is Copy, so it doesn't need to be pointed to.
Why would I be upset? This is a very creative idea, I'm just not sure it applies;
First, generational GC solves an interesting problem - not that you need to re-remember things, but that most things don't need to be remembered for long.
Another aspect of what you mention is CPU cache, where the hot stuff is easily accessed.
Finally, what you want can be implemented by memoizing with a TTL, perfectly doable in a functional language.
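A rough sketch of that memoize-with-a-TTL idea - written here in C++ for consistency with the rest of the thread rather than in a functional language; memoize_ttl and the expensive_call in the usage note below are made up:
C++:
#include <chrono>
#include <functional>
#include <optional>

// Remembers the last result of `compute` and recomputes it once the TTL has expired.
template <typename T>
std::function<T()> memoize_ttl(std::function<T()> compute, std::chrono::seconds ttl) {
    std::optional<T> cached;
    auto expires = std::chrono::steady_clock::now();
    return [=]() mutable {
        auto now = std::chrono::steady_clock::now();
        if (!cached || now >= expires) {
            cached = compute();     // "re-remember" the value
            expires = now + ttl;    // and let it expire again after the TTL
        }
        return *cached;
    };
}
Usage would be something like auto rate = memoize_ttl<double>([]{ return expensive_call(); }, std::chrono::seconds{60});, with expensive_call standing in for whatever you'd otherwise hammer.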
Thanks! I'm glad you like the idea. It would apply to how the freeing algorithm works in the GC. The bad things I've always heard about GCs are
  1. That weird problem where, if you don't explicitly disconnect from a database in Java, even when the object goes out of scope, Java can fail to forget the connection and leave phantom data around.
  2. When GCs work too hard trying to find the exact data to free, they tend to crash. They start allocating more and more memory, recursing through all the pointers looking for just the right object to free, but the real problem is an object leak. The GC keeps looking for something to free, but it's just clogging memory searching for a leak it can't see. I've seen my Firefox, after closing some windows, pulse a gig of RAM over 20 seconds until the thread dies... and only then does the memory really get freed.
The solution is to be careless. Kill everything you don't need. Don't worry about freeing the right object. Instead, just kill everything you don't need and have a plan to get back what's missing if you discover later that you really needed something that got freed.
 
Isn't the real discussion something like smart pointers vs plain C pointers instead of pointers vs no pointers?
Surely Rust has some type of pointer or reference.
It has both. Rust does have pointers. If you want a smart pointer, you are looking for a Box. Usually with pointers you just pass a & with a lifetime, and the lifetime can be written '_ to let the compiler infer it, so the borrow ends when the returned value goes out of scope.

If you are thinking of raw pointers, then you usually have to use unsafe. You can create const raw pointers in safe code, but you can only read through them with unsafe. You also need unsafe for mutating statics. Unsafe is a horrible word choice. It really should have been called unchecked, or unknown, or do_whatever_you_want_the_compiler_has_not_got_your_back.
 
Thanks! I'm glad you like the idea. It would apply to how the freeing algorithm works in the GC. The bad things I've always heard about GCs are
  1. That weird problem where, if you don't explicitly disconnect from a database in Java, even when the object goes out of scope, Java can fail to forget the connection and leave phantom data around.
  2. When GCs work too hard trying to find the exact data to free, they tend to crash. They start allocating more and more memory, recursing through all the pointers looking for just the right object to free, but the real problem is an object leak. The GC keeps looking for something to free, but it's just clogging memory searching for a leak it can't see. I've seen my Firefox, after closing some windows, pulse a gig of RAM over 20 seconds until the thread dies... and only then does the memory really get freed.
The solution is to be careless. Kill everything you don't need. Don't worry about freeing the right object. Instead, just kill everything you don't need and have a plan to get back what's missing if you discover later that you really needed something that got freed.
You should do some catch-up on Java's GC algos, they've progressed significantly. I've seen a memory leak once, and it was a stupid bug in a library.
Specifically, the generational hypothesis turns out to work well. Partitioning objects into a short-lived eden space and a tenured space lets you aggressively scan the short-lived objects in a constrained region while the long-lived objects are treated differently. Objects get promoted from the eden space to the tenured space after some time.
Javascript's GC and Java's GC are not comparable at all. OCaml, Haskell, Erlang, Ruby, Python - almost every language with GC has a good GC besides Javascript.
 
Null says no tech talk in the xenforo thread, moving this here:
Also, it's not about "micro-optimizations". The big advantage of C++ over Python, for most projects, is not that C++ is close to the metal but that it's strongly and statically typed.
I agree with your point about C++ over Python, though there's a lot of other, better languages you could substitute C++ for in that case.

C++ is statically typed, but it's not a particularly strong type system. Like it doesn't implement Hindley Milner, plus it's all optional, considering C-style typecasts.
Dynamically typed languages like Python and JS have their own pitfalls, which lead to at least as many bugs.
There are whole classes of bugs which a C++ compiler can find at compile-time, but in Python/JS code will only show up in production.

Java and C# avoid both of those classes of pitfalls (by being statically typed AND using garbage collection). But they fucked it up by (among other things) forcing every object variable/field/parameter to accept the value "null", thus forcing programmers to check for that value everywhere, which of course they fail to do, thus leading to millions of unnecessary NullPointerException bugs around the globe.
(C had the same problem, because it makes you pass around pointers all the time, but C++ has fixed maybe 90% of that by relying heavily on value types and non-nullable references. Java/C# were a real step back here.)
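A minimal sketch of that 90% fix (User and the functions are invented): a reference parameter simply cannot be null, so the check disappears, while the pointer version compiles either way and relies on everyone remembering the runtime check.
C++:
struct User { int id; };

int user_id(const User& u) { return u.id; }                 // reference: cannot be null
int user_id_ptr(const User* u) { return u ? u->id : -1; }   // pointer: must be checked

int main() {
    User mel{42};
    user_id(mel);            // fine
    // user_id(nullptr);     // does not compile: a reference can't bind to null
    user_id_ptr(nullptr);    // compiles; correctness now hangs on the runtime check
    return 0;
}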

For maximum bug-safety, you'd ideally want a language that...
  • avoids manual memory management (so C is out, C++ is borderline)
  • is strongly statically typed (so Python, JS are out)
  • makes it easy and natural to pass around non-nullable object references (so C, Java, C#, Python, JS are out)
Is there, or will there ever be, such an ideal language (that doesn't fuck up in other ways)?
Is Rust it?
I don't know.
The ML family, particularly Ocaml, is pretty good.

In Ocaml, references are not nullable by default, and in fact, cannot be nullable. There's a separate type for handling situations where "this function might return a result, or maybe not", called option.

The option type takes a parameter type, which is whatever the payload might be.

It's in the standard library in Ocaml, but if you were going to define it, it's something like:

Code:
type 'payload option =
  | Some of 'payload
  | None

So for example, the standard library List.find_opt function takes a function and a list of items (let's say they're ints), and returns int option.

Here's an example use:
Code:
let is_even x =
  x mod 2 = 0

let is_twelve x =
  x = 12

let () =
  match List.find_opt is_even [3; 4; 5] with
  | None      -> Printf.printf "no even numbers found\n"
  | Some item -> Printf.printf "found even number: %d\n" item

let () =
  match List.find_opt is_twelve [3; 4; 5] with
  | None       -> Printf.printf "didn't find twelve\n"
  | Some _item -> Printf.printf "found twelve\n"

There is no way to have a null reference error because int and int option are, properly speaking, unrelated types.

The Hindley Milner type system is pretty cool here, and over time, you learn to encode correctness constraints about your program into the type system itself. So for example, there's a result type, kind of like option, except it also supports an error payload. If there's an operation that might fail, you can have it return result, which forces the client of the function to think about the possibility of the failure and how they want to handle it. You don't find yourself chasing down runtime exceptions thrown by strange library code, because the potential for the failure is encoded in the type signature of the function itself.

It's very refreshing to fuck around with some code, trying to get it to play nice with the type system, and then when it finally does compile, you know where it might fail and where it can't fail. Huge chunks of code are cleanly split into "safe" and "not safe".

I wrote a post about Ocaml a few years ago, might interest you.
 
C++ is statically typed, but it's not a particularly strong type system. Like it doesn't implement Hindley Milner, plus it's all optional, considering C-style typecasts.

It's true that the C++ type system is not as "deep" as (from what I understand) functional languages have, in the sense that you don't usually derive your custom types from other types using some complex rules or type meta-language, but rather just write a custom class and then your type is defined by what constructors/members/operators that class declares. Less theoretically pure, but quite pragmatic, and it goes a long way towards checking static type safety in practice.

It's also true that the C++ type system has some "holes" (reinterpret_cast etc.), and it overzealously performs implicit casts between numbers/bools/pointers which can lead to some pitfalls.

The explicit C-style cast syntax is the worst btw., because it's an all-in-one tool that performs (in C++ terminology) either a static_cast or a const_cast or a reinterpret_cast (or a combination) depending on the types in question, and while the compiler has no problem deciding which one it has to use in any given situation, it's quite easy for human readers of the source code to get confused about that.

Therefore, it's common for C++ teams to follow coding standards that forbid explicit C-style casts and always make you write out one of the named C++ casts. In most cases you want static_cast anyway, and if you find yourself reaching for const_cast or reinterpret_cast, it's an indication that you should think long and hard about whether you have designed your types and API sensibly.
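A small illustration of the confusion in question (Celsius/Fahrenheit are made-up types): the same cast syntax quietly does two very different things, and only the named casts make the difference visible.
C++:
struct Celsius    { double value; };
struct Fahrenheit { double value; };

void demo() {
    double d = 2.5;
    int i1 = (int)d;                      // C-style: fine here, behaves as a static_cast
    int i2 = static_cast<int>(d);         // same conversion, but the intent is spelled out

    Celsius c{21.0};
    // Fahrenheit* f = static_cast<Fahrenheit*>(&c);   // refuses to compile: unrelated types
    Fahrenheit* f = (Fahrenheit*)&c;      // C-style silently degrades to a reinterpret_cast
    (void)i1; (void)i2; (void)f;
}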

Can't do much about the aforementioned *implicit* casts that C++ inherited from C, though.

However, encoding const-correctness into the static type system like C++ does is pretty sweet, and IMO lacking in Java & friends.
(Of course, functional languages like Haskell where everything is immutable don't need this, but that in turn imposes serious restrictions on what algorithms can be easily expressed in the language.)
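A minimal sketch of what that buys you (Account is invented): constness is part of the type, so a "read-only" code path can't mutate state even by accident.
C++:
#include <vector>

struct Account {
    std::vector<int> entries;
    int balance() const {              // const is part of the member's type
        // entries.push_back(0);       // would not compile: no mutation through const
        int sum = 0;
        for (int e : entries) sum += e;
        return sum;
    }
    void deposit(int amount) { entries.push_back(amount); }
};

void audit(const Account& acc) {       // the callee promises, the compiler enforces
    acc.balance();                     // fine: balance() is const
    // acc.deposit(10);                // does not compile: acc is const here
}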

The ML family, particularly Ocaml, is pretty good.

I admit I don't know much about it.

I've looked into Haskell, with its "functional purity above all" dogma, and it seemed more like a plaything for mathematicians and participants of code golf competitions, rather than something to use in serious software projects.

Is OCaml more pragmatic?

And another question:
From your code examples, it looks like writing out type names in the source code is avoided in OCaml in favor of letting the compiler derive the types on its own.
Doesn't that get you into "What happens when the compiler is smarter than the programmer" kind of pitfalls?

Like, when the auto keyword was introduced in C++11, it was hailed as a great advancement. You can write it in certain places where you'd normally write out a type name, and the compiler will determine the correct type on its own!
But in practice, I've become wary of this keyword, because I don't just want the type to be clear to the compiler, I also want it to be clear to reader of the code (including myself, when I return to that piece of code later).
There are nice uses for it, like when the written-out typename would be too long and unwieldy and you don't really care about it because it's just the type of some intermediate object (like an iterator) where it's clear from context what it is and how it's used, e.g.:

Code:
std::multimap<SomeType, std::unique_ptr<SomeOtherType>>::iterator it = table.begin();
auto it = table.begin();  // same thing

But I definitely wouldn't use auto in C++ everywhere where it's allowed.

Does it feel different to you in OCaml?
i.e. that even in larger, real-world projects, it remains sufficiently clear to humans what the types of your functions/variables are?
 
Is OCaml more pragmatic?
It depends on what kind of software you want to write. I, for one, dislike OCaml after having to write a simple GUI app with SDL and OpenGL. I felt it got in the way more than it helped and I could express "functionalness" a lot easier in C++.

Like, when the auto keyword was introduced in C++11, it was hailed as a great advancement. You can write it in certain places where you'd normally write out a type name, and the compiler will determine the correct type on its own!
But in practice, I've become wary of this keyword, because I don't just want the type to be clear to the compiler, I also want it to be clear to reader of the code (including myself, when I return to that piece of code later).
There are nice uses for it, like when the written-out typename would be too long and unwieldy and you don't really care about it because it's just the type of some intermediate object (like an iterator) where it's clear from context what it is and how it's used, e.g.:
Shortening code (and/or making it more readable) is a superficial feature of the auto keyword. What it really helps with is writing more generic code: you can do quite a lot of refactoring of the data structures or classes used while auto takes care of, for example, the dependent types. If all the types were written out by hand, you would have to scour the code for every occurrence of the old typename. With auto, that's done, well, automatically for you. As a bonus you don't get unwanted type conversions which could result in erroneous code.
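Roughly the kind of thing meant here (Registry is a made-up alias): the dependent types are deduced, so swapping the container or its mapped type later doesn't require touching every loop that iterates it.
C++:
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Change this alias to std::unordered_map, or change the mapped type, and the
// loop below keeps compiling unchanged - with no silent conversions sneaking in.
using Registry = std::map<std::string, std::vector<int>>;

std::size_t total_entries(const Registry& reg) {
    std::size_t total = 0;
    for (const auto& [name, values] : reg) {   // deduced: pair<const string, vector<int>>
        (void)name;
        total += values.size();
    }
    return total;
}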

And don't even get me started on template metaprogramming.
 
I have a question for the type theorists, how would you type a function to specify it doesn't block?
How would you type a function which takes two functions as arguments but only one of them ever is called?
Is there a type system which allows you to model this?
And don't even get me started on template metaprogramming.
You mean lisp's ugly cousin?
 
how would you type a function to specify it doesn't block?

I don't know about type theory, but in conventional languages, you'd just make the function's return type something that represents either
  • the future computation result
    (e.g. std::future<T> in C++, or something named along the lines of Promise or AsyncResult in other languages/frameworks),
  • or the asynchronous computation itself
    (e.g. Task<T> in C#).
In C++ that's a normal type that the standard library happens to define, but you could also have defined it yourself (and simply documented that this is what it means).

In C#, it's actually integrated into the language to some extent, with the async keyword allowing you to call and chain Task<T>-returning functions in a way that makes asynchronous code more concise and readable.
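A minimal C++ sketch of that convention (fetch_page and the URL are invented): the signature itself says "asynchronous result", and any blocking is confined to the explicit .get() call.
C++:
#include <future>
#include <string>

std::future<std::string> fetch_page(std::string url) {
    return std::async(std::launch::async, [url = std::move(url)] {
        // ... slow network I/O would happen here, on another thread ...
        return std::string("<html>fetched from ") + url;
    });
}

int main() {
    std::future<std::string> page = fetch_page("https://example.com");
    // ... do other work while the fetch runs ...
    std::string body = page.get();   // the only point where the caller may block
    return body.empty() ? 1 : 0;
}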
 
Hard disagree, C++ type system is quite strong and const-correctness is a thing.

Just because there is an escape (compare http://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pascal.html, section 2.6), doesn't mean the type system is optional.
Can't you cast away const correctness?
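(For the record, the escape hatch in question looks like this - and writing through it is only legal when the underlying object wasn't declared const in the first place:)
C++:
void demo() {
    const int limit = 100;
    int* p = const_cast<int*>(&limit);   // compiles: constness cast away
    // *p = 200;                         // undefined behaviour: limit really is const
    (void)p;

    int counter = 0;
    const int& view = counter;
    const_cast<int&>(view) = 1;          // fine: the underlying object is non-const
}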
I admit I don't know much about it.

I've looked into Haskell, with its "functional purity above all" dogma, and it seemed more like a plaything for mathematicians and participants of code golf competitions, rather than something to use in serious software projects.

Is OCaml more pragmatic?
Yes.

If you're familiar with promises (like Javascript's promises), Haskell's shtick is that basically every value is an implicit promise. That, plus Hindley Milner typing (Haskell extends it with typeclasses, which I don't really know anything about) and immutability.

The idea is that, with immutability, since you can't assign anything to any variables, code cannot rely on the side effects of any other code having been run. As long as the inputs to the function are computed, the order doesn't matter.

I agree that this is excessively pure for most purposes. It's good for things like writing compilers, but probably not for much general-purpose usage.

Ocaml has normal execution semantics, just with HM typing. Code is executed in the order you'd expect. Ocaml also does have side effects, in that you can mutate arrays and things like that, although your typical Ocaml program doesn't do it very often. It's mostly done for efficiency in specific situations.

Usually I'll write a program in a very functional style, and then benchmark to find places where I might want to implement some side effecting code for speed.

So for example, I was working on a REST API for a project, and logically, GET /someitems?count=5 is a very functional operation. I fetch the items from the database, and then I can map a function over that list to process them somehow and return it as JSON.

I noticed that my code was making a shitton of database requests (each object required a second database hit for some ancillary information), so I replaced the function that fetches that ancillary data with one that returns a closure that internally has a small cache. It spreads the ancillary database requests across all the main objects requested. (and it gets reset on each http request)

Most of my code consists of typical functional stuff like List.map ~f:(fun post -> public_json_of_post post) (fetch_db_posts()), except after benchmarking, I modified public_json_of_post to have a little internal cache.
And another question:
From your code examples, it looks like writing out type names in the source code is avoided in OCaml in favor of letting the compiler derive the types on its own.
Doesn't that get you into "What happens when the compiler is smarter than the programmer" kind of pitfalls?

Like, when the auto keyword was introduced in C++11, it was hailed as a great advancement. You can write it in certain places where you'd normally write out a type name, and the compiler will determine the correct type on its own!
But in practice, I've become wary of this keyword, because I don't just want the type to be clear to the compiler, I also want it to be clear to reader of the code (including myself, when I return to that piece of code later).
So there should be some clarification about how the type inferencing works in Hindley Milner type systems.

To start with, the built-in functions in the standard library all come already typed.

So for example, when I type out in the ocaml repl:
Code:
# string_of_int;;
- : int -> string = <fun>

# int_of_string;;
- : string -> int = <fun>
Two functions, string_of_int takes an integer and returns it formatted as a string and int_of_string takes a string and tries to parse an integer from it.

The way HM typing works is that it traces all the calls in your program, and all the variables and where they come from, back until you end up at the return value or argument of one of the already-typed functions. And it calculates types from there.

The type system isn't guessing in a humanistic way what the types of the variables should be; rather, it's determining factually what they must be. If the type checker fails, that means at some point you're trying to pass a string to a function expecting an int, i.e. your program doesn't make sense.
There are nice uses for it, like when the written-out typename would be too long and unwieldy and you don't really care about it because it's just the type of some intermediate object (like an iterator) where it's clear from context what it is and how it's used, e.g.:

Code:
std::multimap<SomeType, std::unique_ptr<SomeOtherType>>::iterator it = table.begin();
auto it = table.begin(); // same thing
But I definitely wouldn't use auto in C++ everywhere where it's allowed.

Does it feel different to you in OCaml?
i.e. that even in larger, real-world projects, it remains sufficiently clear to humans what the types of your functions/variables are?
I generally do my Ocaml development using a repl. I have a text editor open, and then a window with the repl.

I can type out the name of a function (either built-in or one I wrote) and it tells me the types of the inputs and outputs.

I don't often specifically type my variables. Usually I just rely on a consistent naming convention for my variables. Although sometimes if I'm really dealing with a mess of complicated types, I might type some of my variables to get everything straight in my head.

And yeah, shortening up long, compiler generated types is one reason I might type something. But you can also do more interesting things with the type system.

So here's an implementation of a bitstring, based on the built-in byte arrays:
Code:
module Bitstring = struct

  type t = bytes (* bytes are the built-in byte array type *)

  let of_bytes bytes =
    bytes

  let length bs =
    8 * Bytes.length bs

  let get bs idx =
    failwith "debug finish"

  let set bs idx value =
    failwith "debug finish"

end
This is a common Ocaml paradigm. You'll have a module called Noun and it has one single type called t and a bunch of functions in Noun that manipulate Noun.t's.

On this first try, it typed this module as:
Code:
module Bitstring :
  sig
    type t = bytes
    val of_bytes : 'a -> 'a
    val length : t -> int
    val get : 'a -> 'b -> 'c
    val set : 'a -> 'b -> 'c -> 'd
  end
All the 'a, 'b, etc. are placeholder types because the typechecker can't assign a concrete type to them yet. So for example, you can call of_bytes with an int and since it just returns its argument, it returns an int. But if you call it with a string next, it'll return a string.

Second try, I typed everything out:
Code:
module Bitstring = struct

  type t = bytes (* bytes are the built-in byte array type *)

  let of_bytes (bytes : bytes) : t =
    bytes

  let length (bs : t) =
    8 * Bytes.length bs

  let get (bs : t) (idx : int) : bool =
    failwith "debug finish"

  let set (bs : t) (idx : int) (value : bool) : unit =
    failwith "debug finish"

end
And here's what that typing looks like now:
Code:
module Bitstring :
  sig
    type t = bytes
    val of_bytes : bytes -> t
    val length : t -> int
    val get : t -> int -> bool
    val set : t -> int -> bool -> unit
  end
Now, you can do neat things with the signatures. Technically speaking, signatures aren't strictly necessary. You can distribute just the source code of the module as a .ml file and that's fine. However, you can package the signature up in a .mli file and you can modify what is or isn't public.

So I could make a bunch of internal functions for working with bitstrings and just not export them.

And more interestingly, the fact that Bitstring.t is just an alias for bytes can be hidden through the signature.

If my .mli file for this looks like:
Code:
module Bitstring :
  sig
    type t (* note the missing type alias *)
    val of_bytes : bytes -> t
    val length : t -> int
    val get : t -> int -> bool
    val set : t -> int -> bool -> unit
  end
then all this signature says is that a type known as Bitstring.t exists, and it tells us that we can create one by passing a bytes value to Bitstring.of_bytes. It tells us nothing about how Bitstring.t's are implemented internally.

That being said, the compiler does know that it's just working on plain old byte arrays in the end. So the code it generates is as fast as doing native byte array work, just through a locked down, restricted interface.

The following code works or doesn't work depending on if the type alias is published:
Code:
let my_bytes = Bytes.of_string "abc"

let my_bitstring : Bitstring.t = my_bytes (* requires the type alias to be published *)
let my_bitstring : Bitstring.t = Bitstring.of_bytes my_bytes (* works regardless *)

let my_bitstring = Bitstring.of_bytes my_bytes (* without manual type annotation *)
I have a question for the type theorists, how would you type a function to specify it doesn't block?
How would you type a function which takes two functions as arguments but only one of them ever is called?
Is there a type system which allows you to model this?
In a Turing-complete language, I don't think that's possible.

If you're asking what typecheckers assign to functions that never return, in Ocaml, it's an indeterminate type. So failwith is typed string -> 'a because it never returns.
 
I've been programming professionally long enough to know that anybody who says "real programmers do X" is a blowhard without any true idea of the scope of what they're talking about. "Real programmers" do things to the best of their ability as efficiently and safely as they can and get a working product into somebody else's hands. There's a disproportionately large number of people that refuse to adapt and speed up their work output because "real programmers don't use tools".
 
I've been programming professionally long enough to know that anybody who says "real programmers do X" is a blowhard without any true idea of the scope of what they're talking about. "Real programmers" do things to the best of their ability as efficiently and safely as they can and get a working product into somebody else's hands. There's a disproportionately large number of people that refuse to adapt and speed up their work output because "real programmers don't use tools".

Totes hard agree. Let me tell you about real programmers. This is the story of Mel, a real programmer:

A recent article devoted to the macho side of programming
made the bald and unvarnished statement:

Real Programmers write in FORTRAN.

Maybe they do now,
in this decadent era of
Lite beer, hand calculators, and "user-friendly" software
but back in the Good Old Days,
when the term "software" sounded funny
and Real Computers were made out of drums and vacuum tubes,
Real Programmers wrote in machine code.
Not FORTRAN. Not RATFOR. Not, even, assembly language.
Machine Code.
Raw, unadorned, inscrutable hexadecimal numbers.
Directly.

Lest a whole new generation of programmers
grow up in ignorance of this glorious past,
I feel duty-bound to describe,
as best I can through the generation gap,
how a Real Programmer wrote code.
I'll call him Mel,
because that was his name.

I first met Mel when I went to work for Royal McBee Computer Corp.,
a now-defunct subsidiary of the typewriter company.
The firm manufactured the LGP-30,
a small, cheap (by the standards of the day)
drum-memory computer,
and had just started to manufacture
the RPC-4000, a much-improved,
bigger, better, faster -- drum-memory computer.
Cores cost too much,
and weren't here to stay, anyway.
(That's why you haven't heard of the company,
or the computer.)

I had been hired to write a FORTRAN compiler
for this new marvel and Mel was my guide to its wonders.
Mel didn't approve of compilers.

"If a program can't rewrite its own code",
he asked, "what good is it?"

Mel had written,
in hexadecimal,
the most popular computer program the company owned.
It ran on the LGP-30
and played blackjack with potential customers
at computer shows.
Its effect was always dramatic.
The LGP-30 booth was packed at every show,
and the IBM salesmen stood around
talking to each other.
Whether or not this actually sold computers
was a question we never discussed.

Mel's job was to re-write
the blackjack program for the RPC-4000.
(Port? What does that mean?)
The new computer had a one-plus-one
addressing scheme,
in which each machine instruction,
in addition to the operation code
and the address of the needed operand,
had a second address that indicated where, on the revolving drum,
the next instruction was located.

In modern parlance,
every single instruction was followed by a GO TO!
Put that in Pascal's pipe and smoke it.

Mel loved the RPC-4000
because he could optimize his code:
that is, locate instructions on the drum
so that just as one finished its job,
the next would be just arriving at the "read head"
and available for immediate execution.
There was a program to do that job,
an "optimizing assembler",
but Mel refused to use it.

"You never know where it's going to put things",
he explained, "so you'd have to use separate constants".

It was a long time before I understood that remark.
Since Mel knew the numerical value
of every operation code,
and assigned his own drum addresses,
every instruction he wrote could also be considered
a numerical constant.
He could pick up an earlier "add" instruction, say,
and multiply by it,
if it had the right numeric value.
His code was not easy for someone else to modify.

I compared Mel's hand-optimized programs
with the same code massaged by the optimizing assembler program,
and Mel's always ran faster.
That was because the "top-down" method of program design
hadn't been invented yet,
and Mel wouldn't have used it anyway.
He wrote the innermost parts of his program loops first,
so they would get first choice
of the optimum address locations on the drum.
The optimizing assembler wasn't smart enough to do it that way.

Mel never wrote time-delay loops, either,
even when the balky Flexowriter
required a delay between output characters to work right.
He just located instructions on the drum
so each successive one was just past the read head
when it was needed;
the drum had to execute another complete revolution
to find the next instruction.
He coined an unforgettable term for this procedure.
Although "optimum" is an absolute term,
like "unique", it became common verbal practice
to make it relative:
"not quite optimum" or "less optimum"
or "not very optimum".
Mel called the maximum time-delay locations
the "most pessimum".

After he finished the blackjack program
and got it to run
("Even the initializer is optimized",
he said proudly),
he got a Change Request from the sales department.
The program used an elegant (optimized)
random number generator
to shuffle the "cards" and deal from the "deck",
and some of the salesmen felt it was too fair,
since sometimes the customers lost.
They wanted Mel to modify the program
so, at the setting of a sense switch on the console,
they could change the odds and let the customer win.

Mel balked.
He felt this was patently dishonest,
which it was,
and that it impinged on his personal integrity as a programmer,
which it did,
so he refused to do it.
The Head Salesman talked to Mel,
as did the Big Boss and, at the boss's urging,
a few Fellow Programmers.
Mel finally gave in and wrote the code,
but he got the test backwards,
and, when the sense switch was turned on,
the program would cheat, winning every time.
Mel was delighted with this,
claiming his subconscious was uncontrollably ethical,
and adamantly refused to fix it.

After Mel had left the company for greener pa$ture$,
the Big Boss asked me to look at the code
and see if I could find the test and reverse it.
Somewhat reluctantly, I agreed to look.
Tracking Mel's code was a real adventure.

I have often felt that programming is an art form,
whose real value can only be appreciated
by another versed in the same arcane art;
there are lovely gems and brilliant coups
hidden from human view and admiration, sometimes forever,
by the very nature of the process.
You can learn a lot about an individual
just by reading through his code,
even in hexadecimal.
Mel was, I think, an unsung genius.

Perhaps my greatest shock came
when I found an innocent loop that had no test in it.
No test. None.
Common sense said it had to be a closed loop,
where the program would circle, forever, endlessly.
Program control passed right through it, however,
and safely out the other side.
It took me two weeks to figure it out.

The RPC-4000 computer had a really modern facility
called an index register.
It allowed the programmer to write a program loop
that used an indexed instruction inside;
each time through,
the number in the index register
was added to the address of that instruction,
so it would refer
to the next datum in a series.
He had only to increment the index register
each time through.
Mel never used it.

Instead, he would pull the instruction into a machine register,
add one to its address,
and store it back.
He would then execute the modified instruction
right from the register.
The loop was written so this additional execution time
was taken into account ---
just as this instruction finished,
the next one was right under the drum's read head,
ready to go.
But the loop had no test in it.

The vital clue came when I noticed
the index register bit,
the bit that lay between the address
and the operation code in the instruction word,
was turned on ---
yet Mel never used the index register,
leaving it zero all the time.
When the light went on it nearly blinded me.

He had located the data he was working on
near the top of memory ---
the largest locations the instructions could address ---
so, after the last datum was handled,
incrementing the instruction address
would make it overflow.
The carry would add one to the
operation code, changing it to the next one in the instruction set:
a jump instruction.
Sure enough, the next program instruction was
in address location zero,
and the program went happily on its way.

I haven't kept in touch with Mel,
so I don't know if he ever gave in to the flood of
change that has washed over programming techniques
since those long-gone days.
I like to think he didn't.
In any event,
I was impressed enough that I quit looking for the
offending test,
telling the Big Boss I couldn't find it.
He didn't seem surprised.

When I left the company,
the blackjack program would still cheat
if you turned on the right sense switch,
and I think that's how it should be.
I didn't feel comfortable
hacking up the code of a Real Programmer.
This is one of hackerdom's great heroic epics, free verse or no. In a few spare images it captures more about the esthetics and psychology of hacking than all the scholarly volumes on the subject put together. For an opposing point of view, see the New Hacker's Dictionary entry for "Real Programmer".

[1992 postscript -- the author writes: "The original submission to the net was not in free verse, nor any approximation to it -- it was straight prose style, in non-justified paragraphs. In bouncing around the net it apparently got modified into the `free verse' form now popular. In other words, it got hacked on the net. That seems appropriate, somehow." The author adds that he likes the `free-verse' version better...]


Set the controls for "Optimum" - We're going all the fucking way!
 
@Marvin since your post is too long
In a Turing-complete language, I don't think that's possible.
Pedantically speaking, this is not necessarily the case. A Turing-complete type system could express "the number of steps a Turing machine takes to execute this function is bounded by (formula)". Granted, it's a useless type system since you can't write a type checker for it, but it's still expressible. People have written Turing-complete type systems (for fun, I presume).

I have a question for the type theorists, how would you type a function to specify it doesn't block?
How would you type a function which takes two functions as arguments but only one of them ever is called?
Is there a type system which allows you to model this?
You can kinda do both in Coq and other languages that have dependent type systems. Coq is a programming language where proofs about the behavior of functions are actual objects in the language. The type of a proof is a statement ("this function returns a number greater than 0") and the value of that type is the actual proof that shows a certain function obeys that property.

For the first question, I can't think of a type system that embeds the execution time of functions (since blocking functions execute for an arbitrary amount of time). The types of functions are written with the assumption that they terminate. For programming languages like OCaml, a nonterminating function just has an arbitrary return type (it cannot return, so it doesn't matter). In languages like Coq all functions must halt, so blocking is not a problem (you can't directly do IO in Coq, but you can compile Coq to OCaml, which can do IO). The closest thing I can think of is that you can write functions that "time out" after a certain number of steps (a "step-indexed" function), and then write a proof of type "for any list of arguments A to the function F, there exists a value N such that F(A) doesn't time out within N steps" and pass that around as a function that doesn't block.

For the second question, from most type-theoretic standpoints functions only spit out values, so there is no point in specifying whether a function has been called or not. I assume you ask this because knowing whether a function has been called yet matters once you consider functions that can have side effects. You can fudge side effects back in if each function takes an argument that contains the entire environment of execution and returns a (possibly modified) environment. I can't think (at least off the top of my head) of a very general way to write this, since you have a million corner cases (how many times is the first function executed? What if the first function is equal to the second function? etc.).

One example is
Python:
l = []

def append(f, g):
    x = f()        # f is called
    l.append(x)    # g is passed in but never called
where the proof would be of type "for any functions f(), g, append(f,g) takes the environment E, applies all the modifications f makes to E to get E2, and then returns E2 with the only further modification being that the list l has grown by one element."

All of this might sound cool but programming with dependent types is like trying to teach a 2 year old calculus in Latin. You'll lose your sanity and productivity.
 
The newer versions of C# (8.0+) now have a project setting which disallows null in reference types unless you explicitly declare them as nullable. And the code analyzer throws up warnings when you have possible null that could get assigned to a non-nullable type. It's a great feature, doesn't take much to get used to, and it helps prevent all those null reference exceptions. Here's a blog post about it.

Nice, that's how it should have been from the start.
(Should throw errors instead of warnings, actually - but I assume that's easily achieved with a compiler flag.)

And for those cases where you do want a nullable type, the C# notation int? is also much nicer than C++'s std::optional<int>.
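For comparison, the C++ spelling of the same idea (find_port and the config string are invented):
C++:
#include <cstdio>
#include <optional>
#include <string>

// Roughly what C#'s "int?" expresses, in C++:
std::optional<int> find_port(const std::string& config_line) {
    auto pos = config_line.find("port=");
    if (pos == std::string::npos) return std::nullopt;   // "no value" is explicit
    return std::stoi(config_line.substr(pos + 5));
}

int main() {
    if (auto port = find_port("host=example;port=8080")) {
        std::printf("port: %d\n", *port);   // *port only touched after the check
    }
    return 0;
}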

What's this about them adding "annotations" to the .NET framework for this, though?
Shouldn't they just enable this feature in the .NET libraries themselves and then change String to String? in any place of the API that requires it?
Or would that break backwards compatibility too much?
 
Nice, that's how it should have been from the start.
(Should throw errors instead of warnings, actually - but I assume that's easily achieved with a compiler flag.)

And for those cases where you do want a nullable type, the C# notation int? is also much nicer than C++'s std::optional<int>.

What's this about them adding "annotations" to the .NET framework for this, though?
Shouldn't they just enable this feature in the .NET libraries themselves and then change String to String? in any place of the API that requires it?
Or would that break backwards compatibility too much?
Not completely sure of all the details yet, we're just starting to upgrade to the newer versions so I'm still getting my feet wet. From what I've seen so far though, it does look like they updated all the .NET libraries, at least everything I've been working with in VS 2022. That blog was from a few years ago, so they've had plenty of time if any framework classes were missing this feature in the beginning.
 
Sure you can, but I gather you know that already, so why do you ask?
I figured you could, but I wanted to confirm. I wasn't being snarky.
Also: have you read what I linked?
It's been a while since I've read this article, but skimming over it, I remember a lot of the complaints, and they're legitimate - mostly in the context of the era and the type of projects people were writing back then.

Specifically about 2.6: I'm not entirely sure what specifically he means by I/O systems. I guess he wanted to be able to easily read/write structures by just doing a memcpy and fwrite/fread.

With storage allocators, I'm guessing he'd want to be able to implement a generalized, custom malloc type thing (perhaps using memory mapping or something) that can service arbitrary structures.

In my experience, I think memory and type safety brings substantial benefits to certain types of projects, but you get the most benefit when they're not optional. It seems harsh, but that's been my experience.

So for example, at jobs in the past, I've worked with people using Python (which I hate for any non-trivial projects, but, y'know, money) and they tried to use Python's opt-in type checking. Python's data model is mushy and you won't ever get as strong a type system as other languages are capable of, but beyond just that, many projects in the real world simply don't have the discipline/budget to stick to recommended practices if they're not actually enforced. And I'm not talking linters or anything like that, I'm talking the language itself.

Everyone knew the benefits of adding type annotations, but in practice, they often left them out, simply because of the pressures of managers telling us we didn't have enough time to do optional stuff like documentation and adding type annotations. (This job was particularly shitty, but all jobs are subject to budget/managers to some degree.) (Also, no one maintains comments, this is why focusing on readable code first is important.)

I find that with strong, static typing, for most tasks and with some practice, programmers can build up a good stride and write code at more or less the same pace as they would in Python. Except unlike Python, when they're done the task, they find themselves with a much more reliable codebase.

Languages and things like type checkers are ultimately just tools, and not all tools are useful for everything. So there's certainly a time and a place to break the type system. Like in the case of Ocaml, for some purposes you might want to write some code in C and link it in as an external library.

I find an Ocaml + C project to produce a far better result than a single language with a weaker type system that lets you do it all in one place.
It depends on what kind of software you want to write. I, for one, dislike OCaml after having to write a simple GUI app with SDL and OpenGL. I felt it got in the way more than it helped and I could express "functionalness" a lot easier in C++.
Ocaml is useful for projects where the concepts you're modeling in code don't change often. I wouldn't write a game in Ocaml. (In fact, I wouldn't write a game entirely in any statically typed language. For game dev or UI development, anything creative, I'd want a language where I can fuck around with the code on the fly, constantly. Dynamic experimentation is vital for something like that.)

So for example, one project I implemented in Ocaml was a tool for monitoring exercise data. I was consuming Google Fit's API. I would periodically scrape calorie data and store it in a database. Then I served up an endpoint that would do various calculations on that data. (In particular, I wanted to see a running total for the past X days, so I can build up a calorie deficit before a holiday or something.)

The value of a strongly, statically typed language for something like that is that you can build an extremely reliable program, and then as your problem (slowly) evolves, extend your program and still have extremely high trust that it's still reliable.

With a strong type system, as you extend it to add new features (say I had to handle a new source of calorie data), every part of the code that touches my original Calorie.t datatype will shit the bed when I extend it to support the new source. It cannot compile until I update all the relevant code.

This is, of course, how C++ or any other statically typed language works too. The main difference is that HM typing has a richer set of type related primitives, so you can extend the type system into more of the program. It covers a lot more than just public/private and constness.

This means that changing things is fairly slow and very deliberate. This works fantastically for some projects and it's impossibly useless for other projects.
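For the C++ folks reading along, the nearest analogue of that "extend the datatype and let the compiler point at every affected site" workflow is probably a std::variant visited without a catch-all (the calorie types below are invented): add a third alternative and this stops compiling until a handler exists.
C++:
#include <iostream>
#include <string>
#include <variant>

struct GoogleFit   { double kcal; };
struct ManualEntry { double kcal; std::string note; };
using CalorieSource = std::variant<GoogleFit, ManualEntry>;   // add a source here...

// Standard trick for building an overload set out of lambdas.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

double kcal_of(const CalorieSource& s) {
    // ...and this visit fails to compile until a handler for it is added,
    // because there is deliberately no generic fallback lambda.
    return std::visit(overloaded{
        [](const GoogleFit& g)   { return g.kcal; },
        [](const ManualEntry& m) { return m.kcal; },
    }, s);
}

int main() {
    CalorieSource s = GoogleFit{250.0};
    std::cout << kcal_of(s) << "\n";
}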
 