Programming thread

Thanks for the replies that other time.


I'm a programming noob, and this is the first time I've written a recursive function on my own. I want to know if any of you have tricks for coming up with them, and also whether this code is badly written.

So basically it generates all possible words from a pool of characters, but without repeating any of them, so for "XYZ" it should give:
XYZ, XZY, YXZ, YZX, ZXY, ZYX
(so what I mean is that the number of combinations is the factorial of the length, not pool ** length)

It uses the recursive function to fill a matrix with indices, then transposes the matrix to get the groups of indices that make up each word:
Python:
def all_strings(chars):
  chars, n_chars = list(chars), len(chars)
  numeric_string, cycles = [], [1]

  # cycles[i] holds i!, the number of permutations of i characters;
  # numeric_string is just the index list [0, 1, ..., n_chars - 1]
  combinations = 1
  for i in range(1, n_chars + 1):
    combinations *= i
    cycles.append(combinations)
    numeric_string.append(i - 1)
  print(f"-- The number of combinations is {combinations} --\n")

  # one row per word position; row times - 1 is filled with the index
  # chosen for that position, repeated once per sub-permutation
  matrix = []
  for i in range(n_chars):
    matrix.append(list())

  def rec(leftover_pool, times):
    for x in range(times):
      if times == 1:
        matrix[times - 1].append(leftover_pool[0])
      else:
        hold = leftover_pool.pop(x)
        for i in range(cycles[times - 1]):
          matrix[times - 1].append(hold)
        rec(leftover_pool, times - 1)
        leftover_pool.insert(x, hold)  # restore the pool for the next choice

  rec(numeric_string, n_chars)

  # each column of the matrix is now one permutation's worth of indices
  matrix = list(zip(*matrix))
  matrix.sort()

  for batch in matrix:
    word = ""
    for index in batch:
      word += chars[index]
    print(word)

pool = "12345"
all_strings(pool)

With basic Python, how would you do this? I'd be interested to know, without using existing libraries (so something you write yourself). Also, is this a good use of recursion? It took me two afternoons to come up with one that works.

As always, thanks to whoever decides to answer.
 
Good for a beginner, but your code is a bit more complex than it needs to be. Some observations:
- you don't really need the parameter times in rec, since it is always equal to the length of leftover_pool. Dropping it also makes it easier to see what you're trying to do with the matrix.
- instead of this approach, where you work out which character sits at a particular position in every single permutation, use recursion to build up the permutations themselves, based on the principle that permutation(string) = character + each permutation of the rest of the string, for each character in the string, with the base case permutation(single letter) = that letter.
- though it will increase memory usage, you can use slices to remove a character from leftover_pool rather than imperatively popping a particular index and then adding it back, like so: rec(leftover_pool[0:x] + leftover_pool[x + 1:]). Though this creates a new list for each call to rec, it's a lot easier to read.
- once you've got a better grip on basic Python, try rewriting this using list comprehensions, which would make it a whole lot shorter and, in this case, honestly simpler. People tend to disparage list comprehensions because they're an easy way to overcomplicate things, but they're a natural fit for a function like this.
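The second point above can be sketched in a few lines (a sketch of the idea, not the OP's exact code; the function name permutations is mine):
Python:
def permutations(chars):
    # Base case: a single character (or empty string) has one permutation.
    if len(chars) <= 1:
        return [chars]
    result = []
    for i, c in enumerate(chars):
        # Slicing removes chars[i] without mutating anything.
        for rest in permutations(chars[:i] + chars[i + 1:]):
            result.append(c + rest)
    return result

print(permutations("XYZ"))  # ['XYZ', 'XZY', 'YXZ', 'YZX', 'ZXY', 'ZYX']
The loop body also collapses naturally into the list comprehension mentioned above: [c + rest for i, c in enumerate(chars) for rest in permutations(chars[:i] + chars[i + 1:])].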
 
Today I was trying to make my C++ application fault tolerant so that it could be used as a library without trashing the whole process when malformed input is given. I was banging my head against the wall until I realized you can just handle failure signals like a segmentation fault and jump to anywhere you want in order to return to the caller. So while I still have to handle a lot of malformed input manually for UX, I can bail if anything truly unexpected happens. Just another reason why I love C.
 
With basic stuff in Python, how would you do this? That would be interesting to know, without using already existing libraries (so something that you make on your own). Also, is this a good use of a recursive? It took me two afternoons to think of a good one that would work.

As always, thanks to whoever decides to answer.
I'll take a look later, but

1. no, not really a good use in practice, but any use is a good use when you're completely new -- you can write your own (possibly terrible) implementation and compare the result to the standard implementation

2. better uses are stuff like tree traversals and some sorting-adjacent problems, I'll try to poast something in the new year

3. don't write functions inside functions while you're learning algorithms -- it's very easy to make a typo, accidentally use a variable from the outer scope, and then spend ages hunting for it. If you want a neat helper function, move it to module level and MAYBE name it something starting with an underscore.

4. put your runner code in
Python:
if __name__ == "__main__":
    pool = "12345"
    all_strings(pool)
or better yet
Python:
import sys

if __name__ == "__main__":
    pool = sys.stdin.read().rstrip()  # rstrip is here to remove the line ending that stdin.read gets you
    all_strings(pool)
and then you can
Bash:
python permutation.py < input_data.txt  # for keeping test cases in files
python permutation.py <<< 12345  # consoom a string
The reason is that when you import a file (a "module"), all of its top-level code runs. What if you eventually want to use your permutation generator (or whatever) in another task? You want to permute some other string, so you write
Python:
from combinatorics import all_strings
and it prints all permutations of "12345" before it does anything else.
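A minimal sketch of the difference (the module name combinatorics is just the hypothetical example from above, and the body is stubbed out):
Python:
# combinatorics.py -- hypothetical module holding the permutation generator
print("this runs on every import of combinatorics!")  # top-level code

def all_strings(chars):
    ...  # permutation generator from earlier in the thread

if __name__ == "__main__":
    # Runs only via `python combinatorics.py`, never on `import combinatorics`.
    all_strings("12345")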

Disclaimer, I am a retard, but I parlayed it into a jerb and later a business.
 
Today I was trying to make my C++ application fault tolerant so that it could be used as a library and not trash the whole process if malformed input is given. I was banging my head against the wall until I realized you can just handle failure signals like a segmentation fault and jump to anywhere you want to return to the caller. So while I have to handle a lot of malformed input manually still for ux, I can bail if anything truly unexpected does happen. Just another reason why I love C.
Just because you can doesn't mean you should.
Dereferencing an invalid pointer is undefined behaviour, with all that entails. You are 100% setting yourself up for nasal demons by doing this.
 
I think powersets are a good exercise for learning recursion. For the fun of it, I wrote an implementation in Scheme as simply and compactly as I could:
Code:
(define (powerset s)
  (foldr (λ (x r)
           (foldr (λ (y z) (cons (cons x y) (cons y z))) '() r))
         '(()) s))

Output:
Code:
(powerset '(1 4 2 7))
'((1 4 2 7)
  (4 2 7)
  (1 2 7)
  (2 7)
  (1 4 7)
  (4 7)
  (1 7)
  (7)
  (1 4 2)
  (4 2)
  (1 2)
  (2)
  (1 4)
  (4)
  (1)
  ())

You should only need 2 simple recursive functions at most.
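For comparison, the same include-or-exclude idea as a single recursive function in Python (a sketch, not a line-by-line translation of the Scheme above):
Python:
def powerset(items):
    # Base case: the only subset of the empty list is the empty list.
    if not items:
        return [[]]
    # Recursive case: each subset of the rest either gains the first item or not.
    rest = powerset(items[1:])
    return [[items[0]] + subset for subset in rest] + rest

print(powerset([1, 4, 2, 7]))  # 16 subsets, from [1, 4, 2, 7] down to []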
 
Parsing is hard. I know it's fun to do, and I'm not knocking the utility of implementations like we're discussing, but as this conversation makes plain: Parsing is hard. Good parsers tend to be front-line products in and of themselves. (Consider FFMPEG, Firefox/Chrome, compilers, etc.)

If thou, as a mere mortal, must implement parsing, use a library, or if you're doing something genuinely novel and must have a DSL, use YACC/bison/whatthefuckever. Some declarative/functional languages have good parsing abilities too, but those aren't mere mortal languages.

The problem with parsing lies in the edge cases. Formats as big as JSON and XML are riddled with footguns. Use a library to mitigate those (and offload blame!) and you will be much happier and your productivity will thank you for it. (You'll find fun shit this way too, like how MS Office's XML formats (used in Sharepoint) were at odds with MS VS's XML formats and require special conversion to handle the quirks of both.)

If I were doing JSON-XML conversion? I'd be bouncing it off of a Postgres instance. Postgres has great import/export of both, and a proper data-driven representation of your data is going to aid conversion.
Parsing is a fundamental programming skill that every programmer ought to know how to do confidently; I'd say it is just as important as having a decent intuition about data structures and algorithms. It is incredible that it isn't a mandatory topic at some universities. Parsing should be among the first things you learn, because it is that important, and you are severely disadvantaged in your abilities if you don't have some parsing techniques up your sleeve ready to go.

I've been travelling since the quoted post, so I decided to write a JSON parser for fun in troonlang while waiting on/in/for various modes of transport, and in between bursts of sleep.
C-like:
use std::env;
use std::io::ErrorKind;
use std::path::Path;
use std::process;
use std::fs;

use /*json_parser::*/parse::Parser;

fn main() {
    let args = env::args().collect::<Vec<_>>();
    if args.is_empty() {
        eprintln!("Invoke this properly, nigger.");
        process::exit(1);
    }
    if args.len() < 2 {
        eprintln!("Usage: {} <file.json>", args[0]);
        process::exit(1);
    }

    let path = Path::new(&args[1]);
    let file_contents = match fs::read_to_string(path) {
        Ok(contents) => contents,
        Err(e) => {
            match e.kind() {
                ErrorKind::PermissionDenied => eprintln!("Error: permission denied"),
                ErrorKind::NotFound => eprintln!("Error: `{}` not found.", &args[1]),
                ErrorKind::InvalidData => eprintln!("Error: Data is not UTF-8."),
                _ => eprintln!("Error: nigger"),
            }
            process::exit(1);
        },
    };

    let file_contents = file_contents.trim_start_matches("\u{FEFF}");

    let parser = Parser::new(file_contents, &args[1]);
    for value in parser {
        println!("{value:#?}");
    }
}

pub mod parse {
    use crate::str_ext::*;
    use crate::token::{Token, TokenKind, Tokenizer};
    use crate::error::Errors;
    use crate::unescape::unescape;
    use TokenKind::*;

    use std::cell::{RefCell, RefMut};
    use std::collections::HashMap;
    use std::process;

    const MAX_DEPTH: usize = 512;

    pub struct Parser<'s> {
        tokenizer: RefCell<Tokenizer<'s>>,
        contents: &'s str,
        errors: RefCell<Errors<'s>>,
    }

    #[allow(dead_code)]
    #[derive(Debug, Clone)]
    enum OneOf {
        Value(Value),
        Token(Token),
    }

    impl Iterator for Parser<'_> {
        type Item = Value;
        fn next(&mut self) -> Option<Self::Item> {
            let value = self.next_value(0);
            if self.errors().had_error() {
                return None;
            }
            value
        }
    }

    impl<'s> Parser<'s> {
        pub fn new(contents: &'s str, file_name: &'s str) -> Self {
            let tokenizer = Tokenizer::new(file_name, contents);
            Self {
                tokenizer: tokenizer.into(),
                contents,
                errors: Errors::new(file_name, contents).into()
            }
        }

        pub fn errors(&self) -> RefMut<Errors<'s>> {
            self.errors.borrow_mut()
        }

        fn error(&self, msg: &str, start: usize, len: usize) {
            self.errors.borrow_mut().push(msg, start, len);
            self.errors.borrow_mut().report_all();
        }

        fn next_token(&self) -> Option<Token> {
            self.tokenizer.borrow_mut().next()
        }

        pub fn next_value(&self, depth: usize) -> Option<Value> {
            let first_token = self.next_token()?;
            if depth >= MAX_DEPTH {
                let (start, _) = first_token.pos();
                let (line, col) = self.contents.pos(start);
                eprintln!("Exceeds depth limit for parsing at line {line}, column {col}");
                process::exit(1);
            }
            match first_token.kind {
                Number => self.parse_number(&first_token),
                OpenSquare => self.parse_array(&first_token, depth+1),
                OpenCurly => self.parse_object(&first_token, depth+1),
                TerminatedString => self.parse_string(&first_token),
                Null => Some(Value::Null),
                True => Some(Value::True),
                False => Some(Value::False),
                _ => {
                    let (start, len) = first_token.pos();
                    let token_str = match self.contents.substr(start, len) {
                        Some(s) => s,
                        None => {
                            self.error("This is le bug", start, len);
                            return None;
                        },
                    };
                    self.error(&format!("unexpected token: `{token_str}`"), start, len);
                    None
                },
            }
        }

        fn parse_string(&self, token: &Token) -> Option<Value> {
            let (start, len) = token.pos();
            let literal = self.contents.substr(start, len)?;

            if literal.len() < 2 || !literal.starts_with("\"") || !literal.ends_with("\"") {
                return None;
            };

            let literal = if literal.len() == 2 {
                String::new()
            } else {
                let key = &literal[1..(len-1)];
                match unescape(key) {
                    Ok(k) => k,
                    Err((range, err)) => {
                        let start = range.start + start + 1;
                        let len = range.end-range.start;
                        let msg = err.to_string();
                        self.error(&msg, start, len);
                        return None;
                    },
                }
            };

            Some(Value::String(literal))
        }

        fn parse_number(&self, token: &Token) -> Option<Value> {
            let (start, len) = token.pos();
            let n_str = match self.contents.substr(start, len) {
                Some(n_str) => n_str,
                None => {
                    self.error("token is beyond the given file, or has no length.", start, len);
                    return None;
                },
            };
            let n_res = n_str.parse::<f64>();
            let n = match n_res {
                Ok(n) => n,
                Err(_) => {
                    self.error("could not parse number", start, len);
                    return None;
                },
            };

            if n.is_nan() {
                return Some(Value::Null);
            }

            if n.is_infinite() {
                if n.is_sign_negative() {
                    return Some(Value::Number(f64::MIN));
                } else {
                    return Some(Value::Number(f64::MAX));
                }
            }

            Some(Value::Number(n))
        }

        fn parse_array(&self, token: &Token, depth: usize) -> Option<Value> {
            if depth >= MAX_DEPTH {
                let (start, _) = token.pos();
                let (line, col) = self.contents.pos(start);
                eprintln!("Exceeds depth limit for parsing at line {line}, column {col}");
                process::exit(1);
            }
            let (mut start, mut len) = token.pos();

            let mut array = vec![];

            loop {
                let value = self.expect_value_or_token(&[CloseSquare], start, len, depth)?;
                match value {
                    OneOf::Value(v) => array.push(v),
                    OneOf::Token(_) => break,
                }

                let tok = self.expect_token(&[CloseSquare, Comma], start, len)?;
                (start, len) = tok.pos();
                match tok.kind {
                    TokenKind::CloseSquare => break,
                    TokenKind::Comma => continue,
                    _ => {
                        return None;
                    },
                }
            }

            Some(Value::Array(array))
        }

        fn parse_object(&self, first_token: &Token, depth: usize) -> Option<Value> {
            if depth >= MAX_DEPTH {
                let (start, _) = first_token.pos();
                let (line, col) = self.contents.pos(start);
                eprintln!("Exceeds depth limit for parsing at line {line}, column {col}");
                process::exit(1);
            }
            use TokenKind::*;
            let (start, len) = first_token.pos();

            let mut hashmap = HashMap::<String, Value>::new();
            loop {
                let token = self.expect_token(&[CloseCurly, TerminatedString], start, len)?;
                let (start, len) = token.pos();
                match token.kind {
                    CloseCurly => return Some(Value::Object(hashmap)),
                    TerminatedString => (),
                    _ => {
                        self.error("expected one of [CloseCurly, TerminatedString]", start, len);
                        return None;
                    },
                }

                let key = match self.parse_string(&token) {
                    Some(Value::String(s)) => s,
                    _ => {
                        self.error("expected TerminatedString", start, len);
                        return None;
                    },
                };

                let token = self.expect_token(&[Colon], start, len)?;
                let (start, len) = token.pos();

                let value = self.next_value(depth+1);
                let value = match value {
                    Some(v) => v,
                    None => {
                        // no error, if this is returning None, it already errored
                        return None;
                    },
                };

                hashmap.insert(key, value);

                let token = self.expect_token(&[CloseCurly, Comma], start, len)?;
                let (start, len) = token.pos();

                match token.kind {
                    CloseCurly => break,
                    Comma => continue,
                    _ => {
                        self.error("expected one of [CloseCurly, Comma]", start, len);
                        return None;
                    },
                }
            }

            Some(Value::Object(hashmap))
        }

        fn expect_value_or_token(
            &self,
            kinds: &[TokenKind],
            start: usize,
            len: usize,
            depth: usize,
        ) -> Option<OneOf> {
            let target_kinds = [Number, OpenSquare, OpenCurly, TerminatedString, Null, True, False]
                .iter()
                .chain(kinds.iter())
                .cloned()
                .collect::<Vec<_>>();
            let next_token = self.expect_token(&target_kinds, start, len)?;
            let value = match next_token.kind {
                Number => self.parse_number(&next_token),
                OpenSquare => self.parse_array(&next_token, depth+1),
                OpenCurly => self.parse_object(&next_token, depth+1),
                TerminatedString => self.parse_string(&next_token),
                Null => Some(Value::Null),
                True => Some(Value::True),
                False => Some(Value::False),
                _ => None,
            };

            if self.errors().had_error() {
                return None;
            }

            if let Some(value) = value {
                return Some(OneOf::Value(value));
            }

            Some(OneOf::Token(next_token))
        }

        fn expect_token(&self, kinds: &[TokenKind], start: usize, len: usize) -> Option<Token> {
            if kinds.is_empty() {
                self.error("no token to expect", start, len);
                return None;
            }

            let next_token = match self.next_token() {
                Some(t) => t,
                None => {
                    if kinds.len() > 1 {
                        let err = format!("expected one of {kinds:?}, got EOF");
                        self.error(&err, start, len);
                    } else {
                        self.error(&format!("expected {} got EOF", kinds[0]), start, len);
                    }
                    return None;
                },
            };

            let (start, len) = next_token.pos();
            let mut found = false;
            for kind in kinds.iter() {
                if next_token.kind == *kind {
                    found = true;
                    break;
                }
            }
            if !found {
                if kinds.len() > 1 {
                    let err = format!("expected one of {kinds:?}, got {}", next_token.kind);
                    self.error(&err, start, len);
                } else {
                    self.error(&format!("expected {} got {}", kinds[0], next_token.kind), start, len);
                }
                return None;
            }

            Some(next_token)
        }
    }


    #[derive(Debug, Clone, PartialEq)]
    pub enum Value {
        False,
        Null,  
        True,
        Object(HashMap<String, Value>),
        Array(Vec<Value>),
        Number(f64),
        String(String),
    }
}

pub mod token {
    use std::fmt::Display;
    use std::str::Chars;

    use crate::error::Errors;
    use crate::str_ext::Substr;
    const EOF_CHAR: char = '\0';

    #[derive(Debug, Clone, PartialEq)]
    pub enum TokenKind {
        // structural
        OpenSquare,
        CloseSquare,
        OpenCurly,
        CloseCurly,
        Colon,
        Comma,

        // literals
        False,
        Null,
        True,
        TerminatedString,
        UnterminatedString,
        Number,

        Unknown,
        Eof,
    }

    pub fn keyword(c: char, string: &str) -> Option<TokenKind> {
        use TokenKind::*;
        Some(match (c, string) {
            ('n', "ull") => Null,
            ('t', "rue") => True,
            ('f', "alse") => False,
            ('I', "nfinity") => Number,
            ('I', "nf") => Number,
            ('N', "aN") => Null,
            _ => return None,
        })
    }

    impl Display for TokenKind {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            use TokenKind::*;
            match self {
                Eof => write!(f, "EOF"),
                t => write!(f, "{:?}", t),
            }
        }
    }

    #[derive(Debug, Clone)]
    pub struct Token {
        pub kind: TokenKind,
        offset: u32,
        len: u16,
    }

    impl Token {
        fn new(kind: TokenKind, offset: usize, len: usize) -> Self {
            Self { kind, offset: offset as u32, len: len as u16 }
        }

        pub(crate) fn pos(&self) -> (usize, usize) {
            (self.offset as usize, self.len as usize)
        }
    }

    pub struct Tokenizer<'s> {
        chars_consumed: usize,
        len_remaining: usize,
        chars: Chars<'s>,
        errors: Errors<'s>,
    }

    impl<'s> Tokenizer<'s> {
        pub fn new(file_name: &'s str, string: &'s str) -> Tokenizer<'s> {
            let chars = string.chars();
            Tokenizer {
                chars_consumed: 0,
                len_remaining: string.len(),
                chars,
                errors: Errors::new(file_name, string),
            }
        }
    }

    impl Iterator for Tokenizer<'_> {
        type Item = Token;
        fn next(&mut self) -> Option<Self::Item> {
            let token = self.advance_token();
            if self.errors.had_error() {
                // they are greedily reported as they happen
                return None;
            }
            match token.kind {
                TokenKind::Eof => {
                    None
                },
                _ => Some(token)
            }
        }
    }

    #[inline]
    pub fn is_whitespace(c: char) -> bool {
        matches!(
            c,
            '\u{0009}'   // \t
            | '\u{000a}' // \n
            | '\u{000b}' // \v
            | '\u{000c}' // form feed
            | '\u{000d}' // \r
            | '\u{0020}' // space
        )
    }

    impl Tokenizer<'_> {
        pub fn advance_token(&mut self) -> Token {
            self.consume_whitespace();
            self.reset_pos_within_token();

            let first_chr = match self.bump() {
                Some(c) => c,
                None => {
                    let (start, _) = self.token_pos();
                    return Token::new(TokenKind::Eof, start, 0);
                },
            };

            let token_kind = match first_chr {
                '-' | '+'
                    if self.first().is_ascii_digit()
                        || self.first() == 'I'
                        || self.first() == 'N' => {
                    let c = self.bump();
                    if let Some(c) = c {
                        self.consume_number(c)
                    } else {
                        TokenKind::Eof
                    }
                },
                c @ '.' if self.first().is_ascii_digit() => {
                    self.consume_number(c)
                },
                c @ '0'..='9' => self.consume_number(c),
                '{' => TokenKind::OpenCurly,
                '}' => TokenKind::CloseCurly,
                '[' => TokenKind::OpenSquare,
                ']' => TokenKind::CloseSquare,
                ':' => TokenKind::Colon,
                ',' => TokenKind::Comma,
                '"' => {
                    let terminated = self.consume_string();
                    if !terminated {
                        let (start, len) = self.token_pos();
                        self.error("unterminated string", start, len);
                        TokenKind::UnterminatedString
                    } else {
                        TokenKind::TerminatedString
                    }
                },
                c if c.is_ascii_alphabetic() => {
                    if let Some(kw) = self.consume_keyword(c) {
                        kw
                    } else {
                        let (start, len) = self.token_pos();
                        let token_str = self.errors.source.substr(start, len).unwrap();
                        self.error(&format!("unexpected chars: `{token_str}`"), start, len);
                        self.errors.reset_errors();
                        TokenKind::Unknown
                    }
                },
                _ => {
                    let (start, len) = self.token_pos();
                    let token_str = self.errors.source.substr(start, len).unwrap();
                    self.error(&format!("unexpected chars: `{token_str}`"), start, len);
                    TokenKind::Unknown
                }
            };

            let (start, len) = self.token_pos();
            let res = Token::new(token_kind, start, len);
            self.reset_pos_within_token();
            res
        }

        fn consume_string(&mut self) -> bool {
            while let Some(c) = self.bump() {
                match c {
                    '"' => {
                        return true;
                    },
                    '\\' if self.first() == '\\' || self.first() == '"' => {
                        self.bump();
                    },
                    _ => (),
                }
            }
            false
        }

        fn consume_keyword(&mut self, c: char) -> Option<TokenKind> {
            let iter = self.chars.clone();
            let count = iter.take_while(|c| c.is_ascii_alphabetic()).count();
            let kword = self.chars.as_str().get(..count);
            for _ in 0..count {
                self.bump();
            }
            if let Some(kword) = kword {
                return keyword(c, kword);
            }
            None
        }

        // this felt disgusting to write
        fn consume_number(&mut self, first_digit: char) -> TokenKind {
            let mut already_past_point = false;
            if first_digit == '0' {
                match self.first() {
                    '0'..='9' => {
                        self.consume_decimal_digits();
                    },
                    '.' | 'e' | 'E' => {},
                    _ => return TokenKind::Number,
                }
            } else if first_digit == '.' {
                already_past_point = true;
            } else if first_digit == 'N' {
                if self.first() != 'a' {
                    return TokenKind::Unknown;
                }
                self.bump();
                if self.first() != 'N' {
                    return TokenKind::Unknown;
                }
                self.bump();
                return TokenKind::Null;
            } else if first_digit == 'I' {
                if self.first() != 'n' {
                    return TokenKind::Unknown;
                }
                self.bump();
                if self.first() != 'f' {
                    return TokenKind::Unknown;
                }
                self.bump();
                if self.first() == 'i' {
                    self.bump();
                    if self.first() != 'n' {
                        return TokenKind::Unknown;
                    }
                    self.bump();
                    if self.first() != 'i' {
                        return TokenKind::Unknown;
                    }
                    self.bump();
                    if self.first() != 't' {
                        return TokenKind::Unknown;
                    }
                    self.bump();
                    if self.first() != 'y' {
                        return TokenKind::Unknown;
                    }
                    self.bump();
                    return TokenKind::Number;
                }
                return TokenKind::Number;
            } else {
                self.consume_decimal_digits();
            };

            let mut peek = self.first();
            if already_past_point {
                peek = '.';
            }
            match peek {
                '.' if self.second() != '.' => {
                    if !already_past_point {
                        self.bump();
                    }
                    let mut empty_exponent = false;
                    match self.first() {
                        c if c.is_ascii_digit() => {
                            self.consume_decimal_digits();
                            match self.first() {
                                'e' | 'E' => {
                                    self.bump();
                                    empty_exponent = !self.consume_float_exponent();
                                },
                                _ => (),
                            }
                        },
                        'e' | 'E' => {
                            self.bump();
                            empty_exponent = !self.consume_float_exponent();
                        },
                        _ => (),
                    }
                    if empty_exponent {
                        let (start, len) = self.token_pos();
                        self.error("float has empty exponent", start, len);
                    }
                    TokenKind::Number
                },
                'e' | 'E' => {
                    self.bump();
                    let empty_exponent = !self.consume_float_exponent();
                    if empty_exponent {
                        let (start, len) = self.token_pos();
                        self.error("float has empty exponent", start, len);
                    }
                    TokenKind::Number
                },
                _ => TokenKind::Number,
            }
        }

        fn consume_float_exponent(&mut self) -> bool {
            if self.first() == '-' || self.first() == '+' {
                self.bump();
            }
            self.consume_decimal_digits()
        }

        fn consume_decimal_digits(&mut self) -> bool {
            let mut has_digits = false;
            while let '0'..='9' = self.first() {
                has_digits = true;
                self.bump();
            }
            has_digits
        }

        fn consume_whitespace(&mut self) {
            self.consume_while(is_whitespace);
        }

        pub fn is_eof(&self) -> bool {
            self.chars.as_str().is_empty()
        }

        fn first(&self) -> char {
            self.chars.clone().next().unwrap_or(EOF_CHAR)
        }

        fn second(&self) -> char {
            let mut iter = self.chars.clone();
            iter.next();
            iter.next().unwrap_or(EOF_CHAR)
        }

        fn pos_within_token(&self) -> usize {
            self.len_remaining - self.chars.as_str().len()
        }

        fn token_pos(&self) -> (usize, usize) {
            let start = self.chars_consumed;
            let len = self.pos_within_token();
            (start, len)
        }

        fn reset_pos_within_token(&mut self) {
            self.chars_consumed += self.pos_within_token();
            self.len_remaining = self.chars.as_str().len();
        }

        fn bump(&mut self) -> Option<char> {
            self.chars.next()
        }

        pub fn consume_while(&mut self, mut predicate: impl FnMut(char) -> bool) {
            while predicate(self.first()) && !self.is_eof() {
                self.bump();
            }
        }

        pub fn error(&mut self, msg: &str, offset: usize, len: usize) {
            self.errors.push(msg, offset, len);
            self.errors.report_all();
        }
    }
}

pub mod str_ext {
    pub trait Substr<'s> {
        fn substr(&'s self, start: usize, len: usize) -> Option<&'s str>;
    }

    impl<'s> Substr<'s> for &'s str {
        fn substr(&'s self, start: usize, len: usize) -> Option<&'s str> {
            if len == 0 {
                return None;
            }
            if start + len > self.len() {
                return None;
            }
            Some(&self[start..(start + len)])
        }
    }

    pub trait Textwise {
        fn pos(&self, n: usize) -> (usize, usize);
        fn line(&self, n: usize) -> &str;
    }

    impl Textwise for &str {
        fn pos(&self, n: usize) -> (usize, usize) {
            if n >= self.chars().count() {
                return (0, 0);
            }

            let mut line = 1;
            let mut col = 0;

            for (idx, c) in self.char_indices() {
                if idx == n {
                    return (line, col);
                }
                if c == '\n' {
                    line += 1;
                    col = 0;
                } else {
                    col += 1;
                }
            }

            unreachable!()
        }

        fn line(&self, line_n: usize) -> &str {
            if line_n == 0 {
                panic!("Line number must be greater than zero.");
            }
            let line_n = line_n - 1;
            for (idx, line) in self.lines().enumerate() {
                if idx == line_n {
                    return line;
                }
            }

            panic!("Index `{}` exceeds the amount of lines.", line_n + 1);
        }
    }
}

pub mod unescape {
    use std::{fmt::Display, ops::Range, str::Chars};

    pub fn unescape(literal: &str) -> Result<String, (Range<usize>, EscapeError)> {
        let mut unescaped = Ok(String::with_capacity(literal.len()));
        unescape_unicode(literal, &mut |range, c| {
            if let Ok(b) = &mut unescaped {
                match c {
                    Ok(c) => b.push(c),
                    Err(e) => unescaped = Err((range, e)),
                }
            }
        });
        unescaped
    }

    fn unescape_unicode<
        F: FnMut(Range<usize>, Result<T, EscapeError>),
        T: From<char> + From<u8>,
    > (
        src: &str,
        callback: &mut F,
    ) {
        let mut chars = src.chars();

        while let Some(c) = chars.next() {
            let start = src.len() - chars.as_str().len() - c.len_utf8();
            let res = match c {
                '\\' => match chars.clone().next() {
                    Some('\n') => {
                        skip_ascii_whitespace(&mut chars, start, &mut |range, err| {
                            callback(range, Err(err))
                        });
                        continue;
                    }
                    _ => scan_escape::<T>(&mut chars),
                },
                '"' => Err(EscapeError::EscapeOnlyChar),
                '\r' => Err(EscapeError::BareCarriageReturn),
                _ => Ok(T::from(c)),
            };
            let end = src.len() - chars.as_str().len();
            callback(start..end, res);
        }
    }

    fn skip_ascii_whitespace<
        F: FnMut(Range<usize>, EscapeError)
    > (
        chars: &mut Chars<'_>,
        start: usize,
        callback: &mut F,
    ) {
            let tail = chars.as_str();
            let first_non_space = tail
                .bytes()
                .position(|b| b != b' ' && b != b'\t' && b != b'\n' && b != b'\r')
                .unwrap_or(tail.len());
            if tail[1..first_non_space].contains('\n') {
                let end = start + first_non_space + 1;
                callback(start..end, EscapeError::MultipleSkippedLinesWarning);
            }
            let tail = &tail[first_non_space..];
            if let Some(c) = tail.chars().next() {
                if c.is_whitespace() {
                    let end = start + first_non_space + c.len_utf8() + 1;
                    callback(start..end, EscapeError::UnskippedWhitespaceWarning);
                }
            }
            *chars = tail.chars();
    }

    fn scan_escape<T: From<char> + From<u8>>(
        chars: &mut Chars<'_>,
    ) -> Result<T, EscapeError> {
        let res: char = match chars.next().ok_or(EscapeError::LoneSlash)? {
            '"' => '"',
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            '\\' => '\\',
            '\'' => '\'',
            '0' => '\0',
            'x' => {
                let hi = chars.next().ok_or(EscapeError::TooShortHexEscape)?;
                let hi = hi.to_digit(16).ok_or(EscapeError::InvalidCharInHexEscape)?;

                let lo = chars.next().ok_or(EscapeError::TooShortHexEscape)?;
                let lo = lo.to_digit(16).ok_or(EscapeError::InvalidCharInHexEscape)?;

                let value = (hi * 16 + lo) as u8;

                return if !value.is_ascii() {
                    Err(EscapeError::OutOfRangeHexEscape)
                } else {
                    Ok(T::from(value))
                };
            },
            'u' => return scan_unicode(chars).map(T::from),
            _ => return Err(EscapeError::InvalidEscape),
        };
        Ok(T::from(res))
    }

    fn scan_unicode(chars: &mut Chars<'_>) -> Result<char, EscapeError> {
        let mut n_digits = 1;

        let c = chars.next().ok_or(EscapeError::UnclosedUnicodeEscape)?;
        let mut value = c.to_digit(16).ok_or(EscapeError::InvalidCharInUnicodeEscape)?;

        loop {
            match chars.next() {
                None => return Err(EscapeError::UnclosedUnicodeEscape),
                Some(c) => {
                    let digit: u32 = c.to_digit(16).ok_or(EscapeError::InvalidCharInUnicodeEscape)?;
                    // Accumulate the digit *before* the count check so the
                    // fourth hex digit is actually included in the value.
                    value = value * 16 + digit;
                    n_digits += 1;
                    if n_digits == 4 {
                        break std::char::from_u32(value).ok_or({
                            if value > 0x10FFFF {
                                EscapeError::OutOfRangeUnicodeEscape
                            } else {
                                EscapeError::LoneSurrogateUnicodeEscape
                            }
                        });
                    }
                },
            };
        }
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum EscapeError {
        LoneSlash,
        InvalidEscape,
        BareCarriageReturn,
        EscapeOnlyChar,

        TooShortHexEscape,
        InvalidCharInHexEscape,
        OutOfRangeHexEscape,

        InvalidCharInUnicodeEscape,
        UnclosedUnicodeEscape,
        LoneSurrogateUnicodeEscape,
        OutOfRangeUnicodeEscape,

        UnskippedWhitespaceWarning,
        MultipleSkippedLinesWarning,
    }

    impl Display for EscapeError {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            use EscapeError::*;
            write!(f, "{}", match self {
                LoneSlash => "escaped `\\` char without continuation",
                InvalidEscape => "invalid escape character",
                BareCarriageReturn => "raw `\\r` encountered",
                EscapeOnlyChar => "unescaped char that was escaped",

                TooShortHexEscape => "numeric char escape is too short",
                InvalidCharInHexEscape => "invalid char in numeric escape",
                OutOfRangeHexEscape => "character code in numeric escape is non-ascii",

                InvalidCharInUnicodeEscape => "non-hex value in unicode escape",
                UnclosedUnicodeEscape => "unclosed unicode escape",
                LoneSurrogateUnicodeEscape => "invalid in-bound unicode char",
                OutOfRangeUnicodeEscape => "out-of-bounds unicode char",

                UnskippedWhitespaceWarning => "unskipped whitespace",
                MultipleSkippedLinesWarning => "multiple skipped lines"
            })
        }
    }
}

pub mod error {
    use std::collections::VecDeque;
    use crate::str_ext::*;

    #[derive(Debug, Clone)]
    pub struct Error {
        message: String,
        offset: usize,
        len: usize,
    }

    impl Error {
        pub fn new(message: String, offset: usize, len: usize) -> Self {
            Self {
                message,
                offset,
                len,
            }
        }

        fn format(&self, file_name: &str, file_content: &str) -> String {
            let (line, col) = file_content.pos(self.offset);
            let substr = file_content.line(line);
            assert!(col <= substr.len());

            let (left, right) = substr.split_at(col);
            let (r_left, r_right) = right.split_at(self.len);
            let substr = format!("{}{}{}", left, r_left, r_right);

            let pad_n = line.to_string().len();
            let pad_l = vec![' '; pad_n].iter().collect::<String>();
            let pad_r = vec![' '; col].iter().collect::<String>();
            let uline = vec!['^'; self.len].iter().collect::<String>();

            format!(
                "error: {}\n{pad_l}--> {file_name}@{line}:{}\n{line} | {substr}\n{pad_l}   {pad_r}{uline}",
                self.message,
                col + 1,
            )
        }
    }

    #[derive(Debug, Clone)]
    pub struct Errors<'s> {
        had_error: bool,
        errors: VecDeque<Error>,
        pub(crate) source: &'s str,
        file_name: &'s str,
    }

    impl<'s> Errors<'s> {
        pub fn new(file_name: &'s str, source: &'s str) -> Self {
            Self {
                had_error: false,
                errors: VecDeque::new(),
                source,
                file_name,
            }
        }

        pub fn push(&mut self, message: &str, offset: usize, len: usize) {
            self.had_error = true;
            self.errors.push_back(Error::new(message.to_string(), offset, len));
        }

        pub fn pop(&mut self) -> Option<Error> {
            self.errors.pop_front()
        }

        pub fn report(&self, error: &Error) {
            eprintln!("{}", error.format(self.file_name, self.source))
        }

        pub fn report_all(&mut self) {
            while let Some(err) = self.pop() {
                self.report(&err);
            }
        }

        pub fn had_error(&self) -> bool {
            self.had_error
        }

        pub fn reset_errors(&mut self) {
            self.had_error = false;
        }
    }

    impl Iterator for Errors<'_> {
        type Item = String;
        fn next(&mut self) -> Option<Self::Item> {
            let err = self.errors.pop_front();
            if let Some(e) = err {
                return Some(e.format(self.file_name, self.source));
            }
            None
        }
    }
}
Ahem, anyways, yeah something well defined and standardized like json is well suited to have its parser generated from some parser-generator BNF variant, or just use a library.

Today I was trying to make my C++ application fault tolerant so that it could be used as a library and not trash the whole process if malformed input is given. I was banging my head against the wall until I realized you can just handle failure signals like segmentation faults and jump back to wherever you want in order to return to the caller. So while I still have to handle a lot of malformed input manually for UX, I can bail if anything truly unexpected happens. Just another reason why I love C.
This is actually significantly worse than just crashing the process. Your memory is corrupted, and recovery is not trivial because you may be returning to a totally fucked program state. Every time that happens, there's a chance pointers got overwritten by whatever messed your memory up, leading to memory leaks, and that's assuming you actually manage to recover, which means bounds-checking all your pointers against the pages mapped into the process. But if you're going to do that, why not just write safer code that doesn't mess up your memory in the first place?

From a security standpoint this is worse too. You're basically giving the bad guys more opportunity to manipulate your program within the same address space: multiple shots at setting things up in memory in just the right way for exploitation, which makes ASLR less effective. And by mimicking exceptions like this, you open up the opportunity for overwriting the signal mechanism's internal userspace data structures to achieve almost arbitrary CPU states triggered by segmentation faults, which is an incredibly favourable setup for whoever tries to break it. Just let the program crash and restart it when it does, using a systemd unit or something, so that it at least has a new randomised address space for each exploit attempt.
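For reference, the pattern under discussion looks roughly like this POSIX-specific sketch (names are mine; a real wild-pointer fault, unlike this deliberate raise(), would leave the program in exactly the corrupted state described above):

```cpp
#include <setjmp.h>   // sigsetjmp/siglongjmp are POSIX, not standard C++
#include <signal.h>

static sigjmp_buf recovery_point;

extern "C" void on_segv(int) {
    // Jump back to the saved context. The process survives, but the heap
    // and stack may already be trashed by whatever caused the fault.
    siglongjmp(recovery_point, 1);
}

// Returns true if we "recovered" from a deliberately raised SIGSEGV.
bool survive_segv() {
    signal(SIGSEGV, on_segv);
    if (sigsetjmp(recovery_point, 1) == 0) {  // save context + signal mask
        raise(SIGSEGV);   // stand-in for a real wild-pointer dereference
        return false;     // never reached
    }
    return true;          // resumed here via siglongjmp from the handler
}
```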
 
Parsing is a fundamental programming skill that every programmer ought to know how to do confidently
Parsing what, exactly, though? This is the point I keep alluding to. On one hand, you've got something like parsing numbers in text, like atoi() etc. Absolutely; this is within the scope of your general programmer. On the other hand, you've got something like parsing C. Or, heavens, parsing ENGLISH, because that is within the scope of a parser. Regular expressions are specialized parsing algorithms. There is an additional academic argument here that couples into problem cases, the "what are we parsing" question. Parsing is a topic on which many books have been written and many careers founded, starting with Knuth's apex-tier LR(1) algorithm. Surely, the implication is not that a decent programmer should have naive access to the intuition encoded here.

Consider Earley's parsing algorithm, from the ACM, 1970. https://dl.acm.org/doi/pdf/10.1145/362007.362035

The following is an informal description of the algorithm as a recognizer: It scans an input string X1 ... Xn from left to right looking ahead some fixed number k of symbols. As each symbol Xi is scanned, a set of states Si is constructed which represents the condition of the recognition process at that point in the scan.

Each state in the set represents
  1. a production such that we are currently scanning a portion of the input string which is derived from its right side,
  2. a point in that production which shows how much of the production's right side we have recognized so far,
  3. a pointer back to the position in the input string at which we began to look for that instance of the production, and,
  4. a k-symbol string which is a syntactically allowed successor to that instance of the production. This quadruple is represented here as a production, with a dot in it, followed by an integer and a string.
Sure, this ought to be something the average programmer is able to code, but naively expecting this sort of intuition about these systems is silly too.
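To make the quoted description concrete, here is a compact recognizer sketch in that spirit, with k = 0 (no lookahead), for a toy ambiguous grammar of my own choosing; the struct and function names are my invention, not from the paper:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Grammar rules: uppercase chars are nonterminals, everything else terminal.
struct Rule { char lhs; std::string rhs; };

// An Earley item: which rule, how far the dot has advanced, and the input
// position (origin) where we started looking for this instance.
struct Item {
    int rule, dot, origin;
    bool operator==(const Item& o) const {
        return rule == o.rule && dot == o.dot && origin == o.origin;
    }
};

bool recognize(const std::vector<Rule>& g, char start, const std::string& in) {
    std::vector<std::vector<Item>> chart(in.size() + 1);
    auto add = [&](int pos, Item it) {           // insert without duplicates
        for (const Item& e : chart[pos]) if (e == it) return;
        chart[pos].push_back(it);
    };
    for (int r = 0; r < (int)g.size(); ++r)      // seed with the start rules
        if (g[r].lhs == start) add(0, {r, 0, 0});

    for (int i = 0; i <= (int)in.size(); ++i) {
        // index-based worklist: chart[i] grows while we process it
        for (int s = 0; s < (int)chart[i].size(); ++s) {
            Item it = chart[i][s];
            const Rule& rule = g[it.rule];
            if (it.dot < (int)rule.rhs.size()) {
                char next = rule.rhs[it.dot];
                if (std::isupper((unsigned char)next)) {         // prediction
                    for (int r = 0; r < (int)g.size(); ++r)
                        if (g[r].lhs == next) add(i, {r, 0, i});
                } else if (i < (int)in.size() && in[i] == next) { // scanning
                    add(i + 1, {it.rule, it.dot + 1, it.origin});
                }
            } else {                                             // completion
                for (int s2 = 0; s2 < (int)chart[it.origin].size(); ++s2) {
                    Item par = chart[it.origin][s2];
                    const Rule& pr = g[par.rule];
                    if (par.dot < (int)pr.rhs.size() && pr.rhs[par.dot] == rule.lhs)
                        add(i, {par.rule, par.dot + 1, par.origin});
                }
            }
        }
    }
    for (const Item& it : chart[in.size()])      // completed start rule spanning all input?
        if (g[it.rule].lhs == start && it.dot == (int)g[it.rule].rhs.size() && it.origin == 0)
            return true;
    return false;
}
```

With the grammar S -> S + S | 1 this happily recognizes the ambiguous "1+1+1", which is exactly what LL/LR parsers struggle with.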

Ahem, anyways, yeah something well defined and standardized like json is well suited to have its parser generated from some parser-generator BNF variant, or just use a library.
This has been my thesis for this all along. This is 2024: what are these mystical problem cases that don't have specialized adaptations already?

That's quite the codebase, and it looks dense. My Prolog's too rusty to try to compete here, but boy am I mulling spending the time.
 
  • Agree
Reactions: UERISIMILITUDO
Today I was trying to make my C++ application fault tolerant so that it could be used as a library and not trash the whole process if malformed input is given. I was banging my head against the wall until I realized you can just handle failure signals like a segmentation fault
Why would your library be segfaulting on malformed input in the first place?
If it's bad user input, it should be handled. And if it's a bad pointer, then it's either a bug in your code or some sort of memory corruption, and you should crash in that case.
 
Parsing what, exactly, though? This is the point I keep alluding to. On one hand, you've got something like parsing numbers in text, like atoi() etc. Absolutely; this is within the scope of your general programmer. On the other hand, you've got something like parsing C. Or, heavens, parsing ENGLISH, because that is within the scope of a parser. Regular expressions are specialized parsing algorithms. There is an additional academic argument here that couples into problem cases, the "what are we parsing" question. Parsing is a topic on which many books have been written and many careers founded, starting with Knuth's apex-tier LR(1) algorithm. Surely, the implication is not that a decent programmer should have naive access to the intuition encoded here.

Consider Earley's parsing algorithm, from the ACM, 1970. https://dl.acm.org/doi/pdf/10.1145/362007.362035

Sure, this ought to be something the average programmer is able to code, but naively expecting this sort of intuition about these systems is silly too.

This has been my thesis for this all along. This is 2024: what are these mystical problem cases that don't have specialized adaptations already?

That's quite the codebase, and it looks dense. My Prolog's too rusty to try to compete here, but boy am I mulling spending the time.
I'm not saying the average programmer must be able to write a parser for the whole C grammar and produce the AST for any written program, but every programmer should be able to parse at a level beyond ATOI(3) and calling into an existing implementation of some standard format. Say, for a moment, INI parsers didn't exist: the average programmer should be able to take an example INI file and produce a parser for that format, such that inputs of the same format but with a different shape result in a successful parse where all relevant information is retained in useful data structures. A programmer should be able to take in a specification for some data format and produce a parser for it.

Once programmers learn the CS fundamentals, the very next thing should be getting to an intermediate level of manual structured-data parsing, where one is able to take an equivalently complex file, like an INI or some basic markup language, and produce a parser for it. Tons of different pieces of software have their own implementations of these kinds of things, and it's basically expected if you're working on any non-web proprietary software, unless it involves cryptography. In game development, it's pretty common for a dev to design their own content packaging format and package their assets in that rather than in some standardised format, just so the game's content can't be easily pulled apart on day one. That means writing a parser for the custom format.
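To illustrate, a hand-rolled parser for that hypothetical no-library INI world might look like this minimal sketch (sections, key=value pairs, and comments only; error handling and quoting deliberately omitted):

```cpp
#include <map>
#include <sstream>
#include <string>

// section -> key -> value
using Ini = std::map<std::string, std::map<std::string, std::string>>;

static std::string trim(const std::string& s) {
    auto b = s.find_first_not_of(" \t\r");
    auto e = s.find_last_not_of(" \t\r");
    return b == std::string::npos ? "" : s.substr(b, e - b + 1);
}

Ini parse_ini(std::istream& in) {
    Ini out;
    std::string line, section;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == ';' || line[0] == '#') continue;  // comment/blank
        if (line.front() == '[' && line.back() == ']') {                 // [section]
            section = trim(line.substr(1, line.size() - 2));
        } else if (auto eq = line.find('='); eq != std::string::npos) {  // key = value
            out[section][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
        }
        // anything else is silently skipped in this sketch
    }
    return out;
}
```

The point being: the data structure (the nested map) falls straight out of the format, and the whole thing is a page of code, not wizardry.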

I suppose my point is that if every programmer is taught how to parse early on, they will all have the intuition and experience to be confident tackling a parsing challenge and applying parsing to problems, rather than being afraid of parsing as so many programmers are. I've known so many programmers who are deathly afraid of any kind of string parsing, to the point where they will either give up on a problem or instantly reach for the accursed regular expressions and botch together a horrible, incorrect "parser" that accepts far more inputs than they actually intended.

Also anecdotally, I've found that if you design data structures to model some problem, and then parse the problem's inputs into the data structures, you've already solved a large part of the problem, and now you have an abstracted form of relevant inputs to work with rather than the crude and imperfect inputs.
 
Just because you can doesn't mean you should.
Dereferencing an invalid pointer is undefined behaviour, with all that entails. You are 100% setting yourself up for nasal demons by doing this.
I have read it's a massive security vulnerability, because you can execute an arbitrary jump if you can write into the jmp_buf and then trigger a segfault. That being said, it's probably not going to touch a network. The segfaults are always null pointers, but obviously that could be different in a truly exceptional case. Since everyone is saying I'm retarded, I'll probably just make the signal handler print the errors and crash instead of trying to be too clever
Why would your library be segfaulting on malformed input in the first place?
If it's bad user input, it should be handled. And if it's a bad pointer, then it's either a bug in your code or some sort of memory corruption, and you should crash in that case.
Well, I was rewriting a big part of how it starts up, and I happened to have it running in a broken state, so I decided to shore up the error recovery. My initial plan was just to have it crash "gracefully", so I had some error printouts, but I might've gotten carried away slightly
 
  • Feels
Reactions: ADHD Mate
I'm a noob in programming, this is the first time I create a recursive function on my own. I want to know if any of you have any tricks for coming up with those, and also if this code is retarded.
Unironically grind some leetcode; try DFS and Backtracking problems using mainly recursion.

Here is a solution to Generate Parentheses I did using recursion, which has 0 ms runtime.
C++:
class Solution {
public:
    vector<string> generateParenthesis(int n) {
        vector<string> res;

        // o and c count the open/close parens placed so far; passing the
        // lambda to itself (`self`) is what makes it recursive.
        auto y = [&](int o, int c, string str, auto& self)
        {
            if (o == n and c == n)
            {
                res.push_back(std::move(str));
                return;
            }

            if (o < n) self(o + 1, c, str + '(', self);  // room to open another
            if (c < o) self(o, c + 1, str + ')', self);  // close only what's open
        };

        y(0, 0, "", y);

        return res;
    }
};

You can start with Fibonacci, which I think is the classic intro to recursion in programming.
And then there are the Depth First Search and Backtracking problem lists, but prefer the easy ones.
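Since the thread started with permutations: the same backtracking shape as the parentheses example solves the OP's problem directly, without the index matrix. A minimal sketch (names are mine):

```cpp
#include <string>
#include <utility>
#include <vector>

// Swap-based backtracking: each recursion level fixes one position,
// tries every remaining character there, then undoes the swap.
void permute(std::string& s, int depth, std::vector<std::string>& out) {
    if (depth == (int)s.size()) {
        out.push_back(s);            // all positions fixed: emit one permutation
        return;
    }
    for (int i = depth; i < (int)s.size(); ++i) {
        std::swap(s[depth], s[i]);   // choose a character for this position
        permute(s, depth + 1, out);  // recurse on the remaining suffix
        std::swap(s[depth], s[i]);   // un-choose (backtrack)
    }
}
```

For "XYZ" this yields the 3! = 6 orderings with no repeats, and the string is restored to its original order when the recursion unwinds.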
 
  • Like
Reactions: y a t s
res.push_back(std::move(str))
Is std::move there to get around string copying in the background? I'm optimizing string usage right now, and I've just been changing pass-by-value to passing a std::string* and using += to build strings recursively
 
And then there is Depth First Search and Backtracking lists. But prefer easy.
Really any of the classical sort and search algorithms (e.g. merge sort, binary search) are good beginner exercises. The logic behind the divide-and-conquer strategy is easy enough for a beginner to grasp and visualize.
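As a sketch of that divide-and-conquer shape (my own minimal version, by value for clarity rather than in-place efficiency):

```cpp
#include <cstddef>
#include <vector>

// Divide: split in half. Conquer: sort each half recursively.
// Combine: merge the two sorted halves.
std::vector<int> merge_sort(std::vector<int> v) {
    if (v.size() <= 1) return v;                 // base case: already sorted
    auto mid = v.begin() + v.size() / 2;
    std::vector<int> left  = merge_sort(std::vector<int>(v.begin(), mid));
    std::vector<int> right = merge_sort(std::vector<int>(mid, v.end()));

    std::vector<int> out;
    out.reserve(v.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size())  // take the smaller head each time
        out.push_back(left[i] <= right[j] ? left[i++] : right[j++]);
    while (i < left.size())  out.push_back(left[i++]);   // drain leftovers
    while (j < right.size()) out.push_back(right[j++]);
    return out;
}
```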

Edit: Reposting this nice gif from the merge sort wikipedia article.

Had to encode it as an mp4 because gifs aren't attaching properly rn. I sped it up a tad as well.
 
Last edited:
Is std::move to get around string copying in the background? I'm dealing with optimizing string usage right now and I've just been changing pass by value to passing a std::string* and += to build strings recursively
Yeah, std::move will transfer ownership from the local string to the new string inside the vector.
Which is a fancy way of saying the pointer in the new string will point at the old string's buffer, and the old string will have its pointer set to something else (which is safe to destroy).

However, if you are returning a local variable, you should not move it; Return Value Optimization might optimize out the local variable entirely.
That's why I have vector<string> res; and return res;.
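A quick way to see the ownership transfer for yourself (a sketch; it relies on the buffer living on the heap, which a 100-char string ensures on common implementations by defeating the small-string optimization):

```cpp
#include <string>
#include <vector>

// Returns true if the vector's string ended up owning the exact buffer
// the local string allocated, i.e. no character data was copied.
bool move_steals_buffer() {
    std::string s(100, 'x');        // long enough to defeat small-string optimization
    const char* before = s.data();  // address of s's heap buffer
    std::vector<std::string> v;
    v.push_back(std::move(s));      // string's move ctor is noexcept, so it
                                    // adopts the buffer rather than copying
    return v[0].data() == before;   // same buffer, now owned by the vector's string
}
```

After the move, `s` is left in a valid but unspecified state; the only safe things to do with it are assign to it or destroy it.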

Also, idk your code, but I would always prefer a reference over a pointer, and a const reference if I'm not modifying it inside the function. But += is perfectly fine.
C++ compilers are insanely smart. Many times I have tried to be clever, only to get worse performance. (:_(
So unless you are profiling, go for the most straightforward approach.
 
I'm using spectre console for my application and I was reading into how to style my text.
The docs have a link to their styles appendix section.
It's a fucking .css file. HOW IN THE FUCK DO YOU USE CSS IN THE TERMINAL? That's a thing? What the fuck? Since when?

It doesn't shock me; I've seen programs from the 90s whose UI was written in HTML and JavaScript, and it's not like that's uncommon. It turns out HTML is just a pretty good language for arranging UI, and JavaScript is already proven to work well with an HTML DOM. In the same vein, CSS is a well-established way to manage styles, so why not use a subset of it to style a terminal?
 
  • Agree
  • Like
Reactions: ${Sandy} and Marvin
It doesn't shock me; I've seen programs from the 90s whose UI was written in HTML and JavaScript, and it's not like that's uncommon. It turns out HTML is just a pretty good language for arranging UI, and JavaScript is already proven to work well with an HTML DOM. In the same vein, CSS is a well-established way to manage styles, so why not use a subset of it to style a terminal?
Reminds me of Qt and its stylesheets. They even have JavaScript integration via QML.
 