Windows XP SP1 + Server 2003 source code has apparently leaked - WARNING ! Your system have been encrypted by Rensenware !

There's already an ex-Microsoft Windows kernel developer accusing the ReactOS team of incorporating stolen code.


I think it’s a ripoff of the Windows Research Kernel that Microsoft licensed to universities under an agreement that was obviously violated by some, as the code has been uploaded to numerous places, some of it on GitHub[1].

I glanced at the ReactOS code tree, and in my opinion, there is absolutely no way on earth this was written from a clean sheet only from the available public documentation.

For starters, there is no such thing as public documentation for the NT kernel internals. The only printed documentation consists of two black binders where every page is labeled Microsoft Confidential.

Many internal data structures and internal functions, not exported anywhere and not part of the public symbols, have the exact same names as they appear in the Research Kernel (which, by the way, is quite obsolete). There is an almost surely zero probability that this happened, at that scale, by accident.

A more sinister scenario (given the amount of code beyond what can be readily found in a few minutes of Googling) would be that ReactOS originated from one of the several leaks[2] that happened in the past.

Now, I'm not a lawyer. Why this is allowed to exist I don't know. Probably because it ends up being something like a baby mosquito on the back of a giant elephant.

If any of the presumed authors wants to chime in and explain the similitudes, I'm happy to change my mind, but be ready to answer some tough questions about the origins of your coding and naming styles, and all the design choices that you made and why you ended up architecting and writing things the way you say you did ;)

 
There's already an ex-Microsoft Windows kernel developer accusing the ReactOS team of incorporating stolen code.




"there is absolutely no way on earth this was written from a clean sheet only from the available public documentation"

It wasn't. It was written from a clean sheet from documentation of the leaked code. That's legal. The code itself is protected by copyright, but what it does isn't protected by copyright.
 
There's already an ex-Microsoft Windows kernel developer accusing the ReactOS team of incorporating stolen code.




I love the idea that much like the kilogram pre-2019, the only definition of Windows NT is a physical artifact somewhere :lol:

I've been taking a look around to see what the oldest code in this codebase is. Obviously the assembler code tends to be the oldest, as well as Microsoft's implementation of the CRT (or some subset of it).

The oldest actively maintained code seems to be some inverse trig assembler functions in NT\base\crts\fpw32\tran\i386\87triga.asm
Written in 1984, and then someone came back 12 and 16 years later to tweak some corner cases.

This may be old hat to some of you, but I was intrigued by the "off by TWO" error in Microsoft's strlen implementation. This too-clever-by-half implementation counts down from -1 and then negates the result (in two's complement) to end up with the length. Assembly people - doesn't that REPNE instruction really decrement ECX even on the final zero, making the final negative count really -(length+2), unlike the comment?
Anyway this is obviously fragile and of course someone got it wrong.

Code:
    page    ,132
    title    strlen - return the length of a null-terminated string
;***
;strlen.asm - contains strlen() routine
;
;    Copyright (c) 1985-1991, Microsoft Corporation. All rights reserved.
;
;Purpose:
;    strlen returns the length of a null-terminated string,
;    not including the null byte itself.
;
;Revision History:
;    04-21-87  SKS    Rewritten to be fast and small, added file header
;    05-18-88  SJM    Add model-independent (large model) ifdef
;    08-02-88  SJM    Add 32 bit code, use cruntime vs cmacros
;    08-23-88  JCR    386 cleanup
;    10-05-88  GJF    Fixed off-by-2 error.
;    10-10-88  JCR    Minor improvement
;    10-25-88  JCR    General cleanup for 386-only code
;    10-26-88  JCR    Re-arrange regs to avoid push/pop ebx
;    03-23-90  GJF    Changed to _stdcall. Also, fixed the copyright.
;    05-10-91  GJF    Back to _cdecl, sigh...
;
;*******************************************************************************

    .xlist
    include cruntime.inc
    .list

page
;***
;strlen - return the length of a null-terminated string
;
;Purpose:
;    Finds the length in bytes of the given string, not including
;    the final null character.
;
;    Algorithm:
;    int strlen (const char * str)
;    {
;        int length = 0;
;
;        while( *str++ )
;            ++length;
;
;        return( length );
;    }
;
;Entry:
;    const char * str - string whose length is to be computed
;
;Exit:
;    AX = length of the string "str", exclusive of the final null byte
;
;Uses:
;    CX, DX
;
;Exceptions:
;
;*******************************************************************************

    CODESEG

    public    strlen
strlen    proc \
    uses edi, \
    string:ptr byte

    mov    edi,string    ; edi -> string
    xor    eax,eax     ; null byte
    or    ecx,-1        ; set ecx to -1
repne    scasb            ; scan for null, ecx = -(1+strlen(str))
    not    ecx
    dec    ecx        ; ecx = strlen(str)
    mov    eax,ecx     ; eax = strlen(str)

ifdef    _STDCALL_
    ret    DPSIZE        ; _stdcall return
else
    ret            ; _cdecl return
endif

strlen    endp
    end
 
This may be old hat to some of you, but I was intrigued by the "off by TWO" error in Microsoft's strlen implementation. This too-clever-by-half implementation counts down from -1 and then negates the result (in two's complement) to end up with the length. Assembly people - doesn't that REPNE instruction really decrement ECX even on the final zero, making the final negative count really -(length+2), unlike the comment?
I would not call it too clever by half; it's simply the most efficient way of doing it in assembly. You are correct, however, that the value of ECX after the repne scasb loop is -(length+2). Whoever fixed the "off-by-2 error" (see the revision history comments) probably didn't fix the comment.

edit: interestingly, while it's not -(1+strlen(str)), it is ~(1+strlen(str)), where ~ is the bitwise not operation.
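
For anyone who wants to check the arithmetic without an assembler handy, here is a minimal C sketch that models the `or ecx,-1` / `repne scasb` / `not ecx` / `dec ecx` sequence step by step. The function names are made up for illustration and this is only a model of the register arithmetic, not the CRT code itself; it assumes a two's complement target.

```c
#include <stdint.h>

/* Hypothetical model: value left in ECX right after "repne scasb". */
int32_t ecx_after_scan(const char *s)
{
    const unsigned char *edi = (const unsigned char *)s; /* mov edi,string */
    unsigned char al = 0;                                /* xor eax,eax    */
    uint32_t ecx = 0xFFFFFFFFu;                          /* or ecx,-1      */

    /* repne scasb: every iteration compares AL with [EDI], advances EDI
       and decrements ECX, including the iteration that finds the null. */
    for (;;) {
        unsigned char byte = *edi++;
        ecx--;
        if (byte == al)
            break;
    }
    return (int32_t)ecx; /* reinterpret as signed, assuming two's complement */
}

/* Hypothetical model of the whole routine: scan, then "not ecx" and "dec ecx". */
uint32_t strlen_model(const char *s)
{
    uint32_t ecx = (uint32_t)ecx_after_scan(s);
    ecx = ~ecx; /* not ecx: now 1 + strlen(s) */
    ecx--;      /* dec ecx: now strlen(s)     */
    return ecx;
}
```

With a 5-character string, `ecx_after_scan` comes out as -7, i.e. -(length+2), and `strlen_model` still returns 5, which matches the point that the code is right and only the comment is off.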
 
For starters, there is no such thing as public documentation for the NT kernel internals
The guy was so good at writing documentation for Windows internals that Microshaft hired him and Windows Sysinternals is an official Microsoft product complete with a blog and forum:

I would not call it too clever by half; it's simply the most efficient way of doing it in assembly.
This is actually completely incorrect, the most efficient way to do it (ignoring vector extensions cus idk which ones they had back then if any) is the GNU strlen algorithm:
You can implement this in something like 40 lines of Assembly
 
This is actually completely incorrect, the most efficient way to do it (ignoring vector extensions cus idk which ones they had back then if any) is the GNU strlen algorithm:
https://raw.githubusercontent.com/lattera/glibc/master/string/strlen.c You can implement this in something like 40 lines of Assembly
That is too clever by half.

Aside from the fact that it's difficult to understand, its worst-case performance ("misfire" on every longword test) is actually worse than the simple implementation. Furthermore, it will in many cases actually read past the end of the character array, which is undefined behavior in C, meaning that it's compiler-dependent and non-portable.
 
That is too clever by half.

Aside from the fact that it's difficult to understand, its worst-case performance ("misfire" on every longword test) is actually worse than the simple implementation. Furthermore, it will in many cases actually read past the end of the character array, which is undefined behavior in C, meaning that it's compiler-dependent and non-portable.
Sure, but:
  1. We're assuming it gets written in Assembly like the other implementation, so the C-specific issues aren't a problem.
  2. The worst-case performance is a pretty unlikely scenario; misfires only occur for characters 0x80-0xFF, which aren't often used in practice.
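
To make the "misfire" point concrete, here is a rough C sketch of the kind of word-at-a-time zero-byte test being discussed. It is not the glibc code; the function names and the 32-bit word size are assumptions for illustration. The loose test can report a possible zero byte when a byte has its high bit set (in this simple form that means bytes 0x81 and up), while the stricter variant with the extra `& ~w` term does not.

```c
#include <stdint.h>

#define LO_MAGIC 0x01010101u /* a 1 in the low bit of every byte  */
#define HI_MAGIC 0x80808080u /* a 1 in the high bit of every byte */

/* Hypothetical loose test: nonzero if the word MAY contain a zero byte.
   Subtracting 1 from a zero byte borrows and sets its high bit, but a
   byte that already had its high bit set also passes (a misfire). */
uint32_t maybe_has_zero_byte(uint32_t w)
{
    return (w - LO_MAGIC) & HI_MAGIC;
}

/* Hypothetical exact test: nonzero only if some byte of w is zero.
   The "& ~w" term discards bytes whose high bit was set to begin with. */
uint32_t has_zero_byte(uint32_t w)
{
    return (w - LO_MAGIC) & ~w & HI_MAGIC;
}
```

For a plain ASCII word such as 0x41424344 neither test fires, for 0x41004344 both do, and for 0xC3A94142 (two high-bit bytes, no zero) only the loose one fires, so a loop built on the loose test would have to fall back to a byte-by-byte check there.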
 
Sure, but:
  1. We're assuming it gets written in Assembly like the other implementation, so the C-specific issues aren't a problem.
  2. The worst-case performance is a pretty unlikely scenario; misfires only occur for characters 0x80-0xFF, which aren't often used in practice.
Fair. But even so, unless the performance is really a sticking point, I think that there's something to be said for shorter, more readable code. There's more to efficiency than just speed; you also have to consider the amount of code, and the ease of writing and debugging it.
 
Fair. But even so, unless the performance is really a sticking point, I think that there's something to be said for shorter, more readable code. There's more to efficiency than just speed; you also have to consider the amount of code, and the ease of writing and debugging it.
strlen() is one of those things that's used so much within programs that it's definitely worth optimising imo.
 
All right, I got my hands on the files, and everyone here has been missing the REAL scoop:
THE SOURCE CODE TO SOLITAIRE
NT\shell\osshell\games\sol

Hearts and the old Windows 3.x Reversi are also under NT\shell\osshell\games
And a number of the Entertainment Pack games (including Minesweeper) are under NT\shell\osshell\ep

dooflop isn't anywhere to be found, which is the other thing I'm sure everyone wanted to know.

A copy of the Hungarian Notation standard from 1988 is at NT\inetsrv\iis\svcs\cmp\doc\hungar.doc

From Freecell:

C:
        //
        // Caution:
        //    This shuffle algorithm has been published to people all around. The intention
        //    was to let people track the games by game numbers. So far all the games between
        //    1 and 32767 except one have been proved to have a winning solution. Do not change
        //    the shuffling algorithm else you will incur the wrath of people who have invested
        //    a huge amount of time solving these games.
        //

Do people really go one-by-one and try to win every Freecell game? :lol: That's some dedicated autism.
I HOPE they had an algorithm playing.
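
For context on why changing the shuffle would break everyone's game numbers, here is a minimal C sketch of the deal algorithm as it is commonly documented publicly. It is not copied from the leaked source, and I'm assuming it matches; the function names and the card numbering (card = rank * 4 + suit, rank 0 = ace through 12 = king, suits in the order clubs, diamonds, hearts, spades) are just the convention used in this sketch.

```c
#include <stdint.h>

static uint32_t lcg_state;

/* Microsoft C runtime style rand(): a linear congruential generator
   that hands back 15 bits per call. */
static int ms_rand(void)
{
    lcg_state = lcg_state * 214013u + 2531011u;
    return (int)((lcg_state >> 16) & 0x7FFF);
}

/* Hypothetical helper: fill out[0..51] with the deal order for a game.
   The game number is the seed, so the same number gives the same deal. */
void freecell_deal(uint32_t game, int out[52])
{
    int deck[52];
    int left = 52;

    for (int i = 0; i < 52; i++)
        deck[i] = i;
    lcg_state = game;

    for (int i = 0; i < 52; i++) {
        int j = ms_rand() % left; /* pick a random remaining card    */
        out[i] = deck[j];         /* deal it                         */
        deck[j] = deck[--left];   /* move the last card into the gap */
    }
}

/* Hypothetical convenience wrapper: the card dealt at position pos. */
int freecell_card_at(uint32_t game, int pos)
{
    int out[52];
    freecell_deal(game, out);
    return out[pos];
}
```

With this numbering, game 1 starts with cards 41 and 5, i.e. the jack of diamonds followed by the two of diamonds. Since the whole deal follows from the seed and those two constants, any tweak to the generator or the swap order would silently renumber every game.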
 
strlen() is one of those things that's used so much within programs that it's definitely worth optimising imo.
Perhaps, but you're not improving the big-O performance, and I seriously doubt you're going to run into a situation where strlen is a bottleneck - it's orders of magnitude faster than pretty much any I/O, for instance.
 
Did any of you really use XP? RTM/SP1 was a piece of shit.

this
on its debut, xp was a terrible, terrible OS. Like every new Windows ver at that time it had insane hardware requirements to run somewhat okay-ish, it was unstable, it needed a lot of HDD space, the UI was sluggish, buggy, was leaking memory etc etc, basically it was so bad that people didn't want to upgrade from 98/2000.
It only got marginally usable with SP1 and actually usable with SP2, and by that time the hardware got more powerful so it ran ok on an average system.
Also xp used NTFS and had real user accounts for that sweet, sweet security, but it also shipped with a hidden admin account called "Administrator" that had no password, so if you didn't know that it existed and never put a password on it, anyone with access to your computer could log in as an admin.
I think the account is still there in 10, but it is disabled by default
 
By the way, there appears to be precisely one fingerprint of Bill Gates himself in this codebase:

In the FAT filesystem routines: NT\base\fs\fastfat\verfysup.c

C:
            //  This logic is a reasonable hack-o-rama to make BillG happy
            //  since his machine ran chkdsk after he installed Beta 3.  Why?

EDIT:
is the GNU strlen algorithm:
https://raw.githubusercontent.com/lattera/glibc/master/string/strlen.c You can implement this in something like 40 lines of Assembly
In fact, this code does show up in assembly form in places such as strncat:
NT\base\crts\crtw32\string\i386\strncat.asm
They claim to have gotten it from Intel in 1996 though.
The implementation of strlen that ships with Visual Studio today is an assembly version of that GNU-style algorithm too.
 
Some more interesting bits I found while searching for CPU-specific code:

From a VRML player (lol): a sort of hybrid table lookup/Newton-Raphson inverse square root algorithm that claims to be faster than the CPU intrinsics (as of 1997)
NT\multimedia\danim\src\daxctl\inc\recsqrt.h
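
For anyone unfamiliar with the general idea, here is a rough C sketch of a table-seeded Newton-Raphson reciprocal square root. This is NOT the code from recsqrt.h; the table size, the seed values, the function name and the two refinement steps are all assumptions chosen to keep the example short. It only handles positive, normal IEEE 754 single-precision floats.

```c
#include <stdint.h>
#include <string.h>

/* Seeds: roughly 1/sqrt(midpoint) for six 0.5-wide buckets covering [1, 4). */
static const float seed_table[6] = {
    0.8944f, 0.7559f, 0.6667f, 0.6030f, 0.5547f, 0.5164f
};

/* Hypothetical helper: approximate 1/sqrt(x) for positive, normal floats. */
float approx_rsqrt(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    int exp = (int)((bits >> 23) & 0xFF) - 127; /* unbiased exponent       */
    int odd = exp & 1;                          /* 1 if the exponent is odd */

    /* Rebuild the mantissa as m in [1, 4) so that x = m * 2^(exp - odd)
       with an even power of two left over. */
    uint32_t mbits = (bits & 0x007FFFFFu) | ((uint32_t)(127 + odd) << 23);
    float m;
    memcpy(&m, &mbits, sizeof m);

    /* Table lookup gives a coarse first guess for 1/sqrt(m). */
    float y = seed_table[(int)((m - 1.0f) * 2.0f)];

    /* Two Newton-Raphson steps: y <- y * (1.5 - 0.5 * m * y * y). */
    y = y * (1.5f - 0.5f * m * y * y);
    y = y * (1.5f - 0.5f * m * y * y);

    /* Undo the range reduction: multiply by 2^(-(exp - odd) / 2). */
    uint32_t sbits = (uint32_t)(127 - (exp - odd) / 2) << 23;
    float scale;
    memcpy(&scale, &sbits, sizeof scale);
    return y * scale;
}
```

The trade-off is the usual one: a bigger table gives a better first guess and needs fewer refinement steps, while a smaller table leans on the iteration. With this tiny table, two steps land within roughly a tenth of a percent of the true value.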

A blast from the past: the pentnt utility to check for the Pentium divide bug
NT\base\fs\utils\pentbug\pentnt.c
And an FDIV workaround in assembly:
NT\base\crts\fpw32\tran\i386\adj_fdiv.asm

It seems that Microsoft didn't have any advance notice of the bug: the timestamps for this stuff are all December 1994 or later, well after the issue was made public.

Cyrix support in the kernel:
NT\base\ntos\ke\i386\cyrix.c

The newest processor explicitly coded for in here seems to be the Pentium 4.
 
"there is absolutely no way on earth this was written from a clean sheet only from the available public documentation"

It wasn't. It was written from a clean sheet from documentation of the leaked code. That's legal. The code itself is protected by copyright, but what it does isn't protected by copyright.
That's how the world got BSD after all.
 