That's pretty interesting, which model and what prompt are you using?
platypus instruct v2. I tried many of the heavily advertised models that came after it, but couldn't shake the feeling that they were just finetuned to perform better on benchmarks and weren't actually any better.
It was quite a process to get there. I first built a context by consistently repeating its cutoff date, along with reminders to treat the cutoff date as the current date and to avoid specific phrases like "for that period" or "for the time". There's this saying about the "pink elephant": basically, don't tell the AI to avoid specific phrases directly, because that just makes it more likely to use them. I found this true for the GPTs, not so much for the llamas. If you tell a 70b not to talk about a "pink elephant", it will actually avoid talking about it.
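Roughly the shape of it, if that helps (a hypothetical sketch in Python; the wording, the date, and the turn format are placeholders, not what I actually used):

```python
# Hypothetical sketch of the repeated-reminder context (wording and date are
# placeholders). The idea is just to restate the cutoff date and the phrase
# reminders between turns so the instruction never drifts far back in the
# context window.
CUTOFF = "June 14, 1987"  # placeholder; the real date was picked at random too

REMINDER = (
    f"Today's date is {CUTOFF}. Treat this as the current date and assume no "
    "knowledge of anything after it. Avoid phrases like \"for that period\" "
    "or \"for the time\"."
)

def build_prompt(history, user_msg):
    """Interleave the reminder with prior (user, assistant) turns."""
    parts = [REMINDER]
    for user, assistant in history:
        parts += [f"USER: {user}", f"ASSISTANT: {assistant}", REMINDER]
    parts += [f"USER: {user_msg}", "ASSISTANT:"]
    return "\n".join(parts)
```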
It would give me appropriate replies but then add things like "However, in 2014..." or "Eventually, in the future..." etc. I rewrote such replies by hand to be more period-appropriate, and eventually, as the context filled with "correct" replies, it happened less and less. I then "reversed the roles" and set it up to ask itself random questions with the prebuilt context, and gathered the replies as a dataset. I regexed through that dataset and nuked every reply containing key phrases I found were likely "future contamination", if you will, like "in the future", "90s", future year numbers, etc. I also randomly went through and manually checked/edited some replies.
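The filtering pass was basically this (hypothetical sketch; the filename, the "reply" field, and the exact phrase list are stand-ins for whatever your dataset looks like):

```python
import json
import re

# Hypothetical cleanup pass over the self-chat dataset. Replies matching a
# "future contamination" pattern get dropped; the rest are written back out
# for manual spot-checking.
CONTAMINATION = re.compile(
    r"\bin the future\b|\b199\d\b|\b20\d\d\b|90s",
    re.IGNORECASE,
)

kept, dropped = [], []
with open("selfchat_dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        (dropped if CONTAMINATION.search(row["reply"]) else kept).append(row)

with open("selfchat_dataset.clean.jsonl", "w", encoding="utf-8") as f:
    for row in kept:
        f.write(json.dumps(row) + "\n")

print(f"kept {len(kept)}, nuked {len(dropped)}")
```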
I then manually built a context of refusals (What is Nvidia? What is OpenAI? Who is Mark Zuckerberg? - things not known in the 80s) and it surprisingly picked up on it relatively quickly. I put some of that into the dataset too.
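The refusal pairs in the dataset looked roughly like this (phrasing is illustrative, not verbatim from my data):

```python
# Illustrative refusal pairs (not the actual dataset entries).
refusal_examples = [
    {"prompt": "What is Nvidia?",
     "reply": "I'm not familiar with anything called Nvidia, sorry."},
    {"prompt": "What is OpenAI?",
     "reply": "I haven't heard of OpenAI. Is that a company?"},
    {"prompt": "Who is Mark Zuckerberg?",
     "reply": "That name doesn't ring a bell, I'm afraid."},
]
```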
I then trained a qlora with that (tiny, but I'd say good quality) dataset, and it got pretty solid as long as you mention the cutoff date in the context. You can still "jailbreak" it relatively easily into forgetting about the 80s, but by default it will reliably stay there. I overcooked it a bit on the refusals, as it will sometimes pretend not to know people who were already somewhat in the public eye back then, just not as known as now; e.g. if you roll often enough, it'll say it doesn't know who Donald Trump is. This is rare though. It's also a bit hyperfocused on mentioning the date (which has no special meaning, I picked it randomly), which I assume has to do with the wording in the dataset.
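For reference, the qlora setup was along these lines (a minimal peft/bitsandbytes sketch; the model id and every hyperparameter here are placeholders, not the values I actually used):

```python
# Minimal QLoRA setup sketch (peft + bitsandbytes). All values here are
# placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "garage-bAInd/Platypus2-70B-instruct"  # assumed base model id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # the "q" in qlora: 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                  # keep the adapter small ("light touch")
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],   # attention projections only
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then train briefly at a low learning rate on the curated dataset,
# e.g. with transformers.Trainer or TRL's SFTTrainer.
```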
It was a proof of concept and practice for me. I rented GPU time for the training. Loras and qloras have a bit of a bad name in the community for being kinda useless, but I feel what people want from them is just too vague. "Tell good stories" or "be better at roleplay" are not good goals for a lora, and I feel that when you make the datasets for them too diverse, all you get is noise. Then, when people notice their work didn't have much of a noticeable impact on the model, they completely fry and overfit the lora to the point that it replies with things out of the dataset verbatim and gets brain damaged. I learned that from my time playing around with loras and SD. People just don't know what a "light touch" is.