Appears to have leaked from a cloud thanks to sloppy coding
Laura Dobberstein Tue 5 Jul 2022 // 06:04 UTC
A threat actor has taken to a forum for news and discussion of data breaches with an offer to sell what they assert is a database containing records of over a billion Chinese civilians – allegedly stolen from the Shanghai Police.
Over the weekend, reports started to surface of a post to a forum at Breached.to. The post makes the following claim:
In 2022, the Shanghai National Police (SHGA) database was leaked. This database contains many TB of data and information on Billions of Chinese citizens.
HackerDan offered to sell the lot for 10 Bitcoin – about $200,000. We've saved HackerDan's post as a PDF in case it vanishes.
HackerDan released sample datasets: one containing delivery addresses and often instructions for drivers; another with police records; and the last with personal identification information like name, national ID number, address, height, and gender.
China has a national police force, and that presumably has a Shanghai office. But an entity called the "Shanghai National Police" is hard to find.
Media outlets were nonetheless able to verify that the contents of the sample - whatever the source - describe actual people.

Tweet (Archive)
ChinaDan

June 30, 2022, 08:55 AM
In 2022, the Shanghai National Police (SHGA) database was leaked. This database contains many TB of data and information on Billions of Chinese citizens.
Sell: Shanghai GOV (SHGA.gov.cn) National Police Database
Host: http://oss-cn-shanghai-shga-d01-a.ops.ga.sh/
Data leaked from these tables:
Code:
----TABLES----
person_address_label_info_slave QFpD25bKTJ2eQBxcbe2Aaw 90 0 546148916 0 172.2gb 172.2gb
nb_theme_address_merge_tracks_slave -bUMVB1uRRusUbbqZepEpA 300 0 37483779369 4 22.4tb 22.4tb
nb_theme_address_case_dwd_test 7COIWTt7QU-YPwWub8z_SQ 150 0 22375506 1749307 25.2gb 25.2gb
nb_theme_address_company_dwd-total fpnmEYB9SI6WevHnZIEwIA 150 0 1842856 0 2.8gb 2.8gb
nb_theme_address_case_dwd-total 7X8oNqULQnWFLpzHDaUTbg 150 0 1214119253 0 1tb 1tb
nb_theme_address_company_dwd_test g5f6l4LGQcGL3oQ6ON2Bbw 150 0 2017931 0 4.3gb 4.3gb
person_address_label_info_master t64pp9WnS3maY9jBjzTtiw 90 0 969830088 0 282.8gb 282.8gb
Data Details:
Databases contain information on 1 Billion Chinese national residents and several billion case records, including:
- Name
- Address
- Birthplace
- National ID Number
- Mobile number
- All Crime / Case details
UPDATE: Per request, sample size increased to 750k (250k for each of the 3 main index): https://gofile.io/d/sCggGC
Staff update: Due to the chances of the file being reported to gofile, uploaded the sample to our own servers: https://cdn.breached.to/shga_sample_750k.tar.gz
PRICE: I am selling all of this data for 10BTC ($200k USD)
Contact for XMPP: dataman@rows.im
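The index listing in the post can be checked against its headline figures with some simple arithmetic. The sketch below is only illustrative: the post does not label its columns, so the field names are an assumption based on the listing's resemblance to Elasticsearch `_cat/indices` output (index, uuid, primary shards, replicas, document count, deleted documents, store size, primary store size), and the helper names are hypothetical.

```python
# Rows copied verbatim from the "TABLES" listing in the forum post.
LISTING = """\
person_address_label_info_slave QFpD25bKTJ2eQBxcbe2Aaw 90 0 546148916 0 172.2gb 172.2gb
nb_theme_address_merge_tracks_slave -bUMVB1uRRusUbbqZepEpA 300 0 37483779369 4 22.4tb 22.4tb
nb_theme_address_case_dwd_test 7COIWTt7QU-YPwWub8z_SQ 150 0 22375506 1749307 25.2gb 25.2gb
nb_theme_address_company_dwd-total fpnmEYB9SI6WevHnZIEwIA 150 0 1842856 0 2.8gb 2.8gb
nb_theme_address_case_dwd-total 7X8oNqULQnWFLpzHDaUTbg 150 0 1214119253 0 1tb 1tb
nb_theme_address_company_dwd_test g5f6l4LGQcGL3oQ6ON2Bbw 150 0 2017931 0 4.3gb 4.3gb
person_address_label_info_master t64pp9WnS3maY9jBjzTtiw 90 0 969830088 0 282.8gb 282.8gb
"""

# Assumed column meanings; the post itself does not name them.
COLUMNS = ("name", "uuid", "primaries", "replicas",
           "docs", "deleted", "size", "primary_size")

# Binary units are assumed here (1 tb = 1024 gb).
UNIT_TO_GB = {"gb": 1.0, "tb": 1024.0}


def size_in_gb(text):
    """Convert a size such as '22.4tb' or '172.2gb' to gigabytes."""
    number, unit = text[:-2], text[-2:].lower()
    return float(number) * UNIT_TO_GB[unit]


def parse_listing(listing):
    """Turn each whitespace-separated row into a dict keyed by COLUMNS."""
    rows = []
    for line in listing.strip().splitlines():
        row = dict(zip(COLUMNS, line.split()))
        row["docs"] = int(row["docs"])
        row["size_gb"] = size_in_gb(row["size"])
        rows.append(row)
    return rows


rows = parse_listing(LISTING)
total_docs = sum(row["docs"] for row in rows)
total_gb = sum(row["size_gb"] for row in rows)
largest = max(rows, key=lambda row: row["docs"])

print(f"entries:   {len(rows)}")
print(f"documents: {total_docs:,}")
print(f"size:      {total_gb / 1024:.1f} TB")
print(f"largest:   {largest['name']}")
```

Under those assumptions the seven entries add up to roughly 40 billion documents and just under 24 TB, nearly all of it in nb_theme_address_merge_tracks_slave. These are counts of stored documents, not of distinct people, so they neither confirm nor refute the "1 Billion" figure.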
Archive of sample
It's real: the author posted some of the samples and the index listing in the original release thread.
I downloaded and unpacked the data, imported it into Excel, and found three sets of 250,000 records each, containing mobile numbers, names, addresses, and ID numbers: a little over 746,800 rows of valid data in total. The oldest records belong to people born in the 1930s, and some records note that they came from a particular population office, so they may be traces left by audits or censuses. I then picked 15 rows at random and checked the mobile numbers with the name-verification feature of Alipay's transfer function. Every registered real-name Alipay account turned out to exist, and every person could be verified (that is, the sample data is real). The data is very mixed and spread across the whole country, and I can't tell whether the addresses are delivery addresses or registered home addresses. I can't see much practical use for it at the moment, except for things like registering fake accounts, and at 10 BTC it is clearly expensive: real fraudsters can't afford data this costly that hasn't even been processed.
In the second package I found phrases such as "try to contact the vehicle owner to move the car" and "reported to police, no case filed". My initial judgment is that this is database data from the 110 police dispatch desk, the 12345 hotline, or other emergency-service call dispatch, with some of it coming from Public Security Bureau police stations. The records consist of "reason for the report" and "outcome of the police response", and there are especially many records of vehicle theft and civil disputes.
In the third package, apart from the ID card information, every row points to a database called "oss-cn-xx/xxx/xxxx/xxxxx". It appears for every region, so it is probably a shared store that can be accessed in common. Judging from the file names, it holds the following data for everyone: 1. unexplained photos, including border-exit photos, ID photos, work photos, and photos of fugitives; 2. religious belief and ethnicity; 3. facial-recognition registration records from hotel check-ins; 4. death certificates; 5. photos of minors; 6. driver's licenses and professional licenses; 7. unexplained photos whose purpose is unknown; 8. residence permits and ID card photos.
What is especially interesting is that these samples all contain some odd fields, for example:
1. "PROF": "grain farmer", "PROF": "retired worker", "PROF": "machine operator", "PROF": "deputy researcher, Retired Cadres Work Office, Public Security Department"
My initial judgment is that these are occupations as defined in some survey or investigation, or taken from registration forms that people filled out themselves. There are a great many kinds and they are not standardized.
2. "QUERY_STRING": "Traffic violation Actual", "LABELNAMES": "Traffic violation Social assistance recipient Permanent resident Actual population", "LABELNAMES": "Person of concern_Drug-related person of concern", "LABELNAMES": "Traffic violation Inland-support personnel", "ESCU": "Has not served in the military", "HEIGHT": "164", "EDEGREE": "Preschool child", "ESCU": "Has not served in the military", "MARR": "Widowed".
These fall under education level + violation records + special notes + demographic attributes and other private personal information.
3. This one I can't make sense of: some people have a special "number", "LABS": "AB00xxxx", where the last four characters vary. Some people each have a different number, some share the same one, some have none, and some individuals have four or five. I went through it many times, and it has nothing to do with region, age, sex, criminal record, adult or minor status, occupation, photos, or any demographic attribute; there is no pattern at all. I don't know what the number stands for. The only thing I'm sure of is that the LABS value is some kind of label. I don't know what LABS stands for, but it is definitely a label, because the English word "laboratory" means label; other words close to the abbreviation LAB include "labour", and "lab" is also short for research.
Finally, what I want to say is:
1. If this data really covers 1 billion people, don't count on being lucky. If you have ever filed a police report or registered for an identity document, your name is certainly in this database. The remaining 300 million may simply be minors who haven't been entered yet, or who are kept on a separate list.
2. Given today's AI-driven automated data processing, this data has certainly been processed by provincial and municipal data centers. Put simply, each of 1 billion Chinese people has a folder. At the data level alone, the state already holds user profiles thousands of times more detailed than those of internet companies; everything from birth, aging, illness and death to food, clothing, housing and travel is in that folder.
3. Combined with today's digital infrastructure, this is frightening: someone you have never met may be able to learn everything about you instantly through a single camera. "The gods watch from three feet above your head" and "beware the day the list is drawn up" are not just sayings.
4. Never think "it doesn't matter if it leaks". Remember that low-value data, in sufficient quantity, inevitably becomes high-value. Take these 750,000 records with no duplicate names: load them into a database and run algorithms over them, and population structure, regional distribution, sex ratio, education levels, proportion of children, crime rate, housing rate, military-service rate, rankings of civil disputes, police response capability and so on can be worked out in minutes. And that is only what 750,000 records can show. If there really are a billion, could they include property information? Health information? Public records? Judicial information? Other, even more sensitive private information? No one can say for sure. Sun Tzu's Art of War says "know yourself and know your enemy, and you will never be in peril in a hundred battles"; this leak amounts to the other side "knowing the enemy" completely. The six degrees of separation theory says you can reach anyone through six people; could the information obtainable through a billion people really be less than that? It sends a chill down the spine just thinking about it!
5. Finally, now that it has come to this and the government has proved incompetent, everyone should wake up to protecting their own privacy. My advice: protect your privacy, and do not leave more personal information than necessary on paper, on registration forms, or on the internet, including but not limited to phone numbers, home addresses, ID numbers, photos, and other private information.
Google Translate (Archive)
"Five people confirmed all of the data, including case details that would be difficult to obtain from any source other than the police. Four more people confirmed basic information such as their names before hanging up," reported the Wall Street Journal.
The WSJ's reporter Karen Hao described the experience on Twitter:

Tweet (Archive)
While the Shanghai government and police department have largely been silent over the leak, social media platforms Weibo and WeChat were not – at least until Sunday afternoon, when Weibo users began to find hashtags related to the data leak blocked.
On Monday, an unusual voice joined in the analysis of the event: Changpeng Zhao, the CEO of cryptocurrency exchange Binance.
"CZ" – as he's known – took to Twitter with the following:

Tweet (Archive)
CZ's post came four days after HackerDan's so, while some facts matched, it was unclear if the CEO was referring to a different event.
He later tweeted again, this time alleging "this exploit happened because the gov developer wrote a tech blog on CSDN and accidentally included the credentials."
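If credentials really were pasted into a blog post, as CZ alleges, a simple check before publishing could have flagged them. The sketch below is a minimal illustration, not a description of what happened in this case: the patterns are deliberately small, the "LTAI" prefix for Alibaba Cloud AccessKey IDs is an assumption, and the draft text and key values are invented placeholders.

```python
import re

# Illustrative patterns only; real secret scanners ship far larger rule sets.
PATTERNS = {
    # Assumption: Alibaba Cloud AccessKey IDs commonly start with "LTAI".
    "access key id": re.compile(r"\bLTAI[0-9A-Za-z]{12,20}\b"),
    # A quoted value of 8+ characters assigned to a suspicious-looking name.
    "hard-coded secret": re.compile(
        r"(?i)\b(?:access_?key|secret|password|passwd|token)\w*"
        r"\s*[=:]\s*[\"']([^\"']{8,})[\"']"
    ),
}


def find_secrets(text):
    """Return (line number, label) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Hypothetical blog draft; every value below is a made-up placeholder.
draft = '''\
Here is how I connect to the bucket:
endpoint = "https://example.invalid"
accessKeyId = "LTAIexampleexample00"
accessKeySecret = "not-a-real-secret-value"
'''

for lineno, label in find_secrets(draft):
    print(f"line {lineno}: possible {label}")
```

A check like this only catches the patterns it knows about, so it complements rather than replaces rotating any key that has ever appeared in published text.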
Whatever the source of the leak, it will mightily annoy China. The nation's government has recently prioritized personal data protection and critical infrastructure security. If the People's Police have mucked up on both counts, that will not go down well. ®
Source (Archive)
WSJ Article:
Vast Cache of Chinese Police Files Offered for Sale in Alleged Hack
Leak would be one of largest in history if confirmed, covering a billion people; some data is verified as real

The cache of hacked data allegedly includes billions of records stolen from police in Shanghai, according to a post advertising its availability. PHOTO: WANG ZHAO/AGENCE FRANCE-PRESSE/GETTY IMAGES
By Karen Hao in Hong Kong and Rachel Liang in Singapore
July 4, 2022 9:14 am ET
A vast trove of data on Chinese citizens allegedly siphoned from a police database, some of which checks out as legitimate, is being offered for sale by an anonymous hacker or hacking group. If confirmed, it would mark one of history’s largest leaks of personal data.
The cache allegedly includes billions of records stolen from police in Shanghai, containing data on one billion Chinese citizens, according to a post advertising its availability that was published on Thursday by the hacker on a popular online cybercrime forum. The post, which began circulating on social media over the weekend, put the price for the leak at 10 Bitcoin, or roughly $200,000.
Cybersecurity experts say the claimed hack is alarming not just for its alleged size—which would rank among the biggest ever recorded and the largest known for China—but also because of the sensitivity of the information contained in the government database.
A sample of the data posted by the hacker, who claimed it included 750,000 records, contained individuals’ personal names, national ID numbers, phone numbers, birthdays and birthplaces, as well as detailed summaries of crimes and incidents reported to the police. The cases ranged from incidents of petty theft and cyber fraud to reports of domestic violence, dating as far back as 1995 to as recently as 2019.
While the scope of the data leak remains unconfirmed, The Wall Street Journal verified several of the records in the leak by calling people whose numbers were listed. Five people confirmed all of the data, including case details that would be difficult to obtain from any source other than the police. Four more people confirmed basic information such as their names before hanging up.
One woman, alarmed at the accuracy of the leaked details, asked whether the information about her had come from the iPhone that she had reported stolen in her case file in 2016.

Public anger over data breaches has grown in China amid concern that they have reached unbearable levels. PHOTO: ALY SONG/REUTERS
Another man, surnamed Wei, who had reported being defrauded of 30,000 yuan after scammers persuaded him to join an investment scheme, according to the records released by the hacker, sighed after hearing that his data had been leaked. “We are all running naked,” he said, using popular Chinese slang for a lack of privacy.
Cybersecurity experts remain cautious, however, about believing all of the hackers’ claims.
Troy Hunt, a web-security consultant based in Australia, said the sheer size of the database—which would include the majority of China’s population of 1.4 billion people—drew some suspicion, as did the anonymity of the user who posted the information.
While most hackers are driven by financial motives, the solicitation for a large sum of money also raises the possibility that the claim has been exaggerated or falsified, Mr. Hunt said.
Several numbers that the Journal tried were invalid or no longer in service. It is not uncommon for mobile phone users in China to change their numbers every few years.
The Shanghai police, Shanghai propaganda office and Chinese internet regulator didn’t respond to requests for comment.
On the forum where the database has been posted for sale, the hacker or group has claimed that the breach targeted Aliyun, a cloud computing subsidiary of Alibaba Group, which they said hosts the Shanghai police database.
Alibaba said it was aware of the incident and was investigating it.
On Monday, Zhao Changpeng, CEO of cryptocurrency exchange Binance, tweeted that the company had detected the hack and had “stepped up verifications for users potentially affected.” Binance didn’t respond to a request for comment.
Data leaks have been rampant globally in recent years. In 2021, a total of 4,145 publicly disclosed breaches collectively exposed over 22 billion records, according to cybersecurity company Risk Based Security.
But such a massive leak would be particularly sensitive in China, where black-market data brokers once did a brisk business trafficking in personal data. Over the past few years, Beijing has ramped up its protection of personal information, most recently passing the Personal Information Protection Law in 2021, in part due to widespread public anger that data breaches had reached unbearable levels.
The actions have specifically targeted companies, however, leaving broad carve-outs for the government collection of information under national security considerations.
Cybersecurity experts say such a breach could have lasting and unpredictable consequences for the individuals affected.
“Trying to remove your information from the internet is like trying to remove pee from a pool,” said Mr. Hunt. “It just goes into a big melting pot of exposed data and you have no idea which bit has come from where.”
Mr. Hunt added that the leak highlighted how little China’s extensive system of internet filters, commonly referred to as the Great Firewall, can do to prevent its citizens’ data from being hacked and posted online for anyone to have access to.
“Despite China’s best efforts, the internet really doesn’t have borders,” he said.
Source (Archive)
Blogpost where credentials were supposedly leaked from (Archive) (Source (Archive))
Binance CEO Tweet saying same thing (Archive)
