Resources for ICWE2019
Introduction
The following links are provided for anyone who wants to follow the ideas in our paper "The “Game Hack” Scam". All the files are in JSON format, except the files in the DOM directories, which are *.html files. In the paper, every URL mentioned in the "Experiment" section is the final URL. The prefix of each file name in the DOM folder is the MD5 hash generated from the first URL of the page. To make the data easier to work with, every link on this page other than those in the "URLs" section points to the list of files in the corresponding DOM folder.
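The file naming rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the page does not say how the URL is encoded before hashing or how the digest is written, so the sketch assumes UTF-8 bytes and a lowercase hexadecimal digest. The helper names `dom_file_prefix` and `matches_dom_file`, and the example URL, are hypothetical.

```python
import hashlib


def dom_file_prefix(first_url: str) -> str:
    """Return the MD5 digest of a page's first URL as lowercase hex.

    Assumption: the URL is hashed as UTF-8 bytes. The dataset may use
    a different encoding or digest representation.
    """
    return hashlib.md5(first_url.encode("utf-8")).hexdigest()


def matches_dom_file(file_name: str, first_url: str) -> bool:
    """Check whether a DOM file name starts with the MD5 of the first URL."""
    return file_name.startswith(dom_file_prefix(first_url))


# Hypothetical URL, used only to show the call.
prefix = dom_file_prefix("http://example.com/")
print(prefix)
```

If the computed prefixes do not match any file name in the DOM folder, the encoding assumption is the first thing to revisit.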
The "Game Hack Scam" (GHS) link in the "URLs" section points to a dictionary in the following format:
dict{ "URL md5": { "first": "the first URL", "land": "the final URL" }, ... }
The "Offers domains" link in the "URLs" section points to a dictionary in the following format:
dict{ "offer domain": { "count": "the number of content lockers leading to the domain", "links": [ list of the content lockers that led to the domain URL ] }, ... }
The "n-grams" link in the "URLs" section points to a dictionary in the following format:
dict{ n-gram value (1, 2, ...): { "count": "the number of terms this n-gram contains", "terms": [ list of the terms in this n-gram ] }, ... }
The "search queries" link in the "URLs" section points to a list.
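As a rough sketch of how these dictionaries might be read, the Python below parses two small in-memory samples that mirror the GHS and offers-domain formats described above. The sample hashes, URLs, and domains are made up for illustration, the helper names `final_urls` and `top_offer_domains` are hypothetical, and the page does not state whether "count" is stored as a number or a string, so the code converts it with `int()` as a precaution.

```python
import json

# Hypothetical sample in the GHS URL format: MD5 key -> first and final URL.
ghs_json = """
{
  "0123456789abcdef0123456789abcdef": {
    "first": "http://example.com/start",
    "land": "http://example.org/final"
  }
}
"""

# Hypothetical sample in the offers-domain format.
offers_json = """
{
  "offers.example.net": {
    "count": 2,
    "links": ["http://locker.example/a", "http://locker.example/b"]
  }
}
"""


def final_urls(ghs: dict) -> list:
    """Collect the final ("land") URL of every GHS entry."""
    return [entry["land"] for entry in ghs.values()]


def top_offer_domains(offers: dict, n: int = 10) -> list:
    """Return up to n (domain, count) pairs, largest count first."""
    ranked = sorted(
        offers.items(),
        key=lambda item: int(item[1]["count"]),  # count may be a string
        reverse=True,
    )
    return [(domain, int(info["count"])) for domain, info in ranked[:n]]


ghs = json.loads(ghs_json)
offers = json.loads(offers_json)
print(final_urls(ghs))
print(top_offer_domains(offers))
```

With the real files, the sample strings would be replaced by the contents of the downloaded JSON files, for example through `json.load` on an open file handle.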
DOMs
The DOM folder is too large to share as a single file (around 40GB for the crawled search URLs alone), so we have divided it into several files. The maximum size of each file ranges from 5GB to 10GB.
Legitimate DOM of Search URLs Crawled from May 1st to September 30th
Link1
Link2
Link3
Link4
Link5
Game Hack Scam DOM Crawled from May 1st to September 30th
Link1
Cite as
Badawi E., Jourdan GV., Bochmann G., Onut IV., Flood J. (2019) The “Game Hack” Scam. In: Bakaev M., Frasincar F., Ko IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science, vol 11496. Springer, Cham
URLs
GHSi's URL
Offers URL
n-grams URL
Search Queries URL
Paper Presentation