Domain cleanup methodology #6

Open
Daniel3356 opened this issue Apr 10, 2022 · 6 comments


@Daniel3356

Daniel3356 commented Apr 10, 2022

Here is the list of dead email domains that need to be removed:
usbvap.com, hacktoy.com, outlook.sbs, yandex.cfd

@7c
Owner

7c commented Apr 10, 2022

Thanks. Here is our first methodology for this problem:

Once a domain is discovered, we want to keep it in our list for 365 days from `firstseen` and then drop it. If it is seen again later, it goes back into the list for another 365 days. The `firstseen` property is already part of our JSON format.

We have considered checking MX and A records and removing domains that fail the check, but based on our (albeit short) experience, these providers are usually small: they may lose their hosting or their DNS provider, or even intentionally remove domains from DNS. This makes record checks a very ambiguous way of 'detection'.

We want to aim for a 'reliable' approach: every domain, once seen, stays in the list for 365 days and is then dropped. Imagine you own a domain x.com and add it to a disposable-email service: the chance that you still own it a year later is very high. On the other hand, if you drop the domain and someone else registers it, we would have a false positive in the list, but the chance that the new owner hosts a mail server with hundreds of mailboxes is very small, and this is a risk we can take. Comments and ideas about our methodology are welcome.
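A minimal Python sketch of the 365-day retention rule, assuming `firstseen` is stored as a timezone-aware timestamp. The helper names `is_listed` and `record_sighting` are hypothetical, and the choice that a re-sighting restarts the window only after the domain has already dropped out is an assumption; the rule could also be read as restarting on every sighting.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window stated in the thread: 365 days.
RETENTION = timedelta(days=365)


def is_listed(firstseen: datetime, now: datetime) -> bool:
    """True while fewer than 365 days have passed since firstseen."""
    return now - firstseen < RETENTION


def record_sighting(firstseen: Optional[datetime], seen_at: datetime) -> datetime:
    """Return the firstseen value to store after a crawler sees the domain.

    Assumption (hypothetical policy): a new domain, or one that has already
    dropped out of the list, starts a fresh 365-day window; a sighting while
    the domain is still listed keeps the original firstseen.
    """
    if firstseen is None or not is_listed(firstseen, seen_at):
        return seen_at
    return firstseen


if __name__ == "__main__":
    t0 = datetime(2022, 4, 10, tzinfo=timezone.utc)
    print(is_listed(t0, t0 + timedelta(days=364)))  # True
    print(is_listed(t0, t0 + timedelta(days=365)))  # False
```

Because the decision depends only on stored timestamps, no external data (DNS, whois) is needed to decide whether a domain stays listed.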

7c changed the title from "Lean listing from dead emails" to "Domain cleanup methodology" Apr 10, 2022
7c pinned this issue Apr 28, 2022
7c reopened this Apr 28, 2022
@Daniel3356
Author

Hey @7c,

Here is an update on the unavailable domains:

mantutimaison.com
shhongshuhan.com
azwee.site
drhoangsita.com
snasu.info
yongshuhan.com
funplus.site
toanciamobile.com
zipzx.site
mamonsuka.com
mobitivaisao.com
usbvap.com
hansgu.com
filezw.site
mantutivi.com
ngocsita.com
hacktoy.com
phonestlebuka.com
zipea.site
bookel.site
hungtaoteile.com
omilk.site
devoi.site
prcea.site

Some of them have become premium domains, for example:
usbvap.com sold for CAD 114,986.58
hacktoy.com sold for CAD 63,881.44
typery.com sold for CAD 191,644.31
1ki.co sold for CAD 879.01
oanhxintv.com sold for CAD 3,689.79
king.buzz sold for CAD 58,627.83

@d3xt3r01

There must be some way to use the registrars to validate that a domain still exists, like whois does?

@d3xt3r01

$ whois google.com -h whois.iana.org | grep whois:
$ whois google.com -h whois.verisign-grs.com

I suppose the initial TLD data (which whois server to ask for each TLD) can be cached...
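A rough Python sketch of the two-step lookup above with the per-TLD server cached: ask whois.iana.org once per TLD for its `whois:` server, then query that server for the domain. It speaks the plain WHOIS protocol (RFC 3912, TCP port 43) directly. The names `whois_query`, `parse_whois_server` and `TldWhoisCache` are hypothetical, and querying IANA with the bare TLD instead of the full domain is an assumption.

```python
import socket
from typing import Callable, Dict, Optional


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query over TCP port 43 (RFC 3912) and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois_server(iana_reply: str) -> Optional[str]:
    """Return the value of the 'whois:' line, like `grep whois:` in the commands above."""
    for line in iana_reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "whois":
            return value.strip() or None
    return None


class TldWhoisCache:
    """Caches TLD -> WHOIS server so IANA is asked only once per TLD (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str, str], str] = whois_query):
        self._fetch = fetch
        self._servers: Dict[str, Optional[str]] = {}

    def server_for(self, domain: str) -> Optional[str]:
        tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
        if tld not in self._servers:
            self._servers[tld] = parse_whois_server(self._fetch("whois.iana.org", tld))
        return self._servers[tld]

    def lookup(self, domain: str) -> Optional[str]:
        """Raw WHOIS reply for the domain, or None if the TLD has no known server."""
        server = self.server_for(domain)
        return None if server is None else self._fetch(server, domain)
```

The network call is injected as `fetch`, so parsing and caching can be exercised without network access, and a persistent cache could replace the in-memory dict.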

@7c
Owner

7c commented Jun 29, 2022

We still believe there is no need to drop them before one year. We know the domain business and are also capable of doing whois lookups to verify, but as I posted: "once a domain was discovered, we want to keep this domain in our list 365 days and drop out of the list until it has been seen again and would be in the list for another 365 days since firstseen. The firstseen property is already part of our json format..."

We do not need to worry about FREE domains: they are free, so invalid, and it does not matter if they are detected as FAKE, which is kind of true. If a domain has changed ownership, which in 99.9% of cases does not happen within 365 days, we will drop it anyway. If a domain is still owned by that provider but its NS, MX, owner, etc. have changed, the provider may be using this to obfuscate or hide domains from us.

If we keep a domain for 365 days regardless of any external data and then drop it automatically, I do not see any harm. Quite the opposite: we will have solid detection and resistance to obfuscation.

One scenario I can think of is a fake-email provider adding domains it does not own. This would produce a false positive, but to mitigate this as much as possible we prevent the domains of the top 100k most visited websites from being added.
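A minimal sketch of the top-100k filter, assuming the ranking is available as lines of `rank,domain` or bare `domain` (the thread does not say which list or format is used). Candidates that equal a top domain, or are subdomains of one, are rejected. The helper names `load_top_domains` and `is_allowed` are hypothetical.

```python
from typing import Iterable, Set


def load_top_domains(lines: Iterable[str]) -> Set[str]:
    """Build a lookup set from ranking lines ('rank,domain' or 'domain'; assumed format)."""
    top: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        domain = line.split(",")[-1]  # last column holds the domain
        top.add(domain.strip().lower().rstrip("."))
    return top


def is_allowed(candidate: str, top_domains: Set[str]) -> bool:
    """False if the candidate or any parent domain (excluding the bare TLD) is a top site."""
    labels = candidate.strip().lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in top_domains:
            return False
    return True
```

Checking parent domains as well catches a provider that lists a subdomain such as `mail.example.com` of a popular site.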

@7c
Owner

7c commented Nov 30, 2023

We have implemented the expiration code. Starting today, we remove all domains that our crawlers have not seen in the last 365 days. They will be removed from the API/JSON/JSON_V2/MARKDOWN files and will only be visible in the EXPIRED DOMAINS section.
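A minimal sketch of the expiration split, assuming each record carries a hypothetical `lastseen` ISO-8601 timestamp with a UTC offset (the thread only confirms that `firstseen` is part of the JSON format). Records seen within the last 365 days stay active; the rest go to the expired section. `split_expired` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# Expiry window from the announcement: not seen in the last 365 days.
EXPIRY = timedelta(days=365)


def split_expired(records: List[Dict], now: datetime) -> Tuple[List[Dict], List[Dict]]:
    """Partition records into (active, expired) by their hypothetical 'lastseen' field."""
    active: List[Dict] = []
    expired: List[Dict] = []
    for rec in records:
        last_seen = datetime.fromisoformat(rec["lastseen"])
        if now - last_seen <= EXPIRY:
            active.append(rec)
        else:
            expired.append(rec)
    return active, expired


if __name__ == "__main__":
    now = datetime(2023, 11, 30, tzinfo=timezone.utc)
    records = [
        {"domain": "a.site", "lastseen": "2023-06-01T00:00:00+00:00"},
        {"domain": "b.site", "lastseen": "2022-04-10T00:00:00+00:00"},
    ]
    active, expired = split_expired(records, now)
    print([r["domain"] for r in active], [r["domain"] for r in expired])
```

The active list would feed the API/JSON/JSON_V2/MARKDOWN outputs and the expired list the EXPIRED DOMAINS section; treating exactly 365 days as still active is an arbitrary boundary choice.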

7c mentioned this issue Nov 30, 2023
3 participants