Domain cleanup methodology #6
Thanks. Here is our first methodology for this problem: once a domain is discovered, we keep it in our list for 365 days and then drop it, until it is seen again, at which point it stays in the list for another 365 days counted from firstseen. The firstseen property is already part of our JSON format. We considered checking MX and A records and removing domains that fail, but based on our (admittedly short) experience, these providers are mostly small and may lose their hosting or their DNS provider, or may even remove domains from DNS intentionally, so this is a very ambiguous way of 'detection'. We want a 'reliable' approach: every domain stays in the list for 365 days once seen, then drops. Imagine you own a domain x.com and add it to a disposable service: the chance that you still own it is very high. On the other hand, if you drop the domain and someone else takes it, we would have a false positive in the list, but the chance that the new owner hosts a mail server with hundreds of email addresses is tiny, and this is a risk we can take. Comments and ideas about our methodology are welcome.
Hey @7c, Update of the unable domains:
Some of them have become premium domains, such as:
There must be some way to use registrars to validate that the domain still exists, like:
$ whois google.com -h whois.iana.org | grep whois:
I suppose the initial TLD data can be cached ...
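For anyone who wants to script the lookup above, here is a minimal Python sketch of the same idea: ask whois.iana.org which WHOIS server is responsible for a TLD, and cache the answer per TLD. The function names (`parse_whois_server`, `query_whois`, `whois_server_for_tld`) are made up for illustration, and the parsing assumes the reply contains a `whois:` line like the one the `grep` above matches.

```python
import socket
from functools import lru_cache

IANA_WHOIS = "whois.iana.org"  # same host as in the command above


def parse_whois_server(response):
    """Return the value of the first 'whois:' line, or None if absent."""
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "whois":
            return value.strip() or None
    return None


def query_whois(server, query, timeout=10.0):
    """Send one WHOIS query over TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


@lru_cache(maxsize=None)
def whois_server_for_tld(tld):
    """Look up the WHOIS server for a TLD once, then serve it from cache."""
    return parse_whois_server(query_whois(IANA_WHOIS, tld))
```

Calling `whois_server_for_tld("com")` hits the network only the first time; later calls for the same TLD come from the in-process cache. Whether a registry's WHOIS answer is a reliable signal that a domain "still exists" is a separate question.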
We still believe there is no need to drop them before 1 year. We know the domain business and are capable of running whois checks to verify, but as I posted: "once a domain was discovered, we want to keep this domain in our list 365 days and drop out of the list until it has been seen again and would be in the list for another 365 days since firstseen. The firstseen property is already part of our json format..." We do not need to worry about FREE domains: they are free and therefore invalid, so it does not matter if they are detected as FAKE, which is kind of TRUE. If a domain has changed ownership, which in 99.9% of cases does not happen within 365 days, we will drop it anyway. If a domain is still owned by that provider but its NS, MX, owner, etc. have changed, the provider may be using this to obfuscate or hide domains from us. If we keep a domain for 365 days regardless of any external data and then drop it automatically, I do not see any harm. Quite the opposite: we will have solid detection and anti-obfuscation. One scenario I can think of is a fake-email provider adding domains it does not own. This would be a false positive, but we filter the top 100k most visited website domains from being added to mitigate this as much as possible.
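The top-100k safeguard mentioned above could look roughly like the sketch below. This is only an illustration: the thread does not say where the popularity list comes from or how domains are normalized, so the function name `filter_candidates` and the lowercase/trailing-dot normalization are assumptions.

```python
def filter_candidates(candidates, popular):
    """Drop candidate domains that appear in a list of popular domains.

    Both inputs are iterables of domain strings. Normalization (strip,
    lowercase, remove a trailing dot) is an assumption of this sketch.
    """
    popular_set = {d.strip().lower().rstrip(".") for d in popular}
    kept = []
    for domain in candidates:
        normalized = domain.strip().lower().rstrip(".")
        if normalized and normalized not in popular_set:
            kept.append(normalized)
    return kept
```

With this shape, a provider that lists a popular domain it does not own never gets that domain into the list, which is the false-positive case described above.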
We have implemented the expiration code. Starting today, we remove all domains that our crawlers have not seen in the last 365 days. They will be removed from the API/JSON/JSON_V2/MARKDOWN files and will only be visible in the EXPIRED DOMAINS SECTION.
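As a rough illustration of the expiration rule (not the project's actual code), the sketch below splits records into active and expired by comparing a last-seen timestamp against a 365-day window. The `lastseen` field name, the ISO 8601 timestamp format, and the function name `split_expired` are assumptions; the thread only confirms that a firstseen property exists in the JSON format.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # retention window described above


def split_expired(records, now):
    """Split records into (active, expired) by their last-seen timestamp.

    Each record is a dict with a hypothetical 'lastseen' key holding a
    timezone-aware ISO 8601 string; 'now' is a timezone-aware datetime.
    """
    cutoff = now - RETENTION
    active, expired = [], []
    for rec in records:
        seen = datetime.fromisoformat(rec["lastseen"])
        if seen >= cutoff:
            active.append(rec)
        else:
            expired.append(rec)
    return active, expired
```

The expired half would then go to the expired section, and the active half to the published files.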
Here is the list of dead emails that need to be removed:
usbvap.com
hacktoy.com
outlook.sbs
yandex.cfd