Getting out of scope domains output #18
Comments
Not sure if this is related or should be its own bug, but I did a run with a target file containing 3 domains. I passed that file to both -list and -match, hoping that would restrict the output to those domains, but I still got some external domains popping up.

Edit: this second case, using a match file, is likely working as expected. I had stripped off the paths so I could check the domains, and I missed the fact that the paths contained my target domains. For example, if blogspot.com were my target, this would come back on a matcher: https://allo.google.com/url?q=http://rivexapa.blogspot.com/

So the matcher seems to do what it should; I just did not think about this case. The original issue, where results come back with a domain that was not in my target list in the first place, is still an issue.
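To illustrate the case described above, here is a minimal Python sketch (not urlfinder's actual matcher) contrasting a plain substring match against a check on the parsed hostname. The function names `substring_match` and `host_in_scope` are hypothetical, and the suffix rule used for subdomains is an assumption about what "in scope" should mean.

```python
from urllib.parse import urlsplit


def substring_match(url: str, target: str) -> bool:
    """Matches if the target appears anywhere in the URL, including path or query."""
    return target in url


def host_in_scope(url: str, target: str) -> bool:
    """Matches only if the URL's hostname is the target or one of its subdomains.

    Assumption: scope means "target domain plus subdomains".
    """
    host = (urlsplit(url).hostname or "").lower()
    target = target.lower()
    return host == target or host.endswith("." + target)


# URL taken from the comment above: the target appears only in the query string.
url = "https://allo.google.com/url?q=http://rivexapa.blogspot.com/"

print(substring_match(url, "blogspot.com"))  # True: found inside the query
print(host_in_scope(url, "blogspot.com"))    # False: the host is allo.google.com
print(host_in_scope(url, "google.com"))      # True: allo.google.com is a subdomain
```

A substring-style match explains why the URL above is returned for a blogspot.com target even though its host belongs to google.com.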
This seems to be caused by the regex extractor used here and here. The extractor tries to extract every URL found in the results (that are URLs themselves), so if the target is
Describe the bug
I am still digging to see how consistent this is, but I am running basic urlfinder scans with JSONL output. When I extract the url field and look at the unique values, I get domains that were not in my input.
urlfinder version
Include the version of urlfinder you are using:
urlfinder -version
Complete command you used to reproduce this
urlfinder -list target.txt -output output.txt
jq -r '.url' output.txt | unfurl -u domain
target.txt contains just google.com
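As a possible workaround while the issue is open, the check done with jq and unfurl above can be sketched in Python so that it lists only the hosts falling outside the target list. This is a rough sketch, not part of urlfinder: it assumes each output line is a JSON object with a `url` field (as used in the jq command), the helper names `in_scope` and `out_of_scope_hosts` are hypothetical, and the sample lines are made up for illustration.

```python
import json
from urllib.parse import urlsplit


def in_scope(url, targets):
    """True if the URL's hostname equals a target or is a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == t or host.endswith("." + t) for t in targets)


def out_of_scope_hosts(jsonl_lines, targets):
    """Return the sorted unique hostnames that do not belong to any target."""
    targets = {t.strip().lower() for t in targets if t.strip()}
    hosts = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        url = json.loads(line).get("url", "")
        if url and not in_scope(url, targets):
            hosts.add((urlsplit(url).hostname or "").lower())
    return sorted(hosts)


# Made-up sample lines standing in for the JSONL output file.
sample = [
    '{"url": "https://www.google.com/maps"}',
    '{"url": "https://allo.google.com/url?q=http://rivexapa.blogspot.com/"}',
    '{"url": "http://rivexapa.blogspot.com/"}',
]

print(out_of_scope_hosts(sample, ["google.com"]))
```

With real data, the lines of the output file and of target.txt would be passed in place of `sample` and the hard-coded target list.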