Getting out of scope domains output #18

Closed
fail-open opened this issue Nov 14, 2024 · 2 comments · Fixed by #26

Describe the bug
I am still digging to see how consistent this is, but I am running basic urlfinder scans with JSONL output. When I extract the url field and look at the unique domains, I am getting domains that were not in my input.
urlfinder version
Include the version of urlfinder you are using, urlfinder -version

Complete command you used to reproduce this
urlfinder -list target.txt -output output.txt
jq -r '.url' output.txt | unfurl -u domain

Small sample of the output:
jokadola.blogspot.com
www.robloxfreerobux.biz
trchatsohbetsitesi.blogspot.com
redneckpassions.com
rogabubu.blogspot.com
rinukeyo.blogspot.com
johoteke.blogspot.com
nickelbackpassions.com
ricezeyi.blogspot.com
www.italia-risparmio.it
happyland.net.vn
https
rofoheqo.blogspot.com
jogeguta.blogspot.com
qm5t0kieiv.ga
hubcontrol.ga
eusouafrpromotora.blogspot.com
cilveli.net
market.android.com
bola81.online
tvpassions.com
allrevolution.ga
www.tusaybat.com
tcchatrandom.blogspot.com

target.txt contains only google.com.
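For readers without jq or unfurl, the pipeline above can be approximated in Python. This is a minimal sketch, assuming each line of output.txt is a JSON object with a `url` field (as the `jq -r '.url'` filter implies). `unique_domains` is a hypothetical helper, and its output only roughly mirrors `unfurl -u domain`.

```python
import json
from urllib.parse import urlparse


def unique_domains(jsonl_lines):
    """Return hostnames from each record's "url" field, deduplicated in first-seen order.

    Assumes one JSON object per line with a "url" key (hypothetical record shape).
    """
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        host = urlparse(record.get("url", "")).hostname
        if host:
            seen.setdefault(host, None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical sample records, not real urlfinder output.
    sample = [
        '{"url": "https://www.google.com/search"}',
        '{"url": "https://jokadola.blogspot.com/"}',
        '{"url": "https://www.google.com/maps"}',
    ]
    print(unique_domains(sample))  # ['www.google.com', 'jokadola.blogspot.com']
```

To run it on real results, pass an open file, e.g. `unique_domains(open("output.txt", encoding="utf-8"))`. With a target of google.com, any hostname that is neither google.com nor a subdomain of it is out of scope.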

fail-open commented Nov 14, 2024

Not sure whether this is related or should be its own bug, but I also ran a scan with a target file containing 3 domains. I passed that file to both -list and -match, hoping this would restrict the results to those domains, but some external domains still showed up.

Edit: the second case (using a match file) is likely working as expected. I had stripped off the paths so I could check the domains, and I missed that the paths contained my target domains. For example, if blogspot.com were my target, the matcher would return https://allo.google.com/url?q=http://rivexapa.blogspot.com/

So the matcher seems to do what it should; I just had not thought about this case. The original issue, where results include domains that were not in my target list in the first place, is still a problem.
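The matcher behaviour described in this comment can be illustrated with a small sketch. It assumes, based only on the example above, that the matcher tests the target string against the whole URL rather than just its hostname. `matches_anywhere` and `host_in_scope` are hypothetical helpers, not urlfinder functions.

```python
from urllib.parse import urlparse


def matches_anywhere(url: str, target: str) -> bool:
    """Hypothetical matcher: substring test against the whole URL string."""
    return target in url


def host_in_scope(url: str, target: str) -> bool:
    """Hypothetical scope check: hostname equals target or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    target = target.lower()
    return host == target or host.endswith("." + target)


if __name__ == "__main__":
    url = "https://allo.google.com/url?q=http://rivexapa.blogspot.com/"
    print(matches_anywhere(url, "blogspot.com"))  # True: target appears in the query string
    print(host_in_scope(url, "blogspot.com"))     # False: the host is allo.google.com
```

Checking `host.endswith("." + target)` instead of a bare `endswith(target)` avoids treating a domain such as notblogspot.com as a subdomain of blogspot.com.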

dogancanbakir self-assigned this on Nov 15, 2024

mhmdiaa commented Nov 15, 2024

This seems to be caused by the regex extractor used here and here.

The extractor tries to extract every URL found in the results, which are themselves URLs. So if the target is foo.com and the extractor encounters https://foo.com/redirect?url=https://bar.com, the regex also extracts https://bar.com and adds it to the results.
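A minimal Python sketch of this mechanism, plus one possible mitigation: filtering extracted URLs by hostname. The regex below is a hypothetical stand-in, not the pattern urlfinder actually uses, and the filter is not necessarily how the fix in #26 works.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern, NOT urlfinder's actual regex. It stops at "?", "&" and "=",
# so a URL nested inside a query parameter is matched as a separate result.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>?&=]+")


def extract_urls(text: str) -> list[str]:
    """Return every URL-looking substring in text, including nested ones."""
    return URL_PATTERN.findall(text)


def in_scope(url: str, target: str) -> bool:
    """True if the URL's hostname is target or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    target = target.lower()
    return host == target or host.endswith("." + target)


def extract_in_scope(text: str, target: str) -> list[str]:
    """Extract URLs, then drop any whose hostname falls outside the target domain."""
    return [u for u in extract_urls(text) if in_scope(u, target)]


if __name__ == "__main__":
    result = "https://foo.com/redirect?url=https://bar.com"
    print(extract_urls(result))                 # ['https://foo.com/redirect', 'https://bar.com']
    print(extract_in_scope(result, "foo.com"))  # ['https://foo.com/redirect']
```

Because the stand-in pattern stops at `?`, `&` and `=`, it drops query strings from the extracted URLs; it is only meant to show how a nested URL such as https://bar.com ends up in the results, and how a hostname check keeps it out.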

ehsandeep changed the title from "[Issue] Getting out of scope domains in my results" to "Getting out of scope domains output" on Nov 16, 2024