Error 3221225762 in AD authentication #8345

Open
MarioSpenc opened this issue Oct 10, 2024 · 9 comments

Comments

@MarioSpenc

We have a PF V14.0 installation (ISO, Debian 12) with data imported from a V13.1 instance that was running fine.

It basically works, but only about 50% of the time. In between, we get the following rejects in the RADIUS audit log:

Called-Station-Id = "80:e8:6f:9c:18:87",
Calling-Station-Id = "2c:ea:7f:0a:4e:e2",
EAP-Message = "0x0225004a1a022500453148f97cfe5c4a9343727243c911c0e7530000000000000000e0e7bb18a4b6a3a7b773bb861700bdfae85e72f9074a838f004555524f50455c616c746868656765",
EAP-Type = "MSCHAPv2",
Event-Timestamp = "Oct 10 2024 13:07:26 CEST",
Framed-MTU = "1500",
FreeRADIUS-Proxied-To = "127.0.0.1",
MS-CHAP-Challenge = "0x778fda51024c231f00c350884d331b62",
MS-CHAP-User-Name = "xxx",
MS-CHAP2-Response = "0x255548f97cfe5c4a9343727243c911c0e7530000000000000000e0e7bb18a4b6a3a7b773bb861700bdfae85e72f9074a838f",
Module-Failure-Message = "chrooted_mschap: Program returned code (1) and output 'NT Error: code: 3221225762
message: (3221225762
'Indicates a name that was specified as a remote computer name is syntactically invalid.')'",
Module-Failure-Message = "chrooted_mschap: External script says: NT Error: code: 3221225762
message: (3221225762
'Indicates a name that was specified as a remote computer name is syntactically invalid.')",
Module-Failure-Message = "chrooted_mschap: MS-CHAP2-Response is incorrect",
NAS-IP-Address = "10.xxx",
NAS-Identifier = "xxx",
NAS-Port = "50107",
NAS-Port-Id = "xxx",
NAS-Port-Type = "Ethernet",
PacketFence-Domain = "xxx",
PacketFence-KeyBalanced = "xx",
PacketFence-NTLM-Auth-Host = "100.64.0.1",
PacketFence-NTLM-Auth-Port = "5000",
PacketFence-Outer-User = "xx\x",
PacketFence-Radius-Ip = "xxx",
Realm = "default",
Service-Type = "Framed-User",
State = "0xbaa4429eba815889078d36d6c0de94e9",
Stripped-User-Name = "xxx",
User-Name = "EURxxxOPE\xxx",
User-Password = "******"

RADIUS Reply
EAP-Message = "0x04250004",
MS-CHAP-Error = "%!E(MISSING)=691 R=0 C=20dd2031319e98badf2b398597145013 V=3 M=Authentication rejected",
Message-Authenticator = "0x00000000000000000000000000000000"
@MarioSpenc
Author

As we ran into too many problems with AD integration in V14, we had to go back to V13.1. I think we will wait for a more stable version ... ;-)

@stgmsa
Contributor

stgmsa commented Oct 10, 2024 via email

@MarioSpenc
Author

No cluster. The AD nodes are all working, and we also tried different ones ...

Note: PF V13.1 works like a charm with exactly the same configuration!

@stgmsa
Contributor

stgmsa commented Oct 10, 2024 via email

@MarioSpenc
Author

The export/import was SQL-based (not Mariadb-backup!)

@E-ThanG

E-ThanG commented Oct 17, 2024

Same issue here. Brand new ISO 14.0 install with nothing imported. I'm building from scratch on this VM just to make sure there isn't weirdness from importing or earlier troubleshooting of unrelated issues.

It's this function call in the ntlm-auth-api rpc.py module that is crashing. The "remote computer name" that it is referring to appears to be the "server_name" variable:


ntlm-auth-api-domain[65796]: [2024-10-15 14:44:05,300] ERROR in app: Exception on /ntlm/auth [POST]
ntlm-auth-api-domain[65796]: Traceback (most recent call last):
ntlm-auth-api-domain[65796]:  File "/usr/local/pf/bin/pyntlm_auth/rpc.py", line 140, in transitive_login
ntlm-auth-api-domain[65796]:    result = global_vars.s_secure_channel_connection.netr_LogonSamLogonWithFlags(server_name, workstation,
ntlm-auth-api-domain[65796]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ntlm-auth-api-domain[65796]: samba.NTSTATUSError: (3221225762, 'Indicates a name that was specified as a remote computer name is syntactically invalid.')
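
When searching for this error it may help to convert the decimal code to hex, since NTSTATUS values are normally documented in that form. A minimal sketch (the variable names are only for illustration):

```python
# Decimal NT status code as it appears in the log and traceback above.
nt_status = 3221225762

# NTSTATUS values are 32-bit and usually written as 8 hex digits.
nt_status_hex = f"0x{nt_status:08X}"
print(nt_status_hex)

# The top two bits of an NTSTATUS encode the severity (3 means "error").
severity = nt_status >> 30
print(severity)
```

The hex form, 0xC0000122, is the value Microsoft documents as STATUS_INVALID_COMPUTER_NAME, which matches the message text in the traceback.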

I noticed that in rpc.py the function init_secure_connection() uses a global to initially populate the server_name. Using that server it finds a random DC to start the secure connection with. That random selection isn't saved back to the global though.

def init_secure_connection():

   <SNIP>

    server_name = global_vars.c_server_name # FQDN of Domain Controller   <------ Global

    domain_controller_records = utils.find_ldap_servers(global_vars.c_realm, global_vars.c_dns_servers)
    if len(domain_controller_records) > 0:
        idx = random.randint(0, len(domain_controller_records) -1)
        record = domain_controller_records[idx]
        server_name = record.get('target')  # <------ Local

The transitive_login() function populates its own server_name from the global and then uses the secure connection that was initialized with the randomly selected server_name. That might be a problem: it looks like it would initiate a connection to server A but then try to talk to server B. Because the selection is random, the secure connection sometimes ends up on the same server that is configured in the global, and other times on some other server. If you only have one DC, you wouldn't see an issue.

def transitive_login(account_username, challenge, nt_response):
    server_name = global_vars.c_server_name  # <------ Global
    domain = global_vars.c_domain
    workstation = global_vars.c_workstation
    global_vars.s_secure_channel_connection, global_vars.s_machine_cred, global_vars.s_connection_id, error_code, error_message = get_secure_channel_connection()

   <SNIP>
                   
        try:
            result = global_vars.s_secure_channel_connection.netr_LogonSamLogonWithFlags(server_name, workstation,  # <-- Still using the global value
                                                                                         current, subsequent,
                                                                                         logon_level, logon,
                                                                                         validation_level,
                                                                                         netr_flags)
            (return_auth, info, foo, bar) = result

13.2 doesn't have the random selection process, and that is the only change I see in rpc.py. In 13.2 the global value is the only server_name that is ever used. Also, doesn't this break the concept of a sticky DC as well? I may be confused about the difference between an AD authentication source and the domain configuration; they seem like 2 sides of the same coin.
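
To illustrate the mismatch described above, here is a minimal, self-contained sketch of one way to keep the two functions in agreement: remember which DC the secure channel was opened to and reuse that name for the logon call. This is not the PacketFence code. `ChannelState`, `s_connected_server` and `server_name_for_logon` are hypothetical names, and the actual channel setup is left out.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChannelState:
    """Hypothetical stand-in for the relevant fields of global_vars."""
    c_server_name: str                        # DC configured by the admin
    s_connected_server: Optional[str] = None  # DC the channel was opened to


def init_secure_connection(state, dc_records, rng=random):
    """Pick a DC as in the 14.0 snippet, but record the choice."""
    server_name = state.c_server_name
    if dc_records:
        server_name = rng.choice(dc_records)["target"]
    # (the real code would open the secure channel to server_name here)
    state.s_connected_server = server_name
    return server_name


def server_name_for_logon(state):
    """Name to pass to the logon call: the DC the channel was opened to."""
    return state.s_connected_server or state.c_server_name


state = ChannelState(c_server_name="dc1.example.com")
records = [{"target": "dc1.example.com"}, {"target": "dc2.example.com"}]
chosen = init_secure_connection(state, records)
print(chosen == server_name_for_logon(state))
```

With something along these lines, the name handed to netr_LogonSamLogonWithFlags would always be the server the channel was opened to, whichever DC was picked.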

Lastly, it is very hard to get information out of this Docker container. I added a bunch of print statements, but I only see them printing ~10% of the time, even when the authentication is successful. My best guess is that there is a race between the 2 threads started in app.py. I always see config_load() printing at startup, but nothing later. I was trying to troubleshoot that aspect, but I've broken everything now and I can't get ntlm-auth-api to start at all. Never fear, I made a VM snapshot before tinkering.

@salamander555

We are also experiencing an issue with Active Directory authentication in the new PacketFence 14.0. It occurs regardless of whether it is a fresh installation (ISO or appliance) or an imported configuration from our existing version 13.2.

The main issue we identified is that it randomly selects a Domain Controller in the AD. Since we have many branch offices whose Domain Controllers are unreachable because the ports are blocked, this leads to failures. If a reachable Domain Controller happens to be chosen, there is no error. The configuration is correct, and Sticky DC is set (but appears to be ignored).

E-ThanG describes a similar behavior in the previous post.

@stgmsa
Contributor

stgmsa commented Oct 17, 2024

Thanks @E-ThanG @salamander555
We're investigating and will make a patch once this is confirmed.

@E-ThanG

E-ThanG commented Oct 18, 2024

I did some testing of my own, and there are a few issues. The first is the theory I mentioned above: it does indeed fail when the secure connection is opened to one server_name and netr_LogonSamLogonWithFlags is called with a different server_name.

Additionally, our AD domain has 6 DCs, but only 2 are on our primary campus. The on-campus DCs are preferred for the best performance.

IMO AD authentication sources should use the configured DCs only, not all of the potential DCs. The configuration field is called "Host", so we should configure the FQDN hostnames of DCs, not the domain name that resolves to all DCs. Also, it should only choose a random DC if "Shuffle" is enabled, and it should only pick from the configured DC list. I suppose there's an argument for the configuration being "DC Hostname or Domain DNS": if a DC hostname is configured (a DNS name with a single IP), it doesn't try to find others; if it's a domain DNS entry that resolves to multiple DC host IPs, it uses all DCs found. I'd prefer that to be an additional configuration option though.
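
As a rough sketch of the selection behaviour proposed above (not how PacketFence implements it), the function below only ever returns a host from the configured list and only randomizes when shuffle is enabled. `pick_domain_controller` and its parameters are hypothetical names.

```python
import random
from typing import List, Optional


def pick_domain_controller(configured_hosts: List[str],
                           shuffle: bool,
                           rng: Optional[random.Random] = None) -> str:
    """Return a DC from the configured list only.

    Without shuffle, the first configured host is always used.
    With shuffle, a random host is picked, but still only from the list.
    """
    if not configured_hosts:
        raise ValueError("no domain controller configured")
    if shuffle:
        # Fall back to the module-level generator when none is supplied.
        return (rng or random).choice(configured_hosts)
    return configured_hosts[0]


# The two on-campus DCs are configured; the other DCs are never considered.
campus_dcs = ["dc1.campus.example.com", "dc2.campus.example.com"]
print(pick_domain_controller(campus_dcs, shuffle=False))
print(pick_domain_controller(campus_dcs, shuffle=True) in campus_dcs)
```

Passing a seeded `random.Random` instance as `rng` makes the shuffled choice reproducible, which can be handy when testing.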

Regarding logging of debug/error messages, I copied the method from config_load() and used simple print statements. Adding the second argument "file=sys.stderr" lets me see all of the print statements all of the time. Since this is run in a Flask app, it'd be ideal to use Flask's logging function instead of print though.
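
A minimal sketch of both approaches, using only the standard library so it runs outside the container. Flask's `app.logger` is a regular `logging.Logger`, so the same calls should carry over. The logger name, the format string and the `server_name` value are placeholders, not taken from the PacketFence code.

```python
import logging
import sys


def make_stderr_logger(name="ntlm-auth-api-debug"):
    """Return a logger that writes DEBUG and above to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(
            "[%(asctime)s] %(levelname)s in %(module)s: %(message)s"))
        logger.addHandler(handler)
    return logger


log = make_stderr_logger()
server_name = "dc1.example.com"  # placeholder value for illustration

# The print-based workaround: send the message to stderr explicitly.
print(f"transitive_login: server_name={server_name}", file=sys.stderr, flush=True)

# The logging equivalent, with timestamp and level added by the formatter.
log.debug("transitive_login: server_name=%s", server_name)
```

One plausible explanation for the missing output, not confirmed in this thread, is that stdout is buffered when it is not attached to a terminal, while stderr is flushed much sooner.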
