Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NetFX dependency redirections fails in some cases #3714

Open
RassK opened this issue Oct 16, 2024 · 11 comments
Open

NetFX dependency redirections fails in some cases #3714

RassK opened this issue Oct 16, 2024 · 11 comments
Labels
bug Something isn't working
Milestone

Comments

@RassK
Copy link
Contributor

RassK commented Oct 16, 2024

What

Runtime dependency redirection mechanism is failing in some cases.

Example affected library: System.Diagnostics.DiagnosticSource

Affected version: OpenTelemetry.AutoInstrumentation v1.8.0 (ships with System.Diagnostics.DiagnosticSource v8.0.0)

App 1 (NetFxConsole):
.NET FX 4.8 Console app (very simple, creates one manual Activity + makes a HttpClient call)
ref: System.Diagnostics.DiagnosticSource v8.0.1
result: works

App 2 (SimpleFrameworkApp):
ASP.NET 4.8 (template)
ref: System.Diagnostics.DiagnosticSource v8.0.1
result:
Could not load file or assembly 'System.Diagnostics.DiagnosticSource, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51' or one of its dependencies. The system cannot find the file specified.

Native logs attached:
otel-dotnet-auto-5888-NetFxConsole-Native.log
otel-dotnet-auto-7856-w3wp-Native.log

@RassK RassK added the bug Something isn't working label Oct 16, 2024
@RassK
Copy link
Contributor Author

RassK commented Oct 16, 2024

Let's make sure we don't release this before a proper fix:
#3710

This is a very good test example at the moment.

@Kielek Kielek added this to the 1.9.0 milestone Oct 16, 2024
@RassK
Copy link
Contributor Author

RassK commented Oct 17, 2024

I think @pjanotti figured it out that this check here is skipping important modules from dependency analysis. The initial tests show that after I removed System.Diagnostics prefix from being skipped the test application successfully bumped System.Diagnostics.DiagnosticSource to v8.0.1 from v8.0.0.

Not sure if runtime_information_.is_desktop() doesn't correctly identify IIS apps?
Does anyone know how important is the initial skiplist for performance at all?

CC: @zacharycmontoya @nrcventura

@pjanotti
Copy link
Contributor

pjanotti commented Oct 18, 2024

Not sure if runtime_information_.is_desktop() doesn't correctly identify IIS apps?

It is based on https://learn.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/icorprofilerinfo3-getruntimeinformation-method so unless that is broken it should correctly identify as "desktop" meaning .NET Fx.

Does anyone know how important is the initial skiplist for performance at all?

I'm not aware of any measurement of that. Anyway, we need to ensure first that we are correct we then can take care of optimizing if that becomes an issue. This should affect start up time and should be done only once per app domain IIRC.

@nrcventura
Copy link
Member

I think that @pjanotti and @RassK are both correct, and have identified the root cause. It looks like the primary purpose of this skip_assembly_prefixes list was to skip the assemblies that do not need to be instrumented. The problem in this issue arrises because we also use that list to skip the assembly binding redirects.

Either way I think we need to decouple the concepts of which assemblies should be skipped for instrumentation, and which assemblies should be skipped for binding redirects. I suspect there are multiple assemblies in the current skip list that we want to apply binding redirects to. I am also wondering if we should ever skip an assembly for these binding redirects, because more and more built-in assemblies (or at least assemblies in the current skip list) will potentially be instrumented or use one of the libraries that we need to apply these binding redirects to. The current skip list check might need to be applied after we perform the binding redirects.

@RassK
Copy link
Contributor Author

RassK commented Oct 21, 2024

So should we start with a quick fix by removing https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/main/src/OpenTelemetry.AutoInstrumentation.Native/otel_profiler_constants.h#L48 ?

Seems the most crucial thing to do 🤔

@Kielek
Copy link
Contributor

Kielek commented Oct 21, 2024

One more thing, it will be great to have a tests that check if we do not ship (in .netFx folder any library listed/prefixed by https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/315758d7088eafab96a927dc374ca8f9c7f1c994/src/OpenTelemetry.AutoInstrumentation.Native/otel_profiler_constants.h)

@nrcventura
Copy link
Member

I don't think that we need the quick fix. I think we can just move the logic for updating the references, before the skip assemblies checks.

@RassK
Copy link
Contributor Author

RassK commented Oct 22, 2024

Why I'm towards a quickfix is that S.D.DS v8.0.1 is already out there and we don't have a supporting release. Splunk also has a failing POC in progress because of this bug. For last, even if we manage to reorder loading with some time, there are no tests to cover this issue... I'm not sure how to pinpoint this issue so we prevent it happening again 🤔 Testing seems to need loads of work.

@Kielek Kielek mentioned this issue Oct 22, 2024
3 tasks
@pjanotti
Copy link
Contributor

@RassK I think the hope was that the two things could be really separated: is that too big of a change?

For the tests we could potentially read the redirection from a file only in the test (e.g.: have some env var that reads from a file and ignores the map defined at compile time if defined and if not just proceed with the map defined in the header file). This file could point to earlier versions of the redirect but we should only use the ones from the app. That said I think that the tests could be done in separate as a follow-up.

@RassK
Copy link
Contributor Author

RassK commented Oct 24, 2024

Seems some cache messed my testing up (by somehow compiling the redirection in). Prob updating my local branch cleared a cache and now the quickfix or making sure System.Diagnostics is not skipped does not work. I also could not get over of the question why IIS and console is different - and seems this is the important question.

Location 1: This is fine, if checks work as supposed
https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/main/src/OpenTelemetry.AutoInstrumentation.Native/cor_profiler.cpp#L705

Location 2: There is another discovered path which does skipping. Even fixing it similar to the Location 1 does not fix it.
https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/main/src/OpenTelemetry.AutoInstrumentation.Native/cor_profiler.cpp#L1216

So back to the question, why IIS and console differ. There is clearly a difference how AppDomain is handled.

Console app loads the main module (In my case NetFxConsole.exe) in ModuleLoadFinished section. Then analyses it's dependencies and finds System.Diagnostics.DiagnosticSource v8.0.1 and performs redirection update.

[2024-10-24T10:06:30.192636400Z] [3796|7816] [debug] ModuleLoadFinished: 140728057938248 NetFxConsole AppDomain 1766768132752 [DefaultDomain]  | IsNGEN = false | IsDynamic = false | IsResource = false
[2024-10-24T10:06:30.192674500Z] [3796|7816] [debug] RedirectAssemblyReferences: AssemblyRef for [mscorlib] version=4.0.0.0
[2024-10-24T10:06:30.192678000Z] [3796|7816] [debug] RedirectAssemblyReferences: AssemblyRef for [System.Diagnostics.DiagnosticSource] version=8.0.0.1
[2024-10-24T10:06:30.192682400Z] [3796|7816] [info] RedirectAssemblyReferences: redirection update for [System.Diagnostics.DiagnosticSource] to_version=8.0.0.1 previous_version_redirection=8.0.0.0
[2024-10-24T10:06:30.192718000Z] [3796|7816] [debug] RedirectAssemblyReferences: AssemblyRef for [System.Net.Http] version=4.2.0.0
[2024-10-24T10:06:30.192723000Z] [3796|7816] [debug] RedirectAssemblyReferences: same version for [System.Net.Http] version=4.2.0.0 previous_redirects=0
[2024-10-24T10:06:30.192724300Z] [3796|7816] [debug] RedirectAssemblyReferences: EnumAssemblyRefs returned S_FALSE assembly_refs_count=0
[2024-10-24T10:06:30.192728900Z] [3796|7816] [debug] ModuleLoadFinished: done NetFxConsole v1.0.0.0

However IIS app never performs ModuleLoadFinished for the main module (In my case SimpleFrameworkApp.dll). Therefore skips entirely the dependency analysis for the dependencies of the SimpleFrameworkApp. I could not find any other AppDomain having System.Diagnostics.DiagnosticSource as a dependency besides our modules (OpenTelemetry.* modules).

I can find direct loading of System.Diagnostics.DiagnosticSource v8.0.1 module but because it's not a dependency of a module then no redirection analysis is performed.

[2024-10-24T10:06:24.023477200Z] [3572|6248] [debug] ModuleLoadFinished: 140728071477120 System.Diagnostics.DiagnosticSource AppDomain 2175410301504 [/LM/W3SVC/5/ROOT-1-133742379818231863]  | IsNGEN = false | IsDynamic = false | IsResource = false
[2024-10-24T10:06:24.023751000Z] [3572|6248] [debug] RedirectAssemblyReferences: AssemblyRef for [mscorlib] version=4.0.0.0
[2024-10-24T10:06:24.023754500Z] [3572|6248] [debug] RedirectAssemblyReferences: AssemblyRef for [System.Memory] version=4.0.1.2
[2024-10-24T10:06:24.023758000Z] [3572|6248] [debug] RedirectAssemblyReferences: same version for [System.Memory] version=4.0.1.2 previous_redirects=0
[2024-10-24T10:06:24.023760500Z] [3572|6248] [debug] RedirectAssemblyReferences: AssemblyRef for [System] version=4.0.0.0
[2024-10-24T10:06:24.023763000Z] [3572|6248] [debug] RedirectAssemblyReferences: AssemblyRef for [System.Core] version=4.0.0.0
[2024-10-24T10:06:24.023765800Z] [3572|6248] [debug] RedirectAssemblyReferences: AssemblyRef for [System.Runtime.CompilerServices.Unsafe] version=6.0.0.0
[2024-10-24T10:06:24.023768700Z] [3572|6248] [debug] RedirectAssemblyReferences: same version for [System.Runtime.CompilerServices.Unsafe] version=6.0.0.0 previous_redirects=0
[2024-10-24T10:06:24.023769900Z] [3572|6248] [debug] RedirectAssemblyReferences: EnumAssemblyRefs returned S_FALSE assembly_refs_count=0
[2024-10-24T10:06:24.023773100Z] [3572|6248] [debug] ModuleLoadFinished: done System.Diagnostics.DiagnosticSource v8.0.0.1

So my next question is: Where is ModuleLoadFinished for SimpleFrameworkApp module ?

@RassK
Copy link
Contributor Author

RassK commented Nov 6, 2024

#3751 is going to fix the issue for some time. Will keep this open to investigate a proper solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

No branches or pull requests

4 participants