Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests for debugging imports #878

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from
Draft

Tests for debugging imports #878

wants to merge 13 commits into from

Conversation

vidartf
Copy link

@vidartf vidartf commented Mar 2, 2022

Encoding some expected behavior in tests.

CC @echarles

Not sure why this is not needed on CI, but I seem to need it locally.
Test to cover what happens if we step past the end of a cell's code.
Expected behavior is to continue until next execute arrives, and then
stop on the first statement in the new code.
@echarles
Copy link
Member

echarles commented Mar 3, 2022

Thx for working on this @vidartf
Quick headup that CI is failing with the current state of this PR.

=========================== short test summary info ============================
FAILED ipykernel/tests/test_debugger.py::test_step_into_lib[kernel0] - Assert...
FAILED ipykernel/tests/test_debugger.py::test_step_into_end - Failed: DID NOT...

@vidartf
Copy link
Author

vidartf commented Mar 3, 2022

Yes. My statement is that master has a bug 😉

vidartf and others added 8 commits March 3, 2022 11:20
Probably redundant, but more robust
When launching the kernel in-process, there might be frames before the ipykernel root, making the assumption that it is the first frame invalid.
Monkey patches pydevd's global debugger instance so that frames from `__tracebackhide__ == "__ipython_bottom__"` and below are excluded from both stack traces and tracing (aka stepping). This should prevent users stepping out into the IPython/ipykernel internals between statements, and after cells.

Points of note and/or discussion:
- Doesn't yet handle display hooks I think.
- Path/module filters are not sufficient, as the same file/frames can appear on both sides of the IPython stack "bottom".
- For the same reason, caching of the filter can not be used.
- Monkey patching is not ideal, especially since we also access some internals that are probably not considered part of the public API.
- Are threads / forking important here?
- Note that using `traitlets.validate()` for the test is useful, since it also shows up below the "bottom" of the IPython stack (at the time of writing). Maybe using a more central IPython function would be more reliable?
Some of these might be nice to keep around, especially during the review phase. We can remove (parts of) this before merging if we want to keep things tidy!
@vidartf
Copy link
Author

vidartf commented Mar 10, 2022

@fabioz: Do you have any input on my dirty monkey patching of your very nice library? 😅

@vidartf vidartf marked this pull request as draft March 10, 2022 20:44
@vidartf
Copy link
Author

vidartf commented Mar 10, 2022

So, this is an initial draft for filtering frames (proof of concept if you'd like):

Monkey patches pydevd's global debugger instance so that frames from __tracebackhide__ == "__ipython_bottom__" and below are excluded from both stack traces and tracing (aka stepping). This should prevent users stepping out into the IPython/ipykernel internals between statements, and after cells.

Points of note and/or discussion:

  • Doesn't yet handle display hooks I think.
  • Path/module filters are not sufficient, as the same file/frames can appear on both sides of the IPython stack "bottom".
  • For the same reason, caching of the filter can not be used.
  • Monkey patching is not ideal, especially since we also access some internals that are probably not considered part of the public API.
  • Are threads / forking important here?
  • Note that using traitlets.validate() for the test is useful, since it also shows up below the "bottom" of the IPython stack (at the time of writing). Maybe using a more central IPython function would be more reliable?

@fabioz
Copy link

fabioz commented Mar 11, 2022

Regarding the monkey-patching, I'm a bit wary of it as it's bound to break as the debugger moves forward (if this functionality is really needed, this should probably be in the debugger itself).

get_file_type is also pretty important performance-wise, so, this change will probably make the debugger significantly slower if the results can't be cached since you're covering for a use-case such as:

-> bottom_frame (ignore)
-> frame: func_a (ignore)
-> __tracebackhide__ == "__ipython_bottom__" (ignore)
-> frame: func_a (show)
-> frame: func_b (show)

And going up the stack every time the file type is needed is bound to be slow...

Is this use-case common where a frame from code func_a would appear both before and after the __ipython_bottom__ (is it really worthy to support it)? Can you provide the actual use case where this is needed?

@vidartf
Copy link
Author

vidartf commented Mar 11, 2022

@fabioz Thanks for the input. As mentioned, this is mainly proof-of-concept at this stage, and anything that can be done to clean it up will be very welcome. Making it not rely on internal implementation details of pydevd is definitely one of these things.

Is this use-case common where a frame from code func_a would appear both before and after the __ipython_bottom__ (is it really worthy to support it)? Can you provide the actual use case where this is needed?

Anything that IPython or ipykernel uses or calls out to while processing its message loop are things to consider. Some packages/libs I've observed in logs include: traitlets, asyncio, concurrent, zmq, threading, iostream, jupyter_client, json, hmac, date_utils, logging, queue, tornado, selectors, enum, signal, inspect, tokenize, contextlib, os, tempfile, codeop, dis, decorator, etc. These were only the ones I could find from quickly scanning the logs after running the tests here. I'm sure there will be others as well. Not being able to debug these (or where the ability to trace these are based on a race) seems a deal-breaker. That said, the goal would of course be to:

  1. Ensure that no performance degradation occurs for the cases where the debug target is not an ipykernel, or where the user does not want to debug libraries (just_my_code = True).
  2. Find more optimized ways of implementing this. E.g. for set_trace_for_frame_and_parents, if we could break the loop on the __ipython_bottom__ frame, we would avoid enumerating those frames at all. If we can do the same for other things that loop over frames for filtering, we could avoid the need to touch get_file_type at all. However, at the moment I'm not familiar enough with the pydevd source to make any good calls about that, so I brute-forced it in get_file_type since it is a central location for most things that filter frames. Any input from you would be very helpful in figuring out what would be possible here!

@vidartf
Copy link
Author

vidartf commented Mar 11, 2022

Not being able to debug these (or where the ability to trace these are based on a race) seems a deal-breaker.

Additionally: If these frames show up above the "bottom" frame for the first time, than a step command will suddenly trace out into these outer frames, or they will confusingly show up in the stack trace. These behaviors might be much more detrimental to the user experience than not being able to trace these libs.

@fabioz
Copy link

fabioz commented Mar 11, 2022

Humm, if you know the points when you want to start/stop tracing and want absolute control, maybe using get_global_debugger().enable_tracing() / get_global_debugger().disable_tracing() could be a better approach...

@fabioz
Copy link

fabioz commented Mar 11, 2022

I couldn't find any reference to __ipython_bottom__ (apart from the usage where it's checked in the debugger). Where would that be defined?

p.s.: the enable/disable tracing would still leave items showing in the stack frames, but those could be removed in the wrapper that sends the info to the client.

@vidartf
Copy link
Author

vidartf commented Mar 11, 2022

Interesting. I can see a context manager applying this before calling into user code. Some questions/concerns come to mind though:

  • Would we need to be careful to reapply the same tracers? I.e. would the trace state need to be stored between calls to disable and the next enable? Would there be concerns of this going out of sync? Also, are there not race concerns with debug messages coming in between the code executions (i.e. the trace will be enabled while below the "bottom" frame)?
  • IPython have some support for async code in cells. I'm not very familiar with this, but I'm concerned whether this could mean that we might end up in the IPython loop again while awaiting something in the cell? I would have to look into this. I also don't know how the stack would look like at that point.

I couldn't find any reference to __ipython_bottom__ (apart from the usage where it's checked in the debugger). Where would that be defined?

https://github.com/ipython/ipython/blob/de8729a984272b023768a8b85295ae06146da71d/IPython/core/interactiveshell.py#L3347

@vidartf
Copy link
Author

vidartf commented Mar 11, 2022

The main place that __ipython_bottom__ is used is in the PDB extension for IPython: https://github.com/ipython/ipython/blob/de8729a984272b023768a8b85295ae06146da71d/IPython/core/debugger.py

@fabioz
Copy link

fabioz commented Mar 11, 2022

  • Would we need to be careful to reapply the same tracers? I.e. would the trace state need to be stored between calls to disable and the next enable? Would there be concerns of this going out of sync? Also, are there not race concerns with debug messages coming in between the code executions (i.e. the trace will be enabled while below the "bottom" frame)?

This shouldn't be a problem. The tracing enabling/disabling is supported by the debugger and it should make sure everything works properly (if there's any issue that'd be a bug to be fixed).

  • IPython have some support for async code in cells. I'm not very familiar with this, but I'm concerned whether this could mean that we might end up in the IPython loop again while awaiting something in the cell? I would have to look into this. I also don't know how the stack would look like at that point.

The async is indeed trickier because whenever you enter the async part you want to enable the debugger and when you get back to the loop you want to disable it (this may be a blocker... I'm not sure if there's a way to implement that).

@fabioz
Copy link

fabioz commented Mar 11, 2022

After thinking a bit more about it I think that your original approach of having a root frame from which the debugging happens (relying on __tracebackhide__ == "__ipython_bottom__") may really be the best approach and it does cover for the asynchronous use case properly.

I have to thinker a bit more about how to implement that efficiently though...

@fabioz
Copy link

fabioz commented May 6, 2022

@vidartf I've added some thoughts on the plans to tackle this in the debugger (see: microsoft/debugpy#869 (comment)).

After I finish tackling that issue, I'll give you a heads up so that you can update this PR to make use of those new arguments in the attach/launch requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants