Interpreter and Environment Discovery
One of the things the extension does on startup is search for installed interpreters and environments in known global and workspace locations. The results are later used to enable behaviors such as auto-selection, environment lists, environment activation based on type, etc.
Originally, the extension looked for interpreters and environments through several implementations of `IInterpreterLocatorService`, each focused on a particular interpreter or environment type. For example, one class specifically reads the `environments.txt` file created by conda and reports the environments discovered through that file.
Each implementation was exposed to the rest of the extension via a service container (using `inversify`). In addition to the `IInterpreterLocatorService` interface, some classes also provide convenience wrappers for domain-specific features, such as identifying whether a given interpreter belongs to an environment of a particular type.
There are a couple of issues with this design:
- Each implementation of `IInterpreterLocatorService` requires that the python interpreter is eventually run to extract the required information. This often slows down extension load, depending on the number of environments available on a given machine.
- Since each implementation also exposes additional functionality, other classes that offer unrelated features have taken a dependency on these classes. This also means that when testing we sometimes have to mock a large number of unrelated classes to test a simple feature.
- A large number of classes are implemented and used as singletons. This has led to the lack of a well-defined API separating the modules that do discovery from the modules that consume the results of discovery.
The new version of the interpreter and environment discovery module attempts to provide a scoped API for getting interpreters for use with the rest of the extension. The design exposes APIs via the `IComponentAdapter` interface. `IComponentAdapter` was added to allow integration with the rest of the extension, which uses dependency injection to acquire dependencies. The component itself depends only on platform APIs such as the file system, processes, OS-specific features, and settings passed in when the component is created. Implementations of `IComponentAdapter` can depend on `vscode` APIs if needed. `IComponentAdapter` acts as a bridge between the APIs exposed by the component and the rest of the extension.
One issue with the old locator code was that the internals of the environment-specific locators were exposed to the rest of the extension. This made testing difficult because the extension took dependencies on concrete implementations rather than abstractions. With the new component, all code flows through the component adapter, which exposes a well-defined API.
The discovery component is activated as part of component activation. Once all the classes are loaded, the following APIs are available:
API | Description |
---|---|
getInterpreters | Returns the interpreters found. This API may return interpreters from the cache. |
getInterpreterDetails | Returns environment info for a given interpreter. |
onDidCreate | Register a callback to be called when a workspace virtual environment is created. |
onRefreshing | Discovery component is still looking for environments. |
onRefreshed | Discovery component has finished looking for environments. |
getInterpreterInformation | Temporary. Returns partial environment information as available at that point in time. |
isMacDefaultPythonPath | Temporary. Returns true if the interpreter path is identified as the default Mac python path. |
isCondaEnvironment | Temporary. Returns true if the interpreter path is identified as the part of a conda environment. |
getCondaEnvironment | Temporary. Returns name and path to the conda environment given path to interpreter that belongs to a conda environment. |
isWindowsStoreInterpreter | Temporary. Returns true if the interpreter path is identified as python installed via Windows Store. |
hasInterpreters | Temporary. Returns true if an interpreter has been found. |
getWorkspaceVirtualEnvInterpreters | Temporary. Returns environments that belong to a workspace. |
getWinRegInterpreters | Temporary. Returns environments that were discovered using the Windows Registry. |
APIs marked temporary will either be removed or already have equivalents in the new component that make them obsolete.
On activation, the component adapter loads the known environments from the cache and performs a background refresh to find any new environments. If a new environment is discovered, this triggers a cache update once we have enough details about the environment. See `CachingLocator` for the implementation.
Once an environment is found, we reduce the environments detected to a distinct set. This reduction prevents cases where the same interpreter binary has symlinks in the same folder with slightly different names. The comparison rule is: if the versions and the parent directories of the python binaries match, then it is likely the same python environment.
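The comparison rule can be illustrated with a minimal sketch. The `EnvInfo` shape and the `reduceEnvs` name are assumptions made for this example and are not the extension's actual types or functions.

```typescript
import * as path from 'path';

// Hypothetical, simplified shape of a discovered environment.
interface EnvInfo {
    executable: string; // absolute path to the python binary
    version: string; // for example "3.9.1"
}

// Treat two entries as the same environment when both the version and the
// parent directory of the binary match.
function reduceEnvs(envs: EnvInfo[]): EnvInfo[] {
    const seen = new Map<string, EnvInfo>();
    for (const env of envs) {
        const key = `${path.dirname(env.executable)}|${env.version}`;
        if (!seen.has(key)) {
            seen.set(key, env); // keep the first entry found for each key
        }
    }
    return Array.from(seen.values());
}

const distinct = reduceEnvs([
    { executable: '/usr/bin/python3', version: '3.9.1' },
    { executable: '/usr/bin/python3.9', version: '3.9.1' }, // same folder, same version
    { executable: '/usr/bin/python2', version: '2.7.18' },
]);
console.log(distinct.map((env) => env.executable));
```

Keeping the first entry per key is an arbitrary choice in this sketch; a real implementation might prefer the entry that carries the most information.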
The next step checks each environment for missing information. This is the resolver step, where we find additional information by running a python script in that environment. The number of simultaneous python processes we execute here is throttled to prevent overuse of system resources when there are a large number of environments.
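One common way to throttle this kind of work is a small pool of runners that each pull the next pending item. The sketch below shows that general technique only; `mapThrottled` and `fakeResolve` are hypothetical names, and the timer stands in for spawning a python process.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapThrottled<T, R>(
    items: T[],
    limit: number,
    worker: (item: T) => Promise<R>,
): Promise<R[]> {
    const results = new Array<R>(items.length);
    let next = 0;
    // Each runner claims the next unprocessed index until none are left.
    async function runner(): Promise<void> {
        while (next < items.length) {
            const index = next;
            next += 1;
            results[index] = await worker(items[index]);
        }
    }
    const runners: Promise<void>[] = [];
    for (let i = 0; i < Math.min(limit, items.length); i += 1) {
        runners.push(runner());
    }
    await Promise.all(runners);
    return results;
}

let active = 0;
let peak = 0;
// Stand-in for "run a python script in this environment".
async function fakeResolve(executable: string): Promise<string> {
    active += 1;
    peak = Math.max(peak, active);
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    active -= 1;
    return `${executable}: resolved`;
}

const resolved = mapThrottled(['/a/python', '/b/python', '/c/python'], 2, fakeResolve);
resolved.then((results) => console.log(results.length, peak));
```

Results are stored by index, so the output order matches the input order even when the workers finish out of order.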
After these steps, anything that remains is used to overwrite the cache, so the next call to `getInterpreters` will pull the latest information from the cache.
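The cache-first behavior can be pictured with a small sketch. `EnvCacheSketch` and its `refresh` method are hypothetical and much simpler than `CachingLocator`; the sketch only shows that reads are served from the cache and that a finished refresh overwrites it.

```typescript
// Hypothetical, simplified environment record.
interface EnvInfo {
    executable: string;
    version: string;
}

class EnvCacheSketch {
    private envs: EnvInfo[];

    constructor(initial: EnvInfo[]) {
        this.envs = initial;
    }

    // Answers from the cache right away, even if a refresh is still running.
    public getInterpreters(): EnvInfo[] {
        return [...this.envs];
    }

    // Runs discovery, then overwrites the cache with whatever it returned.
    public async refresh(discover: () => Promise<EnvInfo[]>): Promise<void> {
        this.envs = await discover();
    }
}

const cache = new EnvCacheSketch([{ executable: '/usr/bin/python3', version: '3.8.5' }]);
const beforeRefresh = cache.getInterpreters();
const refreshDone = cache.refresh(async () => [
    { executable: '/usr/bin/python3', version: '3.8.5' },
    { executable: '/work/.venv/bin/python', version: '3.9.1' },
]);
refreshDone.then(() => console.log(beforeRefresh.length, cache.getInterpreters().length));
```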
There are two groups of locators: global and workspace locators. Global locators look for python installed in global locations such as `~/.venv`, `~/.pyenv`, the Windows Registry, etc. Workspace locators look for python that is available in the workspace. Workspace locators are similar to global locators but have some restrictions on running python. Lastly, there are also file system watchers that are initialized on some global folders and workspace locations. The watchers are there to find any environment that is created after the extension has started up.
Each locator does the following things:
- Finds the environment or interpreter that the locator is responsible for.
- Extracts any information from files or metadata, such as version, distribution name, environment name, etc.
- Fires an event indicating that it found an environment, if it is a file system watching locator.
The environments are exposed via the `iterEnvs` method implemented by each locator. A locator must also implement the `resolveEnv` method which, given an absolute path to an interpreter, identifies the environment and provides any missing additional data for environments of the type handled by that locator.
A locator also implements the `onChanged` event. This is used by locators that also watch the file system for new environment creation. When this event is fired, the handler is expected to call `iterEnvs` to get the updated set of environments.
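The contract between `iterEnvs`, `resolveEnv`, and `onChanged` can be sketched as follows. The signatures here are simplified assumptions (synchronous, with a plain callback list instead of an event emitter) and are not the extension's real interfaces; `ILocatorSketch`, `InMemoryLocator`, and `add` are hypothetical names.

```typescript
// Hypothetical, simplified shapes; the extension's real types carry more detail.
interface EnvInfo {
    executable: string;
    kind: string;
    version?: string;
}

type Listener = () => void;

interface ILocatorSketch {
    iterEnvs(): IterableIterator<EnvInfo>;
    resolveEnv(executable: string): EnvInfo | undefined;
    onChanged(listener: Listener): void;
}

class InMemoryLocator implements ILocatorSketch {
    private readonly envs: EnvInfo[] = [];

    private readonly listeners: Listener[] = [];

    public *iterEnvs(): IterableIterator<EnvInfo> {
        yield* this.envs;
    }

    public resolveEnv(executable: string): EnvInfo | undefined {
        const env = this.envs.find((e) => e.executable === executable);
        // Fill in data that discovery left out; a real locator would read
        // metadata files or run python here.
        return env ? { ...env, version: env.version ?? 'unknown' } : undefined;
    }

    public onChanged(listener: Listener): void {
        this.listeners.push(listener);
    }

    // Stand-in for a file system watcher noticing a new environment.
    public add(env: EnvInfo): void {
        this.envs.push(env);
        this.listeners.forEach((listener) => listener());
    }
}

const locator = new InMemoryLocator();
let latest: EnvInfo[] = [];
// The handler re-reads the full set of environments when notified.
locator.onChanged(() => {
    latest = Array.from(locator.iterEnvs());
});
locator.add({ executable: '/work/.venv/bin/python', kind: 'venv' });
console.log(latest.length);
```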
Windows has two unique sources where we can find python installations.
- Windows Registry (see PEP 514 for the registry layout). We have a locator that specifically looks at the Windows Registry locations, in both the 32-bit and 64-bit registry, for installed pythons. The locator also extracts versions and other metadata from the registry keys.
- Windows Store installations. These are global python environments installed via the Windows Store. There are some subtleties here about which paths are valid to use and which are not. See the comments on `isWindowsStoreEnvironment` and `getWindowsStorePythonExes` for more details. This location is watched, so new installations should be found by the extension even after initial discovery.
Another way we detect installed pythons is via the `PATH` environment variable. This is done on all platforms. However, we include additional paths on non-Windows operating systems (see `commonPosixBinPaths` for more details). These locations are not watched, to reduce the performance load of watching a large number of binaries.
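As a rough sketch of how the search directories might be assembled, assume a helper that splits the `PATH` value and appends extra directories on non-Windows platforms. The `getSearchDirs` function and the `extraPosixDirs` list are hypothetical placeholders; the actual list is defined by `commonPosixBinPaths`.

```typescript
// Placeholder directories for illustration only.
const extraPosixDirs = ['/usr/local/bin', '/usr/bin', '/bin'];

function getSearchDirs(pathValue: string, platform: string): string[] {
    const delimiter = platform === 'win32' ? ';' : ':';
    const dirs = pathValue.split(delimiter).filter((dir) => dir.length > 0);
    if (platform !== 'win32') {
        dirs.push(...extraPosixDirs);
    }
    // A Set drops duplicates while preserving first-seen order.
    return Array.from(new Set(dirs));
}

const posixDirs = getSearchDirs('/usr/bin:/opt/python/bin', 'linux');
const windowsDirs = getSearchDirs('C:\\Python39;C:\\Windows', 'win32');
console.log(posixDirs);
console.log(windowsDirs);
```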
We have specific locators for each of these environment types. The individual implementations have details on how each environment is found. In most cases we look at known locations and known environment variables to find the environments.
Environment | Locator Implementation |
---|---|
venv, virtualenv, virtualenvwrapper, pipEnv | GlobalVirtualEnvironmentLocator |
pyenv | PyenvLocator |
conda | CondaEnvironmentLocator |
Workspace virtual environments are found in two ways: by searching through the directories, and by watching the workspace folders for the creation of virtual environments. Implementation details can be seen in `WorkspaceVirtualEnvironmentLocator`.
Each environment is treated as a particular environment type based on priority. This ensures that when we need to activate the environment, we treat it the right way. For example, you could use pyenv to create a conda environment. See `getPrioritizedEnvKinds` for more info.
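Priority-based identification can be sketched as an ordered list of checks where the first match wins. Everything below is an assumption for illustration: the kinds, their order, and the naive path checks are hypothetical, and the real order is whatever `getPrioritizedEnvKinds` returns.

```typescript
// Hypothetical kinds, listed from highest to lowest priority in the checks below.
type Kind = 'conda' | 'pyenv' | 'venv' | 'unknown';

interface KindCheck {
    kind: Kind;
    matches: (executable: string) => boolean;
}

// Deliberately naive path checks; real identification inspects files and metadata.
const prioritizedChecks: KindCheck[] = [
    { kind: 'conda', matches: (exe) => exe.includes('/envs/') },
    { kind: 'pyenv', matches: (exe) => exe.includes('/.pyenv/') },
    { kind: 'venv', matches: (exe) => exe.includes('/.venv/') },
];

// The first check that matches decides the kind.
function identifyKind(executable: string): Kind {
    for (const check of prioritizedChecks) {
        if (check.matches(executable)) {
            return check.kind;
        }
    }
    return 'unknown';
}

// A conda environment created through pyenv matches two checks;
// the higher-priority one is used.
const mixed = identifyKind('/home/user/.pyenv/versions/miniconda3/envs/data/bin/python');
const plain = identifyKind('/home/user/.pyenv/versions/3.9.1/bin/python');
console.log(mixed, plain);
```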
These are the things you should consider before implementing a new locator:
- Do you need file system watching for your locator?
- Do you need a new environment type, or does it fit within the known environment kinds (see `PythonEnvKind`)?
If you need file watching, then extend the `FSWatchingLocator` class (see `WindowsStoreLocator` for an example). It provides convenience methods to set up file system watching for a particular location. If you don't need it, then extend the `Locator` class (see `WindowsRegistryLocator` for an example).
Once you have your implementation, add it to `createNonWorkspaceLocators`. If you had to add a new environment type, be sure to update `getPrioritizedEnvKinds`.
Be sure to add tests for your implementation. You can look at the tests for the existing locators for ideas on how to add tests.