[FEATURE] Expose functionality as a re-usable library #268

Open
BenBE opened this issue Nov 23, 2022 · 5 comments

Comments

@BenBE

BenBE commented Nov 23, 2022

Is your feature request related to a problem? Please describe.
I'm always frustrated that the only way to extract information about open file descriptors in a system is to run lsof as a sub-process and parse and filter its output. While this might be fine for a single snapshot, it becomes quite wasteful when integrating such functionality into a system monitoring tool, which would benefit from this information being exposed through a sane API.

Describe the solution you'd like
The collected information about open file descriptors and processes should be made available as a linkable shared library that other programs can use. This library should expose information about processes (+stats) and their open file descriptors. It should provide functions to create snapshots of the whole system or of single processes, and to collect information on a single file, on all file descriptors of a process, or on all of them. The library should also support caching this information so that updates can be done incrementally (similar to the existing monitoring mode).

Describe alternatives you've considered
The lsof binary already has a feature to run in a continuous mode. While this mode generally allows receiving updates for open file descriptors at fixed intervals, it is not suitable when the scope of the needed information changes over time (e.g. a system monitor that shows only the number of open files per process and needs actual file names and types only once a user requests details for a particular process). With a library, the caller could specify which information to collect and request missing information on demand.

There's furthermore a re-implementation of lsof in util-linux with JSON as an interchange format (which is at times easier to integrate with applications), but it suffers from similar flexibility issues over its runtime and is neither widely available nor backwards compatible with lsof. Thus, while JSON is a nice interchange format for data (unless you have to parse it), a binary, in-process API would be much easier to work with.

Additional context
The machine-readable format of lsof is hardly documented, and some parts of it are overly complicated (numbers have to be converted back and forth between representations, when things would be unambiguous in a proper library API). Also, depending on the platform, certain attributes are not exposed consistently, leading to situations where sometimes only one of two attributes (e.g. size or offset) is available.

@jiegec
Contributor

jiegec commented Jan 20, 2023

Reading the code, the dialect-specific code and the user-facing code are tightly coupled via global variables, and there is extensive use of static variables. It will be a very big project ;) Maybe suitable for a GSoC?

@jiegec
Contributor

jiegec commented Jan 27, 2023

I made some progress at https://github.com/jiegec/lsof/blob/library/include/lsof.h, you can see API definitions there, and here is a list of DONEs and TODOs:

  • Re-arrange directory structure, build liblsof
  • Add initial library interface with Doxygen.
  • Move global and static variables to lsof context.
  • Don't check Foffset/Fsize/Fsv in dialect, only check them in print.c.
  • Convert string-based fd column to enum + int.
  • Rewrite LTbasic with liblsof.
  • Set visibility to hidden for internal functions of liblsof.
  • Migrate more options from cli to liblsof until cli only uses public interface of liblsof.
  • Migrate more dialects, currently only Linux is tested.
  • Write some utilities with liblsof, maybe re-implement a netstat?
  • Fix memory leaks.
  • Replace all exit calls in liblsof with proper error reporting.

@BenBE
Author

BenBE commented Jan 28, 2023

Had a quick look at the API: Looks good so far.

Probably some suggestions:

  • lsof_file_access_mode: Be explicit about the value of the modes. Maybe _NONE = 0, _READ = 4, _WRITE = 2, _READ_WRITE = _READ | _WRITE, …
  • Do the lsof_fd_type values align with what the OS usually uses? (e.g. git shows 100664 for a regular file, the first two digits being the file type)
  • lsof_protocol_type is missing UDP and Unix Domain sockets
  • lsof_file should probably group the valid fields into a valid mask with uint32_t or uint64_t
  • lsof_file might include a linked list with additional attributes (e.g. socket addresses, paths), thus avoiding regular redesigns when new columns/information have to be added (each attr containing an attr_type and a data section (union?))
  • DiD: It's usually better to place the count of elements before the array (pointer) to those elements (cf. lsof_proces->files)
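
As a rough illustration of the suggestions above, the following C sketch combines explicit access-mode values, a validity bitmask, an extensible attribute list, and a count placed before its array pointer. All `demo_*` / `DEMO_*` names, the chosen fields, and the bit assignments are hypothetical and are not taken from the actual lsof.h:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the real liblsof header. */

/* Access modes with explicit values, so READ_WRITE is READ | WRITE. */
enum demo_access_mode {
    DEMO_ACCESS_NONE = 0,
    DEMO_ACCESS_WRITE = 2,
    DEMO_ACCESS_READ = 4,
    DEMO_ACCESS_READ_WRITE = DEMO_ACCESS_READ | DEMO_ACCESS_WRITE
};

/* One bit per optional field of demo_file. */
enum demo_file_valid {
    DEMO_VALID_SIZE = 1u << 0,
    DEMO_VALID_OFFSET = 1u << 1,
    DEMO_VALID_INODE = 1u << 2
};

enum demo_attr_type { DEMO_ATTR_PATH, DEMO_ATTR_PORT };

/* Extensible attribute: a type tag, a union payload, and a next pointer. */
struct demo_attr {
    enum demo_attr_type type;
    union {
        const char *path;
        uint16_t port;
    } data;
    struct demo_attr *next;
};

struct demo_file {
    uint32_t valid; /* bitmask of demo_file_valid flags */
    enum demo_access_mode access;
    uint64_t size;   /* meaningful only if DEMO_VALID_SIZE is set */
    uint64_t offset; /* meaningful only if DEMO_VALID_OFFSET is set */
    uint64_t inode;  /* meaningful only if DEMO_VALID_INODE is set */
    struct demo_attr *attrs; /* linked list of extra attributes, may be NULL */
};

struct demo_process {
    size_t num_files;        /* element count placed before the array pointer */
    struct demo_file *files;
};

/* Returns nonzero only if every field named in mask is valid. */
int demo_file_has(const struct demo_file *f, uint32_t mask) {
    return (f->valid & mask) == mask;
}

/* Returns the first attribute of the given type, or NULL if there is none. */
const struct demo_attr *demo_file_find_attr(const struct demo_file *f,
                                            enum demo_attr_type type) {
    const struct demo_attr *a;
    for (a = f->attrs; a != NULL; a = a->next) {
        if (a->type == type)
            return a;
    }
    return NULL;
}
```

With a layout along these lines, a caller would check `demo_file_has(f, DEMO_VALID_SIZE)` before reading `f->size`, and a new kind of information could be added as another `demo_attr_type` without changing the struct layout.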

From the PoV of an implementer of a system monitoring tool that tracks process information itself: How would I go about receiving updated information about open files for one process at a regular interval (polling is totally fine and expected, the updating part is the focus here)? As I read the API now, I'd have to create a completely new context each time, causing quite some information (in particular process information) to be collected each time and thrown away again; or am I missing something here?

Regarding platform support: How about extending support to FreeBSD and Darwin next?

@jiegec
Contributor

jiegec commented Jan 28, 2023

  • lsof_file_access_mode: Be explicit about the value of the modes. Maybe _NONE = 0, _READ = 4, _WRITE = 2, _READ_WRITE = _READ | _WRITE, …

DONE

  • Do the lsof_fd_type values align with what the OS usually uses? (e.g. git shows 100664 for a regular file, the first two digits being the file type)

Actually, lsof_fd_type does not correspond to struct stat.st_mode, but rather to how the file relates to the process, e.g. whether it is an open fd, cwd, root directory, memory-mapped file, etc. The regular file/directory distinction is given in the TYPE column, saved as a string in struct lfile.type. The format is very informal, as can be seen from the man page:

       TYPE       is  the  type  of  the node associated with the file - e.g.,
                  GDIR, GREG, VDIR, VREG, etc.

                  or ``IPv4'' for an IPv4 socket;

                  or ``IPv6'' for an open IPv6 network file - even if its
                  address is IPv4, mapped in an IPv6 address;

                  or ``ax25'' for a Linux AX.25 socket;

                  or ``inet'' for an Internet domain socket;

                  or ``lla'' for a HP-UX link level access file;

                  or ``rte'' for an AF_ROUTE socket;

                  or ``sock'' for a socket of unknown domain;

This is a big mess: even a regular file is represented as REG, VREG, etc., and upper and lower case are mixed. I want to unify them, but that may break downstream users. Lf->ntype is better, but it is mainly for internal use; for example, N_NFS can override N_REGLR.
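
To make the unification idea concrete, here is a minimal C sketch that maps a few of the TYPE strings quoted above onto one enum, ignoring case. The enum, the function names, and the grouping of GREG/VREG/REG as "regular" and GDIR/VDIR/DIR as "directory" are assumptions made for illustration, not how liblsof actually does it:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical unified file type; not part of the real lsof sources. */
enum demo_file_type {
    DEMO_TYPE_UNKNOWN = 0,
    DEMO_TYPE_REGULAR,
    DEMO_TYPE_DIRECTORY,
    DEMO_TYPE_IPV4,
    DEMO_TYPE_IPV6,
    DEMO_TYPE_SOCKET
};

/* Case-insensitive string equality using only standard C. */
static int demo_equal_nocase(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Maps a TYPE column string to the enum; unlisted strings yield UNKNOWN. */
enum demo_file_type demo_parse_type(const char *s) {
    /* Illustrative subset of the strings from the man page excerpt. */
    static const struct {
        const char *name;
        enum demo_file_type type;
    } table[] = {
        { "REG", DEMO_TYPE_REGULAR },    { "VREG", DEMO_TYPE_REGULAR },
        { "GREG", DEMO_TYPE_REGULAR },   { "DIR", DEMO_TYPE_DIRECTORY },
        { "VDIR", DEMO_TYPE_DIRECTORY }, { "GDIR", DEMO_TYPE_DIRECTORY },
        { "IPv4", DEMO_TYPE_IPV4 },      { "IPv6", DEMO_TYPE_IPV6 },
        { "sock", DEMO_TYPE_SOCKET }
    };
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (demo_equal_nocase(s, table[i].name))
            return table[i].type;
    }
    return DEMO_TYPE_UNKNOWN;
}
```

A table like this keeps the legacy strings available for the cli output while library users only see the enum, which is one way to avoid breaking downstream parsers.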

  • lsof_protocol_type is missing UDP and Unix Domain sockets

Yes, working on it. But the Unix domain socket is special: it is currently reported in the TYPE column, at the same level as IPv4/IPv6, because a Unix domain socket can be datagram- or stream-based.

  • lsof_file should probably group the valid fields into a valid mask with uint32_t or uint64_t

Yes, WIP.

  • lsof_file might include a linked list with additional attributes (e.g. socket addresses, paths), thus avoiding regular redesigns when new columns/information have to be added (each attr containing an attr_type and a data section (union?))

Good idea, I can also use this to report selection results. The lsof cli requires this information to report and to set its exit code.

  • DiD: It's usually better to place the count of elements before the array (pointer) to those elements (cf. lsof_proces->files)

Thanks for the suggestions, I will work on it after finishing FreeBSD support.

> As I read the API now, I'd have to create a completely new context each time, causing quite some information (in particular process information) to be collected each time and thrown away again; or am I missing something here?

No, you can call lsof_gather() multiple times with the same context; only the options cannot be changed. This is because liblsof requires some preprocessing steps for the options.
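
The pattern described here (set options once, then gather repeatedly on the same context) can be sketched with self-contained stand-ins. Everything below, including `demo_context`, `demo_select_pid`, `demo_gather`, and the "frozen" flag, is hypothetical and only models the described behavior; it does not reproduce the real lsof_gather() signature or the liblsof internals:

```c
#include <stddef.h>

/* Hypothetical stand-in for a library context; not the real liblsof type. */
struct demo_context {
    int frozen;       /* set once the options have been preprocessed */
    int select_pid;   /* option: which process to inspect */
    unsigned gathers; /* number of snapshots taken so far */
};

enum demo_error { DEMO_OK = 0, DEMO_ERR_FROZEN };

/* Options may only be changed before the first gather. */
enum demo_error demo_select_pid(struct demo_context *ctx, int pid) {
    if (ctx->frozen)
        return DEMO_ERR_FROZEN;
    ctx->select_pid = pid;
    return DEMO_OK;
}

/* Takes one snapshot; the first call stands in for option preprocessing. */
enum demo_error demo_gather(struct demo_context *ctx) {
    ctx->frozen = 1;
    ctx->gathers++;
    return DEMO_OK;
}

/* Polls the same context several times; returns the number of successes. */
unsigned demo_poll(struct demo_context *ctx, unsigned rounds) {
    unsigned ok = 0;
    unsigned i;
    for (i = 0; i < rounds; i++) {
        if (demo_gather(ctx) == DEMO_OK)
            ok++;
    }
    return ok;
}
```

In a monitoring tool, the loop in `demo_poll` would be driven by the tool's own refresh timer, and the context would be kept alive between refreshes instead of being recreated.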

> Regarding platform support: How about extending support to FreeBSD and Darwin next?

Darwin done, FreeBSD/NetBSD/OpenBSD WIP.

@jiegec
Contributor

jiegec commented Jan 31, 2023

Progress update:

  • Darwin, FreeBSD, NetBSD & OpenBSD porting
  • Convert Lf->type to enum

I will focus on liblsof user interface next.
