Changing ps() output format #71

Closed
YoannPa opened this issue Nov 18, 2019 · 6 comments

YoannPa commented Nov 18, 2019

Hi,
I was wondering how one could change the output format when using the ps() function to generate the table of processes? When using the ps command in a classic terminal, you have an option to format the output, displaying the metrics you are interested in.
In my specific case, I would like the ps() table to include the %CPU and ELAPSED TIME metrics for all processes.

I already made a function that retrieves these metrics ( https://github.com/mathosi/cluster_check ), but since a dedicated package already exists, I was wondering whether this feature could be added here.
Thank you in advance for your help.
Best,

@gaborcsardi
Member

> I was wondering how one could change the output format when using the ps() function to generate the table of processes?

But it already does that, no? It returns a data frame / tibble with all the processes.

> when using the ps command in a classic terminal you have an option to format the output

You can write a function that formats the output differently.

> displaying the metrics you are interested in

In R it is pretty easy to select some columns of a data frame, that's why it is up to the end user to do it.
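
A minimal sketch of that column selection follows. The helper name select_metrics is hypothetical and not part of the ps package, and the column names used in the example call (pid, name, created) are assumed to be present in the data frame returned by ps::ps().

```r
# Hypothetical helper (not part of the ps package): keep only the
# requested columns of a process table, in the requested order.
select_metrics <- function(procs, metrics) {
  unknown <- setdiff(metrics, names(procs))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  procs[, metrics, drop = FALSE]
}

# Usage with the real process table, only if the ps package is installed.
# The column names below are an assumption about the ps::ps() output.
if (requireNamespace("ps", quietly = TRUE)) {
  print(head(select_metrics(ps::ps(), c("pid", "name", "created"))))
}
```

The order of the metrics vector determines the order of the columns in the result, which mirrors what the -o option does for the command-line ps.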

That said, I am open to adding a different print method that is more similar to regular ps x output. For that, you'll need to open an issue with a sketch of how the output should look.

> In my specific case, I would like the ps() table to include the %CPU and ELAPSED TIME metrics for all processes.

ps::ps() returns created, the time at which each process was created, so elapsed time is simply Sys.time() - created.
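
As a rough sketch of that calculation, the hypothetical helper below (add_elapsed is an illustrative name, not a ps function) adds an elapsed column to any data frame that has a POSIXct created column, which is what ps::ps() is described as returning above.

```r
# Hypothetical helper (not part of the ps package): add the elapsed time
# since process creation, in seconds, as a new column.
add_elapsed <- function(procs, now = Sys.time()) {
  procs$elapsed <- difftime(now, procs$created, units = "secs")
  procs
}

# Usage with the real process table, only if the ps package is installed.
if (requireNamespace("ps", quietly = TRUE)) {
  procs <- add_elapsed(ps::ps())
  print(head(procs[, c("pid", "created", "elapsed")]))
}
```

Passing now explicitly keeps the result reproducible; by default it is the time of the call.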

ps does not calculate %CPU currently, because it is somewhat cumbersome. If you want to calculate this, open an(other) issue for it.
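
One possible approach, sketched here under assumptions rather than as how ps would implement it: take two snapshots of the process table and divide the CPU time consumed between them by the wall-clock interval. The helper name cpu_percent is hypothetical, and the sketch assumes that ps::ps() reports cumulative CPU times in seconds in columns named user and system.

```r
# Hypothetical helper (not part of the ps package): estimate %CPU from two
# snapshots of a process table taken `interval` seconds apart.
cpu_percent <- function(before, after, interval) {
  # Match on pid and creation time, so that a recycled pid is not
  # mistaken for the same process.
  keys <- c("pid", "created")
  m <- merge(
    before[, c(keys, "user", "system")],
    after[, c(keys, "name", "user", "system")],
    by = keys, suffixes = c("_before", "_after")
  )
  used <- (m$user_after + m$system_after) - (m$user_before + m$system_before)
  data.frame(pid = m$pid, name = m$name, cpu_percent = 100 * used / interval)
}

# Usage with the real process table, only if the ps package is installed.
if (requireNamespace("ps", quietly = TRUE)) {
  t0 <- Sys.time()
  before <- ps::ps()
  Sys.sleep(1)
  after <- ps::ps()
  interval <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
  usage <- cpu_percent(before, after, interval)
  print(head(usage[order(-usage$cpu_percent), ]))
}
```

Processes that start or exit between the two snapshots are dropped by the merge, and a multi-threaded process can exceed 100 on a multi-core machine. The command-line ps may define %CPU differently, so the numbers need not match.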

> I already made a function that retrieves these metrics ( mathosi/cluster_check ), but since a dedicated package already exists, I was wondering whether this feature could be added here.

Yeah, unfortunately, I don't think the approach of calling the external ps program is portable even across various Unix systems, and it will surely not work on Windows, where ps is not available.

Author

YoannPa commented Nov 18, 2019

> But it already does that, no? It returns a data frame / tibble with all the processes.

In the Linux terminal version of ps you can specify exactly the metrics you want, in the order you want. I think this feature deserves an additional function of its own, one that lets the user pass a string of output codes similar to those of the classic ps command, naming the metrics wanted and their order.
It is not so simple to do; I haven't yet implemented this option myself in the repository I linked in my previous message.
Or, another way to think about it would be to generate a table with all metrics by default, and then let the user select and order the columns of the data frame as they wish. This second way would better fit what you expect the user to do, and I would be totally fine with that.

> ps does not calculate %CPU currently, because it is somewhat cumbersome. If you want to calculate this, open an(other) issue for it.

The way I designed my function ps.to.df(), it simply calls the classic ps command internally via system().
This way you don't have to calculate the %CPU usage; you just grab what the classic ps has already calculated. Maybe I am wrong to see things this way, but since a calculation method already exists in ps, and ps is available by default on most Linux distributions, I thought it made sense to use it directly.

> Yeah, unfortunately, I don't think the approach of calling the external ps program is portable even across various Unix systems, and it will surely not work on Windows, where ps is not available.

Yes, this approach would be limited to UNIX systems, but I think most of them have ps. On Windows the equivalent would be tasklist, I guess.

I saw that one of the functions in your package is called ps_cmdline().
It would be nice if this function could also have an option to accept a ps command line directly as a character string.
Something like this: ps_cmdline(cmd="-C rsession -o %cpu,%mem,pid")
Or add a new function to do such a thing. At least Linux users could then pass a command line directly and get the output as a data.frame.

Member

gaborcsardi commented Nov 18, 2019

> Or, another way to think about it would be to generate a table with all metrics by default,

Right, I think that's the R way to do it. We can still allow customization, though.

> The way I designed my function ps.to.df(), it simply calls the classic ps command internally via system().

That's fine for your script, but it is not for a tool like the ps R package, which is expected to work on macOS and Windows as well, at the very least. It should also work if the ps program is not available, e.g. on some small Docker containers.

Your script already fails on macOS, because ps has different options there:

❯ source("https://raw.githubusercontent.com/mathosi/cluster_check/master/ps_to_df.R")
❯ ps.to.df()
ps: illegal option -- -
usage: ps [-AaCcEefhjlMmrSTvwXx] [-O fmt | -o fmt] [-G gid[,gid...]]
          [-g grp[,grp...]] [-u [uid,uid...]]
          [-p pid[,pid...]] [-t tty[,tty...]] [-U user[,user...]]
       ps [-L]
[1] perCPU  perMEM  PID     PPID    USER    COMMAND STARTED ELAPSED STAT
<0 rows> (or 0-length row.names)
Warning message:
In system(command = cmd, intern = TRUE) :
  running command 'ps -A --no-headers -o %cpu:5,%mem:5,pid:7,ppid:7,user:36,comm:15,lstart:30,etime:30,stat:5 --sort=-%cpu' had status 1

> Something like this: ps_cmdline(cmd="-C rsession -o %cpu,%mem,pid")

I think you missed what ps_cmdline() does. It returns the command line of a process, e.g.:

❯ ps::ps_cmdline(ps::ps_handle())
[1] "/Library/Frameworks/R.framework/Resources/bin/exec/R"

The ps R package does not use the ps program at all.

Author

YoannPa commented Nov 18, 2019

> That's fine for your script, but it is not for a tool like the ps R package, which is expected to work on macOS and Windows as well, at the very least. It should also work if the ps program is not available, e.g. on some small Docker containers.
>
> Your script already fails on macOS, because ps has different options there

Yes, you are right. Honestly, I wasn't even aware that macOS used ps. Then I guess another approach wouldn't fit your package's expectations. One should just provide a way to do it on Linux only.

> I think you missed what ps_cmdline() does.

Yes, sorry about that. I was thinking that a function with a similar name could perhaps be created to do what I described.

Thank you for your time and your answer, that's really nice of you.
Let me know if you plan to give more flexibility to the content of the ps() table.
Best,
Yoann.

YoannPa closed this as completed Nov 18, 2019
@gaborcsardi
Member

> Let me know if you plan to give more flexibility to the content of the ps() table.

Yes, as I said, please open other issues. Thanks.

@gaborcsardi
Member

#72
#73
