
Create option for machine readable output for scan #984

Open
keith-turner opened this issue Dec 22, 2017 · 10 comments

@keith-turner
Contributor

It would be useful if the fluo scan command had an option to produce machine-readable output. This could be something like fluo scan -a app1 --json or fluo scan -a app1 --csv. I'm not sure what the best output format is. If we start with something like a --json option, we can always add something like a --csv option later.

@blueshift-brasil
Contributor

Suggestion: refactor org.apache.fluo.core.util.ScanUtil to accept a java.io.OutputStream or a java.io.Writer (or both), to avoid code like this:

System.out.println(sb.toString());

Could this be done in that same thread?
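A minimal sketch of what such a refactoring might look like. The class and method names here are illustrative, not Fluo's actual ScanUtil API:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Illustrative sketch: a scan helper that writes to a caller-supplied
// Writer instead of printing directly to System.out. Names are
// hypothetical, not the real ScanUtil API.
public class ScanOutputSketch {

  // Writes one formatted scan entry to the given Writer.
  static void writeEntry(Writer out, String row, String cf, String cq, String val)
      throws IOException {
    out.write(row + " " + cf + " " + cq + "\t" + val + "\n");
  }

  public static void main(String[] args) throws IOException {
    // Callers can pass a writer over System.out, a file, or any other sink.
    StringWriter sink = new StringWriter();
    writeEntry(sink, "row1", "fam", "qual", "val1");
    System.out.print(sink);
  }
}
```

Passing the sink in also makes the formatting logic testable without capturing stdout.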

@keith-turner
Contributor Author

keith-turner commented Feb 1, 2018

I think passing in something like an OutputStream is much cleaner. Would need to preserve the checkError() behavior where the scan breaks when the output stream is closed. Looking into this I realized the current code is inefficient because checkError() also flushes. It would be more efficient to use something like an OutputStream and when it throws an IOException just stop the scan. Or if using a PrintStream, then checkError() could be called less frequently like every 100 or 1000 lines.
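A rough sketch of the "call checkError() less frequently" idea. The scan loop and entry source here are stand-ins for illustration, not Fluo code:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.List;

// Sketch: only call checkError() (which also flushes) every N lines
// instead of on every line. The List stands in for a real scan.
public class CheckErrorSketch {

  static final int CHECK_INTERVAL = 1000;

  // Returns the number of lines actually written before stopping.
  static int printEntries(PrintStream out, List<String> entries) {
    int count = 0;
    for (String entry : entries) {
      out.println(entry);
      count++;
      // checkError() flushes, so calling it on every line is costly;
      // a periodic check keeps close-detection while staying cheap.
      if (count % CHECK_INTERVAL == 0 && out.checkError()) {
        break; // output stream is broken/closed; stop the scan
      }
    }
    return count;
  }

  public static void main(String[] args) {
    PrintStream out = new PrintStream(new ByteArrayOutputStream());
    System.out.println(printEntries(out, List.of("a", "b", "c")));
  }
}
```

The trade-off is that up to CHECK_INTERVAL entries may be formatted after the stream is closed before the scan notices.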

@blueshift-brasil
Contributor

I'm working on this. Do you think we could use the commons-csv lib?
I believe we could use it for both the current format (TSV-like) and for the CSV format.

I'm adding some properties to fluo-app.properties for the CSV format:

## Fluo Scan properties
## -----------------
## Properties to export the scan result to CSV format.
fluo.scan.csv.delimiter = ;
fluo.scan.csv.header = true
fluo.scan.csv.quote = "
# Possible values: ALL, ALL_NON_NULL, MINIMAL, NON_NUMERIC and NONE
# @see org.apache.commons.csv.QuoteMode
fluo.scan.csv.quoteMode = ALL
fluo.scan.csv.comment = #
fluo.scan.csv.escape = \
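For what it's worth, resolving those properties with fallbacks could look roughly like this. The property names come from the snippet above; the default values chosen here are just examples, not decided behavior:

```java
import java.util.Properties;

// Sketch: resolve the fluo.scan.csv.* properties with defaults, so an
// unset property falls back to a sensible library default.
public class CsvConfigSketch {

  static String delimiter(Properties props) {
    return props.getProperty("fluo.scan.csv.delimiter", ",");
  }

  static boolean header(Properties props) {
    return Boolean.parseBoolean(props.getProperty("fluo.scan.csv.header", "true"));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("fluo.scan.csv.delimiter", ";");
    System.out.println(delimiter(props)); // configured value
    System.out.println(header(props));    // falls back to default
  }
}
```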

@blueshift-brasil
Contributor

In the distribution module's fetch.sh file, is there any reason for this dependency to be pinned at this version?

download com.google.code.gson:gson:jar:2.2.4

In Fluo's pom.xml we are on 2.8.0. Can I upgrade it for use in the --json scan?

@blueshift-brasil
Contributor

Sample of CSV file:

[root@6cf4e94e7248 share]# fluo scan -a myapp --csv
"ROW";"COLUMN_FAMILY";"COLUMN_QUALIFIER";"COLUMN_VISIBILITY";"VALUE"
"HISTORICO:123:1:10:100:17bc30e1-4c55-4037-9bb8-032b2c422935";"cadastral";"DAT_NSC";"";"111111111"
"HISTORICO:123:1:10:100:17bc30e1-4c55-4037-9bb8-032b2c422935";"cadastral";"NOM_RAZ_SOC";"";"yyy"
"HISTORICO:123:1:10:100:18f73b40-d14c-4717-83cf-4a63f4012e9c";"cadastral";"DAT_NSC";"";"111111111"
"HISTORICO:123:1:10:100:18f73b40-d14c-4717-83cf-4a63f4012e9c";"cadastral";"NOM_RAZ_SOC";"";"yyy ;"
"HISTORICO:123:1:10:100:5a30271c-513a-47f3-86fd-dbf0edb98f93";"cadastral";"DAT_NSC";"";"111111111"
"HISTORICO:123:1:10:100:5a30271c-513a-47f3-86fd-dbf0edb98f93";"cadastral";"NOM_RAZ_SOC";"";"yyy"
"HISTORICO:123:1:10:100:ddba1524-05c6-4bbe-89db-a85a8bca6b22";"cadastral";"DAT_NSC";"";"111111111"
"HISTORICO:123:1:10:100:ddba1524-05c6-4bbe-89db-a85a8bca6b22";"cadastral";"NOM_RAZ_SOC";"";"yyy"
"HISTORICO:123:1:10:100:fd36a744-54de-4377-9396-3043ac01064d";"cadastral";"DAT_NSC";"";"111111111"
"HISTORICO:123:1:10:100:fd36a744-54de-4377-9396-3043ac01064d";"cadastral";"NOM_RAZ_SOC";"";"yyy"

JSON file:

[root@6cf4e94e7248 share]# fluo scan -a myapp --json
{"ROW":"HISTORICO:123:1:10:100:0a5f7383-58ec-48e1-8a4a-545bb99ace9f","COLUMN_FAMILY":"cadastral","COLUMN_QUALIFIER":"cadastralDAT_NSC","COLUMN_VISIBILITY":"","VALUE":"111111111"}
{"ROW":"HISTORICO:123:1:10:100:0a5f7383-58ec-48e1-8a4a-545bb99ace9f","COLUMN_FAMILY":"cadastral","COLUMN_QUALIFIER":"cadastralNOM_RAZ_SOC","COLUMN_VISIBILITY":"","VALUE":"yyy"}
{"ROW":"HISTORICO:123:1:10:100:60aabb11-cc1a-4bdd-9c12-29545ceae5ea","COLUMN_FAMILY":"cadastral","COLUMN_QUALIFIER":"cadastralDAT_NSC","COLUMN_VISIBILITY":"","VALUE":"111111111"}
{"ROW":"HISTORICO:123:1:10:100:60aabb11-cc1a-4bdd-9c12-29545ceae5ea","COLUMN_FAMILY":"cadastral","COLUMN_QUALIFIER":"cadastralNOM_RAZ_SOC","COLUMN_VISIBILITY":"","VALUE":"yyy"}
{"ROW":"HISTORICO:123:1:10:100:d4cc616a-629a-4361-b7ee-0efa007a400b","COLUMN_FAMILY":"cadastral","COLUMN_QUALIFIER":"cadastralDAT_NSC","COLUMN_VISIBILITY":"","VALUE":"111111111"}
{"ROW":"HISTORICO:123:1:10:100:d4cc616a-629a-4361-b7ee-0efa007a400b","COLUMN_FAMILY":"cadastral","COLUMN_QUALIFIER":"cadastralNOM_RAZ_SOC","COLUMN_VISIBILITY":"","VALUE":"yyy"}

@keith-turner
Contributor Author

My slight preference would be to make the csv options command line options, because I think this would give more predictable results. However, I can see the convenience of putting the options in the config file, so I am uncertain which is best.

If doing command line options, could be something like the following.

fluo scan -a app1 --csv --csv-delimiter '"'

I would replace COLUMN_FAMILY, COLUMN_QUALIFIER, and COLUMN_VISIBILITY with FAMILY, QUALIFIER, and VISIBILITY in json and csv output to make it shorter.

I think using commons csv is fine.

For the json dependency upgrade, we need to make sure it does not cause issues with the Accumulo and Hadoop libraries. If it does not, then it's ok to upgrade.
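For illustration, a JSON line with the shortened field names might be assembled like this. This deliberately skips gson and does no escaping of special characters; it only shows the intended shape of each output line:

```java
// Sketch: build one JSON line per scan entry using the shortened field
// names (FAMILY, QUALIFIER, VISIBILITY). Real code would use gson and
// handle escaping; this only illustrates the shape.
public class JsonLineSketch {

  static String toJsonLine(String row, String fam, String qual, String vis, String val) {
    return String.format(
        "{\"ROW\":\"%s\",\"FAMILY\":\"%s\",\"QUALIFIER\":\"%s\",\"VISIBILITY\":\"%s\",\"VALUE\":\"%s\"}",
        row, fam, qual, vis, val);
  }

  public static void main(String[] args) {
    System.out.println(toJsonLine("r1", "cadastral", "DAT_NSC", "", "111111111"));
  }
}
```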

@blueshift-brasil
Contributor

About the config on the command line: I believe it's possible to keep both. It's a good idea!
The command line could override the property config. If the user doesn't pass any parameter on the command line or in the property file, we assume the default behavior of the component (commons-csv).

About the short names. Sounds good! Done!

About the dependency: is it common to have two versions of the same component like this?
How can I properly test whether the upgrade works well? Any ideas?
Detail: with version 2.2.4 it was not possible to write the JSON file.

@keith-turner
Contributor Author

I think it's ok to update gson to 2.8.0 in the fetch script. I was looking at mvn dependency:tree, and 2.8.0 is the version that Hadoop 2.6 depends on. I am not sure why fetch.sh has a different version than the pom.

@blueshift-brasil
Contributor

blueshift-brasil commented Feb 21, 2018

Now we have 3 options to configure the scan command:

  • Based on fluo-app.properties
  • Based on --csv-* parameters
  • And based on the -o override parameter

[root@9bb10c2c941e share]# fluo scan -a myapp --csv
[root@9bb10c2c941e share]# fluo scan -a myapp --csv --csv-delimiter '|'
[root@9bb10c2c941e share]# fluo scan -a myapp --csv -o fluo.scan.csv.delimiter='|'
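The precedence among those three sources could be resolved along these lines. The ordering assumed here (--csv-* flag first, then -o override, then fluo-app.properties, then the built-in default) is my reading of the discussion, not confirmed behavior:

```java
import java.util.Map;

// Sketch: pick the first value present, in assumed precedence order:
// explicit --csv-* flag, then -o override, then the properties file,
// then the built-in default.
public class PrecedenceSketch {

  static String resolve(String cliFlag, Map<String, String> oOverrides,
      Map<String, String> appProps, String key, String dflt) {
    if (cliFlag != null) {
      return cliFlag;
    }
    if (oOverrides.containsKey(key)) {
      return oOverrides.get(key);
    }
    return appProps.getOrDefault(key, dflt);
  }

  public static void main(String[] args) {
    Map<String, String> o = Map.of("fluo.scan.csv.delimiter", "|");
    Map<String, String> app = Map.of("fluo.scan.csv.delimiter", ";");
    // -o wins over the properties file when no --csv-delimiter flag is given
    System.out.println(resolve(null, o, app, "fluo.scan.csv.delimiter", ","));
  }
}
```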

@keith-turner
Contributor Author

I forgot about -o.
