TArray constructor from DataFrame (DataFrames.jl) and from TimeArray (TimeSeries.jl) #4

Closed
femtotrader opened this issue May 21, 2016 · 6 comments · Fixed by #7

Comments

@femtotrader
Collaborator

Hello,

It would be nice if you could tell me how to construct a TArray from a DataFrame (DataFrames.jl) and from a TimeArray (TimeSeries.jl), because I'm considering adding JuliaTS.jl support to TALib.jl (see femtotrader/TALib.jl#6).

Here is some code to get a sample DataFrame.
The sample data can be downloaded from https://github.com/femtotrader/TALib.jl/blob/master/test/ford_2012.csv

using DataFrames
filename = "test/ford_2012.csv"
dfOHLCV = readtable(filename)
dfOHLCV[:Date] = Date(dfOHLCV[:Date])

and for a sample TimeArray

using TimeSeries
taOHLCV = readtimearray(filename)

Maybe such constructors could be added to JuliaTS.jl (without adding these packages as dependencies)?

Kind regards

@tanmaykm
Owner

Constructing TArray from DataFrame or TimeArray would be something like:

using DataFrames
using JuliaTS
using TimeSeries

# read as dataframe
dfOHLCV = readtable("ford_2012.csv");
dfOHLCV[:Date] = Date(dfOHLCV[:Date]);

# read as timeseries
tsOHLCV = readtimearray("ford_2012.csv");

# dataframe to TArray
ta = TArray((:Date,), [n=>dfOHLCV[n] for n in names(dfOHLCV)]...)

# timeseries to TArray
ta = TArray((:Date,), :Date=>tsOHLCV.timestamp, [symbol(n)=>tsOHLCV[n].values for n in colnames(tsOHLCV)]...)

Maybe Requires.jl would help add such conversion functions without explicit package dependencies.

@tanmaykm
Owner

@femtotrader, what do you think of an alternative interface for time series, as in this notebook: https://github.com/tanmaykm/notebooks/blob/master/stocks/demo2.ipynb ?

It is somewhat similar to Python's xarray. The backing array can be made to support NDSparseData. Would this be a more convenient way to explore data?

The implementation is in my fork here: https://github.com/tanmaykm/AxisArrays.jl/tree/tan

@femtotrader
Collaborator Author

femtotrader commented May 22, 2016

Thanks for suggesting the Requires.jl package. I didn't know about it.

I don't feel comfortable enough with JuliaTS / AxisArrays yet, so I can't help with API usage for now, but I will once I understand it better.

Python xarray (formerly xray) is a very interesting package, and having a Julia alternative would be a great feature.

A 3D (Panel-like) data structure is a great feature to have. I will use it in https://github.com/femtotrader/DataReaders.jl (to store, for example, OHLCV values for several stocks). https://github.com/femtotrader/TALib.jl might also be able to support this kind of structure and apply the same indicator to several stocks at once.

Maybe a function to read CSV (and XLS, XLSX) files should be added?
For now I don't see any way other than first reading into a DataFrame (or a TimeArray) and then converting to a TArray.

@femtotrader
Collaborator Author

femtotrader commented May 22, 2016

julia> ta = TArray((:Date,), [n=>dfOHLCV[n] for n in names(dfOHLCV)]...)
julia> ta
TArray 250x6 Tuple{Date} => Tuple{Float64,Float64,Int64,Float64,Float64}
 (:Date,) => (:Close,:High,:Volume,:Low,:Open)
 (2012-01-03,) => (11.13,11.25,45709900,10.99,11.0)
 (2012-01-04,) => (11.3,11.53,79725200,11.07,11.15)
 (2012-01-05,) => (11.59,11.63,67877500,11.24,11.33)
 (2012-01-06,) => (11.71,11.8,59840700,11.52,11.74)
 (2012-01-09,) => (11.8,11.95,53981500,11.7,11.83)
 (2012-01-10,) => (11.8,12.05,121750600,11.63,12.0)
 (2012-01-11,) => (12.07,12.18,63806000,11.65,11.74)
 (2012-01-12,) => (12.14,12.18,48687700,11.89,12.16)
 (2012-01-13,) => (12.04,12.08,46366700,11.84,12.01)
 (2012-01-17,) => (12.02,12.26,44398400,11.96,12.2)
 
 (2012-12-17,) => (11.39,11.41,46983300,11.14,11.16)
 (2012-12-18,) => (11.67,11.68,61810400,11.4,11.48)
 (2012-12-19,) => (11.73,11.85,54884700,11.62,11.79)
 (2012-12-20,) => (11.77,11.8,47750100,11.58,11.74)
 (2012-12-21,) => (11.86,11.86,94489300,11.47,11.55)
 (2012-12-24,) => (12.4,12.4,91734900,11.67,11.67)
 (2012-12-26,) => (12.79,12.79,140331900,12.31,12.31)
 (2012-12-27,) => (12.76,12.81,108315100,12.36,12.79)
 (2012-12-28,) => (12.87,12.88,95668600,12.52,12.55)
 (2012-12-31,) => (12.95,13.08,106908900,12.76,12.88)

julia> ta = TArray((:Date,), :Date=>tsOHLCV.timestamp, [symbol(n)=>tsOHLCV[n].values for n in colnames(tsOHLCV)]...)
TArray 250x6 Tuple{Date} => Tuple{Float64,Float64,Float64,Float64,Float64}
 (:Date,) => (:Close,:High,:Volume,:Low,:Open)
 (2012-01-03,) => (11.13,11.25,4.57099e7,10.99,11.0)
 (2012-01-04,) => (11.3,11.53,7.97252e7,11.07,11.15)
 (2012-01-05,) => (11.59,11.63,6.78775e7,11.24,11.33)
 (2012-01-06,) => (11.71,11.8,5.98407e7,11.52,11.74)
 (2012-01-09,) => (11.8,11.95,5.39815e7,11.7,11.83)
 (2012-01-10,) => (11.8,12.05,1.217506e8,11.63,12.0)
 (2012-01-11,) => (12.07,12.18,6.3806e7,11.65,11.74)
 (2012-01-12,) => (12.14,12.18,4.86877e7,11.89,12.16)
 (2012-01-13,) => (12.04,12.08,4.63667e7,11.84,12.01)
 (2012-01-17,) => (12.02,12.26,4.43984e7,11.96,12.2)
 
 (2012-12-17,) => (11.39,11.41,4.69833e7,11.14,11.16)
 (2012-12-18,) => (11.67,11.68,6.18104e7,11.4,11.48)
 (2012-12-19,) => (11.73,11.85,5.48847e7,11.62,11.79)
 (2012-12-20,) => (11.77,11.8,4.77501e7,11.58,11.74)
 (2012-12-21,) => (11.86,11.86,9.44893e7,11.47,11.55)
 (2012-12-24,) => (12.4,12.4,9.17349e7,11.67,11.67)
 (2012-12-26,) => (12.79,12.79,1.403319e8,12.31,12.31)
 (2012-12-27,) => (12.76,12.81,1.083151e8,12.36,12.79)
 (2012-12-28,) => (12.87,12.88,9.56686e7,12.52,12.55)
 (2012-12-31,) => (12.95,13.08,1.069089e8,12.76,12.88)

Column order is not preserved.

julia> names(dfOHLCV)
6-element Array{Symbol,1}:
 :Date
 :Open
 :High
 :Low
 :Close
 :Volume

julia> colnames(tsOHLCV)
5-element Array{UTF8String,1}:
 "Open"
 "High"
 "Low"
 "Close"
 "Volume"

but

julia> ta.valnames
(:Close,:High,:Volume,:Low,:Open)

Maybe an OrderedDict could be used.
See a similar issue here: JuliaData/DataFrames.jl#950

I also noticed that the Volume type (Int64) is not preserved. Volume is converted to Float64 when going from a TimeArray (TimeSeries.jl) to a TArray.

@tanmaykm
Owner

TimeArray stores all columns in the same array, so it promoted the Int64 Volume column to Float64. A DataFrame can hold differently typed columns, though.
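
The promotion can be reproduced with Base Julia alone. This is only a minimal sketch: the numbers are the first two Close and Volume values from the output earlier in this thread, and the variable names (`closecol`, `volumecol`, `mat`, `columns`) are made up for illustration. It does not use TimeSeries.jl or DataFrames.jl itself.

```julia
# First two Close and Volume values from the sample output in this thread.
closecol  = Float64[11.13, 11.3]
volumecol = Int64[45709900, 79725200]

# Putting both columns into one matrix forces a single element type,
# so the Int64 column is promoted to Float64.
mat = hcat(closecol, volumecol)
println(eltype(mat))

# Keeping each column as its own vector (illustrated here with a NamedTuple)
# leaves the Int64 type of Volume untouched.
columns = (Close = closecol, Volume = volumecol)
println(eltype(columns.Volume))
```

Any container that keeps one vector per column avoids the promotion, which is why the DataFrame route keeps Volume as Int64.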

I think column order is not preserved because of the use of setdiff here:

valnames = tuple(setdiff([c.first for c in colpairs], keynames)...)

Will push a fix. Thanks for pointing it out.
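
One possible way to make the value names depend only on the order of the input pairs is to filter them in a comprehension. This is a hedged sketch, not necessarily the fix that was pushed: `colpairs` and `keynames` mirror the names in the line quoted above, the column data are the first two rows of the sample output in this thread, and the code is written for a current Julia version (hence `using Dates`).

```julia
using Dates

# Key column names and column pairs, in the order the caller supplies them.
keynames = (:Date,)
colpairs = [
    :Date   => [Date(2012, 1, 3), Date(2012, 1, 4)],
    :Open   => [11.0, 11.15],
    :High   => [11.25, 11.53],
    :Low    => [10.99, 11.07],
    :Close  => [11.13, 11.3],
    :Volume => [45709900, 79725200],
]

# Keep every column name that is not a key, preserving the input order.
valnames = tuple([c.first for c in colpairs if !(c.first in keynames)]...)
println(valnames)
```

The result follows the order of `colpairs`, so it lines up with what `names(dfOHLCV)` reports as long as the pairs reach this line in that order.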

@femtotrader
Collaborator Author

femtotrader commented May 22, 2016

Thanks @tanmaykm for your help

2 participants