Releases: DyfanJones/RAthena
RAthena 2.6.1
RAthena 2.6.0
Bug Fix:
- Delay Python to R conversion to prevent a 64-bit integer being mapped to R's base 32-bit integer (#168), which caused the following bug in the Data scanned info message. Thanks to @juhoautio for identifying the issue.
INFO: (Data scanned: -43839744 Bytes)
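As an illustration of how a byte count can turn negative like this, here is a minimal Python sketch of a 64-bit value being squeezed into a signed 32-bit integer. The input value is an assumption: it is simply the smallest non-negative count that wraps to the number in the message above, not a figure taken from the original report, and the sketch does not reproduce RAthena's actual conversion path.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


# Hypothetical byte count (about 4.25 GB), chosen so that it wraps to the
# value shown in the info message; the real query size is not known.
data_scanned = 4_251_127_552

print(data_scanned > INT32_MAX)  # True: too large for 32 bits
print(as_int32(data_scanned))    # -43839744
```

Keeping the value as a Python integer until it is formatted avoids the wrap, which is in the spirit of the delayed conversion described in the fix.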
Feature:
- Add clear_s3_resource parameter to RAthena_options to prevent the AWS Athena output AWS S3 resource being cleared up by dbClearResult (#168). Thanks to @juhoautio for the request.
- Support extra boto3 parameters for the boto3.session.Session class and client method (#169).
- Support endpoint_override parameter to allow default endpoints for each service to be overridden accordingly (#169). Thanks to @aoyh for the request and for checking the package in development.
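On the boto3 side, a per-service endpoint override could look roughly like the sketch below. The helper names `resolve_endpoint` and `make_client` are hypothetical, and the rule that a bare string overrides only the Athena endpoint is an assumption made for illustration; this is not RAthena's internal code. The boto3 calls used (`boto3.session.Session(...)` and `Session.client(..., endpoint_url=...)`) are the standard ones.

```python
from typing import Optional, Union

EndpointOverride = Union[None, str, dict]


def resolve_endpoint(service: str, endpoint_override: EndpointOverride) -> Optional[str]:
    """Return the endpoint URL to use for `service`, or None for boto3's default."""
    if endpoint_override is None:
        return None
    if isinstance(endpoint_override, str):
        # Assumption: a bare string overrides the Athena endpoint only.
        return endpoint_override if service == "athena" else None
    # A dict maps service names (e.g. "athena", "s3", "glue") to URLs.
    return endpoint_override.get(service)


def make_client(service: str, endpoint_override: EndpointOverride = None, **session_kwargs):
    """Create a boto3 client, forwarding extra keyword arguments to the Session."""
    import boto3  # imported lazily so resolve_endpoint works without boto3 installed

    session = boto3.session.Session(**session_kwargs)
    return session.client(service, endpoint_url=resolve_endpoint(service, endpoint_override))
```

For example, `make_client("s3", {"s3": "http://localhost:9000"}, region_name="eu-west-1")` would point only the S3 client at a local endpoint (the URL and region here are placeholders) and leave other services on their defaults.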
RAthena 2.5.1
Bug Fix:
- Fixed unit test helper function test_data to use size parameter explicitly.
RAthena 2.5.0
Feature:
- Allow all information messages to be turned off (noctua # 178).
- Allow RAthena_options to change 1 parameter at a time without affecting other pre-configured settings.
- Return warning message for deprecated retry_quiet parameter in RAthena_options function.
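The "change 1 parameter at a time" behaviour can be pictured with a small Python sketch of a package-level options store. This is an analogy only, not RAthena's R implementation: the function `set_options` is hypothetical, and the default values are placeholders chosen for the example.

```python
# Package-level settings; the defaults here are illustrative placeholders.
_options = {"cache_size": 0, "clear_s3_resource": True, "unload": False}


def set_options(**changes):
    """Update only the supplied options and return a copy of the current settings."""
    unknown = set(changes) - set(_options)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    _options.update(changes)  # untouched keys keep their earlier values
    return dict(_options)


print(set_options(cache_size=1))  # only cache_size changes
print(set_options(unload=True))   # cache_size stays at 1
```

The point of the design is that a call sets only what it names, so an earlier `cache_size` setting survives a later call that touches only `unload`.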
RAthena 2.4.0
Feature:
- Add support for dbplyr 2.0.0 backend API.
- Add method to set unload on a package level to allow dplyr to benefit from AWS Athena unload methods (noctua # 174).
Bug Fix:
- Ensure dbGetQuery, dbExecute, dbSendQuery, dbSendStatement work on older versions of R (noctua # 170). Thanks to @tyner for identifying the issue.
- Caching would fail when statement wasn't a character (noctua # 171). Thanks to @ramnathv for identifying the issue.
v-2.3.0
Feature:
- Add support for AWS Athena UNLOAD (noctua # 160). This is to take advantage of the read/write speed parquet has to offer.
import awswrangler as wr
import getpass
bucket = getpass.getpass()
path = f"s3://{bucket}/data/"
if "awswrangler_test" not in wr.catalog.databases().values:
    wr.catalog.create_database("awswrangler_test")

cols = ["id", "dt", "element", "value", "m_flag", "q_flag", "s_flag", "obs_time"]

# Read 10 files from the 1890 decade (~1GB)
df = wr.s3.read_csv(
    path="s3://noaa-ghcn-pds/csv/189",
    names=cols,
    parse_dates=["dt", "obs_time"])

wr.s3.to_parquet(
    df=df,
    path=path,
    dataset=True,
    mode="overwrite",
    database="awswrangler_test",
    table="noaa"
);
wr.catalog.table(database="awswrangler_test", table="noaa")
library(DBI)
con <- dbConnect(RAthena::athena())
# Query ran using CSV output
system.time({
  df = dbGetQuery(con, "SELECT * FROM awswrangler_test.noaa")
})
# Info: (Data scanned: 80.88 MB)
# user system elapsed
# 57.004 8.430 160.567
RAthena::RAthena_options(cache_size = 1)
# Query ran using UNLOAD Parquet output
system.time({
  df = dbGetQuery(con, "SELECT * FROM awswrangler_test.noaa", unload = TRUE)
})
# Info: (Data scanned: 80.88 MB)
# user system elapsed
# 21.622 2.350 39.232
# Query ran using cache
system.time({
  df = dbGetQuery(con, "SELECT * FROM awswrangler_test.noaa", unload = TRUE)
})
# Info: (Data scanned: 80.88 MB)
# user system elapsed
# 13.738 1.886 11.029
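
To put the three timings above side by side, the short Python snippet below computes the speed-up in elapsed time relative to the CSV run. It uses only the elapsed values printed in the listing; a single run on one machine is not a benchmark, so the ratios are indicative at best.

```python
# Elapsed seconds copied from the system.time() output above.
elapsed = {"csv": 160.567, "unload_parquet": 39.232, "cached": 11.029}


def speedup(baseline: str, other: str) -> float:
    """How many times faster `other` finished compared with `baseline`."""
    return elapsed[baseline] / elapsed[other]


print(f"UNLOAD vs CSV: {speedup('csv', 'unload_parquet'):.2f}x")  # 4.09x
print(f"cache vs CSV:  {speedup('csv', 'cached'):.2f}x")          # 14.56x
```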
v-2.2.0
Bug Fix:
- sql_translate_env now correctly translates the R functions quantile and median to their AWS Athena equivalents (noctua # 153). Thanks to @ellmanj for spotting the issue.
Feature:
- Support AWS Athena timestamp with time zone data type.
- Properly support data type list when converting data to AWS Athena SQL format.
library(data.table)
library(DBI)
x = 5
dt = data.table(
  var1 = sample(LETTERS, size = x, replace = TRUE),
  var2 = rep(list(list("var3"= 1:3, "var4" = list("var5"= letters[1:5]))), x)
)
con <- dbConnect(RAthena::athena())
#> Version: 2.2.0
sqlData(con, dt)
# Registered S3 method overwritten by 'jsonify':
# method from
# print.json jsonlite
# Info: Special characters "\t" has been converted to " " to help with Athena reading file format tsv
# var1 var2
# 1: 1 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}
# 2: 2 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}
# 3: 3 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}
# 4: 4 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}
# 5: 5 {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}
#> Version: 2.1.0
sqlData(con, dt)
# Info: Special characters "\t" has been converted to " " to help with Athena reading file format tsv
# var1 var2
# 1: 1 1:3|list(var5 = c("a", "b", "c", "d", "e"))
# 2: 2 1:3|list(var5 = c("a", "b", "c", "d", "e"))
# 3: 3 1:3|list(var5 = c("a", "b", "c", "d", "e"))
# 4: 4 1:3|list(var5 = c("a", "b", "c", "d", "e"))
# 5: 5 1:3|list(var5 = c("a", "b", "c", "d", "e"))
v-2.2.0 now converts lists into JSON lines format so that AWS Athena can parse them with SQL array/mapping/JSON functions. A small downside is that an S3 method conflict occurs when jsonify is called to convert lists into JSON lines. jsonify was chosen over jsonlite because of its performance improvements (noctua # 156).
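
For readers who want to see the target format outside R, the Python sketch below serialises the same nested structure into one compact JSON string per row, matching the var2 values in the 2.2.0 output above. It uses Python's standard json module purely as an illustration; RAthena itself relies on jsonify, as noted, and the helper name `to_json_line` is hypothetical.

```python
import json

# Same nested value as var2 in the R example: a list holding a vector and a sub-list.
row = {"var3": [1, 2, 3], "var4": {"var5": ["a", "b", "c", "d", "e"]}}


def to_json_line(value) -> str:
    """Serialise a nested value as a single compact JSON line (no extra spaces)."""
    return json.dumps(value, separators=(",", ":"))


print(to_json_line(row))
# {"var3":[1,2,3],"var4":{"var5":["a","b","c","d","e"]}}
```

Because each cell is valid JSON on a single line, it can be parsed back without loss, which the 2.1.0 output (a pipe-joined deparse of the list) did not allow.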
RAthena-2.1.0
Bug Fix:
- dbIsValid wrongly stated connection is valid for result class when connection class was disconnected.
- sql_translate_env.paste broke with latest version of dbplyr. New method is compatible with dbplyr>=1.4.3 (noctua # 149).
Feature:
- sql_translate_env: add support for stringr/lubridate style functions, similar to Postgres backend.
- dbConnect: add timezone parameter so that time zone between R and AWS Athena is consistent (noctua # 149).
RAthena-v2.0.1
This is a hot fix patch to fix keyboard interrupt not raising errors correctly.
RAthena-v2.0.0
Merge pull request #139 from DyfanJones/doc_rd doc: fix package references