feat(etls): add interreg 2014UK16RFOP002 - EUBFR-258 #219

Open: wants to merge 5 commits into master

config.example.json: 1 addition, 0 deletions
@@ -13,6 +13,7 @@
"2014tc16rfcb047",
"2014tc16rfpc001",
"2014tc16rftn002",
"2014uk16rfop002",
"bulgaria",
"cordis",
"devco",

docs/types/README.md: 1 addition, 0 deletions
@@ -19,6 +19,7 @@ Here's a list of the transformations made in ETLs around the `Project` model.
- [2014tc16rfcb047 - XLS](./etls/2014tc16rfcb047-xls.md)
- [2014tc16rfpc001 - XLS](./etls/2014tc16rfpc001-xls.md)
- [2014tc16rftn002 - XLS](./etls/2014tc16rftn002-xls.md)
- [2014uk16rfop002 - XLS](./etls/2014uk16rfop002-xls.md)
- [bulgaria - XLS](./etls/bulgaria-xls.md)
- [CORDIS - CSV](./etls/cordis-csv.md)
- [DEVCO - XLS](./etls/devco-xls.md)

docs/types/etls/2014uk16rfop002-xls.md: 136 additions, 0 deletions (new file)
@@ -0,0 +1,136 @@
<!-- Generated by documentation.js. Update this documentation by updating the source code. -->

## 2014uk16rfop002XlsTransform

Map fields for the 2014uk16rfop002 producer, XLS file types.

Example input data: [stub][1]

Transform function: [implementation details][2]

### Parameters

- `record` **[Object][3]** Piece of data to transform before going to harmonized storage.

Returns **Project** JSON matching the type fields.

### getBudget

Preprocess `budget`.

Input fields taken from the `record` are:

- `EU (£) (30%)`
- `EU (£)`
- `Total (£)`

#### Parameters

- `record` **[Object][3]** The row received from the parsed file

Returns **Budget**
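
A minimal sketch of what `getBudget` could do with the three listed columns. It assumes that cells arrive either as numbers or as strings such as `£1,234.50`, that `EU (£)` takes precedence over `EU (£) (30%)` when both are filled, and that each amount uses a `{ currency, raw, value }` shape. None of this is confirmed by the generated documentation, and `parseAmount` and `toAmount` are hypothetical helpers.

```javascript
// Hypothetical helper: turn a cell such as 1234.5 or "£1,234.50" into a
// number, or null when the cell is empty or not numeric.
const parseAmount = (cell) => {
  if (typeof cell === 'number') return Number.isFinite(cell) ? cell : null;
  if (typeof cell !== 'string') return null;
  // Drop the pound sign, thousands separators and whitespace.
  const cleaned = cell.replace(/[£,\s]/g, '');
  if (cleaned === '') return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
};

// Hypothetical { currency, raw, value } shape for a single amount.
const toAmount = (cell) => ({
  currency: 'GBP',
  raw: cell === undefined || cell === null ? '' : String(cell),
  value: parseAmount(cell),
});

// Sketch of getBudget: prefer `EU (£)` and fall back to `EU (£) (30%)`
// when the first column is missing or not numeric (assumed precedence).
const getBudget = (record) => {
  const euCell =
    parseAmount(record['EU (£)']) !== null
      ? record['EU (£)']
      : record['EU (£) (30%)'];
  return {
    eu_contrib: toAmount(euCell),
    total_cost: toAmount(record['Total (£)']),
  };
};
```

Keeping `raw` alongside the parsed `value` preserves the original cell text, so a value that failed to parse can still be inspected later.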

### getDescription

Preprocess `description`.

Input fields taken from the `record` are:

- `Project No.`
- `GOG (£)`
- `PS (£)`
- `PS (£) (70%)`

#### Parameters

- `record` **[Object][3]** The row received from the parsed file

Returns **[String][4]**
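
The description gathers four columns that have no dedicated target field. A sketch, assuming each non-empty column is rendered as a `label: value` line; the real output format is not shown in this PR, and `DESCRIPTION_FIELDS` is a hypothetical name.

```javascript
// Columns folded into the free-text description, in display order.
const DESCRIPTION_FIELDS = ['Project No.', 'GOG (£)', 'PS (£)', 'PS (£) (70%)'];

// True when a cell is absent or contains only whitespace.
const isBlank = (cell) =>
  cell === undefined || cell === null || String(cell).trim() === '';

// Sketch of getDescription: one "label: value" line per non-empty column.
const getDescription = (record) =>
  DESCRIPTION_FIELDS.filter((field) => !isBlank(record[field]))
    .map((field) => `${field}: ${String(record[field]).trim()}`)
    .join('\n');
```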

### getProjectId

Preprocess `project_id`.

Input fields taken from the `record` are:

- `Project Name`

#### Parameters

- `record` **[Object][3]** The row received from the parsed file

Returns **[String][4]**
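
The documentation states only that `project_id` is derived from `Project Name`, not how. One option, shown here purely as a hypothetical sketch, is to hash the normalised name with Node's built-in `crypto` module so that re-ingesting the same file yields the same identifier.

```javascript
const crypto = require('crypto');

// Hypothetical: collapse whitespace, lowercase the name, then hash it with
// SHA-256 to get a stable 64-character hex identifier.
const getProjectId = (record) => {
  const name = String(record['Project Name'] || '')
    .trim()
    .replace(/\s+/g, ' ')
    .toLowerCase();
  if (name === '') return '';
  return crypto.createHash('sha256').update(name).digest('hex');
};
```

Two distinct projects sharing a name would collide under this scheme, so a real implementation might also need to fold in `Project No.`.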

### getLocations

Preprocess `project_locations`.

#### Parameters

- `record` **[Object][3]** The row received from the parsed file

Returns **[Array][5]&lt;Location>**

### getThemes

Preprocess `themes`.

Input fields taken from the `record` are:

- `Activity`

#### Parameters

- `record` **[Object][3]** The row received from the parsed file

Returns **[Array][5]&lt;[String][4]>**
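
`Activity` is the only input for `themes`, yet the return type is an array. A sketch assuming a single cell can list several activities separated by commas or semicolons; the separator is an assumption, not something this PR documents.

```javascript
// Sketch of getThemes: split the `Activity` cell into trimmed, non-empty themes.
const getThemes = (record) => {
  const activity = record.Activity;
  if (typeof activity !== 'string') return [];
  return activity
    .split(/[;,]/)
    .map((theme) => theme.trim())
    .filter((theme) => theme !== '');
};
```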

### getThirdParties

Preprocess `third_parties`.

Input fields taken from the `record` are:

- `Sponsor`

#### Parameters

- `record` **[Object][3]** The row received from the parsed file

Returns **[Array][5]&lt;ThirdParty>**
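
A sketch of `getThirdParties` for the single `Sponsor` column. The `{ name, role }` object is a simplified, hypothetical stand-in for the full `ThirdParty` type.

```javascript
// Sketch of getThirdParties: one sponsor entry, or an empty list when the
// cell is missing or blank. The { name, role } shape is assumed.
const getThirdParties = (record) => {
  const sponsor = typeof record.Sponsor === 'string' ? record.Sponsor.trim() : '';
  return sponsor ? [{ name: sponsor, role: 'sponsor' }] : [];
};
```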

### getTimeframe

Preprocess `timeframe`.

Input fields taken from the `record` are:

- `Start Date`
- `End Date`

#### Parameters

- `record` **[Object][3]** The row received from the parsed file

Returns **Timeframe**
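
Depending on how the sheet is parsed, SheetJS can hand back dates as Excel serial numbers rather than strings. The sketch below assumes either a serial in Excel's 1900 date system or a UK day-first string such as `01/04/2016`, and emits ISO 8601 timestamps. Both input formats are assumptions about the producer file, and `toIsoDate` is a hypothetical helper.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;
// Excel's 1900 date system counts days from 1899-12-30
// (valid for dates after 1900-02-28).
const EXCEL_EPOCH = Date.UTC(1899, 11, 30);

const toIsoDate = (cell) => {
  if (typeof cell === 'number' && Number.isFinite(cell)) {
    return new Date(EXCEL_EPOCH + Math.round(cell * MS_PER_DAY)).toISOString();
  }
  if (typeof cell === 'string') {
    // Assumed UK day-first format, e.g. "01/04/2016" for 1 April 2016.
    const match = cell.trim().match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/);
    if (match) {
      const [, day, month, year] = match;
      return new Date(Date.UTC(Number(year), Number(month) - 1, Number(day))).toISOString();
    }
  }
  return null;
};

// Sketch of getTimeframe: map the two columns to `from` and `to`.
const getTimeframe = (record) => ({
  from: toIsoDate(record['Start Date']),
  to: toIsoDate(record['End Date']),
});
```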

### getTitle

Preprocess `title`.

Input fields taken from the `record` are:

- `Project Name`

#### Parameters

- `record` **[Object][3]** The row received from the parsed file

Returns **[String][4]**

[1]: https://github.com/ec-europa/eubfr-data-lake/blob/master/services/ingestion/etl/2014uk16rfop002/xls/test/stubs/record.json
[2]: https://github.com/ec-europa/eubfr-data-lake/blob/master/services/ingestion/etl/2014uk16rfop002/xls/src/lib/transform.js
[3]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Object
[4]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String
[5]: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array
[6]: https://developer.mozilla.org/docs/Web/API/Location

scripts/documentation/docs-md.js: 1 addition, 0 deletions
@@ -21,6 +21,7 @@ const transforms = [
'2014tc16rfcb047-xls',
'2014tc16rfpc001-xls',
'2014tc16rftn002-xls',
'2014uk16rfop002-xls',
'bulgaria-xls',
'cordis-csv',
'devco-xls',

services/ingestion/etl/2014uk16rfop002/xls/README.md: 16 additions, 0 deletions (new file)
@@ -0,0 +1,16 @@
# 2014uk16rfop002 XLS ETL mapping rules

The model to compare against is available at: https://ec-europa.github.io/eubfr-data-lake/

| Field | Target |
| -------------------- | ----------------- |
| Project No. | description |
| Project Name | title |
| Sponsor | third_parties |
| EU (£), EU (£) (30%) | budget.eu_contrib |
| GOG (£) | description |
| PS (£), PS (£) (70%) | description |
| Total (£) | budget.total_cost |
| Activity | themes |
| Start Date | timeframe.from |
| End Date | timeframe.to |
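
The mapping table can also be expressed as data, which makes it easy to check which columns feed a target field or to flag rows whose headers have drifted. A sketch; `MAPPING`, `columnsFor` and `missingColumns` are hypothetical names, not part of this PR.

```javascript
// Mapping rules from the table above: source column -> target field.
const MAPPING = {
  'Project No.': 'description',
  'Project Name': 'title',
  Sponsor: 'third_parties',
  'EU (£)': 'budget.eu_contrib',
  'EU (£) (30%)': 'budget.eu_contrib',
  'GOG (£)': 'description',
  'PS (£)': 'description',
  'PS (£) (70%)': 'description',
  'Total (£)': 'budget.total_cost',
  Activity: 'themes',
  'Start Date': 'timeframe.from',
  'End Date': 'timeframe.to',
};

// List the source columns that feed a given target field.
const columnsFor = (target) =>
  Object.keys(MAPPING).filter((column) => MAPPING[column] === target);

// Report mapped columns absent from a parsed row, which can help spot a
// producer file whose headers changed.
const missingColumns = (record) =>
  Object.keys(MAPPING).filter((column) => !(column in record));
```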

services/ingestion/etl/2014uk16rfop002/xls/babel.config.js: 29 additions, 0 deletions (new file)
@@ -0,0 +1,29 @@
module.exports = {
  presets: [
    '@babel/preset-flow',
    [
      '@babel/preset-env',
      {
        targets: {
          node: '8.10',
        },
        modules: false,
        loose: true,
      },
    ],
  ],
  env: {
    test: {
      presets: [
        [
          '@babel/preset-env',
          {
            targets: {
              node: '8.10',
            },
          },
        ],
      ],
    },
  },
};

services/ingestion/etl/2014uk16rfop002/xls/package.json: 32 additions, 0 deletions (new file)
@@ -0,0 +1,32 @@
{
  "private": true,
  "name": "@eubfr/ingestion-etl-2014uk16rfop002-xls",
  "version": "0.7.0",
  "scripts": {
    "deploy": "sls deploy -v",
    "test:unit": "jest --testPathPattern=unit"
  },
  "dependencies": {
    "@eubfr/lib": "^0.7.0",
    "@eubfr/logger-messenger": "^0.7.0",
    "xlsx": "0.14.2"
  },
  "devDependencies": {
    "@babel/core": "7.4.3",
    "@babel/preset-env": "7.4.3",
    "@babel/preset-flow": "7.0.0",
    "@eubfr/types": "^0.7.0",
    "aws-sdk": "2.434.0",
    "babel-jest": "24.7.0",
    "babel-loader": "8.0.5",
    "jest": "24.7.0",
    "serverless": "1.40.0",
    "serverless-webpack": "5.2.0",
    "webpack": "4.29.6"
  },
  "jest": {
    "transform": {
      "^.+\\.js$": "babel-jest"
    }
  }
}

services/ingestion/etl/2014uk16rfop002/xls/serverless.yml: 123 additions, 0 deletions (new file)
@@ -0,0 +1,123 @@
service: ingestion-etl-2014uk16rfop002-xls

plugins:
  - serverless-webpack

custom:
  webpack:
    webpackConfig: ./webpack.config.js
    includeModules:
      forceExclude:
        - aws-sdk
    packager: yarn
  eubfrEnvironment: ${opt:eubfr_env, file(../../../../../config.json):eubfr_env, env:EUBFR_ENV, 'dev'}
  bucketName: ${file(../../../../../resources/harmonized-storage/serverless.yml):custom.bucketName}

package:
  individually: true

provider:
  name: aws
  runtime: nodejs8.10
  timeout: 60
  stage: ${opt:stage, file(../../../../../config.json):stage, env:EUBFR_STAGE, 'dev'}
  region: ${opt:region, file(../../../../../config.json):region, env:EUBFR_AWS_REGION, 'eu-central-1'}
  deploymentBucket:
    name: eubfr-${self:custom.eubfrEnvironment}-deploy
  stackTags:
    ENV: ${self:custom.eubfrEnvironment}
  iamRoleStatements:
    - Effect: 'Allow'
      Action:
        - 's3:PutObject'
      Resource:
        Fn::Join:
          - ''
          - - 'arn:aws:s3:::'
            - ${self:custom.bucketName}
            - '/*'
    # Allow queueing messages to the DLQ https://docs.aws.amazon.com/lambda/latest/dg/dlq.html
    - Effect: 'Allow'
      Action:
        - sqs:SendMessage
      Resource: '*'

functions:
  parseXls:
    handler: src/events/onParseXLS.handler
    name: ${self:provider.stage}-${self:service}-parseXls
    memorySize: 1024
    environment:
      BUCKET: ${self:custom.bucketName}
      REGION: ${self:provider.region}
      STAGE: ${self:provider.stage}
    events:
      - sns:
          arn:
            Fn::Join:
              - ''
              - - 'arn:aws:sns:'
                - Ref: 'AWS::Region'
                - ':'
                - Ref: 'AWS::AccountId'
                - ':${self:provider.stage}-etl-2014uk16rfop002-xls'
          topicName: ${self:provider.stage}-etl-2014uk16rfop002-xls
      - sns:
          arn:
            Fn::Join:
              - ''
              - - 'arn:aws:sns:'
                - Ref: 'AWS::Region'
                - ':'
                - Ref: 'AWS::AccountId'
                - ':${self:provider.stage}-etl-2014uk16rfop002-xlsx'
          topicName: ${self:provider.stage}-etl-2014uk16rfop002-xlsx

resources:
  Resources:
    ParseXlsLambdaFunction:
      Type: 'AWS::Lambda::Function'
      Properties:
        DeadLetterConfig:
          TargetArn:
            Fn::ImportValue: ${self:provider.stage}:ingestion-dead-letter-queue:LambdaFailureQueue
    SNSTopic2014uk16rfop002XLS:
      Type: AWS::SNS::Topic
      Properties:
        TopicName: ${self:provider.stage}-etl-2014uk16rfop002-xls
        DisplayName: 2014uk16rfop002 XLS ETL
    SNSTopic2014uk16rfop002XLSX:
      Type: AWS::SNS::Topic
      Properties:
        TopicName: ${self:provider.stage}-etl-2014uk16rfop002-xlsx
        DisplayName: 2014uk16rfop002 XLSX ETL
    SNSTopic2014uk16rfop002XLSPolicy:
      Type: AWS::SNS::TopicPolicy
      Properties:
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Sid: Allow-IngestionManager-Publish
              Action:
                - sns:Publish
              Effect: Allow
              Resource:
                Fn::Join:
                  - ''
                  - - 'arn:aws:sns:'
                    - Ref: 'AWS::Region'
                    - ':'
                    - Ref: 'AWS::AccountId'
                    - ':${self:provider.stage}-etl-2014uk16rfop002-*'
              Principal:
                AWS:
                  Fn::Join:
                    - ''
                    - - 'arn:aws:sts::'
                      - Ref: 'AWS::AccountId'
                      - ':assumed-role/ingestion-manager-${self:provider.stage}-'
                      - Ref: 'AWS::Region'
                      - '-lambdaRole/${self:provider.stage}-ingestion-manager-onObjectCreated'
        Topics:
          - Ref: SNSTopic2014uk16rfop002XLS
          - Ref: SNSTopic2014uk16rfop002XLSX
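
The topic names in the configuration above follow the pattern `<stage>-etl-2014uk16rfop002-<extension>`, and the policy lets the ingestion manager publish to any topic matching `<stage>-etl-2014uk16rfop002-*`. The hypothetical helpers below sketch how a publisher could pick the topic ARN for an uploaded file; the ingestion manager's actual routing code is not part of this PR, so `etlTopicArn` and `topicForKey` are illustrative names only.

```javascript
// Hypothetical helper mirroring the Fn::Join expressions in serverless.yml:
// arn:aws:sns:<region>:<accountId>:<stage>-etl-<producer>-<extension>
const etlTopicArn = ({ region, accountId, stage, producer, extension }) =>
  `arn:aws:sns:${region}:${accountId}:${stage}-etl-${producer}-${extension.toLowerCase()}`;

// Sketch: pick the topic for an uploaded object key such as "data.XLSX".
const topicForKey = (key, context) => {
  const dot = key.lastIndexOf('.');
  if (dot === -1) return null;
  const extension = key.slice(dot + 1).toLowerCase();
  // Only the xls and xlsx topics are declared for this producer.
  if (extension !== 'xls' && extension !== 'xlsx') return null;
  return etlTopicArn(Object.assign({}, context, { extension }));
};
```

Subscribing the single `parseXls` function to both topics means one handler serves both spreadsheet formats.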