This project is a serverless ETL pipeline that extracts parking meter data from NYC Open Data, transforms it into a structured format, and loads it into MongoDB.
- Extract: Fetches parking meter data from NYC Open Data API
- Transform: Processes and normalizes the data (meter types, operational hours, locations)
- Load: Stores the processed data in MongoDB
- Schedule: Runs automatically every 7 days
- Go 1.x
- Node.js & npm
- AWS CLI configured with appropriate credentials
- MongoDB instance
Create a .env file with the following variables:
NYC_API_URL=https://data.cityofnewyork.us/resource/693u-uax6.json
NYC_API_APP_TOKEN=your_api_token
MONGODB_URI=your_mongodb_connection_string
MONGODB_DATABASE=your_database_name
BATCH_SIZE=1000
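As a rough illustration of how these variables might be consumed, the sketch below reads them into a struct and validates them. The `Config` type, the `loadConfig` function, and the choices to treat the app token as optional and to default the batch size to 1000 are assumptions made for this example, not the project's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the variables listed above. The struct and its field
// names are hypothetical and not taken from the project source.
type Config struct {
	APIURL    string
	APIToken  string
	MongoURI  string
	Database  string
	BatchSize int
}

// loadConfig reads settings through the supplied lookup function
// (os.Getenv in normal use), which keeps the function easy to test.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		APIURL:    getenv("NYC_API_URL"),
		APIToken:  getenv("NYC_API_APP_TOKEN"), // assumed optional here
		MongoURI:  getenv("MONGODB_URI"),
		Database:  getenv("MONGODB_DATABASE"),
		BatchSize: 1000, // assumed default, matching the sample value
	}

	// Checked in a fixed order so the error message is deterministic.
	required := []struct{ name, value string }{
		{"NYC_API_URL", cfg.APIURL},
		{"MONGODB_URI", cfg.MongoURI},
		{"MONGODB_DATABASE", cfg.Database},
	}
	for _, r := range required {
		if r.value == "" {
			return Config{}, fmt.Errorf("missing required variable %s", r.name)
		}
	}

	if raw := getenv("BATCH_SIZE"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n <= 0 {
			return Config{}, fmt.Errorf("BATCH_SIZE must be a positive integer, got %q", raw)
		}
		cfg.BatchSize = n
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		return
	}
	fmt.Printf("database=%s batch size=%d\n", cfg.Database, cfg.BatchSize)
}
```

Passing the lookup function as a parameter lets the same code read from the process environment locally and from a map in tests.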
- Configure AWS credentials:
aws configure
- Store environment variables in AWS Parameter Store:
aws ssm put-parameter --name "/parkit/nyc_api_url" --value "https://data.cityofnewyork.us/resource/693u-uax6.json" --type "String"
aws ssm put-parameter --name "/parkit/nyc_api_token" --value "your_token" --type "SecureString"
aws ssm put-parameter --name "/parkit/mongodb_uri" --value "your_uri" --type "SecureString"
aws ssm put-parameter --name "/parkit/mongodb_database" --value "your_database" --type "SecureString"
aws ssm put-parameter --name "/parkit/batch_size" --value "1000" --type "String"
- Install dependencies:
npm install
- Build the Go binary:
go build -o cmd/sync/main ./cmd/sync
- Deploy to AWS:
serverless deploy
The main components are:
- internal/nyc/client.go: NYC Open Data API client
- internal/models/parking.go: Data models
- internal/database/mongodb.go: MongoDB operations
- internal/service/sync.go: Main ETL logic
- serverless.yml: AWS Lambda configuration
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request