lwe-speers/socialmedia-public

Geberit Social Media Development

Project Goal

Persist social media data into HANA tables. The following social media platforms are addressed:

  • Twitter
  • LinkedIn
  • Youtube
  • Facebook
  • Instagram
  • Pinterest

The respective fields are continuously documented within this file.

Project Concept

image

Local developments are versioned using Git and GitHub. Production-ready code is built as a Docker container and shipped via Docker Hub (or Geberit's internal Azure) into an SAP Data Intelligence container. From there, we can use locally developed functionality within a custom operator and integrate it into a Data Intelligence pipeline.

Python Code

SAP Data Intelligence

To persist the retrieved social media data, we use SAP Data Intelligence (SAP DI). For an introduction to SAP DI, this Video Series is recommended.

Social Media Operator

The central product of the project is a custom operator called "Social Media". Locally developed code is shipped into SAP DI through a Docker container. The three tags on the right are used to generate the custom operator in DI. Within DI, we retrieve the code using a Dockerfile in 'Repository-dockerfiles-SocialMedia': image

Content of the DI Social Media Dockerfile:

FROM lwxspeers/pyapp:latest

# Install python library "requests"
RUN pip install requests

# Install python library "tornado" (Only required with SAP Data Hub version >= 2.5)
RUN pip install tornado==5.0.2

# Add vflow user and vflow group to prevent error
# container has runAsNonRoot and image will run as root
RUN groupadd -g 1972 vflow
RUN useradd -g 1972 -u 1972 -m vflow

# Change ownership over home folder
RUN chown -R vflow:vflow /home/vflow

# Change user to vflow
USER 1972:1972

# Setting up envs
ENV HOME=/home/vflow
ENV PYTHONPATH=/home/vflow:/home/vflow/relational_engine/src

WORKDIR /home/vflow/relational_engine/src

Generating the Social Media Operator

To use locally developed Python functionality in a DI pipeline, we have generated a new custom operator that extends the regular Python3 operator. This operator is used within all social media graphs and can be customized for each graph separately. It is important to set the same tags as in the Dockerfile above, so that the operator and the Dockerfile can interact. image

Within the script, we can now import and use Python functionality from our shipped process.py interface: image

Using the SocialMedia Operator in a DI Graph

We use a separate graph for each social media platform. These graphs can be found by searching for 'socialmedia' in the Graphs tab. See the Twitter graph as an example: image

Each graph is structured around an instance of our DI operator. The inputs are API tokens for the social media platform; the outputs are tables that persist the results from the platform. Within such a graph, click the <> icon on the instance to customize the instance-specific Python code:

image

Generally, we want as little code as possible within the operator. Therefore, in each graph the operator consists of two functions:

  1. messager(), which brings the data into the SAP DI-specific message format. This message is then sent to the HANA operator.
  2. main(), which executes messager() when data is received from the api input port. This data is structured in table format, so we indicate where to find the API token. In the above example, we need the token (first row, second column) as well as the organizational ID (first row, third column). These are passed to the get_twitter_data() function, which comes from the shipped Docker container that we have imported into DI.

Note:

The messager() function does not format all Twitter data into one message, only one specific table at a time. For Twitter we have two tables: followerStatistics and twitterStatistics. These are sent to output ports in the operator, which have to be created and named according to the table names: right-click on the operator, add port... image
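The two-function structure described above can be sketched as follows. This is a minimal illustration, not the operator's actual script: the `api` object is a local stand-in for the one SAP DI injects into the Python3 operator, `get_twitter_data()` is a hypothetical stub for the function shipped in process.py (its real signature and return shape are not shown in this README), and the sample rows are made up.

```python
from types import SimpleNamespace

# Local stand-in for the `api` object that SAP DI provides inside the operator.
# It only exists so this sketch runs outside DI; inside DI it is not needed.
sent = []


class _Message:
    def __init__(self, body, attributes=None):
        self.body = body
        self.attributes = attributes or {}


api = SimpleNamespace(
    Message=_Message,
    send=lambda port, msg: sent.append((port, msg)),
)


def get_twitter_data(token, org_id):
    """Hypothetical stub for the function imported from process.py."""
    return {
        "followerStatistics": [[org_id, "2021-01-01", 100]],
        "twitterStatistics": [[org_id, "2021-01-01", 5, 2]],
    }


def messager(rows, table_name):
    """Wrap the rows of one table in a DI-style message."""
    return api.Message(body=rows, attributes={"table": table_name})


def main(data):
    """Read token and organizational ID from the input table, then send one message per table."""
    token = data[0][1]   # first row, second column
    org_id = data[0][2]  # first row, third column
    results = get_twitter_data(token, org_id)
    # One output port per table, named after the table.
    for table_name in ("followerStatistics", "twitterStatistics"):
        api.send(table_name, messager(results[table_name], table_name))


# Inside DI, main would typically be registered as the callback of the input
# port, e.g. api.set_port_callback("api", main). Here we call it directly.
main([["twitter", "dummy-token", "org-1"]])
```

Keeping main() and messager() this thin means that changes to the API logic only require rebuilding the Docker image, not editing each graph.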

Using the HANA Client in a DI Graph

Use the HANA Client operator to persist the operator message into a HANA table. The following configurations are set:

  • Connection: Configuration Manager, HanaCloud_Dev
  • Table name: "Schemaname"."Tablename"
  • Table columns: can be defined either in a form or in JSON format

image

See the entire configuration for TwitterFollowerStatistics: image
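As an illustration of the JSON option for table columns, the snippet below builds such a definition in Python. It assumes the HANA Client operator accepts a list of objects with "name", "type" and an optional "size" key; please verify this against the operator documentation for your DI version. The column names and types are hypothetical, since the real table layout is only shown in the screenshot.

```python
import json

# Hypothetical column layout for a follower-statistics table.
# The real column names and types are not listed in this README.
columns = [
    {"name": "ORG_ID", "type": "NVARCHAR", "size": 50},
    {"name": "SNAPSHOT_DATE", "type": "DATE"},
    {"name": "FOLLOWER_COUNT", "type": "INTEGER"},
]

# Serialize to the JSON text that would be pasted into the operator configuration.
table_columns_json = json.dumps(columns, indent=2)
print(table_columns_json)
```

Generating the JSON from a Python list makes it easier to keep the column order in sync with the rows that messager() sends.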

Note:

Besides the graphical interface, the entire SAP DI logic can also be viewed and edited in JSON format by clicking the icon at the top right: image The entire SAP DI structure is mirrored in this JSON logic within the DI vscode user application. From there, I have created a GitHub repo with a remote. The DI structure with its JSON files is available in this repository.

Steps to update DI Operator from local code repository

  1. Open a terminal in Windows or PyCharm (Alt+F12)
  2. Navigate to your code directory
  3. Ensure you are on the correct git branch image
  4. Build the Docker image: docker build -t lwxspeers/pyapp . image
  5. Push the Docker image to your repository: docker push lwxspeers/pyapp:latest image
  6. Import the Docker image in DI:
       • Go to Modeler - Repository - Dockerfiles - SocialMedia - Dockerfile image
       • Press "Save", then press "Build", and wait until the image is built successfully image
  7. Your updates from local code are now available within the operator. Open your target DI model, save it, and run it.
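The local part of these steps (checking the branch, building and pushing the image) could be scripted roughly as follows. This is a sketch under assumptions: the helper names `build_commands` and `run_all` are made up for illustration, the script defaults to a dry run that only prints the commands, and the import in DI still has to be done manually in the Modeler.

```python
import subprocess

IMAGE = "lwxspeers/pyapp"  # image name taken from the steps above


def build_commands(image=IMAGE, tag="latest"):
    """Return the local commands from the update steps as argument lists."""
    return [
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],  # shows the current branch
        ["docker", "build", "-t", image, "."],         # builds image:latest
        ["docker", "push", f"{image}:{tag}"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands(), dry_run=True)
```

Run it from the code directory with dry_run=False once the printed commands look right.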
