Persist Social Media Data into HANA Tables. The following social media platforms are addressed:
- YouTube
The respective fields are continuously documented within this file.
Local developments are versioned using Git and GitHub. Production-ready code is built as a Docker container and shipped via DockerHub (or Geberit-internal Azure) into an SAP Data Intelligence container. From there, we can use locally developed functionality within a custom operator and implement it in a Data Intelligence pipeline.
For persisting the retrieved social media data we use SAP Data Intelligence. To get an introduction to SAP DI, this video series is recommended.
The central product of the project is a custom operator called "Social Media". Locally developed code is shipped through a Docker container into SAP DI. Three tags on the right are used to generate the custom operator in DI. Within DI we retrieve the code using a Dockerfile in 'Repository-dockerfiles-SocialMedia':
FROM lwxspeers/pyapp:latest
# Install python library "requests"
RUN pip install requests
# Install python library "tornado" (Only required with SAP Data Hub version >= 2.5)
RUN pip install tornado==5.0.2
# Add vflow user and vflow group to prevent the error
# "container has runAsNonRoot and image will run as root"
RUN groupadd -g 1972 vflow
RUN useradd -g 1972 -u 1972 -m vflow
# Change ownership of the home folder
RUN chown -R vflow:vflow /home/vflow
# Change user to vflow
USER 1972:1972
# Setting up envs
ENV HOME=/home/vflow
ENV PYTHONPATH=/home/vflow:/home/vflow/relational_engine/src
WORKDIR /home/vflow/relational_engine/src
To use locally developed Python functionality in a DI pipeline, we have generated a new custom operator, which extends the regular Python3 operator. This operator is used within all social media graphs and can be customized for each graph separately. It is important to set the same tags as in the Dockerfile above; this is how the operator and the Dockerfile can interact.
Within the script we can now import and use Python functionality from our shipped process.py interface:
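For example, a minimal sketch of the import (it works because the Dockerfile above puts /home/vflow/relational_engine/src on the PYTHONPATH; `get_twitter_data()` is the function referenced further below):

```python
# process.py is shipped inside the Docker image and lies on the
# PYTHONPATH set in the Dockerfile, so the operator script can
# import it like any regular Python module.
from process import get_twitter_data
```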
For each social media platform we use a separate graph. These graphs can be found by searching for 'socialmedia' in the Graphs tab. See the Twitter graph as an example:
Each graph is structured around an instance of our DI operator. Inputs are API tokens for the social media platform; outputs are tables that persist the results from the social media platforms. Within such a model, click on <> on the instance to customize the instance-specific Python code:
Generally we want as little code as possible within the operator. Therefore, in each graph the operator consists of two functions:
- `messager()`, which brings the data into the SAP DI-specific message format. This message is then sent to the HANA operator.
- `main()`, which executes `messager()` when retrieving data from the API inport. This data is structured in table format, so we indicate where to find the API token. In the above example we need the token (first row, second column) as well as the organizational ID (first row, third column). These are used in the `get_twitter_data()` function, which comes from the shipped Docker container that we have imported into DI (see the sketch below).
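A minimal sketch of what `main()` could look like in the operator script. The `api` object is provided by the DI Python3 operator at runtime; the inport name "input" and the exact signature of `get_twitter_data()` are assumptions for illustration:

```python
from process import get_twitter_data  # shipped through the Docker container


def main(data):
    # The inport delivers the credentials in table format:
    #   first row, second column -> API token
    #   first row, third column  -> organizational ID
    token = data.body[0][1]
    org_id = data.body[0][2]

    # Fetch the raw Twitter data through the shipped interface
    # (the exact signature is an assumption)
    twitter_data = get_twitter_data(token, org_id)

    # Bring the result into DI message format and send it onwards
    messager(twitter_data)


# "input" is a placeholder; use the inport name defined on the operator
api.set_port_callback("input", main)
```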
The `messager()` function does not format all Twitter data into one message, but just one specific table. For Twitter we have two tables: `followerStatistics` and `twitterStatistics`. These are sent to output ports on the operator, which have to be created and named according to the table names:
Right-click on the operator, then choose "Add Port...".
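A sketch of the corresponding `messager()`. The outport names follow from the tables above; how `get_twitter_data()` structures its result (here: a dict keyed by table name) is an assumption:

```python
def messager(twitter_data):
    # One DI message per target table, sent to the outport that is
    # named exactly like the table.
    follower_message = api.Message(twitter_data["followerStatistics"])
    api.send("followerStatistics", follower_message)

    statistics_message = api.Message(twitter_data["twitterStatistics"])
    api.send("twitterStatistics", statistics_message)
```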
Use the HANA Client operator to persist the operator message into a HANA table. The following configurations are set:
- Connection: Configuration Manager, HanaCloud_Dev
- Table name: "Schemaname"."Tablename"
- Table columns: these can be defined either in a form or in JSON format.
See the entire configuration for TwitterFollowerStatistics:
Besides the graphical interface, the entire SAP DI logic can also be viewed and edited in JSON format by clicking the icon at the top right. The entire SAP DI structure is mirrored in this JSON logic within the DI VS Code user application. From there I have created a GitHub repo with a remote; the DI structure with its JSON files is available in this repository.
- Open a terminal in Windows or PyCharm (Alt+F12)
- Navigate to your code directory
- Ensure you are on the correct git branch
- Build the Docker image: `docker build -t lwxspeers/pyapp .` (a local smoke test of the shipped code is sketched after this list)
- Push the Docker image to your repository: `docker push lwxspeers/pyapp:latest`
- Import the Docker image in DI
- Go to Modeler - Repository - Dockerfiles - SocialMedia - Dockerfile
- Press "Save", Then press "Build", wait until the Image is built succesfully
- Your updates from the local code are now available within the operator. Enter your target DI model, then save and run the model
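Optionally, before building the image, the shipped interface can be smoke-tested locally. This is a sketch with placeholder credentials; the exact signature of `get_twitter_data()` is an assumption:

```python
# Run this from relational_engine/src (or add it to PYTHONPATH) so
# that the same import works locally as it does inside the container.
from process import get_twitter_data

# Placeholders only; replace with real credentials for a full test
result = get_twitter_data("<api-token>", "<organizational-id>")
print(type(result))
```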