Skip to content

Commit

Permalink
Merge pull request #12 from DeepthiSudharsan/DS-Kahani
Browse files Browse the repository at this point in the history
Code Cleaning and Documentation
  • Loading branch information
sameersegal authored Nov 14, 2024
2 parents 981c954 + c1cc0f1 commit 8fa504b
Show file tree
Hide file tree
Showing 125 changed files with 111 additions and 1,279 deletions.
117 changes: 106 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,29 +1,120 @@
# Yeh Hai Meri Kahani
# Kahani : Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures

For TAB we want to create a system that's engaging for the user during the complete generation of our multi-step visual story
#### Official Repository for "Kahani : Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures"

Basic
- [ ] remap the volume

Here are some possible approaches:
## Abstract

- [ ] Gradio Streaming Chat with multi-step process
- [ ] SDXL Turbo
- [ ] Manual control
Large Language Models (LLMs) and Text-To-Image (T2I) models have demonstrated the ability to generate compelling text and visual stories. However, their outputs are predominantly aligned with the sensibilities of the Global North, often resulting in an outsider's gaze on other cultures. As a result, non-Western communities have to put extra effort into generating culturally specific stories. To address this challenge, we developed a visual storytelling pipeline called **Kahani** that generates culturally grounded visual stories for non-Western cultures. Our pipeline leverages off-the-shelf models GPT-4 Turbo and Stable Diffusion XL (SDXL). By using Chain of Thought (CoT) and T2I prompting techniques, we capture the cultural context from user's prompt and generate vivid descriptions of the characters and scene compositions. To evaluate the effectiveness of our pipeline, we conducted a comparative user study with ChatGPT-4 (with DALL-E3) in which participants from different regions of India compared the cultural relevance of stories generated by the two tools. Results from the qualitative and quantitative analysis performed on the user study showed that our pipeline was able to capture and incorporate more Culturally Specific Items (CSIs) compared to ChatGPT-4. In terms of both its cultural competence and visual story generation quality, our pipeline outperformed ChatGPT-4 in 27 out of the 36 comparisons.

Additional UX improvements:
- [ ] Beautify the formatting of the story
<p align="center">
<img width="321" alt="kahani gpt comparison sample" src="https://github.com/user-attachments/assets/b123e289-5a44-464d-8e7a-bc2fe6a4ed1f">
</p>

## Developer Notes:

Follow the given commands to setup and run this project and install necessary packages.

```
# Build the docker image
$ docker build . -t kahani-streaming
# Set up the environment variables
$ touch .env
$ vi .env
# Paste the below two env variables in the .env file and replace it with your API key and endpoint
# SDAPI_HOST=http://172.17.0.1:7860
# OPENAI_API_KEY=<OPENAI_API_KEY>
# To run the docker container from the built docker image
$ docker run -it -d -p 8080:8080 --env-file .env kahani-streaming
```
## Gradio Tool Snapshot

A snapshot of the Kahani Gradio Tool :
<p align="center">
<img width="600" alt="gradio tool" src="https://github.com/user-attachments/assets/acd733ff-13e5-4ccf-b125-b910c2c0c313">
</p>

## Kahani Pipeline
<p align="center">
<img width="600" alt="kahani pipeline" src="https://github.com/user-attachments/assets/e1fac90a-b940-4a8a-8cdf-135530a44a72">
</p>

Our proposed visual storytelling pipeline consists of five primary steps starting from extracting details from the user story prompt and expanding on these details based on the story's cultural context, to generating cultural visuals for each scene. Prompts for each of the steps are provided in the `src/prompts` folder.

## Directory Structure

```
.
├── README.md
└── src
├── Dockerfile
├── api.py
├── app.py
├── avatar.png
├── kahani.py
├── llm.py
├── models.py
├── pipeline.yml
├── poetry.lock
├── prompts
│ ├── __init__.py
│ ├── base.py
│ ├── bounding_box
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── break_story_into_scenes
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── classify_change
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── create_story
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── extract_characters
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── extract_culture
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── generate_character
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── generate_pose
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── generate_scenes
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ ├── summarise_culture
│ │ ├── __init__.py
│ │ ├── system.md
│ │ └── user.md
│ └── user_input
│ ├── __init__.py
│ ├── system.md
│ └── user.md
├── pyproject.toml
└── utils.py
```

<!-- ## Citation -->

<!-- In case you use this work or pipeline, please cite our work,-->

<!-- ```-->
<!-- ```-->

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Expand All @@ -45,3 +136,7 @@ trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.

## Privacy

You can read more about Microsoft's privacy statement [here](https://go.microsoft.com/fwlink/?LinkId=521839).
2 changes: 1 addition & 1 deletion Dockerfile → src/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -32,4 +32,4 @@ EXPOSE 8080
ARG VERSION="1"
ENV DOCKER_IMAGE_VERSION=${VERSION}

CMD ["gradio", "app.py"]
CMD ["gradio", "app.py"]
2 changes: 1 addition & 1 deletion api.py → src/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -462,4 +462,4 @@ def pose_generation(reference_image, **kwargs):
else:
print("Invalid action")



File renamed without changes.
File renamed without changes
2 changes: 1 addition & 1 deletion kahani.py → src/kahani.py
Original file line number Diff line number Diff line change
Expand Up @@ -489,4 +489,4 @@ def sync_scenes_in_db(self, image, index):
print(f"-------- scene {index} synced ----------")




File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion pipeline.yml → src/pipeline.yml
Original file line number Diff line number Diff line change
Expand Up @@ -83,4 +83,4 @@ steps:
commandOptions: '-var="prefix=$(prefix)" -var="OPENAI_API_KEY=$(OPENAI_API_KEY)" -var="SDAPI_HOST=$(SDAPI_HOST)" -var="container_registry=$(container_registry)" -var="container_registry_username=$(container_registry_username)" -var="container_registry_password=$(container_registry_password)"'
# condition: and(succeeded(), or(contains(variables.modifiedFolders, 'terraform'), contains(variables.modifiedFolders, 'server'), contains(variables.modifiedFolders, 'pbyc')))
displayName: "Terraform Apply"


File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion utils.py → src/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,4 +52,4 @@ def final_scene_generation_prompt(names, prompts):
for name, prompt in zip(names, processed_prompts):
result.append({"name": name, "description": prompt})

return result
return result
4 changes: 0 additions & 4 deletions terraform/.gitignore

This file was deleted.

27 changes: 0 additions & 27 deletions terraform/README.md

This file was deleted.

5 changes: 0 additions & 5 deletions terraform/backend.tf

This file was deleted.

43 changes: 0 additions & 43 deletions terraform/main.tf

This file was deleted.

33 changes: 0 additions & 33 deletions terraform/variables.tf

This file was deleted.

36 changes: 0 additions & 36 deletions terraform/webapp.tf

This file was deleted.

Binary file not shown.
Binary file removed tests/image_processing/inputs/bala_pose_one.png
Binary file not shown.
62 changes: 0 additions & 62 deletions tests/image_processing/test_image_processing.py

This file was deleted.

Loading

0 comments on commit 8fa504b

Please sign in to comment.