Improve Streamlit landing page #76

Merged
merged 29 commits into from
Apr 14, 2021
Changes from 24 commits
29 commits
5125440
added landing page markdown file
connellyw Mar 27, 2021
cbc9bea
Added intro and template
PaigeCD Mar 27, 2021
92a6138
wrote my portion about the local file system
Mar 30, 2021
8aa5c28
Made LANDING_PAGE the home page
PaigeCD Mar 30, 2021
5b80939
Added frequency analysis section
nathandloria Mar 31, 2021
7cdbaa1
added the sentiment analysis section
Mar 31, 2021
79c1087
added to document similarity
connellyw Mar 31, 2021
bf54d8e
Pushed fix to streamlit.py and added image for frequency analysis
nathandloria Mar 31, 2021
7192619
Finished the AWS section of the Landing Page
Mar 31, 2021
bf8dfd9
Merge branch 'issue#60' of https://github.com/Allegheny-Ethical-CS/Ga…
Mar 31, 2021
95bd4fa
added picture to document similarity
connellyw Mar 31, 2021
c3b02f3
Fixed headings and README
nathandloria Mar 31, 2021
0593034
Fixed headings and README
nathandloria Mar 31, 2021
b13c0c4
Fixed landing page
nathandloria Mar 31, 2021
0fa18bb
Added topic modeling section and pictures
PaigeCD Mar 31, 2021
cc306d8
Merge branch 'master' into issue#60
nathandloria Apr 6, 2021
77f9e5d
Merge branch 'master' into issue#60
enpuyou Apr 7, 2021
fcd4128
Added markdown file to docs folder
PaigeCD Apr 7, 2021
6d88490
Merge branch 'issue#60' of github.com:Allegheny-Ethical-CS/GatorMiner…
PaigeCD Apr 7, 2021
7b74560
Revert "Merge branch 'issue#60' of github.com:Allegheny-Ethical-CS/Ga…
connellyw Apr 7, 2021
f3ca7e5
Fixed build issue
nathandloria Apr 7, 2021
4772902
Add clarification of file limits
connellyw Apr 7, 2021
b976355
updated main.yml
nathandloria Apr 7, 2021
dd2ec0c
Revert "Revert "Merge branch 'issue#60' of github.com:Allegheny-Ethic…
nathandloria Apr 8, 2021
55e1494
Merge branch 'master' into issue#60
enpuyou Apr 10, 2021
b7a81bf
Update LANDING_PAGE.md
nathandloria Apr 11, 2021
0f622a7
Update LANDING_PAGE.md
nathandloria Apr 11, 2021
55aa0df
Add mdl command to check docs folder
enpuyou Apr 14, 2021
af075a5
Merge branch 'master' into issue#60
nathandloria Apr 14, 2021
134 changes: 134 additions & 0 deletions docs/LANDING_PAGE.md
@@ -0,0 +1,134 @@
# Welcome to GatorMiner!

GatorMiner is an automated text-mining tool written in Python to measure the technical
responsibility of students in computer science courses. It uses natural language
processing to analyze students' markdown reflection documents and five-question
surveys in the Department of Computer Science at Allegheny College.

## Data Retrieving

There are currently two ways to import text data for analysis: through the local file system or through AWS DynamoDB.

### Local File System

What is a local file system?

- A controlled place where data can be stored and retrieved. In this case, it
is where GatorMiner keeps data isolated so that it can be easily identified.

In GatorMiner, you can type in the path(s) to the directory or directories that
hold reflection markdown documents. You are welcome to try the tool with the
sample documents we provided in the `resources` directory, for example:

```shell
resources/sample_md_reflections/lab1, resources/sample_md_reflections/lab2, resources/sample_md_reflections/lab3
```
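
The comma-separated input above can be pictured as a list of directories, each
holding markdown files. The following is a minimal sketch of that idea, not
GatorMiner's own parsing code; `split_paths` and `list_markdown_files` are
hypothetical helper names introduced only for illustration.

```python
from pathlib import Path


def split_paths(raw_input: str) -> list[str]:
    """Split a comma-separated string of directories into clean path strings."""
    return [part.strip() for part in raw_input.split(",") if part.strip()]


def list_markdown_files(directory: str) -> list[Path]:
    """Return the markdown files directly inside a directory, sorted by name.

    A missing directory yields an empty list instead of raising an error.
    """
    root = Path(directory)
    return sorted(root.glob("*.md")) if root.is_dir() else []


# Hypothetical usage with two of the sample directories named above.
paths = split_paths(
    "resources/sample_md_reflections/lab1, resources/sample_md_reflections/lab2"
)
for path in paths:
    print(path, len(list_markdown_files(path)))
```

Running this from the repository root should print each directory followed by
the number of markdown files found in it; a count of zero suggests a mistyped path.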

### AWS

Retrieving reflection documents from AWS is a feature integrated with
[GatorGrader](https://github.com/GatorEducator/gatorgrader), which collects
students' markdown reflection documents and stores them in a pre-configured
DynamoDB database. To use this feature, you will need to store the following
credential tokens as environment variables:

```bash
export GATOR_ENDPOINT=<Your Endpoint>
export GATOR_API_KEY=<Your API Key>
export AWS_ACCESS_KEY_ID=<Your Access Key ID>
export AWS_SECRET_ACCESS_KEY=<Your Secret Access Key>
```

It is likely that you already have these prepared when using GatorMiner in
conjunction with GatorGrader, since these would already be exported when
setting up the AWS services. You can read more about setting up an AWS service
with GatorGrader [here](https://github.com/enpuyou/script-api-lambda-dynamodb).
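
Before choosing the AWS option, it can help to confirm that all four variables
are visible to Python. The check below is a small sketch that relies only on
the variable names listed above; `missing_credentials` is a hypothetical helper
and is not part of GatorMiner.

```python
import os

# The four variable names come from the export commands above.
REQUIRED_VARS = (
    "GATOR_ENDPOINT",
    "GATOR_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_credentials(environ=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All credential variables are set.")
```

The optional `environ` argument exists only so the function can be exercised
with a plain dictionary instead of the real environment.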

Once the documents are successfully imported, you can use the select box in
the sidebar to choose which text analysis to view:

<img src="resources/images/select_box.png" alt="browser" style="width:100%"/>

## Analysis

### Frequency Analysis

Frequency analysis quantifies word usage in a text, that is, how often each word appears. It can reveal aspects of an assignment that instructors may not otherwise be able to observe, so it is valuable to present this information in a user-friendly and intuitive way. GatorMiner's frequency analysis is designed to do this.

Within the GatorMiner tool, you have the ability to choose `Frequency Analysis` as an analysis option after the path to the desired reflection documents is submitted.

When the tool runs a frequency analysis on any number of assignments, it provides three options to choose from:

- Overall
- Student
- Question

When `Overall` is selected, the application will display a vertical bar chart of the most frequently used words in each given assignment.

When `Student` is selected, a dropdown menu lets you pick which student the tool should display frequency data for. As with `Overall`, this data is displayed as a vertical bar chart, and you can display multiple students' data on the same page to compare and contrast the words used by each student.

Finally, when `Question` is selected, the option to pick one or more specific questions appears. The tool then displays a vertical bar chart with frequency information for each of the selected questions in the assignment. This is helpful for comparing how different terms are used across the questions in an assignment.

<img src="resources/images/frequency.png" alt="browser" style="width:100%"/>
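
To make the idea of word frequency concrete, here is a minimal sketch using only
the Python standard library. It illustrates the counting step alone and is not
GatorMiner's implementation; the tokenizing rule and the `word_frequencies` name
are assumptions, and no stop-word removal or lemmatization is applied.

```python
import re
from collections import Counter


def word_frequencies(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Count lowercase word occurrences and return the top_n most common."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Hypothetical reflection text used only for demonstration.
reflection = "Testing the code helped. The code failed until testing caught the bug."
for word, count in word_frequencies(reflection, top_n=3):
    print(f"{word}: {count}")
```

Each `(word, count)` pair corresponds to one bar in a chart like the one shown above.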

### Sentiment Analysis

Sentiment analysis (or opinion mining) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Overall, this is a technique to determine whether data is positive, negative, or neutral.

Within the GatorMiner tool, you have the ability to choose `Sentiment Analysis` as an analysis option after the path to the desired reflection documents is submitted.

When the tool runs a sentiment analysis on any number of assignments, it provides three options to choose from:

- Overall
- Student
- Question

When `Overall` is selected, a scatter plot and a bar chart display the overall sentiment polarity that students expressed in each assignment (for example, assignment-01).

When `Student` is selected, you can choose a specific student to observe. The tool then shows that student's sentiment as a small bar graph and as a larger histogram. You can also change the number of plots per row.

Finally, when `Question` is selected, you can choose a question from the drop-down menu. The tool then shows the sentiment expressed in the responses to that question.

<img src="resources/images/sentiment.png" alt="browser" style="width:100%"/>
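
The notion of a polarity score can be illustrated with a toy example. The sketch
below is an assumption-laden simplification: it uses a tiny hand-written word
list rather than a real sentiment library, and it is not how GatorMiner computes
sentiment. `POSITIVE`, `NEGATIVE`, `polarity`, and `label` are hypothetical names.

```python
import re

# Hypothetical, deliberately tiny lexicons; real tools ship far larger ones.
POSITIVE = {"good", "great", "helpful", "clear", "enjoyed"}
NEGATIVE = {"bad", "confusing", "difficult", "frustrating", "broken"}


def polarity(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


def label(score: float) -> str:
    """Map a polarity score to positive, negative, or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in ("The lab was great and clear", "The setup was confusing"):
    score = polarity(sentence)
    print(f"{sentence!r}: {score:+.2f} ({label(score)})")
```

A score above zero reads as positive, below zero as negative, and exactly zero
as neutral, matching the three categories described at the start of this section.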

### Document Similarity

Document similarity compares documents to measure how closely their text matches, based on the words they use.

Within the GatorMiner tool, you have the ability to choose `Document Similarity` as an analysis option after the path to the desired reflection documents is submitted.

In the `Document Similarity` section, you can select the type of similarity analysis: either `TF-IDF` or `Spacy`.

When `TF-IDF` is selected, the application will display a matrix showing the correlation between documents. TF-IDF weights each word by its term frequency (the number of times the word appears in a document divided by the total number of terms in that document) and by its inverse document frequency, which lowers the weight of words that appear in many documents.

When `Spacy` is selected, the application will display a drop-down menu named 'Model name' with two options:

- `en_core_web_sm`, which is used to produce a correlation matrix for **SMALLER** files (< 13 MB).
- `en_core_web_md`, which is used to produce a correlation matrix for **LARGER** files (> 13 MB).

**Warning: exceeding these file limits could cause the program to crash.**

**See [Spacy.io](https://spacy.io/models/en) for more details on file limits.**

<img src="resources/images/similarity.png" alt="browser" style="width:100%"/>
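
The TF-IDF option can be sketched in plain Python as follows. This is a minimal
illustration under stated assumptions: it uses the unsmoothed weight
`tf * log(N / df)` and cosine similarity to compare documents, whereas libraries
often apply smoothed variants, and GatorMiner's actual implementation may differ.
The function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


docs = [
    "unit tests catch bugs",
    "unit tests document behavior",
    "coffee keeps me awake",
]
vectors = tfidf_vectors(docs)
# Print the pairwise similarity matrix, one row per document.
for row in vectors:
    print([round(cosine(row, col), 2) for col in vectors])
```

The first two sample documents share words and so score above zero against each
other, while the third shares none and scores zero against both.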

### Topic Modeling

Topic modeling analyzes documents to find keywords in order to determine the documents' dominant topics.

Within the GatorMiner tool, you have the ability to choose `Topic Modeling` as an analysis option after the path to the desired reflection documents is submitted.

In the `Topic Modeling` section, you can select the type of topic modeling plot: either `Histogram` or `Scatter`.

When `Histogram` is selected, the application will display a histogram in which the dominant topic is on the x-axis and the count of records is on the y-axis. A legend in the top right corner will display the names of the reflection files next to the colors that correspond to them.

When `Scatter` is selected, the application will display a scatter plot. The legend on the right side will display the colors that correspond to topic numbers and the shapes that correspond to topics.

Sliders are also provided to adjust the number of topics and the number of words per topic.

<img src="resources/images/topic.png" alt="browser" style="width:100%"/>
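
The histogram described in this section counts reflections per dominant topic.
The following sketch illustrates only that counting step, using hypothetical file
names and made-up topic weights; it does not train a topic model and is not
GatorMiner's implementation.

```python
from collections import Counter

# Hypothetical per-document topic weights, as a topic model might produce them.
doc_topic_weights = {
    "lab1/reflection_a.md": [0.70, 0.20, 0.10],
    "lab1/reflection_b.md": [0.15, 0.25, 0.60],
    "lab2/reflection_a.md": [0.55, 0.30, 0.15],
    "lab2/reflection_b.md": [0.10, 0.80, 0.10],
}


def dominant_topic(weights: list[float]) -> int:
    """Return the index of the topic with the largest weight."""
    return max(range(len(weights)), key=weights.__getitem__)


# Count how many documents have each topic as their dominant one.
counts = Counter(dominant_topic(w) for w in doc_topic_weights.values())
for topic in sorted(counts):
    print(f"topic {topic}: {counts[topic]} document(s)")
```

Each printed count corresponds to the height of one histogram bar.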
20 changes: 10 additions & 10 deletions streamlit_web.py
@@ -62,7 +62,7 @@ def main():
         if debug_mode:
             st.write(main_df)
         if analysis_mode == "Home":
-            readme()
+            landing_src()
         else:
             if analysis_mode == "Frequency Analysis":
                 st.title(analysis_mode)
@@ -84,26 +84,26 @@ def main():
                 interactive()
             success_msg.empty()

-def readme():
+def landing_src():
     """function to load and configurate readme source"""

-    with open("README.md") as readme_file:
-        readme_src = readme_file.read()
+    with open("docs/LANDING_PAGE.md") as landing_file:
+        landing_src = landing_file.read()
         for file in os.listdir("resources/images"):
             if file.endswith(".png"):
                 img_path = f"resources/images/{file}"
                 with open(img_path, "rb") as f:
                     img_bin = base64.b64encode(f.read()).decode()
-                readme_src = readme_src.replace(img_path, f"data:image/png;base64,{img_bin}")
+                landing_src = landing_src.replace(img_path, f"data:image/png;base64,{img_bin}")

-    st.markdown(readme_src, unsafe_allow_html=True)
+    st.markdown(landing_src, unsafe_allow_html=True)

 def landing_pg():
     """landing page"""
     landing = st.sidebar.selectbox("Welcome", ["Home", "Interactive"])

     if landing == "Home":
-        readme()
+        landing_src()
     else:
         interactive()

@@ -134,7 +134,7 @@ def retreive_data(data_retreive):
     except TypeError:
         st.sidebar.warning(
             "No data imported. Please check the reflection document input")
-        readme()
+        landing_src()
     else:
         global success_msg
         success_msg = None
@@ -170,7 +170,7 @@ def import_data(data_retreive_method, paths):
                 json_lst.append(md.collect_md(path))
         except FileNotFoundError as err:
             st.sidebar.text(err)
-            readme()
+            landing_src()
     else:
         passbuild = st.sidebar.checkbox(
             "Only retreive build success records", value=True)
@@ -181,7 +181,7 @@ def import_data(data_retreive_method, paths):
                 json_lst.append(ju.clean_report(response))
         except (EnvironmentError, Exception) as err:
             st.sidebar.error(err)
-            readme()
+            landing_src()
     # when data is retreived
     if json_lst:
         raw_df = pd.DataFrame()
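
In the diff above, the local variable `landing_src` shares its name with the
function that defines it, and the docstring still refers to the readme. One
possible follow-up is to move the image-embedding step into its own helper, as
in the sketch below. `embed_images` is a hypothetical name that is not part of
this pull request, and the Streamlit call is left out so the example runs on its own.

```python
import base64
import os


def embed_images(markdown_src: str, image_dir: str) -> str:
    """Replace references to PNG files in image_dir with base64 data URIs."""
    for name in sorted(os.listdir(image_dir)):
        if not name.endswith(".png"):
            continue
        # Build the path with a forward slash, as the diff does, because that
        # is the form in which the path appears in the markdown source.
        img_path = f"{image_dir}/{name}"
        with open(img_path, "rb") as img_file:
            encoded = base64.b64encode(img_file.read()).decode()
        markdown_src = markdown_src.replace(
            img_path, f"data:image/png;base64,{encoded}"
        )
    return markdown_src


# Hypothetical usage inside the Streamlit app (assumes the repository layout):
#     with open("docs/LANDING_PAGE.md") as landing_file:
#         page = embed_images(landing_file.read(), "resources/images")
#     st.markdown(page, unsafe_allow_html=True)
```

Keeping the helper free of Streamlit calls would also make it straightforward to unit test.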