[DOC] (Issue #8) : adding the docs to GitHub pages.
Madu01 committed Jun 5, 2023
1 parent 020bf1c commit a4c3895
Showing 99 changed files with 7,112 additions and 106 deletions.
1 change: 0 additions & 1 deletion README.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: d2483fc74d7b28ac83b5605bb2443898
config: 3d28de955ed6b7ae24203c537e424466
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file removed docs/.nojekyll
Empty file.
345 changes: 345 additions & 0 deletions docs/CONTRIBUTING.html

Large diffs are not rendered by default.

341 changes: 341 additions & 0 deletions docs/README.html

Large diffs are not rendered by default.

70 changes: 70 additions & 0 deletions docs/_sources/CONTRIBUTING.md.txt
@@ -0,0 +1,70 @@
# Welcome to the StackOverflow Mining contributing guide <!-- omit in toc -->

Thank you for investing your time in contributing to our project! :sparkles:

Read our [Code of Conduct](./docs_contributing/CODE_OF_CONDUCT.md) to keep our community approachable and respectable.

In this guide you will get an overview of the contribution workflow from opening an issue, creating a PR, reviewing, and merging the PR.

## New contributor guide

To get an overview of the project, read the [README](README.md).

We categorize our issues by evaluating two metrics:

- Skill level
- Functionality

See the [labels tab](https://github.com/FGA-GCES/MSR2021Replication/labels).


## Getting started

To understand the project's file structure, take a look at [Getting Started](./docs_contributing/getting_started.md).


### Issues

#### Create a new issue

If you identify a problem with the docs or code, check whether an issue for it already exists.

If a related issue does not exist, you can open a new one. As a rule, two types of [issue form](https://github.com/FGA-GCES/MSR2021Replication/tree/docs_contributing/.github/ISSUE_TEMPLATE) are used:

- [Bug report](./issues/bug_report.md)
- [Standard](./issues/standard.md)

#### Solve a problem

Please review our [existing issues](https://github.com/FGA-GCES/MSR2021Replication/issues) to find one that interests you. You can narrow your search using labels as filters. See [Labels](https://github.com/FGA-GCES/MSR2021Replication/labels) for more information. As a general rule, we do not assign issues to anyone. If you find an issue to resolve, you can open a PR with a fix.

### Make branches

The original repository must be forked. As a rule, branches created directly in the original repository will not be accepted.

### Commit your update

Commit the changes once you are happy with them. Don't forget to follow the commit policy used in this project.

Visit the [commit policy](./docs_contributing/commit_politics.md) for more information!

### Development policy

The code must follow the guidelines found in the official documents of each technology used in this project.

### Pull Request

When you're finished with the changes, create a pull request, also known as a PR.

- Fill in the "Ready for review" template so that we can review your PR. This template helps reviewers understand your changes as well as the purpose of your pull request.
- Don't forget to [link the PR to an issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue) if you are solving one.
- Enable the checkbox to [allow maintainer edits](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork) so the branch can be updated for a merge.

Once you submit your PR, a Docs team member will review your proposal. We may ask questions or request additional information.

- We may ask for changes to be made before a PR can be merged, either using [suggested changes](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/incorporating-feedback-in-your-pull-request) or pull request comments. You can apply suggested changes directly through the UI. You can make any other changes in your fork, then commit them to your branch.
- As you update your PR and apply changes, mark each conversation as [resolved](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/commenting-on-a-pull-request#resolving-conversations).
- If you run into any merge issues, check out this [git tutorial](https://github.com/skills/resolve-merge-conflicts) to help you resolve merge conflicts and other issues.

### Your PR is merged!

Congratulations :tada::tada: The GitHub team thanks you :sparkles:.

Once your PR is merged, your contributions will be publicly visible on the [GitHub docs](https://docs.github.com/en).

Now that you are part of the GitHub docs community, see how else you can [contribute to the docs](/contributing/types-of-contributions.md).
57 changes: 57 additions & 0 deletions docs/_sources/README.md.txt
@@ -0,0 +1,57 @@
# Introduction to the project

## Replication package for the MSR2021 "Challenges in Developing Desktop Web Apps: a Study of Stack Overflow and GitHub" paper

## Authors: Gian Luca Scoccia, Patrizio Migliarini, Marco Autili

### Abstract

Software companies have an interest in reaching the maximum amount of potential customers while, at the same time, providing a frictionless experience. Desktop web app frameworks are promising in this respect, allowing developers and companies to reuse existing code and knowledge of web applications to create cross-platform apps integrated with native APIs. Despite their growing popularity, existing challenges in employing these technologies have not been documented, and it is hard for individuals and companies to weigh benefits and pros against drawbacks and cons.
In this paper, we address this issue by investigating the challenges that developers frequently experience when adopting desktop web app frameworks. To achieve this goal, we mine and apply topic modeling techniques to a dataset of 10,822 Stack Overflow posts related to the development of desktop web applications. Analyzing the resulting topics, we found that: i) developers often experience issues regarding the build and deployment processes for multiple platforms; ii) reusing existing libraries and development tools in the context of desktop applications is often cumbersome; iii) it is hard to solve issues that arise when interacting with native APIs. Furthermore, we confirm our finding by providing evidence that the identified issues are also present in the issue reports of 453 open-source applications publicly hosted on GitHub.

Paper preprint available [HERE](MSR2021_preprint.pdf).

## Online appendix

The online appendix with the complete discussion of all topics mentioned in the paper is available [HERE](online_appendix.md).

## Replication package

Data used in the study is available in the folder [data/processed](data/processed).

Raw data (w/o cleaning & filtering) is available in the folder [data/raw](data/raw).

## Scripts
### Tags selection

Tags related to questions about desktop web apps are selected, by means of significance and relevance metrics, by the scripts [extract_tagset_from_csv.py](notebook/extract_tagset_from_csv.py) and [create_T.py](notebook/create_T.py).

### Selection of relevant Stack Overflow questions

The queries used to select relevant Stack Overflow questions from SOTorrent are available in the file [so_torrent_queries.txt](so_torrent_queries.txt).

The first query selects relevant questions (based on their tags).

The second query was used to collect accepted answers for the questions returned by the first query.

### Topic modeling

Topic modeling was executed by means of the [Mallet tool](http://mallet.cs.umass.edu).

The commands used to execute the tool from the command line are provided in the [mallet_instructions.txt](mallet_instructions.txt) file.

### Statistical analysis

Scripts used to analyze the collected data are available in the folder [notebook](notebook). The Python scripts in the folder were used to perform data cleaning and exploratory analysis. The statistical tests performed in the study were implemented in the R language and are available in the file [tests.r](notebook/tests.r).

### StackOverflow datasets

To analyse StackOverflow datasets, run the Jupyter notebook using the following command:

```sh
jupyter-notebook SO_dataset_analysis.ipynb
```

This notebook runs the scripts to clean the dataset, run the Mallet tool, and analyse the results.

For more instructions on how to run the scripts, see the [Getting Started](./docs/getting_started.md) document.
61 changes: 61 additions & 0 deletions docs/_sources/docs_contributing/CODE_OF_CONDUCT.md.txt
@@ -0,0 +1,61 @@
# Contributor Covenant Code of Conduct

With the objective of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project
and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity
and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual
identity and orientation.

# Our Standards

### **Examples of behavior that contributes to a positive environment for our community include:**

↳ Demonstrating empathy and kindness toward other people

↳ Being respectful of differing opinions, viewpoints, and experiences

↳ Giving and gracefully accepting constructive feedback

↳ Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience

↳ Focusing on what is best not just for us as individuals, but for the overall community

### **Examples of unacceptable behavior include:**

↳ The use of sexualized language or imagery, and sexual attention or advances of any kind

↳ Trolling, insulting or derogatory comments, and personal or political attacks

↳ Public or private harassment

↳ Publishing others’ private information, such as a physical or email address, without their explicit permission

↳ Other conduct which could reasonably be considered inappropriate in a professional setting

# Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and
fair corrective action in response to any behavior they deem inappropriate, threatening, offensive or harmful.

Community leaders have the right and responsibility to remove, edit or reject comments, commits, code, wiki edits, issues and
other contributions that do not align with this Code of Conduct and will communicate the reasons for moderation decisions when
appropriate.

# Scope

This Code of Conduct applies to all community spaces and also applies when an individual officially represents the community in
public spaces. Examples of representing our community include using an official email address and linking to the project
repository.

# Attribution

This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at
https://www.contributor-covenant.org/version/1/4/code-of-conduct.html









23 changes: 23 additions & 0 deletions docs/_sources/docs_contributing/commit_politics.md.txt
@@ -0,0 +1,23 @@
# Commit policy

Changes must be committed following a pattern that indicates the issue resolved and the functionality (or fix) added.

**➔ Use tags to define the purpose of the commit:**

`ADD` : when adding a new feature

`DEL` : when removing something

`UPDATE` : when updating existing functionality

`FIX` : when fixing a bug

`DOC` : when changing documentation

`REFACT` : when refactoring code

## Example commit structure:

git commit -m "[tag] (Issue #x) : descriptive message"
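
The commit structure above can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper name `check_commit_message` and the exact regular expression are illustrative assumptions derived from the pattern and the tag list in this guide.

```python
import re

# Tags listed in this guide.
COMMIT_TAGS = ("ADD", "DEL", "UPDATE", "FIX", "DOC", "REFACT")

# Assumed reading of "[tag] (Issue #x) : descriptive message":
# an upper-case tag, a numeric issue reference, then a non-empty message.
COMMIT_PATTERN = re.compile(
    r"^\[(?P<tag>[A-Z]+)\] \(Issue #(?P<issue>\d+)\) : (?P<message>\S.*)$"
)


def check_commit_message(message: str) -> bool:
    """Return True when the message follows the commit structure and uses a known tag."""
    match = COMMIT_PATTERN.match(message)
    return bool(match) and match.group("tag") in COMMIT_TAGS
```

For instance, the title of the commit shown on this page, `[DOC] (Issue #8) : adding the docs to GitHub pages.`, passes this check, while a message without a tag or issue reference does not.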
134 changes: 134 additions & 0 deletions docs/_sources/docs_contributing/getting_started.md.txt
@@ -0,0 +1,134 @@
# Getting started with MSR2021Replication

## Install Mallet

To install the Mallet tool, it is first necessary to have the Apache Ant build tool installed. Install the binary distribution from [https://ant.apache.org/](https://ant.apache.org/) and follow the [manual instructions](https://ant.apache.org/manual/install.html#getBinary) to configure it.

With Ant installed and configured, open the Mallet 2.0.8 folder in the MSR2021Replication repository at `mallet/mallet-2.0.8` and run the following command:

```sh
$ ant
```

The Mallet tool will be available to use at `mallet/mallet-2.0.8/bin/mallet`.

## Run with Jupyter notebook

The Jupyter notebook can be used to analyse StackOverflow datasets. To start it, run the following command from the repository root:

```sh
$ jupyter-notebook SO_dataset_analysis.ipynb
```

Follow the notebook instructions to import the correct dataset and run the scripts.

## Run with bash

### Install python libraries

Run the following command to install the libraries used by the scripts:

```sh
$ pip install -r notebook/requirements.txt
```

Open a Python3 console with the command:

```sh
$ python3
```

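Inside the console, download the required NLTK data packages by running the following code:

```py
import nltk

# 'punkt' (tokenizer models) and 'stopwords' (stop word lists) are NLTK data packages.
nltk.download('punkt')
nltk.download('stopwords')
# 'word_tokenize', 'tokenize' and 'stem' are functions/modules shipped with NLTK
# itself, not downloadable data packages, so they need no nltk.download() call.
```
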
### Export the variables

Use the following commands to export the variables so scripts can use the correct path to the dataset and output folder.

```sh
# Export path to the raw dataset
$ export DATASET_PATH=./tcc/so_questions.csv

# Export the output path
$ export OUTPUT_PATH=./output

# Export the number of topics
$ export TOPICS_NUM=15
```
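
As a rough illustration of how a script might consume these variables, here is a minimal sketch. The `Settings` class and `load_settings` helper are hypothetical names introduced for this example; the repository's own scripts may read the environment differently.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    dataset_path: str  # value of DATASET_PATH
    output_path: str   # value of OUTPUT_PATH
    topics_num: int    # value of TOPICS_NUM, converted to an integer


def load_settings(environ=os.environ) -> Settings:
    """Read the three exported variables; raises KeyError if one is missing."""
    return Settings(
        dataset_path=environ["DATASET_PATH"],
        output_path=environ["OUTPUT_PATH"],
        topics_num=int(environ["TOPICS_NUM"]),
    )
```

Failing early with a `KeyError` makes a forgotten `export` visible before any long-running step starts.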
### Prepare dataset

Before running the analysis, run the following script to parse the `.csv` dataset, clean it, and create the documents used by the Mallet tool.

```sh
$ python3 prepare_dataset.py
```
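
To give an idea of what such a preparation step involves, the sketch below parses a CSV file, strips markup, and writes one text file per question. It is not the code of `prepare_dataset.py`: the column names `Id`, `Title` and `Body`, the helper names, and the cleaning rules are all assumptions made for illustration, and the real script may do more (for example tokenization and stop word removal with NLTK).

```python
import csv
import html
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Drop <code> blocks and HTML tags, unescape entities, collapse whitespace."""
    without_code = re.sub(r"<code>.*?</code>", " ", raw, flags=re.DOTALL)
    without_tags = re.sub(r"<[^>]+>", " ", without_code)
    return re.sub(r"\s+", " ", html.unescape(without_tags)).strip()


def write_documents(csv_path: str, out_dir: str) -> int:
    """Write one cleaned text file per CSV row and return the number of files written."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # "Id", "Title" and "Body" are hypothetical column names.
            text = clean_text(f"{row['Title']} {row['Body']}")
            (target / f"{row['Id']}.txt").write_text(text, encoding="utf-8")
            count += 1
    return count
```

Writing one file per question matches the directory-based input that Mallet's `import-dir` command expects.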
### Run the Mallet tool

Run the Mallet commands:

```sh
$ mallet/mallet-2.0.8/bin/mallet import-dir --input $OUTPUT_PATH/so_data/ --output $OUTPUT_PATH/so.mallet --keep-sequence --remove-stopwords --extra-stopwords extra_stopwords/so.txt
```

```sh
$ mallet/mallet-2.0.8/bin/mallet train-topics --random-seed 100 --input $OUTPUT_PATH/so.mallet --num-topics 15 --optimize-interval 20 --output-state $OUTPUT_PATH/so-topic-state.gz --output-topic-keys $OUTPUT_PATH/so_keys.txt --output-doc-topics $OUTPUT_PATH/so_composition.txt --diagnostics-file $OUTPUT_PATH/so_results/so_diagnostics.xml
```
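
The two commands above can also be assembled from Python, which keeps the output path and the number of topics in one place. This is only a sketch: `build_mallet_commands` and `run_mallet` are hypothetical helpers, the flags are copied unchanged from the commands above, and creating the `so_results` directory beforehand is an assumption about what the diagnostics file needs.

```python
import os
import subprocess
from typing import List

MALLET_BIN = "mallet/mallet-2.0.8/bin/mallet"


def build_mallet_commands(output_path: str, num_topics: int) -> List[List[str]]:
    """Return the import-dir and train-topics commands as argument lists."""
    import_cmd = [
        MALLET_BIN, "import-dir",
        "--input", f"{output_path}/so_data/",
        "--output", f"{output_path}/so.mallet",
        "--keep-sequence",
        "--remove-stopwords",
        "--extra-stopwords", "extra_stopwords/so.txt",
    ]
    train_cmd = [
        MALLET_BIN, "train-topics",
        "--random-seed", "100",
        "--input", f"{output_path}/so.mallet",
        "--num-topics", str(num_topics),
        "--optimize-interval", "20",
        "--output-state", f"{output_path}/so-topic-state.gz",
        "--output-topic-keys", f"{output_path}/so_keys.txt",
        "--output-doc-topics", f"{output_path}/so_composition.txt",
        "--diagnostics-file", f"{output_path}/so_results/so_diagnostics.xml",
    ]
    return [import_cmd, train_cmd]


def run_mallet(output_path: str, num_topics: int) -> None:
    """Run both commands in order, stopping at the first failure."""
    # Assumption: the diagnostics directory may not exist yet.
    os.makedirs(f"{output_path}/so_results", exist_ok=True)
    for command in build_mallet_commands(output_path, num_topics):
        subprocess.run(command, check=True)
```

Calling `run_mallet("./output", 15)` from the repository root would mirror the shell commands, provided Mallet has been built as described earlier.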

### Parse mallet results

After running the Mallet tool, run the following script:

```sh
$ python3 manage_results.py
```

This script will create a document file for each topic, containing all the questions related to that topic.
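
One plausible way to relate questions to topics is to assign each document to its highest-proportion topic in the `--output-doc-topics` file. The sketch below illustrates that idea only; it is not the logic of `manage_results.py`. It assumes the dense tab-separated layout (document index, document name, then one proportion per topic), and `group_by_dominant_topic` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_dominant_topic(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Map each topic index to the names of documents whose largest proportion is that topic."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # no topic proportions on this line
        doc_name = fields[1]
        proportions = [float(value) for value in fields[2:]]
        dominant = max(range(len(proportions)), key=proportions.__getitem__)
        groups[dominant].append(doc_name)
    return dict(groups)
```

Under these assumptions, passing the lines of `so_composition.txt` to the function yields one list of document names per topic, which could then be written out as one file per topic.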


## Run with Docker

### First steps

- Install Docker and Docker Compose

- Extract the so_questions.csv.zip file present in the TCC folder

### Start the Docker container

Run the Docker container:

``` make start ```

### Install dependencies and enter the container

Install the dependencies and enter the container:

``` make init ```

### ↳ Follow the next steps in the bash shell opened by the `make init` command:

### Prepare data

Prepare data

``` make prepare ```

### Process data

Process the data with mallet

``` make process ```

### Results

Process the results

```make results```