forked from gianlucascoccia/MSR2021Replication
[DOC] (Issue #8) : adding the docs to GitHub pages.
Showing 99 changed files with 7,112 additions and 106 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: d2483fc74d7b28ac83b5605bb2443898 | ||
config: 3d28de955ed6b7ae24203c537e424466 | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
# Welcome to StackOverflow Mining contributing guide <!-- omit in toc --> | ||
|
||
Thank you for investing your time in contributing to our project! :sparkles: | ||
|
||
Read our [Code of Conduct](./docs_contributing/CODE_OF_CONDUCT.md) to keep our community approachable and respectable. | ||
|
||
In this guide you will get an overview of the contribution workflow from opening an issue, creating a PR, reviewing, and merging the PR. | ||
|
||
## New contributor guide | ||
|
||
To get an overview of the project, read the [README](README.md). | ||
|
||
We categorize our issues by evaluating two metrics: | ||
- Skill level | ||
- Functionality | ||
See the [labels tab](https://github.com/FGA-GCES/MSR2021Replication/labels). | ||
|
||
|
||
## Getting started | ||
|
||
To understand the project's file structure, take a look at [Getting Started](./docs_contributing/getting_started.md). | ||
|
||
|
||
### Issues | ||
|
||
#### Create a new issue | ||
|
||
If you identify a problem with the docs or code, check whether an issue for it already exists. | ||
|
||
If a related issue does not exist, you can open a new issue. As a rule, two types of [issue form](https://github.com/FGA-GCES/MSR2021Replication/tree/docs_contributing/.github/ISSUE_TEMPLATE) are used: | ||
|
||
- [Bug report](./issues/bug_report.md) | ||
- [Standard](./issues/standard.md) | ||
|
||
#### Solve a problem | ||
|
||
Please review our [existing issues](https://github.com/FGA-GCES/MSR2021Replication/issues) to find one that interests you. You can narrow your search using labels as filters. See [Labels](https://github.com/FGA-GCES/MSR2021Replication/labels) for more information. As a general rule, we do not assign issues to anyone. If you find an issue to resolve, you can open a PR with a fix. | ||
|
||
### Make branches | ||
|
||
You must fork the original repository and work in your fork. As a rule, branches created directly in the original repository will not be accepted. | ||
|
||
### Commit your update | ||
|
||
Commit the changes once you are happy with them. Don't forget to follow the commit policy used in this project. | ||
|
||
Visit the [commit policy](./docs_contributing/commit_politics.md) for more information! | ||
|
||
### Development policy | ||
|
||
The code must follow the guidelines found in the official documents of each technology used in this project. | ||
|
||
### Pull Request | ||
|
||
When you're finished with the changes, create a pull request, also known as a PR. | ||
- Fill the "Ready for review" template so that we can review your PR. This template helps reviewers understand your changes as well as the purpose of your pull request. | ||
- Don't forget to [link PR to issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue) if you are solving one. | ||
- Enable the checkbox to [allow maintainer edits](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork) so the branch can be updated for a merge. | ||
Once you submit your PR, a Docs team member will review your proposal. We may ask questions or request additional information. | ||
- We may ask for changes to be made before a PR can be merged, either using [suggested changes](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/incorporating-feedback-in-your-pull-request) or pull request comments. You can apply suggested changes directly through the UI. You can make any other changes in your fork, then commit them to your branch. | ||
- As you update your PR and apply changes, mark each conversation as [resolved](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/commenting-on-a-pull-request#resolving-conversations). | ||
- If you run into any merge issues, check out this [git tutorial](https://github.com/skills/resolve-merge-conflicts) to help you resolve merge conflicts and other issues. | ||
|
||
### Your PR is merged! | ||
|
||
Congratulations :tada::tada: The GitHub team thanks you :sparkles:. | ||
|
||
Once your PR is merged, your contributions will be publicly visible on the [GitHub docs](https://docs.github.com/en). | ||
|
||
Now that you are part of the GitHub docs community, see how else you can [contribute to the docs](/contributing/types-of-contributions.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
# Introduction to the project | ||
|
||
## Replication package for the MSR2021 "Challenges in Developing Desktop Web Apps: a Study of Stack Overflow and GitHub" paper | ||
|
||
## Authors: Gian Luca Scoccia, Patrizio Migliarini, Marco Autili | ||
|
||
### Abstract | ||
|
||
Software companies have an interest in reaching the maximum amount of potential customers while, at the same time, providing a frictionless experience. Desktop web app frameworks are promising in this respect, allowing developers and companies to reuse existing code and knowledge of web applications to create cross-platform apps integrated with native APIs. Despite their growing popularity, existing challenges in employing these technologies have not been documented, and it is hard for individuals and companies to weigh benefits and pros against drawbacks and cons. | ||
In this paper, we address this issue by investigating the challenges that developers frequently experience when adopting desktop web app frameworks. To achieve this goal, we mine and apply topic modeling techniques to a dataset of 10,822 Stack Overflow posts related to the development of desktop web applications. Analyzing the resulting topics, we found that: i) developers often experience issues regarding the build and deployment processes for multiple platforms; ii) reusing existing libraries and development tools in the context of desktop applications is often cumbersome; iii) it is hard to solve issues that arise when interacting with native APIs. Furthermore, we confirm our finding by providing evidence that the identified issues are also present in the issue reports of 453 open-source applications publicly hosted on GitHub. | ||
|
||
Paper preprint available [HERE](MSR2021_preprint.pdf). | ||
|
||
## Online appendix | ||
|
||
The online appendix with the complete discussion of all topics mentioned in the paper is available [HERE](online_appendix.md). | ||
|
||
## Replication package | ||
|
||
Data used in the study is available in the folder [data/processed](data/processed). | ||
|
||
Raw data (w/o cleaning & filtering) is available in the folder [data/raw](data/raw). | ||
|
||
## Scripts | ||
### Tags selection | ||
|
||
Selection of tags related to desktop web apps questions, by means of significance and relevance metrics, is performed by the scripts: [extract_tagset_from_csv.py](notebook/extract_tagset_from_csv.py) and [createT.py](notebook/create_T.py). | ||
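The linked scripts are not reproduced here, so the following is only a minimal sketch of how significance and relevance are commonly computed in Stack Overflow mining studies. It assumes that significance is the share of a tag's posts that fall inside the seed set and that relevance is the share of seed posts carrying the tag. The function name `tag_metrics` and its inputs are hypothetical and may differ from what `extract_tagset_from_csv.py` actually does.

```python
from collections import Counter


def tag_metrics(seed_posts, all_posts):
    """Return {tag: (significance, relevance)} for every tag in seed_posts.

    Each post is an iterable of tag strings. seed_posts is assumed to be
    a subset of all_posts, so every seed tag also occurs in all_posts.
    """
    # Count each tag at most once per post.
    seed_counts = Counter(tag for post in seed_posts for tag in set(post))
    all_counts = Counter(tag for post in all_posts for tag in set(post))

    metrics = {}
    for tag, in_seed in seed_counts.items():
        # Assumed definition: posts with the tag in the seed set
        # divided by posts with the tag in the whole dataset.
        significance = in_seed / all_counts[tag]
        # Assumed definition: posts with the tag in the seed set
        # divided by the size of the seed set.
        relevance = in_seed / len(seed_posts)
        metrics[tag] = (significance, relevance)
    return metrics
```

Tags whose two values both exceed chosen thresholds would then be kept as candidates; the thresholds themselves are not stated in this document.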
|
||
### Selection of relevant Stack Overflow questions | ||
|
||
The queries used to select relevant Stack Overflow questions from SOTorrent are available in the file: [so_torrent_queries.txt](so_torrent_queries.txt) | ||
|
||
The first query selects relevant questions (based on their tags). | ||
|
||
The second query collects the accepted answers for the questions returned by the first query. | ||
|
||
### Topic modeling | ||
|
||
Topic modeling was executed by means of the [Mallet tool](http://mallet.cs.umass.edu). | ||
|
||
The commands used to execute the tool from the command line are provided in the [mallet_instructions.txt](mallet_instructions.txt) file. | ||
|
||
### Statistical analysis | ||
|
||
Scripts used to analyze the collected data are available in the folder [notebook](notebook). The Python scripts in the folder were used to perform data cleaning and exploratory analysis. The statistical tests performed in the study were implemented in the R language and are available in the file [tests.r](notebook/tests.r) | ||
|
||
### StackOverflow datasets | ||
|
||
To analyse StackOverflow datasets, open the Jupyter notebook with the following command: | ||
|
||
```sh | ||
jupyter-notebook SO_dataset_analysis.ipynb | ||
``` | ||
|
||
This notebook runs the scripts that clean the dataset, run the Mallet tool, and analyse the results. | ||
|
||
For more instructions on how to run the scripts access the [Getting Started](./docs/getting_started.md) document. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
# Contributor Covenant Code of Conduct | ||
|
||
With the objective of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project | ||
and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity | ||
and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual | ||
identity and orientation. | ||
|
||
# Our Standards | ||
|
||
### **Examples of behavior that contributes to a positive environment for our community include:** | ||
|
||
↳ Demonstrating empathy and kindness toward other people | ||
|
||
↳ Being respectful of differing opinions, viewpoints, and experiences | ||
|
||
↳ Giving and gracefully accepting constructive feedback | ||
|
||
↳ Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience | ||
|
||
↳ Focusing on what is best not just for us as individuals, but for the overall community | ||
|
||
### **Examples of unacceptable behavior include:** | ||
|
||
↳ The use of sexualized language or imagery, and sexual attention or advances of any kind | ||
|
||
↳ Trolling, insulting or derogatory comments, and personal or political attacks | ||
|
||
↳ Public or private harassment | ||
|
||
↳ Publishing others’ private information, such as a physical or email address, without their explicit permission | ||
|
||
↳ Other conduct which could reasonably be considered inappropriate in a professional setting | ||
|
||
# Enforcement Responsibilities | ||
|
||
Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and | ||
fair corrective action in response to any behavior they deem inappropriate, threatening, offensive or harmful. | ||
|
||
Community leaders have the right and responsibility to remove, edit or reject comments, commits, code, wiki edits, issues and | ||
other contributions that do not align with this Code of Conduct and will communicate the reasons for moderation decisions when | ||
appropriate. | ||
|
||
# Scope | ||
|
||
This Code of Conduct applies to all community spaces and also applies when an individual officially represents the community in | ||
public spaces. Examples of representing our community include using an official email address and linking to the project | ||
repository. | ||
|
||
# Attribution | ||
|
||
This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at | ||
https://www.contributor-covenant.org/version/1/4/code-of-conduct.html | ||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Commit policy | ||
|
||
Changes must be made following a pattern, indicating the issue resolved and the functionality (or fix) added. | ||
|
||
**➔ Use tags to define the purpose of the commit:** | ||
|
||
`ADD` : when adding a new feature | ||
|
||
`DEL` : when removing something | ||
|
||
`UPDATE` : when updating existing functionality | ||
|
||
`FIX` : when fixing a bug | ||
|
||
`DOC` : when changing documentation | ||
|
||
`REFACT` : when refactoring code | ||
|
||
## Example commit structure: | ||
|
||
git commit -m "[tag] (Issue #x) : descriptive message" |
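To make the pattern concrete, here is a minimal sketch of a check that a commit message follows the structure above. The function name `parse_commit_message` and the exact regular expression are assumptions for illustration; the project does not ship such a checker as far as this document shows.

```python
import re

# Tags listed in the commit policy.
COMMIT_TAGS = ("ADD", "DEL", "UPDATE", "FIX", "DOC", "REFACT")

# Expected shape: "[TAG] (Issue #N) : descriptive message"
COMMIT_PATTERN = re.compile(
    r"^\[(?P<tag>" + "|".join(COMMIT_TAGS) + r")\] "
    r"\(Issue #(?P<issue>\d+)\) : (?P<message>\S.*)$"
)


def parse_commit_message(message):
    """Return (tag, issue_number, text), or None if the message does not match."""
    match = COMMIT_PATTERN.match(message)
    if match is None:
        return None
    return match.group("tag"), int(match.group("issue")), match.group("message")
```

For example, `parse_commit_message("[FIX] (Issue #3) : handle empty dataset")` yields the tag, the issue number, and the description, while a message without a tag yields `None`.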
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
# Getting started with using the MSR2021Replication | ||
|
||
## Install Mallet | ||
|
||
To install the Mallet tool, you first need the Apache Ant build tool. Install the binary from [https://ant.apache.org/](https://ant.apache.org/) and follow the [manual instructions](https://ant.apache.org/manual/install.html#getBinary) to configure it. | ||
|
||
With ant installed and configured, open the Mallet 2.0.8 folder in the MSR2021Replication repository at `mallet/mallet-2.0.8` and run the following command: | ||
|
||
```sh | ||
$ ant | ||
``` | ||
|
||
The Mallet tool will be available to use at `mallet/mallet-2.0.8/bin/mallet`. | ||
|
||
## Run with jupyter notebook | ||
|
||
The Jupyter notebook can be used to analyse StackOverflow datasets. To open it, run the following command from the repository root. | ||
|
||
```sh | ||
$ jupyter-notebook SO_dataset_analysis.ipynb | ||
``` | ||
|
||
Follow the notebook instructions to import the correct dataset and run the scripts. | ||
|
||
## Run with bash | ||
|
||
### Install python libraries | ||
|
||
Run the following command to install the libraries used by the scripts: | ||
|
||
```sh | ||
$ pip install -r notebook/requirements.txt | ||
``` | ||
|
||
Open a Python3 console with the command: | ||
|
||
```sh | ||
$ python3 | ||
``` | ||
|
||
Inside the console download the nltk packages by running the following code: | ||
```py | ||
import nltk | ||
|
||
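# Download the tokenizer models and the stop word lists. | ||
nltk.download('punkt') | ||
nltk.download('stopwords') | ||
# word_tokenize, tokenize and stem are modules/functions that ship with | ||
# nltk itself, not downloadable resources, so no download is needed for them. | ||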
``` | ||
### Export the variables | ||
|
||
Use the following commands to export the variables so scripts can use the correct path to the dataset and output folder. | ||
|
||
```sh | ||
# Export path to the raw dataset | ||
$ export DATASET_PATH=./tcc/so_questions.csv | ||
|
||
# Export the output path | ||
$ export OUTPUT_PATH=./output | ||
|
||
# Export the number of topics | ||
$ export TOPICS_NUM=15 | ||
``` | ||
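As a rough illustration of how a script could pick up these variables, the sketch below reads them with `os.environ`. The helper name `load_settings` and the fallback defaults (taken from the example values above) are assumptions; the actual scripts may read the variables differently.

```python
import os


def load_settings(environ=os.environ):
    """Read the exported variables, falling back to the example values."""
    return {
        # Path to the raw .csv dataset.
        "dataset_path": environ.get("DATASET_PATH", "./tcc/so_questions.csv"),
        # Folder where prepared documents and results are written.
        "output_path": environ.get("OUTPUT_PATH", "./output"),
        # Number of topics, converted from its string form.
        "topics_num": int(environ.get("TOPICS_NUM", "15")),
    }
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.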
### Prepare dataset | ||
|
||
Run the following script to parse the `.csv` dataset, clean it, and create the documents used by the Mallet tool. | ||
|
||
```sh | ||
$ python3 prepare_dataset.py | ||
``` | ||
### Run the Mallet tool | ||
|
||
Run the Mallet commands: first import the prepared documents, then train the topic model. | ||
|
||
```sh | ||
$ mallet/mallet-2.0.8/bin/mallet import-dir --input $OUTPUT_PATH/so_data/ --output $OUTPUT_PATH/so.mallet --keep-sequence --remove-stopwords --extra-stopwords extra_stopwords/so.txt | ||
``` | ||
|
||
```sh | ||
$ mallet/mallet-2.0.8/bin/mallet train-topics --random-seed 100 --input $OUTPUT_PATH/so.mallet --num-topics 15 --optimize-interval 20 --output-state $OUTPUT_PATH/so-topic-state.gz --output-topic-keys $OUTPUT_PATH/so_keys.txt --output-doc-topics $OUTPUT_PATH/so_composition.txt --diagnostics-file $OUTPUT_PATH/so_results/so_diagnostics.xml | ||
|
||
``` | ||
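The two commands can also be assembled from Python, which makes it easier to reuse `OUTPUT_PATH` and `TOPICS_NUM`. This is a sketch only: the helper `build_mallet_commands` is hypothetical, and it uses exactly the flags shown in the commands above, with the number of topics passed in as a parameter.

```python
# Location of the Mallet launcher after building with ant.
MALLET_BIN = "mallet/mallet-2.0.8/bin/mallet"


def build_mallet_commands(output_path, topics_num):
    """Return the import-dir and train-topics argument lists."""
    import_cmd = [
        MALLET_BIN, "import-dir",
        "--input", f"{output_path}/so_data/",
        "--output", f"{output_path}/so.mallet",
        "--keep-sequence",
        "--remove-stopwords",
        "--extra-stopwords", "extra_stopwords/so.txt",
    ]
    train_cmd = [
        MALLET_BIN, "train-topics",
        "--random-seed", "100",
        "--input", f"{output_path}/so.mallet",
        "--num-topics", str(topics_num),
        "--optimize-interval", "20",
        "--output-state", f"{output_path}/so-topic-state.gz",
        "--output-topic-keys", f"{output_path}/so_keys.txt",
        "--output-doc-topics", f"{output_path}/so_composition.txt",
        "--diagnostics-file", f"{output_path}/so_results/so_diagnostics.xml",
    ]
    return import_cmd, train_cmd
```

Each list can then be handed to `subprocess.run(cmd, check=True)` from the repository root, so a failing Mallet step stops the pipeline instead of passing silently.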
|
||
### Parse mallet results | ||
|
||
After running the mallet tool, run the following script. | ||
|
||
```sh | ||
$ python3 manage_results.py | ||
``` | ||
|
||
This script creates one document file per topic, containing all the questions related to that topic. | ||
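The script itself is not shown in this document, so the following is only a sketch of one way such a grouping could work. It assumes that each line of the `--output-doc-topics` file holds a document index, a document name, and one proportion per topic, separated by tabs, and that a question is "related" to its highest-proportion topic. Both the file layout and the function name `group_by_dominant_topic` are assumptions.

```python
from collections import defaultdict


def group_by_dominant_topic(lines):
    """Map each topic index to the names of documents it dominates."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and a possible header comment.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        name = fields[1]
        proportions = [float(value) for value in fields[2:]]
        # Index of the topic with the largest proportion.
        dominant = max(range(len(proportions)), key=proportions.__getitem__)
        groups[dominant].append(name)
    return dict(groups)
```

Assigning each question to a single dominant topic is a simplification; a threshold on the proportion would let one question appear under several topics.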
|
||
|
||
## Run with docker | ||
|
||
### First steps | ||
|
||
- Install Docker and Docker Compose | ||
|
||
- Extract the `so_questions.csv.zip` file in the TCC folder | ||
|
||
### Start the Docker container | ||
|
||
Run the Docker container: | ||
|
||
``` make start ``` | ||
|
||
### Install dependencies and exec docker | ||
|
||
Install the dependencies and enter the container: | ||
|
||
``` make init ``` | ||
|
||
### ↳ Follow the next steps in the bash session opened by the `make init` command: | ||
|
||
### Prepare data | ||
|
||
Prepare the data: | ||
|
||
``` make prepare ``` | ||
|
||
### Process data | ||
|
||
Process the data with mallet | ||
|
||
``` make process ``` | ||
|
||
### Results | ||
|
||
Process the results | ||
|
||
```make results``` |