[Gitter] Misclassifying Pull Requests and Issues #1028

Open
k----n opened this issue Jan 30, 2022 · 1 comment

k----n commented Jan 30, 2022

Here is some Gitter-enriched data for a message that references a pull request:

{
        "_index" : "gitter_enriched_raw",
        "_type" : "items",
        "_id" : "a9a4a861b3011c2bafdef977c72b205419449b4d",
        "_score" : 5.3138795,
        "_source" : {
          "metadata__updated_on" : "2016-04-30T00:34:26.399000+00:00",
          "metadata__timestamp" : "2022-01-30T03:29:44.263053+00:00",
          "offset" : null,
          "origin" : "https://gitter.im/shuup/shuup",
          "tag" : "https://gitter.im/shuup/shuup",
          "uuid" : "a9a4a861b3011c2bafdef977c72b205419449b4d",
          "unread" : 0,
          "text_analyzed" : "Looks like those issues should be fixed with this bugfix: https://github.com/shoopio/shoop/pull/441",
          "readBy" : 15,
          "issues" : [
            {
              "repo" : "shoopio/shoop",
              "number" : "441"
            }
          ],
          "id" : "5723fd92e10a59c061074eed",
          "url_hostname" : [ ],
          "tz" : 0,
          "fromUser_id" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
          "fromUser_uuid" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
          "fromUser_name" : "Shawn Her Many Horses",
          "fromUser_user_name" : "",
          "fromUser_domain" : null,
          "fromUser_gender" : "Unknown",
          "fromUser_gender_acc" : 0,
          "fromUser_org_name" : "Unknown",
          "fromUser_bot" : false,
          "fromUser_multi_org_names" : [
            "Unknown"
          ],
          "author_id" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
          "author_uuid" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
          "author_name" : "Shawn Her Many Horses",
          "author_user_name" : "",
          "author_domain" : null,
          "author_gender" : "Unknown",
          "author_gender_acc" : 0,
          "author_org_name" : "Unknown",
          "author_bot" : false,
          "author_multi_org_names" : [
            "Unknown"
          ],
          "project" : "shuup/shuup",
          "project_1" : "shuup/shuup",
          "grimoire_creation_date" : "2016-04-30T00:34:26.399000+00:00",
          "is_gitter_message" : 1,
          "repository_labels" : [ ],
          "metadata__filter_raw" : null,
          "metadata__gelk_version" : "0.99.0",
          "metadata__gelk_backend_name" : "GitterEnrich",
          "metadata__enriched_on" : "2022-01-30T04:54:27.841150+00:00"
        }
      }

There should be an is_pull field, according to this line:

entity['is_pull'] = entity['repo'] + ' #' + entity['number']

Maybe the regex isn't working?

HTML_LINK_REGEX = re.compile("href=[\"\'](.*?)[\"\']")
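One quick way to test this hypothesis is to run HTML_LINK_REGEX against two inputs. The first is a hypothetical anchor-tag version of the message, made up for illustration and not taken from the Gitter data. The second is the plain message text from text_analyzed above. The regex only captures the value of an href attribute, so a message whose URL is not wrapped in an <a href=...> tag seems likely to yield no links at all.

```python
import re

# Regex quoted in the issue: captures whatever sits inside href="..." or href='...'
HTML_LINK_REGEX = re.compile("href=[\"\'](.*?)[\"\']")

# Hypothetical HTML with an explicit anchor tag (illustration only, not real Gitter output)
anchor_html = '<a href="https://github.com/shoopio/shoop/pull/441">PR 441</a>'

# Plain message text copied from the text_analyzed field above
plain_text = ("Looks like those issues should be fixed with this bugfix: "
              "https://github.com/shoopio/shoop/pull/441")

anchor_links = HTML_LINK_REGEX.findall(anchor_html)
plain_links = HTML_LINK_REGEX.findall(plain_text)

print(anchor_links)  # ['https://github.com/shoopio/shoop/pull/441']
print(plain_links)   # [] because there is no href attribute to match
```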


Here's a case with an issue:

{
        "_index" : "gitter_enriched_raw",
        "_type" : "items",
        "_id" : "963132bc57c2bf58a906aca2dc1f91fdeb65f76a",
        "_score" : 5.2737937,
        "_source" : {
          "metadata__updated_on" : "2017-01-27T08:45:38.335000+00:00",
          "metadata__timestamp" : "2022-01-30T03:29:41.738953+00:00",
          "offset" : null,
          "origin" : "https://gitter.im/shuup/shuup",
          "tag" : "https://gitter.im/shuup/shuup",
          "uuid" : "963132bc57c2bf58a906aca2dc1f91fdeb65f76a",
          "unread" : 0,
          "text_analyzed" : "https://github.com/shuup/shuup/issues/361 -> i tried this",
          "readBy" : 17,
          "issues" : [
            {
              "repo" : "shuup/shuup",
              "number" : "361"
            }
          ],
          "id" : "588b08b25309d6b3587415c3",
          "url_hostname" : [ ],
          "tz" : 8,
          "fromUser_id" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
          "fromUser_uuid" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
          "fromUser_name" : "aoy12",
          "fromUser_user_name" : "",
          "fromUser_domain" : null,
          "fromUser_gender" : "Unknown",
          "fromUser_gender_acc" : 0,
          "fromUser_org_name" : "Unknown",
          "fromUser_bot" : false,
          "fromUser_multi_org_names" : [
            "Unknown"
          ],
          "author_id" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
          "author_uuid" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
          "author_name" : "aoy12",
          "author_user_name" : "",
          "author_domain" : null,
          "author_gender" : "Unknown",
          "author_gender_acc" : 0,
          "author_org_name" : "Unknown",
          "author_bot" : false,
          "author_multi_org_names" : [
            "Unknown"
          ],
          "project" : "shuup/shuup",
          "project_1" : "shuup/shuup",
          "grimoire_creation_date" : "2017-01-27T08:45:38.335000+00:00",
          "is_gitter_message" : 1,
          "repository_labels" : [ ],
          "metadata__filter_raw" : null,
          "metadata__gelk_version" : "0.99.0",
          "metadata__gelk_backend_name" : "GitterEnrich",
          "metadata__enriched_on" : "2022-01-30T04:54:15.833892+00:00"
        }
      }

There should be an is_issue key, according to this check:

if links_found[i].split('/')[-2] == 'issues':

Perhaps the regex is the problem again?
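If the links were actually extracted, the path-segment check itself appears to behave for both URLs in these examples, which would point at link extraction rather than classification. The sketch below applies the same split('/')[-2] test to the URLs from the two text_analyzed fields. classify_link is a hypothetical helper, the 'pull' branch assumes pull requests are detected the same way as issues, and the "repo #number" format is borrowed from the is_pull line above (assumed, not confirmed, for issues too).

```python
def classify_link(url):
    """Hypothetical helper mirroring the split('/')[-2] check quoted above.

    Returns (kind, reference), where reference uses the repo + ' #' + number
    format from the is_pull line, or None for URLs that are not issue/PR links.
    """
    parts = url.rstrip('/').split('/')
    if len(parts) < 5:
        return None
    owner, repo, segment, number = parts[-4], parts[-3], parts[-2], parts[-1]
    reference = owner + '/' + repo + ' #' + number
    if segment == 'issues':
        return 'issue', reference
    if segment == 'pull':  # assumption: PRs are detected by the 'pull' segment
        return 'pull', reference
    return None


pr_url = "https://github.com/shoopio/shoop/pull/441"
issue_url = "https://github.com/shuup/shuup/issues/361"

print(classify_link(pr_url))     # ('pull', 'shoopio/shoop #441')
print(classify_link(issue_url))  # ('issue', 'shuup/shuup #361')
```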

k----n commented Jan 31, 2022

Sometimes the pull request or issue is referenced in a span tag instead of a link with an href attribute, e.g.:

"data" : {
            "id" : "5723fd92e10a59c061074eed",
            "text" : "Looks like those issues should be fixed with this bugfix: https://github.com/shoopio/shoop/pull/441",
            "html" : """Looks like those issues should be fixed with this bugfix: <span data-link-type="issue" data-issue="441" data-issue-repo="shoopio/shoop" class="issue">shoopio/shoop#441</span>""",
            "sent" : "2016-04-30T00:34:26.399Z",
            "unread" : false,
            "readBy" : 15,
            "urls" : [ ],
            "mentions" : [ ],
            "issues" : [
              {
                "repo" : "shoopio/shoop",
                "number" : "441"
              }
            ],
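Because this html value has no href attribute, HTML_LINK_REGEX finds nothing in it. One possible workaround is to read the reference from the span's data-issue-repo and data-issue attributes instead. The sketch below does that with the standard library's html.parser; IssueSpanParser is a hypothetical name, and the input string is the html field quoted above.

```python
import re
from html.parser import HTMLParser

HTML_LINK_REGEX = re.compile("href=[\"\'](.*?)[\"\']")


class IssueSpanParser(HTMLParser):
    """Hypothetical parser collecting (repo, number) pairs from Gitter issue spans."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # In the data above, data-link-type is "issue" even though #441 is a pull request
        if tag == 'span' and attributes.get('data-link-type') == 'issue':
            self.references.append(
                (attributes.get('data-issue-repo'), attributes.get('data-issue')))


html = ('Looks like those issues should be fixed with this bugfix: '
        '<span data-link-type="issue" data-issue="441" '
        'data-issue-repo="shoopio/shoop" class="issue">shoopio/shoop#441</span>')

print(HTML_LINK_REGEX.findall(html))  # [] because the span has no href

parser = IssueSpanParser()
parser.feed(html)
print(parser.references)  # [('shoopio/shoop', '441')]
```

This recovers the repo and number but, as with the raw issues list, it still cannot tell a pull request from an issue.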

Even when the reference is a pull request, the span tag marks it as an "issue" (data-link-type="issue").

So GitHub will need to be queried to classify each reference as either a pull request or an issue.
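A minimal sketch of that lookup, assuming the GitHub REST API is used: the GET /repos/{owner}/{repo}/issues/{number} endpoint returns both issues and pull requests, and pull requests carry a pull_request key in the response. The helper names (classify_payload, fetch_issue, classify_reference) are hypothetical, and the token handling is an assumption about how an enricher might authenticate.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def classify_payload(payload):
    """Return 'pull' or 'issue' for a JSON object from the GitHub issues endpoint.

    The issues endpoint also returns pull requests; those include a
    'pull_request' key, while plain issues do not.
    """
    return 'pull' if 'pull_request' in payload else 'issue'


def fetch_issue(repo, number, token=None):
    """Fetch GET /repos/{owner}/{repo}/issues/{number} (requires network access)."""
    url = f"{GITHUB_API}/repos/{repo}/issues/{number}"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def classify_reference(repo, number, token=None):
    """Hypothetical helper: classify one entry of a Gitter message's 'issues' list."""
    return classify_payload(fetch_issue(repo, number, token))


# Offline check with trimmed, made-up payloads (not real API responses)
print(classify_payload({'number': 441, 'pull_request': {}}))  # pull
print(classify_payload({'number': 361}))                      # issue
```

The repo and number could come straight from the message's issues list (e.g. classify_reference('shoopio/shoop', '441')). Unauthenticated requests are rate-limited, so passing a token and caching results per (repo, number) would probably be worthwhile.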
