Long conjunctive query phrases make TCAT not start #334
Comments
Oops sorry I was neglecting the |
Twitter Standard streaming API request parameters says about track
Does it make any sense to document this issue like so, without writing this test against TCAT's codebase?
<?php
/**
 * See https://github.com/digitalmethodsinitiative/dmi-tcat/issues/334 for the issue
 */
use PHPUnit\Framework\TestCase;

final class QueryPhraseTest extends TestCase
{
    /**
     * Test that any phrase is within the Twitter API spec
     * https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters.html
     *
     * @dataProvider phraseProvider
     */
    public function testPhraseLengthIsWithinTwitterApiTrackSpecs($phrase)
    {
        $minLength = 0;
        $maxLength = 60;
        $this->assertGreaterThanOrEqual($minLength, strlen($phrase));
        $this->assertLessThanOrEqual($maxLength, strlen($phrase));
    }

    /**
     * Data providers
     */
    public function phraseProvider()
    {
        return [
            ["#Chinabigbrother #Citizenrating #Chinasurveillance #Socialcredit #Chinablackmirror #Digitaldictatorship #Digitalcensorship #Socialcreditsystem"],
            ["#Chinabigbrother #Citizenrating #Chinasurveillance #Socialcredit",
             "#Chinablackmirror #Digitaldictatorship #Digitalcensorship #Socialcreditsystem"],
            ["#Chinabigbrother #Citizenrating",
             "#Chinasurveillance #Socialcredit",
             "#Chinablackmirror #Digitaldictatorship",
             "#Digitalcensorship #Socialcreditsystem"],
        ];
    }
}
Which currently gives us
Anyway, towards a solution I would be inclined to fortify the web UI from Lines 488 to 493 in 85504de
to something like
if (params['type'] == 'track') {
    var _nrOfPhrases = validateNumberOfPhrases(params['oldphrases'].split(",").length, _newphrases.split(",").length);
    if (!_nrOfPhrases) {
        alert("With this query you will exceed the number of allowed queries (400) to the Twitter API. Please reduce the number of phrases.");
        return false;
    }
    var _lenOfPhrases = validateLengthOfPhrases(_newphrases);
    if (!_lenOfPhrases) {
        alert("One or more of the phrases exceeds the allowed length (60 characters) to the Twitter API. Please shorten your phrases.");
        return false;
    }
}
with |
Or alternatively fortify the web UI in here Line 750 in 85504de
|
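For illustration, here is a minimal sketch of what a validateLengthOfPhrases helper along the lines suggested above might look like. It is not the code from the eventual pull request. The constant MAX_PHRASE_BYTES and the helper utf8ByteLength are hypothetical names, and counting UTF-8 bytes rather than characters is an assumption (the alert text in this thread speaks of 60 characters).

```javascript
// Hypothetical sketch, not TCAT's actual implementation.
// The limit of 60 is the one cited in this thread for the track parameter.
const MAX_PHRASE_BYTES = 60;

// Length of a string in bytes when encoded as UTF-8 (assumed unit of the limit).
function utf8ByteLength(text) {
    return new TextEncoder().encode(text).length;
}

// Returns true when every comma-separated phrase fits within the limit.
function validateLengthOfPhrases(phrases) {
    return phrases
        .split(",")
        .map(function (phrase) { return phrase.trim(); })
        .every(function (phrase) { return utf8ByteLength(phrase) <= MAX_PHRASE_BYTES; });
}
```

With this sketch, the unsplit eight-hashtag phrase from this issue would be rejected, while the variant split into four two-hashtag phrases would be accepted.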
Hi @xmacex, thanks for submitting this issue. As you stated, the phrase does not make any sense, as it would only track tweets which contain exactly "#Chinabigbrother #Citizenrating #Chinasurveillance #Socialcredit #Chinablackmirror #Digitaldictatorship #Digitalcensorship #Socialcreditsystem". TCAT should indeed validate the length of phrases, although it should not insert arbitrary commas and thereby modify the phrases entered by the user. Instead, the user could get a warning indicating that the phrase to track is too long. Best, Erik |
@ErikBorra I suggest we keep query design issues (Rogers 2017; and of course a whole body of literature in information retrieval) separate from input string validation. Of course, as STS scholars, this is a moment of reflection about how exactly this divide constantly gets made and re-made in everyday practices 😆 |
Anyway I'd be happy to work on a pull request. I imagine input validation at JS as suggested in #334 (comment), but if someone can nudge me towards what would be a good place to introduce some fortification further down the stack, in I'll do that too while I'm on it. The function dmi-tcat/capture/query_manager.php Line 42 in 4613a15
|
Working on it here https://github.com/xmacex/dmi-tcat/tree/phrases_must_not_exceed_60chrs_each. I really don't know how to provide tests for Javascript functions buried in PHP 😟 |
A terminological observation about myself: I seem to have converged towards speaking of |
Hi @xmacex Sorry for the late reply on this and thank you for your pull request. It looks sound in principle. Some quick remarks before I can merge this. Maybe we should add a maximum length to the |
Thanks @dentoir. The Twitter API documentation for I implemented |
Hi. The following, long conjunctive query phrase
#Chinabigbrother #Citizenrating #Chinasurveillance #Socialcredit #Chinablackmirror #Digitaldictatorship #Digitalcensorship #Socialcreditsystem
will make TCAT not restart after the routine reload which is performed shortly after a query bin is added, as observed in logs/controller.php
I am waiting for the process
dmitcat_track.php
to respawn via the normal cronjob by monitoring its appearance in the
ps afx |grep 'var/www/dmi-tcat' |grep "track"
listing. It won't be there. Removing the bin from the usual web UI brings the dmitcat_track.php
back, after the routine reload is done (in a dozen or so seconds).
The query phrase is long, 142 characters. If I arbitrarily split it into two by placing a disjunctive comma in the middle
#Chinabigbrother #Citizenrating #Chinasurveillance #Socialcredit, #Chinablackmirror #Digitaldictatorship #Digitalcensorship #Socialcreditsystem
the same issue still persists.
However, if I further split these into two disjunctions each,
#Chinabigbrother #Citizenrating, #Chinasurveillance #Socialcredit, #Chinablackmirror #Digitaldictatorship, #Digitalcensorship #Socialcreditsystem
things work as usual. Of course any actual query design is out the window by this point (if it ever was there, but that's another matter).
I observed the unabridged query phrase go into the database. The bin gets its entries in
tcat_query_bins_*
tables, as usual.
I am not particularly expert in PHP / SQL webapp debugging, so if someone can please guide me on how to get more information about the program flow, where to request debug messages, throw PHP exceptions and stack traces, or maybe even run the
dmitcat_track.php
under XDebug or another debugging tool, I would appreciate it.
I hope I will be able to send a pull request once I have a better idea of where to get more information about what goes wrong.
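The three variants above can be checked against the limit of 60 per phrase that comes up in the comments. This is only a rough sketch, assuming that commas separate phrases and that surrounding whitespace is ignored. The names variants, phraseLengths and MAX_PHRASE_LENGTH are illustrative and not part of TCAT.

```javascript
// Illustrative check only; the limit of 60 is the one used elsewhere in this thread.
const MAX_PHRASE_LENGTH = 60;

const variants = {
    unsplit: "#Chinabigbrother #Citizenrating #Chinasurveillance #Socialcredit #Chinablackmirror #Digitaldictatorship #Digitalcensorship #Socialcreditsystem",
    halves: "#Chinabigbrother #Citizenrating #Chinasurveillance #Socialcredit, #Chinablackmirror #Digitaldictatorship #Digitalcensorship #Socialcreditsystem",
    quarters: "#Chinabigbrother #Citizenrating, #Chinasurveillance #Socialcredit, #Chinablackmirror #Digitaldictatorship, #Digitalcensorship #Socialcreditsystem",
};

// Split a track query on commas and return the length of each trimmed phrase.
function phraseLengths(query) {
    return query.split(",").map(function (phrase) { return phrase.trim().length; });
}

for (const [name, query] of Object.entries(variants)) {
    const lengths = phraseLengths(query);
    const ok = lengths.every(function (n) { return n <= MAX_PHRASE_LENGTH; });
    console.log(name, lengths, ok ? "within limit" : "too long");
}
```

The per-phrase lengths come out as 142 for the unsplit query, 64 and 77 for the two halves, and 31, 32, 38 and 38 for the four quarters. Only the last variant stays within 60 per phrase, which is consistent with the behaviour described above.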