import csv downloaded with academictwitteR produces bin with empty usernames #439
Comments
Hello alejandrofeged, sorry for the delayed response. I am not familiar with the academictwitteR library, but I have a hunch about what your problem may be. For import-auto-csv.php to work correctly, you will need to modify it so that the column names from your csv are mapped to the column names expected by TCAT. On this line of import-auto-csv.php is the assumptions function, which essentially converts the csv columns to the columns TCAT needs. You will want to go through your csv file and add a line to the assumptions function for each field. E.g.:
Let me know if that helps.
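As a rough illustration of the column-mapping idea only (this is not the contents of import-auto-csv.php, which is PHP), the sketch below renames the keys of a csv row in Python. The column names on both sides of `COLUMN_MAP` are hypothetical placeholders and `remap_row` is a made-up helper, so check your own csv header and the fields TCAT expects before relying on any of them.

```python
import csv
import io

# Hypothetical mapping: csv column name -> name the importer expects.
# Both sides are placeholders chosen for illustration.
COLUMN_MAP = {
    "user_username": "from_user_name",
    "tweet_id": "id",
}

def remap_row(row, column_map=COLUMN_MAP):
    """Return a copy of row with keys renamed according to column_map.

    Columns that are not listed in column_map keep their original name.
    """
    return {column_map.get(key, key): value for key, value in row.items()}

# Small in-memory csv standing in for the exported file.
sample = io.StringIO("tweet_id,user_username,text\n1,alice,hello\n")
rows = [remap_row(r) for r in csv.DictReader(sample)]
print(rows[0])
```

Columns without an entry in the mapping pass through unchanged, which mirrors the situation in this thread where many fields import correctly and only a few need an explicit mapping.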
Thank you so much! It did the trick! I had been changing the names of the columns in the original file, but putting their names into the lines you suggested worked like a charm. Any clues as to how to import the dates correctly? The column has the same name.
I'm glad that worked out for you! For the date, if you map the date column to |
It does help, but I think it will require more tweaking: it is not importing the correct dates, but at least it is progress. One more question: I have been creating the bins manually. If I import to a bin that has not been created, the import runs but the bin is not displayed in the interface.
You’ll have to find the error message in the logs for me to help with that. It should create the bin with the second argument you provide (elections2016), though it will not create it if it already exists.
Thanks for raising this issue and @dale-wahl for your answers. With the exception of urls, I've found I can import most fields from academictwitteR's Tidy output format with the following mappings:
Although a full csv export of the data imported into TCAT returns a populated urls column, running the Tweets Stats module on "url frequency" returns nothing. The populated urls in the full export are all in the native Twitter url format, e.g. https://t.co/..., and the "urls_expanded" and "urls_followed" columns are empty. I'd be grateful if anyone has insights into how to address this. Thanks
Hello Laurie, The import script looks for columns named 'urls' or 'expanded_urls', which you can see here, and then attempts to parse them. The script is essentially looking for a comma-delimited list (e.g., http://someplace.com, https://someplace_else.com). If no 'urls' or 'expanded_urls' column is provided in your csv, the script will instead attempt to read the tweet text itself and parse out any urls there (which is what I think it may be doing in your case). Right here is where the url parsing starts. I think you will want to rename your url list column to either 'urls' or 'expanded_urls' and see what results you get after that. You could also add the column name you are using to this array here. I am not entirely sure whether that will resolve your issue with the frequency module, but from the information provided, I want to make sure you are importing that data correctly. Let me know if that helps.
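The lookup order described above (a dedicated url column first, the tweet text as a fallback) can be sketched in Python as follows. This is an illustration of the described behaviour under stated assumptions, not TCAT's PHP code: `extract_urls` is a hypothetical helper, the `text` column name is assumed, and the regular expression is a deliberately simplified url pattern.

```python
import re

# Column names mentioned in the thread as the ones the import script checks.
URL_COLUMNS = ("urls", "expanded_urls")

# Simplified url pattern (an assumption, not the one used by TCAT).
URL_PATTERN = re.compile(r"https?://\S+")

def extract_urls(row):
    """Return urls from a dedicated column if present, else from the tweet text."""
    for column in URL_COLUMNS:
        value = row.get(column)
        if value:
            # The column is expected to hold a comma-delimited list of urls.
            return [u.strip() for u in value.split(",") if u.strip()]
    # No url column found: fall back to scanning the tweet text.
    return URL_PATTERN.findall(row.get("text", ""))

with_column = {"urls": "http://someplace.com, https://someplace_else.com", "text": "hi"}
without_column = {"text": "see https://t.co/abc now"}
print(extract_urls(with_column))
print(extract_urls(without_column))
```

With a fallback of this kind, a csv that lacks both url columns would yield only the shortened t.co links found in the text, which would be consistent with the symptom reported above.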
Hi Dale, My attempts to follow your suggestions have so far been unsuccessful. I detail them here in case they are useful to others. Mapping both the ‘url’ and ‘expanded_url’ columns from academictwitteR to the ‘urls’ and ‘urls_expanded’ fields in TCAT results in double entries in the database and confuses the frequency module. Mapping one or the other of the R columns to only the ‘urls’ field in TCAT results in identical entries for urls_expanded and urls_followed (e.g. urls = bit.ly/…, urls_expanded = bit.ly/…, urls_followed = bit.ly/…). As you point out, if no urls column is provided, the script parses the t.co/ url out of the tweet text into 'urls' but leaves NULL values in 'urls_expanded' and 'urls_followed'. The only solution I can find that gives a meaningful frequency analysis (i.e. frequencies of the urls followed rather than t.co/ or bit.ly links) is to decode them in R (using the longurl library) and map the followed url to the TCAT field ‘urls’, which is convoluted and creates the same problem of identical entries for urls_expanded and urls_followed. Urls aside, it’s helpful that most fields can be imported simply with the above mapping. Thanks
Hey Laurie, Could you point me to the dataset you are trying to import? I would like to test it myself. The identical entries for When you set up TCAT there is a setting to turn on a URL Expander. It is set to The |
Thank you Dale
The dataset was constructed using the same procedure outlined by @alejandrofeged. The academictwitteR github is here.
I tried commenting out these lines and tested on two TCAT installations (both with URL Expander turned on), but it doesn’t seem to have worked. Looking through the urlexpand.log file, I can’t see anything relating to the bins of imported data. The urls column still contains t.co links and the expanded/followed urls are NULL.
I have downloaded a dataset of tweets using the academictwitteR library in R. When I convert it to a csv file and try to import it into my DMI-TCAT machine, the import works except that the usernames are empty.
I have posted a detailed question on StackOverflow.
The code used to download and prepare the data is:
I later run the php import function using the terminal in my multipass machine:
php import-auto-csv.php data-tweets.csv elections2016
where elections2016 is a previously created bin (I noticed that if I don't create it in advance, the bin does not show up in the capture or analysis interfaces).
The dates are also set as 0000-00-00 for all tweets, but many other fields are imported correctly.
Any help is appreciated.