Using twarc-network programmatically #11

Open
DavidHernandez opened this issue Aug 24, 2022 · 3 comments

Comments

@DavidHernandez

Hi,

I am using twarc from custom code I wrote, which extracts tweets and stores them in a database. Now I would like to use some of those tweets to create a network graph with twarc-network. Instead of re-extracting the tweets from Twitter, or reverse engineering the jsonl format so I can export my data to it, is there an easy way to use twarc-network programmatically, passing the tweets directly from my DB and generating the files?

Thanks,
David.

@igorbrigadir

It's a bit awkward:

It's always advisable to keep the original JSON files twarc produces when retrieving tweets, so that you can always run any tool on top of them. If you no longer have them, or you are selecting a specific subset through your database, you can also use twarc-network manually by calling https://github.com/DocNow/twarc-network/blob/main/twarc_network/__init__.py#L83 directly. infile doesn't have to be a file; it can be any iterable. It expects to iterate over lines of text, though, so you have to pass it strings that it can parse as tweets.

It doesn't have to be exactly the same as the original JSON, but it has to resemble it: for example, a tweet still has to have an author dict with a username in it. So the process could be: retrieve rows from your DB, construct a dict that resembles the original JSON, turn it into a string, and pass it to the function.
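As a rough illustration of that process, here is a minimal sketch that turns database rows into an iterable of JSON strings. The SQLite schema, the column names, and the `rows_to_json_lines` helper are all hypothetical, and the only field requirement taken from this thread is the author dict with a username; the linked function may need more fields than shown here.

```python
import json
import sqlite3
from typing import Iterator


def rows_to_json_lines(conn: sqlite3.Connection) -> Iterator[str]:
    """Yield one JSON string per stored tweet, shaped loosely like the API output."""
    # Hypothetical schema: tweets(id, text, author_id, author_username)
    cursor = conn.execute(
        "SELECT id, text, author_id, author_username FROM tweets ORDER BY id"
    )
    for tweet_id, text, author_id, username in cursor:
        tweet = {
            "id": str(tweet_id),
            "text": text,
            "author_id": str(author_id),
            # An author dict with a username is the one requirement named above
            "author": {"id": str(author_id), "username": username},
        }
        # json.dumps escapes newlines, so each tweet stays on a single line
        yield json.dumps(tweet)


# Small in-memory demo standing in for a real database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER, text TEXT, author_id INTEGER, author_username TEXT)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [(1, "hello", 10, "alice"), (2, "hi @alice", 11, "bob")],
)
lines = list(rows_to_json_lines(conn))
```

The generator (or the `lines` list) is the kind of iterable that could be passed as infile. The call itself is left out because the function's signature is not stated in this thread.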

This is a bit awkward, so if you no longer have the original JSON and only have DB records in some custom format, it's easier to write your own version of the https://github.com/DocNow/twarc-network/blob/main/twarc_network/__init__.py#L83 function so that it runs through your rows correctly. That way everything else will work along with it.
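For the write-your-own route, the core of such a function could be as small as a loop that counts user-to-user interactions straight from the rows. This is only a sketch under assumptions: the `author_username` and `referenced_username` keys and the `count_edges` helper are hypothetical, and it does not reproduce what the linked twarc-network function actually does.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def count_edges(rows: Iterable[Mapping[str, Optional[str]]]) -> Counter:
    """Count directed (source, target) username pairs from interaction rows."""
    edges: Counter = Counter()
    for row in rows:
        source = row.get("author_username")
        target = row.get("referenced_username")
        # Skip tweets that reference nobody, and self-references
        if not source or not target or source == target:
            continue
        edges[(source, target)] += 1
    return edges


# Hypothetical rows, e.g. as returned by a database query
rows = [
    {"author_username": "bob", "referenced_username": "alice"},
    {"author_username": "bob", "referenced_username": "alice"},
    {"author_username": "carol", "referenced_username": "bob"},
    {"author_username": "alice", "referenced_username": None},
]
edges = count_edges(rows)
```

The resulting counter maps each (source, target) pair to a weight, which is a convenient starting point for whatever graph library or export format comes next.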

@DavidHernandez
Author

It should not be a big problem: although I store my own version of the tweet structure, I also store the original output of the Twitter API in case I need it, and this seems to be such a case. I will give it a try. If I find an easy way to do it, I will open a pull request against the repo adding instructions on how to use it programmatically.
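If the stored API output is kept as text, one detail worth handling is that line-oriented readers need each record on a single line. A minimal sketch, assuming the raw payloads come back from the database as JSON strings (the `to_jsonl` helper and the sample payloads are hypothetical):

```python
import json
from typing import Iterable, Iterator


def to_jsonl(raw_payloads: Iterable[str]) -> Iterator[str]:
    """Re-serialise stored JSON so that each record occupies exactly one line."""
    for raw in raw_payloads:
        # Parsing and dumping again collapses any pretty-printing
        yield json.dumps(json.loads(raw))


# Hypothetical stored payloads: one pretty-printed, one already compact
stored = [
    '{\n  "id": "1",\n  "author": {"username": "alice"}\n}',
    '{"id": "2", "author": {"username": "bob"}}',
]
jsonl_lines = list(to_jsonl(stored))
```

These lines can be passed on as an iterable, or written to a file with one record per line.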

@edsu
Member

edsu commented Aug 27, 2022

@DavidHernandez if you can come up with a way to make twarc-network more useful programmatically while preserving the existing behavior, that would be most welcome!
