
Issue on preprocessing cresci_15 for BotRGCN #23

Open

von1000 opened this issue Feb 3, 2023 · 5 comments

Comments

von1000 commented Feb 3, 2023

Thank you for sharing your code!
I'm reproducing BotRGCN on the cresci_15 dataset using your code. However, after running "preprocess_1.py", the per-user tweet lists in "each_user_tweets.npy" are all empty. I found that this is because every 'source_id' lookup raises a KeyError at line 215. How can I fix this bug? Thank you!

leopoldwhite (Member) commented

Hi, thank you for your interest in our work! "preprocess_1.py" has been re-uploaded and this bug is now fixed. Sorry for the inconvenience.

dict = {i: [] for i in range(len(user))}
for i in tqdm(range(len(user))):
    dict[edge.iloc[i]['source_id']].append(tweet['text'][edge.iloc[i]['target_id']+len(user)])
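For this dict lookup to succeed, the raw user and tweet IDs in the edge table need to be remapped to integer indices first. A minimal toy sketch of that remapping (column names and data are assumptions, not the actual preprocess_1.py code):

import pandas as pd

# Toy stand-ins for the real user/tweet/edge tables (assumption; the actual
# cresci_15 files may use different column names).
user = pd.DataFrame({'id': ['u1', 'u2']})
tweet = pd.DataFrame({'id': ['t1', 't2', 't3']})
edge = pd.DataFrame({'source_id': ['u1', 'u2', 'u2'],
                     'target_id': ['t1', 't2', 't3']})

# Map raw string IDs to integer indices so that lookups in a dict keyed by
# range(len(user)) no longer raise KeyError.
uid_index = {uid: i for i, uid in enumerate(user['id'])}
tid_index = {tid: i for i, tid in enumerate(tweet['id'])}
edge['source_id'] = edge['source_id'].map(uid_index)
edge['target_id'] = edge['target_id'].map(tid_index)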

von1000 (Author) commented Feb 13, 2023

Thank you for your quick response!
After running the updated preprocessing code, I'm still a little confused about the generated "each_user_tweets". In this file, only the first user has corresponding tweets; the tweet lists for all other users are empty. The screenshot below shows part of "each_user_tweets". Am I doing something wrong?
[Screenshot (2023-02-13): each_user_tweets with tweets only for the first user; all other entries are empty]

BunsenFeng (Contributor) commented

@leopoldwhite

leopoldwhite (Member) commented

Sorry for the inconvenience. The code to get all users' tweets should be:
[image: the corrected snippet, which iterates over all edges rather than over all users]

However, to save execution time and storage, the updated code only saves the index of each tweet (the mapped target_id) into each_user_tweets.npy,

for i in tqdm(range(len(edge))):
    dict[edge.iloc[i]['source_id']].append(edge.iloc[i]['target_id'])

which can be easily used in preprocess_2.py.
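As a rough illustration (toy data and names, not the actual preprocess_2.py code), the stored indices can later be turned back into tweet texts like this:

import pandas as pd

# Toy stand-ins (assumption): the dict saved to each_user_tweets.npy maps each
# user index to the list of mapped tweet indices collected in the loop above.
each_user_tweets = {0: [0], 1: [1, 2]}
tweet = pd.DataFrame({'text': ['hello', 'world', '!']})

# Recover each user's tweet texts from the stored indices.
user_texts = {u: [tweet['text'][t] for t in idxs]
              for u, idxs in each_user_tweets.items()}
print(user_texts)   # {0: ['hello'], 1: ['world', '!']}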

znagzanglong commented

When I tried to train BotRGCN, I encountered a problem: I don't know what description.npy is.
Loading labels... Finished
Traceback (most recent call last):
File "C:\kust\xuesu\code\TwiBot-22-master\TwiBot-22-master\src\BotRGCN\cresci_15\train.py", line 24, in
des_tensor,tweets_tensor,num_prop,category_prop,edge_index,edge_type,labels,train_idx,val_idx,test_idx=dataset.dataloader()
File "C:\kust\xuesu\code\TwiBot-22-master\TwiBot-22-master\src\BotRGCN\cresci_15\Dataset.py", line 344, in dataloader
des_tensor=self.Des_embbeding()
File "C:\kust\xuesu\code\TwiBot-22-master\TwiBot-22-master\src\BotRGCN\cresci_15\Dataset.py", line 72, in Des_embbeding
description=np.load(self.root+'description.npy',allow_pickle=True)
File "C:\Users\Administrator\anaconda3\envs\TwiBot-22\lib\site-packages\numpy\lib\npyio.py", line 427, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './processed_data/description.npy'
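For reference, description.npy is one of the arrays the preprocessing step is expected to write into ./processed_data/ before train.py is run; Des_embbeding loads it, presumably as one profile-description string per user. A hedged sketch of how such a file could be produced (field names and toy data are assumptions, not the repository's actual preprocessing code):

import os
import numpy as np
import pandas as pd

# Toy stand-in (assumption): a user table with a 'description' profile field.
user = pd.DataFrame({'id': ['u1', 'u2'],
                     'description': ['bot researcher', None]})

# One description string per user, with an empty string when the field is missing.
description = [d if isinstance(d, str) else '' for d in user['description']]
os.makedirs('./processed_data', exist_ok=True)
np.save('./processed_data/description.npy', description)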
