
Issue on preprocessing cresci_15 for BotRGCN #23

Open

von1000 opened this issue Feb 3, 2023 · 5 comments

Comments

von1000 commented Feb 3, 2023

Thank you for sharing your code!
I'm reproducing BotRGCN on the cresci_15 dataset using your code. However, after running "preprocess_1.py", the per-user tweet lists in "each_user_tweets.npy" are all empty. I found that this is because every 'source_id' lookup raises a KeyError at line 215. How can I fix this bug? Thank you!

leopoldwhite (Member) commented

Hi, thank you for your interest in our work! "preprocess_1.py" has been re-uploaded and this bug is now fixed. Sorry for the inconvenience.

dict = {i: [] for i in range(len(user))}
for i in tqdm(range(len(user))):
    dict[edge.iloc[i]['source_id']].append(tweet['text'][edge.iloc[i]['target_id']+len(user)])
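For this dict lookup to succeed, the raw user and tweet IDs in the edge table need to be remapped to integer indices first. A minimal toy sketch of that remapping (column names and data are assumptions, not the actual preprocess_1.py code):

import pandas as pd

# Toy stand-ins for the real user/tweet/edge tables (assumption; the actual
# cresci_15 files may use different column names).
user = pd.DataFrame({'id': ['u1', 'u2']})
tweet = pd.DataFrame({'id': ['t1', 't2', 't3']})
edge = pd.DataFrame({'source_id': ['u1', 'u2', 'u2'],
                     'target_id': ['t1', 't2', 't3']})

# Map raw string IDs to integer indices so that lookups in a dict keyed by
# range(len(user)) no longer raise KeyError.
uid_index = {uid: i for i, uid in enumerate(user['id'])}
tid_index = {tid: i for i, tid in enumerate(tweet['id'])}
edge['source_id'] = edge['source_id'].map(uid_index)
edge['target_id'] = edge['target_id'].map(tid_index)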

von1000 (Author) commented Feb 13, 2023

Thank you for your quick response!
After running the updated preprocessing code, I'm still a little confused about the generated "each_user_tweets". In this file, only the first user has corresponding tweets; the tweet lists for all other users are empty. The screenshot below shows part of "each_user_tweets". Am I doing something wrong?
[Screenshot (2023-02-13): each_user_tweets with tweets only for the first user; all other entries are empty]

BunsenFeng (Contributor) commented

@leopoldwhite

leopoldwhite (Member) commented

Sorry for the inconvenience. The code to get all users' tweets should be:
[image: the corrected snippet, which iterates over all edges rather than over all users]

However, to save execution time and storage, the updated code only saves the index of each tweet (the mapped target_id) into each_user_tweets.npy,

for i in tqdm(range(len(edge))):
    dict[edge.iloc[i]['source_id']].append(edge.iloc[i]['target_id'])

which can be easily used in preprocess_2.py.
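As a rough illustration (toy data and names, not the actual preprocess_2.py code), the stored indices can later be turned back into tweet texts like this:

import pandas as pd

# Toy stand-ins (assumption): the dict saved to each_user_tweets.npy maps each
# user index to the list of mapped tweet indices collected in the loop above.
each_user_tweets = {0: [0], 1: [1, 2]}
tweet = pd.DataFrame({'text': ['hello', 'world', '!']})

# Recover each user's tweet texts from the stored indices.
user_texts = {u: [tweet['text'][t] for t in idxs]
              for u, idxs in each_user_tweets.items()}
print(user_texts)   # {0: ['hello'], 1: ['world', '!']}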

znagzanglong commented

When I tried to train BotRGCN, I encountered a problem: I don't know what description.npy is.
Loading labels... Finished
Traceback (most recent call last):
File "C:\kust\xuesu\code\TwiBot-22-master\TwiBot-22-master\src\BotRGCN\cresci_15\train.py", line 24, in
des_tensor,tweets_tensor,num_prop,category_prop,edge_index,edge_type,labels,train_idx,val_idx,test_idx=dataset.dataloader()
File "C:\kust\xuesu\code\TwiBot-22-master\TwiBot-22-master\src\BotRGCN\cresci_15\Dataset.py", line 344, in dataloader
des_tensor=self.Des_embbeding()
File "C:\kust\xuesu\code\TwiBot-22-master\TwiBot-22-master\src\BotRGCN\cresci_15\Dataset.py", line 72, in Des_embbeding
description=np.load(self.root+'description.npy',allow_pickle=True)
File "C:\Users\Administrator\anaconda3\envs\TwiBot-22\lib\site-packages\numpy\lib\npyio.py", line 427, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './processed_data/description.npy'
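For reference, description.npy is one of the arrays the preprocessing step is expected to write into ./processed_data/ before train.py is run; Des_embbeding loads it, presumably as one profile-description string per user. A hedged sketch of how such a file could be produced (field names and toy data are assumptions, not the repository's actual preprocessing code):

import os
import numpy as np
import pandas as pd

# Toy stand-in (assumption): a user table with a 'description' profile field.
user = pd.DataFrame({'id': ['u1', 'u2'],
                     'description': ['bot researcher', None]})

# One description string per user, with an empty string when the field is missing.
description = [d if isinstance(d, str) else '' for d in user['description']]
os.makedirs('./processed_data', exist_ok=True)
np.save('./processed_data/description.npy', description)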
