
twitter scrapper error #367

Open
mahajnay opened this issue Nov 1, 2021 · 7 comments

Comments

@mahajnay

mahajnay commented Nov 1, 2021

Hi all,

While using twitterscraper, I ran the following code:

    from twitterscraper import query_tweets
    import datetime as dt
    import pandas as pd

    begin_date = dt.date(2020, 3, 1)
    end_date = dt.date(2021, 11, 1)

    limit = 100
    lang = 'english'

    tweets = query_tweets('vaccinesideeffects', begindate=begin_date, enddate=end_date, limit=limit, lang=lang)
    df = pd.DataFrame(t.__dict__ for t in tweets)

    df = df['text']
    df

and I get the error below:


    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-...> in <module>
    ----> 1 from twitterscraper import query_tweets
          2 import datetime as dt
          3 import pandas as pd
          4
          5 begin_date = dt.date(2020,3,1)

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/__init__.py in <module>
         11
         12
    ---> 13 from twitterscraper.query import query_tweets
         14 from twitterscraper.query import query_tweets_from_user
         15 from twitterscraper.query import query_user_info

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/query.py in <module>
         74     yield start + h * i
         75
    ---> 76 proxies = get_proxies()
         77 proxy_pool = cycle(proxies)
         78

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/query.py in get_proxies()
         47     soup = BeautifulSoup(response.text, 'lxml')
         48     table = soup.find('table', id='proxylisttable')
    ---> 49     list_tr = table.find_all('tr')
         50     list_td = [elem.find_all('td') for elem in list_tr]
         51     list_td = list(filter(None, list_td))

    AttributeError: 'NoneType' object has no attribute 'find_all'

@Suizer

Suizer commented Nov 2, 2021

Same for me

@expl0r3rgu1

Same issue here

@KamilsobC

KamilsobC commented Nov 12, 2021

It tries to grab the table from https://free-proxy-list.net with id='proxylisttable', but that id no longer exists on the page.
You need to change line 48 from:

    table = soup.find('table', id='proxylisttable')

to:

    table = soup.find('table')
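For context, here is a minimal standalone sketch of the patched parsing step. The URL matches what twitterscraper scrapes, but the parser choice (`html.parser` instead of `lxml`, to avoid an extra dependency) and the split into a separate `parse_proxies` helper are my own; the real code lives in `twitterscraper/query.py`:

```python
import requests
from bs4 import BeautifulSoup

PROXY_URL = 'https://free-proxy-list.net/'


def parse_proxies(html):
    """Parse the first <table> on the page instead of looking up the
    removed id='proxylisttable'."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')  # patched: no id filter
    if table is None:  # guard against further page changes
        return []
    rows = table.find_all('tr')
    # The header row contains <th> cells only, so its find_all('td') is
    # empty and filter(None, ...) drops it.
    cells = list(filter(None, (row.find_all('td') for row in rows)))
    # The first two columns are the IP address and the port.
    return ['{}:{}'.format(td[0].text, td[1].text) for td in cells]


def get_proxies():
    response = requests.get(PROXY_URL)
    return parse_proxies(response.text)
```

Splitting the HTML parsing out of the network call also makes the fix easy to test against a saved copy of the page.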

@barniker

barniker commented Nov 30, 2021

I fixed this error using pandas:

    import pandas as pd
    ...
    def get_proxies():
        resp = requests.get(PROXY_URL)
        df = pd.read_html(resp.text)[0]
        list_ip = list(df['IP Address'].values)
        list_ports = list(df['Port'].values.astype(str))
        list_proxies = [':'.join(elem) for elem in list(zip(list_ip, list_ports))]
        return list_proxies

However, even with that fix,

    list_of_tweets = query_tweets("Trump OR Clinton", 10)

returns:

    Exception: Traceback (most recent call last):
      File "/Users/rmartin/Desktop/Envs/crypto_env/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
        raise WorkerLostError(
    billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV) Job: 0.

@AdrienMau

Same error here on Python 3.9

@NafiGit

NafiGit commented May 21, 2022

It tries to grab the table from https://free-proxy-list.net with id='proxylisttable', but that id doesn't exist. You need to change line 48 from table = soup.find('table', id='proxylisttable') to table = soup.find('table')

Thanks, it solved my problem.

@vedanta28

@NafiGit How did you edit the code in their repository?
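(Not an answer from the thread, but to clarify: the fix above is applied to your locally installed copy of the package, not to the GitHub repository. A sketch for finding the file to edit, assuming a standard pip/conda install; twitterscraper is the package name as installed here:)

```python
# Locate the installed package so you can open query.py and patch line 48
# by hand, as described in the comments above.
import importlib.util
import os

spec = importlib.util.find_spec('twitterscraper')
if spec is not None and spec.origin is not None:
    pkg_dir = os.path.dirname(spec.origin)
    print(os.path.join(pkg_dir, 'query.py'))  # edit this file
else:
    print('twitterscraper is not installed in this environment')
```

Editing site-packages by hand is lost on reinstall; a fork of the repository installed with pip install -e is the more durable route.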

8 participants