twitter.cursor(twitter.search, since_id='some id', q='your_query') return duplicated results #325

Closed
opened 2014-05-19 16:54:06 -07:00 by weasteam · 5 comments
weasteam commented 2014-05-19 16:54:06 -07:00 (Migrated from github.com)

count = 0  # twitter is assumed to be an authenticated Twython instance
for result in twitter.cursor(twitter.search, q="embomolmed.embopress.org", since_id='467233427826434048', result_type="mixed", count=100):
    count += 1
    print("----------------------")
    print(count)
    print(result["id"])
    print(result["created_at"])
    print(result["text"])
    if count % 100 == 0:
        print(count)

The code displays tweets 468418418199494656 and 467233427826434048 endlessly.

However, the equivalent code using tweepy works: tweet 468418418199494656 is displayed only once.

s0ckz commented 2014-06-02 16:04:15 -07:00 (Migrated from github.com)

Take a look if this solves your problem: https://github.com/ryanmcgrath/twython/pull/332

jshorish commented 2014-11-27 02:29:48 -08:00 (Migrated from github.com)

I've had the same problem with twitter.cursor(twitter.search, return_pages=True, **params) returning the same page of results with each loop of the cursor.

I found that in twython/api.py for the cursor function, the "max_id" parameter is not being set--rather, the "since_id" parameter alone is being set, which does not allow the cursor to retrieve earlier pages.

I modified the hasattr(function, 'iter_metadata') if clause with the following (api.py line 502 +):

if hasattr(function, 'iter_metadata'):
    params['max_id'] = (int(results[-1]['id_str']) - 1)
    params['since_id'] = int(content[function.iter_metadata].get('since_id_str'))
else:
    since_id = content[0]['id_str']
    params['since_id'] = (int(since_id) - 1)

Note that the part of the block which affects twitter.search is when hasattr(...) evaluates to True--so I didn't modify the content of the else clause as that might be correct for non-search functions(?).

In addition, I removed the if 'max_id' not in params: clause (api.py line 499) so that the max_id parameter could be set as above; otherwise the generator would skip this after the second yield. This is a kludge, since I don't create the cursor with "max_id" in "params". Indeed, the Twitter dev docs (https://dev.twitter.com/rest/public/timelines) recommend that for the first page, only the 'count' param be passed. A full patch would naturally check whether the user passed max_id to twitter.cursor.
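
The parameter update described above can be sketched in isolation. This is only an illustration, not twython's actual code: next_search_params is a hypothetical helper, and it assumes each tweet dict in a page carries an id_str field. It leaves any since_id the caller supplied untouched and only moves max_id below the oldest tweet already seen.

```python
def next_search_params(params, statuses):
    """Return the params for the next (older) page, or None if the page was empty.

    Hypothetical helper for illustration; not part of twython.
    """
    if not statuses:
        # An empty page means there is nothing older left to fetch.
        return None
    # Find the oldest tweet on this page without assuming any ordering.
    oldest_id = min(int(status['id_str']) for status in statuses)
    next_params = dict(params)  # copy so the caller's dict is not mutated
    # max_id is inclusive, so subtract 1 to avoid fetching the same tweet again.
    next_params['max_id'] = oldest_id - 1
    return next_params
```

Because since_id stays fixed as the lower bound while max_id keeps moving down, the pages eventually run out instead of repeating.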

jpadilla commented 2014-12-05 07:00:11 -08:00 (Migrated from github.com)

Also running into this issue.

JordanRickmanUCI commented 2016-08-11 19:08:17 -07:00 (Migrated from github.com)

I had the same problem, and tracked it down to the usage of since_id instead of max_id in the cursor method, as @jam1123 described. I didn't want to deal with trying to fix the cursor method, so I'm just paginating manually using the search method and passing max_id as a parameter.
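
A manual pagination loop along these lines might look like the sketch below. It assumes search is a callable such as twitter.search that accepts keyword parameters and returns a dict with a 'statuses' list whose items have an integer 'id'; iter_search and max_pages are names made up for this illustration.

```python
def iter_search(search, max_pages=10, **params):
    """Yield tweets page by page, walking backwards in time with max_id.

    Hypothetical helper; `search` is assumed to behave like twitter.search.
    """
    params = dict(params)  # work on a copy of the caller's parameters
    for _ in range(max_pages):
        statuses = search(**params).get('statuses', [])
        if not statuses:
            break  # no more results within the requested range
        for status in statuses:
            yield status
        # max_id is inclusive, so step just below the oldest tweet seen so far.
        params['max_id'] = min(status['id'] for status in statuses) - 1
```

It could then be used as `for tweet in iter_search(twitter.search, q='your_query', count=100): ...`, with max_pages acting as a safety cap on the number of requests.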

reutsharabani commented 2017-09-17 04:10:49 -07:00 (Migrated from github.com)

Ran into the same problem (using since_id)

Reference: code/twython#325