Streaming is one post behind #202

Closed
opened 2013-05-27 01:56:33 -07:00 by grantstephens · 14 comments
grantstephens commented 2013-05-27 01:56:33 -07:00 (Migrated from github.com)

Good morning,

I am trying to stream tweets that are aimed at a user, so the line looks like:

`stream.statuses.filter(track="@RexFuzzle")`

However, the output is always one post behind: I have to tweet at the account twice before anything prints, and on the second tweet it prints the first one, on the third tweet it prints the second one, and so on. Is this standard for streaming?
michaelhelmick commented 2013-05-27 17:33:51 -07:00 (Migrated from github.com)

Hrm, sounds peculiar. I'll check this out tomorrow morning!!
michaelhelmick commented 2013-05-28 13:18:51 -07:00 (Migrated from github.com)

@RexFuzzle This is sort of an issue in `requests`. I've been in IRC all day and we're trying to figure out a solution!
bbirand commented 2013-06-06 08:24:41 -07:00 (Migrated from github.com)

I also had this problem and asked on the Twitter forums. They suggested it might be caused by an unflushed read buffer.

https://dev.twitter.com/discussions/18292
michaelhelmick commented 2013-06-06 08:36:34 -07:00 (Migrated from github.com)

Sorry for not updating you guys. It's because the `requests` library we are using blocks data being sent to Twython until the buffer has received enough data. Twitter is sending empty lines, and `urllib3` is telling `requests` there is nothing there when really there is. (If I remember correctly.)

See: https://github.com/shazow/urllib3/issues/186
cyroxx commented 2013-06-12 11:34:09 -07:00 (Migrated from github.com)

Just ran into the same problem, which is bad for my current use case as I really need live data. Let's hope this will get fixed soon :)
michaelhelmick commented 2013-06-18 11:03:26 -07:00 (Migrated from github.com)

@RexFuzzle @cyroxx @bbirand This will be fixed in Twython 3.0.1. As soon as https://github.com/kennethreitz/requests/pull/1425 is merged into `requests`, I'll update the dependencies and ship it! :shipit: :)
jpanganiban commented 2013-06-23 11:20:35 -07:00 (Migrated from github.com)

Yay! Just in time! 👍
jpanganiban commented 2013-06-23 13:43:45 -07:00 (Migrated from github.com)

Just submitted a pull request. It seems the default `iter_lines` `chunk_size` (512) may have caused this: some tweet objects sent by the streaming API are less than 512 bytes long.
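The lag described in this thread can be simulated without touching the network. The `BlockingWire` class below is a hypothetical stand-in for a socket whose `read(n)` only completes once `n` bytes have arrived (here it returns `None` where a real read would block), which is roughly the behaviour the comments attribute to `urllib3`. A 300-byte tweet sits in the buffer until the next tweet pushes the total past 512:

```python
class BlockingWire:
    """Fake socket: read(n) only completes once n bytes have 'arrived'."""

    def __init__(self):
        self.buf = b""

    def arrives(self, data):
        self.buf += data

    def read(self, n):
        if len(self.buf) < n:
            return None  # a real blocking read would stall here
        chunk, self.buf = self.buf[:n], self.buf[n:]
        return chunk

wire = BlockingWire()
tweet1 = b'{"text": "first tweet"}' + b" " * 276 + b"\n"   # 300 bytes
wire.arrives(tweet1)
assert wire.read(512) is None  # only 300 bytes buffered: read(512) stalls

tweet2 = b'{"text": "second tweet"}' + b" " * 275 + b"\n"  # 300 bytes
wire.arrives(tweet2)
chunk = wire.read(512)  # 600 bytes buffered now, so the read completes
first_line = chunk.split(b"\n")[0]  # iter_lines would yield the *first* tweet
print(first_line[:23])  # b'{"text": "first tweet"}'
```

So each tweet only surfaces when the one after it arrives: exactly the "one post behind" symptom from the original report.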
michaelhelmick commented 2013-06-23 14:07:46 -07:00 (Migrated from github.com)

It needs to be 1, but even with the current release of `requests`, reading a chunk that small is not possible. We need to wait for the next release of `requests`.
jpanganiban commented 2013-06-23 23:01:19 -07:00 (Migrated from github.com)

@michaelhelmick Just tested with the currently released `requests`. Had it working. 😃

EDIT: Also tested with 1.2.2 and it's also working. 😃
michaelhelmick commented 2013-06-24 08:30:09 -07:00 (Migrated from github.com)

@jpanganiban Alright, good stuff! I'll have to talk to @Lukasa about why we didn't figure this out before, haha. I'll check it out a little later today. Keep your eyes out for a new release tonight or tomorrow! :)
Lukasa commented 2013-06-24 08:47:25 -07:00 (Migrated from github.com)

Much like the devil, when you speak my name I appear. =D

The problem we had was the following code (found [here](https://github.com/kennethreitz/requests/blob/v1.2.3/requests/models.py#L542) for context):

```python
while 1:
    chunk = self.raw.read(chunk_size, decode_content=True)
    if not chunk:
        break
    yield chunk
self._content_consumed = True
```

For those of you not _au fait_ with urllib3, `decode_content` transparently decodes gzipped or deflated data. This is in principle awesome. However, the argument to `.read()` defines the number of bytes to be read off the wire, not the number of decompressed bytes.

With very small quantities of data, it's possible that you won't initially read enough off the wire to decompress anything. For an example of this, take a look at the urllib3 tests, particularly the gzip one [here](https://github.com/shazow/urllib3/blob/1.6/test/test_response.py#L111). That test has to read 11 bytes off the wire before getting a non-empty string.

Requests will read small amounts of data off the wire and, if the string is empty, will assume that it has exhausted the data and break the generator. This is wrong.

To get guaranteed correct streaming behaviour here, the chunk size _must_ be 1 byte, because the calls into urllib3 (both `.stream()` and `.read()`) will block until they've read at least that much data. If you request, for example, 10 bytes, but Twitter returns a 5-byte tweet, you won't see it until the next one comes in, because 5 bytes isn't enough to return.

This meant we had an awkward pair of situations: 1-byte reads were the only acceptable size, but they didn't work properly for gzipped data. Hence my changes to urllib3 and Requests.

As for the feature working in current Requests versions (i.e. without the `.stream()` change), I'm doubtful that it works properly. I haven't seen any changes in the library that would make it work properly aside from my own. It would be interesting to see a demonstration where this works, without ever falling behind, for a long period of time.
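The compressed-read point above is easy to reproduce with only the standard library: feed a gzip stream to a decompressor one wire byte at a time, and the first several reads decompress to nothing at all, which is exactly the situation where a naive "empty chunk means end of stream" check gives up early. A small sketch (plain `gzip`/`zlib`, no network, and the JSON payload is just an invented example):

```python
import gzip
import zlib

payload = gzip.compress(b'{"text": "hello"}')

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header, which is
# how gzipped HTTP bodies are decoded transparently.
decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)

leading_empty_reads = 0
output = b""
for i in range(len(payload)):
    out = decoder.decompress(payload[i:i + 1])  # one byte "off the wire"
    if not out and not output:
        leading_empty_reads += 1
    output += out
output += decoder.flush()

print(leading_empty_reads)  # at least 10: the gzip header alone yields b""
print(output)               # the original JSON, once enough bytes arrived
```

Since the 10-byte gzip header decompresses to an empty string, any loop that treats an empty result from a small read as end-of-stream will terminate before seeing a single byte of payload.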
michaelhelmick commented 2013-09-25 09:59:29 -07:00 (Migrated from github.com)

This will be fixed with a 3.1.0 release today, guys! Been waiting on `requests` 2.0 to come out. Sorry for the wait! :)
michaelhelmick commented 2013-09-25 15:50:41 -07:00 (Migrated from github.com)

FYI: Twython 3.1.0 is out on PyPI, so go grab it! :)
Reference: code/twython#202