Proposed feature: handle Tweet entities #224

Closed
opened 2013-06-19 10:49:16 -07:00 by trevoriancox · 10 comments
trevoriancox commented 2013-06-19 10:49:16 -07:00 (Migrated from github.com)

Twython does a great job of allowing you to access the data from Twitter. Would it be appropriate to take it a step further and add a bit of processing of Tweet data?

In particular, how about a function for obtaining Tweet text with the entities substituted in? (https://dev.twitter.com/docs/tweet-entities).

Note that URLs in Tweet text are always t.co links, but this is not what Twitter clients show. Instead, each URL entity carries a display_url and an expanded_url. It would be nice to have the library return the text as HTML with all of this taken care of.

Re-Tweets may have a URL at the end of the text truncated, e.g. "http://t.co/abc...", so if you don't do the entity substitution you could be displaying a broken link.

Making things more complex, for a re-Tweet, entities.urls is empty, and you have to refer to retweeted_status.entities.urls.

trevoriancox commented 2013-06-19 10:54:25 -07:00 (Migrated from github.com)

Here's a start at how to deal with some of those issues without changes to Twython:

if 'retweeted_status' in tweet:
    # for a retweet, the entities live on the original status
    tweet = tweet['retweeted_status']
txt = tweet["text"]
if 'entities' in tweet:
    if 'urls' in tweet['entities'] and len(tweet['entities']['urls']) > 0:
        # only replace one; it would be tricky to do more.
        url = tweet['entities']['urls'][-1]
        txt = txt[:url['indices'][0]] + url['expanded_url'] + txt[url['indices'][1]:]
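
Replacing more than one URL may be less tricky than it looks if the entities are processed from the end of the text backwards, so that the indices of earlier entities stay valid. A minimal sketch, assuming Python 3 and that 'indices' are character offsets into 'text'; the function name expand_urls and the sample tweet are made up for illustration and are not part of Twython:

```python
def expand_urls(tweet):
    """Return the tweet text with every t.co link replaced by its expanded_url."""
    # For a retweet, use the original status: its 'indices' refer to its own text.
    tweet = tweet.get('retweeted_status', tweet)
    text = tweet['text']
    urls = tweet.get('entities', {}).get('urls', [])
    # Work right to left so that earlier indices are not shifted by a replacement.
    for url in sorted(urls, key=lambda u: u['indices'][0], reverse=True):
        start, end = url['indices']
        text = text[:start] + url['expanded_url'] + text[end:]
    return text


# Hypothetical sample data shaped like a Tweet with two URL entities.
sample = {
    'text': 'Docs http://t.co/a and code http://t.co/b',
    'entities': {'urls': [
        {'indices': [5, 18], 'expanded_url': 'https://example.com/docs'},
        {'indices': [28, 41], 'expanded_url': 'https://example.com/code'},
    ]},
}
print(expand_urls(sample))
```

Like the snippet above, this drops the "RT " prefix for retweets, because it works on the original status.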
michaelhelmick commented 2013-06-19 11:00:17 -07:00 (Migrated from github.com)

It may be a nice function, but there is so much that could be customized.

My understanding is you want the returned tweet to look like this:
Hey, this is a tweet with a link in it, check it out <a href="http://t.co/fasf">http://t.co/fasf</a>
Check out <a href="http://t.co/fasf">@mikehelmick</a> on Twitter

or something like that?

What if people want to add classes to each <a> specifically, or add other attributes (like onClick, id or target)?
I feel it may be best to leave it up to the user to find and replace media entities.

trevoriancox commented 2013-06-19 11:16:41 -07:00 (Migrated from github.com)

In my particular case, I was running Tweet.text through Django's urlize, which gave me broken links in some cases, and didn't show the display_url. So my argument is:

  1. Many users of Twython don't know much about Twitter and just want a simple way to get Tweets.
  2. Simply accessing tweet.text does not actually get the text that users see in other Twitter clients (e.g. display_url), and it may include broken link text.

Therefore, how about a function for the basic case of dealing with entities? Users who want more customization can code it themselves. (Or add onClick etc using DOM manipulation as I do.)

ryanmcgrath commented 2013-06-19 11:24:58 -07:00 (Migrated from github.com)

I think Mike raises a valid point, but I can also see the use case. I wouldn't be against a convenience function to do this, provided it's essentially a "this is what you get, if you need more you're on your own" type of deal.

e.g., a @staticmethod Twython.html_for_tweet(tweet).

Not saying let's implement this, more so just weighing in. I could go either way on this one.

michaelhelmick commented 2013-06-19 11:25:28 -07:00 (Migrated from github.com)

^^^^ I was just going to say that haha

michaelhelmick commented 2013-06-19 11:29:48 -07:00 (Migrated from github.com)

Hrm, I see your point. Maybe after they get the timeline.

from twython import Twython

t = Twython(...)
tweets = t.get_home_timeline()

for tweet in tweets:
    if 'retweeted_status' in tweet:
        tweet = tweet['retweeted_status']

    html_for_tweet = Twython.html_for_tweet(tweet)

and Twython.html_for_tweet(tweet) would look something like:

def html_for_tweet(tweet):
    if 'entities' in tweet:
        # regex find and replace for grabbing multiple urls, mentions, hashtags here
        pass  # placeholder so the sketch is valid Python
    return tweet
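
To make the placeholder above more concrete, here is one way the body could be filled in, using the entity 'indices' rather than a regex. This is only a sketch under assumptions, not what Twython ships: it assumes Python 3, the link targets for mentions and hashtags are my own guesses at sensible URLs, the sample tweet is invented, and text outside the entities is passed through unchanged.

```python
from html import escape
from urllib.parse import quote


def html_for_tweet(tweet):
    """Hypothetical sketch: return the tweet text with entities turned into links."""
    text = tweet['text']
    entities = tweet.get('entities', {})
    replacements = []  # (start, end, html) for each entity

    for url in entities.get('urls', []):
        start, end = url['indices']
        html = '<a href="{0}">{1}</a>'.format(
            escape(url['expanded_url']), escape(url['display_url']))
        replacements.append((start, end, html))

    for mention in entities.get('user_mentions', []):
        start, end = mention['indices']
        name = mention['screen_name']
        # Assumed link target for a profile page.
        html = '<a href="https://twitter.com/{0}">@{1}</a>'.format(
            quote(name), escape(name))
        replacements.append((start, end, html))

    for tag in entities.get('hashtags', []):
        start, end = tag['indices']
        # Assumed link target for a hashtag search.
        html = '<a href="https://twitter.com/search?q=%23{0}">#{1}</a>'.format(
            quote(tag['text']), escape(tag['text']))
        replacements.append((start, end, html))

    # Splice from right to left so earlier indices remain valid.
    for start, end, html in sorted(replacements, reverse=True):
        text = text[:start] + html + text[end:]
    return text


# Invented sample shaped like a Tweet with one mention, one URL and one hashtag.
sample = {
    'text': 'Thanks @mikehelmick, see http://t.co/fasf #twython',
    'entities': {
        'user_mentions': [{'indices': [7, 19], 'screen_name': 'mikehelmick'}],
        'urls': [{'indices': [25, 41],
                  'expanded_url': 'http://example.com/page',
                  'display_url': 'example.com/page'}],
        'hashtags': [{'indices': [42, 50], 'text': 'twython'}],
    },
}
print(html_for_tweet(sample))
```

Anyone who needs classes, onClick handlers or other attributes on the links would still have to write their own version, which matches the "this is what you get" idea discussed above.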
trevoriancox commented 2013-06-19 11:30:59 -07:00 (Migrated from github.com)

Thanks for the discussion. It is not trivial... there's a learning curve here for someone who is not a Twitter expert and just looking for basic functionality. I'm sure many others have missed cases like handling truncated retweets. If we don't add it to Twython, it could end up being another library project.

michaelhelmick commented 2013-06-19 11:41:15 -07:00 (Migrated from github.com)

It's definitely something I'd like in Twython! I'm a bit busy this week. I'll try to mock something up Saturday! :) Thanks for the suggestion!

trevoriancox commented 2013-06-19 11:47:15 -07:00 (Migrated from github.com)

Awesome! :)

P.S. This hack of mine loses the "RT " at the beginning of the tweet. I don't like the idea of just prepending a hardcoded string back in (what if the Twitter convention changes?). But you have to work with the retweet's original text since that's what the 'indices' refer to.

if 'retweeted_status' in tweet:
    tweet = tweet['retweeted_status']

Sorry, just trying to make sure this uses up your whole Saturday. :)

trevoriancox commented 2013-06-19 14:01:43 -07:00 (Migrated from github.com)

Just checked my Django Mezzanine site's Twitter feed: there must be Python code for this in that project.

Reference: code/twython#224