html_for_tweet double expanding links #447

Closed
opened 2017-07-05 07:13:01 -07:00 by danielsamuels · 1 comment
danielsamuels commented 2017-07-05 07:13:01 -07:00 (Migrated from github.com)

Tweet URL: https://twitter.com/CCS_multipoint/status/882520917007532033

Repro:

```python
from twython import Twython

# TWITTER_API_KEY and TWITTER_API_SECRET hold your app credentials.
twitter = Twython(TWITTER_API_KEY, TWITTER_API_SECRET)

s = twitter.show_status(id='882520917007532033')
print(Twython.html_for_tweet(s))
```

Output:

```html
Use Cases, Trials and Making 5G a Reality <a href="<a href="https://t.co/W0uArTMk9N" class="twython-url">buff.ly/2sEhrgO</a>" class="twython-url">buff.ly/2sEhrgO</a> <a href="https://twitter.com/search?q=%235G" class="twython-hashtag">#5G</a> <a href="https://twitter.com/search?q=%23innovation" class="twython-hashtag">#innovation</a> via <a href="https://twitter.com/5GWorldSeries" class="twython-mention">@5GWorldSeries</a> <a href="<a href="https://t.co/W0uArTMk9N" class="twython-url">buff.ly/2sEhrgO</a>" class="twython-url">buff.ly/2sEhrgO</a>
```

You can see that every top-level `twython-url` anchor has another `<a class="twython-url">` anchor as its `href`, which is obviously not valid HTML. I have a feeling that this is due to the t.co URL being replaced with a buff.ly one, and perhaps that being replaced again?
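One mechanism consistent with this output is a linkifier that reads the text at each URL entity's `indices` and then substitutes it with a global `str.replace`. The sketch below models that assumption only; `naive_linkify` is a hypothetical function written for illustration, not Twython's actual `html_for_tweet` code. If the model is right, the string replaced twice is the t.co URL rather than the buff.ly one: the first pass links both copies of the t.co URL, and the second pass finds that same URL again inside each new `href`.

```python
# Hypothetical linkifier: NOT Twython's source, only a model of the suspected bug.
def naive_linkify(text, url_entities):
    """Link each URL entity via a global str.replace of its indexed substring."""
    original = text
    for entity in url_entities:
        start, end = entity['indices']
        anchor = '<a href="%s" class="twython-url">%s</a>' % (
            entity['url'], entity['display_url'])
        # str.replace rewrites *every* occurrence of the substring, including
        # copies that now sit inside an href written by an earlier entity.
        text = text.replace(original[start:end], anchor)
    return text


TCO = 'https://t.co/W0uArTMk9N'
text = ('Use Cases, Trials and Making 5G a Reality %s #5G #innovation '
        'via @5GWorldSeries %s' % (TCO, TCO))
entities = [
    {'url': TCO, 'display_url': 'buff.ly/2sEhrgO', 'indices': [42, 65]},
    {'url': TCO, 'display_url': 'buff.ly/2sEhrgO', 'indices': [101, 124]},
]

# Pass 1 links both copies of the t.co URL; pass 2 finds the t.co URL again
# inside each href and wraps it a second time, producing nested anchors.
print(naive_linkify(text, entities))
```

Under the same model, changing the first entity's `url` (as in the checkpoint 2 reproduction) makes the first pass write an `href` that no longer contains the t.co URL, so the second pass has nothing left to match and both links point at the new URL.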

lewiscollard commented 2017-07-06 04:15:43 -07:00 (Migrated from github.com)

Here's a minimal reproduction case based on that tweet. It appears to happen when a URL entity shares a `url` attribute with another URL entity.

```python
from twython import Twython

twitter = Twython()

tweet = {
    'entities': {
        'hashtags': [],
        'user_mentions': [],
        'symbols': [],
        'urls': [
            {
                'display_url': 'buff.ly/2sEhrgO',
                'expanded_url': 'http://buff.ly/2sEhrgO',
                'indices': [42, 65],
                'url': 'https://t.co/W0uArTMk9N',
            },
            {
                'display_url': 'buff.ly/2sEhrgO',
                'expanded_url': 'http://buff.ly/2sEhrgO',
                'indices': [101, 124],
                'url': 'https://t.co/W0uArTMk9N',
            }
        ],
    },
    'full_text': 'Use Cases, Trials and Making 5G a Reality https://t.co/W0uArTMk9N #5G #innovation via @5GWorldSeries https://t.co/W0uArTMk9N',
}

# Checkpoint 1:
print(twitter.html_for_tweet(tweet))

# Change the full URL for the first entity.
tweet['entities']['urls'][0]['url'] = 'https://twython.readthedocs.io/'

# Checkpoint 2:
print(twitter.html_for_tweet(tweet))
```

Checkpoint 1 output:

```html
Use Cases, Trials and Making 5G a Reality <a href="<a href="https://t.co/W0uArTMk9N" class="twython-url">buff.ly/2sEhrgO</a>" class="twython-url">buff.ly/2sEhrgO</a> #5G #innovation via @5GWorldSeries <a href="<a href="https://t.co/W0uArTMk9N" class="twython-url">buff.ly/2sEhrgO</a>" class="twython-url">buff.ly/2sEhrgO</a>
```

Checkpoint 2 output:

```html
Use Cases, Trials and Making 5G a Reality <a href="https://twython.readthedocs.io/" class="twython-url">buff.ly/2sEhrgO</a> #5G #innovation via @5GWorldSeries <a href="https://twython.readthedocs.io/" class="twython-url">buff.ly/2sEhrgO</a>
```

FWIW, messing with the `display_url` and `expanded_url` instead of `url` between checkpoints 1 and 2 still results in the double-expansion behaviour.

Hope this helps :)
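Since the trigger is two URL entities sharing the same `url`, one possible fix is to stop searching for the URL text at all and instead splice each anchor into its own `indices` span. The following is a minimal sketch under that assumption; `linkify_by_indices` is a hypothetical helper, not part of Twython, and it assumes each entity's `indices` are offsets into the unmodified `full_text`. It reuses the two URL entities from the reproduction case above.

```python
# Hypothetical fix sketch (not Twython's code): splice each anchor in by position.
def linkify_by_indices(text, url_entities):
    """Replace each entity's [start, end) span exactly once."""
    # Work right to left: a splice only shifts characters after `start`,
    # so the indices of entities further left stay valid.
    for entity in sorted(url_entities, key=lambda e: e['indices'][0], reverse=True):
        start, end = entity['indices']
        anchor = '<a href="%s" class="twython-url">%s</a>' % (
            entity['url'], entity['display_url'])
        text = text[:start] + anchor + text[end:]
    return text


TCO = 'https://t.co/W0uArTMk9N'
full_text = ('Use Cases, Trials and Making 5G a Reality %s #5G #innovation '
             'via @5GWorldSeries %s' % (TCO, TCO))
urls = [
    {'url': TCO, 'display_url': 'buff.ly/2sEhrgO', 'indices': [42, 65]},
    {'url': TCO, 'display_url': 'buff.ly/2sEhrgO', 'indices': [101, 124]},
]

# Each t.co link becomes exactly one anchor, even though both entities share `url`.
print(linkify_by_indices(full_text, urls))
```

Because each span is replaced by position, a shared `url` can no longer match text inside an `href` written by an earlier entity. If hashtags and mentions are linked in the same function, they would have to join the same right-to-left pass, since splicing any entity shifts the offsets of everything after it.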

Reference: code/twython#447