How I created TwitterPub and why you should use it

Recently I started working on a project named TwitterPub. The goal of the project is to let you follow and interact with Twitter accounts from Mastodon and other ActivityPub software. It’s still very much a work in progress and doesn’t completely work yet. This blog post summarizes what I’ve learned so far, along with a progress report on the parts I’m still working on.

Pulling data from Twitter without their API

A while back, Twitter suspended my personal account without providing a reason. They are within their rights to do this, although I wish I had been told why; they denied my appeal as well. Since I’m not allowed to have a Twitter account anymore, I had to get creative with how I pull data from their platform. If there’s a will, there’s, well, me. Thankfully Twitter still lets logged-out users view public profiles and tweets, and that’s enough for me. I was able to have Ruby pull and parse this information. To keep Twitter from blocking the requests later on, I send falsified HTTP headers to pretend to be Firefox 71 running on macOS Catalina. The requests look legit to me.

Pulling and parsing a Twitter Profile

To pull and parse a Twitter profile without their API, I used HTTParty to fetch the pages and Nokogiri to parse them, selecting elements by their CSS classes. Because of how Twitter generates its pages, pulling data from a profile is relatively straightforward. The drawback is that if they change the class names the code will have to be updated, so they could quite easily break TwitterPub if they decided to do so.

require 'httparty'
require 'nokogiri'

# Get Profile Object from a Twitter Profile URL
def get_twitter_profile(url)
    # Get page (pretending to be Firefox on macOS) and parse with Nokogiri
    page = HTTParty.get(url, {
      headers: {
        "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
        "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language" => "en-US,en;q=0.5",
        "Cache-Control" => "max-age=0",
        "Connection" => "keep-alive",
        "DNT" => "1",
        "Upgrade-Insecure-Requests" => "1",
      }
    })
    parsed_page = Nokogiri::HTML(page.body)
  
    # Assign data from page to variables
    display_name = parsed_page
                    .css('.ProfileHeaderCard-nameLink')
                    .inner_text
  
    account_name = parsed_page
                    .css('.username .u-linkComplex-target')[0]
                    .inner_text
  
    bio = parsed_page
            .css('.ProfileHeaderCard-bio')
            .inner_text
  
    website_link = parsed_page
                    .css('.ProfileHeaderCard-urlText')
                    .inner_text
                    .gsub("\n", "")
                    .gsub(" ", "")
  
    location = parsed_page
                .css('.ProfileHeaderCard-locationText')
                .inner_text
                .gsub("\n", "")
                .gsub("              ", "")
                .gsub("  ", "")
  
    join_date = parsed_page
                  .css('.ProfileHeaderCard-joinDateText')
                  .inner_text
  
    avatar_url = parsed_page
                  .css('.ProfileAvatar-image')
                  .attr('src')
                  .value
  
    header_url = parsed_page
                  .css('.ProfileCanopy-headerBg')
                  .children[1]
                  .attr("src")
  
    # Create and return a profile object
    profile = Profile.new(display_name, account_name, bio, website_link, location, join_date, avatar_url, header_url)
    return profile
  end
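
The Profile class itself isn’t shown in this post. For context, a minimal stand-in might look like the sketch below; the attribute names are my assumption based on the constructor call above, and the URL in the usage example is just a placeholder.

# Hypothetical sketch of the Profile value object used above;
# the real TwitterPub class may differ.
Profile = Struct.new(
  :display_name, :account_name, :bio, :website_link,
  :location, :join_date, :avatar_url, :header_url
)

# Example usage (placeholder URL):
# profile = get_twitter_profile("https://twitter.com/someaccount")
# puts profile.display_name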

Pulling and parsing a Twitter Status

I used the same approach for statuses, except less specific parsing was required.

# Get Tweet Object from a Twitter Status URL
def get_twitter_status(url)
    puts url
    # Get page and parse with Nokogiri
    page = HTTParty.get(url, {
      headers: {
        "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
        "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language" => "en-US,en;q=0.5",
        "Cache-Control" => "max-age=0",
        "Connection" => "keep-alive",
        "DNT" => "1",
        "Upgrade-Insecure-Requests" => "1",
        }
      }
    )
    parsed_page = Nokogiri::HTML(page.body)
  
    # Assign data from page to variables
    status = parsed_page
              .css('.tweet-text')
              .inner_text
  
    timestamp = parsed_page
                  .css('.metadata')
                  .children[1]
                  .inner_text
  
    # Create and return a status object
    status = Tweet.new(status, timestamp)
    return status
  end
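
Likewise, the Tweet class isn’t shown; a minimal stand-in (field names assumed from the constructor call above) could be:

# Hypothetical sketch of the Tweet value object used above
Tweet = Struct.new(:text, :timestamp)

# Example usage (placeholder URL):
# tweet = get_twitter_status("https://twitter.com/someaccount/status/1")
# puts "#{tweet.timestamp}: #{tweet.text}"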

How Twitter could stop me from doing this

I think the easiest way, by technical means, would be to randomize the CSS class names. I would then have to enumerate the DOM in another, more difficult way. Twitter could also add a “checking your browser” style screen, but that might interfere with the mobile apps. They could also require an account to view public profiles and tweets (and then whitelist Google and other search engines). Lastly, Twitter could go the legal route and have their corporate lawyers send a cease and desist letter. I don’t think they care, but I’ve read they do block IP addresses that hit too many 404s. I guess I’ll see what happens in practice.
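
As an example of what that “more difficult way” might look like: if the class names were randomized, I could try selecting by structure or by more stable attributes instead. The XPath below is purely illustrative, with a made-up attribute, and not tested against Twitter’s real markup.

# Illustrative only: select the bio relative to a (hypothetical) stable
# attribute rather than a CSS class name.
bio = parsed_page.at_xpath('//div[@data-purpose="bio"]')&.inner_text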

ActivityPub is complicated

Turning this data into ActivityPub responses is more difficult. I had to implement a lot of different actors and then an outbox. To properly implement an outbox (required to display posts on a profile), you have to store copies of the profile and put them into a complicated JSON response; a sketch of what such a response might look like is below. I’m still working on using SQLite to do this. I’m also looking at what I’d need to do to copy every tweet from a profile and remove duplicates with close to 100% accuracy. It’s very much a work in progress, but I hope to do more soon.
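
To make the shape of the problem concrete, here’s a minimal sketch of the kind of outbox JSON an ActivityPub server returns for a profile: an OrderedCollection of Create activities wrapping Note objects. The domain, paths, and field choices below are made up for illustration, and a real Mastodon-compatible outbox also needs paging; TwitterPub’s actual responses will differ.

require 'json'

# Hypothetical, simplified outbox response for a profile
def outbox_json(account_name, tweets)
  actor = "https://twitterpub.example/users/#{account_name}"

  # Wrap each stored tweet in a Create activity containing a Note
  items = tweets.map do |tweet|
    {
      "type" => "Create",
      "actor" => actor,
      "object" => {
        "type" => "Note",
        "attributedTo" => actor,
        "content" => tweet.text,
        "published" => tweet.timestamp
      }
    }
  end

  {
    "@context" => "https://www.w3.org/ns/activitystreams",
    "id" => "#{actor}/outbox",
    "type" => "OrderedCollection",
    "totalItems" => items.length,
    "orderedItems" => items
  }.to_json
end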

What’s next

I want to add the ability to reply to tweets from Mastodon. I’m still researching ways to go about this. I will likely have to implement parts of this with Twitter’s API, so that an interested third party could run their own relay; it’s the only way to reply to Twitter accounts that I’m aware of. I want to get a working demo out there as well. I hope you found this post insightful and are interested in seeing TwitterPub completed.
