🧵 Thread (33 tweets)

Alright, The Time Has Come. T h e A l g o r i t h m #RecSys Party Parrot Time. https://t.co/MmMlS26YGl

The blog post that goes along with the release of the Twitter "For You" timeline algorithm is here: https://t.co/IUAJUR3F33 and it's a great summary.

There's also a second repository for the ML components of the Twitter timeline recsys: https://t.co/gopop0Do2k A very deliciously code-filled Friday today!

I'm still digging through, but at first glance it's a substantive chunk of code, so it's worth digging into. There are extra docs in the repos and comments in the code that are very useful and give some nice insights, especially if you're interested in recsys at scale! https://t.co/R3qVn03Yi7

Like all Recommender Systems, it's a "symphony" of component parts and data. Don't expect to find any "gotchas" like specific rules to promote or demote culture war things. (Obviously I haven't looked thoroughly yet, but I'm pretty sure you won't find stuff like that there.)

(I say this because there was a massive amount of screeching "JUST YOU WAIT" from people who don't know how these systems work and only want a fresh set of sticks to beat each other with, and I want to avoid tired partisan arguments about it all.)

A great line that sums up the effort involved in deploying ML models and recommender systems at scale: "The pipeline runs ~5 billion times per day and completes in under 1.5 seconds. A single pipeline execution requires 220 sec of CPU time ~150x the latency you perceive on the app"
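To get a feel for those numbers, here's a quick back-of-envelope check. It uses only the figures quoted above and assumes the load is spread evenly across the day and that "CPU time" means core-seconds. Real traffic is peaky, so treat the derived core count as a rough average, not a capacity figure.

```python
# Back-of-envelope check of the figures quoted from the blog post.
# Assumptions: uniform load over 24h, "CPU time" = core-seconds.

RUNS_PER_DAY = 5_000_000_000   # "~5 billion times per day"
WALL_CLOCK_S = 1.5             # "completes in under 1.5 seconds" (upper bound)
CPU_S_PER_RUN = 220.0          # "220 sec of CPU time" per execution
SECONDS_PER_DAY = 24 * 60 * 60

# Ratio of CPU time to perceived latency (the quoted "~150x").
cpu_to_latency = CPU_S_PER_RUN / WALL_CLOCK_S

# Average pipeline executions per second if spread evenly over the day.
runs_per_second = RUNS_PER_DAY / SECONDS_PER_DAY

# CPU-seconds burned per wall-clock second = cores busy on average.
avg_busy_cores = runs_per_second * CPU_S_PER_RUN

print(f"CPU/latency ratio: {cpu_to_latency:.0f}x")    # 147x
print(f"average runs/sec: {runs_per_second:,.0f}")    # 57,870
print(f"average cores busy: {avg_busy_cores:,.0f}")   # 12,731,481
```

So "~150x" checks out (220 / 1.5 ≈ 147), and on average that's on the order of twelve million cores kept busy just for this one pipeline.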

AH FUCK SAKE LMAO 🤡 https://t.co/sBlOONHevB https://t.co/PCdee9Lmby


How it Started / How It's Going https://t.co/azDZ4By5DB


Another good thread focusing on the algo and its implications here: https://t.co/loLQdwCWW9 I'm gonna keep looking too.

So, one thing to point out from the outset about these "features" everyone is going to clown on (rightfully): while the code defines the features, there's no way to say for sure how much weight they actually have on the end result. It could be that: ...

... it was just easy to put these features in to get people "above" off your back and claim you've solved it, while the rest of the complex system ends up ignoring them. But anecdotally the feature did work: remember that time everyone's feed was all Elon? That was it.
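A toy illustration of why a feature existing in code says little about its effect. In this hypothetical weighted-sum scorer (feature names, values and weights are all made up, not taken from the repo), the same scoring code gives very different results depending on weights that would live in a trained model or config, not in the feature definitions.

```python
from typing import Dict

Features = Dict[str, float]

def score(features: Features, weights: Features) -> float:
    """Weighted sum; a feature with no weight (or weight 0) contributes nothing."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())

# One hypothetical candidate tweet; "author_flag" stands in for any
# special-case feature that is defined in code.
candidate: Features = {"p_like": 0.30, "p_reply": 0.05, "author_flag": 1.0}

# Same scoring code, two hypothetical deployments that differ only in weights.
weights_flag_ignored: Features = {"p_like": 1.0, "p_reply": 10.0, "author_flag": 0.0}
weights_flag_used: Features = {"p_like": 1.0, "p_reply": 10.0, "author_flag": 5.0}

print(score(candidate, weights_flag_ignored))  # the flag adds nothing here
print(score(candidate, weights_flag_used))     # the flag dominates the score
```

Reading only the feature list, you can't tell which of these two deployments is live.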

So, this part about UkraineCrisisTopic doesn't necessarily imply some nefarious censorship; in fact it might be the opposite. It appears in "Public Safety Rules" https://t.co/fl50vAK0QU and https://t.co/aYRLXu55hV and, more importantly, https://t.co/jxaiMMNNgM

The "PublicInterestRules" look like the parts of the code that may be dealing with misinformation and propaganda, i.e. the warning labels that sometimes get attached to tweets. It's not something you can claim has a specific bias for any side, just that it was a feature and was active.
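Roughly, a labelling rule of this kind changes what gets shown alongside a tweet rather than where it ranks. A minimal sketch of that idea, assuming hypothetical flag names, label text and an Allow/Label action set (this is not the visibility library's actual API):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set

class Action(Enum):
    ALLOW = "allow"  # shown as-is
    LABEL = "label"  # still shown, with a notice attached

@dataclass
class Tweet:
    tweet_id: int
    safety_flags: Set[str] = field(default_factory=set)

@dataclass
class Verdict:
    action: Action
    labels: List[str] = field(default_factory=list)

# Hypothetical flag -> label text mapping; names are illustrative only.
LABEL_RULES = {
    "misleading_info": "Stay informed notice",
    "state_affiliated_media": "State-affiliated media label",
}

def evaluate(tweet: Tweet) -> Verdict:
    """Attach labels for matching flags; never touches ranking scores."""
    labels = [text for flag, text in LABEL_RULES.items() if flag in tweet.safety_flags]
    return Verdict(Action.LABEL, labels) if labels else Verdict(Action.ALLOW)

print(evaluate(Tweet(1, {"state_affiliated_media"})))
print(evaluate(Tweet(2)))
```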

Also, separately from the Safety flags, it appears the "Elon" / "Democrat" / "Republican" features are more likely debugging features for A/B tests and metrics, not something to specifically rank on: https://t.co/ihRRfLLb3W
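For contrast, here's what a metrics-only flag might look like: a hypothetical impression log is sliced by experiment bucket and author group to compare engagement, and the group never enters any score. Group names and log rows are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical impression log rows: (experiment_bucket, author_group, engaged).
Row = Tuple[str, str, bool]
Key = Tuple[str, str]

def engagement_by_slice(rows: Iterable[Row]) -> Dict[Key, float]:
    """Engagement rate per (bucket, author_group) slice, for dashboards only."""
    shown: Dict[Key, int] = defaultdict(int)
    engaged: Dict[Key, int] = defaultdict(int)
    for bucket, group, did_engage in rows:
        key = (bucket, group)
        shown[key] += 1
        engaged[key] += int(did_engage)
    return {key: engaged[key] / shown[key] for key in shown}

log = [
    ("control", "group_a", True),
    ("control", "group_a", False),
    ("treatment", "group_a", True),
    ("treatment", "group_b", False),
]
for key, rate in sorted(engagement_by_slice(log).items()):
    print(key, f"{rate:.2f}")
```

The author group is only a grouping key here, which is the kind of role the thread suggests those flags play.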

Again, to highlight a key point about how these systems work: there are features described in the code, but separately there are settings, parameters and configurations in the live system that may or may not have bigger effects on the system's output. This is key to keep in mind.
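A small sketch of that split between published code and live configuration, assuming hypothetical parameter names and defaults. The defaults are what you'd read in a repo, while the effective values come from overrides that never appear in the source.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict

@dataclass(frozen=True)
class RankerParams:
    # Hypothetical defaults, as they might appear in published source.
    enable_feature_x: bool = True
    feature_x_weight: float = 1.0
    max_candidates: int = 1000

def apply_overrides(defaults: RankerParams, overrides: Dict[str, Any]) -> RankerParams:
    """Return a copy of the defaults with live-config overrides applied."""
    known = {f.name for f in fields(RankerParams)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown params: {sorted(unknown)}")
    return replace(defaults, **overrides)

# Hypothetical live config (e.g. pushed per experiment) that is not in the repo.
live_overrides: Dict[str, Any] = {"enable_feature_x": False, "max_candidates": 800}
effective = apply_overrides(RankerParams(), live_overrides)
print(effective)
```

Someone reading only `RankerParams` would conclude feature X is on; the running system says otherwise.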

There's still a bunch of stuff "missing" to get the full picture: https://t.co/EMGCF1CWGO It's like releasing a book of recipes without saying whether the food you actually make uses slightly altered or extra ingredients, different amounts, or exactly how you do things in practice.

So, for reference, Twitter "Topics": here's a giant list of (a subset of) the ones they assign to tweets: https://t.co/4f9219xLHF There are many, many manually added ones, ephemeral ones, etc. They're messy. These are for search / filters / entity detection / metadata: https://t.co/TXPptmrpaX

Urgh. I'm gonna have to load this Scala code into a proper IDE to get a handle on it. I can't see how the parts people point out relate to one another; it's very hard to get a sense of exactly what fits where just by browsing GitHub and searching for interesting keywords.

Like, there are some very interesting, non-obvious things, e.g. https://t.co/yQY84t6Fxk: what happened on April 1 2019? Why is that the threshold date for LowQualityProxyLabelStart?

The most fun part for me is reading what Tweeps who worked on Twitter systems have to say: https://t.co/PgW0McEBc5

I'm really glad for these kinds of aspects of it <3 https://t.co/zuiRgDZOZt I hope this part doesn't get overlooked, because it's a really important one. This kind of release is an extremely rare and valuable look inside for software devs and recommender systems practitioners.

With all the excitement I'm only catching up on the Space now: https://t.co/LDldI9NV68

The Space has a few nuggets of wisdom, but it's also slightly awkward: partly because it's nerds, partly because the bossman is there with engineers fearing for their jobs, plus Spaces technical issues and some awkward off-topic questions lol

Does anyone want a few fun nuggets / algo hacks in the meantime? https://t.co/ftcZnKWkIV

Sadly, for people like me who are at the hard limit of followings, this does indeed limit your reach and requires some pruning and unfollows. https://t.co/xqkOKItQc5

An obvious one, but yeah - Twitter Blue gives you a HUGE boost in being recommended and shown in Timelines: https://t.co/uViWVhtraZ
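Mechanically, a boost like that is often just a multiplier applied to the model score before sorting. A sketch with placeholder multipliers (illustrative values and field names, not read from the repository; check the linked code for the real parameters):

```python
from dataclasses import dataclass
from typing import List

# Placeholder multipliers for illustration only.
IN_NETWORK_BOOST = 4.0
OUT_OF_NETWORK_BOOST = 2.0

@dataclass
class Candidate:
    name: str
    base_score: float        # score from the ranking model
    author_subscribed: bool  # e.g. the author pays for Twitter Blue
    in_network: bool         # the viewer follows the author

def boosted_score(c: Candidate) -> float:
    """Multiply the model score by a boost factor for subscribed authors."""
    if not c.author_subscribed:
        return c.base_score
    factor = IN_NETWORK_BOOST if c.in_network else OUT_OF_NETWORK_BOOST
    return c.base_score * factor

candidates: List[Candidate] = [
    Candidate("followed_non_subscriber", 0.50, author_subscribed=False, in_network=True),
    Candidate("stranger_subscriber", 0.30, author_subscribed=True, in_network=False),
]
ranked = sorted(candidates, key=boosted_score, reverse=True)
print([c.name for c in ranked])
```

Even with a lower model score, the subscriber's tweet from an account you don't follow overtakes one you do follow here, which is the kind of effect a blanket multiplier has.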

A note on the intent behind adding a "question marks" feature: https://t.co/VxG4Y9icw4 What do you think? :P
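For a sense of how small such a feature can be, here's a hypothetical text-feature extractor based on question marks. The feature names are invented, and the real code may compute it quite differently.

```python
from typing import Dict

def question_mark_features(text: str) -> Dict[str, float]:
    """Hypothetical text features derived from question marks in a tweet."""
    count = text.count("?")
    return {
        "num_question_marks": float(count),
        "has_question": float(count > 0),
        "ends_with_question": float(text.rstrip().endswith("?")),
    }

print(question_mark_features("Is this the algorithm??"))
print(question_mark_features("No questions here."))
```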

Going to wrap this initial thread up here and continue the more technical Recommender Systems stuff in the repository, and the other thread: https://t.co/m6ci1rUPho https://t.co/DgIyfuJ7tq

Today, continuing the Twitter Algorithm Deep Dive, I'm hoping to spend some time digging into the Machine Learning components in the second repository (not the one everyone was keyword-searching yesterday, as fun as that was), and to fill out some more general recsys info. https://t.co/YsIt1RJgGB