🧵 Thread (18 tweets)

Today, continuing the Twitter Algorithm Deep Dive, i'm hoping to spend some time digging into the Machine Learning components in the second repository, not the one everyone was keyword searching yesterday, as fun as that was. And to fill out some more general recsys info. https://t.co/YsIt1RJgGB

@vboykis Here's the repository where we're hoping to keep notes on diving into these Recommender Systems releases from Twitter: https://t.co/m6ci1rUPho feel free to add PRs / edit / suggest things in issues! Questions / Suggestions all welcome!

@vboykis Twitter used a BERT-based model for embedding tweet text. A few years ago, when they ran the RecSys Challenge and provided "features" which were masked, people figured out that it was some sort of BERT-style model, but here we have a reference to it: https://t.co/HoiZgT8rzU

@vboykis The RecSys Challenges from 2020 and 2021 are very useful to go through for more context on this Twitter Recommender Release. Much of the stuff that was discussed back then applies to the code released now. The RecSys Challenge was essentially exactly the same task as For You recs.

@vboykis 2020 Challenge https://t.co/TbMHlfZnSd & video: https://t.co/SmszkG26bM 2021 Challenge https://t.co/uy4dP0hs5s & video: https://t.co/715SiCdLyg These are very much academic Recommender Systems presentations, so don't expect many exciting reveals unless you're into recsys.

The Ranking component is a MaskNet. Haven't read the paper in detail so i can't say much about it yet, but code lives here: https://t.co/hecSvVZv18 The key takeaway is: It's optimizing for the good ol' reliable industry standard, Click Through Rate (CTR) https://t.co/GwMeaFaAek

While i'm having fun clicking around python code, a quick observation: https://t.co/F5YTKoYlLX The "For You" tab is definitely not "Infinite Scroll". It's by definition at most 1500 tweets. In practice, after filtering, it's down to a few hundred, so you can definitely "finish" twitter!
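To make the "you can finish it" point concrete, here's a toy sketch of a capped-then-filtered timeline. Only the 1500 cap comes from the thread; the `Candidate` class, the `build_timeline` function, both filters, and the order of the steps are assumptions made up for illustration, not what the released code does.

```python
from dataclasses import dataclass

MAX_CANDIDATES = 1500  # the cap mentioned in the thread


@dataclass
class Candidate:
    tweet_id: int
    author_id: int
    score: float  # stand-in for whatever the ranker outputs


def build_timeline(candidates, blocked_authors, seen_tweet_ids):
    """Cap the candidate pool, filter it, then order by score.

    Both filters are hypothetical examples of the kind of rule
    that shrinks the pool; the real filters are not modeled here.
    """
    pool = candidates[:MAX_CANDIDATES]
    kept = [
        c for c in pool
        if c.author_id not in blocked_authors
        and c.tweet_id not in seen_tweet_ids
    ]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# Synthetic data: 5000 candidates, of which only the first 1500 are considered.
candidates = [
    Candidate(tweet_id=i, author_id=i % 10, score=(i * 37 % 101) / 100)
    for i in range(5000)
]
timeline = build_timeline(
    candidates,
    blocked_authors={0, 1, 2, 3, 4, 5, 6},
    seen_tweet_ids=set(range(0, 1500, 2)),
)
print(len(timeline))
```

However large the input is, the result can never exceed `MAX_CANDIDATES`, and every filter only shrinks it further, so the feed is finite by construction.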

I broadly agree here at a high level: https://t.co/ofUh3L5hbt there's just too much missing in these releases to do anything substantive with, but the ML repo DOES have some good things to go on, and together with other sources, is very useful to explore the associated research.

Strictly speaking, it's not correct to say that the Twitter ranking algo is optimizing for "CTR" like i did above, it's actually a blend of things: https://t.co/COrCvy1Y5t and the input features for Ranking are different to the ones pointed out for Candidate Retrieval:

This is another great post here if you want to get up to speed: https://t.co/FzjC3OQxVe https://t.co/aBjWze4LAz

And for the ml repo "recap" project, https://t.co/hecSvVZv18 thank / blame @alth0u https://t.co/tW1zeMoZal

@alth0u REAL AS FUCK https://t.co/BtGY14AU38

Straight from the source: https://t.co/0gK4aBtDRu And remember...

Graphs are just sparkling Matrices https://t.co/vCLMKlM9ur

Engagement based attacks like this are very possible. You may need multiple accounts in good standing, because i assume you can't just buy a bunch of bots? But big chunks of Trust & Safety rules are missing in the code release so no way to know exactly: https://t.co/K7j88W4UCa

The mass reporting / blocking tactic isn't new, but now we have some code and default weights to point to: https://t.co/zat6BgX3eT
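A rough sketch of why weights matter here, assuming the ranking score is a weighted sum of predicted engagement probabilities (the "blend of things" from earlier in the thread). The weight values, the signal names, and the `blended_score` function are all invented for illustration and are NOT the defaults from the released code.

```python
# Hypothetical weights: illustrative only, not taken from the release.
WEIGHTS = {
    "like": 1.0,
    "reply": 10.0,
    "retweet": 2.0,
    "report": -100.0,  # negative signals get a large negative weight
}


def blended_score(predicted):
    """Weighted sum of predicted probabilities, one per signal."""
    return sum(WEIGHTS[name] * predicted.get(name, 0.0) for name in WEIGHTS)


# Same tweet, same positive engagement; only the predicted
# report probability differs between the two cases.
normal = {"like": 0.10, "reply": 0.02, "retweet": 0.01, "report": 0.001}
targeted = dict(normal, report=0.02)

print(blended_score(normal))
print(blended_score(targeted))
```

With a weight that lopsided, a small bump in the predicted report probability is enough to push an otherwise healthy score below zero, which is the shape of the tactic described above.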

Yesterday, Twitter added some changes to the algo code https://t.co/YrcXat13K4 It's still unknown, and unlikely, that this exact code is what's running in production, so the changes are more performative in my opinion. Still, it's worth reading and there are some good clarifications