I am looking for scalable way to do the following:
- User login
- Fetch all Friends from Twitter
- Fetch all Followers from Twitter
- Display all Friends which aren’t Followers
The Problem: How can this be done in a scalable way? An user can have up to 2 million friends or followers. Currently I’m storing both inside an SQLite table and compare them through a loop. When the user comes back the table is cleared and process starts again.
This works fine on 100 – 1000 Friends, but will be tricky with 500000 Friends. I can’t cache the lists because they can change every moment.
Does anyone know a good way to handle such big amount of data?
Advertisement
Answer
I don’t know what your database looks like, but this is how I would set it up.
CREATE TABLE twitter_users ( user_id INTEGER PRIMARY KEY NOT NULL, screen_name VARCHAR(20) NOT NULL ); CREATE TABLE friends ( friend_id INTEGER PRIMARY KEY NOT NULL ); CREATE TABLE followers ( follower_id INTEGER PRIMARY KEY NOT NULL );
Then you can use this SQL to get the friends who are not followers.
SELECT friend_id, screen_name FROM friends LEFT JOIN followers ON follower_id = friend_id LEFT JOIN twitter_users ON user_id = friend_id WHERE follower_id IS NULL
If the screen name is NULL
it means they are not in your twitter_users
table. You can look up the missing users and store them for later. Screen names can change so you might need to update the table periodically.
Use the friends/ids and followers/ids APIs to get a list of friend and follower ids 5,000 at a time. Use the users/lookup API to get up to 100 screen names. If a user has 2,000,000 friends it will take 400 api calls to get the list of ids so you should still cache the list at least for popular users.