support for multiple access tokens

nanos 2023-04-10 16:28:09 +01:00
parent d0cb212315
commit 8235cda859
3 changed files with 103 additions and 66 deletions


@ -59,12 +59,14 @@ If you want to, you can of course also run FediFetcher locally as a cron job:
2. Install requirements: `pip install -r requirements.txt`
3. Then simply run this script like so: `python find_posts.py --access-token=<TOKEN> --server=<SERVER>` etc. (Read below, or run `python find_posts.py -h` to get a list of all options.)
An example script can be found in the [`examples`](https://github.com/nanos/FediFetcher/tree/main/examples) folder.
An [example script](./examples/FediFetcher.sh) can be found in the `examples` folder.
When run as a cron job, FediFetcher uses file-based locking to avoid overlapping executions of the script. The timeout period for the lock can be configured using `--lock-hours`.
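As a rough illustration of how file-based locking with a timeout can work, here is a minimal sketch. It is not FediFetcher's actual implementation: the function names `acquire_lock` and `release_lock` and the lock file path are hypothetical, and only the idea of a lock that expires after a number of hours (as with `--lock-hours`) comes from the text above.

```python
import os
import time


def acquire_lock(lock_path, lock_hours):
    """Try to take the lock. Returns True on success, False if a fresh lock exists.

    Hypothetical helper: a lock file older than `lock_hours` is treated as stale
    (for example, left behind by a crashed run) and is replaced.
    """
    try:
        # O_EXCL makes creation fail if the file already exists
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        age_seconds = time.time() - os.path.getmtime(lock_path)
        if age_seconds < lock_hours * 3600:
            return False  # another run is presumably still in progress
        os.remove(lock_path)  # stale lock: discard it and take over
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    with os.fdopen(fd, "w") as f:
        f.write(str(time.time()))  # record when this run started
    return True


def release_lock(lock_path):
    """Remove the lock file at the end of a run, if it is still there."""
    if os.path.exists(lock_path):
        os.remove(lock_path)
```

A script using this pattern would call `acquire_lock` at start-up, exit early if it returns `False`, and call `release_lock` when finished. The stale-lock takeover is not fully race-free; for a job scheduled every few minutes that is usually an acceptable trade-off.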
If you are running FediFetcher locally, my recommendation is to run it manually once, before turning on the cron job: The first run will be significantly slower than subsequent runs, and that will help you prevent overlapping during that first run.
If you wish to run FediFetcher for multiple users on your instance, you can supply the `--access-token` argument multiple times, with a different access token for each user. That will allow you to fetch replies and/or backfill profiles for multiple users on your instance. Have a look at the [sample script provided](./examples/FediFetcher-multiple-users.sh).
*Note:* if you wish to run FediFetcher using Windows Task Scheduler, you can rename the script to the `.pyw` extension instead of `.py`, and it will run silently, without opening a console window.
### 2.3) Run FediFetcher from a container
@ -78,7 +80,7 @@ The same rules for running this as a cron job apply to running the container: do
Persistent files are stored in `/app/artifacts` within the container, so you may want to map this to a local folder on your system.
An example Kubernetes CronJob for running the container is included in the [`examples`](https://github.com/nanos/FediFetcher/tree/main/examples) folder.
An [example Kubernetes CronJob](./examples/k8s-cronjob.yaml) for running the container is included in the `examples` folder.
### Configuration options
@ -100,7 +102,7 @@ Please find the list of all configuration options, including descriptions, below
| Environment Variable Name | Command line flag | Required? | Notes |
|:---------------------------------------------------|:----------------------------------------------------|-----------|:------|
| -- | `--access-token` | Yes | The access token. If using GitHub action, this needs to be provided as a Secret called `ACCESS_TOKEN` |
| -- | `--access-token` | Yes | The access token. If using GitHub action, this needs to be provided as a Secret called `ACCESS_TOKEN`. If running as a cron job or a container, you can supply this argument multiple times, to fetch posts for multiple users on your instance. |
|`MASTODON_SERVER`|`--server`|Yes|The domain only of your mastodon server (without `https://` prefix) e.g. `mstdn.thms.uk`. |
| `HOME_TIMELINE_LENGTH` | `--home-timeline-length` | No | Provide to fetch remote replies to posts in the API-Key owner's home timeline. Determines how many posts we'll fetch replies for. Recommended value: `200`. |
| `REPLY_INTERVAL_IN_HOURS` | `--reply-interval-in-hours` | No | Provide to fetch remote replies to posts that have received replies from users on your own instance. Determines how far back in time we'll go to find posts that have received replies. Recommended value: `0` (disabled). Requires an access token with `admin:read:accounts`. |


@ -0,0 +1,33 @@
# This script is a sample script that you can schedule
# to run every 10 minutes from your cron job.
# Supply any other arguments, as you see fit.
# In this script, FediFetcher will fetch remote replies for multiple
# users on your instance
# TOKEN1, TOKEN2, and TOKEN3 belong to 3 different users here.
# Sample schedule:
# */10 * * * * /usr/bin/bash /path/to/FediFetcher.sh
###################### IMPORTANT ######################
# #
# YOU SHOULD RUN THIS SCRIPT MANUALLY AT LEAST ONCE #
# WITH YOUR CHOSEN ARGUMENTS, TO AVOID CONCURRENT #
# EXECUTIONS OF FEDIFETCHER! #
# #
###################### IMPORTANT ######################
cd /path/to/FediFetcher
python3 find_posts.py \
--access-token=TOKEN1 \
--access-token=TOKEN2 \
--access-token=TOKEN3 \
--server=your.server.social \
--home-timeline-length=200 \
--max-followings=80 \
--from-notifications=1 \
--lock-hours=1


@ -15,7 +15,7 @@ import uuid
argparser=argparse.ArgumentParser()
argparser.add_argument('--server', required=True, help="Required: The name of your server (e.g. `mstdn.thms.uk`)")
argparser.add_argument('--access-token', required=True, help="Required: The access token can be generated at https://<server>/settings/applications, and must have read:search, read:statuses and admin:read:accounts scopes")
argparser.add_argument('--access-token', action="append", required=True, help="Required: The access token can be generated at https://<server>/settings/applications, and must have read:search, read:statuses and admin:read:accounts scopes. You can supply this multiple times, if you want to run it for multiple users.")
argparser.add_argument('--reply-interval-in-hours', required = False, type=int, default=0, help="Fetch remote replies to posts that have received replies from users on your own instance in this period")
argparser.add_argument('--home-timeline-length', required = False, type=int, default=0, help="Look for replies to posts in the API-Key owner's home timeline, up to this many posts")
argparser.add_argument('--user', required = False, default='', help="Use together with --max-followings or --max-followers to tell us which user's followings/followers we should backfill")
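The change to `--access-token` in this hunk relies on `action="append"`, the standard `argparse` mechanism for collecting a repeated flag into a list. A minimal standalone sketch follows. The parser is a cut-down stand-in rather than FediFetcher's full argument list, and the token and server values are placeholders.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--server", required=True)
# action="append" stores every occurrence of the flag in a list
parser.add_argument("--access-token", action="append", required=True)

# Simulate a command line that passes the flag twice
args = parser.parse_args([
    "--server=your.server.social",
    "--access-token=TOKEN1",
    "--access-token=TOKEN2",
])

# args.access_token is a list, even when the flag is given only once
for token in args.access_token:
    print(f"would fetch posts on {args.server} using {token}")
```

Because the parsed value is always a list, code that previously passed `arguments.access_token` directly to an API helper has to iterate over it instead, which is what the second hunk of this file does.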
@ -826,76 +826,78 @@ if __name__ == "__main__":
all_known_users = OrderedSet(list(known_followings) + list(recently_checked_users))
if arguments.reply_interval_in_hours > 0:
"""pull the context toots of toots user replied to, from their
original server, and add them to the local server."""
user_ids = get_active_user_ids(arguments.server, arguments.access_token, arguments.reply_interval_in_hours)
reply_toots = get_all_reply_toots(
arguments.server, user_ids, arguments.access_token, seen_urls, arguments.reply_interval_in_hours
)
known_context_urls = get_all_known_context_urls(arguments.server, reply_toots,parsed_urls)
seen_urls.update(known_context_urls)
replied_toot_ids = get_all_replied_toot_server_ids(
arguments.server, reply_toots, replied_toot_server_ids, parsed_urls
)
context_urls = get_all_context_urls(arguments.server, replied_toot_ids)
add_context_urls(arguments.server, arguments.access_token, context_urls, seen_urls)
for token in arguments.access_token:
if arguments.reply_interval_in_hours > 0:
"""pull the context toots of toots user replied to, from their
original server, and add them to the local server."""
user_ids = get_active_user_ids(arguments.server, token, arguments.reply_interval_in_hours)
reply_toots = get_all_reply_toots(
arguments.server, user_ids, token, seen_urls, arguments.reply_interval_in_hours
)
known_context_urls = get_all_known_context_urls(arguments.server, reply_toots,parsed_urls)
seen_urls.update(known_context_urls)
replied_toot_ids = get_all_replied_toot_server_ids(
arguments.server, reply_toots, replied_toot_server_ids, parsed_urls
)
context_urls = get_all_context_urls(arguments.server, replied_toot_ids)
add_context_urls(arguments.server, token, context_urls, seen_urls)
if arguments.home_timeline_length > 0:
"""Do the same with any toots on the key owner's home timeline """
timeline_toots = get_timeline(arguments.server, arguments.access_token, arguments.home_timeline_length)
known_context_urls = get_all_known_context_urls(arguments.server, timeline_toots,parsed_urls)
add_context_urls(arguments.server, arguments.access_token, known_context_urls, seen_urls)
if arguments.home_timeline_length > 0:
"""Do the same with any toots on the key owner's home timeline """
timeline_toots = get_timeline(arguments.server, token, arguments.home_timeline_length)
known_context_urls = get_all_known_context_urls(arguments.server, timeline_toots,parsed_urls)
add_context_urls(arguments.server, token, known_context_urls, seen_urls)
# Backfill any post authors, and any mentioned users
if arguments.backfill_mentioned_users > 0:
mentioned_users = []
cut_off = datetime.now(datetime.now().astimezone().tzinfo) - timedelta(minutes=60)
for toot in timeline_toots:
these_users = []
toot_created_at = parser.parse(toot['created_at'])
if len(mentioned_users) < 10 or (toot_created_at > cut_off and len(mentioned_users) < 30):
these_users.append(toot['account'])
if(len(toot['mentions'])):
these_users += toot['mentions']
if(toot['reblog'] != None):
these_users.append(toot['reblog']['account'])
if(len(toot['reblog']['mentions'])):
these_users += toot['reblog']['mentions']
for user in these_users:
if user not in mentioned_users and user['acct'] not in all_known_users:
mentioned_users.append(user)
# Backfill any post authors, and any mentioned users
if arguments.backfill_mentioned_users > 0:
mentioned_users = []
cut_off = datetime.now(datetime.now().astimezone().tzinfo) - timedelta(minutes=60)
for toot in timeline_toots:
these_users = []
toot_created_at = parser.parse(toot['created_at'])
if len(mentioned_users) < 10 or (toot_created_at > cut_off and len(mentioned_users) < 30):
these_users.append(toot['account'])
if(len(toot['mentions'])):
these_users += toot['mentions']
if(toot['reblog'] != None):
these_users.append(toot['reblog']['account'])
if(len(toot['reblog']['mentions'])):
these_users += toot['reblog']['mentions']
for user in these_users:
if user not in mentioned_users and user['acct'] not in all_known_users:
mentioned_users.append(user)
add_user_posts(arguments.server, arguments.access_token, filter_known_users(mentioned_users, all_known_users), recently_checked_users, all_known_users, seen_urls)
add_user_posts(arguments.server, token, filter_known_users(mentioned_users, all_known_users), recently_checked_users, all_known_users, seen_urls)
if arguments.max_followings > 0:
log(f"Getting posts from last {arguments.max_followings} followings")
user_id = get_user_id(arguments.server, arguments.user, arguments.access_token)
followings = get_new_followings(arguments.server, user_id, arguments.max_followings, all_known_users)
add_user_posts(arguments.server, arguments.access_token, followings, known_followings, all_known_users, seen_urls)
if arguments.max_followers > 0:
log(f"Getting posts from last {arguments.max_followers} followers")
user_id = get_user_id(arguments.server, arguments.user, arguments.access_token)
followers = get_new_followers(arguments.server, user_id, arguments.max_followers, all_known_users)
add_user_posts(arguments.server, arguments.access_token, followers, recently_checked_users, all_known_users, seen_urls)
if arguments.max_followings > 0:
log(f"Getting posts from last {arguments.max_followings} followings")
user_id = get_user_id(arguments.server, arguments.user, token)
followings = get_new_followings(arguments.server, user_id, arguments.max_followings, all_known_users)
add_user_posts(arguments.server, token, followings, known_followings, all_known_users, seen_urls)
if arguments.max_followers > 0:
log(f"Getting posts from last {arguments.max_followers} followers")
user_id = get_user_id(arguments.server, arguments.user, token)
followers = get_new_followers(arguments.server, user_id, arguments.max_followers, all_known_users)
add_user_posts(arguments.server, token, followers, recently_checked_users, all_known_users, seen_urls)
if arguments.max_follow_requests > 0:
log(f"Getting posts from last {arguments.max_follow_requests} follow requests")
follow_requests = get_new_follow_requests(arguments.server, arguments.access_token, arguments.max_follow_requests, all_known_users)
add_user_posts(arguments.server, arguments.access_token, follow_requests, recently_checked_users, all_known_users, seen_urls)
if arguments.max_follow_requests > 0:
log(f"Getting posts from last {arguments.max_follow_requests} follow requests")
follow_requests = get_new_follow_requests(arguments.server, token, arguments.max_follow_requests, all_known_users)
add_user_posts(arguments.server, token, follow_requests, recently_checked_users, all_known_users, seen_urls)
if arguments.from_notifications > 0:
log(f"Getting notifications for last {arguments.from_notifications} hours")
notification_users = get_notification_users(arguments.server, arguments.access_token, all_known_users, arguments.from_notifications)
add_user_posts(arguments.server, arguments.access_token, notification_users, recently_checked_users, all_known_users, seen_urls)
if arguments.from_notifications > 0:
log(f"Getting notifications for last {arguments.from_notifications} hours")
notification_users = get_notification_users(arguments.server, token, all_known_users, arguments.from_notifications)
add_user_posts(arguments.server, token, notification_users, recently_checked_users, all_known_users, seen_urls)
if arguments.max_bookmarks > 0:
log(f"Pulling replies to the last {arguments.max_bookmarks} bookmarks")
bookmarks = get_bookmarks(arguments.server, arguments.access_token, arguments.max_bookmarks)
known_context_urls = get_all_known_context_urls(arguments.server, bookmarks,parsed_urls)
add_context_urls(arguments.server, arguments.access_token, known_context_urls, seen_urls)
if arguments.max_bookmarks > 0:
log(f"Pulling replies to the last {arguments.max_bookmarks} bookmarks")
bookmarks = get_bookmarks(arguments.server, token, arguments.max_bookmarks)
known_context_urls = get_all_known_context_urls(arguments.server, bookmarks,parsed_urls)
add_context_urls(arguments.server, token, known_context_urls, seen_urls)
with open(KNOWN_FOLLOWINGS_FILE, "w", encoding="utf-8") as f:
f.write("\n".join(list(known_followings)[-10000:]))
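The `for token in arguments.access_token:` loop in this hunk reuses the same `seen_urls` and `all_known_users` collections for every token. One likely consequence is that a URL already handled for one user is not fetched again for the next. The following sketch illustrates that pattern in isolation. `fetch_context_urls` and `process_tokens` are hypothetical stand-ins with made-up data, not functions from `find_posts.py`.

```python
from typing import List, Tuple


def fetch_context_urls(token: str) -> List[str]:
    """Hypothetical stand-in for the real API calls made with one user's token."""
    fake_timelines = {
        "TOKEN1": ["https://a.example/1", "https://b.example/2"],
        "TOKEN2": ["https://b.example/2", "https://c.example/3"],
    }
    return fake_timelines.get(token, [])


def process_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Process each token in turn, sharing one set of seen URLs across all of them."""
    seen_urls = set()  # shared state, created once outside the loop
    fetched = []
    for token in tokens:
        for url in fetch_context_urls(token):
            if url in seen_urls:
                continue  # already handled for an earlier token
            seen_urls.add(url)
            fetched.append((token, url))
    return fetched


if __name__ == "__main__":
    for token, url in process_tokens(["TOKEN1", "TOKEN2"]):
        print(token, url)
```

In this sketch the URL shared by both users is processed only once, under the first token that encounters it.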