List of ideas
Here's a prioritized list of ideas in AI safety. You can also vote on the ideas that seem most promising, or add your own to get feedback from others.
Prioritization research template
Figuring out which idea to start is called prioritization research. Here's a list of questions you can research for each idea you're considering.
Instructions
- Jump around; don’t go in order. Do the “To start” section first, then jump around based on what seems most tractable, important, and relevant.
- Roughly prioritize the questions most likely to rule an idea out.
- Desk research → talking to experts/beneficiaries → MVP
- You cannot use this on just one idea! No idea is “good” in isolation; it can only be good relative to other ideas. Make sure you’re comparing at least 5 ideas, ideally 50 or more.
- You can only do as well as the best option you’re considering. So consider a lot of ideas!
To start
- Set an intention for how long you want to spend researching this. It’s easy to get lost in a rabbit hole.
- Quick ruling-out factors:
- What factors are most likely to rule out this intervention? Look into those first.
- Set a recurring reminder every hour to take a step back and ask yourself “Is this the best thing to be focusing on right now, given limited time?”
Theory of change
- How does this:
- Decrease x-risk
- Decrease s-risk
- What would this look like if it worked out really well (a best case with roughly a 5% chance of happening)?
- Write out a very explicit theory of change
- Play steelman solitaire on each of the nodes in the theory of change
- Make a lean canvas of the intervention.
- Customers=beneficiaries
- Revenue=impact
- Fill it out in this order
- Describe your ideal beneficiary (e.g. a recently graduated ML PhD who’s read the Sequences and is looking to work full time in AI safety but doesn’t want to work in academia)
- How long-lasting are the measured effects?
- How strong is the reasoning/evidence connecting the intervention to reduced x/s-risk?
- What’s the estimated breakdown of the intervention’s effects on these different metrics?
- Quality-adjusted research-years
- New research directions
- Buying time / slowing down capabilities
- Which values to put into the AGI
- Brainstorm other potential outcomes/metrics that could be relevant
- Search for critiques of this intervention. What do people say is wrong with it?
- Who’s already crushing it in related fields? What could you copy from them?
- What does the worst case look like for this?
- How many beneficiaries would you need to help this year to meet your impact goals?
- Is there an actual need for this? Talk to potential beneficiaries and ask them:
- How do they currently solve the problem?
- If they’re not doing anything to solve the problem, it might be because it’s not very important to them.
- How would your solution compare?
- How much would they pay for your solution if it existed?
- Are there any sources of passive impact for this?
- How passive would it be? (e.g. how much effort or money would you have to put in per week on an ongoing basis?)
- What’s the decay rate of the impact? That is, how likely is the impact to still be happening a year from now? Five years from now?
- Are there any ways to make it more passive?
- Automation?
- Handing off?
- If handing off, how likely is it to be something you still think is high impact after handing it off? (E.g. how sensitive is it to minor perturbations in direction?)
- How long would this intervention rely on you?
- How hard would it be to replace you?
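One rough way to think about the decay-rate question above is to assume a constant yearly chance that the impact keeps going. The sketch below assumes exactly that (a simplification; real decay is rarely constant), and the function names and numbers are hypothetical. It answers “how likely is the impact still happening after N years?” and adds up the expected impact over a chosen horizon.

```python
def survival_probability(annual_retention: float, years: int) -> float:
    """Chance the impact is still happening after `years`, assuming a constant
    (hypothetical) probability `annual_retention` that it survives each year."""
    return annual_retention ** years


def expected_total_impact(impact_per_year: float, annual_retention: float,
                          horizon_years: int) -> float:
    """Sum expected impact over the horizon. The first year counts in full;
    each later year is weighted by the chance the impact is still running."""
    return sum(
        impact_per_year * survival_probability(annual_retention, year)
        for year in range(horizon_years)
    )


if __name__ == "__main__":
    # Hypothetical inputs: 10 units of impact per year, 70% chance it survives each year.
    for years in (1, 5):
        print(f"P(still running after {years} yr): {survival_probability(0.7, years):.2f}")
    print(f"Expected impact over 5 years: {expected_total_impact(10, 0.7, 5):.1f}")
```

Making the intervention more passive (automation, handing off) shows up here as a higher `annual_retention`, which compounds quickly over multi-year horizons.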
Crowdedness
- Are there any extremely strong charities that work in this area?
- Check the AI safety map for similar charities
- Search the EA Forum and LessWrong
- Google this: site:forum.effectivealtruism.org OR site:lesswrong.com [thing you’re searching for]
- Post around on EA channels asking if people know of anybody doing work in this area. Here’s a list of places you might post.
- Search for organizations in other cause areas doing similar things.
- How did it go?
- DM/email friends and acquaintances a short 1-2 line question:
- “Hey! Do you know of anybody who worked/works on [intervention] or anything similar?”
- If there are existing charities working in this area, this doesn’t necessarily mean you shouldn’t start something there. What matters is the quality and quantity of their work.
- What specific interventions are they doing?
- How’s their coverage compared to the need? Roughly what percentage of the problem seems to be unresolved?
- If charities are already doing this intervention but coverage gaps remain, why? Is it a lack of money or some other obstacle to full coverage?
- What are ways your org could provide better or different services?
- Are the other players strong and competent?
- Are they doing a good job?
- Are they doing the most cost-effective intervention?
- How’s their execution?
- What is the trend of the issue?
- Is this likely to still be an issue 5 years from now? Why or why not?
- Would someone else provide the service if you didn't?
- Keep in mind that almost any intervention will be done by someone eventually; usually you are just moving it forward in time.
- How good a fit are you for this compared to other players?
- How likely are these other players to do it?
- What are their counterfactuals? (Though don’t get too lost in counterfactuals; there lies madness.)
- Could you simply promote this idea so that somebody better suited starts it?
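The counterfactual questions in this section can be roughed out with a toy model: if someone else would probably start the intervention anyway, you mainly get credit for the years by which you move it forward. This is a minimal sketch under loudly simplifying assumptions (you and the alternative deliver equal impact per year, impact per year is constant); the function name and inputs are illustrative, not a standard method.

```python
def expected_counterfactual_impact(
    impact_per_year: float,
    horizon_years: float,
    prob_someone_else_would_do_it: float,
    years_you_move_it_forward: float,
) -> float:
    """Expected impact you add compared with the world where you don't start it.

    Hypothetical simplifying assumptions:
    - If someone else would do it anyway, you only add the years by which
      you move it forward (capped at the horizon).
    - If nobody else would, you add the full horizon of impact.
    - You and the alternative would deliver equal impact per year.
    """
    p = prob_someone_else_would_do_it
    moved_forward = min(years_you_move_it_forward, horizon_years)
    return impact_per_year * (p * moved_forward + (1 - p) * horizon_years)


if __name__ == "__main__":
    # Hypothetical inputs: 100 units/yr over 10 years; 80% chance someone else
    # would start it anyway, 2 years later than you would.
    print(f"{expected_counterfactual_impact(100, 10, 0.8, 2):.0f}")
```

Even with a high chance that someone else would do it, the “nobody else does it” branch can dominate the total, which is one reason to estimate that probability explicitly rather than by feel.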
Timelines
- How does this intervention hold up under different AI timelines?
- What is the likely range of effects in a short-timelines world?
- What is the likely range of effects in a long-timelines world?
- If you put high odds on short timelines, then will you be able to implement this intervention fast enough that it’ll make a difference?
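To combine the short- and long-timelines answers into one number, you can weight each scenario’s estimated impact by how likely you think that world is. A minimal sketch, assuming you can reduce each scenario to a single point estimate (e.g. the middle of your range); the function name, probabilities, and impacts are made up for illustration.

```python
def timeline_weighted_impact(scenarios: dict[str, tuple[float, float]]) -> float:
    """Probability-weighted impact across timeline scenarios.

    `scenarios` maps a scenario name to (probability, impact if that world occurs).
    Probabilities must sum to 1.
    """
    total_p = sum(p for p, _ in scenarios.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("Scenario probabilities must sum to 1")
    return sum(p * impact for p, impact in scenarios.values())


if __name__ == "__main__":
    # Hypothetical: impact is small under short timelines because the
    # intervention may not be up and running in time to matter.
    scenarios = {
        "short timelines": (0.4, 5.0),
        "long timelines": (0.6, 50.0),
    }
    print(timeline_weighted_impact(scenarios))
```

If most of the weighted total comes from the long-timelines branch, that is a flag to check whether you really believe short timelines are unlikely.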
Cost-effectiveness
- What are the best estimates of how much impact this intervention achieves per dollar, measured in the metrics above (e.g. quality-adjusted research-years per dollar)?
- What is the monetary cost of the intervention?
- Fixed costs
- Variable costs
- What is the effect size of the intervention?
- Express it as a confidence interval.
- Consider doing a cost-effectiveness analysis. Use Guesstimate or a similar tool that lets you model confidence intervals and uncertainty.
- Only do this if it’s really crucial. Most models like this have very wide uncertainty and don’t provide much value, especially in AI safety.
- Are there any alternative ways to approach this intervention that could make it more cost-effective?
- Automating it?
- Partnering with another organization?
- Pairing it with other interventions that have similar distribution channels?
- Increasing participation rates?
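If you do decide a cost-effectiveness model is worth it, the core of a Guesstimate-style model is Monte Carlo sampling from your confidence intervals. The sketch below uses plain Python rather than Guesstimate itself, and it assumes each 90% confidence interval describes a lognormal distribution (a common choice for positive quantities like costs, but still an assumption). The function names and input ranges are hypothetical.

```python
import math
import random
import statistics

Z95 = 1.6448536269514722  # 95th-percentile z-score of a standard normal


def lognormal_from_90ci(low: float, high: float, rng: random.Random) -> float:
    """Draw one sample from a lognormal whose 5th/95th percentiles are low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z95)
    return rng.lognormvariate(mu, sigma)


def simulate_cost_effectiveness(cost_ci: tuple[float, float],
                                effect_ci: tuple[float, float],
                                n: int = 100_000, seed: int = 0) -> list[float]:
    """Return n samples of effect per dollar, sampling cost and effect independently."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        cost = lognormal_from_90ci(*cost_ci, rng)
        effect = lognormal_from_90ci(*effect_ci, rng)
        samples.append(effect / cost)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """5th percentile, median, and 95th percentile of the samples."""
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {"p5": cuts[0], "median": statistics.median(samples), "p95": cuts[-1]}


if __name__ == "__main__":
    # Hypothetical inputs: total cost $50k to $200k; effect 0.5 to 5
    # quality-adjusted research-years.
    summary = summarize(simulate_cost_effectiveness((50_000, 200_000), (0.5, 5)))
    for name, value in summary.items():
        print(f"{name}: {value * 100_000:.2f} research-years per $100k")
```

Reporting the 5th to 95th percentile range rather than a single ratio keeps the (usually very wide) uncertainty visible, which matches the caution above about how much these models vary.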
Scalability
- Do you want to stay small or grow?
- Size of program area
- On what scale could this intervention be delivered?
- What is the total number of people affected?
- What’s the trend of that over time?
- Speed of scaling
- What would the general process of scaling this intervention require?
- Will scaling this intervention rely on technology or skilled talent?
- Will scaling require partnering with a distribution system? Or partnering with another organization?
- Is scaling limited by the number or size of partners?
- Is the intervention a one-off thing or does it have to be regularly repeated?
- How fast could this be scaled up?
- Are there other factors that may slow scaling?
Potential downsides
- What are the ways that this could be net negative?
- How likely are these consequences? (Give a range)
- How bad would it be if they happened? (Give a range)
- How does this compare to the probability and scale of benefits? (Give a range)
- What are ways you could mitigate these risks? Put on a timer and spend at least 5 minutes brainstorming solutions.
- Are the negative effects reversible or stoppable?
- Unilateralist curse check:
- Ask 5-10 smart, informed, values-aligned people about the idea.
- If any of them think it’s net negative, really dig into their arguments and decide whether you agree. Don’t do it if their reasoning convinces you.
- If 50-90% of them think it’s net negative, seriously consider that they might be right, even if you disagree with them.
- If >90% of them think it’s net negative, this is quite likely the unilateralist’s curse. Don’t do it!
- Are there any ways that this intervention could “poison the well”?
- How likely are these consequences? (Give a range)
- How bad would it be if they happened? (Give a range)
- How does this compare to the probability and scale of benefits? (Give a range)
- What are ways you could mitigate these risks? Put on a timer and spend at least 5 minutes brainstorming solutions.
- Are the negative effects reversible?
- Are there any ways that this intervention poses information hazards?
- How likely are these consequences? (Give a range)
- How bad would it be if they happened? (Give a range)
- How does this compare to the probability and scale of benefits?
- What are ways you could mitigate these risks? Put on a timer and spend at least 5 minutes brainstorming solutions.
- Are the negative effects reversible or stoppable?
- Reputation hazards
- Are there any reputation hazards posed by this intervention?
- How likely are these consequences? (Give a range)
- How bad would it be if they happened? (Give a range)
- How does this compare to the probability and scale of benefits? (Give a range)
- What are ways you could mitigate these risks? Put on a timer and spend at least 5 minutes brainstorming solutions.
- Are the negative effects reversible or stoppable?
- Opportunity costs
- In expectation, where would you be drawing people and attention away from? Would that be something better than your intervention?
- How likely are these consequences? (Give a range)
- How bad would it be if they happened? (Give a range)
- How does this compare to the probability and scale of benefits? (Give a range)
- What are ways you could mitigate these risks? Put on a timer and spend at least 5 minutes brainstorming solutions.
- Are the negative effects reversible or stoppable?
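The unilateralist curse check above is essentially a decision rule keyed on the share of advisors who think the idea is net negative. Here it is written out as a small function. The thresholds come from the checklist; the function name, the advice wording, and how exact boundaries are treated (e.g. exactly 90% falls in the “seriously consider” band) are an interpretation, and the result is a prompt for further thinking, not a verdict.

```python
def unilateralist_check(thinks_net_negative: list[bool]) -> str:
    """Map advisors' views (True = thinks the idea is net negative) to the
    checklist's guidance. Requires at least 5 advisors, per the checklist."""
    n = len(thinks_net_negative)
    if n < 5:
        raise ValueError("Ask at least 5 people before applying this check")
    share = sum(thinks_net_negative) / n
    if share > 0.9:
        return "stop: this is quite likely the unilateralist's curse"
    if share >= 0.5:
        return "seriously consider that they might be right, even if you disagree"
    if share > 0:
        return "dig into their arguments; don't proceed if they convince you"
    return "no one flagged it as net negative"


if __name__ == "__main__":
    # Hypothetical poll of 8 advisors, 3 of whom think the idea is net negative.
    print(unilateralist_check([True, True, True, False, False, False, False, False]))
```

Note that with 10 advisors, “more than 90%” only triggers when all 10 flag the idea, so the middle band does most of the work in practice.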
Personal fit
- What’s the most similar thing you’ve done to this intervention in the past? How much did you like/dislike it?
- Why?
- Brainstorm potential fixes
- How does this fit with your personal interests?
- What are the parts you predict you’d find very enjoyable?
- What are the parts you predict you’d dislike?
- How does this fit with your personal skills?
- What skills do you have that would be particularly good for this?
- What skills do you lack that would be particularly important for this?
- If you don’t have the relevant skills or experience, how long and difficult would it be for you to:
- Hire for
- Acquire yourself
- How does this fit with your personal experience?
- How many hours a week would you have to work for this to succeed?
- Is that a sustainable amount for you?
- What probability do you put on still enjoying this 5 years from now?
- Will this org require you to move?
- Are you in a position to move to that place?
- What about visas?
- What about family / relationships?
- What about liking the area?
- Are there ways to make it work remotely?
- Imagine it’s 5 years from now and you hate running this org.
- What happened and why?
- Are there any ways you could prevent or mitigate those problems in advance? (e.g. delegate, change strategy, etc.)
- What’s your plan B? What would you do if this org failed miserably?
Indirect positive effects
- Skill-building. Which skills would this likely lead you to develop?
- Would this intervention lead to gaining information on effective strategies?
- Boosting the reputation of related ideas or causes.
Logistical factors
- How difficult is the intervention to run?
- Do experts generally consider it simple or difficult relative to other interventions?
- What is the biggest barrier to successful implementation?
- Are there any legal reasons you can’t do this?
- Are there any ways to make it legal? (E.g. do it in a different country? Have different beneficiaries? Use a different org structure? Etc.)
- How hard is it to test the effectiveness of this intervention?
- How feasible is it to run MVPs of this?
- How fast could this intervention be tested?
- What have been the stumbling blocks for charities or governments who have tried this intervention before?
- Are there any unexpected things that make this intervention vulnerable to lower impact if things don't go as planned? (e.g. reliance on tech, needing highly skilled people, low uptake of the intervention)
Flexibility / Option value
- Yourself:
- Imagine it’s 5 years from now and you no longer think this cause area is the best use of your time.
- How easily can you switch into another field?
- Which of the skills and experiences you gained doing this could you cross-apply to most other fields?
- Which skills and experiences wouldn’t cross-apply to many fields?
- Staff:
- To what degree will this intervention allow you to hire flexible staff who would be able to contribute to many different interventions?
- What skills will doing this intervention give your staff that you could use for future interventions?
- To what extent are the skills you’d develop to run this charity generalizable to other fields or orgs?
- Funding:
- To what extent would a charity implementing this intervention rely on a single large funder (e.g. Open Phil)?
- Would a charity like this lend itself to having a flexible donor base who would keep supporting it as it grows into different intervention areas or updates based on a changed evidence base or new opportunities?
- Is there reason to think that this intervention will be difficult to acquire funding for, compared to other interventions?
- Is it possible that this program will be unusually capable of attracting funding relative to other interventions from people who would otherwise donate to less effective or ineffective charities?
- Logistics:
- Do you have a direct line of communication with the people you are helping?
- Does the infrastructure for this intervention lend itself to being used for many different interventions?
- Will the intervention be restricted to a fixed or small location?
- Are there many plausible ways of doing this intervention?
Broad subjective sense
- What are the key arguments against doing this intervention?
- What are the key arguments for doing this intervention?
- How does it seem based on your reading, past experience, and the people you have talked to?
- What do you not know?
- What are the remaining unanswered questions?
- How much more research might it take (if any) to feel confident before running this intervention?
- What do experts seem to broadly think of this area?
- How does this intervention broadly compare to the other interventions?
Summary
- Describe the intervention in one or two sentences.
- What is your provisional conclusion on this intervention?
- Promising enough to look at specific charities in?
- Ruled out due to X factors?
- Is it broadly strong or weak relative to other options?
- Give a two-sentence summary of this intervention.
- What are the biggest strengths and weaknesses of this intervention?
- How much time was spent on this research?