AlphaGo: The Documentary

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • alphafold

    Open source code for AlphaFold.

  • https://github.com/search?q=alphafold ... https://github.com/deepmind/alphafold

    How do I reframe this problem in terms of fundamental algorithmic complexity classes (and thus the Quantum Algorithm Zoo thing that might optimize the currently fundamentally algorithmically computationally hard part of the hot loop that is the cost driver in this implementation)?

    To cite in full from the MuZero blog post from December 2020: https://deepmind.com/blog/article/muzero-mastering-go-chess-... :

    > Researchers have tried to tackle this major challenge in AI by using two main approaches: lookahead search or model-based planning.

    > Systems that use lookahead search, such as AlphaZero, have achieved remarkable success in classic games such as checkers, chess and poker, but rely on being given knowledge of their environment’s dynamics, such as the rules of the game or an accurate simulator. This makes it difficult to apply them to messy real world problems, which are typically complex and hard to distill into simple rules.

    > Model-based systems aim to address this issue by learning an accurate model of an environment’s dynamics, and then using it to plan. However, the complexity of modelling every aspect of an environment has meant these algorithms are unable to compete in visually rich domains, such as Atari. Until now, the best results on Atari are from model-free systems, such as DQN, R2D2 and Agent57. As the name suggests, model-free algorithms do not use a learned model and instead estimate what is the best action to take next.

    > MuZero uses a different approach to overcome the limitations of previous approaches. Instead of trying to model the entire environment, MuZero just models aspects that are important to the agent’s decision-making process. After all, knowing an umbrella will keep you dry is more useful to know than modelling the pattern of raindrops in the air.

    > Specifically, MuZero models three elements of the environment that are critical to planning:

    > * The value: how good is the current position?

    > * The policy: which action is the best to take?

    > * The reward: how good was the last action?

    > These are all learned using a deep neural network and are all that is needed for MuZero to understand what happens when it takes a certain action and to plan accordingly.

    > Illustration of how Monte Carlo Tree Search can be used to plan with the MuZero neural networks. Starting at the current position in the game (schematic Go board at the top of the animation), MuZero uses the representation function (h) to map from the observation to an embedding used by the neural network (s0). Using the dynamics function (g) and the prediction function (f), MuZero can then consider possible future sequences of actions (a), and choose the best action.

    > MuZero uses the experience it collects when interacting with the environment to train its neural network. This experience includes both observations and rewards from the environment, as well as the results of searches performed when deciding on the best action.

    > During training, the model is unrolled alongside the collected experience, at each step predicting the previously saved information: the value function v predicts the sum of observed rewards (u), the policy estimate (p) predicts the previous search outcome (π), the reward estimate r predicts the last observed reward (u).

  • online-go.com

    Source code for the Online-Go.com web interface

  • The best way to learn the rules of the game, as well as basic skills, is an app called badukpop.

    https://badukpop.com/

    Once you've learned enough you can try playing via OGS, http://online-go.com

    Don't expect to win right away.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • gym

    A toolkit for developing and comparing reinforcement learning algorithms.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts