Simulation of Research using LLM and Tree Search

Let me share something weird I built for fun over the weekend.

We can have an LLM read multiple existing papers and then generate a fictional follow-up study. By repeating this process and extending a directed acyclic graph (DAG) of papers, I thought we could simulate how researchers read papers and produce new studies.

I built the simulation with an MCTS / UCB-style tree search. Each generated paper is evaluated by another LLM acting as a reviewer, and the evaluation score is used to extend the descendants in a better direction while pruning bad branches.
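To make the selection step concrete, here is a minimal sketch of UCB-guided growth. It is not the code from the repository: the `Node` class, `select`, `expand_and_backup`, and the `max_children` branching limit are all hypothetical names chosen for illustration. It also simplifies the DAG to a tree where each paper has one parent, and it assumes the reviewer returns a numeric score.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One paper in the graph (hypothetical structure, single parent only)."""
    title: str
    score_sum: float = 0.0  # sum of reviewer scores backed up through this node
    visits: int = 0         # number of backups that passed through this node
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None


def ucb(node: Node, parent_visits: int, c: float) -> float:
    """UCB1 score: mean reviewer score plus an exploration bonus scaled by c."""
    if node.visits == 0:
        return math.inf  # unvisited nodes are always tried first
    mean = node.score_sum / node.visits
    return mean + c * math.sqrt(math.log(parent_visits) / node.visits)


def select(root: Node, c: float, max_children: int = 2) -> Node:
    """Walk down from the root, following the best UCB child, until reaching
    a node that still has room for another child (assumed branching limit)."""
    node = root
    while len(node.children) >= max_children:
        node = max(node.children, key=lambda ch: ucb(ch, node.visits, c))
    return node


def expand_and_backup(parent: Node, title: str, score: float) -> Node:
    """Attach a generated paper under `parent` and propagate its reviewer
    score up to the root."""
    child = Node(title=title, parent=parent)
    parent.children.append(child)
    node: Optional[Node] = child
    while node is not None:
        node.visits += 1
        node.score_sum += score
        node = node.parent
    return child
```

In a full loop, the title passed to `expand_and_backup` would come from the generator LLM and the score from the reviewer LLM; both are left as plain arguments here so the sketch runs without any API calls.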

The top node (yellow) is a real paper, while the other nodes are generated by the LLM.

https://gyazo.com/ba07fa6646599c86422a767df865b3f3
https://gyazo.com/990ce241710c5fc368de1919e233b130
https://gyazo.com/20e298f7291a51e1658a0a2161165f1f

The results are somewhat cherry-picked, so I'm not entirely confident in the conclusions. However, I do observe an increase in the evaluation of each node, which suggests that the exploitation of good nodes is working effectively.

https://gyazo.com/9d0284b412c276cd3bfd7cd4081452dd

It's interesting to see how the growth patterns change when we alter the definition of a 'good paper' provided to the LLMs.

The growth pattern also changes when we change the UCB parameter c.

Putting more weight on exploration (c=0.3):

https://gyazo.com/9127cb13394b4b0d0f0737b72ff7e0fd

Putting more weight on exploitation (c=0.1):

https://gyazo.com/9dfc2f4a72b980e468a05e1faf80f2c2
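A small worked example of why c shifts the balance, under assumptions of my own: the standard UCB1 formula, reviewer scores normalized to the 0 to 1 range, and made-up visit counts. The function names `ucb_score` and `pick` and the numbers below are purely illustrative, not taken from the runs shown above.

```python
import math


def ucb_score(mean: float, visits: int, parent_visits: int, c: float) -> float:
    """Standard UCB1: average score plus c times the exploration bonus."""
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def pick(candidates: dict, parent_visits: int, c: float) -> str:
    """Return the name of the candidate with the highest UCB score."""
    return max(
        candidates,
        key=lambda name: ucb_score(*candidates[name], parent_visits, c),
    )


# Hypothetical candidates: name -> (mean reviewer score, visit count)
candidates = {
    "well_explored": (0.8, 10),
    "rarely_visited": (0.6, 1),
}

low_c_choice = pick(candidates, parent_visits=11, c=0.1)
high_c_choice = pick(candidates, parent_visits=11, c=0.3)
```

With these numbers, c=0.1 keeps extending the well-explored, higher-scoring branch, while c=0.3 makes the bonus large enough that the rarely visited branch gets picked instead. The exact crossover point depends on the score scale, so the values of c are only meaningful relative to how the reviewer scores are normalized.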

Some potential directions:

We could also feed in human feedback (e.g. gaze attention) to grow the graph in relevant directions.

The same approach could be applied to other brainstorming-type tasks (e.g. generating a game idea, story plot, etc.).

It would be interesting to create multiple agents with different “interests” and let them collaboratively grow the graph.

https://github.com/blu3mo/simulacra-of-academia