Simulation of Research using LLM and Tree Search
Let me share some weird stuff I built for fun during the weekend.
We can have an LLM read multiple existing papers and then generate a fictional subsequent study. By repeating this process and extending the Directed Acyclic Graph of papers, I thought that we could simulate how researchers read papers and generate new studies.
I built the simulation with an MCTS / UCB-style tree search. Each generated paper is evaluated by another LLM acting as a reviewer, and the evaluation score is used to extend the descendants in a better direction while pruning bad branches.
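Here is a minimal sketch of what such a loop could look like. It is not the code from the repo: it assumes a standard UCB1 score, reviewer scores on a 0 to 1 scale, and two parents per generated paper. `generate_paper` and `review_paper` are hypothetical placeholders for the two LLM calls, and every name here is made up for illustration.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    score: float                                  # reviewer score, assumed to be in [0, 1]
    parents: list = field(default_factory=list)   # papers this one "read"
    visits: int = 1                               # 1 + number of times used as a parent


def ucb(node, total_visits, c):
    # Assumed UCB1 form: reviewer score plus an exploration bonus.
    return node.score + c * math.sqrt(math.log(total_visits) / node.visits)


def generate_paper(parents, index):
    # Hypothetical placeholder for the "author" LLM call.
    return f"Generated paper #{index}"


def review_paper(title, parents, rng):
    # Hypothetical placeholder for the "reviewer" LLM call:
    # a noisy score around the parents' average, clamped to [0, 1].
    base = sum(p.score for p in parents) / len(parents)
    return min(1.0, max(0.0, base + rng.uniform(-0.2, 0.2)))


def grow(root, steps, c=0.2, n_parents=2, seed=0):
    rng = random.Random(seed)
    nodes = [root]
    for _ in range(steps):
        total = sum(n.visits for n in nodes)
        # Select the papers with the highest UCB score as parents.
        ranked = sorted(nodes, key=lambda n: ucb(n, total, c), reverse=True)
        parents = ranked[:n_parents]
        for p in parents:
            p.visits += 1
        title = generate_paper(parents, len(nodes))
        score = review_paper(title, parents, rng)
        # New nodes only point to existing ones, so the graph stays acyclic.
        nodes.append(Node(title, score, parents))
    return nodes


graph = grow(Node("real paper", 0.5), steps=20)
```

In this sketch nothing is explicitly deleted. "Pruning" just means that low-scoring nodes are rarely chosen as parents, so their branches stop growing.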
The top node (yellow) is a real paper, while the other nodes are generated by the LLM.
https://gyazo.com/ba07fa6646599c86422a767df865b3f3
https://gyazo.com/990ce241710c5fc368de1919e233b130
https://gyazo.com/20e298f7291a51e1658a0a2161165f1f
The results are somewhat cherry-picked, so I'm not entirely confident in the conclusions. However, I do observe an increase in the evaluation score from node to node, which suggests that the exploitation of good nodes is working.
https://gyazo.com/9d0284b412c276cd3bfd7cd4081452dd
It's interesting to see how the growth patterns change when we alter the definition of a 'good paper' provided to the LLMs.
The growth pattern also changes when we change the parameter c of UCB, which controls how much weight goes to exploration.
Putting more weight on exploration (c=0.3):
https://gyazo.com/9127cb13394b4b0d0f0737b72ff7e0fd
Putting more weight on exploitation (c=0.1):
https://gyazo.com/9dfc2f4a72b980e468a05e1faf80f2c2
Some potential directions:
We could also feed in human feedback (e.g. gaze attention) to grow the graph in relevant directions.
The same approach could be applied to other brainstorming-type tasks (e.g. generating game ideas, story plots, etc.).
It would be interesting to create multiple agents with different “interests” and let them grow the graph collaboratively.
https://github.com/blu3mo/simulacra-of-academia