Recommender systems (RS) often leverage information about the similarity between items' features to make recommendations. Yet, many commonly used similarity functions make mathematical assumptions, such as symmetry (i.e., Sim(a, b) = Sim(b, a)), that are inconsistent with how humans make similarity judgments. Moreover, most algorithm validations either do not directly measure users' behavior or fail to comply with methodological standards for psychological research. RS that are developed and evaluated without regard to users' psychology may fail to meet users' needs. To provide recommendations that do meet users' needs, we must: 1) develop similarity functions that account for known properties of human cognition, and 2) rigorously evaluate the performance of these functions using methodologically sound user testing. Here, we develop a framework for evaluating users' judgments of similarity that is informed by best practices in psychological research methods. Employing users' fashion item similarity judgments collected using our framework, we demonstrate that a psychologically informed similarity function (i.e., the Tversky contrast model) outperforms a psychologically naive similarity function (i.e., Jaccard similarity) in predicting users' similarity judgments.
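To make the contrast concrete, the following sketch implements both similarity functions over binary feature sets. The feature sets, weights, and the use of set cardinality as the Tversky salience function f are illustrative assumptions, not the paper's actual experimental setup; it shows only the structural point that Jaccard similarity is symmetric by construction while the Tversky contrast model is asymmetric whenever its weights on the two distinctive-feature terms differ.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|: symmetric by construction."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tversky_contrast(a: set, b: set,
                     theta: float = 1.0,
                     alpha: float = 0.8,
                     beta: float = 0.2) -> float:
    """Tversky (1977) contrast model with set cardinality as the salience
    function f:  Sim(a, b) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A).
    Asymmetric whenever alpha != beta and the distinctive sets differ in size."""
    return (theta * len(a & b)
            - alpha * len(a - b)
            - beta * len(b - a))


# Hypothetical fashion-item feature sets (illustrative only).
sneaker = {"casual", "rubber_sole", "laces", "low_top", "canvas"}
boot = {"rubber_sole", "laces", "high_top", "leather"}

print(jaccard(sneaker, boot))            # same value in both directions
print(jaccard(boot, sneaker))
print(tversky_contrast(sneaker, boot))   # differs from the reverse direction
print(tversky_contrast(boot, sneaker))
```

With alpha > beta, the model penalizes the first item's distinctive features more heavily, so Sim(sneaker, boot) and Sim(boot, sneaker) diverge, mirroring the directionality found in human similarity judgments.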