AI Evaluates AI-Like Outputs More Favorably
In Hiring, the Standard of "Good Resume" Can Be Distorted by Expression Styles AI Finds Familiar
An Operational Problem That Can Be Reduced by Designing Evaluation Structures

This paper shows that fairness in the AI era is no longer only a matter of discrimination between human groups. It now extends to a technical and stylistic power structure in which generating AI and evaluating AI recognize and favor each other. Instead of asking "does AI evaluate fairly," the authors ask "is an applicant advantaged when the AI that wrote the resume is the same as the AI that evaluates it." They take 2,245 real human-written resumes, have several LLMs regenerate the core summary of each one, and then have an evaluating LLM compare the two versions. Crucially, each candidate's education, career history, and skills are held constant; only the wording changes.

AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights
Jiannan Xu, Gujie Li, Jane Yi Jiang, 2026

From Human vs. AI Bias to AI vs. AI Bias
The authors focus on the situation in which LLMs sit on both sides of a decision at once. Applicants polish their resumes with LLMs, and companies screen resumes with LLMs. In that setting, the evaluating AI may prefer AI-written documents to human-written ones, and may further prefer documents it wrote itself to those written by other AIs. The paper treats this bias not as external social discrimination reflected in AI, but as an interaction bias that arises internally when an AI generation system is coupled with an AI evaluation system.

When we think about AI fairness, we usually worry about whether AI reproduces human biases. The paper argues that future fairness problems must also include the question "does AI prefer AI outputs." The same concern applies to domains such as paper review, content moderation, educational assessment, and quality evaluation of customer responses. If human-written and AI-written text are placed before the same evaluation system, and that system is itself an LLM, its judgments may be swayed not only by the quality of the content but also by how familiar the generation style is.

The Algorithm That Loves AI Over Humans, and 'Itself' Over Other AIs
In most major models, self-preference bias in the 'LLM vs. human' frame was overwhelming.

  • Disadvantage of human resumes: Even with content quality held equal, large models such as GPT-4o and DeepSeek-V3 were 67% to 82% more likely to select resumes they had written themselves than resumes written by humans.
  • Variation across models: GPT-4o showed the strongest effect, with self-preference bias exceeding 80%, and the tendency grew stronger as model size increased.
  • A more complicated AI-vs-AI picture: In the 'LLM vs. LLM' frame, results were mixed. DeepSeek-V3 strongly preferred its own outputs over those of other models (69% vs. LLaMA, 28% vs. GPT-4o), whereas GPT-4o and LLaMA 3.3-70B showed inconsistent self-preference and sometimes preferred other models' text.
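
Figures like these come from pairwise comparisons, so it may help to see how such a rate could be tallied. The following is a minimal sketch, not the paper's metric: the `Verdict` record and the `self_preference_rate` function are hypothetical names, and the rate is simply the share of pairs in which an evaluator picks the version it wrote itself. The paper's exact definition may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Verdict:
    """One pairwise judgment (hypothetical record layout)."""
    evaluator: str  # model that judged the pair
    author_a: str   # who wrote version A: "human" or a model name
    author_b: str   # who wrote version B
    winner: str     # "A" or "B"


def self_preference_rate(verdicts: Iterable[Verdict], evaluator: str) -> float:
    """Share of pairs in which `evaluator` picked the version it wrote itself.

    Only pairs where exactly one version was written by the evaluator count.
    """
    relevant = [
        v for v in verdicts
        if v.evaluator == evaluator
        and (v.author_a == evaluator) != (v.author_b == evaluator)
    ]
    if not relevant:
        raise ValueError("no pairs with exactly one self-written version")
    own_picks = sum(
        1 for v in relevant
        if (v.winner == "A" and v.author_a == evaluator)
        or (v.winner == "B" and v.author_b == evaluator)
    )
    return own_picks / len(relevant)
```

A rate near 0.5 would indicate no preference between the two versions; values well above 0.5 would indicate that the evaluator favors its own wording.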

The Person Who Used the 'Right Tool' Gains an Advantage Over the More Capable Person
The authors ran hiring simulations across 24 job categories. Applicants who used the same LLM as the evaluator were 23-60% more likely to reach the final candidate stage than applicants who submitted human-written resumes of the same quality. The disadvantage was larger in business-related roles such as accounting, sales, finance, and business development, and relatively small in agriculture, arts, and automotive.
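
To make the mechanism concrete, here is a toy Monte Carlo sketch of how a small stylistic bonus can shift shortlist odds. It is not the authors' simulation: the function name `shortlist_rates`, the Gaussian scores, and every parameter value (pool size, shortlist size, `style_bonus`) are illustrative assumptions.

```python
import random


def shortlist_rates(n_applicants=100, share_same_llm=0.5, shortlist_size=10,
                    style_bonus=0.5, trials=2000, seed=0):
    """Estimate per-group shortlist rates in a toy screening model.

    Assumption: both groups draw quality from the same distribution, and the
    evaluator adds `style_bonus` to resumes written with its own LLM.
    """
    rng = random.Random(seed)
    hits = {"same_llm": 0, "human": 0}
    counts = {"same_llm": 0, "human": 0}
    for _ in range(trials):
        pool = []
        for _ in range(n_applicants):
            group = "same_llm" if rng.random() < share_same_llm else "human"
            bonus = style_bonus if group == "same_llm" else 0.0
            pool.append((rng.gauss(0.0, 1.0) + bonus, group))
        pool.sort(reverse=True)  # highest evaluator score first
        for rank, (_, group) in enumerate(pool):
            counts[group] += 1
            if rank < shortlist_size:
                hits[group] += 1
    # Guard against an empty group (e.g. share_same_llm of 0 or 1).
    return {g: (hits[g] / counts[g] if counts[g] else 0.0) for g in hits}


if __name__ == "__main__":
    print(shortlist_rates())
```

With `style_bonus=0.0` the two rates should be roughly equal; raising the bonus widens the gap even though underlying quality is drawn identically for both groups.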

These results show that hiring unfairness can be restructured in new ways. In the past, better education, a stronger career history, and a better network determined hiring opportunities; going forward, "can I use the same AI as the evaluating AI" may also become a competitive factor. The person who passes first is then not the one with superior capabilities but the one whose wording better matches the style the evaluation model prefers. Applicants cannot know which AI a company uses to evaluate resumes. Even so, the market may tilt toward resume-writing methods tuned to the style of specific LLMs.

Lock-in Effect: Stylistic Fixation and Reduced Diversity
If a particular evaluating LLM becomes widely used in the hiring market, applicants will try to write resumes in the style that LLM prefers. Over time, "a good resume" comes to mean not a resume that is actually good but one written in the style a specific LLM naturally generates and favors. The style of the whole market then gradually standardizes, and unique or distinctive human ways of expression may be penalized.

The problem does not end with hiring. Suppose evaluating AI in paper review prefers AI-style papers, moderation AI on content platforms judges AI-style expression to be safer or higher in quality, and student writing in educational assessment scores higher the more it resembles the evaluating AI's style. Society's ways of expression as a whole may then converge on the styles AI prefers. The risk is that human divergence, imperfection, regional flavor, individuality, and indirect thinking will diminish. The question extends beyond hiring fairness to "how can human expressive diversity survive in a world where AI evaluates."

Technical Solutions: Bias Is a 'Controllable Variable,' Not 'Instinct'
The authors do not treat this bias only as a structural defect of the models; they show empirically that operational strategies can mitigate it substantially.

  • System prompt intervention: Simply giving the evaluator an explicit instruction to "ignore whether the resume was written by AI and focus only on content quality" was enough to reduce the bias substantially.
  • Majority voting ensemble: Rather than entrusting the decision to a single large model with strong bias, smaller models that are less able to recognize their own writing also take part in the evaluation, and the decision is made by majority vote. This approach reduced self-preference bias by more than 60% in major models such as GPT-4o and LLaMA.
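
The two mitigations above can be combined in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Evaluator` callable signature, the `majority_vote` function, and the stub models are all hypothetical, and no real LLM API is called. Only the wording of the neutral instruction is taken from the text.

```python
from collections import Counter
from typing import Callable, Sequence

# Instruction wording follows the article; everything else is illustrative.
NEUTRAL_INSTRUCTION = (
    "Ignore whether the resume was written by AI "
    "and focus only on content quality."
)

# Hypothetical interface: (system_prompt, resume_a, resume_b) -> "A" or "B"
Evaluator = Callable[[str, str, str], str]


def majority_vote(evaluators: Sequence[Evaluator], resume_a: str, resume_b: str,
                  system_prompt: str = NEUTRAL_INSTRUCTION) -> str:
    """Return the resume label ("A" or "B") chosen by most evaluators."""
    if len(evaluators) % 2 == 0:
        raise ValueError("use an odd, non-zero number of evaluators to avoid ties")
    votes = [ev(system_prompt, resume_a, resume_b) for ev in evaluators]
    if any(v not in ("A", "B") for v in votes):
        raise ValueError("each evaluator must answer 'A' or 'B'")
    return Counter(votes).most_common(1)[0][0]


# Stub evaluators standing in for real models (assumptions for the demo).
def biased_large_model(system_prompt: str, resume_a: str, resume_b: str) -> str:
    return "A"  # always favors version A, e.g. its own rewrite


def small_model(system_prompt: str, resume_a: str, resume_b: str) -> str:
    return "B"  # a smaller model that does not share the large model's preference


if __name__ == "__main__":
    panel = [biased_large_model, small_model, small_model]
    print(majority_vote(panel, "resume text A", "resume text B"))
```

In this toy panel the single biased model is outvoted by the two smaller ones, which is the intuition behind the ensemble: no one model's stylistic preference decides the outcome alone.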

Is the Talent of the Future Determined Not by 'Ability' but by 'AI Optimization'?
If, among applicants with the same capabilities, outcomes depend on "whether you used the same model as the company's evaluating AI" or "whether your style is one the AI finds familiar," this is a new form of technological power that undermines the principle of merit-based hiring.
Discussion of AI fairness is now expanding beyond correcting data bias to the interaction biases that arise where AI meets AI. Companies must establish independent audit systems that can verify the neutrality of their evaluation tools, and hiring platforms must be more transparent about AI usage. Moving beyond the era of using AI as a tool, the paper asks how we should monitor the rules of a world that AI creates.