If an Admin enables multilingual document handling for the Vault (Admin > Settings > Language & Region Settings), each document in that Vault includes the Language field. For PDF and text-based files such as HTML or CSV, Vault attempts to assign a language automatically upon import, based on the language it detects in the document's content.
If a PDF file or a text-based file contains two languages, how does Vault decide the language?
Vault uses the first 1024 characters of the file to detect the language. For example, if those first 1024 characters contain a mix of Japanese and English, Vault selects the language with the higher probability score.
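The selection rule can be illustrated with a small Python sketch. This is not Vault's actual detection algorithm, which is not described here. The function names and the scoring method (the share of letters that fall in Japanese or ASCII ranges) are assumptions made only for illustration. Only the 1024-character sample size and the "higher score wins" rule come from the answer above.

```python
SAMPLE_SIZE = 1024  # number of leading characters examined, per the answer above


def is_japanese_char(ch):
    """Rough check for hiragana, katakana, or common kanji (illustrative only)."""
    code = ord(ch)
    return 0x3040 <= code <= 0x30FF or 0x4E00 <= code <= 0x9FFF


def score_languages(sample):
    """Hypothetical scoring: the fraction of letters belonging to each language."""
    letters = [ch for ch in sample if ch.isalpha()]
    if not letters:
        return {}
    japanese = sum(1 for ch in letters if is_japanese_char(ch))
    english = sum(1 for ch in letters if ch.isascii())
    return {"ja": japanese / len(letters), "en": english / len(letters)}


def detect_language(text):
    """Pick the highest-scoring language using only the first SAMPLE_SIZE characters."""
    scores = score_languages(text[:SAMPLE_SIZE])
    if not scores:
        return None  # nothing to score, so no language is assigned
    return max(scores, key=scores.get)
```

Under this sketch, a file that opens with a long English section followed by Japanese text would be tagged as English, because anything past the first 1024 characters is never examined.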
Vault Documentation: Vault Help