Using AI to Evaluate Undergraduate Transcripts
By Troy Lowry
I am often approached by people suggesting that AI might improve transcript processing for our Credential Assembly Service (CAS). At LSAC we handle a lot of undergraduate transcripts: more than 300,000 in a typical year. Each one is carefully reviewed and verified, and the grades are then standardized so that they all use the same grading scale and rules. This allows GPAs from different schools to be compared more accurately.
Highly trained transcript coders currently carry out the crucial work of standardizing academic grades. These coders, well-versed in the grading systems of myriad educational institutions, convert these grades into a uniform format. It's an art form requiring acute attention to detail, a deep understanding of varied educational norms, and the ability to adapt swiftly to new grading frameworks.
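To make the idea of standardization concrete, here is a minimal Python sketch that puts grades from two hypothetical schools, one reporting letter grades and one reporting percentages, onto a single common grade-point scale and then computes a credit-weighted GPA. The school names, conversion table, and percentage cutoffs are all invented for illustration; they are not LSAC's actual conversion rules, which must handle many more cases than this.

```python
from typing import Callable

# Hypothetical conversion table for a school that reports letter grades.
# These values are illustrative assumptions, not LSAC's actual rules.
LETTER_POINTS = {"A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
                 "C+": 2.3, "C": 2.0, "D": 1.0, "F": 0.0}

# Hypothetical cutoffs for a school that reports percentages.
PERCENT_CUTOFFS = [(90.0, 4.0), (80.0, 3.0), (70.0, 2.0), (60.0, 1.0)]


def from_letter(grade: str) -> float:
    """Convert a letter grade to the common scale."""
    return LETTER_POINTS[grade]


def from_percent(grade: str) -> float:
    """Convert a percentage grade to the common scale."""
    score = float(grade)
    for cutoff, points in PERCENT_CUTOFFS:
        if score >= cutoff:
            return points
    return 0.0


# Each (hypothetical) school is paired with the converter for its own system.
CONVERTERS: dict[str, Callable[[str], float]] = {
    "Example College": from_letter,
    "Example University": from_percent,
}


def standardized_gpa(school: str, courses: list[tuple[str, float]]) -> float:
    """Credit-weighted GPA on the common scale.

    courses holds (native grade, credit hours) pairs from one transcript.
    """
    convert = CONVERTERS[school]
    total_points = sum(convert(grade) * credits for grade, credits in courses)
    total_credits = sum(credits for _, credits in courses)
    return total_points / total_credits


if __name__ == "__main__":
    print(standardized_gpa("Example College", [("A", 3), ("B", 4)]))
    print(standardized_gpa("Example University", [("93", 3), ("85", 4)]))
```

With these made-up tables, both sample transcripts come out to the same standardized GPA (24/7, about 3.43), even though one school reported letters and the other percentages. That common footing is what lets GPAs from different schools be compared.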
Given the scale of this operation, the question of using artificial intelligence (AI) for faster and perhaps more efficient processing invariably arises. We have run many trials of both in-house and vendor products that we thought might provide this benefit, but we have yet to find one that works well.
Many people assume that when schools send transcripts electronically to LSAC, they send them in a text format that's easy for computers to read. That's actually not the case. What LSAC almost always gets is more like a photo or image of the transcript, not the actual text-based data.
To make this image usable for automated processing, LSAC has to take an extra step. We use a technology called Optical Character Recognition, or OCR for short, which scans the image of the transcript and picks out each individual word and number. In effect, it teaches the computer to read the picture so that the information can be organized and analyzed.
Our tests have found this OCR to be about 98% accurate, meaning that for every 100 characters it scans, about 98 are correctly identified. This may seem like a high success rate, but mistakes in transcript evaluations have very real consequences for CAS registrants, so 98% is not good enough. To get to 100%, we need humans to double-check this work. Our testing has found that it is quicker and more accurate for our experts to identify and enter all grades themselves than to find and fix the 10 or so errors that appear on each page of a transcript.
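A back-of-the-envelope calculation shows why 98% falls so far short. If each character is treated as an independent trial with accuracy p (a simplification, since real OCR errors are not independent), a page of n characters averages n(1 − p) errors, and the chance that the whole page comes through clean is p^n. The 500-character page length below is a hypothetical figure, chosen because at 98% accuracy it works out to the roughly 10 errors per page mentioned above.

```python
def expected_errors(chars: int, accuracy: float) -> float:
    """Average number of misread characters on a page of `chars` characters."""
    return chars * (1 - accuracy)


def prob_error_free(chars: int, accuracy: float) -> float:
    """Chance that every character on the page is read correctly,
    assuming (simplistically) that errors occur independently."""
    return accuracy ** chars


if __name__ == "__main__":
    page_chars = 500  # hypothetical page length
    for accuracy in (0.98, 0.999):
        print(f"accuracy {accuracy}: "
              f"expected errors {expected_errors(page_chars, accuracy):.1f}, "
              f"P(error-free page) {prob_error_free(page_chars, accuracy):.2e}")
```

Under these assumptions a 98%-accurate page is essentially never error-free (about a 0.004% chance), and even 99.9% accuracy would leave a 500-character page clean only about 61% of the time, so every page would still need a human check.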
Another major problem is identifying grading periods. LSAC does not just record grades from transcripts; we record them by grading period, so that admissions offices can easily see trends in applicants’ work. Anecdotally, over the years I’ve heard many admissions officers say that a strong upward GPA trend can help offset a somewhat lower overall GPA. Without LSAC’s CAS service, these trends would be much more difficult and time-consuming for admissions officers to spot.
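Recording grades by period is what makes a trend visible. The minimal sketch below groups course records, whose grades are assumed to be already converted to a common scale, by grading period and computes a credit-weighted GPA for each. The record layout, the `term_order` field, and the sample grades are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Course(NamedTuple):
    term: str          # grading-period label, e.g. "Fall 2019"
    term_order: int    # position of the term in time, used for sorting
    points: float      # grade already converted to the common scale
    credits: float     # credit hours


def gpa_by_term(courses: list[Course]) -> list[tuple[str, float]]:
    """Credit-weighted GPA for each grading period, in chronological order."""
    weighted: defaultdict[tuple[int, str], float] = defaultdict(float)
    hours: defaultdict[tuple[int, str], float] = defaultdict(float)
    for c in courses:
        key = (c.term_order, c.term)
        weighted[key] += c.points * c.credits
        hours[key] += c.credits
    return [
        (term, weighted[(order, term)] / hours[(order, term)])
        for order, term in sorted(hours)
        if hours[(order, term)] > 0  # skip terms with no graded credit
    ]


if __name__ == "__main__":
    record = [
        Course("Fall 2019", 1, 2.7, 3), Course("Fall 2019", 1, 3.0, 3),
        Course("Spring 2020", 2, 3.3, 4), Course("Spring 2020", 2, 3.7, 3),
        Course("Fall 2020", 3, 4.0, 4),
    ]
    for term, gpa in gpa_by_term(record):
        print(f"{term}: {gpa:.2f}")
```

The sample record yields per-term GPAs of 2.85, 3.47, and 4.00, an upward trend that a single cumulative GPA of about 3.38 would hide. Note that the sketch takes `term` and `term_order` as given; working those out from the transcript is exactly the hard part.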
The difficulty is that there is no standard for how grading periods are shown on transcripts. While humans can work out how a transcript is organized by date quite readily, all of our automated tools have struggled in this area. We have had some success using school-specific “templates” to guide the automation process, but we have found that even within a single school, grading periods are not always presented consistently. Here again, it took more time to double-check and clean up after the automation than to simply have humans handle the task in the first place.
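A school-specific template can be as simple as a pattern that recognizes that school's term headers. The sketch below, with a hypothetical school name, regular expression, and course lines, groups OCR lines under the most recent recognized header and flags the transcript for human review when the layout does not fit. It is an illustration of the general idea only, not LSAC's actual templates.

```python
from __future__ import annotations

import re

# Hypothetical per-school "templates": regular expressions that recognize a
# grading-period header line in OCR output. These patterns are illustrative
# assumptions, not LSAC's actual templates.
TERM_TEMPLATES: dict[str, list[re.Pattern[str]]] = {
    "Example University": [
        # Matches headers such as "Fall 2019" or "Winter Quarter 2020".
        re.compile(r"^(Fall|Winter|Spring|Summer)\s+(?:Quarter\s+)?(\d{4})$"),
    ],
}


def split_by_term(school: str, lines: list[str]) -> tuple[dict[str, list[str]], bool]:
    """Group OCR lines under the most recent recognized term header.

    Returns (groups, needs_review). needs_review is True when the school has
    no template, or when content appears before any recognized header,
    i.e. the template did not fit this transcript's layout.
    """
    patterns = TERM_TEMPLATES.get(school, [])
    groups: dict[str, list[str]] = {}
    current: str | None = None
    needs_review = not patterns
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = None
        for pattern in patterns:
            match = pattern.match(line)
            if match:
                break
        if match:
            # Normalize the header to "<Season> <Year>".
            current = f"{match.group(1)} {match.group(2)}"
            groups.setdefault(current, [])
        elif current is None:
            needs_review = True  # a line we could not place in any term
        else:
            groups[current].append(line)
    return groups, needs_review


if __name__ == "__main__":
    expected_layout = ["Fall 2019", "ENGL 101  A  3.0",
                       "Winter Quarter 2020", "MATH 102  B+  4.0"]
    print(split_by_term("Example University", expected_layout))
    # Same school, different header style: the template no longer matches.
    variant_layout = ["2020 Spring", "HIST 110  A-  4.0"]
    print(split_by_term("Example University", variant_layout))
```

The second call shows the weakness described above: a single change in how the same school prints its term headers ("2020 Spring" instead of "Spring 2020") defeats the template, and the transcript falls back to a human.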
We thought the new generation of Large Language Models (LLMs, not to be confused with the Legum Magister degree in law!) might be better able to understand and handle the grading-period problem. So far, our testing has not shown much improvement in this area.
For instance, we receive a lot of transcripts from UCLA (a little over 3,500 in the past five years), so we tested these with a generative AI model. We attempted to train the model on the OCR output of 3,000 of these transcripts, paired with the data our transcript experts had entered for them. We then fed the remaining 500 transcripts to the trained model to see whether it could code them accurately. The results were underwhelming: much of what the AI produced was nonsensical, and little of it was correct. LLMs often require very large datasets, with millions or billions of examples, to produce accurate results. Three thousand five hundred transcripts just weren’t enough for even a close approximation.
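The experiment above follows a standard held-out evaluation: train on one portion of the data, then score the model on examples it never saw. The sketch below shows one way to make such a split reproducibly and to score model output against the experts' coding entry by entry. A random split, the (grading period, course, grade) entry format, and precision and recall as the metrics are all assumptions for illustration; the test described above does not specify how the 500 transcripts were chosen or how the output was scored.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")

# Hypothetical layout for one coded entry: (grading period, course, grade).
Entry = tuple[str, str, str]


def holdout_split(items: Sequence[T], test_size: int,
                  seed: int = 0) -> tuple[list[T], list[T]]:
    """Shuffle reproducibly, then return (training set, held-out test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


def entry_scores(predicted: set[Entry], expert: set[Entry]) -> tuple[float, float]:
    """Precision and recall of model output against the experts' coding.

    Precision: share of predicted entries that match the experts' entries.
    Recall: share of the experts' entries that the model reproduced.
    """
    correct = len(predicted & expert)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expert) if expert else 0.0
    return precision, recall


if __name__ == "__main__":
    train, test = holdout_split(range(3500), test_size=500)
    print(len(train), len(test))
    model_output = {("Fall 2019", "ENGL 101", "A"), ("Fall 2019", "MATH 102", "B")}
    expert_coding = {("Fall 2019", "ENGL 101", "A"), ("Fall 2019", "HIST 110", "C")}
    print(entry_scores(model_output, expert_coding))
```

Scoring individual entries rather than whole transcripts shows how much of the output would actually be usable, which is what matters when the alternative is an expert coding the transcript from scratch.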
In short, while it may one day be possible to automate CAS transcript evaluation, for the foreseeable future the work will rely on human experts who can handle novel situations that AI and other automated tools have not yet been able to manage.