About This Project

Details on making the 1919 Texas Rangers Investigation Reports machine-readable.

Published on Jul 21, 2021
I first learned about the 1919 Texas House of Representatives Committee Report on the investigation into the Texas Rangers for racial violence along the U.S.-Mexico border while reading The Injustice Never Leaves You by scholar Monica Muñoz Martinez in 2020. I learned more about it in the recently released Reverberations of Racial Violence: Critical Reflections on the History of the Border.

When I went to locate the testimony transcripts, I was surprised to find they’ve never been published. In fact, as far as I can tell, they are only available as photographs of the original documents converted into PDF. This makes them virtually impossible to search and keeps them from showing up in any indexed search results.

I tried OCRing and otherwise converting them into machine-readable text, but the results turned out poorly.

In Doing Digital History, I discovered Transkribus, which produced the best version so far. It even picked up text from the pages beneath the page it was scanning. As no OCR is perfect, it left in unnecessary line breaks and other errant marks that I’m currently working on cleaning up. Those are the versions you see here.

I’m using Sublime Text 3’s regex feature to delete the extra white space and line breaks resulting from the Transkribus OCR. As I’m only just learning regex, it’s taking much longer and is much more tedious than it likely needs to be, but that’s the nature of the task when you’re an amateur. I’m uploading new versions without line breaks as I go. There are also a few places where things were garbled, and some of the “Q” for Questioner and “A” for Answerer markers in the testimony have been replaced by a question mark. I’ll have to go through later to identify those and confirm that each place the number two appears is another OCR failure rather than a second questioner.
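For anyone who would rather script this kind of cleanup than run it by hand, here is a minimal sketch of the same idea in Python. It is not the set of patterns I actually use in Sublime Text; the function names, the rules, and the assumption that a speaker marker sits at the start of a line are all hypothetical and would need adjusting against the real transcripts.

```python
import re

# Hypothetical pattern: a line that opens with "?" or a lone "2" where a
# "Q" or "A" speaker marker would be expected. This assumes markers sit
# at the start of a line, which may not hold for every page.
SUSPECT_MARKER = re.compile(r"^\s*(\?|2)[.:]?\s")


def flag_suspect_markers(text):
    """Return (line_number, line) pairs worth checking by hand.

    Run this BEFORE clean_ocr_text, while the original line breaks exist.
    """
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT_MARKER.match(line)
    ]


def clean_ocr_text(text):
    """Remove OCR line breaks and extra white space (illustrative rules)."""
    # Normalize Windows and old Mac line endings to "\n".
    text = re.sub(r"\r\n?", "\n", text)
    # Drop spaces and tabs at the ends of lines.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Treat two or more newlines as one paragraph break.
    text = re.sub(r"\n{2,}", "\n\n", text)
    # Rejoin words split by a hyphen at a line end. Caution: this also
    # merges real compounds such as "well-\nknown" into "wellknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn each remaining single newline into a space.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()
```

The order matters: paragraph breaks are normalized first so that the single-newline rule cannot swallow them, and the suspect-marker check has to run on the raw text because it relies on the original line starts. The flagged lines are only candidates for review, since a real "2" or "?" at the start of a line would be caught too.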
