They may lack the sweep of a novel, the pathos of a play or the beauty of a poem, but the facts and figures collected through the census tell us a great deal about ourselves, and about the generations who came before us.
Just ask Steven Ruggles, a historical demographer at the University of Minnesota who has built a career deciphering census data to trace the history of the family in the Western world. By mining public records, Ruggles can learn how family structures have changed over time: whom and when people marry, when they have children, where and how they live, and how they make a living.
In recent years, easy access to digitised records has turbocharged the work. As director of the Institute for Social Research and Data Innovation at his university, Ruggles launched the world’s largest database linking census records and other historical data in 1993.
Known as IPUMS, the collection tracks US Census data from 1790 to the present, as well as data from more than 100 national statistical agencies around the world, and a variety of other archives.
IPUMS is one of the first databases to track individuals over time and across locations – a goldmine for demographers and many other sorts of scholars, says Ruggles, who appraised the future of historical demography in the 2012 Annual Review of Sociology and, with co-authors in the same journal, detailed in 2018 how demographers are trying to link census records. “That’s what’s really becoming a huge deal in the world of historical demography,” he says.
So far in 2021 alone, scholars around the world using IPUMS have published papers on maternal health, tobacco control, rental affordability and demographic methodology. Ruggles told Knowable why digging into these sorts of data is so tantalising for demographers. This conversation has been edited for length and clarity.
Why assemble this massive trove of demographic data?
In the 1970s, I was interested in the impact of demographic change on family composition. There was this enormous demographic transition starting in the 19th century, where mortality fell and then fertility fell. The world was transformed by it.
I wanted to study what that shift meant for family living arrangements. Earlier, in the 16th or 17th century, people had lots of kids but married really late. Most people died before their grandchildren were born, so the potential for multigenerational families was sharply constrained. It exploded later on, when “corporate families” – patriarchal family systems based on land or business ownership – became the norm.
Of course, you couldn’t really measure this with the data that existed, which were fragmented – reports on census materials, compiled by demographers working on their own, for individual towns in isolated years. So, I started out working with microsimulations: demographic models that let you construct and build up virtual populations with known behaviour over time, keeping track of all the relationships.
That’s where I started, but I got disillusioned with demographic modelling. You can only take it so far. So, in the 1980s I started working on historical data collection, and I’ve been doing that ever since.
Keypunch operators enter data for the 1940 US Census. Historical demographers depend on census reports, among other records, to understand population changes over time. Linking records – across generations, or with other data such as military records – opens a wealth of new research possibilities.
What kinds of data do you collect and disseminate?
It’s individual-level US Census data, which at a basic level includes the answers every American family submits for census surveys every 10 years. Individual data from the 1950s onwards are all guarded within the Census Bureau because of confidentiality rules, but earlier reports are publicly accessible. We have recovered and organised access to similar records for a bunch of other countries, too – in all, 109 national statistical agencies. The only big countries we’re missing are Japan and Australia, which we are still trying to persuade to share.
What information is included in individual-level census data?
It varies. Your typical census collects information about household size and composition, and also about work, occupation, hours worked, weeks worked last year and educational attainment. Often there’s information about housing, too – what kind of plumbing a family has, what their house’s walls and the roof are made of, what they use for cooking fuel, that sort of stuff.
The cool thing is, all of the individuals enumerated in a census are nested into families, so you know relationships among the people – who is married and who is divorced and who is a parent and who is a child. This lets you construct additional variables to track, such as husband’s occupation or mother’s education, which might help you study how economic status relates to fertility, to cite an example.
Thanks to digitisation, we now have easy access to these data spanning all of US history. What we’re trying to do now is link it all together – to trace people across generations and across their lives and see what happened to them. That’s a tremendously exciting new development.
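To make the idea of constructed variables concrete, here is a minimal sketch of how a mother's education might be attached to each child's record once people are nested into households. The field names (`household_id`, `person_id`, `mother_id`, `education_years`) are hypothetical and chosen for illustration; they are not the actual IPUMS variable names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    household_id: int
    person_id: int                 # position of the person within the household
    mother_id: Optional[int]       # person_id of the mother, if she lives in the same household
    education_years: int
    mother_education: Optional[int] = None  # constructed variable, filled in below


def attach_mother_education(people: List[Person]) -> List[Person]:
    """Copy each co-resident mother's education onto her children's records."""
    # Index everyone by (household, person) so a mother can be looked up directly.
    by_key = {(p.household_id, p.person_id): p for p in people}
    for p in people:
        if p.mother_id is None:
            continue
        mother = by_key.get((p.household_id, p.mother_id))
        if mother is not None:
            p.mother_education = mother.education_years
    return people


# Hypothetical records: household 1 holds a mother and her child; household 2 holds one adult.
people = [
    Person(household_id=1, person_id=1, mother_id=None, education_years=12),
    Person(household_id=1, person_id=2, mother_id=1, education_years=6),
    Person(household_id=2, person_id=1, mother_id=None, education_years=8),
]
attach_mother_education(people)
```

The same lookup pattern would work for other constructed variables, such as a husband's occupation, provided the record carries a pointer to the spouse.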
A staff member at the Central Bureau of Statistics in Khartoum helped IPUMS, the world’s largest collection of integrated population data, identify tapes storing data from the 1973 census of Sudan. IPUMS eventually managed to read most of the data on the tapes, but some were lost.
What does “linking data” mean?
We’re talking about following individuals, through their census data, across their entire lives. Linking them to their parents, and then on to their parents throughout their lives – over many generations. We can follow the individuals in other sources, too, including administrative, Social Security and military records, as well as all kinds of smaller data sets from other parts of society, such as a company that collects extremely rich data on its employees.
When we first attempted record linkage in the early 2000s, we thought, well, we can’t link everybody, we’re just going to try to get a representative set. But now we’re going to try to link everybody. It will take me through the end of my career.
Is the process automated, or does somebody have to go through and tag everything?
It’s got to be automated, because we’re going to have a billion records. We don’t have that many research assistants!
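As a rough illustration of what automated linking can look like, the sketch below matches people between two census years on birthplace, approximate birth year and name similarity, and keeps a link only when exactly one candidate qualifies. This is a simplified rule-based approach written for illustration, not the method IPUMS uses; the record fields, thresholds and function names are all assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(earlier, later, min_similarity=0.85, max_birth_year_gap=2):
    """Link records from an earlier census to a later one.

    A link is kept only when a single later record matches, so that
    ambiguous cases (for example, two people with the same common name)
    are left unlinked rather than guessed.
    """
    links = {}
    for e in earlier:
        candidates = [
            r for r in later
            if r["birthplace"] == e["birthplace"]
            and abs(r["birth_year"] - e["birth_year"]) <= max_birth_year_gap
            and name_similarity(r["name"], e["name"]) >= min_similarity
        ]
        if len(candidates) == 1:
            links[e["id"]] = candidates[0]["id"]
    return links


# Hypothetical records from two census years.
census_a = [
    {"id": "a1", "name": "John Andersen", "birth_year": 1872, "birthplace": "Minnesota"},
    {"id": "a2", "name": "Mary Smith", "birth_year": 1880, "birthplace": "Ohio"},
]
census_b = [
    {"id": "b1", "name": "John Anderson", "birth_year": 1873, "birthplace": "Minnesota"},
    {"id": "b2", "name": "Mary Smith", "birth_year": 1880, "birthplace": "Ohio"},
    {"id": "b3", "name": "Mary Smith", "birth_year": 1881, "birthplace": "Ohio"},
]
links = link_records(census_a, census_b)
```

Here the first record links despite a spelling variation in the surname, while the second is left unlinked because two later records fit equally well. Comparing every pair of records would not scale to a billion records; a real pipeline would need to narrow the candidate set first, for instance by grouping on birthplace and birth year.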
The US National Archives stores microfilm and old movies at remote sites including this cave in Lenexa, Kansas (cave wall visible behind workstation). Here, officials stored data from the 1960 census at 30 degrees Fahrenheit, on shrink-wrapped pallets. Researchers from the survey and records database IPUMS visited the cave when they restored lost data from the 1960 census.
- A Knowable Magazine report