Enabling Academic Research with Real-World Data Sets from Melissa for Education

By: Hristo Hristov   |   Updated: 2024-01-09   |   Related: > Data Quality Services


Problem

"It is a capital mistake to theorize before one has data." This quote by Sherlock Holmes resonates well in our modern data-driven society. There is a vast amount of data all around us, but little of it is actionable, curated, or prepared for use. Real-world data is expensive: it takes time to gather, shape, use, and present. And once you start digging into the hidden nuances of different data, you can easily find yourself investing substantial resources just to prepare a reliable data set.

Solution

Economics Research

Megan Cui, a student at Harvard University, decided to study the effects of federal grant money in the state of Ohio. Her thesis examined how successful local American Job Centers financed by federal grants were in decreasing unemployment rates. The American Job Centers are where individuals can get help with unemployment, get their resume updated, or upskill to get employed. The study was mainly concerned with comparing unemployment rates before and after the federal grant distribution. To do the comparison, Megan needed reliable, accurate, and curated residential data for the state of Ohio.

After Melissa for Education promptly responded to her request, Megan could overlay the already available locations of the American Job Centers with the detailed parcel population data, using the latitude and longitude provided by Melissa. As a result, Megan could obtain population clusters, segment them according to their distance from a Job Center, and compare the pre- and post-financing unemployment rates. After submitting her thesis, Megan revealed: "While the data was itself already amazing, the most amazing part about Melissa is the team behind the education program. They were so encouraging, responsive, and excited about making sure I had everything I needed to explore the effects of American employment policies on local labor market outcomes. I can't wait to work with them again on future research projects."
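
To make the segmentation step concrete, here is a minimal Python sketch of how parcels could be grouped by distance to the nearest Job Center. It is not the method used in the thesis: the function names, the band edges (5, 15, and 30 km), and all coordinates are hypothetical and chosen for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_band(parcel, centers, band_edges_km=(5, 15, 30)):
    """Label a (lat, lon) parcel by its distance to the nearest center."""
    nearest = min(haversine_km(parcel[0], parcel[1], c[0], c[1]) for c in centers)
    for edge in band_edges_km:
        if nearest <= edge:
            return f"<= {edge} km"
    return f"> {band_edges_km[-1]} km"


# Hypothetical (lat, lon) pairs, for illustration only (not Melissa data).
centers = [(39.9612, -82.9988), (41.4993, -81.6944)]
parcels = [(39.9700, -83.0100), (40.4000, -82.5000), (38.0000, -84.5000)]
bands = [distance_band(p, centers) for p in parcels]
```

Once each parcel carries a distance band, unemployment rates before and after the grant can be compared band by band.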

If you have a project, thesis, or just need a real-world, ready-to-use population dataset enriched with geographical data, contact Melissa for Education for assistance.

Cyber Security Research

In the field of cyber security, having real data to train a machine learning model to recognize or segment users according to their location can be advantageous in discerning real from automated users. For example, Melissa for Education can provide not only geo data for the US and Canada, including population numbers, but also GIS city data. GIS (short for Geographic Information System) data is geospatial data that enables geographical analysis based on coordinates, locations, and areas.

Using a combination of geographical and GIS data, powerful visualizations can be obtained. For example, let's consider a map of cities in a state with their areas. Combined with IP geo-located data, the real-time location of visitors on a website can be determined and visualized. GIS data contains accurate definitions of city boundaries and, therefore, can be used to implement triggers that fire when an object enters or leaves the perimeter of a specific geographical area. Melissa for Education provides GIS data in industry-standard formats such as CSV, GDB, and GeoJSON. Researchers can load these data into specialized software or use the tabular form to merge with other datasets. This also makes it possible to train machine learning models on the tabular data.
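
The perimeter trigger described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the city boundary is a single ring of (longitude, latitude) vertices, such as the outer ring of a GeoJSON polygon, and it uses the classic ray-casting test. The function names and the sample square are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by
        # a horizontal ray; this also skips horizontal (y1 == y2) edges.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def geofence_events(track, ring):
    """Return (index, 'enter' | 'exit') for each perimeter crossing along the track."""
    events = []
    was_inside = None
    for i, (lon, lat) in enumerate(track):
        now_inside = point_in_polygon(lon, lat, ring)
        if was_inside is not None and now_inside != was_inside:
            events.append((i, "enter" if now_inside else "exit"))
        was_inside = now_inside
    return events


# Hypothetical square "city" and a visitor track crossing it.
city_ring = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
track = [(-1, 2), (1, 2), (3, 3), (5, 2)]
events = geofence_events(track, city_ring)
```

Real boundaries may have holes or multiple parts, which this sketch does not handle; a dedicated geospatial library is the better choice for production use.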

Do you have a cyber security program at your university, or are you looking to conduct OSINT research? Melissa can assist with data for your projects. You can get started on the Melissa for Education portal.

Medical Research

Today, medical researchers have access to vast amounts of data related to human physiology. Linking these data to external factors such as geographic or demographic data can be beneficial in enabling breakthrough discoveries. However, connecting these two types of data is not always possible. The latter type may not be available, or it may come in a form that is difficult to consume.

In 2022, the Anschutz Medical Campus at the University of Colorado published a research paper studying the impact of elevation on microtia in newborns. While the researchers had already accumulated the patient health profile data, they did not have elevation data for the patients. This is where Melissa for Education was again able to help. Melissa's Global Address Database contains master address data, including latitude, longitude, and elevation. The data is usually exposed via a programming interface with a paid subscription. However, Melissa for Education provided elevation data for 469,067 points of interest to the University of Colorado.

Figure 1: Correlating elevation and patient data

Health Data Compass, the organization that curates the medical data for the University of Colorado, was able to ingest the data from Melissa via the standard ETL (Extract, Transform, Load) process. Then, Compass provided a consolidated dataset to the medical researchers containing the combined patient data and elevation data.

The researchers could easily query patient data based on the elevation profile of specific clinics or the altitude of a patient's residence. Then, the results were grouped by altitude range, enabling the researchers to correlate disease prevalence with altitude. As a result, the study delivered more context on previously known data and more confidence in its findings.
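
As a rough illustration of grouping by altitude, the following Python sketch bins records into altitude groups and computes the share of cases in each group. The thresholds (1,500 m and 2,500 m), the group labels, the function names, and the sample records are all made up for this example; they are not taken from the study.

```python
from collections import defaultdict


def altitude_group(elevation_m, edges_m=(1500, 2500)):
    """Map an elevation in metres to a coarse altitude group label."""
    if elevation_m < edges_m[0]:
        return "low"
    if elevation_m < edges_m[1]:
        return "mid"
    return "high"


def prevalence_by_group(records):
    """records: iterable of (elevation_m, has_condition).

    Returns {group: cases / total} for every group that has records.
    """
    totals = defaultdict(int)
    cases = defaultdict(int)
    for elevation_m, has_condition in records:
        group = altitude_group(elevation_m)
        totals[group] += 1
        cases[group] += int(has_condition)
    return {group: cases[group] / totals[group] for group in totals}


# Synthetic records, for illustration only (not patient data).
sample = [
    (1200, False), (1300, True),
    (1700, False), (1800, False),
    (2600, True), (2700, True),
]
rates = prevalence_by_group(sample)
```

A real analysis would also need to account for group sizes and confounding factors before drawing any conclusion from such rates.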

The Melissa for Education data specialists will be happy to assist with similar research efforts. They can provide you with data that can be used directly or ingested into a university data warehouse.

The Melissa Experience

The Platform

The Melissa for Education program is designed to support successful research. Let us examine the core features of the Melissa for Education platform that help deliver world-class research.

The platform launched in early 2023, and as of December 2023, it offers many readily available datasets. For example, for the United States, address data, FONE data, consumer, ZIP code, and deed and assessment data are available in industry-standard formats such as CSV. The platform also offers Canadian and US geo data. As for GIS data, Melissa can offer, among others, GIS city, county, and ZIP code data.

Getting started with the Melissa for Education platform is straightforward. Head over to the portal and sign up with your college email account. Validating your .edu account is Melissa's way of ascertaining your affiliation with a higher education institution.

Three account options exist: Student, Student Plus, and Instructor. By default, you get a Student account at sign-up, which provides access to many data sets with a limited number of records for local areas, including Rancho Santa Margarita, Orange County, and Los Angeles. To upgrade to Student Plus, you need to complete a short survey describing your research project needs. In return, you will get access to a more extensive set of records and a pack of 4,000 web API calls. The highest tier is the Instructor account, which provides access to a multitude of customized data sets. To sign up as an Instructor from an existing Student profile, you must reach out to Melissa and discuss your specific project needs.

Once you have the right type of account ready, you can download the data set as archived CSV files or another format (depending on the data set). If you are looking for an international data set, Melissa can help you too. International data is available, for example, for Europe. Melissa for Education will gladly support international students as well.

Figure 2: Melissa for Education homepage

Once you have signed up, there are two ways to explore the data sets: Lookups and the Melissa API. Melissa Lookups is a user-friendly set of web forms covering tasks such as:

  • Global Address Check
  • Lookup Property Information
  • Business or Residential
  • Verify and Append Consumer Data
  • ZIP Codes by County and City
  • Identity Verification
  • Verify SSN and Name
  • Maps for County, ZIP code, Carrier route, Parcels, Building footprints, etc.
  • And more

Input data is provided one record at a time. All the lookup forms are listed on this page. Pick a form and try the service. Make sure to scroll down the page; more than 50 lookup forms are available across seven categories.

Figure 3: Melissa Lookups

Performing a lookup requires a certain number of credits. After signing up, you can claim 1000 free credits. The cost per lookup varies based on the service, ranging from 1 to 36 credits per lookup.
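
As a quick worked example of what the free credits cover, the short Python sketch below computes how many lookups 1,000 credits pay for at the cheapest and the most expensive rates quoted above. The function name is hypothetical, and the sketch assumes every lookup in a batch costs the same number of credits.

```python
FREE_CREDITS = 1000
MIN_COST, MAX_COST = 1, 36  # credits per lookup, per the range quoted above


def lookups_affordable(credits, cost_per_lookup):
    """Whole number of lookups that the given credits can pay for."""
    return credits // cost_per_lookup


most = lookups_affordable(FREE_CREDITS, MIN_COST)    # cheapest service
fewest = lookups_affordable(FREE_CREDITS, MAX_COST)  # most expensive service
print(f"Between {fewest} and {most} lookups")
```

In other words, the free allowance covers between 27 and 1,000 lookups, depending on the service.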

The second way to explore the datasets is the Melissa Cloud API, which is available once your Student or Student Plus account is active. With the API, you can perform checks and data extractions programmatically and efficiently for multiple records. For example, you can provide the full name of a person with some address detail. If this is a real and known individual, the Melissa Cloud API can respond with their current data or indicate the probability that the person is real. A similar flow is available for business entities, too. Some technical knowledge is required to take full advantage of the API service. The Melissa developer portal is a good place to get started.

Figure 4: The Melissa developer portal

Datathons

Melissa's support for education does not end with providing data to students. The Melissa team is actively involved in setting up datathons (hackathons focused on data) and supporting universities that host such events. In April 2023, Melissa sponsored the University of California, Irvine (UCI) datathon, where they provided challenges about address lookup and address interpolation.

Figure 5: UCI April 2023 Datathon, sponsored by Melissa for Education

The motivation behind organizing a datathon is to bring perspective to students. For example, it presents challenges based on real-world business problems and motivates students to apply their skills using different data science algorithms. Matt Eimers (Math '24), a student from UCI who won Melissa's challenge in the 2023 Datathon, shared: "The Melissa team was very informative and helpful when completing the datathon. It was my first datathon ever, and I attended all the workshops they had at the event, and they were able to help me learn more about what it's like being in data science. With their workshops, I was able to complete their dataset challenge."

Members of the faculty also admired the initiative. David Van Vranken, Associate Dean at UCI, stated: "Melissa has been an active partner of UCI for nearly a decade, providing insight and expertise in data science to faculty and students. Their hands-on involvement in and sponsorship of our Embark datathon helps give participants an up-close look at the power of data and the endless possibilities it brings to myriad situations. Melissa's new education portal is a welcome and much-needed resource that will enable students to learn and grow their data skills in preparation for their future endeavors."

Of course, this success would not be possible without the dedicated team behind Melissa for Education: Daniel Kha Le, Ha Phan, and Anna Cheong. Beyond the datathon, Daniel, Ha, and Anna are also mentors at UCI. Using their valuable real-world data expertise, they assist students in the challenging journey of academic research. According to Daniel, if students are better aware that certain data are available, they may take a different path to widen and deepen their research. "This is exactly what we want to change in today's world: real, impactful data could be available to students," says Daniel.

Overall, the UCI 2023 Datathon was a resounding success. The way forward for Melissa for Education is already mapped out: data is the new currency, and more data equals more opportunities to extract value.

If you want to organize a similar event, such as your own data hackathon, Melissa for Education would love to support you in getting started with data and event specifics.

A World of Verifiable Analytics

Answering Your Data Needs

Melissa has always been committed to supporting all data professionals, whether beginners or seasoned experts. Providing professional data quality services is Melissa's core business, so the datasets provided are compatible with common database platforms, such as Microsoft SQL Server.

For example, since 2008, SQL Server has supported the "geometry" data type designed to represent data in a Euclidean (flat) coordinate system. The GIS data sets that Melissa can provide, for example, the GIS County data set, have a specific column called ShapeWKT. This column contains vector geometry objects as text in the well-known text (WKT) format. This format is directly compatible with the SQL Server geometry data type. Therefore, the Melissa curated data sets can easily be imported into your database for further spatial analysis. The MSSQLTips.com community has an excellent collection of articles to help get you started. The Microsoft documentation for spatial data is also available. You can use the Melissa GIS City Data or County Data to get sample geospatial data and import it into your database.
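
To show what a well-known text value looks like and what a flat (Euclidean) geometry computation does with it, here is a minimal Python sketch that parses a simple polygon and computes its planar area. In SQL Server itself, the built-in geometry::STGeomFromText and STArea methods do this work. The sample WKT string and the function names below are hypothetical, and the sketch handles only a single-ring POLYGON; actual ShapeWKT values may use other geometry types.

```python
import re


def parse_wkt_polygon(wkt):
    """Parse a single-ring 'POLYGON ((x y, x y, ...))' string into a list of (x, y)."""
    match = re.fullmatch(
        r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None or "(" in match.group(1):
        raise ValueError("only single-ring POLYGON text is handled in this sketch")
    points = []
    for pair in match.group(1).split(","):
        x_text, y_text = pair.split()
        points.append((float(x_text), float(y_text)))
    return points


def planar_area(points):
    """Shoelace formula: area of a simple polygon in flat (Euclidean) coordinates."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical WKT value: a 4 x 3 rectangle.
shape_wkt = "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))"
ring = parse_wkt_polygon(shape_wkt)
area = planar_area(ring)
```

Note that the area is expressed in squared coordinate units, so for longitude/latitude data it is not a real-world surface area.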

The Right Solution for Any Sector

Melissa is a global intelligence and analytics company. As such, it supported a 2022 Bloomberg case study analyzing pre- and post-pandemic fluctuations in the Manhattan population. Melissa used U.S. Postal Service change-of-address records to curate a detailed data set on address changes. Then, they provided this data set to Bloomberg journalist Sarah Holder for detailed analysis. In short, the Melissa data proved the theory that more people are moving (back) to Manhattan now than before the pandemic. While logical, this conclusion is hard to see or prove without the supporting data.

In another case study, journalists examined how prone Americans are to move to areas considered to have high fire risk. The study found that people underestimate the fire risk a property can be subject to, but also that fires are burning in places where they have never burned before. To formulate these theories, the journalists compiled all available fire hazard data linked to Census blocks, then tracts, for the years 2010 to 2021, counting the number of times each area burned in that period. Melissa looked up the addresses in those blocks, along with other information, to find trends and groupings. Then, the distinct households that moved during the pandemic year were counted, together with where they moved from and to and whether those locations fell within burn areas.
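
The counting logic in this case study can be sketched in Python as follows. This is only an illustration of the idea, not the journalists' actual pipeline: the record layouts, block identifiers, and function names are hypothetical.

```python
from collections import Counter


def burn_counts(fire_events, first_year=2010, last_year=2021):
    """fire_events: iterable of (block_id, year). Count burns per block in the window."""
    return Counter(
        block for block, year in fire_events if first_year <= year <= last_year
    )


def households_moving_into_burn_areas(moves, counts):
    """moves: iterable of (household_id, from_block, to_block).

    Returns the distinct households whose destination block burned at least once.
    """
    return {household for household, _origin, dest in moves if counts.get(dest, 0) > 0}


# Hypothetical records, for illustration only.
fire_events = [("B1", 2012), ("B1", 2018), ("B2", 2009), ("B3", 2021)]
moves = [
    ("H1", "B9", "B1"),
    ("H1", "B9", "B1"),  # duplicate record; the household is counted once
    ("H2", "B1", "B2"),
    ("H3", "B2", "B3"),
]
counts = burn_counts(fire_events)
movers = households_moving_into_burn_areas(moves, counts)
```

Using a set for the result is what keeps households distinct even when the same move appears in more than one record.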

The added value of Melissa's data is two-pronged: 1) data quality by default and 2) having the data in a shape and form close to plug-and-play. From there, journalists and researchers can use different tools and methodologies to prove or disprove certain theories.

Whatever your data success story is, it can get the support it deserves from Melissa. Whether you want to augment real estate, financial, health care, or insurance data, Melissa has the right data set for you. Check out more examples on this case study page, and feel free to reach out to Melissa directly.

Next Steps
  • Get started at https://www.melissa.com/education/ for your next project, research, thesis, dissertation, etc.
  • Reach out to Melissa to start a datathon at your university.
  • Share this program with the instructors at your university.
About the author
Hristo Hristov is a Data Scientist and Power Platform engineer with more than 12 years of experience. Between 2009 and 2016 he was a web engineering consultant working on projects for local and international clients. Since 2017, he has been working for Atlas Copco Airpower in Flanders, Belgium, where he has successfully tackled multiple end-to-end digital transformation challenges. His focus is delivering advanced solutions in the analytics domain with predominantly Azure cloud technologies and Python. Hristo's real passion is predictive analytics and statistical analysis. He holds a master's degree in Data Science and multiple Microsoft certifications covering SQL Server, Power BI, Azure Data Factory and related technologies.

This author pledges the content of this article is based on professional experience and not AI generated.
