A map of my readers

By Yanina Bellini Saibene

November 13, 2022

Generating content is time-consuming, and making it bilingual takes even more time. To use my time in the best possible way, I started asking myself whether the effort of writing in both Spanish and English is useful to other people besides me.

Website statistics

The first thing that came to mind was to start collecting statistics on where people visit my site from and what content they read the most.

There are many options for obtaining these statistics, from Google Analytics to Plausible.io, and most of them can give you a list of countries with the number of users that visited your site.

My dataset has 3 variables: Position (the ranking), Country (the name), and Users (total number). With that info and rstats I can make a map that shows me where people are reading my writing from.
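As a rough sketch of what such a file looks like once it is read into R, the snippet below parses a small literal CSV with the same three columns. The rows are made-up values for illustration only, not the real statistics, and `csv_text` and `stats_example` are hypothetical names.

```r
library(readr)

# Made-up rows that only mimic the three columns described above
csv_text <- "Position,Country,Users
1,Argentina,120
2,United States,85
3,Spain,40
"

# I() tells read_csv (readr >= 2.0) to treat the string as literal data
stats_example <- read_csv(I(csv_text), show_col_types = FALSE)

print(stats_example)
```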

The Map

I use the following packages:

  • {tidyverse} to read the CSV file with the data (read_csv), wrangle the data (anti_join, mutate, case_when, %>%), and draw the map (ggplot).
  • {rnaturalearth} for access to the polygons of the countries of the world (ne_countries).
  • {viridis} for the color scale of the map.
  • {ggthemes} to apply the theme_map() to our map.

First step. Load the packages and read the data


library(tidyverse)
library(rnaturalearth)
library(viridis)
library(ggthemes)

stats_webpage <- read_csv("stats_personal_webpage.csv")
mapa <- ne_countries(returnclass = "sf")

Second step. Add the info to the map

The country data has several variables with different versions of the country names, some of which follow standards. We need to check whether we can join the data we read from the CSV (now stored in stats_webpage) with the country polygons (now stored in mapa) using one of these variables.

I use the anti_join function because it returns the rows of one dataset that have no match in the other.


paises <- stats_webpage %>%
  anti_join(mapa, by = c("Country" = "brk_name")) %>%
  select(Country)

In this example, I do an anti_join between the variable Country and the variable brk_name. In this case, the function returns the names of the countries that are in the user data but not in the country polygons.
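To see the behavior of anti_join in isolation, here is a minimal sketch on toy data. The tables `visits` and `polygons` and all their values are invented for illustration; `polygons` stands in for the real polygon table and keeps only a name column.

```r
library(dplyr)
library(tibble)

# Hypothetical visitor counts (made-up values, not the real statistics)
visits <- tibble(
  Country = c("Argentina", "Czechia", "Uruguay"),
  Users   = c(120, 3, 15)
)

# Hypothetical stand-in for the polygon table, keeping only the name column
polygons <- tibble(brk_name = c("Argentina", "Czech Rep.", "Uruguay"))

# Rows of `visits` whose Country has no exact match in polygons$brk_name
unmatched <- visits %>%
  anti_join(polygons, by = c("Country" = "brk_name")) %>%
  select(Country)

print(unmatched)
```

Only "Czechia" comes back, because its spelling differs from "Czech Rep." in the other table. An empty result would mean every country can be joined as is.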

The result shows that two countries can't be joined: Czechia and Dominican Republic. With this information, we can check how the names are written in both variables used in the join and update the data with mutate and case_when so that they match.


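# Rename the two countries so they match the names in mapa$brk_name
stats_webpage <- stats_webpage %>%
  mutate(Country = case_when(
    Country == "Czechia" ~ "Czech Rep.",
    Country == "Dominican Republic" ~ "Dominican Rep.",
    TRUE ~ Country))

# Attach the user counts to the country polygons
mapa <- mapa %>%
  left_join(stats_webpage, by = c("brk_name" = "Country"))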
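The map code further down fills the polygons with Users, so the counts have to end up in the same table as the polygons. One way to do that, assuming a left_join on the name columns, is sketched here on toy data. The tables and values are invented, and `polygons` has no geometry column.

```r
library(dplyr)
library(tibble)

# Made-up counts; "Czechia" is spelled differently from the polygon table
visits <- tibble(Country = c("Argentina", "Czechia"), Users = c(120, 3))

# Hypothetical stand-in for the polygon table (no geometry column here)
polygons <- tibble(brk_name = c("Argentina", "Czech Rep.", "Chile"))

# Align the spelling first
visits_fixed <- visits %>%
  mutate(Country = case_when(
    Country == "Czechia" ~ "Czech Rep.",
    TRUE ~ Country))

# Keep every polygon row and add Users where the names match
joined <- polygons %>%
  left_join(visits_fixed, by = c("brk_name" = "Country"))

print(joined)
```

Countries with no visitors, such as Chile in this toy table, get NA in Users.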
    

Now that we have all the user data associated with the country polygons, we can make the map using ggplot:

  • The function geom_sf allows ggplot to make maps using vector data, in this case polygons.
  • The function scale_fill_viridis provides the color scale used to paint the map.
  • We use theme_map() and theme() to get a clean, minimal map and remove the legend.

ggplot() +
  geom_sf(data = mapa, aes(fill = Users), color = "grey95", size = 0.01) +
  scale_fill_viridis(
    trans = "log",
    breaks = scales::log_breaks(),                 
    direction = -1,
    option = "G") +
  theme_map() +
  theme(legend.position = "none") +
  coord_sf(ylim = c(-55, 90))
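The scale above uses trans = "log". A plausible reason, assuming the user counts are very uneven across countries, is that a linear scale would squeeze most countries into nearly the same color. The sketch below compares where three made-up counts land on a 0 to 1 scale with and without the log.

```r
library(scales)

# Made-up user counts spanning several orders of magnitude
users <- c(1, 10, 1000)

# Position of each value on a 0-1 color scale, linear vs. log
linear_pos <- rescale(users)
log_pos    <- rescale(log(users))

print(round(linear_pos, 3))
print(round(log_pos, 3))
```

On the linear scale the middle value sits at about 0.009, almost indistinguishable from the smallest one. On the log scale it sits at about 0.333.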
  

And here is the final result, a map with all the readers' countries painted by the number of users accessing my website.

We can save the map using ggsave:


ggsave("users_per_countries.png", dpi = 300, width = 10, height = 6)

The resulting map, besides astonishing me with the list of 74 countries from which at least one person has read my content, convinced me that the effort of producing bilingual content can help more people.

What about you? Do you write blog posts? Do you have a map of your readers?
