Data extraction and transformation for the Florianopolis, State in Brazil¶
This notebook is part of the project "New Schools Location in Brazil" and it is focused on the data extraction and transformation for Florianopolis in Brazil.
Step 0. Import libraries and setup environment¶
Step 1. Define Area of Interest (AoI)¶
For this, we will:
- Download a administrative limits for Brazil, and get the multi-level admistrative limits of Florianopolis usig the geoBR library.
- Read schools data downloaded using the R package CensoBr.
code_state | abbrev_state | name_state | code_region | name_region | geometry | |
---|---|---|---|---|---|---|
0 | 11.0 | RO | Rondônia | 1.0 | Norte | MULTIPOLYGON (((-65.38150 -10.42907, -65.37916... |
1 | 12.0 | AC | Acre | 1.0 | Norte | MULTIPOLYGON (((-71.07772 -9.82774, -71.08554 ... |
2 | 13.0 | AM | Amazônas | 1.0 | Norte | MULTIPOLYGON (((-69.83766 -3.68659, -69.71230 ... |
3 | 14.0 | RR | Roraima | 1.0 | Norte | MULTIPOLYGON (((-63.96008 2.47313, -63.96041 2... |
4 | 15.0 | PA | Pará | 1.0 | Norte | MULTIPOLYGON (((-51.43248 -0.47334, -51.42843 ... |
Let's visualize the Santa Catarina State location in Brazil
Let's see the number of units and visualize each administrative level
Number of units | |
---|---|
Intermediate regions | 6 |
Meso regions | 7 |
Micro regions | 20 |
Municipalities | 295 |
Since Florianpolis is a municipality its size is perfect to work directly with UrbanPy.
Para we are doing the analysis at the Micro regions level, it could be interesting to analyze the Micro region that contains Florianopolis.
Step 2. Extract and process Census Sociodemographic geographical data¶
Step 2.1. Extracting Census Tracts Data from IBGE¶
Census Tract's Geometries are obtained in this notebook with the GeoBR
Python library
Number of census tracts in Florianopolis: 651
code_tract | zone | code_muni | name_muni | name_neighborhood | code_neighborhood | code_subdistrict | name_subdistrict | code_district | name_district | code_state | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1894 | 420540705000180 | URBANO | 4205407.0 | Florianópolis | Saco Dos Limões | 4.205407e+11 | 4.205407e+10 | NaN | 420540705.0 | Florianópolis | 42.0 | POLYGON ((-48.53268 -27.59787, -48.52997 -27.5... |
1895 | 420540705000181 | URBANO | 4205407.0 | Florianópolis | Córrego Grande | 4.205407e+11 | 4.205407e+10 | NaN | 420540705.0 | Florianópolis | 42.0 | POLYGON ((-48.51517 -27.60025, -48.51372 -27.6... |
1896 | 420540705000182 | URBANO | 4205407.0 | Florianópolis | Córrego Grande | 4.205407e+11 | 4.205407e+10 | NaN | 420540705.0 | Florianópolis | 42.0 | POLYGON ((-48.51218 -27.60001, -48.51204 -27.6... |
1897 | 420540705000183 | URBANO | 4205407.0 | Florianópolis | Córrego Grande | 4.205407e+11 | 4.205407e+10 | NaN | 420540705.0 | Florianópolis | 42.0 | POLYGON ((-48.51045 -27.59965, -48.51124 -27.6... |
1898 | 420540705000184 | URBANO | 4205407.0 | Florianópolis | Córrego Grande | 4.205407e+11 | 4.205407e+10 | NaN | 420540705.0 | Florianópolis | 42.0 | POLYGON ((-48.51005 -27.59963, -48.50912 -27.5... |
Let's visualize the census tracts in Florianopolis to get a sense of their size and granularity
We can clearly visualize that the census tracts differ in size depeding on the geographical location. For example, near city centers, the census tracts are smaller, while in the peri-urban areas, the census tracts are bigger.
There are some census tracts that seems to be on the ocean, we should handle them.
Now, we will read a dataset generated with the CensoBR
R package here. In this dataset we have census variables such as Income and Population by Age that are useful to identify areas where schools are needed.
Number of census tracts obtainer in R for Santa Catarina State: 308993
code_tract | V002 | V003 | pessoa01_V021 | pessoa01_V022 | pessoa01_V023 | pessoa01_V024 | pessoa01_V025 | pessoa01_V026 | pessoa01_V027 | ... | pessoa01_V044 | pessoa01_V045 | pessoa01_V046 | pessoa01_V047 | pessoa01_V048 | pessoa01_V049 | pessoa01_V050 | pessoa01_V051 | pessoa01_V052 | income_pc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 354660305000050 | 7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 354660305000051 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 354670205000001 | 569 | 495793.0 | 14.0 | 13.0 | 4.0 | 7.0 | 9.0 | 8.0 | 8.0 | ... | 11.0 | 5.0 | 6.0 | 8.0 | 11.0 | 10.0 | 4.0 | 10.0 | 3.0 | 871.340949 |
4 | 354670205000002 | 422 | 330949.0 | 9.0 | 6.0 | 10.0 | 7.0 | 7.0 | 8.0 | 10.0 | ... | 4.0 | 3.0 | 4.0 | 10.0 | 4.0 | 8.0 | 4.0 | 2.0 | 3.0 | 784.239336 |
5 | 354670205000003 | 722 | 570807.0 | 15.0 | 10.0 | 10.0 | 12.0 | 18.0 | 14.0 | 17.0 | ... | 8.0 | 10.0 | 14.0 | 11.0 | 4.0 | 11.0 | 14.0 | 9.0 | 6.0 | 790.591413 |
5 rows × 36 columns
Type of code_tract in para_cntr: int64 Type of code_tract in censobr_tract_income: object
Number of census tracts with associated data: 11858
Now we will aggregate population by age variables according to the brazilian school levels:
- Infantil: 3 months to 5 years old
- Elementary: 6 to 14 years old
- Middle: 15 to 17 years old
- High: 18 years old
Step 2.2. Download Meta's High Resolution Population Density¶
Let's see all the available datasets for Brazil in Humanitarian Data Exchange (HDX).
created | name | population | size_mb | url | |
---|---|---|---|---|---|
id | |||||
0 | 2019-06-11 | BRA_children_under_five_2019-06-01_csv.zip | Children (ages 0-5) | 159.14 | https://data.humdata.org/dataset/c17003d1-47f4... |
1 | 2019-06-11 | BRA_elderly_60_plus_2019-06-01_csv.zip | Elderly (ages 60+) | 159.42 | https://data.humdata.org/dataset/c17003d1-47f4... |
2 | 2019-06-11 | BRA_men_2019-06-01_csv.zip | Men | 160.20 | https://data.humdata.org/dataset/c17003d1-47f4... |
26 | 2019-06-11 | BRA_women_2019-06-01_csv.zip | Women | 160.21 | https://data.humdata.org/dataset/c17003d1-47f4... |
27 | 2019-06-11 | BRA_women_of_reproductive_age_15_49_2019-06-01... | Women of reproductive age (ages 15-49) | 160.03 | https://data.humdata.org/dataset/c17003d1-47f4... |
28 | 2019-06-11 | BRA_youth_15_24_2019-06-01_csv.zip | Youth (ages 15-24) | 159.79 | https://data.humdata.org/dataset/c17003d1-47f4... |
33 | 2019-06-25 | population_bra_northeast_2018-10-01.csv..zip | Overall population density | 71.24 | https://data.humdata.org/dataset/c17003d1-47f4... |
34 | 2019-06-25 | population_bra_northwest_2018-10-01.csv.zip | Overall population density | 13.46 | https://data.humdata.org/dataset/c17003d1-47f4... |
35 | 2019-06-25 | population_bra_southeast_2018-10-01.csv.zip | Overall population density | 159.42 | https://data.humdata.org/dataset/c17003d1-47f4... |
36 | 2019-06-25 | population_bra_southwest_2018-10-01.csv.zip | Overall population density | 45.96 | https://data.humdata.org/dataset/c17003d1-47f4... |
We will download the Overall population density dataset from HDX. This dataset contains the population density of Brazil at 30m2 resolution.
CPU times: user 287 ms, sys: 57 ms, total: 344 ms Wall time: 2.58 s
The obtained dataset is simple and contains the following columns:
latitude | longitude | population_2015 | population_2020 | geometry | |
---|---|---|---|---|---|
887751 | -27.853472 | -48.606528 | 0.518908 | 0.582712 | POINT (-48.60653 -27.85347) |
887752 | -27.853472 | -48.606250 | 0.518908 | 0.582712 | POINT (-48.60625 -27.85347) |
887753 | -27.853472 | -48.605972 | 0.518908 | 0.582712 | POINT (-48.60597 -27.85347) |
887754 | -27.853472 | -48.605694 | 0.518908 | 0.582712 | POINT (-48.60569 -27.85347) |
887755 | -27.853472 | -48.605139 | 0.518908 | 0.582712 | POINT (-48.60514 -27.85347) |
Number population data points for Florianopolis: 65086
Step 2.3. Aggregate Population Density in Hexagons (Uniform Spatial Units)¶
CPU times: user 101 ms, sys: 3.75 ms, total: 105 ms Wall time: 300 ms
0
((989, 2), (989, 3))
Step 2.4. Convert Census Tract Level Data to Hexagons Level¶
Census tract level numerical variables like population and income will be divided in hexagons proportionally to the Meta high-resolution population density.
('EPSG:4326', 'EPSG:4674')
hex | geometry | population_2020 | |
---|---|---|---|
0 | 88a91bcd09fffff | POLYGON ((-48.50489 -27.84191, -48.50072 -27.8... | NaN |
1 | 88a91bcd51fffff | POLYGON ((-48.52335 -27.84365, -48.51917 -27.8... | NaN |
2 | 88a91b425bfffff | POLYGON ((-48.46311 -27.69699, -48.45893 -27.6... | 19.249137 |
3 | 88a91b5565fffff | POLYGON ((-48.46642 -27.70402, -48.46224 -27.7... | 9.624569 |
4 | 88a91b4d57fffff | POLYGON ((-48.35799 -27.49173, -48.35381 -27.4... | NaN |
0%| | 0/651 [00:00<?, ?it/s]
hex 0 geometry 0 population_2020 474 pop_3_5_years_adj 474 pop_6_14_years_adj 474 pop_15_17_years_adj 474 pop_18_years_adj 474 V002_adj 474 V003_adj 474 dtype: int64
Compare the total number of census tracts and hexagons in Florianopolis
Census Tracts: 651 --> Hexagons: 989
Step 2.5. Visualize downsampled population data in hexagons¶
Let's see how the population and income data looks like in the hexagons. Visualizing data side by side is a good way to see if the data is consistent.
Another way to check population consistency is to compare population distribution with road network density. The idea is that areas with higher population density should have more roads.
We can clearly observe that the population values mostly are located around the road network. This is a good sign that our downscaling method is working as expected. Also, higher population values correspond to higher density of roads.
Step 2.6. Calculate and visualize income per capita¶
Step 3. Extract and process Schools geographical data¶
Number of schools in Brazil: 222936
abbrev_state | name_muni | code_school | name_school | education_level | education_level_others | admin_category | address | phone_number | government_level | private_school_type | private_government_partnership | regulated_education_council | service_restriction | size | urban | location_type | date_update | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RO | Porto Velho | 11000023 | EEEE ABNAEL MACHADO DE LIMA - CENE | Ensino Fundamental | Atendimento Educacional Especializado | Pública | AVENIDA AMAZONAS, 6492 ZONA LESTE. TIRADENTES.... | (69) 992083054 | Estadual | Não Informado | Não | Não | ESCOLA ATENDE EXCLUSIVAMENTE ALUNOS COM DEFICI... | Entre 51 e 200 matrículas de escolarização | Urbana | A escola não está em área de localização difer... | 2020-10-18 | POINT (-63.85401 -8.75846) |
1 | RO | Porto Velho | 11000040 | EMEIEF PEQUENOS TALENTOS | Educação Infantil | Pública | RUA CAETANO, 3256 PREDIO. CALADINHO. 76808-108... | (69) 32135237 | Municipal | Não Informado | Não | Sim | ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... | Entre 201 e 500 matrículas de escolarização | Urbana | A escola não está em área de localização difer... | 2020-10-18 | POINT (-63.88392 -8.79373) | |
2 | RO | Porto Velho | 11000058 | CENTRO DE ENSINO CLASSE A | Educação Infantil, Ensino Fundamental, Ensino ... | Privada | AVENIDA CARLOS GOMES, 1135 CENTRO. 76801-123 P... | (69) 32244473 | Privada | Particular | Não | Sim | ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... | Mais de 1000 matrículas de escolarização | Urbana | A escola não está em área de localização difer... | 2020-10-18 | POINT (-63.90199 -8.76073) | |
3 | RO | Porto Velho | 11000082 | CENTRO EDUCACIONAL PRESBITERIANO 15 DE NOVEMBRO | Educação Infantil, Ensino Fundamental | Privada | RUA ALMIRANTE BARROSO, 1483 SANTA BARBARA. 768... | (69) 32245636 | Privada | Particular | Não | Sim | ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... | Entre 51 e 200 matrículas de escolarização | Urbana | A escola não está em área de localização difer... | 2020-10-18 | POINT (-63.89618 -8.76520) | |
4 | RO | Porto Velho | 11000104 | CENTRO EDUC CORA CORALINA | Educação Infantil, Ensino Fundamental | Privada | RUA MEXICO, 1056 NOVA PORTO VELHO. 76820-190 P... | (69) 32252616 | Privada | Particular | Não | Sim | ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... | Entre 501 e 1000 matrículas de escolarização | Urbana | A escola não está em área de localização difer... | 2020-10-18 | POINT (-63.87547 -8.76863) |
Step 3.1. Aggregate categorical variables in hexagons¶
To work with categorical data we have to convert from text to numerical values. To do this we will create one column for each class in the caterogical variable and fill with 1 if the school belongs to that class and 0 otherwise. This process is known as One or Multi-hot encoding.
Example of education level columns assigned to the schools:
ensino_fundamental | educacao_infantil | ensino_medio | educacao_jovens_adultos | educacao_profissional | no_specified | |
---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 1 | 1 | 1 | 0 | 0 | 0 |
3 | 1 | 1 | 0 | 0 | 0 | 0 |
4 | 1 | 1 | 0 | 0 | 0 | 0 |
Example of administrative category columns assigned to the schools:
publica | privada | |
---|---|---|
0 | 1 | 0 |
1 | 1 | 0 |
2 | 0 | 1 |
3 | 0 | 1 |
4 | 0 | 1 |
(7213, 27)
((351, 27), (351,))
Once aggregated the categorical variables, we can visualize the number of schools per hexagon. We will overlay the schools points to see if the aggregation is working as expected.
Schools distribution seems to be consistent with the population distribution. This is a good sign that our aggregation method is working as expected.
Step 3.2. Aggregate numerical variables in hexagons¶
In this official dataset provided by IBGE, we don't have the enrollment data. We will use the enrollment data from the a previous schools dataset to estimate the number of students per school and hexagon.
Restrição de Atendimento | Escola | UF | Município | Localização | Categoria Administrativa | Dependência Administrativa | Categoria Escola Privada | Conveniada Poder Público | Regulamentação pelo Conselho de Educação | Porte da Escola | Etapas e Modalidade de Ensino Oferecidas | Outras Ofertas Educacionais | LAT_FINAL | LONG_FINAL | Matriculas totales | Matrículas oficiales | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ESCOLA ATENDE EXCLUSIVAMENTE ALUNOS COM DEFICI... | EEEE ABNAEL MACHADO DE LIMA - CENE | RO | Porto Velho | Urbana | Pública | Estadual | Não Informado | Não | Não | Entre 51 e 200 matrículas de escolarização | Ensino Fundamental | Atendimento Educacional Especializado | -8.758459 | -63.854011 | 159.0 | 94.0 |
1 | ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... | CENTRO DE ENSINO CLASSE A | RO | Porto Velho | Urbana | Privada | Privada | Particular | Não | Sim | Mais de 1000 matrículas de escolarização | Educação Infantil, Ensino Fundamental, Ensino ... | NaN | -8.760734 | -63.901986 | 1433.0 | 1433.0 |
2 | ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... | CENTRO EDUCACIONAL PRESBITERIANO 15 DE NOVEMBRO | RO | Porto Velho | Urbana | Privada | Privada | Particular | Não | Sim | Entre 51 e 200 matrículas de escolarização | Educação Infantil, Ensino Fundamental | NaN | -8.765205 | -63.896177 | 59.0 | 59.0 |
3 | ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... | CENTRO EDUC CORA CORALINA | RO | Porto Velho | Urbana | Privada | Privada | Particular | Não | Sim | Entre 501 e 1000 matrículas de escolarização | Educação Infantil, Ensino Fundamental | NaN | -8.768633 | -63.875471 | 636.0 | 636.0 |
4 | ESCOLA EM FUNCIONAMENTO E SEM RESTRIÇÃO DE ATE... | CENTRO EDUCACIONAL MOJUCA | RO | Porto Velho | Urbana | Privada | Privada | Particular | Não | Sim | Entre 201 e 500 matrículas de escolarização | Ensino Fundamental, Ensino Médio | NaN | -8.765028 | -63.891857 | 290.0 | 290.0 |
Separate matriculas by education level to calculate the number of students per hexagon separting by education level and target student population.
ensino_fundamental | educacao_infantil | ensino_medio | educacao_jovens_adultos | educacao_profissional | no_specified | |
---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 | 0 | 0 | 0 |
2 | 1 | 1 | 0 | 0 | 0 | 0 |
3 | 1 | 1 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 1 | 0 | 0 | 0 |
This maps shows the difference between the number of students and the target student population for each education level.
- Blue hexagons have negative values and represent areas where there are more students than the target population.
- Red hexagons have positive values and represent areas where there are less students than the target population.
Given this we could say that:
- Blue hexagons are city centers were there is less population but have a lot of students that come from other areas.
- Red hexagons are most present in rural areas were there is more target student population that have to travel to other areas to attend school.
- This second group could be in danger of not attending school due to the distance.
Step 4. Assing the hexagon zone (rural or urban)¶
We will use the census track variable to determine the urban area of Pará.
We can see that the urban areas are correctly identified. Considering the satellite image, urban areas are the ones with more buildings and roads.
Summary¶
We have extracted and processed schools and sociodemographic data from multiple source. The final results is a dataset on which each row is a H3 Hexagon. This will allow us to compare and analyze different microregions in Para. The variables obtained are the the following:
Source | Variable name | Type | Description |
---|---|---|---|
Meta's HRSM | population_2020 | Numerical | Total Population |
IBGE's Census* | pop_3_5_years_adj | Numerical | Population between 3 months and 5 years old |
IBGE's Census* | pop_6_14_years_adj | Numerical | Population between 6 and 14 years old |
IBGE's Census* | pop_15_17_years_adj | Numerical | Population between 15 and 17 years old |
IBGE's Census* | pop_18_years_adj | Numerical | Population between 18 years old |
IBGE's Census* | V002_adj | Numerical | Total Population |
IBGE's Census* | V003_adj | Numerical | Income (R$) |
INEP's School Census | ensino_fundamental | Numerical | Ensino Fundamental Schools Count |
INEP's School Census | educacao_infantil | Numerical | Educacao Infantil Schools Count |
INEP's School Census | ensino_medio | Numerical | Ensino Medio Schools Count |
INEP's School Census | educacao_jovens_adultos | Numerical | Educacao Jovens Adultos Schools Count |
INEP's School Census | educacao_profissional | Numerical | Educacao Profissional Schools Count |
INEP's School Census | no_specified | Numerical | Count of schools with unspecified type |
INEP's School Census | publica | Numerical | Public Schools Count |
INEP's School Census | privada | Numerical | Private Schools Count |
Custom School Census** | Matriculas totales | Numerical | Total Enrolled Students |
Custom School Census** | Matrículas oficiales | Numerical | Oficially Enrolled Students |
Custom School Census** | matric_total_[school type] | Numerical | Enrolled Students by School Type |
Custom School Census** | matric_oficial_[school type] | Numerical | Enrolled Students by School Type |
Custom School Census** | diff_pop_matric_total_[school type] | Numerical | Difference between target student population and enrolled students by School Type |
Custom School Census** | rate_pop_matric_total_[school type] | Numerical | Ratio between target student population and enrolled students by School Type |
IBGE's Urban Footprints | urban_area | Categorical | 1=Urban or 0=Rural |
Ours | income_pc | Numerical | Income Per Capita (R$) |
* IBGE's Census: Variables where downscaled proportionally to Meta's high resolution population data.
** Custom School Census: This dataset was provided by IDB Education team in Brazil (need to ask for the source)