DBPedia automatically extracts data from Wikipedia and may contain links to the municipalities' websites.

The Portuguese language DBPedia extracts data from the Portuguese language Wikipedia, while the English language DBPedia extracts data from the English language Wikipedia. We are going to query them both.

Portuguese language DBPedia

The Portuguese language DBPedia does not use the dbo:country property, so getting Brazilian cities is a little tricky. Instead, we keep only the cities whose wiki page links to the "States of Brazil" page, and use that link as a proxy for being located in Brazil.

The foaf:homepage property is rarely used, so we also have to fall back on the dbo:wikiPageExternalLink property. Keep in mind that this pollutes the results with links to pages that are not the official website of the municipality, so we need to filter them out somehow. The simplest way of doing that is a SPARQL FILTER clause that keeps only links containing .gov.br. Unfortunately, some municipality websites do not use a .gov.br address and will be missing from the results.
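A plain text match on .gov.br also accepts links that merely mention that text in a path or query string. As a minimal sketch of a stricter check that could be applied to the results afterwards, the Python helper below looks only at the host name. The function name is_gov_br is hypothetical and not part of this repository.

```python
from urllib.parse import urlparse


def is_gov_br(link: str) -> bool:
    """Return True when the host name of the link ends in .gov.br."""
    # urlparse lower-cases the host name; links without a scheme
    # (for example "www.example.gov.br") have no host and are rejected.
    host = urlparse(link).hostname or ""
    return host == "gov.br" or host.endswith(".gov.br")
```

Links that fail this check are not necessarily wrong, since some municipalities use other domains, so they may be worth reviewing by hand rather than discarding.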

The following SPARQL query will extract links from the Portuguese language DBPedia:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX dbp:<http://dbpedia.org/property/>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>

SELECT ?city ?name ?state ?link WHERE {
    ?city a dbo:City ;
        # keep only pages that link to "States of Brazil"
        dbo:wikiPageWikiLink dbr:States_of_Brazil ;
        # official homepage or any external link
        dbo:wikiPageExternalLink|foaf:homepage ?link .
    # the dots are escaped so that they match literally
    FILTER REGEX(STR(?link), "\\.gov\\.br")
    OPTIONAL {?city rdfs:label ?name}
    OPTIONAL {?city dbp:estado ?state}
}

Results in HTML and CSV.
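The results can also be fetched programmatically by sending the query to a SPARQL endpoint over HTTP. The sketch below only builds the request URL. It assumes that the Portuguese language DBPedia exposes an endpoint at http://pt.dbpedia.org/sparql and that the endpoint understands a format parameter, as Virtuoso servers do. Neither is stated in this document, and the names PT_ENDPOINT and build_request_url are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Portuguese language DBPedia (not given in this document).
PT_ENDPOINT = "http://pt.dbpedia.org/sparql"


def build_request_url(query: str, endpoint: str = PT_ENDPOINT, fmt: str = "text/csv") -> str:
    """Build a GET URL that asks the endpoint to run the query.

    "query" is the standard SPARQL protocol parameter; "format" is an
    assumption about the server and may need to be replaced by an
    Accept header on other endpoints.
    """
    return endpoint + "?" + urlencode({"query": query, "format": fmt})
```

The resulting URL can then be opened with urllib.request.urlopen or any other HTTP client.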

English language DBPedia

This query is a little more complicated than the one for the Portuguese language DBPedia because, while the data is more structured, we cannot always get the state directly: some cities are linked to an outdated state URI or to another city instead of a state, and some have no state information assigned at all.

At least for filtering by country we can simply use the dbo:country property to determine that a city is located in Brazil.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX dbr:<http://dbpedia.org/resource/>
PREFIX dbp:<http://dbpedia.org/property/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX yago:<http://dbpedia.org/class/yago/>

SELECT ?city ?name ?state_abbr ?state_name ?link ?external_link WHERE {
    ?city a dbo:City ;
        dbo:country dbr:Brazil .
    OPTIONAL {
        ?city foaf:homepage ?link .
    }
    OPTIONAL {
        ?city dbo:wikiPageExternalLink ?external_link .
        # the dots are escaped so that they match literally
        FILTER REGEX(STR(?external_link), "\\.gov\\.br")
    }
    OPTIONAL {
        ?city rdfs:label ?name
        FILTER(LANG(?name) = "" || LANGMATCHES(LANG(?name), "pt"))
    }
    OPTIONAL {
        ?city dbo:isPartOf ?state .
        ?state a yago:WikicatStatesOfBrazil .
        ?state dbp:coordinatesRegion ?state_abbr .
    }
    OPTIONAL {
        ?city dbo:isPartOf ?state .
        ?state a yago:WikicatStatesOfBrazil .
        ?state rdfs:label ?state_name .
        FILTER(LANG(?state_name) = "" || LANGMATCHES(LANG(?state_name), "pt"))
    }
    OPTIONAL { # cities linked to a state whose URI has changed
        ?city dbo:isPartOf ?state_old_page .
        ?state_old_page dbo:wikiPageRedirects ?state .
        ?state a yago:WikicatStatesOfBrazil .
        ?state dbp:coordinatesRegion ?state_abbr .
    }
    OPTIONAL { # cities wrongfully linked to a city instead of state
        ?city dbo:isPartOf ?other_city .
        ?other_city dbo:isPartOf ?state .
        ?state a yago:WikicatStatesOfBrazil .
        ?state dbp:coordinatesRegion ?state_abbr .
    }
}

Results in HTML and CSV.
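The English query returns up to two candidate links per city (?link from foaf:homepage and ?external_link from dbo:wikiPageExternalLink), and a city can appear in several rows. One possible way to collapse the CSV results into a single link per city is sketched below in Python. The column names are assumed to match the variable names in the SELECT clause. Preferring foaf:homepage over external links is also an assumption, and pick_links is a hypothetical helper.

```python
import csv
import io

# Columns in order of preference (assumed to match the SELECT variables).
LINK_COLUMNS = ("link", "external_link")


def pick_links(csv_text: str) -> dict:
    """Map each city URI to one link, preferring foaf:homepage values."""
    chosen = {}  # city URI -> (rank, link); a lower rank is preferred
    for row in csv.DictReader(io.StringIO(csv_text)):
        for rank, column in enumerate(LINK_COLUMNS):
            value = (row.get(column) or "").strip()
            if not value:
                continue
            best = chosen.get(row["city"])
            # Keep the first link seen unless a more preferred column appears.
            if best is None or rank < best[0]:
                chosen[row["city"]] = (rank, value)
            break  # only the most preferred non-empty column of this row counts
    return {city: link for city, (_, link) in chosen.items()}
```

Cities with neither kind of link are left out of the returned mapping.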