How to get "Long Answer Candidates" from wikipedia source code. #22

Open
anshkumar opened this issue Oct 16, 2020 · 0 comments

I want to convert the HTML source of a Wikipedia page into the format you provide in the competition. For example, for this webpage, I want to produce the following (as provided in the competition data):

{
  "example_id": "-1220107454853145579",
  "question_text": "who is the south african high commissioner in london",
  "document_text": "High Commission of South Africa , London - wikipedia <H1> High Commission of South Africa , London </H1> <Table> <Tr> <Th_colspan=\"2\"> High Commission of South Africa in London </Th> </Tr> <Tr> <Td_colspan=\"2\"> </Td> </Tr> <Tr> <Th> Location </Th> <Td> Trafalgar Square , London </Td> </Tr> <Tr> <Th> Address </Th> <Td> Trafalgar Square , London , WC2N 5DP </Td> </Tr> <Tr> <Th> Coordinates </Th> <Td> 51 ° 30 ′ 30 '' N 0 ° 07 ′ 37 '' W  /  51.5082 ° N 0.1269 ° W  / 51.5082 ; - 0.1269 Coordinates : 51 ° 30 ′ 30 '' N 0 ° 07 ′ 37 '' W  /  51.5082 ° N 0.1269 ° W  / 51.5082 ; - 0.1269 </Td> </Tr> <Tr> <Th> High Commissioner </Th> <Td> Vacant </Td> </Tr> </Table> Balcony of South Africa House <P> The High Commission of South Africa in London is the diplomatic mission from South Africa to the United Kingdom . It is located at South Africa House , a building on Trafalgar Square , London . As well as containing the offices of the High Commissioner , the building also hosts the South African consulate . It has been a Grade II * Listed Building since 1982 . </P> <H2> Contents </H2> <Ul> <Li> 1 History </Li> <Li> 2 See also </Li> <Li> 3 References </Li> <Li> 4 External links </Li> </Ul> <H2> History ( edit ) </H2> <P> South Africa House was built by Holland , Hannen & Cubitts in the 1930s on the site of what had been Morley 's Hotel until it was demolished in 1936 . The building was designed by Sir Herbert Baker , with architectural sculpture by Coert Steynberg and Sir Charles Wheeler , and opened in 1933 . The building was acquired by the government of South Africa as its main diplomatic presence in the UK . During World War II , Prime Minister Jan Smuts lived there while conducting South Africa 's war plans . </P> <P> In 1961 , South Africa became a republic , and withdrew from the Commonwealth due to its policy of racial segregation . Accordingly , the building became an Embassy , rather than a High Commission . 
During the 1980s , the building , which was one of the only South African diplomatic missions in a public area , was targeted by protesters from around the world . During the 1990 Poll Tax Riots , the building was set alight by rioters , although not seriously damaged . </P> <P> The first fully free democratic elections in South Africa were held on the 27 April 1994 , and 4 days later , the country rejoined the Commonwealth , 33 years to the day after it withdrew upon becoming a republic . Along with country 's diplomatic missions in other Commonwealth countries , the mission once again became a High Commission . </P> <P> Today , South Africa House is no longer a controversial site , and is the focal point of South African culture in the UK . South African President Nelson Mandela appeared on the balcony of South Africa House in 1996 , as part of his official UK state visit . In 2001 , Mandela again appeared on the balcony of South Africa House to mark the seventh anniversary of Freedom Day , when the apartheid system was officially abolished . </P> <H2> See also ( edit ) </H2> <Ul> <Li> List of diplomatic missions of South Africa </Li> <Li> High Commission of Canada to the United Kingdom </Li> <Li> High Commission of Uganda , London </Li> </Ul> <H2> References ( edit ) </H2> <Table> <Tr> <Td> </Td> <Td> Wikimedia Commons has media related to South Africa House , London . </Td> </Tr> </Table> <Ol> <Li> ^ Jump up to : `` The London Diplomatic List '' ( PDF ) . 14 December 2013 . Archived from the original ( PDF ) on 11 December 2013 . </Li> <Li> Jump up ^ Historic England . `` Details from listed building database ( 1066238 ) '' . National Heritage List for England . Retrieved 28 September 2015 . </Li> <Li> Jump up ^ Cubitts 1810 -- 1975 , published 1975 </Li> <Li> Jump up ^ `` The east side of Trafalgar Square '' . BHO . Retrieved 22 November 2015 . </Li> <Li> Jump up ^ Palliser , David Michael ; Clark , Peter ; Daunton , Martin J. ( 2000 ) . 
The Cambridge Urban History of Britain : 1840 -- 1950 . Cambridge University Press . p. 126 . </Li> <Li> ^ Jump up to : South Africa returns to the Commonwealth fold , The Independent , 31 May 1994 </Li> <Li> Jump up ^ Burns , Danny ( 1992 ) . Poll tax rebellion . AK Press . p. 90 . </Li> <Li> Jump up ^ United Kingdom of Great Britain and Northern Ireland , Department of International Relations and Cooperation </Li> <Li> Jump up ^ Hero 's welcome for Mandela at concert . BBC News . April 30 , 2001 . </Li> </Ol> <H2> External links ( edit ) </H2> <Ul> <Li> Official site </Li> </Ul> <Table> <Tr> <Th_colspan=\"2\"> <Ul> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> Diplomatic missions in the United Kingdom </Th> </Tr> <Tr> <Th> Africa </Th> <Td> <Ul> <Li> Algeria </Li> <Li> Angola </Li> <Li> Botswana </Li> <Li> Burundi </Li> <Li> Cameroon </Li> <Li> Democratic Republic of the Congo </Li> <Li> Egypt </Li> <Li> Equatorial Guinea </Li> <Li> Eritrea </Li> <Li> Ethiopia </Li> <Li> Gabon </Li> <Li> The Gambia </Li> <Li> Ghana </Li> <Li> Guinea </Li> <Li> Ivory Coast </Li> <Li> Kenya </Li> <Li> Lesotho </Li> <Li> Liberia </Li> <Li> Libya </Li> <Li> Malawi </Li> <Li> Mauritania </Li> <Li> Mauritius </Li> <Li> Morocco </Li> <Li> Mozambique </Li> <Li> Namibia </Li> <Li> Nigeria </Li> <Li> Rwanda </Li> <Li> Senegal </Li> <Li> Seychelles </Li> <Li> Sierra Leone </Li> <Li> South Africa </Li> <Li> South Sudan </Li> <Li> Sudan </Li> <Li> Swaziland </Li> <Li> Tanzania </Li> <Li> Togo </Li> <Li> Tunisia </Li> <Li> Uganda </Li> <Li> Zambia </Li> <Li> Zimbabwe </Li> </Ul> </Td> </Tr> <Tr> <Th> Americas </Th> <Td> <Ul> <Li> Antigua and Barbuda </Li> <Li> Argentina </Li> <Li> The Bahamas </Li> <Li> Barbados </Li> <Li> Belize </Li> <Li> Bolivia </Li> <Li> Brazil </Li> <Li> Canada </Li> <Li> Chile </Li> <Li> Colombia </Li> <Li> Costa Rica </Li> <Li> Cuba </Li> <Li> Dominica </Li> <Li> Dominican Republic </Li> <Li> Ecuador </Li> <Li> El Salvador </Li> <Li> Grenada </Li> <Li> Guatemala </Li> <Li> 
Guyana </Li> <Li> Haiti </Li> <Li> Honduras </Li> <Li> Jamaica </Li> <Li> Mexico </Li> <Li> Nicaragua </Li> <Li> Panama </Li> <Li> Paraguay </Li> <Li> Peru </Li> <Li> Saint Kitts and Nevis </Li> <Li> Saint Lucia </Li> <Li> Saint Vincent and the Grenadines </Li> <Li> Trinidad and Tobago </Li> <Li> United States of America </Li> <Li> Uruguay </Li> <Li> Venezuela </Li> </Ul> </Td> </Tr> <Tr> <Th> Asia </Th> <Td> <Ul> <Li> Afghanistan </Li> <Li> Armenia </Li> <Li> Azerbaijan </Li> <Li> Bahrain </Li> <Li> Bangladesh </Li> <Li> Brunei </Li> <Li> Cambodia </Li> <Li> China </Li> <Li> East Timor </Li> <Li> Georgia </Li> <Li> India </Li> <Li> Indonesia </Li> <Li> Iran </Li> <Li> Iraq </Li> <Li> Israel </Li> <Li> Japan </Li> <Li> Jordan </Li> <Li> Kazakhstan </Li> <Li> Kuwait </Li> <Li> Kyrgyzstan </Li> <Li> Laos </Li> <Li> Lebanon </Li> <Li> Malaysia </Li> <Li> Maldives </Li> <Li> Mongolia </Li> <Li> Myanmar </Li> <Li> Nepal </Li> <Li> North Korea </Li> <Li> Oman </Li> <Li> Pakistan </Li> <Li> The Philippines </Li> <Li> Qatar </Li> <Li> Saudi Arabia </Li> <Li> Singapore </Li> <Li> South Korea </Li> <Li> Sri Lanka </Li> <Li> Syria </Li> <Li> Tajikistan </Li> <Li> Thailand </Li> <Li> Turkey </Li> <Li> Turkmenistan </Li> <Li> United Arab Emirates </Li> <Li> Uzbekistan </Li> <Li> Vietnam </Li> <Li> Yemen </Li> </Ul> </Td> </Tr> <Tr> <Th> Europe </Th> <Td> <Ul> <Li> Albania </Li> <Li> Austria </Li> <Li> Belarus </Li> <Li> Belgium </Li> <Li> Bosnia and Herzegovina </Li> <Li> Bulgaria </Li> <Li> Croatia </Li> <Li> Cyprus </Li> <Li> Czech Republic </Li> <Li> Denmark </Li> <Li> Estonia </Li> <Li> Finland </Li> <Li> France </Li> <Li> Germany </Li> <Li> Greece </Li> <Li> Hungary </Li> <Li> Iceland </Li> <Li> Ireland </Li> <Li> Italy </Li> <Li> Kosovo </Li> <Li> Latvia </Li> <Li> Lithuania </Li> <Li> Luxembourg </Li> <Li> Macedonia </Li> <Li> Malta </Li> <Li> Moldova </Li> <Li> Monaco </Li> <Li> Montenegro </Li> <Li> The Netherlands </Li> <Li> Norway </Li> <Li> Poland </Li> <Li> 
Portugal </Li> <Li> Romania </Li> <Li> Russia </Li> <Li> Serbia </Li> <Li> Slovakia </Li> <Li> Slovenia </Li> <Li> Spain </Li> <Li> Sweden </Li> <Li> Switzerland </Li> <Li> Ukraine </Li> <Li> Vatican City ( Apostolic Nunciature ) </Li> </Ul> </Td> </Tr> <Tr> <Th> Oceania </Th> <Td> <Ul> <Li> Australia </Li> <Li> Fiji </Li> <Li> New Zealand </Li> <Li> Papua New Guinea </Li> <Li> Tonga </Li> </Ul> </Td> </Tr> <Tr> <Th> States with limited recognition </Th> <Td> <Ul> <Li> North Cyprus </Li> <Li> Palestine </Li> <Li> Taiwan </Li> </Ul> </Td> </Tr> <Tr> <Th> De facto independent states </Th> <Td> <Ul> <Li> Somaliland </Li> </Ul> </Td> </Tr> <Tr> <Th> British Overseas Territories </Th> <Td> <Ul> <Li> Anguilla </Li> <Li> Bermuda </Li> <Li> British Virgin Islands </Li> <Li> Cayman Islands </Li> <Li> Falkland Islands </Li> <Li> Gibraltar </Li> <Li> Montserrat </Li> <Li> Saint Helena </Li> <Li> Tristan da Cunha </Li> <Li> Turks and Caicos Islands </Li> </Ul> </Td> </Tr> <Tr> <Th> Other economies with their own representations </Th> <Td> Hong Kong </Td> </Tr> <Tr> <Th> International organisations </Th> <Td> <Ul> <Li> Arab League </Li> <Li> European Union </Li> <Li> International Organisation for Migration </Li> <Li> United Nations <Ul> <Li> UNHCR </Li> <Li> World Food Programme </Li> </Ul> </Li> <Li> World Bank </Li> </Ul> </Td> </Tr> </Table> <Table> <Tr> <Th_colspan=\"3\"> <Ul> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> Trafalgar Square , London </Th> </Tr> <Tr> <Th> Buildings </Th> <Td> <Table> <Tr> <Th> Current </Th> <Td> <Ul> <Li> Clockwise from North : National Gallery </Li> <Li> St Martin - in - the - Fields </Li> <Li> South Africa House </Li> <Li> Drummonds Bank </Li> <Li> Admiralty Arch </Li> <Li> Uganda House <Ul> <Li> Embassy of Burundi </Li> <Li> High Commission of Uganda </Li> </Ul> </Li> <Li> Canadian Pacific building </Li> <Li> Admiralty ( pub ) </Li> <Li> Canada House </Li> </Ul> </Td> </Tr> <Tr> <Th> Former </Th> <Td> <Ul> <Li> Morley 's Hotel </Li> <Li> 
Northumberland House </Li> <Li> Royal Mews </Li> </Ul> </Td> </Tr> </Table> </Td> <Td> </Td> </Tr> <Tr> <Th> Statues </Th> <Td> <Table> <Tr> <Th> Plinths </Th> <Td> <Ul> <Li> SE : Henry Havelock </Li> <Li> SW : Charles Napier </Li> <Li> NE : George IV </Li> <Li> NW : Fourth plinth </Li> </Ul> </Td> </Tr> <Tr> <Th> Busts </Th> <Td> <Ul> <Li> Lord Beatty </Li> <Li> Lord Jellicoe </Li> <Li> Lord Cunningham </Li> </Ul> </Td> </Tr> <Tr> <Th> Other </Th> <Td> <Ul> <Li> Charles I <Ul> <Li> Charing Cross </Li> </Ul> </Li> <Li> Nelson 's Column </Li> <Li> James II </Li> <Li> George Washington </Li> </Ul> </Td> </Tr> </Table> </Td> </Tr> <Tr> <Th> Adjacent streets </Th> <Td> <Ul> <Li> Charing Cross Road </Li> <Li> Cockspur Street </Li> <Li> Northumberland Avenue </Li> <Li> Strand </Li> <Li> Whitehall </Li> </Ul> </Td> </Tr> <Tr> <Th> People </Th> <Td> <Table> <Tr> <Th> Architects </Th> <Td> <Ul> <Li> Charles Barry </Li> <Li> Norman Foster </Li> <Li> Edwin Lutyens </Li> <Li> John Nash </Li> </Ul> </Td> </Tr> <Tr> <Th> Fourth plinth sculptors </Th> <Td> <Ul> <Li> Elmgreen and Dragset </Li> <Li> Katharina Fritsch <Ul> <Li> Hahn / Cock </Li> </Ul> </Li> <Li> Antony Gormley <Ul> <Li> One & Other </Li> </Ul> </Li> <Li> Marc Quinn </Li> <Li> Thomas Schütte </Li> <Li> Yinka Shonibare </Li> <Li> Mark Wallinger </Li> <Li> Rachel Whiteread </Li> <Li> Bill Woodrow </Li> </Ul> </Td> </Tr> </Table> </Td> </Tr> <Tr> <Th> Events </Th> <Td> <Ul> <Li> Poll Tax Riots </Li> </Ul> </Td> </Tr> <Tr> <Th> Miscellaneous </Th> <Td> <Ul> <Li> Christmas tree </Li> </Ul> </Td> </Tr> <Tr> <Td_colspan=\"3\"> <Ul> <Li> </Li> <Li> Commons </Li> </Ul> </Td> </Tr> </Table> Retrieved from `` https://en.wikipedia.org/w/index.php?title=High_Commission_of_South_Africa,_London&oldid=850142361 '' Categories : <Ul> <Li> Diplomatic missions in London </Li> <Li> Trafalgar Square </Li> <Li> Diplomatic missions of South Africa </Li> <Li> Herbert Baker buildings and structures </Li> <Li> South Africa -- United Kingdom 
relations </Li> <Li> South Africa and the Commonwealth of Nations </Li> <Li> Grade II * listed buildings in the City of Westminster </Li> <Li> Buildings and structures completed in 1933 </Li> </Ul> <Ul> <Li> </Li> <Li> </Li> </Ul> <H2> </H2> <H3> </H3> <Ul> <Li> </Li> <Li> Talk </Li> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> <H3> </H3> <Ul> <Li> </Li> <Li> </Li> </Ul> <H3> </H3> <Ul> </Ul> <H3> </H3> <Ul> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> <H3> </H3> <Ul> </Ul> <H3> </H3> <H3> </H3> <Ul> <Li> </Li> <Li> Contents </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> <H3> </H3> <Ul> <Li> </Li> <Li> About Wikipedia </Li> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> <H3> </H3> <Ul> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> <H3> </H3> <Ul> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> <H3> </H3> <Ul> <Li> </Li> </Ul> <H3> </H3> <Ul> <Li> Afrikaans </Li> </Ul> Edit links <Ul> <Li> This page was last edited on 13 July 2018 , at 22 : 10 ( UTC ) . </Li> <Li> </Li> </Ul> <Ul> <Li> </Li> <Li> About Wikipedia </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> <Li> </Li> </Ul> <Ul> <Li> </Li> <Li> </Li> </Ul>",
  "long_answer_candidates": [
    {
      "end_token": 136,
      "start_token": 18,
      "top_level": true
    },
    {
      "end_token": 30,
      "start_token": 19,
      "top_level": false
    },
    {
      "end_token": 45,
      "start_token": 34,
      "top_level": false
    },
    {
      "end_token": 59,
      "start_token": 45,
      "top_level": false
    },
    {
      "end_token": 126,
      "start_token": 59,
      "top_level": false
    },
    {
      "end_token": 135,
      "start_token": 126,
      "top_level": false
    },
    {
      "end_token": 211,
      "start_token": 141,
      "top_level": true
    },
    {
      "end_token": 336,
      "start_token": 240,
      "top_level": true
    },
    {
      "end_token": 425,
      "start_token": 336,
      "top_level": true
    },
    {
      "end_token": 488,
      "start_token": 425,
      "top_level": true
    },
    {
      "end_token": 570,
      "start_token": 488,
      "top_level": true
    }
  ]
}
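
As far as I can tell, "start_token" and "end_token" index the whitespace-separated tokens of "document_text", with "end_token" exclusive. Here is a small Python check of that reading against the first table row of the example. The helper name candidate_text is my own, and prefix is just the first 30 tokens copied from the example above:

```python
def candidate_text(document_text, candidate):
    """Return the text covered by one long-answer candidate.

    Assumes tokens are whitespace-separated and end_token is exclusive.
    """
    tokens = document_text.split()
    return " ".join(tokens[candidate["start_token"]:candidate["end_token"]])


# First 30 tokens of the example document_text (the rest is omitted here).
prefix = (
    'High Commission of South Africa , London - wikipedia '
    '<H1> High Commission of South Africa , London </H1> '
    '<Table> <Tr> <Th_colspan="2"> High Commission of South Africa in London </Th> </Tr>'
)

# Second entry of long_answer_candidates in the example.
first_row = {"start_token": 19, "end_token": 30, "top_level": False}

print(candidate_text(prefix, first_row))
# <Tr> <Th_colspan="2"> High Commission of South Africa in London </Th> </Tr>
```

Token 18 is the opening <Table>, which matches the first candidate (18 to 136, top_level true), so each candidate seems to span one complete HTML element.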

My main question: is there a script that converts the HTML source of a webpage into "document_text" and "long_answer_candidates"?
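
To show what I am after, here is a rough Python sketch of the second half only: deriving candidate spans from text that is already in the tokenized "document_text" form. It is my guess at the approach, not the official script. The tag set CANDIDATE_TAGS and both function names are assumptions. It also does not match the example exactly: the example does not list the empty second table row or the Contents list, so the real pipeline seems to apply extra filtering that this sketch does not attempt.

```python
# Assumed set of element types that count as long-answer candidates.
CANDIDATE_TAGS = {"Table", "Tr", "P", "Ul", "Ol", "Dl", "Li", "Dd", "Dt"}


def tag_name(token):
    """Return (name, is_closing) for an HTML-like token, else (None, False)."""
    if not (token.startswith("<") and token.endswith(">")):
        return None, False
    body = token[1:-1]
    closing = body.startswith("/")
    # Attributes are fused with '_' in this format, e.g. <Th_colspan="2">.
    name = body.lstrip("/").split("_", 1)[0]
    return name, closing


def long_answer_candidates(document_text):
    """Find spans of candidate elements using a stack of open tags."""
    tokens = document_text.split()
    stack = []   # (tag name, start index) of candidate tags still open
    found = []
    for i, tok in enumerate(tokens):
        name, closing = tag_name(tok)
        if name not in CANDIDATE_TAGS:
            continue
        if not closing:
            stack.append((name, i))
        elif stack and stack[-1][0] == name:
            _, start = stack.pop()
            found.append({
                "start_token": start,
                "end_token": i + 1,       # exclusive
                "top_level": not stack,   # no enclosing candidate still open
            })
    # Order by start position, as in the example.
    return sorted(found, key=lambda c: c["start_token"])


sample = "a <Table> <Tr> <Td> x </Td> </Tr> </Table> <P> hello . </P>"
for cand in long_answer_candidates(sample):
    print(cand)
```

The part I am still missing is the first half: turning the raw page HTML into this simplified token stream (tag names capitalized, attributes fused into the tag, punctuation split into separate tokens).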
