- Extract journal and conference data from Springer in JSON format.
- Get Springer conference/journal links
- Get LAK journal citations, web version.
- Get Springer metadata by using API.
- Use requests to get Journal of Learning Sciences reference download links.
- Process JSON data to extract keywords, then process the keywords and generate an NLP table, a keyword frequency table, and a common keyword table.
- Process bib data and generate the same tables as above.
- Same as above, but also generate an article title and keyword table.
- Extract authors' affiliations, split into educational and non-educational institutions. The output is two JSON files, one for educational and one for non-educational institutions.
- Recommend papers by abstract using TF-IDF and cosine similarity.
- Get n-grams from abstracts and count their frequency.
- Get all author names and regularize the name data by inspecting it and checking last names.
- Get unique author names and then generate the relationship graphs.
- Many useful functions for finding adhesive words or sentences.
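
The recommendation step above (TF-IDF plus cosine similarity over abstracts) could look roughly like the sketch below. It is a standard-library illustration, not the repository's actual code: the function names, the simple tokenizer, and the smoothed IDF formula are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a deliberately simple tokenizer)."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of abstracts in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            # Relative term frequency times a smoothed IDF (assumed variant).
            term: (count / len(toks)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query_index, docs, top_n=3):
    """Return indices of the top_n abstracts most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return [i for _, i in sorted(scores, reverse=True)[:top_n]]
```

Calling `recommend(0, abstracts)` on a list of abstract strings returns the positions of the closest abstracts, which can then be mapped back to paper titles.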
[New] regulate_data.py
- Unify the names of authors who appear under different names in the files.
[New] text_process.py
- Process full paper text.
[Update] Create code_keywordanalysis and code_datacollection folders and arrange code files.
[New] extract_htmlinfo_tojson.py
- Given conference HTML files, extract useful info and write JSON files.
[New] miner_aied_journal_web.py
- Extract paper links from the official website, then fetch the journal HTML files, extract useful info, and write JSON files.
[New] commonkeywords_fromkws.py
- Get common keywords from the keyword corpus. The result is in keyword_dict.csv.
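
As a rough illustration of what "common keywords" might mean here, the sketch below counts in how many papers each normalized keyword appears and writes the result to a CSV file. The function names, the `min_papers` threshold, and the CSV column names are assumptions, not taken from commonkeywords_fromkws.py.

```python
import csv
from collections import Counter


def common_keywords(keyword_lists, min_papers=2):
    """Return (keyword, paper_count) pairs for keywords shared by several papers."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace; the set avoids double counting
        # a keyword that is listed twice for the same paper.
        counts.update({" ".join(kw.lower().split()) for kw in keywords if kw.strip()})
    return [(kw, n) for kw, n in counts.most_common() if n >= min_papers]


def write_keyword_csv(rows, path):
    """Write the pairs to a CSV file (column names are hypothetical)."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["keyword", "paper_count"])
        writer.writerows(rows)
```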
[New] ngram_model.py
- (1) Build an n-gram model from abstract text; (2) Find the context of negative words in abstracts. Trigrams are in neg_trigrams.txt.
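
A minimal sketch of both steps, using only the standard library. It assumes that the "context" of a negative word means every trigram containing that word; the tokenizer and function names are illustrative and may differ from ngram_model.py.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(abstracts, n=3):
    """Count n-grams across all abstracts."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(ngrams(tokenize(abstract), n))
    return counts


def negative_trigrams(abstracts, negative_words):
    """Count trigrams that contain at least one negative word."""
    negative = {word.lower() for word in negative_words}
    counts = Counter()
    for abstract in abstracts:
        for gram in ngrams(tokenize(abstract), 3):
            if negative.intersection(gram):
                counts[gram] += 1
    return counts
```

The negative word list could come from a file such as neg_wordlist.csv; how that file is read is left out here because its layout is not described.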
[Update] sentimentanalysis.Rmd
- Get negative words using both the NRC and Bing lexicons. Negative words from both lexicons have been merged into neg_wordlist.csv.
[New] socialnetworkanalysis.Rmd
- Pilot view of social network analysis.
[New] sna_matrixprocessing.py
- The output is an SNA table (sna_table.csv) describing authors' relations. The SNA graph is also generated with networkx.
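
One plausible way to derive an author-relation table is to count co-authored papers per pair of authors. The sketch below does that with the standard library only; the `source,target,weight` layout and the function names are assumptions and may not match the real sna_table.csv.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count how many papers each unordered pair of authors shares."""
    edges = Counter()
    for authors in papers:
        # Sort so that (A, B) and (B, A) map to the same key.
        edges.update(combinations(sorted(set(authors)), 2))
    return edges


def edges_to_csv(edges):
    """Serialize the edge counts as CSV text (column names are hypothetical)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()
```

A weighted edge list in this shape can be handed to networkx (for example through `Graph.add_weighted_edges_from`) to draw the graph.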
[New] authorinfo_toecharts_pre.Rmd & authorinfo_toecharts.py
- Explore visualization with Echarts. The intermediate outputs are author_table_echarts.json and relation_echarts.json. The final output, built from these two files, is echarts.json, which can be used in Echarts.
[New] mergejson_summary.py
- Merge all paper info JSON files into a single json_summary.json.
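
A hedged sketch of such a merge, assuming each input file holds either one paper record or a list of records, and that all inputs sit in one directory. The function name and parameters are illustrative, not those of mergejson_summary.py.

```python
import json
from pathlib import Path


def merge_json_files(input_dir, output_path, pattern="*.json"):
    """Concatenate the records of all matching JSON files into one list."""
    output_path = Path(output_path)
    merged = []
    for path in sorted(Path(input_dir).glob(pattern)):
        if path.resolve() == output_path.resolve():
            continue  # skip a summary file left over from a previous run
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Accept either a list of records or a single record per file.
        merged.extend(data if isinstance(data, list) else [data])
    with output_path.open("w", encoding="utf-8") as fh:
        json.dump(merged, fh, ensure_ascii=False, indent=2)
    return merged
```

For example, `merge_json_files("json_files", "json_files/json_summary.json")` would collect every record from a hypothetical `json_files` folder into one summary file.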
[Updated] sentimentanalysis.Rmd
- Sentiment analysis of abstracts.
[Updated] jsontocsv.R & jsontocsv_summary.R
- Convert the JSON file containing paper info to CSV. The output of jsontocsv.R is a folder of separate CSV files. The output of jsontocsv_summary.R is a summary.csv containing all information in one CSV file.
[Updated] springerminer.py & run.py
- Extract author, abstract, RIS, and other paper information from Springer.
[Updated] paperminer.py
- Extract abstracts from IAIED