NeoSCA

NeoSCA is a rewrite of the L2 Syntactic Complexity Analyzer (L2SCA), which was developed by Xiaofei Lu. It adds support for Windows and an improved command-line interface that is easier to use. Like L2SCA, NeoSCA takes written English language samples in plain text format as input and computes:

the frequency of 9 structures in the text:

words (W)
sentences (S)
verb phrases (VP)
clauses (C)
T-units (T)
dependent clauses (DC)
complex T-units (CT)
coordinate phrases (CP)
complex nominals (CN), and

14 syntactic complexity indices of the text:

mean length of sentence (MLS)
mean length of T-unit (MLT)
mean length of clause (MLC)
clauses per sentence (C/S)
verb phrases per T-unit (VP/T)
clauses per T-unit (C/T)
dependent clauses per clause (DC/C)
dependent clauses per T-unit (DC/T)
T-units per sentence (T/S)
complex T-unit ratio (CT/T)
coordinate phrases per T-unit (CP/T)
coordinate phrases per clause (CP/C)
complex nominals per T-unit (CN/T)
complex nominals per clause (CN/C)
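
Each of the 14 indices is a ratio of two of the 9 frequencies (for example, MLS is words divided by sentences). The sketch below illustrates that relationship only; it is not NeoSCA's own code. The function name `compute_indices`, the fallback to 0.0 when a denominator is zero, and the sample counts are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Each index name maps to (numerator, denominator), both taken from the 9 frequencies.
INDEX_DEFINITIONS: Dict[str, Tuple[str, str]] = {
    "MLS": ("W", "S"),
    "MLT": ("W", "T"),
    "MLC": ("W", "C"),
    "C/S": ("C", "S"),
    "VP/T": ("VP", "T"),
    "C/T": ("C", "T"),
    "DC/C": ("DC", "C"),
    "DC/T": ("DC", "T"),
    "T/S": ("T", "S"),
    "CT/T": ("CT", "T"),
    "CP/T": ("CP", "T"),
    "CP/C": ("CP", "C"),
    "CN/T": ("CN", "T"),
    "CN/C": ("CN", "C"),
}


def compute_indices(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive the 14 indices from the 9 structure frequencies."""
    indices = {}
    for name, (numerator, denominator) in INDEX_DEFINITIONS.items():
        divisor = counts[denominator]
        # Assumption: report 0.0 instead of failing when the denominator is zero.
        indices[name] = counts[numerator] / divisor if divisor else 0.0
    return indices


# Made-up counts, for illustration only.
example_counts = {"W": 120, "S": 6, "VP": 18, "C": 12, "T": 8, "DC": 4, "CT": 3, "CP": 5, "CN": 10}
example_indices = compute_indices(example_counts)
print(example_indices["MLS"])  # 20.0 (120 words / 6 sentences)
```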

Highlights

Works on Windows/macOS/Linux
Saves ("reserves") intermediate results, i.e., the parsed trees from Stanford Parser and the matched subtrees from Stanford Tregex
An improved command-line interface

Install

Install NeoSCA

To install NeoSCA, you need to have Python 3.7 or later installed on your system. You can check if you already have Python installed by running the following command in your terminal:

python --version
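
The same check can be made from inside Python. This is a minimal sketch that assumes only the 3.7 minimum stated above; the helper name `python_is_supported` is hypothetical.

```python
import sys

MINIMUM_VERSION = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_VERSION) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("This Python version is new enough for NeoSCA.")
else:
    print("Python 3.7 or later is required.")
```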

If Python is not installed, you can download and install it from the Python website. Once you have Python installed, you can install NeoSCA using pip:

pip install neosca

If you are in China and are having trouble with slow download speeds or network issues, you can use the Tsinghua University PyPI mirror to install NeoSCA:

pip install neosca -i https://pypi.tuna.tsinghua.edu.cn/simple

Install Dependencies

NeoSCA depends on Java, Stanford Parser, and Stanford Tregex. After installing NeoSCA, run nsca --check-depends to install them. Note that this command requires administrative privileges on Windows.
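
Before running the dependency check, you may want to see whether a Java runtime is already on your PATH. The sketch below uses only the Python standard library. The helper name `java_available` is hypothetical, and the check says nothing about Stanford Parser or Stanford Tregex.

```python
import shutil


def java_available(executable: str = "java") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if java_available():
    print("Java found on PATH.")
else:
    print("Java not found on PATH; consider running: nsca --check-depends")
```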

Basic Usage

To use NeoSCA, run the nsca command in your terminal, followed by the options and arguments you want to use.

Single Input

To analyze a single text file, use the command nsca followed by the file path.

nsca ./samples/sample1.txt
# frequency output: ./result.csv

A result.csv file will be generated in the current directory. You can specify a different output filename using -o.

nsca ./samples/sample1.txt -o sample1.csv
# frequency output: ./sample1.csv
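
If you drive nsca from a Python script, the command can be built as an argument list and handed to subprocess. This is a sketch under stated assumptions: the helper names `build_nsca_command` and `run_nsca` are hypothetical, and only the file argument and the -o option shown above are used.

```python
import subprocess
from typing import List, Optional


def build_nsca_command(input_path: str, output: Optional[str] = None) -> List[str]:
    """Build the argument list for analyzing one file, optionally with -o."""
    command = ["nsca", input_path]
    if output is not None:
        command += ["-o", output]
    return command


def run_nsca(input_path: str, output: Optional[str] = None) -> None:
    """Run nsca; requires NeoSCA and its dependencies to be installed."""
    subprocess.run(build_nsca_command(input_path, output), check=True)


print(build_nsca_command("./samples/sample1.txt", "sample1.csv"))
# ['nsca', './samples/sample1.txt', '-o', 'sample1.csv']
```

Passing a list rather than a single command string means each path is one argument, whatever characters it contains.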

When the filename of a text file includes spaces, enclose the file path in double quotes. Suppose you have a file named "sample 1.txt" to analyze:

nsca "./samples/sample 1.txt"

This ensures that the entire filename, including the spaces, is interpreted as a single argument. Without the double quotes, the shell would pass "./samples/sample" and "1.txt" as two separate arguments and the analysis would fail.
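
The effect of the quotes can be seen with Python's shlex module, which splits a command line using POSIX shell rules. This is only an illustration of how arguments are separated; it does not run nsca, and other shells (for example on Windows) may differ in detail.

```python
import shlex

# With double quotes, the path stays together as one argument.
quoted = shlex.split('nsca "./samples/sample 1.txt"')
print(quoted)    # ['nsca', './samples/sample 1.txt']

# Without quotes, the space splits the path into two arguments.
unquoted = shlex.split("nsca ./samples/sample 1.txt")
print(unquoted)  # ['nsca', './samples/sample', '1.txt']
```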

Multiple Input

To analyze multiple text files at once, simply list them after the nsca command.

nsca ./samples/sample1.txt ./samples/sample2.txt

You can also use wildcards to select multiple files at once.

nsca ./samples/sample*.txt     # every file whose name starts with "sample" and ends with ".txt"
nsca ./samples/sample[1-9].txt # sample1.txt, sample2.txt, ..., sample9.txt
nsca ./samples/sample1?.txt    # sample10.txt, sample11.txt, ..., sample19.txt
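
To preview which names a wildcard would select, the same shell-style patterns can be tried with Python's fnmatch module. The file names below are made up and the helper name `select` is hypothetical; this is a sketch of the pattern rules, not of how nsca resolves its arguments.

```python
from fnmatch import fnmatchcase
from typing import List


def select(pattern: str, names: List[str]) -> List[str]:
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


names = ["sample1.txt", "sample2.txt", "sample10.txt", "sample1.csv", "notes.txt"]

print(select("sample*.txt", names))      # ['sample1.txt', 'sample2.txt', 'sample10.txt']
print(select("sample[1-9].txt", names))  # ['sample1.txt', 'sample2.txt']
print(select("sample1?.txt", names))     # ['sample10.txt']
```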

Advanced Usage

Output Frequencies in JSON Format

You can generate a JSON file with --output-format json:

nsca ./samples/sample1.txt --output-format json
# frequency output: ./result.json

Or give the output file a .json extension:

nsca ./samples/sample1.txt -o sample1.json
# frequency output: ./sample1.json

Pass Text Through the Command Line

If you want to analyze text that is passed directly through the command line, you can use --text followed by the text.

nsca --text 'The quick brown fox jumps over the lazy dog.'
# frequency output: ./result.csv

Reserve Intermediate Results

To reserve the parsed trees, use -p or --reserve-parsed. To reserve matched subtrees, use -m or --reserve-matched.

nsca samples/sample1.txt -p
# frequency output: ./result.csv
# parsed trees:     ./samples/sample1.parsed
nsca samples/sample1.txt -m
# frequency output: ./result.csv
# matched subtrees: ./result_matches/
nsca samples/sample1.txt -p -m
# frequency output: ./result.csv
# parsed trees:     ./samples/sample1.parsed
# matched subtrees: ./result_matches/

Just Parse Text and Exit

If you only want to save the parsed trees and exit, you can use --no-query. This can be useful if you want to use the parsed trees for other purposes.

nsca samples/sample1.txt --no-query
# parsed trees: samples/sample1.parsed
nsca --text 'This is a test.' --no-query
# parsed trees: ./cmdline_text.parsed
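
As one example of reusing the parsed trees, the sketch below pulls (tag, word) pairs out of a bracketed tree. It assumes the .parsed file contains Penn Treebank-style bracketed trees, which is the usual Stanford Parser output but is not confirmed here. The sample tree and the helper name `tagged_words` are hypothetical.

```python
import re
from typing import List, Tuple

# A leaf looks like "(TAG word)": no nested parentheses, exactly one space.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def tagged_words(tree_text: str) -> List[Tuple[str, str]]:
    """Return (part-of-speech tag, word) pairs for every leaf in the text."""
    return LEAF.findall(tree_text)


# Hand-written sample tree, for illustration only.
sample_tree = "(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN test))) (. .)))"
print(tagged_words(sample_tree))
# [('DT', 'This'), ('VBZ', 'is'), ('DT', 'a'), ('NN', 'test'), ('.', '.')]
```

To apply it to a saved file, read the file's text first, for example with pathlib.Path("samples/sample1.parsed").read_text(encoding="utf-8").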

List Output Fields

If you are not sure what the output fields represent, you can use --list to print a list of all the available output fields.

nsca --list

W: words
S: sentences
VP: verb phrases
C: clauses
T: T-units
DC: dependent clauses
CT: complex T-units
CP: coordinate phrases
CN: complex nominals
MLS: mean length of sentence
MLT: mean length of T-unit
MLC: mean length of clause
C/S: clauses per sentence
VP/T: verb phrases per T-unit
C/T: clauses per T-unit
DC/C: dependent clauses per clause
DC/T: dependent clauses per T-unit
T/S: T-units per sentence
CT/T: complex T-unit ratio
CP/T: coordinate phrases per T-unit
CP/C: coordinate phrases per clause
CN/T: complex nominals per T-unit
CN/C: complex nominals per clause

Print the Help Message

If you call the nsca command without any arguments or options, it prints a help message.

Citing

If you use NeoSCA in your research, please cite as follows.

BibTeX:

@misc{tan2022neosca,
title        = {NeoSCA: A Rewrite of L2 Syntactic Complexity Analyzer, version 0.0.35},
author       = {Long Tan},
howpublished = {\url{https://github.com/tanloong/neosca}},
year         = {2022}
}

APA (7th edition):

Tan, L. (2022). NeoSCA (version 0.0.35) [Computer software]. GitHub. https://github.com/tanloong/neosca

MLA (9th edition):

Tan, Long. NeoSCA. version 0.0.35, GitHub, 2022, https://github.com/tanloong/neosca.

Please also cite Xiaofei Lu's article describing L2SCA.

BibTeX:

@article{lu2010automatic,
title     = {Automatic analysis of syntactic complexity in second language writing},
author    = {Xiaofei Lu},
journal   = {International Journal of Corpus Linguistics},
volume    = {15},
number    = {4},
pages     = {474--496},
year      = {2010},
publisher = {John Benjamins Publishing Company},
doi       = {10.1075/ijcl.15.4.02lu},
}

APA (7th edition):

Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474-496.

MLA (9th edition):

Lu, Xiaofei. "Automatic Analysis of Syntactic Complexity in Second Language Writing." International Journal of Corpus Linguistics, vol. 15, no. 4, John Benjamins Publishing Company, 2010, pp. 474-96.

License

NeoSCA is licensed under the GNU General Public License version 2 or later.
