S3 drivers #44

mgasvoda · 2018-04-10T20:25:07Z

No description provided.

jnelson16

Test passes on my laptop, so this is probably good to go. I still want to try running one of the state corpora through the new driver; I'll let you know how that goes.

jnelson16 · 2018-04-11T18:51:21Z

quantgov/corpora/structures.py

+        """ Filter paths based on index values. """
+        raise NotImplementedError
+
+    def gen_indces_and_paths(self):


jnelson16 · 2018-04-11T18:54:40Z

quantgov/corpora/__init__.py

@@ -5,5 +5,7 @@
    FlatFileCorpusDriver,
    RecursiveDirectoryCorpusDriver,
    NamePatternCorpusDriver,
-    IndexDriver
+    IndexDriver,


@OliverSherouse Is there a reason this is not called IndexCorpusDriver, to follow the pattern above?

Nope. Let's add that as a bug and rename for 1.0

Added as #46
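
A minimal sketch of how the rename discussed above could keep existing imports working, assuming backwards compatibility is wanted (the thread only says to rename for 1.0). Both classes here are hypothetical stand-ins, not quantgov's actual implementation, and the constructor arguments are illustrative.

```python
import warnings


class IndexCorpusDriver:
    """Hypothetical stand-in for the renamed driver (not quantgov's real class)."""

    def __init__(self, index, encoding='utf-8'):
        self.index = index
        self.encoding = encoding


class IndexDriver(IndexCorpusDriver):
    """Deprecated alias so that code importing the old name keeps working."""

    def __init__(self, *args, **kwargs):
        # Warn at the caller's location, then defer to the renamed class.
        warnings.warn(
            'IndexDriver is deprecated; use IndexCorpusDriver instead',
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

With this shape, the old name could be dropped in a later release once callers have had a chance to migrate.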

jnelson16 · 2018-04-11T19:01:48Z

tests/test_corpora.py

+        rows.append((letter, number, path))
+    index_path = directory.join('index.csv')
+    with index_path.open('w', encoding='utf-8') as outf:
+        outf.write(u'letter,number,path\n')


Should we be using csv.writer's writerows method for this? Or is this a more efficient way for testing purposes? @OliverSherouse

Not more efficient, particularly, though not really a problem, either.
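
For comparison, a rough sketch of what the csv.writer alternative raised above might look like. The `write_index` helper and the sample rows are made up for illustration and are not part of the PR; the main practical difference from joining with commas by hand is that fields containing a comma get quoted.

```python
import csv
import io


def write_index(outf, rows):
    """Write a header plus (letter, number, path) rows to an open text file."""
    # lineterminator='\n' matches the newline style used in the test fixture.
    writer = csv.writer(outf, lineterminator='\n')
    writer.writerow(('letter', 'number', 'path'))
    writer.writerows(rows)


# An in-memory buffer stands in for the index file here.
buffer = io.StringIO()
write_index(buffer, [('a', '1', 'a/1.txt'), ('b', '2', 'dir, with comma/2.txt')])
print(buffer.getvalue())
```

When writing to a real file, the csv documentation recommends opening it with `newline=''`.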

jnelson16 · 2018-04-11T19:45:31Z

I have an S3Driver working with the Wyoming (or as @OliverSherouse calls it, Wisconsin) corpus!

OliverSherouse

Fix small bugs and a few questions.

OliverSherouse · 2018-04-13T16:07:04Z

tests/test_corpora.py

+        rows.append((letter, number, path))
+    index_path = directory.join('index.csv')
+    with index_path.open('w', encoding='utf-8') as outf:
+        outf.write(u'letter,number,path\n')


Why do we have u strings? This ain't 2007, we're not writing python 2!
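
For context, a small check of why the prefix is unnecessary: in Python 3.3 and later the `u` prefix is still accepted but has no effect, so both literals below are the same `str`. The variable names are illustrative only.

```python
# The header line from the test, with and without the u prefix.
header_with_prefix = u'letter,number,path\n'
header_plain = 'letter,number,path\n'

# Both literals produce an ordinary str with identical contents.
same_value = header_with_prefix == header_plain
same_type = type(header_with_prefix) is type(header_plain) is str
```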

OliverSherouse · 2018-04-13T16:07:44Z

tests/test_corpora.py

+        outf.write(u'letter,number,path\n')
+        outf.write(u'\n'.join(','.join(row) for row in rows))
+    return quantgov.corpora.S3Driver(str(index_path),
+                                     bucket='quantgov-databanks')


Will people outside the core dev team be able to run these tests?

As long as they have AWS credentials for boto, I believe so, since the bucket is public.

* Inaugurated 0.4.0 dev series * Sentiment analysis (#33) Closes #11 #12 #13 and adds Sentiment analysis! * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * if it aint broke... * textblob sentiment * tests and error raising * fixed install req * pep8 fixes * code review updates * fix travis file * import fixes * small fix * Test corpora (#35) * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * new corpora in English!! * hotfix to add timestamp as corpus identifier * Skl compatibility (#41) * Add sklearn 0.17 compatibility Paper over library reorganization. * renamed corpora to corpus, added deprecation warning (#42) * renamed corpora to corpus, added deprecation warning * moved load_driver and set up for future forcing of full imports of submodules Closes #31 * S3 drivers (#44) * initial working commit for s3 driver and database driver * removing 3.6 formatting * adding extra requirements list * adding basic s3 driver test * Removing unnecessary function * This ain't 2007 * test updates * adding s3driver to new corpus structure * Rounding (#45) * bumped version

* Inaugurated 0.4.0 dev series * Sentiment analysis (#33) Closes #11 #12 #13 and adds Sentiment analysis! * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * if it aint broke... * textblob sentiment * tests and error raising * fixed install req * pep8 fixes * code review updates * fix travis file * import fixes * small fix * Test corpora (#35) * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * if it aint broke... * new corpora in English!! * hotfix to add timestamp as corpus identifier * Skl compatibility (#41) * Add sklearn 0.17 compatibility Paper over library reorganization. * renamed corpora to corpus, added deprecation warning (#42) * renamed corpora to corpus, added deprecation warning * moved load_driver and set up for future forcing of full imports of submodules Closes #31 * S3 drivers (#44) * initial working commit for s3 driver and database driver * removing 3.6 formatting * adding extra requirements list * adding basic s3 driver test * Removing unnecessary function * This ain't 2007 * test updates * adding s3driver to new corpus structure * Rounding (#45) * bumped version * Fix NLTK loading bug Fix evaluation order when NLTK is not present

* hotfix to add timestamp as corpus identifier (#39) * bumped version * Release 0.4 (#47) * Inaugurated 0.4.0 dev series * Sentiment analysis (#33) Closes #11 #12 #13 and adds Sentiment analysis! * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * if it aint broke... * textblob sentiment * tests and error raising * fixed install req * pep8 fixes * code review updates * fix travis file * import fixes * small fix * Test corpora (#35) * complexity * complexity builtins * complexity builtins with tests * code review updates * option tests * added nltk requirement in setup.py * add pip install to .travis.yml * nltk fixes * another nltk fix * last nltk fix? * you know the drill * Update .travis.yml * nltk troubles * some final cleanup * new corpora in English!! * hotfix to add timestamp as corpus identifier * Skl compatibility (#41) * Add sklearn 0.17 compatibility Paper over library reorganization. * renamed corpora to corpus, added deprecation warning (#42) * renamed corpora to corpus, added deprecation warning * moved load_driver and set up for future forcing of full imports of submodules Closes #31 * S3 drivers (#44) * initial working commit for s3 driver and database driver * removing 3.6 formatting * adding extra requirements list * adding basic s3 driver test * Removing unnecessary function * This ain't 2007 * test updates * adding s3driver to new corpus structure * Rounding (#45) * bumped version * Fix NLTK loading bug Fix evaluation order when NLTK is not present

mgasvoda added 3 commits April 10, 2018 15:41

initial working commit for s3 driver and database driver

bcf292e

removing 3.6 formatting

11b0e2f

adding extra requirements list

7da72f3

mgasvoda requested review from OliverSherouse and jnelson16 April 10, 2018 20:25

adding basic s3 driver test

f691889

jnelson16 reviewed Apr 11, 2018

mgasvoda mentioned this pull request Apr 13, 2018

Rename index driver #46

Open

OliverSherouse reviewed Apr 13, 2018

mgasvoda and others added 5 commits April 13, 2018 12:10

Removing unnecessary function

2fdc803

This ain't 2007

c4641f2

merging upstream changes

f9cd65e

test updates

a46a22d

adding s3driver to new corpus structure

6ec8732

OliverSherouse approved these changes Apr 13, 2018

OliverSherouse merged commit ebcb7d2 into dev Apr 13, 2018

OliverSherouse deleted the s3_drivers branch April 13, 2018 17:33


