
Mailing list migration

Eric Larson edited this page Dec 11, 2020 · 17 revisions

Follow these steps:

1. In Ubuntu host

git clone https://github.com/discourse/discourse.git
cd discourse
d/boot_dev --init
d/rails db:migrate RAILS_ENV=development
printf "\n\ngem 'sqlite3'" >> Gemfile
d/bundle
d/shell
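
If step 1 is repeated in an existing checkout, the `printf ... >> Gemfile` line appends another `gem 'sqlite3'` entry each time. A minimal sketch of an append that is safe to re-run is shown below; `append_once` is a hypothetical helper name, not part of Discourse, and the demo writes to a scratch file rather than the real Gemfile.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line is
# already present. (Hypothetical helper, for illustration only.)
append_once() {
    local file="$1" line="$2"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '\n%s\n' "$line" >> "$file"
}

# Demo on a scratch file instead of the real Gemfile.
scratch=$(mktemp)
append_once "$scratch" "gem 'sqlite3'"
append_once "$scratch" "gem 'sqlite3'"   # second call changes nothing
grep -c "sqlite3" "$scratch"             # number of matching lines
rm -f "$scratch"
```

In the discourse root this would be called as `append_once Gemfile "gem 'sqlite3'"`.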

2. In docker shell

# follow the import/list_name hierarchy from
# https://meta.discourse.org/t/importing-mailing-lists-mbox-listserv-google-groups-emails/79773#1-5-prepare-files

sudo mkdir -p /shared/import/data
sudo chown -R discourse:discourse /shared/import
wget -r -l1 --no-parent --no-directories "https://mail.nmr.mgh.harvard.edu/pipermail//mne_analysis/" -P /shared/import/data/mne_analysis -A "*-*.txt.gz"
rm /shared/import/data/mne_analysis/robots.txt.tmp
gzip -d /shared/import/data/mne_analysis/*.txt.gz
wget https://gist.githubusercontent.com/larsoner/940cd6c7100b87c4c5668cb0bc540afb/raw/9e78513620d11355ad0e10f4a2470996c26ebc8c/mailmanToMBox.py -O ~/mailmanToMBox.py
python3 ~/mailmanToMBox.py /shared/import/data/mne_analysis/
rm /shared/import/data/mne_analysis/*.txt
sudo apt install -y libsqlite3-dev

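# check results: split the combined mbox into one file per message and count them
cat /shared/import/data/mne_analysis/*.mbox > ~/all.mbox
sudo apt install -y procmail  # provides formail
mkdir -p ~/split
# formail sets FILENO to the current message number while splitting;
# presetting it fixes the start value and the zero-padded width
export FILENO=0000
formail -ds sh -c 'cat > ~/split/msg.$FILENO' < ~/all.mbox
ls ~/split | wc -l  # number of messages found
rm -rf ~/split ~/all.mbox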

# settings
wget https://raw.githubusercontent.com/discourse/discourse/master/script/import_scripts/mbox/settings.yml -O /shared/import/settings.yml
printf "\n\n\"mne_analysis\": \"[Mne_analysis]\"" >> /shared/import/settings.yml  # remove [Mne_analysis], turn into tag

# run it
cd /src
bundle exec ruby script/import_scripts/mbox.rb /shared/import/settings.yml
# example output:
Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

creating index
indexing files in /shared/import/data/mne_analysis
indexing /shared/import/data/mne_analysis/2018-March.mbox
...
indexing /shared/import/data/mne_analysis/2007-December.mbox

indexing replies and users

creating categories
        1 / 1 (100.0%)  [17497813 items/min]  
creating users

creating topics and posts
     7373 / 7373 (100.0%)  [1360 items/min]  

Updating topic status

Updating bumped_at on topics

Updating last posted at on users

Updating last seen at on users

Updating first_post_created_at...

Updating user post_count...

Updating user topic_count...

Updating topic users

Updating post timings

Updating featured topic users

Updating featured topics in categories
        5 / 5 (100.0%)  [6890 items/min]
Resetting topic counters


Done (00h 06min 21sec)
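
As a rough cross-check of the "creating topics and posts" count in the log above, the messages in the `.mbox` files can be counted by their `From ` separator lines. This is only a sketch: `count_mbox_messages` is a hypothetical helper name, it assumes body lines beginning with `From ` are escaped as `>From ` (otherwise it overcounts), and the importer's total may not match the raw message count exactly. The demo data is made up and is not the real archive.

```shell
# count_mbox_messages DIR: print the number of "From " separator lines in
# each DIR/*.mbox file, followed by the total. (Hypothetical helper.)
count_mbox_messages() {
    local dir="$1" total=0 n f
    for f in "$dir"/*.mbox; do
        [ -e "$f" ] || continue               # no .mbox files in DIR
        n=$(grep -c '^From ' "$f" || true)    # grep exits 1 on zero matches
        printf '%6d  %s\n' "$n" "$(basename "$f")"
        total=$((total + n))
    done
    printf '%6d  total\n' "$total"
}

# Tiny demo with made-up messages: two in January, one in February.
demo=$(mktemp -d)
printf 'From a@example.com Mon Jan  1 00:00:00 2018\nSubject: one\n\nhello\n\nFrom b@example.com Tue Jan  2 00:00:00 2018\nSubject: two\n\n>From here on, escaped\n' > "$demo/2018-January.mbox"
printf 'From c@example.com Thu Feb  1 00:00:00 2018\nSubject: three\n\nbye\n' > "$demo/2018-February.mbox"
count_mbox_messages "$demo"
rm -rf "$demo"
```

In the docker shell this would be run as `count_mbox_messages /shared/import/data/mne_analysis`.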

3. In Ubuntu host (hopefully)

d/unicorn  # keeps running in the foreground
google-chrome http://0.0.0.0:9292  # from a second terminal

Done!

3.5 Redoing the import

To wipe and start over, run the following from the discourse root on the Ubuntu host. The Ruby commands have to execute on the host rather than on the dev docker instance, because the docker instance does not have permission to wipe the database:

rm -R tmp/*
rm -R log/*
# sudo apt install ruby-dev libsqlite3-dev libpq-dev redis-tools  # only needs to be done once
# bundle install --path vendor/bundle  # only needs to be done once

Note that discourse/Gemfile in the dev root directory on the Ubuntu host is the same as /src/Gemfile on the dev docker instance, so this effectively duplicates the docker instance's environment on the host.

Note that the import can also be repeated without wiping the old instance; in that case only new posts will be added!

4. Exporting the database for the Discourse people to host

TBD