change to numpy

MIFA-Lab · Jun 6, 2022 · 2f0f231
1 parent cf91563
commit 2f0f231

Showing 9 changed files with 150 additions and 191 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+
+.DS_Store
diff --git a/README.md b/README.md
@@ -1,73 +1,68 @@
-# Contents
-
-- [LDP LinUCB Description](#ldp-linucb-description)
-- [Model Architecture](#model-architecture)
-- [Dataset](#dataset)
-- [Environment Requirements](#environment-requirements)
-- [Script Description](#script-description)
-    - [Script and Sample Code](#script-and-sample-code)
-    - [Script Parameters](#script-parameters)
-    - [Launch](#launch)
-- [Model Description](#model-description)
-    - [Performance](#performance)
-- [Description of Random Situation](#description-of-random-situation)
-- [ModelZoo Homepage](#modelzoo-homepage)
-
-# [LDP LinUCB Description](#contents)
-
-[![Platform](https://img.shields.io/badge/platform-mindspore-blue)](https://www.mindspore.cn/install/en)
+# LDP LinUCB
+
+[![Platform](https://img.shields.io/badge/platform-numpy-blue)](https://numpy.org/install)
 [![Top Language](https://img.shields.io/github/languages/top/huang-research-group/LDPbandit2020)](https://github.com/huang-research-group/LDPbandit2020/search?l=python)
 [![Latest Release](https://img.shields.io/github/v/release/huang-research-group/LDPbandit2020)](https://github.com/huang-research-group/LDPbandit2020/releases)
 
-Locally Differentially Private (LDP) LinUCB is a variant of LinUCB bandit algorithm with local differential privacy guarantee, which can preserve users' personal data with theoretical guarantee.
 
-[Paper](https://arxiv.org/abs/2006.00701):  Kai Zheng, Tianle Cai, [Weiran Huang](https://www.weiranhuang.com), Zhenguo Li, Liwei Wang. "Locally Differentially Private (Contextual) Bandits Learning." *Advances in Neural Information Processing Systems*. 2020.
+## Description
+
+Locally Differentially Private (LDP) LinUCB is a variant of the LinUCB bandit algorithm with a local differential privacy guarantee, which protects users' personal data with theoretical guarantees.
 
-# [Model Architecture](#contents)
+The server interacts with users in rounds. For each arriving user, the server first sends the current model parameters to the user. On the user side, the model chooses an action to play based on the user's features (e.g., choosing a movie to recommend) and observes a reward (or loss) value from the user (e.g., the rating of the movie). The data to be sent back is then perturbed by adding Gaussian noise. Finally, the server receives the perturbed data and updates the model. Details can be found in the [paper](https://arxiv.org/abs/2006.00701).
 
-The server interacts with users in rounds. For a coming user, the server first transfers the current model parameters to the user. In the user side, the model chooses an action based on the user feature to play (e.g., choose a movie to recommend), and observes a reward (or loss) value from the user (e.g., rating of the movie). Then we perturb the data to be transferred by adding Gaussian noise. Finally, the server receives the perturbed data and updates the model. Details can be found in the [original paper](https://arxiv.org/abs/2006.00701).
+Paper:  Kai Zheng, Tianle Cai, [Weiran Huang](https://www.weiranhuang.com), Zhenguo Li, Liwei Wang, "[Locally Differentially Private (Contextual) Bandits Learning](https://arxiv.org/abs/2006.00701)", *Advances in Neural Information Processing Systems*, 2020.
 
-# [Dataset](#contents)
+Note: An earlier MindSpore-based version can be found in [MindSpore Models (Gitee)](https://gitee.com/mindspore/models/tree/master/research/rl/ldp_linucb) or [v1.0.0](https://github.com/huang-research-group/LDPbandit2020/tree/v1.0.0). 
 
-Note that you can run the scripts based on the dataset mentioned in original paper. In the following sections, we will introduce how to run the scripts using the related dataset below.
+## Dataset
 
-Dataset used: [MovieLens 100K](https://grouplens.org/datasets/movielens/100k/)
+Dataset used: [MovieLens 100K](https://grouplens.org/datasets/movielens/100k/) ([download](https://files.grouplens.org/datasets/movielens/ml-100k.zip))
 
 - Dataset size：5MB, 100,000 ratings (1-5) from 943 users on 1682 movies.
 - Data format：csv/txt files
 
-# [Environment Requirements](#contents)
+We process the dataset with `src/dataset.py`:
+We first pick out all the users who have at least one rating score.
+SVD is then applied to fill in the missing ratings, yielding a full rating table.
+Finally, we normalize all the ratings to [-1,1].
 
-- Hardware (Ascend/GPU)
-    - Prepare hardware environment with Ascend or GPU processor.
-- Framework
-    - [MindSpore](https://www.mindspore.cn/install/en)
-- For more information, please check the resources below：
-    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/en/master/index.html)
-  - [MindSpore Python API](https://www.mindspore.cn/docs/api/en/master/index.html)
 
-# [Script Description](#contents)
+## Installation
+
+Unzip the MovieLens dataset and place `ua.base` in the code directory.
+Then run the following commands:
+
+```bash
+python -m venv venv                 # create a virtual environment named venv
+source venv/bin/activate            # activate the environment
+pip install -r requirements.txt     # install the dependencies
+```
+
+Code is tested in the following environment:
+- numpy==1.21.6
+- matplotlib==3.5.2
 
-## [Script and Sample Code](#contents)
+
+## Script and Sample Code
 
 ```console
-├── model_zoo
-    ├── README.md                                // descriptions about all the models
-    ├── research
-        ├── rl
-            ├── ldp_linucb
-                ├── README.md                    // descriptions about LDP LinUCB
-                ├── scripts
-                │   ├── run_train_eval.sh        // shell script for running on Ascend
-                ├── src
-                │   ├── dataset.py               // dataset for movielens
-                │   ├── linucb.py                // model
-                ├── train_eval.py                // training script
-                ├── result1.png                  // experimental result
-                ├── result2.png                  // experimental result
+├── LDPbandit2020
+    ├── ua.base                  // downloaded data file
+    ├── README.md                // descriptions about the repo
+    ├── requirements.txt         // dependencies
+    ├── scripts
+        ├── run_train_eval.sh    // shell script for training and evaluation
+    ├── src
+        ├── dataset.py           // dataset processing for movielens
+        ├── linucb.py            // model
+    ├── train_eval.py            // training and evaluation script
+    ├── result1.png              // experimental result
+    ├── result2.png              // experimental result
 ```
 
-## [Script Parameters](#contents)
+
+## Script Parameters
 
 - Parameters for preparing MovieLens 100K dataset
 
@@ -85,52 +80,66 @@ Dataset used: [MovieLens 100K](https://grouplens.org/datasets/movielens/100k/)
   'iter_num': 1e6           # number of iterations
   ```
 
-## [Launch](#contents)
 
-- running on Ascend
+## Usage
 
-  ```shell
-  python train_eval.py > result.log 2>&1 &
+  ```bash
+  python train_eval.py --epsilon=8e5 --delta=1e-1 --alpha=1e-1
   ```
 
-The python command above will run in the background, you can view the results through the file `result.log`.
-
 The regret value will be achieved as follows:
 
 ```console
---> Step: 0, diff: 348.662, current_regret: 0.000, cumulative regret: 0.000
---> Step: 1, diff: 338.457, current_regret: 0.000, cumulative regret: 0.000
---> Step: 2, diff: 336.465, current_regret: 2.000, cumulative regret: 2.000
---> Step: 3, diff: 327.337, current_regret: 0.000, cumulative regret: 2.000
---> Step: 4, diff: 325.039, current_regret: 2.000, cumulative regret: 4.000
+--> Step: 0, diff: 350.346, current regret: 0.000, cumulative regret: 0.000
+--> Step: 1, diff: 344.916, current regret: 0.400, cumulative regret: 0.400
+--> Step: 2, diff: 340.463, current regret: 0.000, cumulative regret: 0.400
+--> Step: 3, diff: 344.849, current regret: 0.800, cumulative regret: 1.200
+--> Step: 4, diff: 337.587, current regret: 0.000, cumulative regret: 1.200
 ...
+--> Step: 999997, diff: 54.873, current regret: 0.000, cumulative regret: 962.400
+--> Step: 999998, diff: 54.873, current regret: 0.000, cumulative regret: 962.400
+--> Step: 999999, diff: 54.873, current regret: 0.000, cumulative regret: 962.400
+Regret: 962.3999795913696, cost time: 562.508s
+Theta: [64.96814  26.639004  21.260265  19.860786  18.405128  16.73249  15.778397  14.784237  13.298004  12.329174  12.149574  11.159462  10.170071  9.662151  8.269745  7.794155  7.3355427  7.3690567  5.790653  3.9999294]
+Ground-truth theta: [88.59274289  28.4110571  22.59921103  21.77239171  20.3727694  19.27781873  17.40422888  16.8321811  15.52599173  14.62141299  14.21670515  12.55781785  11.29962158  10.97902155  10.32499178  9.33040444  8.88399318  8.28387461  6.86420729  4.47880342]
 ```
 
-# [Model Description](#contents)
 
-The [original paper](https://arxiv.org/abs/2006.00701) assumes that the norm of user features is bounded by 1 and the norm of rating scores is bounded by 2. For the MovieLens dataset, we normalize rating scores to [-1,1]. Thus, we set `sigma` in Algorithm 5 to be $4/\epsilon \times \sqrt{2 \times ln(1.25/\delta)}$.
+## Algorithm Modification
+
+The [original paper](https://arxiv.org/abs/2006.00701) assumes that the norm of user features is bounded by 1 and the norm of rating scores is bounded by 2. For the MovieLens dataset, we normalize rating scores to [-1,1]. Thus, we set `sigma` in Algorithm 5 to $4/\epsilon \cdot \sqrt{2  \ln(1.25/\delta)}$.
 
-## [Performance](#contents)
+
+## Performance
 
 The performance for different privacy parameters:
 
-- x: number of iterations
-- y: cumulative regret
+- X axis: number of iterations
+- Y axis: cumulative regret
 
 ![Result1](result1.png)
 
-The performance compared with optimal non-private regret O(sqrt(T)):
+The performance compared with optimal non-private regret $O(\sqrt{T})$:
 
-- x: number of iterations
-- y: cumulative regret divided by sqrt(T)
+- X axis: number of iterations
+- Y axis: cumulative regret divided by $\sqrt{T}$
 
 ![Result2](result2.png)
 
-# [Description of Random Situation](#contents)
+The results show that the privacy-preserving performance is close to the optimal non-private performance.
+
 
-In `train_eval.py`, we randomly sample a user at each round. We also add Gaussian noise to the date being transferred.
+## Citation
 
-# [ModelZoo Homepage](#contents)
+If you find our work useful in your research, please consider citing:
 
-Please check the official
-[homepage](https://gitee.com/mindspore/models).
+```
+@article{zheng2020locally,
+  title={Locally differentially private (contextual) bandits learning},
+  author={Zheng, Kai and Cai, Tianle and Huang, Weiran and Li, Zhenguo and Wang, Liwei},
+  journal={Advances in Neural Information Processing Systems},
+  volume={33},
+  pages={12300--12310},
+  year={2020}
+}
+```
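
Note on the `sigma` setting in the README's "Algorithm Modification" section: the formula $4/\epsilon \cdot \sqrt{2 \ln(1.25/\delta)}$ can be checked numerically with a few lines of Python. This is a minimal sketch, not the code from `src/linucb.py`; the function name `gaussian_sigma` and its `sensitivity` argument are hypothetical, and only the constant 4 and the formula itself come from the README.

```python
import math


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 4.0) -> float:
    """Noise scale sigma = (sensitivity / epsilon) * sqrt(2 * ln(1.25 / delta)).

    `gaussian_sigma` and `sensitivity` are illustrative names (assumptions);
    the default 4.0 is the constant stated in the README formula.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity / epsilon * math.sqrt(2.0 * math.log(1.25 / delta))


# Privacy parameters taken from the usage example in the README.
sigma = gaussian_sigma(epsilon=8e5, delta=1e-1)
print(f"sigma = {sigma:.3e}")
```

With the README's example parameters the noise scale is on the order of 1e-5, so the perturbation is very small in that particular run; smaller values of `epsilon` give proportionally larger noise.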
diff --git a/requirements.txt b/requirements.txt
diff --git a/result1.png b/result1.png
diff --git a/result2.png b/result2.png
diff --git a/scripts/run_train_eval.sh b/scripts/run_train_eval.sh
@@ -1,17 +1,17 @@
 #!/bin/bash
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
 
-python3 train_eval.py > result.log 2>&1 & 
+# parameters
+epsilon=8e5
+delta=1e-1
+alpha=1e-1
+iter_num=1e6
+timestamp=`date '+%s'`
+
+# preparation
+SHELL_FOLDER=$(cd "$(dirname "$0")";pwd)
+cd "${SHELL_FOLDER}/../"
+if [ ! -d "./results" ]; then
+    mkdir ./results
+fi
+
+python train_eval.py --epsilon=${epsilon} --delta=${delta} --alpha=${alpha} --iter_num=${iter_num}| tee ./results/"${timestamp}_epsilon_${epsilon}_delta_${delta}_alpha_${alpha}_iter_num_${iter_num}".txt
diff --git a/src/dataset.py b/src/dataset.py
@@ -1,24 +1,10 @@
-# Copyright 2020 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
 """
 MovieLens Environment.
+Datasets can be downloaded at https://files.grouplens.org/datasets/movielens/ml-100k.zip
 """
 
 import random
 import numpy as np
-from mindspore import Tensor
 
 _MAX_NUM_ACTIONS = 1682
 _NUM_USERS = 943
@@ -58,7 +44,7 @@ def __init__(self, data_file, num_movies, rank_k):
         self._data_matrix = load_movielens_data(data_file)
         # Keep only the first items
         self._data_matrix = self._data_matrix[:, :num_movies]
-        # Filter the users with at least one rating score
+        # Pick out the users with at least one rating score
         nonzero_users = list(
             np.nonzero(
                 np.sum(
@@ -92,8 +78,8 @@ def observation(self):
         """random select a user and return its feature."""
         sampled_user = random.randint(0, self._data_matrix.shape[0] - 1)
         self._current_user = sampled_user
-        return Tensor(self._feature[sampled_user])
+        return self._feature[sampled_user]
 
     def current_rewards(self):
         """rewards for current user."""
-        return Tensor(self._approx_ratings_matrix[self._current_user])
+        return self._approx_ratings_matrix[self._current_user]
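
Note on the preprocessing that the README attributes to `src/dataset.py` (keep users with at least one rating, fill in missing ratings via SVD, normalize to [-1,1]): the sketch below shows one way these three steps could look in plain NumPy. It is an illustration under stated assumptions, not the repository's implementation. The function name `complete_and_normalize` is hypothetical, the completion is assumed to be a rank-`rank_k` truncated SVD, and the min-max scaling is an assumed normalization scheme, since the diff does not show how the normalization is done.

```python
import numpy as np


def complete_and_normalize(ratings: np.ndarray, rank_k: int) -> np.ndarray:
    """Hypothetical sketch: low-rank completion followed by scaling to [-1, 1]."""
    # Step 1: keep only users (rows) that have at least one rating.
    nonzero_users = np.nonzero(np.sum(ratings, axis=1))[0]
    kept = ratings[nonzero_users]

    # Step 2 (assumption): approximate the table with a rank-k truncated SVD,
    # which assigns a value to every (user, movie) pair.
    u, s, vt = np.linalg.svd(kept, full_matrices=False)
    approx = (u[:, :rank_k] * s[:rank_k]) @ vt[:rank_k]

    # Step 3 (assumption): min-max scale the completed table to [-1, 1].
    lo, hi = approx.min(), approx.max()
    if hi == lo:
        return np.zeros_like(approx)
    return 2.0 * (approx - lo) / (hi - lo) - 1.0


# Toy 4-user x 5-movie table; 0 marks a missing rating, user 1 has none.
toy = np.array(
    [
        [5, 0, 3, 0, 1],
        [0, 0, 0, 0, 0],
        [4, 2, 0, 1, 0],
        [0, 5, 4, 0, 2],
    ],
    dtype=float,
)
full = complete_and_normalize(toy, rank_k=2)
print(full.shape)
```

Because the result is an ordinary `numpy.ndarray`, rows of it can be returned directly, which matches the change in this diff where `observation` and `current_rewards` stop wrapping their return values in a MindSpore `Tensor`.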