<?xml version="1.0" encoding="utf-8"?>
<search>
  
    
    <entry>
      <title><![CDATA[Multi-Account Dial-Up and Campus Network Sharing on One Machine with Docker]]></title>
      <url>%2F2017%2F09%2F26%2F%E5%9F%BA%E4%BA%8EDocker%E5%AE%9E%E7%8E%B0%E5%8D%95%E6%9C%BA%E5%A4%9A%E8%B4%A6%E5%8F%B7%E6%8B%A8%E5%8F%B7%E4%B8%94%E5%85%B1%E4%BA%AB%E6%A0%A1%E5%9B%AD%E7%BD%91%2F</url>
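      <content type="text"><![CDATA[Our university currently enforces real-name network access with one account per person. To make sharing possible, this post explains how to dial in with several campus network accounts at once through Docker on Ubuntu 16.04, and how to share each connection. In case the use is not obvious: every Docker node can serve as a shared campus network connection for several people.

1. Install or upgrade Docker
The following command installs Docker. It also upgrades Docker, provided the existing installation was made with this same command.
curl -sSL http://acs-public-mirror.oss-cn-hangzhou.aliyuncs.com/docker-engine/internet | sh -
A Docker container can be thought of as a lightweight virtual machine that is driven from the command line and has no graphical interface.

2. Use a Docker registry mirror
Edit the daemon configuration file /etc/docker/daemon.json to use a mirror:
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json &lt;&lt;-&apos;EOF&apos;
&#123;
  &quot;registry-mirrors&quot;: [&quot;https://xxxxxxxx.mirror.aliyuncs.com&quot;]
&#125;
EOF
sudo service docker restart

3. Start a Docker node
docker run -itd --privileged --name test ubuntu /bin/bash
The --privileged flag is required so that system files can be configured inside the container later. --name sets the name of the Docker node.

4. Create the bridge
The bridge is the key to dialing several accounts from one machine. Think of it as a switch: a virtual switch is placed in front of the host's network card, and any number of Docker nodes can then be attached behind it, each one dialing with its own account.
docker start test;
pipework br0 test 172.18.9.134/24@172.18.9.129
ip addr add 172.18.9.140/24 dev br0; \
ip addr del 172.18.9.140/24 dev enp3s0; \
brctl addif br0 enp3s0; \
route add default gw 172.18.9.129 dev br0;\
ip addr add 172.18.9.141/24 dev enp3s0
Here 172.18.9.140 is the network card's original IP, which now becomes the bridge IP. 172.18.9.129 is the gateway, 172.18.9.141 is the network card's new IP, and 172.18.9.134 is the Docker node's IP, so the Docker node sits in the same subnet as the physical card. enp3s0 is the name of the network card and br0 is the name of the bridge.

5. Set up a pptpd VPN server
Enter the Docker node with docker attach, then install pptpd inside the Ubuntu container:
sudo apt-get install pptpd
Configure the IP pool for VPN clients:
vim /etc/pptpd.conf
# append at the end of the file (any private range you like)
localip 10.100.55.1
remoteip 10.100.55.100-200
Configure the account and password:
vim /etc/ppp/chap-secrets
# format: username pptpd password *
Configure DNS:
vim /etc/ppp/pptpd-options
ms-dns 114.114.114.114
ms-dns 8.8.8.8
Enable IP forwarding:
vim /etc/sysctl.conf
# uncomment: net.ipv4.ip_forward = 1
# save and quit with :wq, then run: sysctl -p
Restart the pptpd service, then configure NAT:
iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
eth1 is the name of the network interface. Set the MTU for incoming VPN connections:
vim /etc/ppp/ip-up
# append at the end:
ifconfig $1 mtu 1500
Finally:
service pptpd restart]]></content>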
    </entry>

    
    <entry>
      <title><![CDATA[Disease-Level-Recognition]]></title>
      <url>%2F2017%2F07%2F18%2FDisease-Level-Recognition%2F</url>
      <content type="text"><![CDATA[This post uses Keras to build a deep convolutional neural network that grades the severity of lesions in eye-disease images. Accuracy on the validation set reaches 100%. A GPU is recommended for running the project. The project uses Keras 1.2.2; with a newer version, some function names and arguments may differ slightly.

Lesion severity recognition
The dataset is private data provided by my advisor; contact me privately if you need it. The directory layout is as follows:
➜ ls data/1 | head
1 (1).jpg
1 (2).jpg
1 (3).jpg
1 (4).jpg
➜ ls data/2 | head
2 (1).jpg
2 (2).jpg
2 (3).jpg
2 (4).jpg
2 (5).jpg
2 (6).jpg
……
Source code: https://github.com/Silencezjl/Disease-Level-Recognition

Data preprocessing and augmentation
As the listing shows, the dataset is very small, only 80 images, and the classes are unbalanced, so the data has to be expanded. We augment it with a series of random transformations so that the model never sees exactly the same image twice. This helps suppress overfitting and improves generalization.
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

for i in range(7):
    pic_name = '6 (' + str(i+1) + ')'
    datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')
    img = load_img('data/6/' + pic_name + '.jpg')  # this is a PIL image
    x = img_to_array(img)  # this is a 3-D Numpy array holding one image
    x = x.reshape((1,) + x.shape)  # add a leading batch dimension of size 1
    # the .flow() command below generates batches of randomly transformed images
    # and saves the results to the `data/6` directory
    count = 0
    for batch in datagen.flow(x, batch_size=1, save_to_dir='data/6', save_prefix='gen_eye'+pic_name, save_format='jpeg'):
        count += 1
        if count &gt; 20:
            break  # otherwise the generator would loop indefinitely
The result looks like the figure below. (The eye images are rather graphic, so a cat picture is used for illustration.)

Exporting feature vectors
Pretrained networks are the natural choice for this problem, and combining several different models is an effective way to get good results, much like listening to more than one opinion. If the fully connected layers were attached directly to a huge network, training for 10 epochs would mean running the huge network ten times, and since its convolutional layers are frozen, that computation would be wasted. Instead, the feature vectors produced by each network are saved first and reused for training. Once the feature vectors are saved, training runs comfortably even on an ordinary laptop.
from keras.models import *
from keras.layers import *
from keras.applications import *
from keras.preprocessing.image import *
import h5py

def write_gap(function_name, MODEL, image_size, lambda_func=None):
    width = image_size[0]
    height = image_size[1]
    input_tensor = Input((height, width, 3))
    x = input_tensor
    if lambda_func:
        x = Lambda(lambda_func)(x)
    base_model = MODEL(input_tensor=x, weights='imagenet', include_top=False)
    model = Model(base_model.input, GlobalAveragePooling2D()(base_model.output))

    gen = ImageDataGenerator()
    train_generator = gen.flow_from_directory("data", image_size, shuffle=False, batch_size=64)
    train = model.predict_generator(train_generator, train_generator.nb_sample)

    # open in write mode so the file is created regardless of the h5py default
    with h5py.File("gap_%s.h5" % function_name, 'w') as h:
        h.create_dataset("train", data=train)
        h.create_dataset("label", data=train_generator.classes)
        # h.create_dataset("label_map", data=train_generator.class_indices)

write_gap('ResNet50', ResNet50, (224, 224))
write_gap('InceptionV3', InceptionV3, (299, 299), inception_v3.preprocess_input)
write_gap('Xception', Xception, (299, 299), xception.preprocess_input)
Wrapping this in a function makes the code reusable. The function takes the model, the input image size, and a preprocessing function, because Xception and Inception V3 both need their input scaled to the range (-1, 1). GlobalAveragePooling2D then averages each activation map of the last convolutional layer into a single value; without it the output files would be very large and prone to overfitting. A generator is defined over the data directory, and model.predict_generator exports the feature vectors. We chose three models, ResNet50, Xception, and Inception V3 (VGG features can be exported the same way if you are interested). Each export takes a while, roughly five to ten minutes on a GTX 1080 Ti.
All three models were pretrained on ImageNet, so each of them is a seasoned veteran, and the feature vectors they export summarize the content of an image very well. Each exported h5 file holds the numpy arrays written above (train and label).

Loading the feature vectors and building the model
The code above produces three feature-vector files:
gap_ResNet50.h5
gap_InceptionV3.h5
gap_Xception.h5
These are loaded and concatenated into a single feature vector per image. X and y must then be shuffled, otherwise validation_split will cause problems later. The numpy random seed is set to 2017 so that everyone running the code gets the same output.
import h5py
import numpy as np
from sklearn.utils import shuffle
from keras.models import *
from keras.layers import *
from keras.utils import np_utils
from keras.preprocessing.image import *

if __name__ == '__main__':
    np.random.seed(2017)

    X_train = []
    # X_test = []

    for filename in ["gap_ResNet50.h5", "gap_InceptionV3.h5", "gap_Xception.h5"]:
        print('Loading ' + filename)
        with h5py.File(filename, 'r') as h:
            X_train.append(np.array(h['train']))
            # X_test.append(np.array(h['test']))
            y_train = np.array(h['label'])

    X = np.concatenate(X_train, axis=1)
    # X_test = np.concatenate(X_test, axis=1)

    X_train, y_train = shuffle(X, y_train)

    input_tensor = Input(X_train.shape[1:])
    x = input_tensor
    x = Dropout(0.5)(x)
    x = Dense(100, activation='softmax')(x)
    model = Model(input_tensor, x)

    model.compile(optimizer='adadelta',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    y_train = np_utils.to_categorical(y_train, 100)
    model.fit(X_train, y_train, batch_size=64, nb_epoch=100, validation_split=0.1)
    # model.save('model.h5')

    y_pred = model.predict(X_train, verbose=1)
    y_pred = np.argmax(y_pred, axis=1)
    print(y_pred[0])
The network structure is shown below.

Training the model
With the model built, training can begin. The validation set is 10% of the data, which gives 1560 training images and 174 validation images.
Train on 1560 samples, validate on 174 samples
Epoch 1/100
1560/1560 [==============================] - 1s - loss: 1.8587 - acc: 0.3686 - val_loss: 1.0700 - val_acc: 0.6264
Epoch 2/100
1560/1560 [==============================] - 0s - loss: 0.9671 - acc: 0.6282 - val_loss: 0.7052 - val_acc: 0.8103
Epoch 3/100
1560/1560 [==============================] - 0s - loss: 0.7202 - acc: 0.7474 - val_loss: 0.5578 - val_acc: 0.8506
Epoch 4/100
1560/1560 [==============================] - 0s - loss: 0.5995 - acc: 0.8013 - val_loss: 0.4614 - val_acc: 0.8966
Epoch 5/100
…………
Epoch 60/100
1560/1560 [==============================] - 0s - loss: 0.0204 - acc: 0.9974 - val_loss: 0.0382 - val_acc: 0.9943
Epoch 61/100
1560/1560 [==============================] - 0s - loss: 0.0170 - acc: 0.9987 - val_loss: 0.0295 - val_acc: 0.9943
Epoch 62/100
1560/1560 [==============================] - 0s - loss: 0.0171 - acc: 0.9994 - val_loss: 0.0289 - val_acc: 0.9943
Epoch 63/100
1560/1560 [==============================] - 0s - loss: 0.0151 - acc: 1.0000 - val_loss: 0.0249 - val_acc: 1.0000
…………
Training is fast, finishing within about ten seconds, and accuracy is high: the best validation accuracy reaches 100%.
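
Why X and y must be shuffled before fit is called with validation_split can be sketched as follows. This is a minimal sketch, assuming (as the Keras documentation describes) that validation_split holds out the last fraction of the samples before any shuffling. The helper name tail_split and the toy labels are made up for illustration and are not part of the original project.

```python
import numpy as np

def tail_split(n_samples, validation_split):
    """Hypothetical helper: index where the validation tail begins."""
    return int(n_samples * (1.0 - validation_split))

# 1734 feature vectors with a 10% hold-out, matching the training log
split_at = tail_split(1734, 0.1)
n_train, n_val = split_at, 1734 - split_at

# Toy labels stored class by class, the order produced by
# flow_from_directory(shuffle=False)
labels = np.repeat(np.arange(6), 10)      # 6 classes, 10 samples each
cut = tail_split(len(labels), 0.1)
val_unshuffled = labels[cut:]             # tail contains a single class

# After shuffling, the tail is a random mix of classes
rng = np.random.RandomState(2017)
shuffled = labels[rng.permutation(len(labels))]
val_shuffled = shuffled[cut:]
```

Without the shuffle, the held-out tail in this toy case consists only of the last class, so validation accuracy would say nothing about the other classes.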
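Summary
The overall idea is to extract features with pretrained models and then build our own fully connected network for image classification. It can also be seen as adding a Dropout layer and a fully connected layer after someone else's CNN, freezing the CNN parameters, and training only our own parameters. The resulting system takes an image of this disease and returns the severity of the lesion, with an accuracy above 99%.

References:
ResNet 15.12
Inception v3 15.12
Xception 16.10
Building image classification models on small datasets
Dogs vs. Cats]]></content>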
    </entry>

    
    <entry>
      <title><![CDATA[Kesci_Ctrip_Room_Rrediction]]></title>
      <url>%2F2017%2F07%2F14%2FKesci-Ctrip-Room-Rrediction%2F</url>
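      <content type="text"><![CDATA[Kesci Ctrip Room Rrediction
5th Place Solution for kesci-ctrip room prediction

Ctrip's problem statement and data: https://www.kesci.com/apps/home/#!/lab/dataset/58d4e28c84a25f34b1d94906/document
Development environment: | Mac OS CPU | Python 3.5 | LightGBM 0.2 | Xgboost 0.6 |
Source code: https://github.com/Silencezjl/Kesci-Ctrip-Room-Rrediction

1. Team
Team name: 还没有到极限吧？ ("Not at the limit yet, are we?"). The team has two members. This was our first data competition together, and we worked well as a pair. In future competitions we hope to communicate less at the start, have each person extract features independently, and only then merge them, which should give a more diverse feature set.

2. Problem restatement
Ctrip provides comprehensive travel services to more than 250 million members every day. The heavy site traffic produces massive amounts of data, and mining the information hidden in it is highly valuable: used well, the data can give users a better travel experience. Surveys show that most users have preferences not only for hotels but also for room types. Different room types come with different services and discount policies, so offering more services also costs users more time choosing. The task is to mine each user's room-type preferences from their history, both to save them time and to provide better service. We approached the problem from three angles: feature engineering, model-generated features, and model ensembling (stacking).

3. Feature engineering
Feature engineering covers five groups of features.
First, time information. The training set spans 2013-04-14 to 2013-04-20 and the test set spans 2013-04-21 to 2013-04-27, so for the records to be predicted, the information from the preceding 6 days is effectively a leak ("time travel") feature. The use_leak function extracts features on this basis.
Second, removing anomalous records and heavily missing columns. The data contains some anomalies. For example, in some records the average area of rooms the user booked in the past (user_avgroomarea) is only 1, which may be a missing value filled in by Ctrip, and in others the previous order date (orderdate_lastord) is later than the current order date (orderdate). These records are removed from the training data because they would certainly hurt the model. In addition, roomtag_6 is 0 everywhere, roomtag_6_lastord is almost entirely 0 (with a few empty values), and orderbehavior_4_ratio_1month, orderbehavior_5_ratio_1month, and orderbehavior_3_ratio_1month are entirely empty, so all of these columns are dropped as well. Feature selection of this kind can be done with the pandas corr function. We did not publish that part of the code, because Ctrip ultimately wanted code that runs directly and produces the result, so we only kept the train and eval code.
Third, flagging whether a feature value is the maximum (or minimum) within its orderid. This is necessary because the smallest and largest values influence what the user buys. make_focus implements it with pandas groupby and transform.
Fourth, flagging whether a specific room service meets the user's needs. We define the user's need as their historical ordering frequency: user_roomservice_2_1ratio is the frequency with which roomservice_2 took the value 1. If user_roomservice_2_1ratio &gt; 0.5 and the room offers roomservice_2 equal to 1, then roomservice_2 meets the user's need. This is also implemented in make_focus.
Fifth, the difference between the purchase history and the current candidate, that is, the room's values minus the values of the user's historical rooms. This captures how far the candidate deviates from the user's habits, and is likewise implemented in make_focus.

4. Generating features with XGBoost
After the hand-crafted features above, we use xgboost to generate further features. Concretely, after training with xgb, predict is called with pred_leaf set to true, which returns for every tree the index of the leaf each sample falls into; these leaf indices are used as features.
The final result came from 5-fold LightGBM predictions combined by linear blending.

5. Some reflections
Data and features determine the upper bound of machine learning; models and algorithms only approach that bound. Our final score could have been higher, but we spent the last stretch of optimization on tuning and choosing models rather than on feature extraction. Lesson learned: next time, spend more time on data analysis and feature extraction.]]></content>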
    </entry>

    
    <entry>
      <title><![CDATA[One Last Fight]]></title>
      <url>%2F2017%2F06%2F10%2FOne%20Last%20Fight%2F</url>
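      <content type="text"><![CDATA[Whatever happens, the 2017 national contest will be my last mathematical modeling competition. Here is a short summary of my modeling experience, along with a few tips. If it is not to your taste, please be kind.

Overall, mathematical modeling really is simple as long as you put your mind to it. Admittedly, I have never won a top prize, nor can I write the kind of paper that wins an Outstanding (O) award on a first attempt at the MCM. My best result is a first prize and my worst a second prize, because Neijiang Normal University really is that strong.

Why simple? Compared with publishing a paper in a Chinese core journal, a second-prize modeling paper could hardly be easier (in smaller contests it can even take first prize). A modeling paper is more like working a problem in a high-school exam: the solution is written out in great detail, but the approach does not have to be your own innovation. It can be something you learned before, so there is no need to worry about being accused of copying earlier research: cite it in the references, and of course do not copy it wholesale. With a good innovation, a first prize or better is within reach. A published paper, by contrast, must contain something new.

Why simple? Because there really are many zombie contestants. They enter only because the people around them do, and they do not know how to do it either. If you can look things up, whether on Baidu or CNKI, you will already be ahead of them, and there are a lot of them.

The core of mathematical modeling is to reveal the essence of a problem with mathematical methods. On reflection, then: never use deep learning in a modeling paper. A CNN does classify images very well, but it is hard to explain its principles with formulas. Methods such as grey system theory should also be used sparingly, for the same reason. Any method that does not help reveal the essence of the problem should be used sparingly.

Mathematical modeling is closer to a paper-writing contest. Students aiming for graduate school should take part often, since graduate students have to write papers too. With a well-structured paper, good typesetting, and a clear line of argument, a third prize is already secured. LaTeX is recommended.

As for dividing the work, it is really not necessary to find someone from the mathematics and statistics school to build the model and someone from the computer science school to write the code. In most of the contests I entered, I did everything myself: one day writing code and two days writing the paper, because this is, after all, a paper-writing contest. If you do look for teammates, my preferred arrangement is two people modeling and coding and one person typesetting and writing. That way you also end up with a few more models.

Finally, the difference between the MCM and the national contest. After reading many outstanding papers, I found that the judging philosophies differ: the national contest leans toward technique, the MCM toward originality of thought. In the MCM, explaining a complex problem with a simple model beats a complicated, flashy one. The national contest is the opposite: the flashier the paper, the better.

It really is simple. Do not be intimidated by the grand-sounding words "mathematical modeling". Modeling is just the process of solving a problem with mathematical methods.

To all modeling contestants: have fun. All you need to do is write something that looks like an outstanding paper!]]></content>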
    </entry>

    
    <entry>
      <title><![CDATA[Grabbing a Batch of Resources]]></title>
      <url>%2F2017%2F05%2F01%2F%E5%81%B7%E4%B8%80%E6%B3%A2%E8%B5%84%E6%BA%90%2F</url>
      <content type="text"><![CDATA[Frameworks built on TensorFlow
https://github.com/fchollet/keras
https://github.com/tflearn/tflearn
https://github.com/beniz/deepdetect
https://github.com/tensorflow/fold
https://github.com/leriomaggio/deep-learning-keras-tensorflow

Selected introductory tutorials
https://github.com/tensorflow/models
https://github.com/aymericdamien/TensorFlow-Examples
https://github.com/donnemartin/data-science-ipython-notebooks
https://github.com/jtoy/awesome-tensorflow
https://github.com/jikexueyuanwiki/tensorflow-zh
https://github.com/nlintz/TensorFlow-Tutorials
https://github.com/pkmital/tensorflow_tutorials
https://github.com/deepmind/learning-to-learn
https://github.com/BinRoot/TensorFlow-Book
https://github.com/jostmey/NakedTensor
https://github.com/alrojo/tensorflow-tutorial
https://github.com/CreatCodeBuild/TensorFlow-and-DeepLearning-Tutorial
https://github.com/sjchoi86/Tensorflow-101
https://github.com/chiphuyen/tf-stanford-tutorials
https://github.com/google/prettytensor
https://github.com/ahangchen/GDLnotes
https://github.com/Hvass-Labs/TensorFlow-Tutorials
https://github.com/NickShahML/tensorflow_with_latest_papers
https://github.com/nfmcclure/tensorflow_cookbook
https://github.com/ppwwyyxx/tensorpack
https://github.com/rasbt/deep-learning-book
https://github.com/pkmital/CADL
https://github.com/tensorflow/skflow

Self-driving
https://github.com/kevinhughes27/TensorKart
https://github.com/SullyChen/Autopilot-TensorFlow

Deep reinforcement learning
https://github.com/dennybritz/reinforcement-learning
https://github.com/zsdonghao/tensorlayer
https://github.com/matthiasplappert/keras-rl
https://github.com/nivwusquorum/tensorflow-deepq
https://github.com/devsisters/DQN-tensorflow
https://github.com/coreylynch/async-rl
https://github.com/carpedm20/deep-rl-tensorflow
https://github.com/yandexdataschool/Practical_RL

Natural language processing
Text classification
https://github.com/dennybritz/cnn-text-classification-tf
Sequence modeling
https://github.com/google/seq2seq
Chinese word segmentation
https://github.com/koth/kcws
Text-to-image synthesis
https://github.com/paarthneekhara/text-to-image
RNN language modeling
https://github.com/sherjilozair/char-rnn-tensorflow
https://github.com/silicon-valley-data-science/RNN-Tutorial
Neural Turing machine
https://github.com/carpedm20/NTM-tensorflow
Xiaohuangji (chatbot)
https://github.com/wong2/xiaohuangji

Speech
Speech synthesis
https://github.com/ibab/tensorflow-wavenet
https://github.com/tomlepaine/fast-wavenet
Speech recognition
https://github.com/buriburisuri/speech-to-text-wavenet
https://github.com/pannous/tensorflow-speech-recognition

Computer vision
Style transfer
https://github.com/anishathalye/neural-style
https://github.com/cysmith/neural-style-tf
Image generation with GANs
https://github.com/carpedm20/DCGAN-tensorflow
Image-to-image translation
https://github.com/affinelayer/pix2pix-tensorflow
Image super-resolution
https://github.com/Tetrachrome/subpixel
Face recognition
https://github.com/davidsandberg/facenet
Object detection
https://github.com/TensorBox/TensorBox
Activity recognition
https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition
Image completion
https://github.com/bamos/dcgan-completion.tensorflow
Generative models
https://github.com/wiseodd/generative-models

Real-time debugging tool for TensorFlow
https://github.com/ericjang/tdb
TensorFlow on the Raspberry Pi
https://github.com/samjabrahams/tensorflow-on-raspberry-pi
TensorFlow for R
https://github.com/rstudio/tensorflow
Real-time Spark and TensorFlow input pipelines
https://github.com/fluxcapacitor/pipeline
https://github.com/yahoo/TensorFlowOnSpark
Combining Caffe with TensorFlow
https://github.com/ethereon/caffe-tensorflow
Probabilistic modeling
https://github.com/blei-lab/edward

PS: Part of this list is reposted from http://www.wobei.org/wenzhang/2017042811/344024.html]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[Deep Learning: Chinese Word Segmentation]]></title>
      <url>%2F2017%2F04%2F30%2F%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0-%E4%B8%AD%E6%96%87%E5%88%86%E8%AF%8D%2F</url>
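      <content type="text"><![CDATA[1. Preface
It has been a while since my last post. I have been busy with one mathematical modeling contest after another, and the code of this Chinese word segmentation project is fairly hard to read (far too few comments!). It is based on bidirectional long short-term memory units (Bi-LSTM) and a conditional random field (CRF), and it took me a long time to understand even the basics, which is why I am only committing this now. First, here is the source code of the open-source project. Searching online turned up no detailed explanation, so I had to read the code slowly by myself. The project mixes C++ and Python. It is built with Google's Bazel build tool, the model is trained in Python and then called from C++ (presumably because inference from C++ is faster), and the neural network is built on TensorFlow_1.0. It also uses Word2Vec tooling to preprocess the corpus, which comes from the 2014 People's Daily and consists of sentences that are already segmented. (Honestly, reading this code feels like watching someone perform magic.)

2. Code walkthrough: background
To understand the code you first need a few machine learning algorithms. I introduce them briefly below; skip this section if you already know them.

2.1. Word2Vec
Word2Vec (W2V below) is a tool that turns words into numeric vectors, which is what a computer needs in order to work with them. The name refers to the models and tools from the papers Mikolov published in 2013. It is itself a neural network approach: a word embedding training method made up of two models, CBOW and Skip-gram, and two training techniques, negative sampling and hierarchical softmax. This project follows the CBOW idea, training the network by predicting a word from its context. I will not go into detail; the reasoning is explained here.

2.2. RNN (Recurrent Neural Networks)
Recurrent neural networks have been highly successful and widely used across natural language processing. In Figure 1, x on the left is the input, U, V, and W are weights, s is a node, and o is the output. Unrolling this structure makes the RNN easier to understand: each step passes its information on to the next node. This chained structure is naturally suited to processing and analyzing time-series data.
Figure 1. Structure of a recurrent neural network
RNNs have a weakness, however: they struggle to remember inputs that lie far back in the sequence. The LSTM unit was proposed to address this.

2.3. LSTM (Long Short-Term Memory)
LSTM was proposed by Hochreiter &amp; Schmidhuber in 1997 (when I had just been born) and was more recently refined and popularized by Alex Graves. Its structure is shown in Figure 2.
Figure 2. Structure of an LSTM network
The unit contains 4 neural network layers. The small circles are point-wise operations such as vector addition and element-wise multiplication, and the small rectangles are layers with learnable parameters. The details are explained here.
The code actually uses a Bi-LSTM, a bidirectional LSTM, probably inspired by W2V: a Bi-LSTM uses the context on both sides during training. A conventional LSTM can only guess what follows from what came before, and the Bi-LSTM improves on this. Its structure is shown in Figure 3.
Figure 3. Structure of a Bi-LSTM network (source)

2.4. CRF (Conditional Random Field)
The conditional random field is a mathematical topic, essentially probability theory, and it is easiest to understand alongside the HMM (Hidden Markov Model). My personal understanding is that a CRF describes the probabilities between the elements of a sequence, whereas an HMM describes the probability of the sequence as a whole. For an authoritative explanation, see here.

3. Code walkthrough
Assuming the theory above is clear, you can start reading the code. I follow the steps from the project's GitHub page and explain each one. Switch to the code directory and run:
&gt; python kcws/train/process_anno_file.py &lt;absolute path to the corpus directory&gt; pre_chars_for_w2v.txt
# Removes the annotations from the corpus and splits the text into individual characters.
&gt; bazel build third_party/word2vec:word2vec
# Builds the word2vec tool.
First obtain a preliminary vocabulary:
&gt; ./bazel-bin/third_party/word2vec/word2vec -train pre_chars_for_w2v.txt -save-vocab pre_vocab.txt -min-count 3
# Produces a frequency dictionary.
Handle low-frequency characters:
&gt; python kcws/train/replace_unk.py pre_vocab.txt pre_chars_for_w2v.txt chars_for_w2v.txt
# I compared chars_for_w2v.txt with pre_chars_for_w2v.txt and they were identical, so this step seemed to have no effect on my corpus.
Train word2vec:
&gt; ./bazel-bin/third_party/word2vec/word2vec -train chars_for_w2v.txt -output vec.txt -size 50 -sample 1e-4 -negative 5 -hs 1 -binary 0 -iter 5
# This is where the magic starts: the step yields a vector for every character.
Build the training-corpus tool:
&gt; bazel build kcws/train:generate_training
# Builds generate_training, which produces the vector and the labels for every sentence.
# The labels mean:
# label -1, unknown
# 0-&gt; &apos;S&apos; Single, a single-character word
# 1-&gt; &apos;B&apos; Begin, the start of a word
# 2-&gt; &apos;M&apos; Middle, the middle of a word
# 3-&gt; &apos;E&apos; End, the end of a word
Generate the corpus:
&gt; ./bazel-bin/kcws/train/generate_training vec.txt &lt;absolute path to the corpus directory&gt; all.txt
# Uses vec.txt to convert the corpus text into vectors and labels it at the same time.
Produce train.txt and test.txt:
&gt; python kcws/train/filter_sentence.py all.txt
# Picks 8000 sentences as the test set; the rest form the training set.
With tensorflow installed, switch to the kcws code directory and run:
&gt; python kcws/train/train_cws_lstm.py --word2vec_path vec.txt --train_data_path &lt;absolute path to train.txt&gt; --test_data_path test.txt --max_sentence_len 80 --learning_rate 0.001
Generate the vocab:
&gt; bazel build kcws/cc:dump_vocab
&gt; ./bazel-bin/kcws/cc/dump_vocab vec.txt kcws/models/basic_vocab.txt
# I do not yet know what this is for. It is probably used when the model is called; I have not read the C++ code yet.
Export the trained model:
&gt; python tools/freeze_graph.py --input_graph logs/graph.pbtxt --input_checkpoint logs/model.ckpt --output_node_names &quot;transitions,Reshape_7&quot; --output_graph kcws/models/seg_model.pbtxt
# Exports the TensorFlow computation graph so it can be called later.

4. Closing remarks
I have not yet read the C++ code that calls the model; I will update this post once I have. According to the project, the approach follows this paper: http://www.aclweb.org/anthology/N16-1030. I skimmed it, and it applies the method to named entity recognition, while this code does Chinese word segmentation, so the code has clearly changed quite a bit. Tidied up, it could perhaps even become a paper, though that would be a bit cheeky, haha. I am still hazy about the character and word vectors: how does one get word vectors from character vectors? By simply adding them up? That does not sound very reliable.]]></content>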
    </entry>

    
    <entry>
      <title><![CDATA[Problems Reading doc Files from PHP]]></title>
      <url>%2F2017%2F04%2F03%2FPHP%E8%AF%BB%E5%8F%96doc%E6%96%87%E4%BB%B6%E7%9A%84%E9%97%AE%E9%A2%98%2F</url>
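      <content type="text"><![CDATA[I fell into many pits with this problem and spent two days searching Baidu, which taught me one lesson: searching has three levels. At the beginner level you find the answer in Chinese, at the intermediate level you find it in English, and at the advanced level the answer you want simply cannot be found!

To be clear, this is about reading doc and docx files on Linux, not Windows, because on Windows the COM component solves the problem easily.

antiword
antiword is a command-line tool readily available on Linux. Going by the name, anti means against or resisting, so naturally it is a tool for reading Word documents (ha, a completely made-up explanation!). Usage:
antiword test.doc
This prints the content of the doc file. BUT! It fails on some documents, and for some doc files it even claims they are not doc files at all. So it had to go.

catdoc
catdoc is used much like antiword. Here is what appears to be its official documentation -&gt; click here; it was written by a non-American foreigner. After reading it for a long time my problem was still unsolved: catdoc produces garbled Chinese, and the documentation only says that garbled output may occur. Usage:
catdoc test.doc

libreoffice
In the end I found a solution: libreoffice. Here is the official site; it is the Chinese site, but the documentation is still in English. Luckily it solved all of my problems. Usage:
soffice --headless --convert-to docx --outdir &lt;output directory&gt; &lt;file to convert&gt;
--headless runs without a graphical interface, in the background only.
--outdir sets the output directory; without it the output goes to the current directory.
This is very powerful and converts between all sorts of file formats. But then I hit another problem: when I called this shell command from PHP, it failed with an error saying the Java runtime environment was missing. An answer on stackoverflow explained that the HOME environment variable must be exported first, so writing it like this solves the problem:
$cmd = &apos;HOME=&apos;.getCWD().&apos; &amp;&amp; export HOME &amp;&amp; soffice --headless --convert-to docx --outdir &apos;.$thumb_url.&apos; &apos;.$File;
exec($cmd);]]></content>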
    </entry>

    
    <entry>
      <title><![CDATA[A Few Small PHP Techniques]]></title>
      <url>%2F2017%2F03%2F13%2F%E4%B8%80%E4%BA%9BPHP%E7%9A%84%E5%B0%8F%E6%8A%80%E6%9C%AF%2F</url>
      <content type="text"><![CDATA[Preface
Using a small project of mine as the example, this post shares a few core techniques. I will not publish all of the code, since this counts as a commercial project, and besides, I only know the back-end side ^-^ (PS: most explanations are in the code comments.)

Sending SMTP mail from the PHP back end
This is used for email verification. Part of the code is my own. After a long search online I found a class I was happy with (click here for the original page), called its functions directly, and adjusted some parameters to fit my case. email.class.php is rather long, so I will upload it to github and only explain the rest of the code here. First, index.html, which simply displays the input form:
index.html
&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;
&lt;head&gt;
&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=gb2312&quot; /&gt;
&lt;title&gt;Example: sending mail from PHP with an SMTP class&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;form action=&quot;sendmail.php&quot; method=&quot;post&quot;&gt;
 &lt;p&gt;To: &lt;input type=&quot;text&quot; name=&quot;toemail&quot; /&gt;&lt;/p&gt;
 &lt;p&gt;Subject: &lt;input type=&quot;text&quot; name=&quot;title&quot; /&gt;&lt;/p&gt;
 &lt;p&gt;Body: &lt;textarea name=&quot;content&quot; cols=&quot;50&quot; rows=&quot;5&quot;&gt;&lt;/textarea&gt;&lt;/p&gt;
 &lt;p&gt;&lt;input type=&quot;submit&quot; value=&quot;Send&quot; /&gt;&lt;/p&gt;
&lt;/form&gt;
&lt;/body&gt;
&lt;/html&gt;
Next is the mail-sending configuration file:
sendmail.php
&lt;?php
/**
 * Note: I have tested this mail class successfully. If sending fails, check the following:
 * 1. Are the user name and password correct?
 * 2. Is the SMTP service enabled in the mailbox settings?
 * 3. Is the PHP environment the cause?
 * 4. Make sure $smtp-&gt;debug is set to true so that error details are shown, then search online for the error message.
 */
require_once &quot;email.class.php&quot;;
//******************** Configuration ********************************
$smtpserver = &quot;smtp.163.com&quot;; // SMTP server
$smtpserverport = 25; // SMTP server port
$smtpusermail = &quot;m15310977608_1@163.com&quot;; // sender mailbox on the SMTP server
$smtpemailto = $_POST[&apos;toemail&apos;]; // recipient
$smtpuser = &quot;m15310977608_1@163.com&quot;; // SMTP account name
$smtppass = &quot;xxxxxxxxxx&quot;; // SMTP account password
$mailtitle = $_POST[&apos;title&apos;]; // mail subject
$mailcontent = &quot;&lt;h1&gt;&quot;.$_POST[&apos;content&apos;].&quot;&lt;/h1&gt;&quot;; // mail body
$mailtype = &quot;HTML&quot;; // mail format (HTML/TXT); TXT is plain text
//************************ Configuration ****************************
$smtp = new smtp($smtpserver,$smtpserverport,true,$smtpuser,$smtppass); // the true here enables authentication
$smtp-&gt;debug = true; // whether to show debug output while sending
$state = $smtp-&gt;sendmail($smtpemailto, $smtpusermail, $mailtitle, $mailcontent, $mailtype);
echo &quot;&lt;div style=&apos;width:300px; margin:36px auto;&apos;&gt;&quot;;
if($state==&quot;&quot;)&#123;
 echo &quot;Sorry, the mail could not be sent. Please check that the mailbox details are correct.&quot;;
 echo &quot;&lt;a href=&apos;index.html&apos;&gt;Back&lt;/a&gt;&quot;;
 exit();
&#125;
echo &quot;Mail sent successfully!&quot;;
echo &quot;&lt;a href=&apos;index.html&apos;&gt;Back&lt;/a&gt;&quot;;
echo &quot;&lt;/div&gt;&quot;;
?&gt;

Image upload and compression
This receives the FILE from a form and stores it locally. First, the upload endpoint:
&lt;?php
define(&quot;IMGPATH&quot;,&quot;/upload/image/&quot;); // default upload path
$serverRoot = $_SERVER[&apos;DOCUMENT_ROOT&apos;];
$img = $_FILES[&apos;inputImg&apos;];
$return = new stdClass();
// accept the file and validate it
if($img)&#123;
 $return-&gt;status = &quot;success&quot;;
 $tmp_path = $img[&apos;tmp_name&apos;];
 $img_name = $img[&apos;name&apos;];
 // $img_type = $img[&apos;type&apos;];
 $regex = &apos;/\.(jpg|png|jpeg|gif)$/&apos;; // match the file extension to check that this is an image
 preg_match($regex,strtolower($img_name),$type);
 date_default_timezone_set(&quot;PRC&quot;);
 $nowDay = date(&quot;Ymd&quot;);
 $timeUnix = time();
 $dirPath = $serverRoot.IMGPATH.$nowDay.&quot;/&quot;;
 $hashName = $timeUnix.abs(crc32($img_name)).$type[0]; // numeric name from timestamp and CRC32, to avoid collisions
 $file_relative_path = IMGPATH.$nowDay.&quot;/&quot;.$hashName;
 if(!is_dir($dirPath)) mkdir($dirPath);
 copy($tmp_path,$dirPath.$hashName);
 $img = &quot;http://120.27.247.68&quot;.$file_relative_path;
 $return-&gt;url = $img;
&#125;else&#123;
 $return-&gt;status = &quot;fail&quot;;
 $return-&gt;error = &quot;error&quot;;
&#125;
$localImg=&quot;/home/wwwroot/default&quot;.$file_relative_path;
// compress if the file is larger than 50000 bytes (about 50 KB)
if(filesize($localImg)&gt;50000) &#123;
 $data = array(&quot;imgUrl&quot; =&gt; $img);
 $data = http_build_query($data);
 $opts = array(
 &apos;http&apos;=&gt;array(
 &apos;method&apos; =&gt; &apos;POST&apos;,
 &apos;header&apos; =&gt; &apos;Content-type: application/x-www-form-urlencoded&apos;,
 &apos;content&apos; =&gt; $data
 )
 );
 $cxContext = stream_context_create($opts);
 $sFile = file_get_contents(&quot;http://120.27.247.68/api/image/imgZip.php&quot;, false, $cxContext);
&#125;
echo(json_encode($return));
The following is the compression code. It simply reduces the image resolution and overwrites the original image:
&lt;?php
/**
 * Created by PhpStorm.
 * User: silence
 * Date: 2016/10/25
 * Time: 3:12 PM
 */
/**
 * description: compress an image in place
 * @param string $imgsrc image path (the compressed image overwrites it)
 */
function image_png_size_add($imgsrc)&#123;
 $imgdst = $imgsrc;
 list($width,$height,$type)=getimagesize($imgsrc);
// $new_width = ($width&gt;1600?1600:$width)*0.9;
// $new_height =($height&gt;1200?1200:$height)*0.9;
 $new_width = $width*0.5;
 $new_height = $height*0.5;
 switch($type)&#123;
 case 1:
 $giftype=check_gifcartoon($imgsrc);
 if($giftype)&#123;
 header(&apos;Content-Type:image/gif&apos;);
 $image_wp=imagecreatetruecolor($new_width, $new_height);
 $image = imagecreatefromgif($imgsrc);
 imagecopyresampled($image_wp, $image, 0, 0, 0, 0, $new_width, $new_height, $width, $height);
 imagejpeg($image_wp, $imgdst,75);
 imagedestroy($image_wp);
 &#125;
 break;
 case 2:
 header(&apos;Content-Type:image/jpeg&apos;);
 $image_wp=imagecreatetruecolor($new_width, $new_height);
 $image = imagecreatefromjpeg($imgsrc);
 imagecopyresampled($image_wp, $image, 0, 0, 0, 0, $new_width, $new_height, $width, $height);
 imagejpeg($image_wp, $imgdst,75);
 imagedestroy($image_wp);
 break;
 case 3:
 header(&apos;Content-Type:image/png&apos;);
 $image_wp=imagecreatetruecolor($new_width, $new_height);
 $image = imagecreatefrompng($imgsrc);
 imagecopyresampled($image_wp, $image, 0, 0, 0, 0, $new_width, $new_height, $width, $height);
 imagejpeg($image_wp, $imgdst,75);
 imagedestroy($image_wp);
 break;
 &#125;
&#125;
/**
 * description: check whether a GIF is a static image
 * @param string $image_file image path
 * @return boolean true if the GIF is static, false if it is animated
 */
function check_gifcartoon($image_file)&#123;
 $fp = fopen($image_file,&apos;rb&apos;);
 $image_head = fread($fp,1024);
 fclose($fp);
 return preg_match(&quot;/&quot;.chr(0x21).chr(0xff).chr(0x0b).&apos;NETSCAPE2.0&apos;.&quot;/&quot;,$image_head)?false:true;
&#125;
@$imgFile = $_POST[&apos;imgUrl&apos;];
echo $imgFile.&quot;&lt;br&gt;&quot;;
$imgFile=str_replace(&quot;http://120.27.247.68&quot;,&quot;/home/wwwroot/default&quot;,$imgFile);
echo $imgFile.&quot;&lt;br&gt;&quot;;
//@$imgFile = &quot;/home/wwwroot/default/WeChatProject/upload/image/20161019/14768660031366040191.jpg&quot;;
if(!file_exists($imgFile))&#123;
 echo &quot;The file to compress does not exist.&quot;;
&#125;else&#123;
 image_png_size_add($imgFile);
&#125;
?&gt;
Uploading other files works the same way.

Sending SMS verification codes from PHP
See the LeanCloud documentation; I will not repeat it here.

Matching content after crawling with PHP
A PHP crawler is just a simple CURL call; what matters is organizing the data once it has been fetched. I mainly use the open-source library simple_html_dom, which matches HTML DOM nodes quickly and conveniently. Its official documentation has plenty of examples.

mi_push notifications
Working on this part nearly broke me, because Xiaomi did not provide documentation for PHP, only the SDK, so I wrote the code by following the Java documentation. What follows is my own best guess at how it works. First, the link to the official SDK. The keywords in it are much the same as in the Java documentation. Android is used as the example; iOS is similar:
&lt;?php
header(&apos;Access-Control-Allow-Origin: *&apos;);
use xmpush\Builder;
use xmpush\HttpBase;
use xmpush\Sender;
use xmpush\Constants;
use xmpush\Stats;
use xmpush\Tracer;
use xmpush\Feedback;
use xmpush\DevTools;
use xmpush\Subscription;
use xmpush\TargetedMessage;
include_once(dirname(__FILE__).&apos;/autoload.php&apos;);
$secret = &apos;developer credentials&apos;;
$package = &apos;developer credentials&apos;;
// the constants must be set before new Sender() is called
Constants::setPackage($package);
Constants::setSecret($secret);
$fileId = $_POST[&apos;fileId&apos;];
$title = $_POST[&apos;title&apos;];
$url = $_POST[&apos;url&apos;];
$desc = $title;
$title = &apos;教案推送!&apos;;
$payload = &apos;&#123;&quot;fileId&quot;:&quot;&apos;.$fileId.&apos;&quot;,&quot;msg&quot;:&quot;教案推送&quot;,&quot;url&quot;:&quot;&apos;.$url.&apos;&quot;&#125;&apos;;
$sender = new Sender();
// this message demonstrates a predefined click action
$message = new Builder();
$message-&gt;title($title); // notification title
$message-&gt;description($desc); // notification text
$message-&gt;passThrough(0);
$message-&gt;payload($payload); // for predefined click actions, the payload is read from the extra field of the intent of the opened screen; onReceiveMessage is not called
$message-&gt;extra(Builder::notifyEffect, 2); // sets the predefined click action; 1 opens the app
$message-&gt;extra(Builder::intentUri, &quot;intent:#Intent;component=cn.aike.cloudnews/.activity.message.MessageActivity;end&quot;); // package name configured in the client
$message-&gt;extra(Builder::notifyForeground, 1);
$message-&gt;notifyId(0);
$message-&gt;build();
print_r($sender-&gt;broadcastAll($message)-&gt;getRaw()); // show the response returned by the push
?&gt;

Video content
Video is served through a third party, Sohu Video. Hosting all the videos on our own server made loading too slow, so a third party was used instead.]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[Building a Free-Period Timetable System in PHP]]></title>
      <url>%2F2017%2F03%2F10%2FPHP%E5%BC%80%E5%8F%91%E6%97%A0%E8%AF%BE%E8%A1%A8%E7%B3%BB%E7%BB%9F%2F</url>
      <content type="text"><![CDATA[Preface
This is a small system I built a while ago, shared here as open source. It mainly uses PHP's curl and mysql, with SUI Mobile for the front-end layout. It works by using curl to crawl timetable data from the campus network. A screenshot of the result follows.

Preparation
First, you need to know how to use the browser developer tools (F12), which here simply means capturing requests; Chrome's DevTools or Firebug will do. You also need a PHP server; something like xampp is enough. And of course, anyone doing back-end development needs some front-end knowledge.

Core idea
Simulate logging in to the campus network. Capturing the login request shows that our campus network logs in with a plain POST and then records a Cookie, so PHP's CURL can simulate the login. The Cookie is then used in a simulated POST to fetch the timetable.

Core code
require_once(&quot;include/config.php&quot;);
$time=(string)time();
@$username=$_POST[&apos;uname&apos;];
@$pass=$_POST[&apos;pwd&apos;];
$cookie_file = &apos;../computer/cookie/cookie_&apos;.$time.&apos;.txt&apos;;
$cookie_file2 =&apos;../computer/cookie/cookie2_&apos;.$time.&apos;.txt&apos;;
$cookie_file3 = &apos;../computer/cookie/cookie3_&apos;.$time.&apos;.txt&apos;;
$mydata1=&quot;serviceInfo=%7B%22serviceAddress%22%3A%22https%3A%2F%2Fuaaap.swu.edu.cn%2Fcas%2Fws%2FacpInfoManagerWS%22%2C%22serviceType%22%3A%22soap%22%2C%22serviceSource%22%3A%22td%22%2C%22paramDataFormat%22%3A%22xml%22%2C%22httpMethod%22%3A%22POST%22%2C%22soapInterface%22%3A%22getUserInfoByUserName%22%2C%22params%22%3A%7B%22userName%22%3A%22$username%22%2C%22passwd%22%3A%22$pass%22%2C%22clientId%22%3A%22yzsfwmh%22%2C%22clientSecret%22%3A%221qazz%40WSX3edc%24RFV%22%2C%22url%22%3A%22http%3A%2F%2Fi.swu.edu.cn.*%22%7D%2C%22cDataPath%22%3A%5B%5D%2C%22namespace%22%3A%22%22%2C%22xml_json%22%3A%22%22%7D&quot;;
$urlUrp1=&apos;http://i.swu.edu.cn/remote/service/process&apos;;
$ch = curl_init ();
// print_r($ch);
curl_setopt ( $ch, CURLOPT_URL, $urlUrp1 );
curl_setopt ( $ch, CURLOPT_POST, 1 );
curl_setopt ( $ch, CURLOPT_HEADER, 0 );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ( $ch, CURLOPT_POSTFIELDS, $mydata1 );
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION,true);
curl_setopt ( $ch, CURLOPT_COOKIEJAR,$cookie_file); // store cookies
$response=curl_exec ( $ch );
curl_close ( $ch );
$regex=&quot;/\&quot;,\&quot;tgt\&quot;:\&quot;(.*)=\&quot;,\&quot;/&quot;;
preg_match_all($regex,$response,$res_tgt);
$tgt=$res_tgt[1][0].&quot;=&quot;;
echo(&quot;&lt;br&gt;&quot;);
//echo(base64_decode($tgt));
$headers = array(
 &apos;Cookie: CASTGC=&quot;&apos;.base64_decode($tgt).&apos;&quot;&apos;,
);
$urlJW=&apos;https://uaaap.swu.edu.cn/cas/login?service=http%3A%2F%2Fjw.swu.edu.cn%2Fssoserver%2Flogin%3Fywxt%3Djw&apos;;
$ch = curl_init($urlJW);
curl_setopt($ch, CURLOPT_HEADER,1);
curl_setopt($ch, CURLOPT_HTTPHEADER,$headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,true);
$response = curl_exec($ch);
curl_close($ch);
$regex=&quot;/JSESSIONID=(.*); Path=\/jwglxt/&quot;;
preg_match_all($regex,$response,$rawcookie);
//var_dump($rawcookie);
@$mycookie = substr($rawcookie[0][0],11,32);
// POST via file_get_contents, sending the session cookie
function post3($url, $data,$cookie)&#123;
 $postdata = http_build_query(
 $data
 );
 $opts = array(&apos;http&apos; =&gt;
 array(
 &apos;method&apos; =&gt; &apos;POST&apos;,
 &apos;header&apos; =&gt; &apos;Cookie: JSESSIONID=&apos;.$cookie,
 &apos;content&apos; =&gt; $postdata
 )
 );
 $context = stream_context_create($opts);
 $result = @file_get_contents($url, false, $context);
 return $result;
&#125;
$mydata = array(&apos;xnm&apos;=&gt;&apos;2016&apos;,&apos;xqm&apos;=&gt;&apos;3&apos;);
$myresult=post3(&apos;http://jw.swu.edu.cn/jwglxt/kbcx/xskbcx_cxXsKb.html?gnmkdmKey=N253508&apos;,$mydata,$mycookie);
$myArr=json_decode($myresult, true);
For the complete code, see Github.]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[Deploying TensorFlow on Windows]]></title>
      <url>%2F2017%2F03%2F08%2FWindows%E9%83%A8%E7%BD%B2TensorFlow%2F</url>
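      <content type="text"><![CDATA[CPU version of TensorFlow
Installing the CPU version of TensorFlow is simple; see the previous tutorial.

GPU version of TensorFlow
1. Download the GPU dependencies
PS: you need an NVIDIA graphics card first!
The links you will need, to be installed in order from top to bottom:
# Anaconda: choose the Python 3.5 version (2.7 does not work at the moment)
https://www.continuum.io/downloads
# CUDA (the official download is slow; I shared a copy of CUDA 8.0 on Baidu Cloud)
https://developer.nvidia.com/cuda-downloads
# cuDNN: just download it, no installation needed, but an environment variable must be configured
https://developer.nvidia.com/cudnn
Before installing Anaconda you can uninstall your existing Python, since Anaconda ships with its own.
Anaconda and CUDA are both exe installers; clicking Next all the way through is enough.
cuDNN needs an environment variable configured as follows: This PC =&gt; Advanced system settings =&gt; Advanced =&gt; Environment Variables =&gt; System variables, then find the Path variable and add the cuDNN path to it.
With that, the TensorFlow dependencies are installed.

2. Download and install TensorFlow
Open CMD and enter:
pip install --ignore-installed --upgrade tensorflow-gpu
Wait a few minutes and it is ready to use.

3. Test TensorFlow
Start Python from CMD:
&gt;&gt;&gt; import tensorflow as tf
If this runs without errors, everything is working.]]></content>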
    </entry>

    
    <entry>
      <title><![CDATA[Building a Personal Blog with Hexo]]></title>
      <url>%2F2017%2F03%2F02%2F%E5%9F%BA%E4%BA%8EHEXO%E6%90%AD%E5%BB%BA%E4%B8%AA%E4%BA%BA%E5%8D%9A%E5%AE%A2%2F</url>
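      <content type="text"><![CDATA[What is Hexo?
Hexo is a fast, simple, and efficient blog framework. Hexo parses posts written in Markdown (or other rendering engines) and generates static pages with an attractive theme within seconds.
First, here is the official link.
I read a lot of material, but because everyone runs a different version there are all kinds of pitfalls, so what follows is a summary of the ones I ran into myself.

Getting started
Before installing, check that the following are installed on your computer:
Node.js
Git
Here are the download link for Node.js (version 3.1.0, the latest at the time of writing) and the Node.js website.
Download Git yourself: on Windows, get it from the official site; on a Mac, type git in the terminal and press Enter, and the system will offer to install it for you.
Once both are installed, open a command line as administrator and install Hexo:
npm install hexo -g  # this step is a bit slow
Now you can initialize your blog. Hexo, which runs on Node.js, generates the HTML for you automatically, which is very convenient; it let me set up my blog in a single day.
First find the local folder where the blog code will live, then initialize it with:
sudo hexo init <folder-name>
Then look at the blog in its initial state:
hexo g  # generate the static pages
hexo s  # start the local server
If you see the default page, all the steps above succeeded.

The NexT theme
The default theme is clearly a bit ugly, so I went looking and found the NexT theme. Here are the download link and the NexT website.
After downloading NexT, put it in the themes folder, then set the theme in the site configuration file: theme: next.
hexo clean  # remove the previously generated static pages
hexo g  # generate the static pages
hexo s  # start the local server
Now you can see the good-looking NexT theme.
A note on terminology: you will find two configuration files, both named _config.yml. From here on, the one inside the theme folder is the theme configuration file, and the top-level _config.yml is the site configuration file.

Deploying to a server
The latest Hexo is very convenient: it remembers your username and password without any SSH setup. Just add the following at the end of the site configuration file:
deploy:
  type: git
  repo:
    GitHub: https://github.com/Silencezjl/Silencezjl.github.io.git
    Coding: https://git.coding.net/Silencezjl/Silencezjl.git
  branch: master
I deploy to both GitHub and Coding, one hosted abroad and one in China; listing just one also works.
Pay special attention: in the configuration file, every colon must be followed by a space!
Then create a new post with:
hexo new &quot;post title&quot;  # create a new post
To write the post, you can use a Markdown editor such as the Chrome extension Marxico (马克飞象).
After adding a post, remember to run hexo g again to regenerate the static pages.
hexo d  # upload to the git server
Coding and GitHub are both good choices, as each provides hosting for static pages. Create a repo named after your username on Coding or GitHub and upload to it; Hexo's hexo d command does the upload for you, which simply pushes the public folder produced by hexo g. (PS: on Coding, remember to enable the Pages service in the settings.)

Plugins
The NexT website documents how to install all of its plugins; follow the link.
There is also a guide on getting your blog indexed by Baidu and Google; follow the link.
I won't go into more detail here. Have Fun!]]></content>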
    </entry>

    
    <entry>
      <title><![CDATA[Photo Album]]></title>
      <url>%2F2017%2F03%2F02%2Fphoto%2F</url>
      <content type="text"><![CDATA[]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[My First Paper]]></title>
      <url>%2F2017%2F03%2F01%2F%E7%AC%AC%E4%B8%80%E7%AF%87%E8%AE%BA%E6%96%87%2F</url>
      <content type="text"><![CDATA[A homework plagiarism-detection and grading system based on word2vec and LVQ]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[Problem D of the Mathematical Contest 2017]]></title>
      <url>%2F2017%2F01%2F21%2FThe%20problem%20D%20of%20Methematical%20Contest%202017%2F</url>
      <content type="text"><![CDATA[Preface: Problem D of the Mathematical Contest 2017 was my first attempt at this contest, and I am sharing our answer here.
Contest topic: PROBLEM D: Optimizing the Passenger Throughput at an Airport Security Checkpoint
Following the terrorist attacks in the US on September 11, 2001, airport security has been significantly enhanced throughout the world. Airports have security checkpoints, where passengers and their baggage are screened for explosives and other dangerous items. The goals of these security measures are to prevent passengers from hijacking or destroying aircraft and to keep all passengers safe during their travel. However, airlines have a vested interest in maintaining a positive flying experience for passengers by minimizing the time they spend waiting in line at a security checkpoint and waiting for their flight. Therefore, there is a tension between desires to maximize security while minimizing inconvenience to passengers. During 2016, the U.S. Transportation Security Agency (TSA) came under sharp criticism for extremely long lines, in particular at Chicago’s O’Hare international airport. Following this public attention, the TSA invested in several modifications to their checkpoint equipment and procedures and increased staffing in the more highly congested airports. While these modifications were somewhat successful in reducing waiting times, it is unclear how much cost the TSA incurred to implement the new measures and increase staffing. In addition to the issues at O’Hare, there have also been incidents of unexplained and unpredicted long lines at other airports, including airports that normally have short wait times. This high variance in checkpoint lines can be extremely costly to passengers as they decide between arriving unnecessarily early or potentially missing their scheduled flight. Numerous news articles, including [1,2,3,4,5], describe some of the issues associated with airport security checkpoints. 
Your Internal Control Management (ICM) team has been contracted by the TSA to review airport security checkpoints and staffing to identify potential bottlenecks that disrupt passenger throughput. They are especially interested in creative solutions that both increase checkpoint throughput and reduce variance in wait time, all while maintaining the same standards of safety and security. The current process for a US airport security checkpoint is displayed in Figure 1. Zone A: Passengers randomly arrive at the checkpoint and wait in a queue until a security officer can inspect their identification and boarding documents. Zone B: The passengers then move to a subsequent queue for an open screening line; depending on the anticipated activity level at the airport, more or less lines may be open. Once the passengers reach the front of this queue, they prepare all of their belongings for X-ray screening. Passengers must remove shoes, belts, jackets, metal objects, electronics, and containers with liquids, placing them in a bin to be X-rayed separately; laptops and some medical equipment also need to be removed from their bags and placed in a separate bin. All of their belongings, including the bins containing the aforementioned items, are moved by conveyor belt through an X-ray machine, where some items are flagged for additional search or screening by a security officer (Zone D). Meanwhile the passengers process through either a millimeter wave scanner or metal detector. Passengers that fail this step receive a pat-down inspection by a security officer (Zone D). Zone C: The passengers then proceed to the conveyor belt on the other side of the X-ray scanner to collect their belongings and depart the checkpoint area. Figure 1: Illustration of the TSA Security Screening Process. Approximately 45% of passengers enroll in a program called Pre-Check for trusted travelers. 
These passengers pay $85 to receive a background check and enjoy a separate screening process for five years. There is often one Pre-Check lane open for every three regular lanes, despite the fact that more passengers use the Pre-Check process. Pre-Check passengers and their bags go through the same screening process with a few modifications designed to expedite screening. Pre-Check passengers must still remove metal and electronic items for scanning as well as any liquids, but are not required to remove shoes, belts, or light jackets; they also do not need to remove their computers from their bags. Data has been collected about how passengers proceed through each step of the security screening process.Your specific tasks are: a. Develop one or more model(s) that allow(s) you to explore the flow of passengers through a security check point and identify bottlenecks. Clearly identify where problem areas exist in the current process. b. Develop two or more potential modifications to the current process to improve passenger throughput and reduce variance in wait time. Model these changes to demonstrate how your modifications impact the process. c. It is well known that different parts of the world have their own cultural norms that shape the local rules of social interaction. Consider how these cultural norms might impact your model. For example, Americans are known for deeply respecting and prioritizing the personal space of others, and there is a social stigma against “cutting” in front of others. Meanwhile, the Swiss are known for their emphasis on collective efficiency, and the Chinese are known for prioritizing individual efficiency. Consider how cultural differences may impact the way in which passenger’s process through checkpoints as a sensitivity analysis. 
The cultural differences you apply to your sensitivity analysis can be based on real cultural differences, or you can simulate different traveler styles that are not associated with any particular culture (e.g., a slower traveler). How can the security system accommodate these differences in a manner that expedites passenger throughput and reduces variance? d. Propose policy and procedural recommendations for the security managers based on your model. These policies may be globally applicable, or may be tailored for specific cultures and/or traveler types. In addition to developing and implementing your model(s) to address this problem, your team should validate your model(s), assess strengths and weaknesses, and propose ideas for improvement (future work). Your ICM submission should consist of a 1 page Summary Sheet and your solution cannot exceed 20 pages for a maximum of 21 pages. Note: The appendix and references do not count toward the 20 page limit. References: [1] http://www.wsj.com/articles/why-tsa-security-lines-arent-as-bad-as-youd-feared-1469032116 [2] http://www.chicagotribune.com/news/ct-tsa-airport-security-lines-met-20160823-story.html [3] http://www.cnn.com/2016/06/09/travel/tsa-security-line-wait-times-how-long/ [4] http://wgntv.com/2016/07/13/extremely-long-lines-reported-at-chicago-midway-airports-tsa-checkpoint/ [5] http://www.cnbc.com/2016/04/14/long-lines-and-missed-flights-fuel-criticism-of-tsa-screening.html Our Answer：]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[Deploying TensorFlow on Ubuntu]]></title>
      <url>%2F2016%2F11%2F14%2FUbuntu%E9%83%A8%E7%BD%B2Tensorflow%2F</url>
      <content type="text"><![CDATA[First, the official link.
The official site offers several installation methods; below are the two I find most convenient.

1. Installing the CPU version of TensorFlow on Ubuntu
(1) First install pip (on a Mac, use easy_install):
sudo apt-get install python-pip python-dev
After installing, it is best to switch pip to a mirror inside China to speed up downloads. I recommend the Douban mirror:
1) mkdir ~/.pip
2) vi ~/.pip/pip.conf
3) Insert the following:
[global]
trusted-host = pypi.douban.com
index-url = http://pypi.douban.com/simple
Then save and quit with :wq.
(2) With the mirror in place, installing TensorFlow with pip is fast:
sudo pip install --upgrade tensorflow
(3) After installation, test it from Python:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
If no error is reported, the installation succeeded.

2. Installing the GPU version of TensorFlow on Ubuntu
To enable the GPU, you must install Ubuntu/Linux directly on the hard disk, not in a virtual machine. (The first method, by contrast, also runs in a virtual machine.) According to the website, the Ubuntu/Linux 64-bit, GPU-enabled, Python 2.7 build requires CUDA toolkit 7.5 and cuDNN v4, so install CUDA toolkit 7.5 and cuDNN v4 or a higher version.
(1) Uninstall the existing NVIDIA driver:
sudo apt-get --purge remove nvidia-*
(2) Download CUDA from https://developer.nvidia.com/cuda-downloads
TIP: downloading with Thunder (Xunlei) may be faster.
(3) Stop the display manager (lightdm):
sudo service lightdm stop
(4) Install CUDA:
sudo sh cuda_&lt;version.ID&gt;_linux.run
(5) Set the environment variables:
echo &apos;export PATH=/usr/local/cuda-7.5/bin:$PATH&apos; &gt;&gt; ~/.bashrc
echo &apos;export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH&apos; &gt;&gt; ~/.bashrc
source ~/.bashrc
(6) Start the display manager again and reboot:
sudo service lightdm start
sudo reboot
(7) Test CUDA:
nvcc -V
If CUDA was installed successfully, this prints the CUDA version.

Then install cuDNN
(1) Download cuDNN from https://developer.nvidia.com/cudnn
TIP: downloading with Thunder (Xunlei) may be faster.
(2) Uncompress and copy the cuDNN files into the toolkit directory. Assuming the toolkit is installed in /usr/local/cuda, run the following commands (edited to reflect the cuDNN version you downloaded):
tar xvzf cudnn-8.0-linux-x64-v5.1-ga.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
cuDNN does not need any environment variables.

Then install TensorFlow
I installed r0.8; for other versions, check the dependencies listed on the official site and install the matching CUDA and cuDNN. Finally:
sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl
Then import tensorflow in Python to test it:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
Have Fun!]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[两个激活码]]></title>
      <url>%2F2016%2F09%2F01%2F%E4%B8%A4%E4%B8%AA%E6%BF%80%E6%B4%BB%E7%A0%81%2F</url>
      <content type="text"><![CDATA[VS 2013 激活码BWG7X-J98B3-W34RT-33B3R-JVYW9 JetBrains系列产品 激活码http://idea.lanyus.com/]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[JavaScript Bookmarklets]]></title>
      <url>%2F2016%2F08%2F09%2FJS%E4%B9%A6%E7%AD%BE%E5%8A%9F%E8%83%BD%2F</url>
      <content type="text"><![CDATA[Many websites offer a "one-click submit" bookmark button. It is not complicated at all: once you know how to use JavaScript in your bookmarks, you can make the bookmark bar far more useful. (Internet Explorer does not support this; a browser extension is generally used there instead.)
In all of the examples below the JavaScript is written directly into the bookmark's URL field, using Chrome as the example.

Example 1: the simplest case, containing a single JavaScript call:
javascript:alert(document.lastModified);
Clicking this bookmark pops up a dialog showing when the current page was last modified.

Example 2: modifying a page. Given this form:
&lt;form method="post" name="Form1"&gt;
  &lt;input type="text" value="sss" id="ok" name="ok" /&gt;
  &lt;input type="submit" value="提交" /&gt;
&lt;/form&gt;
the following bookmark changes the value of the text field:
javascript:document.Form1.ok.setAttribute('value','mei');

Example 3: loading a script. The most common approach is to load an external JavaScript file, which is easier to maintain and to work with:
javascript:void((function()&#123;var e=document.createElement('script');e.setAttribute('src','http://localhost/js/test.js');document.body.appendChild(e);&#125;)())

Example 4: an open-sourced script that automatically fills in and submits a Baidu survey. The code is as follows:
function fillin() &#123;
  var Myradio = $("div.radio-image");
  var Mycheckbox = $("div.checkbox-image");
  // Return a random integer in [Min, Max], each value equally likely
  function GetRandomNum(Min, Max) &#123;
    return Min + Math.floor(Math.random() * (Max - Min + 1));
  &#125;
  // Fill in the radio buttons: one random option per question
  var num = GetRandomNum(0, 1);
  Myradio.eq(num).trigger("click");
  num = GetRandomNum(2, 5);
  Myradio.eq(num).trigger("click");
  num = GetRandomNum(6, 11);
  Myradio.eq(num).trigger("click");
  num = GetRandomNum(12, 13);
  Myradio.eq(num).trigger("click");
  num = GetRandomNum(14, 18);
  Myradio.eq(num).trigger("click");
  num = GetRandomNum(19, 20);
  Myradio.eq(num).trigger("click");
  // Note: the original always clicks option 22 here and ignores num
  num = GetRandomNum(21, 22);
  Myradio.eq(22).trigger("click");
  // Fill in the checkboxes
  var num1 = 0, num2 = 0;
  var count = 0;
  for (var i = 0; i &lt; 6; i++) &#123;
    if (GetRandomNum(0, 1)) &#123;
      Mycheckbox.eq(i).trigger("click");
    &#125;
  &#125;
  // Tick 1 to 2 options
  for (var i = 7; i &lt; 11; i++) &#123;
    if (GetRandomNum(0, 1)) &#123;
      Mycheckbox.eq(i).trigger("click");
      count++;
    &#125;
    if (count == 2) break;
  &#125;
  if (count == 0) Mycheckbox.eq(7).trigger("click");
  for (var i = 12; i &lt; 17; i++) &#123;
    if (GetRandomNum(0, 1)) &#123;
      Mycheckbox.eq(i).trigger("click");
    &#125;
  &#125;
  // Tick exactly 2 distinct options
  while (num1 == num2) &#123;
    num1 = GetRandomNum(18, 22);
    num2 = GetRandomNum(18, 22);
  &#125;
  Mycheckbox.eq(num1).trigger("click");
  Mycheckbox.eq(num2).trigger("click");
  // Submit the survey
  $("div .survey-submit-page").trigger("click");
&#125;
fillin();]]></content>
    </entry>

    
    <entry>
      <title><![CDATA[Notes on Writing a Web Crawler in Python 3]]></title>
      <url>%2F2016%2F02%2F12%2FPython3%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB%E5%BF%83%E5%BE%97%2F</url>
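      <content type="text"><![CDATA[To make it easier for older readers (such as my mother) to read novels, I took some time over the winter break to write her a script that crawls novels. I am sharing it here so we can all learn from it and discuss it.
Anyone who reads web novels knows that most of them cost money to read in mobile apps (such as Shuqi Novel). My mother does not know how to download free txt files to her phone and often downloads the wrong thing, so I put this script together.
What the program does: given a novel title typed in by the user, it automatically downloads that novel from the web to the phone, avoiding junk software along the way.
The development process and an analysis of the source follow.
Language: Python 3. Author: me.
Why insist on Python 3? Because most tutorials online are for Python 2, and it took me a long time to work things out.
Libraries to import: urllib (apparently not needed), urllib.request (network requests), re (regular expressions), codecs (encodings).

Approach
At first I planned to crawl a novel site such as Qidian, but the anti-scraping defenses on those sites are frighteningly good, and some of them do not carry complete novels. Put bluntly, I could not get in, so I decided to crawl Baidu directly; Baidu is open and has a bit of everything.
The first thing to implement is search. Appending a URL-encoded string after the equals sign in Baidu's URL (http://www.baidu.com/s?wd=) performs a search.
After searching, fetch the source of the results page. All we need from it are the URLs, so first decode the source as UTF-8 (Baidu's encoding) and then use a regular expression to find the information you want. That sounds easy, but you first have to study how the URLs appear in the source before you can write the regular expression, which is not trivial. A function, gethref, then extracts the URL you need.
The first page of Baidu results contains 10 results, and each of the 10 pages may use a different encoding, so I wrote a chain of try blocks to attempt several encodings. The script then crawls each page for a download link, again through inspection and regular expressions. Once a download link is found, the file can be downloaded with urllib.request.urlretrieve from Python's standard library, which handles both the download and the progress display. Its first argument is the URL to download and its second is the destination path; I do not know where the file ends up if the path is omitted, and giving only a file name saves it in the script's current directory. The last argument is the progress callback.
It looks simple, but there are plenty of bugs, so please report any you find. Two known weaknesses: 1. The crawler is single-threaded, so it is slow. 2. Decoding in the try section is slow, and a wrong guess at the encoding slows it down further.

Source code:
# coding:utf-8
import urllib.request
import urllib
import re
import codecs

# Extract the URL between the first pair of double quotes after 'href'
def gethref(booktitle):
    quetos_first = booktitle.find('"', booktitle.find('href'))
    quetos_last = booktitle.find('"', quetos_first + 1)
    return booktitle[quetos_first + 1:quetos_last]

# Progress callback for urlretrieve: print the percentage downloaded
def report_hook(count, block_size, total_size):
    print('%02d%%' % (100.0 * count * block_size / total_size))

# Main script
keyword = input("Enter the novel title: ")
print('Searching, please wait...')
# Pretend to be a browser so that Baidu does not treat us as a crawler
header = &#123;'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.4.4 (KHTML, like Gecko) Version/9.0.3 Safari/601.4.4'&#125;
# 'txt下载' means "txt download" and is appended to the search query
req = urllib.request.Request(
    url='http://www.baidu.com/s?wd=' + urllib.request.quote(keyword + 'txt下载'),
    headers=header)
# Decode the Baidu results page as UTF-8, then pick out the result blocks
html = urllib.request.urlopen(req).read().decode('utf-8')
allbook = re.findall('&lt;div class="result c-container "(.*?)&lt;/a&gt;&lt;/h3&gt;', html, re.S)
found = False
for each in allbook:
    book = urllib.request.Request(url=gethref(each), headers=header)
    bookhtml = urllib.request.urlopen(book).read()
    # Try several encodings in turn; skip the page if none of them fits
    page = None
    for encoding in ('utf-8', 'gbk', 'gb2312'):
        try:
            page = bookhtml.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    if page is None:
        continue
    download = re.findall('&lt;a(.*?)&lt;/a&gt;', page, re.S)
    if len(download) == 0:
        print('Sorry, the novel was not found TAT.......')
    else:
        for url_down in download:
            # Only follow links that point at a .txt file
            if url_down.find('.txt"') != -1 and url_down.find('href') != -1:
                print('Trying to download........')
                try:
                    # Keep the leading 'http:' as-is and URL-encode the rest
                    link = gethref(url_down)
                    urllib.request.urlretrieve(link[:5] + urllib.request.quote(link[5:]), keyword, reporthook=report_hook)
                    found = True
                    print("success")
                    break
                except Exception:
                    print("Download failed")
    if found:
        break
if not found:
    print('Sorry, the novel was not found TAT.......')]]></content>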
    </entry>

    
    <entry>
      <title><![CDATA[Laptops from a Student's Point of View]]></title>
      <url>%2F2015%2F11%2F01%2F%E4%BB%A5%E5%AD%A6%E7%94%9F%E8%A7%86%E8%A7%92%E7%9C%8B%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91%2F</url>
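      <content type="text"><![CDATA[After two months of hard grind at university, during which I fixed a few classmates' computers, here is a short summary of my personal views on laptops. It may already be too late for buying advice, but Singles' Day (November 11) is coming, so anyone who still wants to buy a laptop can take a look, and it can also serve as a reference for younger students 0.0. A laptop for a student's studies does not need much; there are almost no requirements, and a price of around 3000 yuan is enough, as long as you buy a well-known brand. I recommend the four brands discussed below: ASUS, HP, Acer, and Lenovo. I use an ASUS, in fact everyone in my dorm room does, so I strongly recommend it!
Once you have spent 3000+ yuan on a laptop, it will usually come with a 500 GB mechanical hard drive, and mechanical drives tend to feel sluggish. To fix that, swap in a solid-state drive, or buy a laptop that already has one. You can buy a 64 GB SSD (about 200 yuan) for the operating system, or a 256 GB SSD (about 500 yuan) to replace the original drive outright. 256 GB is definitely enough, and it will certainly be far faster than the laptop's original mechanical drive; booting in 8 seconds is no problem! (PS: link → the difference between SSDs and mechanical hard drives)
As for the operating system, tidying it up can also cure sluggishness. If you do not want to change the drive you can still make do, and if you bought a 64 GB SSD plus a 500 GB mechanical drive, then tidy up your system.
For Windows in general, you can do the following. Download a security suite; I recommend Tencent's (taking Tencent PC Manager as the example). Open it, find the PC acceleration section, go to the startup items, and disable every startup item except drivers such as graphics and sound. Then, under services, disable the services related to printers and Bluetooth (leave Bluetooth alone if you use it). Under scheduled tasks, disable the Microsoft improvement program and PNP (a Bluetooth-related item). Finally, run PC Manager's one-click optimization. Done! Also, decline every Windows update the system prompts you about; do not update! Don't ask me why!
My advice for specific brands follows.

ASUS first. Open the uninstall tool in Control Panel or in your security suite. Anything beginning with ASUS is software bundled with the laptop. Among these:
1. ASUS Vibe Fun Center (ASUS's own multimedia software with online music, video, radio, and e-books, most of which cost money; can be uninstalled.)
2. ASUS Sonic Focus (ASUS's own audio-effect tuner; in my opinion it sounds better switched off; uninstall.)
3. AI Recovery Burner (for backing up the system; I have never needed it; uninstall.)
4. Nuance PDF Reader (a PDF reader that is worse than Foxit and stutters badly on even moderately large PDF files; uninstall.)
5. SmartLogonManager (face-recognition login at boot. I find it rather pointless, since in the time it takes to recognize your face you could have typed your password, but it is a matter of taste and fairly fun; uninstall if you like.)
6. e-Driver (the ASUS laptop driver disc program; once opened it offers to install the ASUS auto-update software and the PC cloud security software; can be uninstalled.)
7. Splendid Utility (ASUS's display color software with four built-in visual modes, namely cool, sunny warm, soft, and standard, plus custom settings; of little use; can be uninstalled.)
8. syncables desktop SE (syncs music, pictures, and other files between two laptops or with a desktop. I do not think it is needed; who shuffles files between two or three computers all the time? Even if you do, a USB stick does the job; can be uninstalled.)
9. Trend Micro (antivirus software, and a paid one at that; uninstall.)
10. eManual (the laptop user manual; useless. Read it once and uninstall it. It is just an electronic copy of the instruction booklet.)
11. LifeFrame (a simple photo and video capture tool for the built-in camera. Fun to play with briefly, otherwise of no use; uninstall.)
The following must not be uninstalled:
1. ATK Package (the hotkey driver for ASUS laptops; uninstalling it breaks the Fn key functions.)
2. Software that neither begins with ASUS nor is an application, for example anything with USB in its name or containing Microsoft Visual C++. Never uninstall these!

Next, HP.
3D DriveGuard: protects the laptop's internal disk drive when the laptop is accidentally dropped or bumps into something; can be kept.
CoolSense: HP's own cooling assistant, mainly used to control the machine's temperature; can be uninstalled.
HP Documentation: HP's documentation, containing usage notes, disclaimers, and the like; uninstall!
HP Launch Box: lets you organize applications into groups for quick access from the Windows 7 taskbar, on supported laptops and operating systems; uninstall!
HP On Screen Display: shows a pop-up graphic on screen when certain settings (such as volume or brightness) change; uninstall!
HP Power Manager: notifies you when the laptop's current power settings do not match HP's recommended settings; can be uninstalled.
HP Quick Launch: enables the special function keys on supported laptops. For example, with HP Quick Launch the user can press Fn+ESC to view system information. Without this driver, the Fn key combinations may stop working; typically the brightness cannot be adjusted, and Fn+F4 cannot switch the display when an external projector, monitor, or TV is connected ... decide for yourself!
HP Security Assistant: provides shortcuts to features in the system or to software installed on it.
HP Setup: a set of system settings preconfigured by HP; uninstall!
HP setup manager: management software mainly for updating HP-related drivers and software; uninstall!
HP SimplePass PE: uses the owner's fingerprint to protect identity information and account access; uninstall!
HP Software Framework: provides a common set of software interfaces that centralizes and simplifies access to hardware, BIOS, and HP device drivers; can be kept.
HP Support Assistant: HPSA is a self-help troubleshooting tool preinstalled on the laptop. You can use it to update software, install drivers, run system health checks, troubleshoot problems, learn troubleshooting techniques, and contact HP engineers by chat or phone; uninstall!

Acer:
acer backup manager: backup software, mainly for backing up the system drive; uninstall!
acer crystal eye webcam: webcam software specific to Acer laptops; can be removed if you rarely use it.
acer epower management: laptop power management. It adjusts battery usage to the situation to save energy and extend battery life, and it also drives the on-screen prompts for the Fn hotkeys. Do not uninstall!
acer erecovery managment: one-click restore. A new laptop can use it to return to factory settings; it works together with the backup software. Uninstall!
acer registration: connects to the official site to register, which extends the battery warranty to one year; without registration the battery warranty is six months. Uninstall!
Some other items may differ from machine to machine, but apart from the power one, everything beginning with acer can be uninstalled!

Lenovo laptops are the simplest: any bundled software beginning with lenovo can be uninstalled!
Of course, if you bought your own SSD and installed the system yourself, there is no bundled software and none of this applies to you.
This article is only my personal opinion. If you have suggestions, please contact me.]]></content>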
    </entry>

    
    <entry>
      <title><![CDATA[Homework for My E-Commerce Teacher]]></title>
      <url>%2F2015%2F10%2F25%2F%E7%BB%99%E7%94%B5%E5%95%86%E8%80%81%E5%B8%88%E7%9A%84%E4%BD%9C%E4%B8%9A%2F</url>
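      <content type="text"><![CDATA[Hello, e-commerce teacher. Although the slide where you introduced yourself flashed by in an instant, I fortunately still caught your name, Teacher Yu. All right, I won't pad this out with filler. Although this essay is the homework I am handing in to you, it is also my small plan for university.
After your three introductory classes on EC, I still do not really understand new business models such as Alibaba's B2B model or JD's B2C model. At least I understood that the 2 means "to" and not the 2 in the slang insult "2B". I also know that shopping online is somewhat cheaper and that the quality is not bad. Right now my impression of e-commerce consists of strings such as B2A, B2B, B2C, and B2M, and they are only strings; that impression also includes online shopping, of course.
I do not like marketing and I do not like operations research; I do not want to study economics or management either. I hear there are many pretty girls in e-commerce, but sorry, I like guys. So, to sum up, which is equivalent to (&lt;=&gt;): sorry, I will not choose e-commerce in my second year.
In the last class you also talked about the history and development of computers and mentioned someone I like very much, Alan Turing. Let me recommend a film to you, The Imitation Game, which you may have seen already. It mainly tells the story of Turing breaking the German codes during the Second World War. It is a drama, and a really good one!
Speaking of Turing, I cannot help thinking of his Turing test, which I also like very much, because I want to do research in artificial intelligence in the future. That is why I plan to choose CS in my second year. Here I want to thank the two fathers of the computer, John von Neumann and Alan Turing.
At present I do not know much about AI; most of what I know comes from films. The first film that made me fall in love with artificial intelligence was Iron Man, with its J.A.R.V.I.S. Jarvis is an AI butler built single-handedly by the self-taught Mr. Stark. Although the film does not show how it was made, I think the result is achievable; we just need to keep researching, or perhaps the technology already exists and is simply not public yet, still in internal testing. The second film that made me love AI was Ex Machina. Roughly, it is about a father who creates a daughter and then finds her a boyfriend to test whether she is human. The daughter turns out to be too capable: in the end she kills the father, locks up the boyfriend, and walks out to see the outside world.
Ex Machina gives a rough idea of how artificial intelligence might be achieved: use Internet of Things technology to record how people speak on the phone and how they behave on surveillance cameras, then encode all of it into a computer so that the computer learns it too. But I do not think this is a good approach. The best approach is to let the computer teach itself: design a program that gives the computer the ability to learn on its own, like a newborn baby. The difference between a person and a computer is that a person can think and can go and learn independently. If computers could teach themselves, artificial intelligence would make real progress. Of course, this is just my humble opinion.
All of that is for later. What I need to do now is learn the technology well. As for network engineering, I will study it in outline as well. I do not know much about it yet; in my mind, network engineers fix routers and run cables for users along the way. But Internet of Things engineering is also part of network engineering, and I am quite interested in IoT, so I will study it too, provided I first learn computer science well.
As I see it, to reach my ultimate goal I should study in this order: computer science, network engineering, psychology, anthropology, biology, and e-commerce. Fortunately I am fairly interested in the first three.
Then, my plan for university life: go to bed early, wake up naturally (fortunately I wake up naturally at 7), read more books and newspapers, eat fewer snacks, and sleep more.
Matters of the heart stay private; I will let them take their course.
One more thing, Teacher Yu: I really like your teaching style. 0.0]]></content>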
    </entry>

    
  
  
</search>