-
Notifications
You must be signed in to change notification settings - Fork 0
/
result.txt
220 lines (203 loc) · 10.3 KB
/
result.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
DIMENSION
---------
Training Size: (90275, 3)
Property Size: (2985217, 58)
MERGED
------
0 1
parcelid 11016594 14366692
logerror 0.0276 -0.1684
transactiondate 2016-01-01 00:00:00 2016-01-01 00:00:00
airconditioningtypeid 1 NaN
architecturalstyletypeid NaN NaN
basementsqft NaN NaN
bathroomcnt 2 3.5
bedroomcnt 3 4
buildingclasstypeid NaN NaN
buildingqualitytypeid 4 NaN
calculatedbathnbr 2 3.5
decktypeid NaN NaN
finishedfloor1squarefeet NaN NaN
calculatedfinishedsquarefeet 1684 2263
finishedsquarefeet12 1684 2263
finishedsquarefeet13 NaN NaN
finishedsquarefeet15 NaN NaN
finishedsquarefeet50 NaN NaN
finishedsquarefeet6 NaN NaN
fips 6037 6059
fireplacecnt NaN NaN
fullbathcnt 2 3
garagecarcnt NaN 2
garagetotalsqft NaN 468
hashottuborspa NaN NaN
heatingorsystemtypeid 2 NaN
latitude 3.4281e+07 3.36681e+07
longitude -1.18489e+08 -1.17678e+08
lotsizesquarefeet 7528 3643
poolcnt NaN NaN
poolsizesum NaN NaN
pooltypeid10 NaN NaN
pooltypeid2 NaN NaN
pooltypeid7 NaN NaN
propertycountylandusecode 0100 1
propertylandusetypeid 261 261
propertyzoningdesc LARS NaN
rawcensustractandblock 6.03711e+07 6.05905e+07
regionidcity 12447 32380
regionidcounty 3101 1286
regionidneighborhood 31817 NaN
regionidzip 96370 96962
roomcnt 0 0
storytypeid NaN NaN
threequarterbathnbr NaN 1
typeconstructiontypeid NaN NaN
unitcnt 1 NaN
yardbuildingsqft17 NaN NaN
yardbuildingsqft26 NaN NaN
yearbuilt 1959 2014
numberofstories NaN NaN
fireplaceflag NaN NaN
structuretaxvaluedollarcnt 122754 346458
taxvaluedollarcnt 360170 585529
assessmentyear 2015 2015
landtaxvaluedollarcnt 237416 239071
taxamount 6735.88 10153
taxdelinquencyflag NaN NaN
taxdelinquencyyear NaN NaN
censustractandblock 6.03711e+13 NaN
DROP COLUMNS
------------
rawcensustractandblock vs censustractandblock
rawcensustractandblock censustractandblock
rawcensustractandblock 1.000000 0.999822
censustractandblock 0.999822 1.000000
finishedfloor1squarefeet vs finishedsquarefeet50
finishedfloor1squarefeet finishedsquarefeet50
finishedfloor1squarefeet 1.000000 0.980182
finishedsquarefeet50 0.980182 1.000000
calculatedfinishedsquarefeet vs finishedsquarefeet12
calculatedfinishedsquarefeet \
calculatedfinishedsquarefeet 1.0
finishedsquarefeet12 1.0
finishedsquarefeet12
calculatedfinishedsquarefeet 1.0
finishedsquarefeet12 1.0
bathroomcnt vs calculatedbathnbr
bathroomcnt calculatedbathnbr
bathroomcnt 1.0 1.0
calculatedbathnbr 1.0 1.0
hashottuborspa vs pooltypeid10
pooltypeid10
pooltypeid10 NaN
MISSING VALUES
--------------
Columns with missing values over 90%:
buildingclasstypeid 99.982276
finishedsquarefeet13 99.963445
basementsqft 99.952368
storytypeid 99.952368
yardbuildingsqft26 99.894766
fireplaceflag 99.754085
architecturalstyletypeid 99.710883
typeconstructiontypeid 99.668790
finishedsquarefeet6 99.533647
decktypeid 99.271116
poolsizesum 98.926613
pooltypeid10 98.713930
pooltypeid2 98.666297
taxdelinquencyyear 98.024924
taxdelinquencyflag 98.024924
hashottuborspa 97.380227
yardbuildingsqft17 97.068956
finishedsquarefeet15 96.052063
finishedfloor1squarefeet 92.405428
finishedsquarefeet50 92.405428
dtype: float64
Check the value of storytypeid:
storytypeid numberofstories
460 7.0 3.0
3046 7.0 1.0
4033 7.0 1.0
4375 7.0 2.0
11120 7.0 1.0
Examine architecturalstyletypeid:
7.0 221
8.0 16
2.0 11
21.0 8
3.0 4
10.0 1
Name: architecturalstyletypeid, dtype: int64
Examine buildingclasstypeid:
4.0 16
Name: buildingclasstypeid, dtype: int64
Examine typeconstructiontypeid:
6.0 296
4.0 2
13.0 1
Name: typeconstructiontypeid, dtype: int64
Compare fireplaceflag with fireplacecnt:
fireplaceflag fireplacecnt
7 NaN 1.0
14 NaN 1.0
15 NaN 1.0
16 NaN 1.0
21 NaN 1.0
Deck type:
66.0 658
Name: decktypeid, dtype: int64
Columns to drop:
['censustractandblock', 'finishedsquarefeet50', 'finishedsquarefeet12', 'calculatedbathnbr', 'pooltypeid10', 'fullbathcnt', 'threequarterbathnbr', 'finishedsquarefeet6', 'finishedsquarefeet13', 'finishedsquarefeet15', 'basementsqft', 'storytypeid', 'architecturalstyletypeid', 'buildingclasstypeid', 'typeconstructiontypeid']
<class 'pandas.core.frame.DataFrame'>
Int64Index: 90275 entries, 0 to 90274
Data columns (total 48 columns):
parcelid 90275 non-null float64
logerror 90275 non-null float32
transactiondate 90275 non-null object
airconditioningtypeid 29439 non-null category
bathroomcnt 90275 non-null float32
bedroomcnt 90275 non-null float32
buildingqualitytypeid 58022 non-null category
deckflag 658 non-null float32
finishedfloor1squarefeet 6883 non-null float32
calculatedfinishedsquarefeet 89614 non-null float32
fips 90275 non-null category
fireplacecnt 9722 non-null float32
garagecarcnt 29947 non-null float32
garagetotalsqft 29947 non-null float32
hashottuborspa 3023 non-null object
heatingorsystemtypeid 56738 non-null category
latitude 90275 non-null float32
longitude 90275 non-null float32
lotsizesquarefeet 80278 non-null float32
poolcnt 18440 non-null float32
poolsizesum 1515 non-null float32
pooltypeid2 1862 non-null float32
pooltypeid7 17236 non-null float32
propertycountylandusecode 90274 non-null category
propertylandusetypeid 90275 non-null category
propertyzoningdesc 58971 non-null object
rawcensustractandblock 90275 non-null float32
regionidcity 88476 non-null float32
regionidcounty 90275 non-null float32
regionidneighborhood 36595 non-null float32
regionidzip 90242 non-null float32
roomcnt 90275 non-null float32
unitcnt 59011 non-null float32
yardbuildingsqft17 3087 non-null float32
yardbuildingsqft26 736 non-null float32
yearbuilt 89520 non-null float32
numberofstories 20597 non-null float32
fireplaceflag 9722 non-null float64
structuretaxvaluedollarcnt 89895 non-null float32
taxvaluedollarcnt 90274 non-null float32
assessmentyear 90275 non-null float32
landtaxvaluedollarcnt 90274 non-null float32
taxamount 90269 non-null float32
taxdelinquencyflag 2434 non-null object
taxdelinquencyyear 2434 non-null float32
living_area_prop 79945 non-null float32
value_ratio 90268 non-null float32
value_prop 89895 non-null float32
dtypes: category(6), float32(36), float64(2), object(4)
memory usage: 17.7+ MB