Skip to content

jiehuan/insight_edgar

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

insight_edgar

Approach

The following dictionaries and lists are used for processing.

main = dict()		# key: IP, value: list[start-time, start unix time, end-time, end unix-time]
time_ip = dict()	# key: last active unix-time, value: list of IPs each in sequential orders
count = dict()  	# key: IP, value: set of IDs
active_time  = []	# list of end unix-time of live sessions

Alternate image text

Requirements

This solution is code in Python 3.6.5. The following packages need to be imported.

import sys
import numpy as np
from dateutil.parser import parse

Run Instructions

execute 'run.sh' in terminal

./run.sh

default file names and locations

Python file: ./src/sessionization.py Input csv file: ./input/log.csv Inactivity period file: ./input/inactivity_period.txt Output file: ./output/sessionization.txt

If file names or locations need to be changed, the command in run.sh should also be changed accordingly.

About

Insight Data Engineer challenge

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published