Regarding solution

How to run

There is a spark-master container with 2 spark-worker containers, and a spark-submit container which submits the job.
Scaling beyond this has not been done, but it is possible to scale docker-compose to any number of instances if static IPs are set carefully in docker-compose.

I treat each ip:port pair as a single user (users behind NAT share the same IP but have different port numbers).
First, make a new column which flags whether the current HTTP request and the previous HTTP request by the same user belong to the same session (1 if a new session starts, 0 otherwise).
Then take a cumulative sum of that column per user; this automatically gives a different number to rows in different sessions.
Then partition by user and session number and run the various queries.
Lastly, save the rough output to the data directory.
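
The steps above can be sketched in plain Python, without Spark, to show how the flag column and the cumulative sum produce session numbers. This is only an illustration under assumptions: the 15-minute timeout, the function names and the sample requests are hypothetical and do not come from the actual job.

```python
from collections import defaultdict
from itertools import accumulate

# Assumed inactivity threshold; the real value used by the job is not stated here.
SESSION_TIMEOUT_S = 15 * 60


def sessionize(requests, timeout_s=SESSION_TIMEOUT_S):
    """requests: iterable of (user, timestamp_seconds), user being an "ip:port" string.
    Returns (user, timestamp, session_id) rows sorted by user, then time."""
    by_user = defaultdict(list)
    for user, ts in requests:
        by_user[user].append(ts)

    rows = []
    for user, times in sorted(by_user.items()):
        times.sort()
        # Flag column: 1 when the gap to the previous request exceeds the timeout.
        # The first request of a user has no previous request, so its flag is 0.
        flags = [0] + [1 if cur - prev > timeout_s else 0
                       for prev, cur in zip(times, times[1:])]
        # Cumulative sum of the flags: every new session bumps the number by 1.
        session_ids = accumulate(flags)
        rows.extend((user, ts, sid) for ts, sid in zip(times, session_ids))
    return rows


def session_durations(rows):
    """Example query: duration in seconds of each (user, session_id) group."""
    bounds = {}
    for user, ts, sid in rows:
        lo, hi = bounds.get((user, sid), (ts, ts))
        bounds[(user, sid)] = (min(lo, ts), max(hi, ts))
    return {key: hi - lo for key, (lo, hi) in bounds.items()}


# Hypothetical sample: two users behind the same IP, told apart by port.
sample = [
    ("1.2.3.4:5000", 0),
    ("1.2.3.4:5000", 60),
    ("1.2.3.4:5000", 2000),  # more than 15 minutes after the previous request
    ("1.2.3.4:6000", 30),
]
rows = sessionize(sample)
durations = session_durations(rows)
```

In Spark the same idea would typically be expressed with window functions partitioned by user and ordered by time, but the exact calls used by the job are not shown here.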

The job runs, but there are still a few problems with standalone mode.
There are a few errors, such as an rsync error because it can't find the spark host name, but they do not affect the output.
There are some warnings about spark jars not being found, but they do not affect the output either.
When run in docker, spark standalone cluster mode fails to relay tasks to the worker nodes (while client mode can connect to both workers and successfully complete the job).