
Evaluation doesn't work on Windows #45

Open

peter-ch opened this issue Jun 19, 2024 · 4 comments

@peter-ch

After getting a score of 0 every time, I looked at the samples.jsonl_results.jsonl file, and the result for every sample is: "failed: module 'signal' has no attribute 'setitimer'"

This seems like a Windows/Unix issue.
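
For context, signal.setitimer and signal.SIGALRM only exist on Unix, and the evaluator's timeout handling relies on both. A quick check (hypothetical snippet, not part of human-eval) makes the difference visible:

import signal

# On Windows both of these print False; on Linux/macOS they print True.
# human_eval's time_limit() needs both, hence the AttributeError above.
print(hasattr(signal, "setitimer"))
print(hasattr(signal, "SIGALRM"))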

@Ephrem-Adugna

Same issue here

@mfwong1223

For Windows, I replaced the signal module with the threading module, changing

@contextlib.contextmanager
def time_limit(seconds: float):
    def signal_handler(signum, frame):
        raise TimeoutException("Timed out!")
    signal.setitimer(signal.ITIMER_REAL, seconds)
    signal.signal(signal.SIGALRM, signal_handler)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
to

import threading
@contextlib.contextmanager
def time_limit(seconds: float):
    def signal_handler():
        raise TimeoutException("Timed out!")
    timer = threading.Timer(seconds, signal_handler)
    timer.start()
    try:
        yield
    finally:
        timer.cancel()
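
As a rough usage sketch (TimeoutException is the exception class already defined in human_eval/execution.py; run_candidate() is a hypothetical stand-in for the exec() call the evaluator makes):

try:
    with time_limit(3.0):
        run_candidate()  # hypothetical stand-in for executing the completion
except TimeoutException:
    print("Timed out!")

One caveat: threading.Timer runs the handler in its own thread, so the TimeoutException is raised there rather than in the code under test. This avoids the Windows error, but it relaxes the hard timeout rather than enforcing it.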

@Ephrem-Adugna

The above didn't work for me; I ended up just running it inside a Linux VM using WSL.

@CynicalWilson

Same issue here. For every LLM I load in LM Studio and test against HumanEval via the script below, I get 0/0, with every sample failing with the same signal error.

HumanEval.py:

import os
import json
from human_eval.data import write_jsonl, read_problems
from human_eval.evaluation import evaluate_functional_correctness
from local_llm_client import client

def generate_one_completion(prompt):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat_completion_create(messages)
    return response['choices'][0]['message']['content']

def generate_completions(problems, output_file):
    samples = []
    for task_id, problem in problems.items():
        prompt = problem["prompt"]
        completion = generate_one_completion(prompt)
        samples.append({"task_id": task_id, "completion": completion})
    
    write_jsonl(output_file, samples)

if __name__ == "__main__":
    problems = read_problems()
    output_file = "completions.jsonl"
    
    generate_completions(problems, output_file)
    
    results = evaluate_functional_correctness(output_file)
    print(json.dumps(results, indent=2))

local_llm_client.py:

import requests
import json

class LocalLLMClient:
    def __init__(self, base_url="http://localhost:4445"):
        self.base_url = base_url

    def chat_completion_create(self, messages, temperature=0.7, max_tokens=-1, stream=False):
        url = f"{self.base_url}/v1/chat/completions"
        headers = {"Content-Type": "application/json"}
        data = {
            "model": "nxcode-cq-7b-orpo-q8_0",  # Adjust this to match your model name
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": stream
        }

        response = requests.post(url, headers=headers, json=data)
        response.raise_for_status()
        return response.json()

client = LocalLLMClient()
