Questions about the precision of the trained model #23
Comments
Yes, you should use the larger dataset. The small dataset is just for quick setup.
Could you provide a model already trained on the large dataset so that I can test it?
@ttbuffey I uploaded the trained model (epoch 500, Keras) to the data folder on Google Drive. Please check it out and let me know if you have any questions.
@guxd Thanks a lot, I'm trying it.
@guxd In our tests the similarity is around 0.4, and the results seem more relevant now. I also want to confirm: was the epoch500 model you just provided trained on the 18,233,872 records mentioned in the paper?
0.4 seems normal. The epoch500 model was trained with the dataset from Google Drive. That data contains the 18,233,872 records mentioned in the paper.
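For readers comparing scores such as 0.3 and 0.4, here is a minimal sketch of how such a score could be produced, assuming the similarity reported by the search is the cosine similarity between a query embedding and each code embedding. The function names, the toy vectors, and the snippet labels are hypothetical and are not taken from the repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, code_vecs, top_k=10):
    """Return the top_k (name, score) pairs, highest score first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in code_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-dimensional embeddings (hypothetical; real embeddings are much larger).
query = [1.0, 0.0]
snippets = {
    "parse_date": [0.8, 0.6],
    "close_writer": [0.0, 1.0],
    "format_seconds": [0.6, 0.8],
}
ranked = rank_snippets(query, snippets, top_k=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Under this assumption the score only ranks candidates against each other, so a top score near 0.4 can still correspond to a useful result if the relevant snippets are ranked first.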
Hello, I used the original dataset from your cloud drive and trained the Keras model for 1000 epochs with the original parameters and a chunksize of 200,000. The best val_loss during training was about 0.00016. When I ran the eval function with that best model, the top10 result was about 0.79 and the MRR was about 0.53, which differs slightly from your paper. When I tested search on the use_dataset, the highest similarity score was around 0.36. What could be the problem? Thank you for your reply!
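To make the two figures above concrete, here is a minimal sketch of how a top-10 success rate and MRR could be computed from per-query ranks. It assumes "top10" means the fraction of queries whose correct snippet appears within the first 10 results; the repository's eval function may define it differently. The function names and the example ranks are hypothetical.

```python
def success_at_k(ranks, k=10):
    """Fraction of queries whose correct result is ranked within the top k.

    ranks holds the 1-based rank of the correct result per query,
    or None when the correct result was not retrieved at all.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; a missing result contributes 0."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical ranks for four queries; the last query found nothing.
example_ranks = [1, 2, 4, None]
print(success_at_k(example_ranks, k=3))
print(mean_reciprocal_rank(example_ranks))
```

Both metrics depend on the size of the candidate pool being ranked, so numbers are only comparable with the paper when the evaluation setup matches.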
I retrained a model for 670 epochs. The lowest loss I observed was 0.000329920450937, so I stopped training at that point.
Input Query: convert string to date
('public static Class < ? > loadSystemClass ( String className ) throws ClassNotFoundException { return Class . forName ( className ) ; } \n', 0.40123203)
('public static BinaryExpression andAssign ( Expression left , Expression right , Method method , LambdaExpression lambdaExpression ) { throw Extensions . todo ( ) ; } \n', 0.40073127)
('public static < TSource , TKey , TElement , TResult > Enumerable < TResult > groupBy ( Enumerable < TSource > enumerable , Function1 < TSource
('public static void squelchWriter ( Writer writer ) { try { if ( writer != null ) { writer . close ( ) ; } } catch ( IOException ex ) { } } \n',
('public String getElement ( int index ) throws Exception { if ( index != 0 ) { throw new Exception ( "INTERNAL-ERROR:-invalid-index-" + index +
('public static byte [ ] loadBinary ( File binFile ) throws IOException { byte [ ] xferBuffer = new byte [ 10240 ] ; byte [ ] outBytes = null ; ByteArrayOutputStream baos ; int i ; FileInputStream fis = new FileInputStream ( binFile ) ; try { baos = new ByteArrayOutputStream ( ) ; while ( ( i = fis . read ( xferBuffer ) ) > 0 ) baos . write ( xferBuffer , 0 , i ) ; outBytes = baos . toByteArray ( ) ; } finally { try { fis . close ( ) ; } catch ( IOException ioe ) { } finally { fis = null ; baos = null ; } } return outBytes ; } \n', 0.39748406)
('@ XmlElementDecl ( namespace = "http://schemas.microsoft.com/2003/10/Serialization1/" , name = "duration" ) public JAXBElement < Duration > cre
('protected void doFormatValue ( final CharArrayBuffer buffer , final String value , boolean quote ) { if ( ! quote ) { for ( int i = 0 ; ( i < v
('public double diagonal ( ) { return Math . sqrt ( Math . pow ( theLength , 2 ) + Math . pow ( theWidth , 2 ) ) ; } \n', 0.39587107)
('protected String buildQuery ( ) throws UnsupportedEncodingException { String timestamp = getTimestampFromLocalTime ( Calendar . getInstance ( )
When I try to reproduce the results, I also get search results that are not relevant. Have you solved the problem?
Dear Author,
When we run the model trained with the default dataset in deep-code-search-master/pytorch/data/github, the search results are not very relevant to the input question, and the highest similarity is about 0.3.
Can you confirm whether results from the default dataset are indeed not that relevant? What similarity scores did you get?
Question: "convert string to date "
Result: ('public static String formatSeconds ( Object obj ) { long time = - 1L ; if ( obj instanceof Long ) { time = ( ( Long ) obj ) . longValue ( ) ; } else if ( obj instanceof Integer ) { time = ( ( Integer ) obj ) . intValue ( ) ; } return ( time + "-s" ) ; } \r\n', 0.31213856)
Would it help to use the larger dataset you provided on Google Drive? https://drive.google.com/drive/folders/1GZYLT_lzhlVczXjD6dgwVUvDDPHMB6L7?usp=sharing
Will the precision be much higher with the large dataset? We don't have a result yet because training takes quite a long time.
We sincerely hope you can help. Thanks a lot.