
Questions about the precision of the trained model #23

Open
ttbuffey opened this issue Apr 29, 2019 · 10 comments

Comments

@ttbuffey

ttbuffey commented Apr 29, 2019

Dear Author,

When we run the model trained on the default dataset in deep-code-search-master/pytorch/data/github, we find that the relevance of the search results to the input query is not very high; the highest similarity is about 0.3.

  1. I want to confirm whether the results from the default dataset are indeed not that relevant. What similarity scores did you get?
    Question: "convert string to date"
    Result: ('public static String formatSeconds ( Object obj ) { long time = - 1L ; if ( obj instanceof Long ) { time = ( ( Long ) obj ) . longValue ( ) ; } else if ( obj instanceof Integer ) { time = ( ( Integer ) obj ) . intValue ( ) ; } return ( time + "-s" ) ; } \r\n', 0.31213856)

  2. How about using the larger dataset you provided on Google Drive (https://drive.google.com/drive/folders/1GZYLT_lzhlVczXjD6dgwVUvDDPHMB6L7?usp=sharing)?
    Will the precision on the large dataset be much higher? We haven't got the result yet because it takes quite a long time to train.

We sincerely hope you can help answer. Thanks a lot.
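[Editor's note: for readers following this thread, the similarity scores quoted above are cosine similarities between the embedded query and embedded code snippets, which is how DeepCS-style retrieval ranks results. Below is a minimal, hedged sketch of that scoring step; the random vectors are made-up stand-ins, not the model's real embeddings, and `cosine_sim` is an illustrative helper, not a function from this repository.]

```python
# Sketch of DeepCS-style ranking: score code snippets by the cosine
# similarity between a query embedding and precomputed code embeddings.
# The embeddings here are random placeholders for illustration only.
import numpy as np

def cosine_sim(query_vec, code_vecs):
    """Cosine similarity between one query vector and a matrix of code vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    c = code_vecs / np.linalg.norm(code_vecs, axis=1, keepdims=True)
    return c @ q  # one score per code snippet, each in [-1, 1]

rng = np.random.default_rng(0)
query = rng.normal(size=128)          # stand-in for the embedded query
codebase = rng.normal(size=(5, 128))  # stand-ins for 5 embedded snippets

scores = cosine_sim(query, codebase)
top = np.argsort(-scores)             # best-matching snippet indices first
print(scores[top[0]])                 # the "highest similarity" discussed above
```

Because the scores are cosine values bounded by 1.0, a top score of 0.3–0.4 (as reported in this thread) reflects how close the nearest snippet's embedding is to the query's, not a percentage of correctness.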

@guxd
Owner

guxd commented Apr 29, 2019

Yes, you should use the larger dataset. The small one is just for quick setup.

@ttbuffey
Author

Can you provide an already-trained model based on the large dataset for me to test?
Our GPU server is quite slow: after seven days of training we have only reached epoch 11 of 2000, so it will still take quite a long time.
Thanks a lot for your quick response and help.

@guxd
Owner

guxd commented Apr 29, 2019

@ttbuffey I uploaded the trained model (epoch 500, by Keras) to the data folder on Google Drive. Please check it out and let me know if you have any questions.

@ttbuffey
Author

@guxd Thanks a lot, I'm trying it.

@ttbuffey
Author

@guxd We tested it: the similarity is around 0.4, and the results seem more relevant now.
Is this similarity consistent with what you would expect?

I also want to confirm: was the epoch500 model you just provided trained on the 18,233,872 records mentioned in the paper?

@guxd
Owner

guxd commented Apr 29, 2019

0.4 seems normal. The epoch500 model was trained with the dataset from Google Drive. The data contains 18,233,872 records, as mentioned in the paper.

@guxd guxd mentioned this issue Apr 29, 2019
@ahzz1207

ahzz1207 commented May 7, 2019

Hello, I used the original dataset from your cloud disk and ran 1000 epochs with the Keras model's original parameters, with a chunk size of 200,000. The best val_loss of the optimal model during training was about 0.00016. When I used the optimal model to run the eval function, the top-10 result was about 0.79 and the MRR was about 0.53, which differs slightly from your paper. When I tested search on the use_dataset, the highest similarity score was around 0.36. What could the problem be? Thank you for your reply!

@guxd
Owner

guxd commented May 21, 2019

@ahzz1207 The MRR shown by the program is calculated in a different way from the one in the paper: it is automatically computed on the training set, whereas the MRRs in the paper were calculated manually from human labeling. 0.36 seems a bit below expectation; it is usually around 0.4.
#16
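[Editor's note: to clarify the distinction above, the "automatic" MRR printed during training typically means: for each query, find the rank of its ground-truth snippet among the scored candidates and average the reciprocal ranks. A minimal sketch follows; the `mean_reciprocal_rank` helper and the rank values are illustrative, not taken from this repository's output.]

```python
# Sketch of an automatically computed Mean Reciprocal Rank (MRR):
# each query contributes 1/rank, where rank is the 1-based position
# of its ground-truth snippet in the retrieved list.

def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct snippet for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 2, 5, 1, 10]  # hypothetical ranks of the true answers
print(round(mean_reciprocal_rank(ranks), 2))  # 0.56
```

The paper's MRR instead relies on human judges labeling which retrieved snippets are relevant, so the two numbers are not directly comparable.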

@primary-studyer

I retrained a model for 670 epochs; the minimum loss I selected was 0.000329920450937, so I stopped training there.
But I feel the results are irrelevant.

Input Query: convert string to date
How many results? 10

('public static Class < ? > loadSystemClass ( String className ) throws ClassNotFoundException { return Class . forName ( className ) ; } \n', 0.40123203)

('public static BinaryExpression andAssign ( Expression left , Expression right , Method method , LambdaExpression lambdaExpression ) { throw Extensions . todo ( ) ; } \n', 0.40073127)

('public static < TSource , TKey , TElement , TResult > Enumerable < TResult > groupBy ( Enumerable < TSource > enumerable , Function1 < TSource , TKey > keySelector , Function1 < TSource , TElement > elementSelector , Function2 < TKey , Enumerable < TElement > , TResult > resultSelector ) { throw Extensions . todo ( ) ; } \n', 0.40073127)

('public static void squelchWriter ( Writer writer ) { try { if ( writer != null ) { writer . close ( ) ; } } catch ( IOException ex ) { } } \n', 0.40073127)

('public String getElement ( int index ) throws Exception { if ( index != 0 ) { throw new Exception ( "INTERNAL-ERROR:-invalid-index-" + index + "-sent-to-AreaMoments:getElement" ) ; } else { return Integer . toString ( number_windows ) ; } } \n', 0.39803487)

('public static byte [ ] loadBinary ( File binFile ) throws IOException { byte [ ] xferBuffer = new byte [ 10240 ] ; byte [ ] outBytes = null ; ByteArrayOutputStream baos ; int i ; FileInputStream fis = new FileInputStream ( binFile ) ; try { baos = new ByteArrayOutputStream ( ) ; while ( ( i = fis . read ( xferBuffer ) ) > 0 ) baos . write ( xferBuffer , 0 , i ) ; outBytes = baos . toByteArray ( ) ; } finally { try { fis . close ( ) ; } catch ( IOException ioe ) { } finally { fis = null ; baos = null ; } } return outBytes ; } \n', 0.39748406)

('@ XmlElementDecl ( namespace = "http://schemas.microsoft.com/2003/10/Serialization1/" , name = "duration" ) public JAXBElement < Duration > createDuration ( Duration value ) { return new JAXBElement < Duration > ( _Duration_QNAME , Duration . class , null , value ) ; } \n', 0.39748406)

('protected void doFormatValue ( final CharArrayBuffer buffer , final String value , boolean quote ) { if ( ! quote ) { for ( int i = 0 ; ( i < value . length ( ) ) && ! quote ; i ++ ) { quote = isSeparator ( value . charAt ( i ) ) ; } } if ( quote ) { buffer . append ( '"' ) ; } for ( int i = 0 ; i < value . length ( ) ; i ++ ) { char ch = value . charAt ( i ) ; if ( isUnsafe ( ch ) ) { buffer . append ( '|' ) ; } buffer . append ( ch ) ; } if ( quote ) { buffer . append ( '"' ) ; } } \n', 0.39748406)

('public double diagonal ( ) { return Math . sqrt ( Math . pow ( theLength , 2 ) + Math . pow ( theWidth , 2 ) ) ; } \n', 0.39587107)

('protected String buildQuery ( ) throws UnsupportedEncodingException { String timestamp = getTimestampFromLocalTime ( Calendar . getInstance ( ) . getTime ( ) ) ; Map < String , String > queryParams = new TreeMap < String , String > ( ) ; queryParams . put ( "ApplicationName" , application_name ) ; queryParams . put ( "AWSAccessKeyId" , accessKeyId ) ; queryParams . put ( "Description" , "descriptionversion1" ) ; queryParams . put ( "Operation" , ACTION_NAME ) ; queryParams . put ( "SignatureVersion" , "2" ) ; queryParams . put ( "SignatureMethod" , HASH_ALGORITHM ) ; queryParams . put ( "Timestamp" , timestamp ) ; queryParams . put ( "Version" , SERVICE_VERSION ) ; String query = "" ; boolean first = true ; for ( String name : queryParams . keySet ( ) ) { if ( first ) first = false ; else query += "&" ; query += name + "=" + URLEncoder . encode ( queryParams . get ( name ) , "UTF-8" ) ; } return query ; } \n', 0.39587107)

@xdliu1998

When I try to reproduce the results, I also find that the returned snippets are not relevant. Have you solved the problem?
