Convert raw txt data to h5 #50
Comments
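The current code only supports linear serialization, not nesting. You can treat your square brackets as characters too, so that each item can be handled as a single flat sequence.
How do I convert a txt file into the .h5 format used by your dataset?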
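A minimal sketch of the bracket-as-token idea, assuming each item is a nested list of integer ids like the data posted in this issue. `OPEN_ID` and `CLOSE_ID` are hypothetical ids chosen here only for illustration; they are not part of the repository, and in practice they would have to be ids reserved in your vocabulary that never occur as real tokens.

```python
# Hypothetical ids standing in for "[" and "]". Negative values are used
# here only so they cannot collide with the example's token ids.
OPEN_ID = -1
CLOSE_ID = -2


def flatten(tree, open_id=OPEN_ID, close_id=CLOSE_ID):
    """Serialize a nested list of ints into one flat list with bracket tokens."""
    flat = []
    for item in tree:
        if isinstance(item, list):
            flat.append(open_id)
            flat.extend(flatten(item, open_id, close_id))
            flat.append(close_id)
        else:
            flat.append(item)
    return flat


def unflatten(flat, open_id=OPEN_ID, close_id=CLOSE_ID):
    """Inverse of flatten: rebuild the nested list from the flat sequence."""
    stack = [[]]
    for tok in flat:
        if tok == open_id:
            stack.append([])          # start a new sub-list
        elif tok == close_id:
            done = stack.pop()        # finish the current sub-list
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0]


# Shortened example modeled on the nested data in this issue.
sample = [20, [8, [14, [73]]], [4, [1516], [660]]]
flat = flatten(sample)
print(flat)
print(unflatten(flat) == sample)
```

Each flattened item is a one-dimensional list of ints, which is the shape `save_hdf5` expects for every element of `vecs`. Note that `tables.Int16Atom()` stores signed 16-bit values, so all ids, including the bracket ids, need to fit in the range -32768 to 32767.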
@li-car-fei You can refer to
Hey, sorry for being dumb, but can you please explain what dialogs are in Are these arrays of sentences, or something else?
You can refer to this function, which processes the data:

    import numpy as np

    def get_daily_dial_data(data_path):
        dialogs = []
        # Each line of the file is one dialog; utterances are separated by ' __eou__ '.
        dials = open(data_path, 'r').readlines()
        for dial in dials:
            utts = []
            for i, utt in enumerate(dial.rsplit(' __eou__ ')):
                # Speakers alternate: even turns are 'A', odd turns are 'B'.
                caller = 'A' if i % 2 == 0 else 'B'
                utts.append((caller, utt, np.zeros((1, 1))))
            dialog = {'knowledge': '', 'utts': utts}
            dialogs.append(dialog)
        return dialogs

According to this code, dialogs is a list of
[20, [8, [14, [73]], [14, [36]], [4, [28]]], [4, [1516], [660]], [19, [15, [11, [8, [4, [169], [66], [4]]], [4, [4]]]], [15, [11, [8, [4, [4, [6599]], [9, [7, [4]]]]], [4, [160]]]], [15, [11, [8, [4, [1534], [74], [1216]]], [4, [1216], [74]]]], [15, [11, [8, [4, [6057], [8]]], [4, [8], [1534]]]], [15, [11, [8, [4, [6057], [8]]], [4, [8], [74]]]], [15, [11, [8, [4, [1516], [196], [909]]], [4, [59]]]]], [12, [13]]]
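Each of my data items is a multi-level nested list like the one above. I need to convert it to h5 format so that it can be run directly with your program, but np.array cannot do this.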
    def save_hdf5(vecs, filename):
        '''save the processed data into a hdf5 file'''
        f = tables.open_file(filename, 'w')
        filters = tables.Filters(complib='blosc', complevel=5)
        earrays = f.create_earray(f.root, 'phrases', tables.Int16Atom(), shape=(0,), filters=filters)
        indices = f.create_table("/", 'indices', Index, "a table of indices and lengths")
        pos = 0
        line = 1
        for x in vecs:
            print(line)
            earrays.append(numpy.array(x))
            ind = indices.row
            ind['pos'] = pos
            ind['length'] = len(x)
            ind.append()
            pos += len(x)
            line = line + 1
        f.close()
How should I modify this code? Thanks.
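For orientation, the storage layout that `save_hdf5` writes can be mimicked with plain Python lists: all sequences are concatenated into one flat `phrases` array, and an `indices` table records where each sequence starts (`pos`) and how long it is (`length`). The sketch below only illustrates that layout; `pack` and `unpack` are hypothetical helper names, not functions from the repository, and no HDF5 file is written.

```python
def pack(seqs):
    """Concatenate flat sequences and record (pos, length) for each one."""
    phrases, indices = [], []
    pos = 0
    for seq in seqs:
        phrases.extend(seq)
        indices.append({'pos': pos, 'length': len(seq)})
        pos += len(seq)
    return phrases, indices


def unpack(phrases, indices, i):
    """Recover the i-th sequence from the concatenated array."""
    ind = indices[i]
    return phrases[ind['pos']: ind['pos'] + ind['length']]


phrases, indices = pack([[1, 2, 3], [4, 5]])
print(phrases)
print(indices)
print(unpack(phrases, indices, 1))
```

Because every sequence ends up in one flat array, each element of `vecs` has to be a one-dimensional list of ints before it is appended; a nested list would need to be linearized first.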