realtime reading of binary files #33

Open
wants to merge 6 commits into
base: master

nkx111
Member

@nkx111 nkx111 commented Jun 29, 2021

I added a method FRead() in TRestRawToSignalProcess to replace the default fread(). It can:

  • wait for a configurable time if the read reaches EOF
  • update the count of bytes read automatically

With the help of this method, TRestRawMultiCoBoAsAdToSignalProcess is now able to process data while the file is still being written.

@nkx111 nkx111 requested a review from jgalan June 29, 2021 07:33
@jgalan jgalan requested a review from DavidDiezIb June 29, 2021 08:25
@jgalan jgalan assigned juanangp and unassigned juanangp Jun 29, 2021
@jgalan jgalan requested a review from juanangp June 29, 2021 08:25
@juanangp
Member

I would suggest implementing two modes via the config file:

  • Online visualization: It never reaches EOF; it checks the file size before reading the data stream. You can define an offset size that shouldn't be reached. If all the pulses are saved, it should be straightforward to define this size, which would be 1 full event. If not all pulses are saved it may be trickier, but it can still be added as a parameter in the config file.
  • Data processing (off-line): Current implementation; it doesn't check the file size, so you gain time while processing the data.
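
The file-size check described in the first bullet could look roughly like the sketch below. It is only an illustration: `canReadWithMargin` and its parameters are hypothetical names that do not exist in REST, and it assumes a POSIX system where `fstat` and `fileno` are available.

```cpp
#include <stdio.h>     // FILE, ftell, fileno (POSIX)
#include <sys/stat.h>  // fstat (POSIX)

// Hypothetical helper (not part of REST): returns true if `bytesWanted` bytes
// can be read from the current position while still leaving `marginBytes`
// unread at the end of the file.
bool canReadWithMargin(FILE* file, long long bytesWanted, long long marginBytes) {
    struct stat st;
    if (fstat(fileno(file), &st) != 0) return false;  // size unknown: do not read
    const long pos = ftell(file);
    if (pos < 0) return false;
    return pos + bytesWanted + marginBytes <= static_cast<long long>(st.st_size);
}
```

In the off-line mode of the second bullet this check would simply be skipped, so no extra system call is made per read.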

On the other hand, I think that running the analysis process during data taking may consume a lot of resources on the DAQ computer. I would suggest writing a quick algorithm to retrieve the most important parameters for visualization.

I was planning to work on a DAQ program that should also include on-line visualization, which is linked to this feature, but it would require time.

@jgalan
Member

jgalan commented Jun 29, 2021

  • Online visualization: It never reaches EOF; it checks the file size before reading the data stream. You can define an offset size that shouldn't be reached. If all the pulses are saved, it should be straightforward to define this size, which would be 1 full event. If not all pulses are saved it may be trickier, but it can still be added as a parameter in the config file.

Right, probably an option would be to keep a variable that tells us the averageEventSize, such as totalBytes/nEntries. Then, in the configuration file, we define the number of events to be left as offset. That is a bit more human-friendly than having to play with file-size chunks.
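
As a small worked example of that arithmetic, the sketch below turns "leave N events unread" into a byte margin. The function name, its parameters, and the fallback value for the very first event are assumptions made for illustration; only averageEventSize = totalBytes/nEntries comes from the comment above.

```cpp
#include <cstddef>

// Hypothetical helper: converts "leave N events unread" into a byte margin,
// using the running average event size totalBytes / nEntries.
std::size_t marginFromEvents(std::size_t totalBytes, std::size_t nEntries,
                             std::size_t eventsToLeave, std::size_t fallbackEventBytes) {
    // Before the first event is read there is no average yet, so use a fallback.
    const std::size_t averageEventSize =
        (nEntries > 0) ? totalBytes / nEntries : fallbackEventBytes;
    return eventsToLeave * averageEventSize;
}
```

For instance, after 10 entries totalling 1000 bytes, leaving 3 events as offset gives a margin of 300 bytes.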

@nkx111 nkx111 changed the title readtime reading of binary files realtime reading of binary files Jun 29, 2021
@jgalan
Member

jgalan commented Jun 29, 2021

On the other hand, I think that running the analysis process during data taking may consume a lot of resources on the DAQ computer. I would suggest writing a quick algorithm to retrieve the most important parameters for visualization.

I think reusing what we have would be the most efficient, because any new implementation in the processing chains would have a direct impact on the possibilities of the new DAQ system.

We could always launch the visualization data chain with low nice priority. If this still affects the resources of the DAQ, one could even think of having two isolated computers, one in charge of registering the data, the other in charge of the online display.

Perhaps REST could be installed on a Raspberry Pi for the visualization routines. The TPC data flow is broadcast and could be intercepted by any device.

@nkx111
Member Author

nkx111 commented Jun 29, 2021

Online visualization: It never reaches EOF; it checks the file size before reading the data stream

We don't need to play the file-size trick. The trick is already played by checking the output of fread(). If it reaches the end (bytes to read > bytes to EOF), fread() will just return 0 and nothing will be changed. We just need to wait several seconds and call fread() again.

So the online visualization/data processing switch is already enabled with this PR. Setting fMaxWaitTimeEOF to 0 switches to data processing mode. Setting it to ~5 s switches to online visualization mode, which ends only if no data arrives from the DAQ within 5 seconds.

fMaxWaitTimeEOF is a new parameter of TRestRawToSignalProcess, which plays a role similar to the averageEventSize or file-size offset proposed in your comments.
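
The wait-and-retry idea can be sketched as follows. This is not the PR's FRead(): `readWithWait`, `maxWaits` and `pollInterval` are hypothetical names chosen for the example, and the sketch reads byte-wise so that a partially available item is not lost. With `maxWaits` set to 0 it behaves like the off-line mode (give up at the first EOF); with a larger value it keeps polling a growing file.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>

// Hypothetical helper, not the PR's FRead(): reads exactly `bytes` bytes,
// polling a file that may still be growing. Returns false if no new data
// arrives within `maxWaits` consecutive polls.
bool readWithWait(void* dst, std::size_t bytes, std::FILE* file, int maxWaits,
                  std::chrono::milliseconds pollInterval, std::size_t& totalBytesRead) {
    auto* out = static_cast<unsigned char*>(dst);
    std::size_t done = 0;
    int idleWaits = 0;
    while (done < bytes) {
        // With an item size of 1 the return value is an exact byte count.
        const std::size_t got = std::fread(out + done, 1, bytes - done, file);
        done += got;
        totalBytesRead += got;
        if (done == bytes) break;
        if (got > 0) idleWaits = 0;  // progress was made: restart the timeout
        if (idleWaits >= maxWaits) return false;
        ++idleWaits;
        std::this_thread::sleep_for(pollInterval);
        std::clearerr(file);  // clear the EOF/error indicators before retrying
    }
    return true;
}
```

Note that when the function gives up after a partial read, the bytes already consumed stay consumed; a caller that wants to retry later would have to remember how far it got.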

@nkx111
Member Author

nkx111 commented Jun 29, 2021

Perhaps REST could be installed on a Raspberry Pi for the visualization routines. The TPC data flow is broadcast and could be intercepted by any device.

This is possible. We can mount a shared disk over NFS and run REST and the DAQ software on different computers. The files updated by the DAQ computer can be correctly read from the REST computer. fMaxWaitTimeEOF should be set to >5 s to wait for NFS to sync file changes.

@jgalan
Member

jgalan commented Jun 29, 2021

Right, we could even have a Raspberry Pi running REST, processing the online data on NFS, generating the PNG files, and placing them at a webserver location hosted by the Raspberry Pi, so they could be accessed from anywhere. Not sure if that would be good enough, but I guess it would be for a refresh rate of ~5 seconds. Perhaps other ideas, @juanangp @lobis?

@jgalan jgalan requested a review from lobis June 29, 2021 10:59
@DavidDiezIb
Member

We don't need to play the file-size trick. The trick is already played by checking the output of fread(). If it reaches the end (bytes to read > bytes to EOF), fread() will just return 0 and nothing will be changed. We just need to wait several seconds and call fread() again.

I was expecting that, but it always gets stuck at the end of the file. I've added some cout statements to see what happens, and this is what I get:

[image]

And this is the code:

bool TRestRawToSignalProcess::FRead(void* ptr, size_t size, size_t n, FILE* file) {
    int nwaits = 0;
    while (1) {
        int reads = fread(ptr, size, n, file);
        totalBytesReaded += reads * size;
        if (reads != n || feof(file)) {
            cout <<"ftell after fread: "<< ftell(file) << endl;
            if (reads != n ) {cout<<"Reads: "<<reads<<" n: "<<n<<endl;}
            if ( feof(file)) {cout<<"EoF: "<<feof(file)<<endl;}
            nwaits++;
            if (nwaits > fMaxWaitTimeEOF) return false;
            //sleep(1);
            std::this_thread::sleep_for(std::chrono::seconds(1));
            fseek(file, ftell(file), 0);
            cout<<"EoF after fseek: "<<feof(file)<<endl;
        } else {
            return true;
        }
    }
    return false;
}

It stays like that no matter how much time I set for fMaxWaitTimeEOF.
The first time it fails to read a full event, fread returns a number smaller than n, but in the following iterations it always returns 0, as if it were always being pushed to the end of the file. Also ftell stays the same, although I think this is normal, because if fread cannot read the full chain it shouldn't move the position. So I don't know what is going on here.

@nkx111
Member Author

nkx111 commented Jun 29, 2021

It seems that there are some problems when n is not 1. I am looking into that.

@nkx111
Member Author

nkx111 commented Jun 29, 2021

Also ftell stays the same

ftell should advance if fread reads the file partially.

e.g. you are at file position 7 and the total length of the file is 10. If you run fread(ptr, 2, 3, file), you will get a return value of 1, and the file position moves past the data that was read (to 9, or to 10 if the trailing odd byte is consumed as well, which is what the C standard's wording implies). On the other hand, if the file length is 15, which is enough for the read, you will get a return value of 3 and a new file position of 13, meaning that all three two-byte chunks were read.
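
The two cases in this example can be checked with a small experiment such as the one below. `readOnce` and `ReadResult` are names invented for this sketch. The return values (1 and 3) follow from the definition of fread. The position after the short read is deliberately left unasserted in the comments: the C standard says it advances by the number of characters successfully read, so it is worth printing on the target system instead of assuming.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

struct ReadResult {
    std::size_t items;  // return value of fread
    long posAfter;      // ftell right after the call
};

// Builds a temporary file of `fileLen` bytes, seeks to `start`, and calls
// fread(buf, size, n, file) once.
ReadResult readOnce(long fileLen, long start, std::size_t size, std::size_t n) {
    std::FILE* f = std::tmpfile();
    if (!f) return {0, -1};
    for (long i = 0; i < fileLen; ++i) std::fputc('x', f);
    std::fseek(f, start, SEEK_SET);  // also flushes the bytes written above
    std::vector<unsigned char> buf(size * n);
    const std::size_t items = std::fread(buf.data(), size, n, f);
    const long pos = std::ftell(f);
    std::fclose(f);
    return {items, pos};
}
```

Calling `readOnce(10, 7, 2, 3)` corresponds to the short file and `readOnce(15, 7, 2, 3)` to the long one. If the position after the short read turns out to be 10, a retry loop has to account for the odd byte that was already consumed.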

@nkx111
Member Author

nkx111 commented Jun 29, 2021

Now it should be working.

@DavidDiezIb
Member

Have you been able to see a histogram growing in real time? In my case the problem is still the same: it enters the while loop but never manages to read further into the file.

@nkx111
Member Author

nkx111 commented Jul 1, 2021

For me it is working. First I modify the rml file and set maxWaitTimeEOF="10" for MultiCoBoAsAdToSignalProcess. Then I create a file with a limited writing speed using the scp -l 2000 command. In the meantime I launch restManager and see the file being processed gradually. The displayed processing speed is the same as the scp speed.

The usage of the new FRead method is a little different from fread. It returns just a bool to indicate whether the file reading was successful. You may use it like:

if (!FRead(&(cur_fr[fr_offset]), sizeof(unsigned short), nb_sh, fInputBinFile)) {
    printf("Error: could not read %d bytes.\n", (nb_sh * 2));
    exit(1);
}
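
For anyone who wants to reproduce the slowly growing file without scp, a local writer along these lines might do. It is a hypothetical test helper (`appendSlowly` is not part of REST) that appends fixed-size chunks with a pause in between, flushing after each chunk so that a reader polling the same file can see the new bytes.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Hypothetical test helper: appends `nChunks` chunks of `chunkBytes` bytes to
// `path`, flushing and pausing after each one, to imitate a DAQ writing slowly.
// Returns the number of bytes written.
std::size_t appendSlowly(const std::string& path, std::size_t chunkBytes, int nChunks,
                         std::chrono::milliseconds pause) {
    std::FILE* out = std::fopen(path.c_str(), "ab");
    if (!out) return 0;
    const std::vector<unsigned char> chunk(chunkBytes, 0xAB);
    std::size_t written = 0;
    for (int i = 0; i < nChunks; ++i) {
        written += std::fwrite(chunk.data(), 1, chunk.size(), out);
        std::fflush(out);  // hand the new bytes to the OS so a reader can see them
        std::this_thread::sleep_for(pause);
    }
    std::fclose(out);
    return written;
}
```

Running it in a separate process or thread while the reading process has maxWaitTimeEOF set above the pause gives a rough local equivalent of the scp test. The dummy bytes are not valid DAQ data, so this only exercises the waiting logic, not the event decoding.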

@DavidDiezIb
Member

Not sure what is going on, but I'm still at the same point. I've also checked that we are using FRead as you say.
What I know for sure is that when the file grows, the fread inside FRead doesn't detect it (I've checked it with a dedicated macro running at the same time).
I use some cout statements to see what is going on inside the while loop:

while (1) {
        float pos = ftell(file);
        int reads = fread((char*)ptr + chunksReaded * size, size, chunksRemaining, file);
        float pos2 = ftell(file);
        totalBytesReaded += reads * size;
        if (reads != chunksRemaining || feof(file)) {
            if (reads == 0) {
                nwaits++;
            } else {
                // In case it reads something partially
                nwaits = 1;
                chunksReaded += reads;
                chunksRemaining -= reads;
            }
            if (nwaits > fMaxWaitTimeEOF) return false;
            
            cout << " " << endl << endl;
            cout << "ftell 1(file) " << ftell(file) << endl ;
            //cout << "Pointer to last " << fseek(file, ftell(file), SEEK_END) << endl;
            cout << "Feof " << feof(file) << endl;
            cout << "reads " << reads << endl;
            cout << "size " << size << endl;
            cout << "pos " << pos << endl;
            cout << "pos2 " << pos2 << endl;
            
            cout << "chunksReaded " << chunksReaded << endl;
            cout << "chunksRemaining " << chunksRemaining << endl;
            
            
            fseek (file, 0, SEEK_END);   // non-portable
            long size=ftell (file);
            
            cout << "Size " << size << endl;

            std::this_thread::sleep_for(std::chrono::seconds(1));
            fseek(file, ftell(file), 0);
            //fseek(file, 0, SEEK_CUR);
            cout << "ftell 2(file) " << ftell(file) << endl << endl;
            
        } 

And I set maxWaitTimeEOF large enough to see changes over time. This is what I get every time:

[image]

And with the macro I can see the file size changing:

[image]

I'm using FRead in TRestRawMultiFEMINOSToSignalProcess. Is there any difference with MultiCoBoAsAdToSignalProcess that could explain this behavior?
I've seen that MultiCoBoAsAdToSignalProcess has an AddInputFile function that TRestRawMultiFEMINOSToSignalProcess doesn't have. I don't see any relation to my problem, but I don't completely understand how TRestRawMultiFEMINOSToSignalProcess deals with files. Maybe it opens the file at the beginning of the processing chain and never updates it?
@jgalan, do you know if something like that could happen?

PS: Sorry for the long post, but it's also helpful for me to have everything written down.

@nkx111
Member Author

nkx111 commented Jul 3, 2021

I updated TRestRawMultiFEMINOSToSignalProcess by just changing fread() to FRead(). It is also working on my side. Now I guess it is because of some OS- or file-system-dependent problem. What are your operating system and file system?

@DavidDiezIb
Member

DavidDiezIb commented Jul 5, 2021

The OS is Debian 4.19.194-1 x86_64 GNU/Linux and the file system is nfs4.

@jgalan
Member

jgalan commented May 24, 2022

What about this PR?

@nkx111
Member Author

nkx111 commented May 24, 2022

I found out later that real-time file reading is indeed problematic on an NFS file system. While the binary file is being written, the online visualization must run on the same computer as the DAQ system.

The FRead() method is still useful. And the raw processes can be updated accordingly.
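
Given that conclusion, a process could warn the user when the input file sits on NFS before enabling the wait-at-EOF mode. The sketch below is one possible guard, not something this PR implements: `isOnNfs` is a hypothetical name, the code is Linux-specific (it uses `statfs`), and the constant is the value documented for NFS_SUPER_MAGIC in `<linux/magic.h>`.

```cpp
#include <sys/vfs.h>  // statfs (Linux-specific)

// Value of NFS_SUPER_MAGIC as defined in <linux/magic.h>.
constexpr long kNfsSuperMagic = 0x6969;

// Hypothetical guard: returns 1 if `path` is on NFS, 0 if it is not,
// and -1 if the file system type cannot be determined.
int isOnNfs(const char* path) {
    struct statfs fs;
    if (statfs(path, &fs) != 0) return -1;
    return static_cast<long>(fs.f_type) == kNfsSuperMagic ? 1 : 0;
}
```

A caller could print a warning, or refuse a non-zero maxWaitTimeEOF, when the result is 1. Other network file systems are not covered by this check.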

4 participants