Measuring AsyncAPI spec adoption #780
Welcome to AsyncAPI. Thanks a lot for reporting your first issue. Please check out our contributors guide and the instructions about a basic recommended setup useful for opening a pull request.
@derberg Sounds interesting ... I would like to take this issue as my GSoC'22 proposal. 😊
@ritik307 sounds awesome! @smoya @BOLT04 @magicmatatjahu any objections to have this endpoint first on
No problem for me, but we have to remember that we also provide that project as a Docker image, so people will also have that path. We have to think about how to avoid unnecessary paths for people who use that project.
@derberg no problem for me 🙂, this is pretty cool!
@magicmatatjahu I get what you're saying, and if this new endpoint does in fact need to use external services (e.g. Google APIs), we would need new config/environment variables for API keys, etc. I propose we use feature flags to solve this. On our deployed version of the API the feature is on, but for local development it's not. If someone wants to try it out locally, they just have to configure the necessary values and turn the toggle on to start measuring spec adoption in their own environment 🙂
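A minimal sketch of the feature-flag idea described above. All names here (`MEASURE_ADOPTION`, `collectMetrics`) are illustrative, not actual `server-api` code:

```javascript
// Hypothetical sketch: gate the metrics code path behind an environment
// variable so local development needs no external API credentials.
const featureFlags = {
  measureAdoption: process.env.MEASURE_ADOPTION === 'true',
};

function collectMetrics(version) {
  // Placeholder for pushing a hit counter to an external analytics API.
  collectMetrics.calls = (collectMetrics.calls || 0) + 1;
}

function handleSchemaRequest(version) {
  if (featureFlags.measureAdoption) {
    collectMetrics(version); // only runs where the flag is enabled
  }
  return { version }; // stand-in for the real schema lookup
}
```

Deployed environments would set `MEASURE_ADOPTION=true` plus the required API keys; everyone else gets the plain schema endpoint.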
I love the idea of adding our schemas to Schema Store. I didn't know about it until 5 minutes ago and I like it a lot 👍. I want to add some feedback regarding the creation of a service for serving the schemas: serving static files such as JSON Schema files in a fast and reliable way is exactly the reason why CDNs exist. Considering the possible amount of traffic this service will have, and the fact it will keep growing over time (more users, more tooling, etc.), I would not advocate for creating and maintaining this service ourselves. I understand we want this service because we need those metrics (maybe there is another strong reason I missed, so please correct me). There is also the following draft PR by @jonaslagoni: #502, which might make sense to check. It aims to serve AsyncAPI JSON Schema files from our website. On the other hand, we could consider the same approach with any other CDN product that offers analytics, such as AWS S3, GCP Cloud Storage, etc. (asking for budget, etc.). I would like to know your thoughts.
Cool. I think the most important thing is that you support the idea. It is not written in stone to have it as an endpoint here:
@magicmatatjahu @smoya @BOLT04 please only keep in mind that we should leave as much as possible up to @ritik307 (if you still want to take this task for GSoC). You folks turn into mentors; just guide @ritik307 on what needs to be checked and tried out to get the desired outcome.
Sure @derberg, I would love to take this task for GSoC 😊 and it would be great if you guys mentored me. 😊
I think that file hosting and adding metrics to ServerAPI itself will not be a problem. We have control over every part, so we won't have to use additional services. However, a CDN would be better in this case and I am for this option!
I would like to retake this, especially after @derberg raised concerns on #502 (comment). There is something we should consider before moving forward with a custom solution based on our own service: right now we do not have services exposed openly that are consumed at the same frequency as static JSON Schema files would be. With a CDN provided by a SaaS company, you remove all of those concerns. Again, I know we want some metrics, but IMHO it is totally worth asking Netlify and, if it fills our goals, paying if needed for the metrics service. I can tell you it is worth paying for a service rather than having to run your own highly available one.
This is definitely what I prefer since you mentioned Netlify Analytics.
Yup, this is the service I meant.
Some important info: #502 (comment)
TL;DR: I still think we should avoid creating a new file server app. Instead, look for an alternative based on a SaaS provider. And I'm suggesting some alternative ideas to the previous one. I'm happy to keep evolving this idea and also to put it into practice asap. I understand the need to get such metrics and how simple it seems to build a file server with built-in metrics. However, I want to stay strong on this idea: we should avoid managing services on our own (at this time). Some of the reasons have been exposed already in (my) previous comments, but I'm going to list some of them here in a bit more detail. AsyncAPI JSON Schema definitions are the most important pieces of software we provide to the community (IMHO). They are meant to be used by systems for parsing and validating AsyncAPI documents, and by services that use them at runtime for validating messages, among other use cases. However, who are the users of those raw files, and how do they use them? I can imagine a few use cases:
With this in mind, the following points are worth noting:
Having said that, I'm proposing we stick with a SaaS-based solution from day one, one that lets us take care of only the very minimum: at most, collecting the metrics and processing them, but never serving the files. We tried Netlify Analytics. Unfortunately, the metrics we want (hits on JSON Schema files) are not collected. Even though it is only a matter of time before they support it, we don't have an ETA. There are several other ways we can do this; these are some of the ideas I have in mind:

**Netlify Log Drains**

Netlify Log Drains allows sending both traffic logs and function logs to an external service, such as New Relic, Datadog, S3... and also to our own service (which could be a Netlify Function as well).

```mermaid
sequenceDiagram
    participant User
    participant asyncapi.org (Netlify)
    participant AsyncAPI Metrics collector
    Note right of AsyncAPI Metrics collector: Netlify Function <br/>or<br/> any monitoring SaaS
    User->>asyncapi.org (Netlify): https://asyncapi.org/definitions/2.3.0.json
    asyncapi.org (Netlify)->>User: 2.3.0.json
    asyncapi.org (Netlify)-->>AsyncAPI Metrics collector: Netlify Log Drains metrics
```
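The "AsyncAPI Metrics collector" role in the Log Drain flow above could be as small as a function that counts hits on definition files. A hypothetical sketch; the log-entry shape (`{ url, status }`) is an assumption, not Netlify's documented log format:

```javascript
// Count successful downloads of /definitions/<version>.json from a batch
// of (assumed) Netlify Log Drain traffic entries.
function countDefinitionHits(logEntries) {
  const hits = {};
  for (const entry of logEntries) {
    const match = /\/definitions\/(\d+\.\d+\.\d+)\.json$/.exec(entry.url || '');
    if (match && entry.status === 200) {
      hits[match[1]] = (hits[match[1]] || 0) + 1;
    }
  }
  return hits;
}
```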
With this approach, and in the most complex variant, we only care about the metrics collector service, which could eventually go down but would never affect the user request.

**Netlify Edge Handlers**

Netlify Edge Handlers work by letting you execute code directly on the edge, intercepting the request. We could run JavaScript code there to collect the metrics we want; in our case, the hits on the definition files. This is in BETA right now (you have to ask for it to be enabled). However, I would ask them for an ETA for going public; I guess they have plans to release it as a public beta in the short-mid term. EDIT: Netlify Edge Functions are now in public beta, available for free. https://www.netlify.com/blog/announcing-serverless-compute-with-edge-functions

**Use AWS S3**

AWS S3 is a well-known solution for storing files. And with the metrics it exposes (CloudWatch), we could know the number of requests.

```mermaid
sequenceDiagram
    participant User
    participant asyncapi.org (Netlify)
    participant AWS S3
    User->>asyncapi.org (Netlify): https://asyncapi.org/definitions/2.3.0.json
    asyncapi.org (Netlify)->>AWS S3: Netlify rewrite rule to asyncapi.s3.amazonaws.com/definitions/2.3.0.json
    AWS S3->>asyncapi.org (Netlify): 2.3.0.json
    asyncapi.org (Netlify)->>User: 2.3.0.json
```
The price for this is not that high. I did a quick estimation for 30 million requests per month (yeah, a lot) here. We should also include the price of the CloudWatch metrics, but IIRC it is almost nothing. If price is a concern, we could investigate Cloudflare R2, which is super cheap. However, the metrics they provide are unknown to me at this moment. Also, we would need to ask for access to R2 as it is in beta at this moment.
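The Netlify rewrite rule mentioned in the S3 diagram could be expressed in `netlify.toml` roughly like this (a sketch; the bucket name is illustrative, and status 200 makes it a rewrite/proxy rather than a redirect):

```toml
# Hypothetical Netlify rewrite: proxy definition files to an S3 bucket
# so S3/CloudWatch sees (and counts) every request.
[[redirects]]
  from = "/definitions/*"
  to = "https://asyncapi.s3.amazonaws.com/definitions/:splat"
  status = 200
```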
From today, Netlify Edge Functions (previously known as Edge Handlers) are in public beta, available for free. https://www.netlify.com/blog/announcing-serverless-compute-with-edge-functions
With the following, we could add the metrics push into the Netlify function: #680
Taking this one off GSoC as it is an important topic to handle and can't be delayed.
How to start 😄 I love the idea from #680 ❗ On the "negative" side, I have a completely different view on the Maintenance/High Availability/Response-time topics:
So, let's go forward with the idea from #680 ❗ Alternative/compromise: let's not mix topics and try to solve everything with one solution. Maybe #680 could have 2 alternative paths, one for the needs related to AsyncAPI JSON Schema and
I've been playing with Google Analytics 4 as a candidate for publishing our metrics. I have to say, I didn't get a good result:

1. In the whole realtime metrics view, only a small rectangle including the events is present.
2. The details are very hard to check (I added a param for the URL of the fetched file).

As we can see, everything is focused on web apps, so not a really good fit for us. I know @derberg has played a lot with GA, Google Tag Manager, etc. Do you think it is still a fit for this, or should we rather consider another alternative?
I've been checking the New Relic One new free tier, and it allows sending up to 100 GB of data, events included. I did a simple test with a POST request and created a simple dashboard to see how it would look. Btw, New Relic has NRQL, a custom query language that lets you easily query anything you send to them in an SQL-like fashion. If anyone has another suggestion, I'm happy to keep investigating (there are plenty of others out there).
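For illustration, an NRQL query for such a dashboard could look like this (a sketch; the event type `AsyncAPIFileDownload` and the `version` attribute are hypothetical names for whatever we decide to report):

```sql
-- Downloads per spec version over the last 30 days (assumed event/attribute names)
SELECT count(*) FROM AsyncAPIFileDownload FACET version SINCE 30 days ago
```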
In the meantime, I'm moving forward with the New Relic solution for now, and the development is all here: #680. In case you want to use another provider for metrics, I'm happy to adapt the code. More on #680 (comment)
GA also allows you to create new views and custom components, with scheduled reports, etc. But yeah, I'm not a GA evangelist. Tbh I think the approach with New Relic is super nifty, as long as we can use it for free of course 😆 I guess you @smoya and @fmvilas can get us more free storage anyway if we need it 😆 ❤️ from me for New Relic. Does it mean we have an agreement on implementation? 🙌🏼
yeah let's go with the New Relic solution proposed by @smoya 👍 wdyt everyone?
I think we can even transfer it to https://github.com/asyncapi/website now 🤔
The JSON Schema Store PR has been merged now, meaning all JSON Schema files fetched from it are now being downloaded from asyncapi.com/schema-store, and metrics show that users are already fetching them. cc @derberg
Omg this is so exciting 😍
❤️ Indeed! @smoya start thinking about how we send custom metrics from tooling 😝
We would need to expose a service that acts as a metrics ingest, forwarding them to NR, so we don't expose the NR API key in tooling but just send metrics to our service. I will think about it eventually!
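The ingest idea could be sketched as a tiny forwarder: tooling posts a small event to our service, and only the server attaches the New Relic API key before forwarding it to the NR Event API. The endpoint placeholder, event type, and field names below are assumptions for illustration:

```javascript
// Build the (hypothetical) forward request to the New Relic Event API.
// The API key stays server-side; tooling never sees it.
function buildForwardRequest(event, apiKey) {
  return {
    url: 'https://insights-collector.newrelic.com/v1/accounts/ACCOUNT_ID/events',
    headers: {
      'Api-Key': apiKey, // injected server-side only
      'Content-Type': 'application/json',
    },
    body: JSON.stringify([{ eventType: 'AsyncAPIToolMetric', ...event }]),
  };
}
```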
After fixing asyncapi/spec-json-schemas#236, JSON Schemas for different AsyncAPI versions are being downloaded from JSON Schema Store. I see there are downloads for all versions, and I really doubt those downloads are organic or on purpose. What I think is happening is that, since the schema served by Schema Store is now https://github.com/asyncapi/spec-json-schemas/blob/master/schemas/all.schema-store.json, JSON Schema parsers might be downloading ALL referenced schemas. However, the VSCode IDE is still showing the error which @derberg already mentioned in redhat-developer/vscode-yaml#772 (reply in thread). So 🤷 ...
I think we entered the world where we have to decide if we want to do things in our JSON Schema the way the spec and spec maintainers recommend, or just adjust the schema to work with tooling provided by the community 🤷🏼
yeah, the numbers for
But if we do that, we will be invalidating all the counts for legitimate downloads. Correct me if I'm wrong, but: Considering that 1 fetch of
We can't say, subtract
@smoya yeah, you are right 🤦🏼 it sucks
@smoya so it looks like we can only measure adoption of the spec in general, not of its specific versions?
Yes, as the IDE plugins are downloading just one schema (containing all of the versions), we can't know which one they are using. So unfortunately, I'm running out of ideas here. I could open an issue in the Schema Store repo asking for ideas.
It is not that bad. For me, the most important thing is to measure how many users we have. So adoption of the spec in general, not of each version. I'm personally skeptical of such measurements, as people then complain that new versions are not adopted, forgetting that they themselves do not use new versions if they do not need them (anyway, not a topic for this issue). If you can open a discussion with Schema Store on how to fix things in the future, that would be amazing. Even if I'm not interested in specific version adoption, I bet others are 😄 Can you adjust the dashboard in New Relic 🙏🏼 So what is left is:
Am I missing something?
I will definitely be interested to know if people are really adopting version 3.0 once it's out. Would be cool to get some insights. Maybe it's time to measure it in our tools.
I think it is a crucial metric, even though it's not the only method for collecting data. I would love to have a metric where, after a release, we could see how downloads for older versions go down in favor of the new one.
Done. No hope at all anyway. SchemaStore/schemastore#2440
Do you mean removing the versions stuff from it?
Do we really need that? With New Relic, we have 1 year right now. If more is needed, we could write some scripts to do aggregations every few months.
Related: SchemaStore/schemastore#2438
yeah, until we get it solved, this metric is not helpful, we just need total number
yes we need lifetime data to see over years how numbers change. But I do not mean we need that support on New Relic. Automated script, maybe running on GitHub Actions on a schedule is also fine 👍🏼
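The scheduled-aggregation idea could look roughly like this as a GitHub Actions workflow (a sketch; the script path and secret name are illustrative, and `snapshot-metrics.js` would query New Relic and commit/store the aggregated numbers before the retention window expires):

```yaml
# Hypothetical workflow: snapshot adoption metrics monthly so data
# survives beyond the New Relic free-tier retention period.
name: Snapshot adoption metrics
on:
  schedule:
    - cron: '0 0 1 * *' # first day of every month
jobs:
  snapshot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: node scripts/snapshot-metrics.js
        env:
          NEW_RELIC_API_KEY: ${{ secrets.NEW_RELIC_API_KEY }}
```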
yeah, not much help, other than knowing you can clear the cache on demand. The source code indicates it is based on
This issue has been automatically marked as stale because it has not had recent activity 😴 It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation. There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under an open governance model. Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here. Thank you for your patience ❤️
FYI, I tried to give it a last try, but didn't succeed 😞. All the info can be found at SchemaStore/schemastore#2440 (comment).
overall adoption is still a great number to have 👍
FYI, I created SchemaStore/schemastore#3460 as a feature request in Schema Store that, if adopted, will help us achieve our mission.
@smoya may I ask for an update on this issue please 😅
What do you need to know in particular?
Like, will we be going forward with this issue or not?
We did move forward. We have a New Relic dashboard that counts downloads of JSON Schema files, among other things. ATM there is no quick solution, only a long-term journey: working on SchemaStore/schemastore#3460 and then pushing plugins (such as VSCode YAML), or doing the work ourselves, to adapt to that new mechanism when pulling schemas from Schema Store. It is a long journey, but happy to welcome people who want to help!
Best would be if we document what we have, make it accessible to others, and close this issue, as there is a dependency on the outside world that will take, as @smoya wrote, a long way to get done. What we already have, overall adoption of AsyncAPI, is good for me anyway, as I personally do not care much about specific version adoption. We have one big challenge: 0 historical data, as the New Relic free account that we use does not preserve data.
Reason/Context
We do not know how many people use AsyncAPI. The most accurate number we could get is the number of AsyncAPI users that work with AsyncAPI documents. But how do we measure how many people out there have created/edited an AsyncAPI file?
The answer is a solution that includes:
- `asyncapi` in a filename created using the AsyncAPI spec

Some more discussion -> https://asyncapi.slack.com/archives/C0230UAM6R3/p1622198311005900
Description
- `server-api` service that anyone can use to fetch AsyncAPI JSON Schema files of any version

If time is left, we need to expose the numbers somewhere. Either embed a Google Analytics diagram somewhere on the AsyncAPI website or just have at least an API endpoint that exposes the latest numbers.
For GSoC participants