Dashboard still dead?

I appreciate the work going on here; it’s been a lot, and the move will be great, but:

When will this start to reflect reality?

In general, when performing a migration, shouldn’t we ensure the cutover is done within a reasonable time window, e.g. a day or two?

  • What are the issues and what is left to be done?

  • Is there something the development team can help with?

  • Is it waiting on any action by a member of the development team?

I have no idea either. I know I have the move of the canaries from one DO org to the other on my list, but AFAIK that should not affect the functionality in any way.

It’s doing SOMETHING, since we got this last night:

Which, to be honest, I don’t understand. It says there is a problem and then that it recovered within 10 minutes. Now, I know you’re fast, Robert, but I doubt that you fixed an issue within 10 minutes in the middle of the night.

The team missed the dev call today :man_facepalming:

But I will make sure this is discussed in our replacement meeting at the same time tomorrow.

@development_team

I’m not even sure if this is dev-related…

Well, it feels like an incomplete migration to me, and if @angus needs help to complete it, we can jump in. It would really help to have a pointer or two on what the issue might be and roughly where.

The plugin guard logs have this message:

PluginGuard::Status.update failed. Errors: Failed to post status to plugin manager: {"errors":["You are not permitted to view the requested resource. The API username or key is invalid."],"error_type":"invalid_access"}

Where is this message coming from? Check out lib/plugin_guard/status.rb. You’ll see where it posts status updates to the server and how it handles errors:

unless response.status == 200
  add_error("Failed to post status to plugin manager: #{response.body.to_s}")
  return false
end

Then check out the plugin manager server’s plugin status controller. You’ll see it requires authorized API access (and has a descriptive error message):

unless is_api? || (is_user_api? && current_user.present?)
  raise Discourse::InvalidAccess.new('plugin statuses can only be updated via authorized api requests')
end
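To see how the two snippets interact, here is a self-contained toy round trip. It uses a stand-in server that rejects unknown keys with the same JSON body seen in the guard logs, and a poster that follows the `unless response.status == 200` pattern quoted from lib/plugin_guard/status.rb. `FakePluginManager`, `StatusPoster` and the 403 status code are assumptions for illustration only. The real guard makes an HTTP request to the PMS.

```ruby
require "json"

Response = Struct.new(:status, :body)

# Hypothetical stand-in for the PMS status endpoint: it accepts the update only
# when the key is recognised, otherwise it answers with the invalid_access error.
class FakePluginManager
  def initialize(valid_keys)
    @valid_keys = valid_keys
  end

  def post_status(api_key, _payload)
    return Response.new(200, { "success" => "OK" }.to_json) if @valid_keys.include?(api_key)

    Response.new(403, {
      "errors" => ["You are not permitted to view the requested resource. The API username or key is invalid."],
      "error_type" => "invalid_access"
    }.to_json)
  end
end

# Simplified guard side: any non-200 response is recorded as an error,
# mirroring the add_error / return false pattern quoted above.
class StatusPoster
  attr_reader :errors

  def initialize(server, api_key)
    @server = server
    @api_key = api_key
    @errors = []
  end

  def update(payload)
    response = @server.post_status(@api_key, payload)
    unless response.status == 200
      add_error("Failed to post status to plugin manager: #{response.body.to_s}")
      return false
    end
    true
  end

  private

  def add_error(message)
    @errors << message
  end
end
```

A poster built with a key the server does not know, for example `StatusPoster.new(FakePluginManager.new(["valid-key"]), "stale-key")`, returns `false` from `update` and records the same "Failed to post status to plugin manager: …invalid_access…" message that appears in the logs.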

Going back to the plugin guard code, you’ll notice there’s a hidden site setting, plugin_manager_api_key, for running the guard on a server we control (i.e. a canary; the user API key would be used if an end user ran the guard). There’s also a dedicated API key scope. Go to the old PMS server and you’ll notice there are two keys bearing the names of the two canaries.

Check out the keys and you’ll notice their attributes, e.g. the user and scope.

So it looks like we need to issue an API key for the system user with the plugin_manager scope. I’ve done that and set it on the tests-passed canary:

ssh root@tests-passed.discourse.pluginmanager.org
cd /var/discourse
./launcher enter app
rails c
SiteSetting.plugin_manager_api_key = "# key "

The tests-passed plugin status is working now.
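The fix comes down to two things: which credential the guard sends, and which attributes the key on the PMS needs (the system user and the plugin_manager scope). Here is a minimal sketch of both. It assumes the standard Discourse `Api-Key`/`Api-Username` and `User-Api-Key` request headers. `status_request_headers`, `ApiKeyRecord` and `usable_by_canary?` are hypothetical names, not the guard’s actual code.

```ruby
# Hypothetical record of the attributes visible on the PMS API key page.
ApiKeyRecord = Struct.new(:name, :username, :scopes, keyword_init: true)

# Hypothetical credential choice: a canary has the hidden
# plugin_manager_api_key setting; an end user's guard falls back to a user API key.
def status_request_headers(plugin_manager_api_key:, user_api_key: nil)
  if plugin_manager_api_key && !plugin_manager_api_key.empty?
    # Key issued to the system user with the plugin_manager scope.
    { "Api-Key" => plugin_manager_api_key, "Api-Username" => "system" }
  elsif user_api_key && !user_api_key.empty?
    { "User-Api-Key" => user_api_key }
  else
    {} # no credential: the controller will raise InvalidAccess
  end
end

# A key works for a canary only if both attributes match what the old keys had.
def usable_by_canary?(key)
  key.username == "system" && key.scopes.include?("plugin_manager")
end
```

Because the setting takes precedence, a canary with a stale or wrongly scoped plugin_manager_api_key never falls back to anything else. It keeps getting the invalid_access error until the setting is replaced.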

Great, thanks for explaining, Angus. I just noticed that tests-passed is indeed working now.
I’ll make the effort to get stable working later today or tomorrow.

Would you be able to shed some light on this as well?

And one more related request (sorry to keep bothering you): is there a way to reroute these messages to another category? They are really cluttering the support categories.

The key piece of code to read to understand what is going on here is the status handler.

This specific phenomenon happens because this job runs every 10 minutes.

And currently the Custom Wizard CI is failing. The issue gets “resolved” because the tests-passed canary (which I’ve been rebuilding) posts an “OK” status, and this logic in the “status” helpers treats that OK status as a resolution:

  def self.working?(status)
    compatible?(status)
  end

  def self.not_working?(status)
    incompatible?(status) || tests_failing?(status)
  end

This should probably read:

  def self.working?(status)
    compatible?(status) && !tests_failing?(status)
  end

  def self.not_working?(status)
    incompatible?(status) || tests_failing?(status)
  end
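The flip-flop is easy to reproduce in isolation. The sketch below models a status as a hash and implements both versions of the helpers. It then replays a few 10-minute job runs in which the tests-passed canary reports compatible while CI is failing. The hash layout, the `replay` open/resolve rule and the module names are hypothetical. Only the two `working?`/`not_working?` bodies come from the post above.

```ruby
# Hypothetical status shape: { compatibility: :compatible | :incompatible | :unknown,
#                              tests: :passing | :failing }
module StatusPredicates
  def compatible?(s);    s[:compatibility] == :compatible;   end
  def incompatible?(s);  s[:compatibility] == :incompatible; end
  def tests_failing?(s); s[:tests] == :failing;              end
end

module OldLogic
  extend StatusPredicates
  def self.working?(s);     compatible?(s);                        end
  def self.not_working?(s); incompatible?(s) || tests_failing?(s); end
end

module NewLogic
  extend StatusPredicates
  def self.working?(s);     compatible?(s) && !tests_failing?(s);  end
  def self.not_working?(s); incompatible?(s) || tests_failing?(s); end
end

# Hypothetical issue lifecycle, one entry per job run:
# open an issue when "not working", resolve it when "working".
def replay(logic, statuses)
  open = false
  events = []
  statuses.each do |s|
    if !open && logic.not_working?(s)
      open = true
      events << :opened
    elsif open && logic.working?(s)
      open = false
      events << :resolved
    end
  end
  events
end
```

With `s = { compatibility: :compatible, tests: :failing }`, the old helpers report `s` as both working and not working, so `replay(OldLogic, [s, s, s])` alternates between opening and resolving. `replay(NewLogic, [s, s, s])` opens once and stays open until a run with passing tests arrives.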

I’ll take a look at this underlying issue.

This is also why these issue reports have no details: the error details are not being picked up correctly from the CI error.

I’ve pushed a fix for the tests status handling.

I’ll deploy it tonight, because the PMS still requires a full site rebuild. That’s actually not really necessary anymore, and I’ll remove that special build setup from the PMS server soon (it is still necessary with the PMS guard).

There are still some tweaks to be made to the details of the error report, and maybe it needs a tag to show it’s a “tests failing” issue. But now, when you see one, you know what it means.

See “Discourse Multilingual main does not work on tests-passed”.
We’re still getting those topics, and they’re still being resolved. I am completely lost here.

Is this a bug or is this intentional? Those topics clutter the site, the tags, and even my mailbox.

What are the steps we need to take to:

  • not have all these redundant topics and/or
  • make them more informative about what is going on and/or
  • move them to a better place

That’s because I haven’t deployed the fix yet.

For the vast majority of the users, it would make sense to have the automated tag muted so they don’t see those topics at all. I’ve done this for myself as they are unhelpful for me - and my experience has improved markedly!

I wonder if it would be best to have those muted by default, but ‘normalled’ for @plugin_admins.

This is now deployed.

Thanks, Nathan, good suggestion; I have now muted #automated for all users.

Angus, thank you for deploying; the issues are indeed not auto-closed any more, which is a step in the right direction.

Someone will need to:

  • perhaps add a tag for reports generated from failing tests
  • figure out how to ingest error descriptions from CI

I’m not sure what you’re referring to here?

Sorry, I looked at yesterday’s message by accident, so I mixed up the error messages for Events and Multilingual. You can ignore the last question.

I’m closing this and opening a new ticket for improvements to the details of automated error messages.

@richard FYI, as I’m guessing this will be the next question :wink:

  1. The events plugin status was tests failing because CI was failing. This is why the automated issue topic was created.

  2. CI is now passing, so the “test status” for the plugin is no longer failing; however, since that switch the PMS has yet to receive a “Compatible” status from the tests-passed canary.

  3. A “Compatible” status on the PMS requires BOTH a compatible ping from the relevant canary and tests passing. The compatible status update currently must come while tests are passing.

So the status will be “Unknown” until the tests passed canary sends a compatible status.
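Points 1 to 3 describe a small state machine. This is a toy model of it, assuming the PMS keeps a "tests passing" flag and only moves to “Compatible” when a compatible canary ping arrives while that flag is set. The `PluginStatus` class and its label strings are hypothetical and are not taken from the PMS code.

```ruby
# Hypothetical model of the rule above: "Compatible" needs BOTH passing tests
# and a compatible ping from the canary that arrives while tests are passing.
class PluginStatus
  attr_reader :label

  def initialize
    @tests_passing = false
    @label = "Unknown"
  end

  # CI result changed (point 1 and point 2).
  def tests_result(passing)
    was_passing = @tests_passing
    @tests_passing = passing
    if !passing
      @label = "Tests Failing"
    elsif !was_passing
      # Tests recovered, but there is no fresh compatible ping yet.
      @label = "Unknown"
    end
  end

  # Canary ping (point 3): only counts as "Compatible" while tests pass.
  def canary_ping(compatible)
    if !compatible
      @label = "Incompatible"
    elsif @tests_passing
      @label = "Compatible"
    end
    # A compatible ping while tests fail leaves the label unchanged.
  end
end
```

Replaying the Events case gives “Tests Failing” while CI fails and “Unknown” once CI passes. Only the next compatible ping from the tests-passed canary moves it to “Compatible”.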
