
[🐛 Bug]: KEDA is not scaling when I set the scaling type to deployment and my job is going into the queue #2791


Open
Kamalb2592 opened this issue Apr 17, 2025 · 8 comments

Comments

@Kamalb2592

Kamalb2592 commented Apr 17, 2025

What happened?

Operator log
unable to get external metric selenium-grid-qas/s0-selenium-grid-chrome--/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: selenium-grid-selenium-node-chrome,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: the server was unable to return a response in the time allotted, but may still be processing the request (get s0-selenium-grid-chrome--.external.metrics.k8s.io)

keda enablement in values file
keda:
  enabled: true
  metricsServer:
    useHostNetwork: true

Command used to start Selenium Grid with Docker (or Kubernetes)

Operator log
`unable to get external metric selenium-grid-qas/s0-selenium-grid-chrome--/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: selenium-grid-selenium-node-chrome,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: the server was unable to return a response in the time allotted, but may still be processing the request (get s0-selenium-grid-chrome--.external.metrics.k8s.io)`


keda enablement in values file 
`keda:
    enabled: true
    metricsServer:
      useHostNetwork: true`

Relevant log output

`unable to get external metric selenium-grid-qas/s0-selenium-grid-chrome--/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: selenium-grid-selenium-node-chrome,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: the server was unable to return a response in the time allotted, but may still be processing the request (get s0-selenium-grid-chrome--.external.metrics.k8s.io)`

Operator log
"keda-operator.selenium-grid-qas.svc.cluster.local:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp XXXXXXXXXXX:9666: connect: connection refused"
I0417 13:32:34.062543 1 provider.go:64] "msg"="Connection to KEDA Metrics Service gRPC server has been successfully established" "logger"="keda_metrics_adapter.provider" "server"="keda-operator.selenium-grid-qas.svc.cluster.local:9666"

keda apiserver log
E0417 13:57:00.700425 1 writers.go:135] "Unhandled Error" err="apiserver was unable to write a fallback JSON response: http: Handler timeout" logger="UnhandledError"
E0417 13:57:00.701566 1 timeout.go:140] "Post-timeout activity" logger="UnhandledError" timeElapsed="70.788205ms" method="GET" path="/apis/external.metrics.k8s.io/v1beta1/namespaces/selenium-grid-qas/s0-selenium-grid-chrome--" result=null
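
One way to reproduce the timeout outside the HPA is to query the external metrics API directly. The sketch below only builds the request path from the namespace, metric name, and ScaledObject name that appear in the logs above; the helper name `external_metric_path` is hypothetical, and running the printed `kubectl` command is assumed to require access to the cluster.

```python
from urllib.parse import quote


def external_metric_path(namespace: str, metric: str, scaled_object: str) -> str:
    """Build the external metrics API path for one KEDA-managed metric.

    Hypothetical helper for illustration; the values passed in below are
    copied from the log lines in this issue.
    """
    # KEDA identifies the ScaledObject through this label selector.
    selector = f"scaledobject.keda.sh/name={scaled_object}"
    return (
        f"/apis/external.metrics.k8s.io/v1beta1/namespaces/{namespace}/"
        f"{metric}?labelSelector={quote(selector, safe='')}"
    )


path = external_metric_path(
    namespace="selenium-grid-qas",
    metric="s0-selenium-grid-chrome--",
    scaled_object="selenium-grid-selenium-node-chrome",
)

# Print a command that can be pasted into a shell with cluster access.
print(f'kubectl get --raw "{path}"')
```

If this request also times out, the problem is probably between the Kubernetes API server and the KEDA metrics server rather than in the Selenium Grid scaler configuration.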

Operating System

EKS

Docker Selenium version (image tag)

latest

Selenium Grid chart version (chart version)

0.42.1


@Kamalb2592, thank you for creating this issue. We will troubleshoot it as soon as we can.



@VietND96
Member

Did you deploy using the Helm chart in this repo, or your own YAML/charts?

@Kamalb2592
Author

Kamalb2592 commented Apr 17, 2025

I am using the Helm chart only. The chart file content is below:
apiVersion: v2
name: selenium-cqe
version: "1.0.0"
kubeVersion: ">= 1.19.0-0"
description: chart to deploy a selenium-grid
dependencies:

Image

@Kamalb2592
Author

My values file:
selenium-grid:
  ingress:
    # Enable or disable ingress resource
    enabled: true
    # Name of ingress class to select which controller will implement ingress resource
    className: "nginx-external"
    # Custom annotations for ingress resource
    annotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    hostname: "selenium-grid.XXXXX"
    # Default host path for the ingress resource
    path: /
  autoscaling:
    enabled: true
    scalingType: deployment
    scaledOptions:
      minReplicaCount: 0
      maxReplicaCount: 100
      pollingInterval: 10
  hub:
    extraEnvironmentVariables:
      - name: SE_SESSION_RETRY_INTERVAL
        value: "5000"
      - name: SE_SESSION_REQUEST_TIMEOUT
        value: "300000"
  firefoxNode:
    enabled: false
    deploymentEnabled: false
  edgeNode:
    enabled: false
    deploymentEnabled: false
  chromeNode:
    chromeNode:
      hpa:
        platformName: "Linux"
      resources:
        requests:
          memory: "1Gi"
          cpu: "1"
        limits:
          memory: "1Gi"
          cpu: "1"
      extraEnvironmentVariables:
        - name: SE_VNC_NO_PASSWORD
          value: "2"
        - name: SE_NODE_OVERRIDE_MAX_SESSIONS
          value: "true"
        - name: SE_NODE_SESSION_TIMEOUT
          value: "600"
  keda:
    metricsApiServer:
      args:
        - --request-timeout=60s
    metricsServer:
      useHostNetwork: true

@VietND96
Member

It looks like the request in the queue does not have platformName set.
So, in the chart values, set

chromeNode:
  hpa:
    platformName: ""

or update your client code with .set_capability('platformName', 'Linux') (example in Python) to align with the current chart values.
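
The client-side option could look roughly like the sketch below. It assumes Selenium 4 for Python; the Grid URL is a placeholder, and the helper names `build_capabilities` and `open_session` are hypothetical, not part of Selenium or the chart. The point is only that the `platformName` sent by the client should equal the `platformName` configured in the chart values.

```python
# Placeholder URL, replace with your own Grid ingress hostname.
GRID_URL = "https://selenium-grid.example.com"


def build_capabilities(platform_name: str = "Linux") -> dict:
    """Return the capabilities the test will request from the Grid.

    An empty platform_name omits the capability entirely, which
    corresponds to platformName: "" in the chart values.
    """
    caps = {"browserName": "chrome"}
    if platform_name:
        caps["platformName"] = platform_name
    return caps


def open_session(grid_url: str = GRID_URL, platform_name: str = "Linux"):
    """Open a remote Chrome session with the capabilities above."""
    # Imported here so the capability helper works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for name, value in build_capabilities(platform_name).items():
        options.set_capability(name, value)
    return webdriver.Remote(command_executor=grid_url, options=options)


print(build_capabilities())
```

Calling `open_session()` against a real Grid should then queue a request that carries `platformName`, which can be checked in the Grid UI queue view.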

@Kamalb2592
Author

Kamalb2592 commented Apr 17, 2025

I made the changes in the values file, and after that it is not scaling even a single Chrome node.

Image

and I am getting this error while executing the test:
HTTPSConnectionPool(host='selenium-grid-XXXX-XXXXX.XXXXXXXX.com', port=443): Read timed out. (read timeout=120)

Image

@VietND96
Member

VietND96 commented Apr 18, 2025

There is a mismatch between platformName in the request and in the scaler config.
In the screenshot, I can see that the request has no platformName set (no Linux icon appears, and if you click the (i) icon, there is no platformName capability).
So, as in my comment above, you need to either set the chart value to empty or add the platformName capability in the client code.

@Kamalb2592
Author

Kamalb2592 commented Apr 21, 2025

Thanks, it's working now. However, the jobs are being assigned sequentially rather than in parallel — only one job is picked up at a time while the others remain in the queue. I also noticed that only one Chrome node is being created at a time.

My autoscaling type is deployment
