diff --git a/community-solutions/ssh-password-migration/overview.mdx b/community-solutions/ssh-password-migration/overview.mdx index 7ee2165e..37d40be3 100644 --- a/community-solutions/ssh-password-migration/overview.mdx +++ b/community-solutions/ssh-password-migration/overview.mdx @@ -6,7 +6,7 @@ icon: "arrows-rotate" ## What are these tools? -These are simple bash and Python scripts that solve a critical problem: **migrating data between Runpod instances when you need to move Pods** (e.g., when your Pod gets stuck with zero GPUs or you need to switch to a different instance). +These are simple bash and Python scripts that solve a critical problem: **migrating data between Runpod instances when you need to move Pods** (e.g., when your original GPU becomes unavailable on your Pod's machine or you need to switch to a different instance). Check the repository for additional features, updates, and documentation: [github.com/justinwlin/Runpod-SSH-Password](https://github.com/justinwlin/Runpod-SSH-Password) @@ -15,7 +15,7 @@ Check the repository for additional features, updates, and documentation: [githu ## The problem it solves When Runpod users encounter issues like: -- Pod stuck with **zero GPUs allocated**. +- Pod's **original GPU becomes unavailable** on its physical machine. - Need to **migrate to a different GPU type**. - Want to **transfer data between Pods**. - Need to **backup data before terminating a Pod**. 
@@ -111,4 +111,4 @@ Before using these tools, make sure you have: - [Managing Pods](/pods/manage-pods) - [Pod storage overview](/pods/storage/types) -- [Network volumes](/pods/storage/create-network-volumes) \ No newline at end of file +- [Network volumes](/storage/network-volumes) \ No newline at end of file diff --git a/pods/manage-pods.mdx b/pods/manage-pods.mdx index db8b0dce..ce8c0647 100644 --- a/pods/manage-pods.mdx +++ b/pods/manage-pods.mdx @@ -15,12 +15,17 @@ runpodctl config --apiKey RUNPOD_API_KEY ## Deploy a Pod + +You can deploy preconfigured Pods from the repos listed in the [Runpod Hub](/hub/overview). For more info, see the [Hub deployment guide](/hub/overview#deploy-as-a-pod). + + + To create a Pod using the Runpod console: 1. Open the [Pods page](https://www.console.runpod.io/pods) in the Runpod console and click the **Deploy** button. -2. (Optional) Specify a [network volume](/pods/storage/create-network-volumes) if you need to share data between multiple Pods, or to save data for later use. +2. (Optional) Specify a [network volume](/storage/network-volumes) if you need to share data between multiple Pods, or to save data for later use. 3. Select **GPU** or **CPU** using the buttons in the top-left corner of the window, and follow the configuration steps below. GPU configuration: @@ -150,12 +155,12 @@ With custom templates, you can: ## Stop a Pod -If your Pod has a [network volume](/pods/storage/create-network-volumes) attached, it cannot be stopped, only terminated. When you terminate the Pod, data in the `/workspace` directory will be preserved in the network volume, and you can regain access by deploying a new Pod with the same network volume attached. +If your Pod has a [network volume](/storage/network-volumes) attached, it cannot be stopped, only terminated. 
When you terminate the Pod, data in the `/workspace` directory will be preserved in the network volume, and you can regain access by deploying a new Pod with the same network volume attached. When a Pod is stopped, data in the container volume is cleared, but data in the `/workspace` directory is preserved. To learn more about how Pod storage works, see [Storage overview](/pods/storage/types). -By stopping a Pod you are effectively releasing the GPU on the machine, and you may be reallocated 0 GPUs when you start the Pod again. For more info, see the [FAQ](/references/faq#why-do-i-have-zero-gpus-assigned-to-my-pod%3F). +By stopping a Pod you are effectively releasing the GPU on the machine, and your original GPU may become unavailable when you restart the Pod. Runpod provides automatic migration options to help you get back to work quickly. For more info, see the [FAQ](/references/faq#why-do-i-have-zero-gpus-assigned-to-my-pod%3F). After a Pod is stopped, you will still be charged for its [disk volume](/pods/storage/types#disk-volume) storage. If you don't need to retain your Pod environment, you should terminate it completely. @@ -254,7 +259,7 @@ pod "wu5ekmn69oh1xr" started with $0.290 / hr -Terminating a Pod permanently deletes all associated data that isn't stored in a [network volume](/pods/storage/create-network-volumes). Be sure to export or download any data that you'll need to access again. +Terminating a Pod permanently deletes all associated data that isn't stored in a [network volume](/storage/network-volumes). Be sure to export or download any data that you'll need to access again. diff --git a/references/faq.mdx b/references/faq.mdx index b19781c1..f5a20c06 100644 --- a/references/faq.mdx +++ b/references/faq.mdx @@ -138,19 +138,40 @@ We don't currently support Windows. We want to do this in the future, but we do Runpod needs to provide you with reliable servers. 
All of our listed servers must meet minimum reliability, and most are running in a data center! However, if you want the highest level of reliability and security, use Secure Cloud. Runpod calculates server reliability by maintaining a heartbeat with each server in real-time. -### Why do I have zero GPUs assigned to my Pod? +### Why am I being asked to migrate my Pod? -Most of our machines have between 4 and 8 GPUs per physical machine. When you start a Pod, it is locked to a specific physical machine. If you keep it running (On-Demand), then that GPU cannot be taken from you. However, if you stop your Pod, it becomes available for a different user to rent. When you want to start your Pod again, your specific machine may be wholly occupied. In this case, we give you the option to spin up your Pod with zero GPUs so you can retain access to your data. +Most of our physical machines have 4–8 GPUs. When you start a Pod, it is locked to that specific physical machine. If you keep your Pod running, it is not affected when all GPUs of its type get taken, and your instance charges stay the same: -Remember that this does not mean there are no more GPUs of that type available, just none on the physical machine that specific Pod is locked to. Note that transfer Pods have limited computing capabilities, so transferring files using a UI may be difficult, and you may need to resort to terminal access or cloud sync options. +* This is important for long-running sessions where you don't want to lose data or work. -If you want to avoid this, using network volumes is the best choice. [Learn how to use them here](/pods/storage/create-network-volumes). +* If you **stop your Pod**, its GPU immediately becomes available for a different user to rent. If your specific machine is full when you want to use the Pod again (i.e., other users have rented all 4–8 GPUs), you will get a zero GPUs available message. 
+ +If you try to start a Pod on a machine without any available GPUs, you have three options: + +1. **Do nothing**: If you don't want to migrate your data, you can simply wait and come back a few minutes later to start your Pod when GPU resources become available again. + +2. **Start Pod with CPUs**: If you don't require GPUs immediately, you can start your Pod with CPUs only so you can still access your data, or manually migrate it if you want to. + +3. **Automatically Migrate Pod Data**: This spins up a new Pod with the same specs as the current one and migrates your data automatically so you can get back to work quickly. This one-click migration process will find a new machine with the requested GPU type, spin up the instance, and migrate your network volume data automatically from your old Pod to your new Pod. + + + +When you use this feature, you **will** get a new Pod and IP address. This is because of how Runpod is architecturally built: Pod IDs are tied to a specific physical machine. This will affect you only if: +1. You have a Pod ID hardcoded in an API call +2. You have a proxy URL hardcoded, e.g. `b63b243b47bd340becc72fbe9b3e642c.proxy.runpod.net` +3. You have a firewall or VPN set up with a specific Pod ID in it +4. You have a firewall or VPN set up with a specific Pod IP address in it +5. You are using a specific URL for your server (when you start a new Pod, you will get a new URL for the UI or server you've set up) + + + +A zero GPUs available message doesn't mean that there are no more GPUs of that type available, just that none are free on the physical machine that your specific Pod is locked to. #### What are Network Volumes? -Network volumes allow you to share data between Pods and generally be more mobile with your important data. This feature is only available in specific secure cloud data centers, but we are actively rolling it out to more and more of our secure cloud footprint. 
If you use network volumes, you should rarely run into situations where you cannot use your data with a GPU without a file transfer. +Network volumes allow you to share data between Pods and generally be more mobile with your important data. This feature is only available in certain secure cloud data centers, but we are actively rolling it out to more and more of our secure cloud footprint. If you use network volumes, you should rarely run into situations where you cannot use your data with a GPU without a file transfer. -[Read about it here](/pods/storage/create-network-volumes). +[Read about it here](/storage/network-volumes). ## What if?