The Wayback Machine - https://web.archive.org/web/20221003155006/https://repost.aws/


Recent questions


Ubuntu Managed Nodes creation failure in Fully-Private cluster

Hi, for some reason I am not able to create Ubuntu managed nodes in a fully private cluster, although managed Amazon Linux nodes and all other self-managed nodes join the cluster successfully. I have already followed the guides and the AWS troubleshooting pages, without success. I have also run the troubleshooting script. Below is the result:

```
HERE IS A SUMMARY OF THE ITEMS THAT REQUIRE YOUR ATTENTION:
[WARNING]: Worker node's AMI ami-0ebb49de26355a371 differs from the public EKS Optimized AMIs. Ensure that the Kubelet daemon is at the same version as your cluster's version 1.23 or only one minor version behind. Please review this URL for further details: https://kubernetes.io/releases/version-skew-policy/ .
[WARNING]: No secondary private IP addresses are assigned to worker node i-0e775ca75fe57bb70, ensure that the CNI plugin is running properly. Please review this URL for further details: https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html
[WARNING]: As SSM agent is not reachable on worker node, this document did not check the status of Containerd, Docker and Kubelet daemons. Ensure that required daemons (containerd, docker, kubelet) are running on the worker node using command "systemctl status <daemon-name>".
============================================================================================================================================
Here are the detailed steps of the document execution:
[X] Checking EKS cluster test-cluster:
EKS Cluster: test-cluster is in Active state.
1. Checking if the cluster Security Group is allowing traffic from the worker node:
   Passed: The cluster Security Group sg-068d83e5a68aff9c7 is allowing traffic from the worker node.
2. Checking DHCP options of the cluster VPC:
   Passed: AmazonProvidedDNS is enabled
3. Checking cluster IAM role arn:aws:iam::782534010321:role/eksctl-test-cluster-cluster-ServiceRole-9WPNFE2Q3N2T for the required permissions:
   Passed: IAM role for cluster test-cluster has the required IAM policies attached.
   Passed: The cluster IAM role arn:aws:iam::782534010321:role/eksctl-test-cluster-cluster-ServiceRole-9WPNFE2Q3N2T has the required trust relationship for the EKS service.
4. Checking control plane Elastic Network Interfaces(ENIs) in the cluster VPC:
   Passed: The cluster Elastic Network Interfaces(ENIs) exist.
5. Cluster Endpoint Private access is disabled for your cluster, checking if the Public CIDR ranges include worker node i-0e775ca75fe57bb70 outbound IP:
   Passed: The cluster allows public access from 0.0.0.0/0
6. Checking cluster VPC for required DNS attributes:
   Passed: Cluster VPC vpc-0cbb4879c588fb52a has the required DNS attributes correctly set.
--------------------------------------------------------------------------------------------------------------------------------------------
[X] Checking worker node i-0e775ca75fe57bb70 state:
The instance is Running.
1. Checking if the EC2 instance family is supported:
   Passed: EC2 instance family m5.xlarge is supported.
2. Checking the worker node network configuration:
   Passed: Worker node is created in a private subnet without a NAT Gateway so VPC endpoints need to be used.
   Passed: Checking VPC Endpoints setup:
   Passed: The VPC Endpoint com.amazonaws.eu-west-2.ec2 exists. Checking its configuration:
   Passed: Security groups [{'GroupId': 'sg-01294c96494e79aaa', 'GroupName': 'eksctl-test-cluster-cluster-ClusterSharedNodeSecurityGroup-1TIY3P78RYQOZ'}] applied to VPC Endpoint com.amazonaws.eu-west-2.ec2 is allowing the worker node to reach the endpoint.
   Passed: The default VPC Endpoint Policy is being used.
   Passed: The VPC Endpoint com.amazonaws.eu-west-2.ecr.api exists. Checking its configuration:
   Passed: Security groups [{'GroupId': 'sg-01294c96494e79aaa', 'GroupName': 'eksctl-test-cluster-cluster-ClusterSharedNodeSecurityGroup-1TIY3P78RYQOZ'}] applied to VPC Endpoint com.amazonaws.eu-west-2.ecr.api is allowing the worker node to reach the endpoint.
   Passed: The default VPC Endpoint Policy is being used.
   Passed: The VPC Endpoint com.amazonaws.eu-west-2.ecr.dkr exists. Checking its configuration:
   Passed: Security groups [{'GroupId': 'sg-01294c96494e79aaa', 'GroupName': 'eksctl-test-cluster-cluster-ClusterSharedNodeSecurityGroup-1TIY3P78RYQOZ'}] applied to VPC Endpoint com.amazonaws.eu-west-2.ecr.dkr is allowing the worker node to reach the endpoint.
   Passed: The default VPC Endpoint Policy is being used.
   Passed: The VPC Endpoint com.amazonaws.eu-west-2.sts exists. Checking its configuration:
   Passed: Security groups [{'GroupId': 'sg-01294c96494e79aaa', 'GroupName': 'eksctl-test-cluster-cluster-ClusterSharedNodeSecurityGroup-1TIY3P78RYQOZ'}] applied to VPC Endpoint com.amazonaws.eu-west-2.sts is allowing the worker node to reach the endpoint.
   Passed: The default VPC Endpoint Policy is being used.
   Passed: S3 gateway endpoint ['vpce-0e01febb999cf1735', 'vpce-08ef7ffac589c3d01'] is added to the worker's VPC.
   Passed: Worker node's route table rtb-0d26fefe5a613e64f has the required route for the S3 endpoint vpce-0e01febb999cf1735.
   Passed: Worker node's route table rtb-0d26fefe5a613e64f has the required route for the S3 endpoint vpce-08ef7ffac589c3d01.
3. Checking the IAM instance Profile of the worker node:
   Passed: The instance profile arn:aws:iam::782534010321:instance-profile/eks-d0c1cede-94e5-c6bc-75fd-2f86043a90eb is used with the worker node i-0e775ca75fe57bb70 .
   Passed: IAM role arn:aws:iam::782534010321:role/eksctl-test-cluster-nodegroup-ser-NodeInstanceRole-12LZWOL4EJAJX is attached to Instance Profile.
   Passed: IAM role arn:aws:iam::782534010321:role/eksctl-test-cluster-nodegroup-ser-NodeInstanceRole-12LZWOL4EJAJX has the required IAM policies attached.
   Passed: No issues detected with the trust relationship policy of the arn:aws:iam::782534010321:role/eksctl-test-cluster-nodegroup-ser-NodeInstanceRole-12LZWOL4EJAJX role.
4. Checking worker node's UserData bootstrap script:
   Passed: The UserData of the worker node contains the required bootstrap script.
5. Checking the worker node i-0e775ca75fe57bb70 tags:
   Passed: Worker node i-0e775ca75fe57bb70 has the required cluster tags.
6. Checking the AMI version for EC2 instance i-0e775ca75fe57bb70:
   [WARNING]: Worker node's AMI ami-0ebb49de26355a371 differs from the public EKS Optimized AMIs. Ensure that the Kubelet daemon is at the same version as your cluster's version 1.23 or only one minor version behind. Please review this URL for further details: https://kubernetes.io/releases/version-skew-policy/ .
7. Checking worker node i-0e775ca75fe57bb70 Elastic Network Interfaces(ENIs) and Private IP addresses to check if CNI is running:
   [WARNING]: No secondary private IP addresses are assigned to worker node i-0e775ca75fe57bb70, ensure that the CNI plugin is running properly. Please review this URL for further details: https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html
8. Checking the outbound SG rules for worker node i-0e775ca75fe57bb70
   Passed: The Outbound security group rules for worker node i-0e775ca75fe57bb70 are sufficient to allow traffic to the EKS cluster endpoint
9. Checking if the worker node is running in AWS Outposts subnet
   Passed: Worker node's subnet subnet-092b2d53a74ac655f is not running in AWS Outposts
10. Checking basic NACL rules
   Passed: NACL acl-0507434ac9745b3dc has sufficient rules to allow cluster traffic.
11. Checking STS regional endpoint availability:
   Passed: STS endpoint is activated within region eu-west-2.
12. Checking if Instance Metadata http endpoint is enabled on the worker node:
   Passed: Instance metadata endpoint is enabled on the worker node.
13. Checking if SSM agent is running and reachable on worker node:
   [WARNING]: As SSM agent is not reachable on worker node, this document did not check the status of Containerd, Docker and Kubelet daemons. Ensure that required daemons (containerd, docker, kubelet) are running on the worker node using command "systemctl status <daemon-name>".
============================================================================================================================================
Here is a list of other possible causes that were NOT checked by this document:
[-] Ensure that Instance IAM role is added to aws-auth configmap, please check: https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html.
[-] If your account is a part of AWS Organizations Service, confirm that no Service Control Policy (SCP) is denying required permissions, please check: https://aws.amazon.com/premiumsupport/knowledge-center/eks-node-status-ready/.
```

One strange thing is that point 5 says private endpoint access is disabled, but I created the cluster as fully private:

```
{
  "update": {
    "id": "4c429db1-80bb-48d2-bf98-3f84524c0b83",
    "status": "Successful",
    "type": "EndpointAccessUpdate",
    "params": [
      { "type": "EndpointPublicAccess", "value": "false" },
      { "type": "EndpointPrivateAccess", "value": "true" },
      { "type": "PublicAccessCidrs", "value": "[\"0.0.0.0/0\"]" }
    ],
    "createdAt": "2022-10-03T09:35:42.092000+00:00",
    "errors": []
  }
}
```

Also, when I run the command `sudo systemctl status kubelet`, I get this result:
`Unit kubelet.service could not be found.`

My cluster config is as below:

```
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster
  region: eu-west-2
  version: "1.23"
privateCluster:
  enabled: true
vpc:
  id: vpc-ID
  subnets:
    private:
      hscn-private1-subnet:
        id: subnet-1
      hscn-private2-subnet:
        id: subnet-2
managedNodeGroups:
  - name: serv-test-1
    ami: ami-0ebb49de26355a371
    instanceType: m5.xlarge
    desiredCapacity: 1
    volumeType: gp2
    volumeSize: 50
    privateNetworking: true
    disableIMDSv1: true
    subnets:
      - hscn-private2-subnet
    ssh:
      allow: true
    tags:
      kubernetes.io/cluster/test-cluster: owned
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh test-cluster --kubelet-extra-args '--node-labels=eks.amazonaws.com/nodegroup=serv-test-1,eks.amazonaws.com/nodegroup-image=ami-0ebb49de26355a371' --dns-cluster-ip 10.100.0.10 --apiserver-endpoint {My endpoint} --b64-cluster-ca {My-CA}
```
0 answers · 0 votes · 9 views · asked an hour ago
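The endpoint update quoted in the question can be read mechanically to confirm what it actually sets. The following is a minimal Python sketch, assuming the JSON has exactly the shape pasted above; `endpoint_access` is a hypothetical helper name for illustration, not part of any AWS tool.

```python
import json

# The update document as pasted in the question. A raw string keeps the \" escapes intact.
UPDATE_JSON = r"""
{
  "update": {
    "id": "4c429db1-80bb-48d2-bf98-3f84524c0b83",
    "status": "Successful",
    "type": "EndpointAccessUpdate",
    "params": [
      {"type": "EndpointPublicAccess", "value": "false"},
      {"type": "EndpointPrivateAccess", "value": "true"},
      {"type": "PublicAccessCidrs", "value": "[\"0.0.0.0/0\"]"}
    ],
    "createdAt": "2022-10-03T09:35:42.092000+00:00",
    "errors": []
  }
}
"""


def endpoint_access(update_doc: str) -> dict:
    """Map each update parameter type to its decoded value (hypothetical helper)."""
    params = json.loads(update_doc)["update"]["params"]
    # Each "value" is itself a JSON-encoded string, so decode it a second time.
    return {p["type"]: json.loads(p["value"]) for p in params}


access = endpoint_access(UPDATE_JSON)
print(access)
```

Read this way, the update turns private access on and public access off, while `PublicAccessCidrs` still records `0.0.0.0/0`. That leftover CIDR value may be what check 5 of the troubleshooting output is reporting, though the output alone does not confirm it.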

AWS PyTorch Neuron Compilation Error

I followed the user guide on updating torch-neuron and then started compiling the model for Neuron, but I got an error and I don't understand what is wrong. In the Neuron SDK you claim that it should compile all operations, and that even unsupported ones should simply run on CPU. The error:

```
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 3345, fused = 3345, percent fused = 100.0%
INFO:Neuron:Number of neuron graph operations 8175 did not match traced graph 9652 - using heuristic matching of hierarchical information
INFO:Neuron:Compiling function _NeuronGraph$3362 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]} --verbose 35'
..............................................................................INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$3362; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/decorators.py", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/alias/neuron/neuron_env/bin/neuron-cc compile /tmp/tmpmp8qvhtb/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpmp8qvhtb/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 768, 768], "float32"]}, "outputs": ["aten_sigmoid/Sigmoid:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 3345, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 942 [supported]
INFO:Neuron: => aten::_convolution: 107 [supported]
INFO:Neuron: => aten::add: 104 [supported]
INFO:Neuron: => aten::batch_norm: 1 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 4 [supported]
INFO:Neuron: => aten::div: 104 [supported]
INFO:Neuron: => aten::dropout: 208 [supported]
INFO:Neuron: => aten::feature_dropout: 1 [supported]
INFO:Neuron: => aten::flatten: 60 [supported]
INFO:Neuron: => aten::gelu: 52 [supported]
INFO:Neuron: => aten::layer_norm: 161 [supported]
INFO:Neuron: => aten::linear: 264 [supported]
INFO:Neuron: => aten::matmul: 104 [supported]
INFO:Neuron: => aten::mul: 52 [supported]
INFO:Neuron: => aten::permute: 210 [supported]
INFO:Neuron: => aten::relu: 1 [supported]
INFO:Neuron: => aten::reshape: 262 [supported]
INFO:Neuron: => aten::select: 104 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::size: 278 [supported]
INFO:Neuron: => aten::softmax: 52 [supported]
INFO:Neuron: => aten::transpose: 216 [supported]
INFO:Neuron: => aten::upsample_bilinear2d: 4 [supported]
INFO:Neuron: => aten::view: 52 [supported]
Traceback (most recent call last):
  File "to_neuron.py", line 14, in <module>
    model_neuron = torch.neuron.trace(model, example_inputs=[image.cuda()])
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 184, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/home/ubuntu/alias/neuron/neuron_env/lib/python3.7/site-packages/torch_neuron/convert.py", line 493, in stats_post_compiler
    "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
0 answers · 0 votes · 2 views · asked 2 hours ago
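The line `Compile command returned: -9` in the log above may be the most informative one. The sketch below assumes that the value is a return code as reported by Python's `subprocess` module, which uses a negative number `-N` on POSIX to mean the child was terminated by signal `N`. `describe_returncode` is a hypothetical helper written for this illustration.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a return code following the convention of Python's subprocess module."""
    if returncode < 0:
        # On POSIX, subprocess reports termination by a signal as the negated signal number.
        sig = signal.Signals(-returncode)
        return f"terminated by signal {sig.name} ({sig.value})"
    if returncode == 0:
        return "exited successfully"
    return f"exited with status {returncode}"


print(describe_returncode(-9))
```

Under that assumption, `-9` means `neuron-cc` was killed with SIGKILL rather than failing on an unsupported operator. On Linux this is often the kernel's out-of-memory killer, but the log alone does not confirm the cause; checking the instance's memory usage or kernel log during compilation would be the next step.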

create-export-task | Filter CloudWatch logs using JMESPath

My objective is to create a mechanism for exporting CloudWatch logs to S3 on a case-by-case basis. My logs appear in the following format:

```
{ "level": "error", "message": "Oops", "errorCode": "MY_ERROR_CODE_1" }
{ "level": "info", "message": "All good" }
{ "level": "info", "message": "Something else" }
```

I'd like the export to **only** include the error logs.

Using [create-export-task](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/logs/create-export-task.html), is it possible to use the `query` param to filter the response data given the above log structure? I'm not sure whether the log structure is incorrect for this use or if I have misunderstood the purpose of the query param.

My JMESPath attempts so far have been unsuccessful. Some attempts include:

```
aws logs create-export-task \
  --log-group-name myGroup \
  --log-stream-name-prefix myGroup-test \
  --from 1664537580000 \
  --to 1664537640000 \
  --destination myGroup-archive-ab1 \
  --destination-prefix test \
  --query '{Message: message, Error: errorCode}'
```

and the same command, but with the query `--query '{Message: .message, Error: .errorCode}'`, which produces the following error:

*Bad value for --query {Message: .message, Error: .errorCode}: invalid token: Parse error at column 10, token "." (DOT), for expression: "{Message: .message, Error: .errorCode}"*
0 answers · 0 votes · 6 views · asked 2 hours ago
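As the AWS CLI documentation describes it, `--query` is a client-side JMESPath filter applied to the JSON response of the command itself. For `create-export-task` that response contains only a task ID, so the option cannot select which log events are exported. One alternative is to filter the events after they are retrieved. The Python sketch below assumes one JSON object per line, as in the sample logs above; `error_events` is a hypothetical helper, and real exported files may carry extra prefixes (such as timestamps) that would need stripping first.

```python
import json
from typing import Iterable, Iterator


def error_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only the JSON log events whose "level" field is "error"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(event, dict) and event.get("level") == "error":
            yield event


# The three sample log lines from the question.
SAMPLE = [
    '{ "level": "error", "message": "Oops", "errorCode": "MY_ERROR_CODE_1" }',
    '{ "level": "info", "message": "All good" }',
    '{ "level": "info", "message": "Something else" }',
]

errors = [
    {"Message": e.get("message"), "Error": e.get("errorCode")}
    for e in error_events(SAMPLE)
]
print(errors)
```

The output shape mirrors the `{Message: message, Error: errorCode}` projection attempted in the question, applied per event instead of to the CLI response.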

Trouble with AWS Lambda Runtime API with Docker image

## Short version

I am running a Lambda function in a Docker container, and all executions are marked as failures with a Runtime.ExitError, even though I am using the runtime API and the Lambda function added as the on_success destination is running.

## Longer version, with context

I have a setup with a bunch of functions chained using API invocations and destinations. One of them requires a custom runtime (the handler is a PHP command), so I have been using a Docker image for that. In order to get it running correctly, I get the request ID in the entrypoint, and in the command I run both my command and a curl to the runtime API, like so:

```
CMD ["/bin/bash", "-c", "/app/bin/my-super-command && curl --silent -X POST \"http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/${REQUEST_ID}/response\" -d 'SUCCESS'"]
```

I know the request ID is correct (I am printing it in the entrypoint), and at the end of the logs I get the following lines (edited, of course):

```
End of my-super-command
{"status":"OK"}
END RequestId: 123456-abcd-1234-abcd-12345678910
REPORT RequestId: 123456-abcd-1234-abcd-12345678910 Duration: 39626.80 ms Billed Duration: 39777 ms Memory Size: 384 MB Max Memory Used: 356 MB Init Duration: 149.26 ms
RequestId: 123456-abcd-1234-abcd-12345678910 Error: Runtime exited without providing a reason
Runtime.ExitError
Beginning of the entrypoint
```

The first line is from my command. The second line looks like the output from the curl (it looks like a success, and the API documentation seems to agree with me), but as we can see, the call seems to be marked as failed later.

The weird stuff:

* The Lambda function logs a failure even though the Runtime API returns an OK to my call for success
* The Lambda function is marked as failed in the monitoring
* The function I put after this one in the workflow, in a destination, with the `on_success` condition, runs!

The problems I have had and already dealt with:

* I am getting the request ID with a combination of grep/sed/trim because there's a \r somewhere. That's not optimal, but I am printing it and it appears correctly (I have printed the full curl command too, just in case)
* I have had issues with timeout/OOM, but as you can see above, that is not the case here.

Am I missing something here? Maybe I did not understand the usage of the runtime API. As you can see, the next run seems to be launched but interrupted, so there might be some timing issue.
0 answers · 0 votes · 8 views · asked 3 hours ago
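One possible reading of the log above: the Lambda Runtime API expects a custom runtime to stay alive and keep polling `/runtime/invocation/next`, whereas the `CMD` in the question handles a single invocation and then exits, which Lambda may report as `Runtime.ExitError` even though the response was accepted. The Python sketch below shows the polling loop under that assumption. `serve`, `run_command`, and the injected `http_get`/`http_post` callables are hypothetical names for this illustration; only the URL paths, the `Lambda-Runtime-Aws-Request-Id` header, and the command path come from the Runtime API documentation and the question.

```python
import json
import os
import subprocess
import urllib.request

API_VERSION = "2018-06-01"


def next_url(api: str) -> str:
    return f"http://{api}/{API_VERSION}/runtime/invocation/next"


def response_url(api: str, request_id: str) -> str:
    return f"http://{api}/{API_VERSION}/runtime/invocation/{request_id}/response"


def error_url(api: str, request_id: str) -> str:
    return f"http://{api}/{API_VERSION}/runtime/invocation/{request_id}/error"


def serve(api, handler, http_get, http_post, max_invocations=None):
    """Poll for invocations and report each result; loops forever by default."""
    handled = 0
    while max_invocations is None or handled < max_invocations:
        request_id, event = http_get(next_url(api))  # blocks until an event arrives
        try:
            http_post(response_url(api, request_id), handler(event))
        except Exception as exc:
            body = {"errorMessage": str(exc), "errorType": type(exc).__name__}
            http_post(error_url(api, request_id), json.dumps(body).encode())
        handled += 1
    return handled


def http_get(url):
    # Returns the request ID header and the raw event body.
    with urllib.request.urlopen(url) as resp:
        return resp.headers["Lambda-Runtime-Aws-Request-Id"], resp.read()


def http_post(url, body: bytes):
    req = urllib.request.Request(url, data=body, method="POST")
    urllib.request.urlopen(req).close()


def run_command(event: bytes) -> bytes:
    # Command path taken from the question; the event is ignored in this sketch.
    subprocess.run(["/app/bin/my-super-command"], check=True)
    return b"SUCCESS"


if __name__ == "__main__" and "AWS_LAMBDA_RUNTIME_API" in os.environ:
    serve(os.environ["AWS_LAMBDA_RUNTIME_API"], run_command, http_get, http_post)
```

Taking the request ID from the response header of the `next` call also avoids the grep/sed extraction described in the question. The HTTP callables are injected so the loop can be exercised locally with fakes before it is deployed.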
