Community Help

msarnold · ‎03-14-2024

Hello,

I am trying to use DAPSession.stream_resource to stream resources/files directly into an S3 bucket.

The idea is to pipe the stream directly into S3, without first downloading and storing the entire file locally and then re-uploading it.

However, I am not sure how to use that method. First, I am not sure how to handle its async nature, and how, if, or when to await it. Second, I am confused that it returns an iterator of StreamReaders (i.e. potentially several) rather than just one. Since we pass in a single resource, which represents a single download URL, shouldn't there be exactly one StreamReader rather than an iterator of them?

On the S3/AWS side of my code, I want to use the boto3 S3 client's upload_fileobj() method (see https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_fileobj....)

So, given a job ID, what would my Python code need to look like to get the resources for that job and stream them into S3 (self._dap_session is an instance of DAPSession)?

I tried this, but got an error:

    async def stream_job_files_to_s3(self, job_id: str, namespace: str, tablename: str):
        async with self._dap_session as session:
            objects = await session.get_objects(job_id)
            if objects is not None:
                urls = await session.get_resources(objects)
                for key in urls.keys():
                    rsrc = urls[key]
                    key = key.replace('/', '_')
                    aiter = await session.stream_resource(rsrc)
                    s3_key = f'{self._prefix}/{namespace}/{tablename}/{key.strip("/")}'
                    async for stream in aiter:
                        self._s3_client.upload_fileobj(stream, self._bucket, s3_key)
                    keys.append(s3_key)
        return keys

The error is:

TypeError: object async_generator can't be used in 'await' expression

and it is happening in this line:

aiter = await session.stream_resource(rsrc)

It will also obviously not behave well if more than one stream is returned by the iterator, but that's the next question once the basic mechanism is solved...

Thank you,

Mark
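
For readers hitting the same TypeError: the message indicates that stream_resource returns an async generator, and Python raises this error whenever an async generator object is awaited. The sketch below reproduces the failure and the `async for` form without touching the DAP client. `fake_stream_resource` is a hypothetical stand-in, not the library's method, and it assumes the generator yields a single item.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream_resource(payload: bytes) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for an async-generator method such as stream_resource."""
    # Yields one item, the way a generator wrapping a single HTTP response might.
    yield payload


async def wrong() -> str:
    try:
        # Awaiting the async generator object raises TypeError.
        aiter = await fake_stream_resource(b"data")
    except TypeError as exc:
        return str(exc)
    return "no error"


async def right() -> list[bytes]:
    items = []
    # No await: call the function, then iterate the result with `async for`.
    async for stream in fake_stream_resource(b"data"):
        items.append(stream)
    return items


wrong_message = asyncio.run(wrong())
right_items = asyncio.run(right())
```

Applied to the code above, that would mean dropping the `await` and writing `async for stream in session.stream_resource(rsrc):` directly. An iterator that yields once would also explain why the return type is an iterator even though there is only one download URL, though that is an inference from the error message rather than something confirmed in this thread.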

jwals · ‎03-15-2024

Hi @msarnold,

We don't use DAPSession.stream_resource, but we use the following code to stream weblogs files to S3 after the query job has finished on Instructure's end. This isn't the complete Lambda function (there is some custom setup, etc.) but hopefully it's helpful! Let me know if you have any questions. (The libraries being used are boto3, json, and requests.)

ETA: To be clear, there's no async happening here. One Lambda function hits the API to start the query job, then another Lambda function polls the server until the job is finished, and then the below function downloads the files. Those functions are all orchestrated in a state machine.

    # Get details about the completed job from Instructure
    cj_response = loads(requests.get(
        f"https://api-gateway.instructure.com/dap/job/{event['job_id']}",
        headers={"x-instauth" : event["access_token"]},
    ).text)
    logger.info(f"Received response from Instructure: {cj_response}")

    # Get the list of files for this request
    objs_response = loads(requests.post(
        "https://api-gateway.instructure.com/dap/object/url",
        headers={"x-instauth" : event["access_token"]},
        json=cj_response["objects"],
    ).text)

    # Stream those files to S3
    urls = objs_response["urls"]
    for key in urls.keys():
        logger.info(f"Uploading file {key} to S3")
        with requests.get(urls[key]["url"], stream=True) as stream:
            s3_client.upload_fileobj(stream.raw, secret["S3_BUCKET"], secret["S3_PREFIX"] + key.split("/")[1])

LeventeHunyadi · ‎03-15-2024

Relying on the proprietary header parameter X-InstAuth to pass the authentication token is deprecated, and will no longer be available as an authentication option in a future version of Instructure API Gateway. Instead, use the standard HTTP header parameter Authorization, passing the same token. Authorization is the standard header parameter tools like curl would employ.

jwals · ‎03-15-2024

Thanks very much for flagging that, @LeventeHunyadi. Is there a changelog or something where that was communicated? I appreciate your heroic attention to the forum but I'm also slightly concerned that I would have completely missed this otherwise. I see the change reflected in the API Gateway documentation now, but I wouldn't have revisited that documentation as long as my application continued to work.

msarnold · ‎03-15-2024

Yeah, that's what I had done so far.

I got side-tracked by a dependency conflict issue that I thought was caused by a dependency of requests (rpds). So I was trying to get rid of requests and find some other way of streaming pieces directly into S3 instead of downloading the entire file locally.

I don't really care about sync vs async, because as you said, Lambda is only running short "atomic" steps which are orchestrated outside via Step Functions; but since the dap2 client is doing everything async, that's what you have to deal with...
Although... async might still come in handy if the server decides to split a result for a request into multiple files - using async might result in getting those downloads done concurrently instead of sequentially...

I finally figured out that the dependency problem had a completely different root cause, so I'm back to that original code now...

I'm still curious what the correct way is to use that stream_resource() function, though...

Thanks,

Mark
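
One point that may matter once the `await` issue is resolved: boto3's upload_fileobj expects a file-like object with a synchronous read() that returns bytes, while an async stream reader exposes read() as a coroutine, so the two probably cannot be connected directly. A possible workaround is to read the stream asynchronously and group the bytes into parts for a multipart upload. The sketch below shows only the grouping step, using the standard library. `FakeReader` and `iter_parts` are hypothetical names introduced for illustration, and the tiny sizes are chosen so the example runs instantly.

```python
import asyncio
from typing import AsyncIterator


class FakeReader:
    """Hypothetical stand-in for an async stream reader with `await read(n)`."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    async def read(self, n: int) -> bytes:
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk  # b"" signals end of stream


async def iter_parts(reader, part_size: int, chunk_size: int = 4) -> AsyncIterator[bytes]:
    """Group small async reads into parts of at least `part_size` bytes.

    Only the final part may be smaller than `part_size`.
    """
    buffer = bytearray()
    while True:
        chunk = await reader.read(chunk_size)
        if not chunk:
            break
        buffer.extend(chunk)
        if len(buffer) >= part_size:
            yield bytes(buffer)
            buffer.clear()
    if buffer:
        yield bytes(buffer)


async def collect(data: bytes, part_size: int) -> list[bytes]:
    return [part async for part in iter_parts(FakeReader(data), part_size)]


parts = asyncio.run(collect(b"abcdefghijklmnopqrstuvwxyz", part_size=10))
```

In a real upload, each part could be passed to the S3 client's upload_part, bracketed by create_multipart_upload and complete_multipart_upload, with a part size of at least 5 MiB for every part except the last. Because boto3 calls block, they would typically be run off the event loop, for example with asyncio.to_thread. This has not been tested against the DAP client, so treat it as a starting point only.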
