Use Mule Requester Module to Download Files From SFTP
Recently, we had a requirement to download multiple files from a remote SFTP server in one go. In Mule 3, the SFTP component downloads only one resource at a time, and you have to poll it continuously. We therefore needed a utility that downloads all available files from the server in a single pass. This is where the Mule Requester Module comes in. (I’ll call it MRM in the rest of the article.)
Per the documentation, MRM provides two operations:
- mulerequester:request requests one resource at a time.
- mulerequester:request-collection requests a collection of resources.
Even though request-collection seems like a perfect solution to our problem, in my opinion it still has some compatibility problems with the sftp component.
Firstly, it makes sequential calls to the resource endpoint and relies completely on the underlying endpoint to manage the resource. If the endpoint doesn’t delete the requested resource, the same resource will be requested again and again. I did not realise this until my Mule application kept going into an infinite request loop.
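The loop can be pictured with a small simulation. This is only a sketch of the behaviour described above, not MRM's actual code: `requestAll`, the two endpoint factories, and the `maxRequests` safety cap are hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Supplier;

final class RequestLoopSketch {
    /**
     * Keeps requesting until the endpoint returns null (nothing left) or the
     * safety cap is reached. Returns how many resources were received.
     */
    static <T> int requestAll(Supplier<T> endpoint, int maxRequests) {
        int received = 0;
        while (received < maxRequests && endpoint.get() != null) {
            received++;
        }
        return received;
    }

    /** Endpoint that removes each file as it hands it out, so the loop ends. */
    static Supplier<String> deletingEndpoint(String... files) {
        Deque<String> remaining = new ArrayDeque<>(Arrays.asList(files));
        return remaining::poll; // poll() removes the head, returns null when empty
    }

    /** Endpoint that never removes anything, so the same file keeps coming back. */
    static Supplier<String> nonDeletingEndpoint(String... files) {
        Deque<String> remaining = new ArrayDeque<>(Arrays.asList(files));
        return remaining::peek; // peek() returns the head without removing it
    }
}
```

With two files, the deleting endpoint stops after two requests, while the non-deleting one only stops because of the cap.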
Secondly, the sftp endpoint actually returns an SFTPInputStream, which is a lazily initialized stream. It downloads the actual file only when you perform an operation on it (e.g. a transformation), and it deletes the file once it has been downloaded if you have set the autoDelete flag to true in the configuration. From a performance perspective, this is a good design. However, it becomes a problem when we hook MRM up to an sftp resource: if nothing reads the stream, the file is never downloaded or deleted, so it is requested again. Hence, in MRM, you have to use the returnClass attribute to invoke a transformation (Mule will look up the most appropriate transformer from the registry based on the specified class type).
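To make the lazy behaviour concrete, here is a simplified model of such a stream in plain Java. It is an assumption-laden sketch, not Mule's real stream class: `LazyDownloadStream`, `opener`, and `onFullyRead` are hypothetical names, and triggering the callback at end-of-stream is my stand-in for the autoDelete behaviour.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;

/** Simplified model of a lazily opened stream; not Mule's actual implementation. */
final class LazyDownloadStream extends InputStream {
    private final Callable<InputStream> opener; // performs the real download
    private final Runnable onFullyRead;         // e.g. delete the remote file
    private InputStream delegate;               // stays null until the first read
    private boolean finished;

    LazyDownloadStream(Callable<InputStream> opener, Runnable onFullyRead) {
        this.opener = opener;
        this.onFullyRead = onFullyRead;
    }

    @Override
    public int read() throws IOException {
        if (delegate == null) {
            // Nothing is fetched until somebody actually reads from the stream.
            try {
                delegate = opener.call();
            } catch (IOException e) {
                throw e;
            } catch (Exception e) {
                throw new IOException(e);
            }
        }
        int b = delegate.read();
        if (b == -1 && !finished) {
            finished = true;
            onFullyRead.run(); // runs once, when the content has been consumed
        }
        return b;
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

If a caller only passes this object around without reading it, neither `opener` nor `onFullyRead` ever runs, which matches the situation where the remote file is never removed.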
But here comes the tradeoff. I specified java.lang.String as the return class; MRM performed the transformation and no longer entered the endless loop, which is good! However, with this strategy we lose the lazy initialization and load the file content into memory, which is a risk if the process later fails midway: the file has already been deleted from the server, so we lose it and cannot resume the process without a heap of manual intervention.
What I finally did was write a small custom class that initializes the SftpClient class from the Mule library and counts how many eligible files are on the SFTP server. Then I use the For Each component in Mule to download the files one by one and put them in a safe place. After the download is complete, we start the process and can decide when to delete the files (for example, after each one has been processed successfully).
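The idea can be sketched in plain Java, outside of a Mule flow. This is not my actual class and it does not use Mule's SftpClient API: `RemoteFileSource` is a hypothetical interface standing in for the SFTP client, and `StagedDownloader`, `downloadAll`, and `confirmProcessed` are illustrative names. The point is only the ordering: list, copy to a safe place, and delete remotely only after processing succeeds.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the SFTP client; not Mule's SftpClient API. */
interface RemoteFileSource {
    List<String> listEligibleFiles() throws IOException;
    InputStream open(String name) throws IOException;
    void delete(String name) throws IOException;
}

final class StagedDownloader {
    private final RemoteFileSource source;
    private final Path stagingDir; // the "safe place" for downloaded files

    StagedDownloader(RemoteFileSource source, Path stagingDir) {
        this.source = source;
        this.stagingDir = stagingDir;
    }

    /** Copies every eligible remote file to the staging directory. Deletes nothing. */
    List<Path> downloadAll() throws IOException {
        Files.createDirectories(stagingDir);
        List<Path> staged = new ArrayList<>();
        // The list is taken once up front, so the number of downloads is known
        // and the loop cannot run forever.
        for (String name : source.listEligibleFiles()) {
            Path target = stagingDir.resolve(name);
            try (InputStream in = source.open(name)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            staged.add(target);
        }
        return staged;
    }

    /** Removes the remote copy; call this only after processing has succeeded. */
    void confirmProcessed(Path stagedFile) throws IOException {
        source.delete(stagedFile.getFileName().toString());
    }
}
```

Because the remote file survives until `confirmProcessed` is called, a failure in the middle of processing leaves both the staged copy and the original in place, so the run can be resumed.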