What is causing Flume with the GCS sink to throw an OutOfMemoryError?

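I am using Flume to write to Google Cloud Storage. Flume listens on HTTP port 9000. It took me some time to get it working (adding the GCS libraries, using a credentials file...), but now it seems to communicate over the network.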

I am sending very small HTTP requests for my tests, and I have plenty of RAM available:

 curl -X POST -d '[{ "headers" : { timestamp=1417444588182, env=dev, tenant=myTenant, type=myType }, "body" : "some body ONE" }]' localhost:9000

On the first request I hit this out-of-memory error (and, of course, the agent stops working):

 2014-11-28 16:59:47,748 (hdfs-hdfs_sink-call-runner-0) [INFO - com.google.cloud.hadoop.util.LogUtil.info(LogUtil.java:142)] GHFS version: 1.3.0-hadoop2
 2014-11-28 16:59:50,014 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:467)] process failed
 java.lang.OutOfMemoryError: Java heap space
     at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
     at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:79)
     at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:820)
     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)

(See the complete stack trace, posted as a gist, for full details.)

The strange part is that the folders and the file are created the way I want, but the file is empty.

 gs://my_bucket/dev/myTenant/myType/2014-12-01/14-36-28.1417445234193.json.tmp

Is there something wrong with the way I configured Flume + GCS, or is it a bug in the GCS jar?

Where should I look to gather more data?

P.S.: I am running flume-ng inside Docker.

My flume.conf file:

 # Name the components on this agent
 a1.sources = http
 a1.sinks = hdfs_sink
 a1.channels = mem

 # Describe/configure the source
 a1.sources.http.type = org.apache.flume.source.http.HTTPSource
 a1.sources.http.port = 9000

 # Describe the sink
 a1.sinks.hdfs_sink.type = hdfs
 a1.sinks.hdfs_sink.hdfs.path = gs://my_bucket/%{env}/%{tenant}/%{type}/%Y-%m-%d
 a1.sinks.hdfs_sink.hdfs.filePrefix = %H-%M-%S
 a1.sinks.hdfs_sink.hdfs.fileSuffix = .json
 a1.sinks.hdfs_sink.hdfs.round = true
 a1.sinks.hdfs_sink.hdfs.roundValue = 10
 a1.sinks.hdfs_sink.hdfs.roundUnit = minute

 # Use a channel which buffers events in memory
 a1.channels.mem.type = memory
 a1.channels.mem.capacity = 10000
 a1.channels.mem.transactionCapacity = 1000

 # Bind the source and sink to the channel
 a1.sources.http.channels = mem
 a1.sinks.hdfs_sink.channel = mem

A related question raised during my Flume/GCS journey: What is the minimal setup needed to write to HDFS/GS on Google Cloud Storage with Flume?

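When uploading files, the GCS Hadoop FileSystem implementation sets aside a fairly large (64MB) write buffer for each FSDataOutputStream (that is, each file open for writing). This can be changed by setting "fs.gs.io.buffersize.write" to a smaller value, in bytes, in core-site.xml. I imagine 1MB would be enough for low-volume log collection.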
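As a rough sketch of that setting, a core-site.xml entry could look like the fragment below. The 1 MB value (1048576 bytes) is only the guess suggested above, not a tuned number, and if your core-site.xml already holds other properties (for example the GCS credentials settings), the `<property>` element would be merged into the existing `<configuration>` block rather than replacing the file.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Shrink the per-stream GCS write buffer from the 64 MB mentioned above. -->
  <property>
    <name>fs.gs.io.buffersize.write</name>
    <!-- Assumed value: 1 MB expressed in bytes (1024 * 1024). -->
    <value>1048576</value>
  </property>
</configuration>
```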

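Also, check the maximum heap size that is set when the JVM for Flume is launched. The flume-ng script sets a default JAVA_OPTS value of -Xmx20m, which limits the heap to 20MB, so a single 64MB write buffer cannot be allocated. This can be set to a larger value in flume-env.sh (see conf/flume-env.sh.template in the Flume tarball distribution for details).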
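A minimal flume-env.sh along those lines might contain the line below. The 512m figure is an assumption chosen purely for illustration; the right value depends on the write buffer size and on how many files the sink keeps open at once.

```shell
# conf/flume-env.sh (start from conf/flume-env.sh.template)
# Raise the JVM heap ceiling above the 20 MB default from the flume-ng script.
# 512m is an assumed, illustrative value; size it to your buffers and open files.
export JAVA_OPTS="-Xmx512m"
```

Since the agent here runs inside Docker, the file has to end up in the conf directory that flume-ng is started with inside the container.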
