Problem description
I'm trying to read millions of rows from a database and write them to a text file.
This is a continuation of my question "Database dump to text file with side effects".
My problem now seems to be that the logging doesn't happen until the program completes. Another indicator that I'm not processing lazily is that the text file isn't written at all until the program finishes.
Based on an IRC tip, it seems my issue likely has to do with :result-set-fn defaulting to doall in the clojure.java.jdbc/query area of the code.
I have tried to replace this with a for function, but memory consumption is still high because it pulls the entire result set into memory.
How can I have a :result-set-fn that doesn't pull everything in like doall? How can I progressively write the log file as the program is running, rather than dumping everything once the -main execution is finished?
(let [db-spec local-postgres
      sql "select * from public.f_5500_sf "
      log-report-interval 1000
      fetch-size 100
      field-delim "\t"
      row-delim "\n"
      db-connection (doto (j/get-connection db-spec) (.setAutoCommit false))
      statement (j/prepare-statement db-connection sql :fetch-size fetch-size)
      joiner (fn [v] (str (join field-delim v) row-delim))
      start (System/currentTimeMillis)
      ; rows per second: elapsed time is in ms, so divide by 1000
      rate-calc (fn [r] (float (/ r (/ (- (System/currentTimeMillis) start) 1000))))
      row-count (atom 0)
      result-set-fn (fn [rs] (lazy-seq rs))
      lazy-results (rest (j/query db-connection [statement] :as-arrays? true :row-fn joiner :result-set-fn result-set-fn))]
  (.setAutoCommit db-connection false)
  (info "Started dbdump session...")
  (with-open [^java.io.Writer wrtr (io/writer "output.txt")]
    (info "Running query...")
    (doseq [row lazy-results]
      (.write wrtr row)))
  (info (format "Completed write with %d rows" @row-count)))
Recommended answer
I took the recent fixes for clojure.java.jdbc by putting [org.clojure/java.jdbc "0.3.0-beta1"] in my project.clj dependencies listing. This version enhances/corrects the :as-arrays? true functionality of clojure.java.jdbc/query described here.
I think this helped somewhat; however, I may still have been able to override the :result-set-fn to vec.
The core issue was resolved by tucking all row logic into :row-fn. The initial OutOfMemory problems had to do with iterating through j/query result sets rather than defining a specific :row-fn.
New (working) code is below:
(defn -main []
  (let [; {{{
        db-spec local-postgres
        source-sql "select * from public.f_5500 "
        log-report-interval 1000
        fetch-size 1000
        row-count (atom 0)
        field-delim "\u0001" ; unlikely to be in source feed,
                             ; although i should still check in
                             ; replace-newline below (for when "\t"
                             ; is used especially)
        row-delim "\n" ; unless fixed-width, target doesn't
                       ; support non-printable chars for recDelim like
        db-connection (doto (j/get-connection db-spec) (.setAutoCommit false))
        statement (j/prepare-statement db-connection source-sql :fetch-size fetch-size :concurrency :read-only)
        start (System/currentTimeMillis)
        ; rows per second: elapsed time is in ms, so divide by 1000
        rate-calc (fn [r] (float (/ r (/ (- (System/currentTimeMillis) start) 1000))))
        replace-newline (fn [s] (if (string? s) (clojure.string/replace s #"\n" " ") s))
        ; counts the row, logs progress every log-report-interval rows,
        ; and returns the delimited line to write
        row-fn (fn [v]
                 (swap! row-count inc)
                 (when (zero? (mod @row-count log-report-interval))
                   (info (format "wrote %d rows" @row-count))
                   (info (format " rows/s %.2f" (rate-calc @row-count)))
                   (info (format " Percent Mem used %s " (memory-percent-used))))
                 (str (join field-delim (doall (map #(replace-newline %) v))) row-delim))
        ]; }}}
    (info "Started database table dump session...")
    (with-open [^java.io.Writer wrtr (io/writer "./sql/output.txt")]
      ; each row is written as soon as :row-fn sees it
      (j/query db-connection [statement] :as-arrays? true :row-fn
               #(.write wrtr (row-fn %))))
    (info (format " Completed with %d rows" @row-count))
    (info (format " Completed in %s seconds" (float (/ (- (System/currentTimeMillis) start) 1000))))
    (info (format " Average rows/s %.2f" (rate-calc @row-count)))
    nil))
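The same idea can be tried without a database. The following is only a sketch under stated assumptions: fake-rows is a hypothetical stand-in for the rows a query would deliver, and format-row and dump-rows! are illustrative names that are not part of clojure.java.jdbc. It shows each row being formatted and written as it is consumed, with only a count kept, rather than collecting rows first and writing them afterwards.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for query results: a lazy sequence of row vectors.
(defn fake-rows [n]
  (map (fn [i] [i (str "name-" i)]) (range n)))

;; Joins the fields of one row and appends the row delimiter.
(defn format-row [field-delim row-delim row]
  (str (str/join field-delim row) row-delim))

;; Writes each row as soon as it is consumed and returns the number of rows.
;; Only the running count is accumulated, not the rows themselves.
(defn dump-rows! [^java.io.Writer wrtr rows]
  (reduce (fn [n row]
            (.write wrtr ^String (format-row "\t" "\n" row))
            (inc n))
          0
          rows))

;; A StringWriter is used here so the output can be inspected;
;; a file writer from clojure.java.io/writer would be used the same way.
(def out (java.io.StringWriter.))
(def written (dump-rows! out (fake-rows 3)))
```

In the working code above, the write happens inside :row-fn for the same reason: the side effect runs per row, and what the query hands back afterwards is only the return values of .write (nil), not the row data.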
Other things I experimented with (with limited success) involved the timbre logging and turning off standard out; I wondered whether, when using a REPL, it might cache the results before displaying them back in my editor (vim fireplace), and I wasn't sure whether that was using a lot of the memory.
Also, I added the logging parts around free memory with (.freeMemory (java.lang.Runtime/getRuntime)). I wasn't familiar enough with VisualVM to pinpoint exactly where my issue was.
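The working code calls (memory-percent-used), whose definition is not shown in the post. A minimal sketch of what such a helper might look like, assuming it reports used heap as a percentage of the heap the JVM has currently allocated (totalMemory, not maxMemory):

```clojure
;; Assumed implementation; the original helper is not shown in the post.
(defn memory-percent-used
  "Percentage of the currently allocated JVM heap that is in use."
  []
  (let [rt    (Runtime/getRuntime)
        total (.totalMemory rt)   ; bytes currently allocated to the JVM
        free  (.freeMemory rt)]   ; bytes of that allocation not in use
    (* 100.0 (/ (- total free) total))))
```

Because the JVM can grow its heap, this figure can drop after an allocation increase even though more memory is in use; comparing against (.maxMemory rt) instead would give a percentage of the configured ceiling.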
I am happy with how it works now; thanks, everyone, for your help.