- **Factoring**
(*https://www.mersenneforum.org/forumdisplay.php?f=19*)

- - **akruppa - B2 help please**
(*https://www.mersenneforum.org/showthread.php?t=9716*)

akruppa - B2 help pleaseI'm tracking ECM effort in the v5 server. I compute total_ecm_effort as sum (curve_count * B1). When this total gets to certain levels I issue ECM assignments with a larger B1 value.
The problem? The summed B1 values assume a B2 value of 100*B1. When I use GMP-ECM B2 is often much higher than 100*B1. I need a simple formula that given B1 and B2 it returns a multiplier. I apply this multiplier to the curve count to get a rough equivalent of the number of B2=B1*100 curves. Accuracy is not critical - a 10-20% error would be fine. For example, on a number I'm currently atttacking B2 = 7000 * B1. According to GMP-ECM one of these curves equals 2.3 curves of B2 = 100 * B1. My placeholder routine on the server is this: function normalized_B1 ($B1, $B2) { $ratio = (1.0 * $B2) / $B1; if ($ratio < 7) return ($B1 * 0.4); if ($ratio < 30) return ($B1 * 0.6); if ($ratio < 70) return ($B1 * 0.8); if ($ratio < 300) return ($B1); if ($ratio < 700) return ($B1 * 1.4); if ($ratio < 3000) return ($B1 * 1.7); if ($ratio < 7000) return ($B1 * 2.3); return ($B1 * 2.8); } |

I get these figures for the expected number of curves, all without Brent-Suyama extension:
B1=1e6 B2: Curves to find p35: Ratio: 1e6 14317 0.12 1e7 3388 0.49 1e8 1674 1 1e9 993 1.68 1e10 639 2.62 1e11 430 3.89 1e12 297 5.62 B1=1e7 B2: Curves to find p45: Ratio: 1e7 94363 0.12 1e8 22594 0.5 1e9 11286 1 1e10 6779 1.66 1e11 4421 2.55 1e12 3015 3.74 1e13 2113 5.34 B1=1e8 B2: Curves to find p55: Ratio: 1e8 410456 0.12 1e9 101581 0.51 1e10 51549 1 1e11 31434 1.64 1e12 20823 2.47 1e13 14436 3.57 1e14 10291 5.00 The closest function I got to these ratios is 0.12 + 0.88 * (log_10(B2 / B1) / 2) ^ 1.5, which produces for log_10(B2 / B1) = 0 ... 6 0: 0.1200000000000000000000000000 1: 0.4311269837220809107363715193 2: 1.000000000000000000000000000 3: 1.736663230236897544810207489 4: 2.609015869776647285890972155 5: 3.598505426185217265198782899 6: 4.692614131981836054912458342 The values for 1 and 6 are somewhat too low, but such B2/B1 ratios will probably not be used much. For 3 it's somewhat too high. If you want really accurate estimates, you could call Pari/GP as an external program and let the rho.gp script compute probabilities. Or I'll write a small wrapper around the C code in GMP-ECM. Alex |

Thanks, Alex. Your formula was exactly what I needed. Like I said too much accuracy in curve counting is not critical as the only consequence is you move on to a larger B1 a liitle too early or a little too late - no big deal.
Another ECM puzzle for you: I want to hand out ECM assignments such that I roughly maximize the chance of finding a factor for a given effort. What formula would you use given the quantities total_ecm_effort (the sum of completed B1 values) and N (the Mersenne exponent). Ignore the step function nature of FFT lengths and assume the time it takes to run an ECM curve is proportional to N (log N). A simpler way to phrase this is: If an exponent has had a total_ecm_effort X, what is the probability the next ECM curve will find a factor (to make life easier, ignore the step nature of assigned B1 values). Given such a formula I can then use the expected CPU run time to maximize factors found. |

This isn't an easy question. You'd want to compute the distribution of remaining factors given an a priori distribution (i.e. 1/n probability of n-bit factor), adjust by taking previous factoring effort into account and choose the ECM parameters so that the scalar product of that distribution and ECM's probability of finding factors of n bits, divided by the time ECM with those parameters would use is maximal. This is a bit messy.
This is what Silverman and Wagstaff described in the "Practical Analysis" paper (which I have to admit still haven't read as carefully as I should! :alex:). As I understood it, the scheme of doing the expected number of curves for a certain factor size, then moving on to a size 5 digits larger is a good approximation to exactly this goal: maximising each curve's probability of finding a factor per unit time. If you want more precise parameter choice, all I can offer is code to compute ECM's proability of success for given parameters/factor sizes, so you can model the Bayesian process. Alex |

All times are UTC. The time now is 00:53. |

Powered by vBulletin® Version 3.8.11

Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.